Unstructured data grpo training #3441
Is your dataset in one of the formats the GRPO trainer accepts (standard or conversational)?
It is in conversational format and contains a system prompt, a user prompt, and a completion. Is this suitable for GRPO?
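For reference, a single row in TRL's conversational format might look like the sketch below. The column name `prompt` follows TRL's dataset-format conventions; the message contents are illustrative placeholders. Note that GRPO generates its own completions during training, so a pre-existing `completion` column is typically not needed.

```python
# One illustrative row in TRL's conversational format.
example = {
    "prompt": [
        {"role": "system", "content": "You are a research assistant."},
        {"role": "user", "content": "Summarize the key finding."},
    ],
    # GRPO samples completions from the policy model itself, so a
    # "completion" column is generally not required in the dataset.
}

# Basic sanity checks on the structure.
assert isinstance(example["prompt"], list)
assert all({"role", "content"} <= set(m) for m in example["prompt"])
print(example["prompt"][0]["role"])
```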
Also, I use another LLM's API to implement the accuracy reward function. Is this appropriate?
I believe yes! As long as it follows the conversational format, it should be fine for training with GRPO.
I can interpret your sentence in two ways. Could you please help me figure out which one you mean? If neither applies, can you elaborate, please?
Thank you for your reply. I use the same LLM API to evaluate the accuracy score of all of the model's answers. I think a regular rule-based function may not be suitable for evaluating the quality of the model's answers in my task. Is this reasonable, or do I need to find a suitable rule to define the accuracy function?
Without knowing the nature of your task, it's impossible to say. What's more, you're asking questions that are more scientific than technical, and given how young the literature on this subject is, I doubt anyone can give you a definitive answer. The best you can do is experiment and share your results.
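As a starting point for experimenting, an LLM-as-judge reward function compatible with TRL's GRPO reward-function interface (a callable that receives `completions` plus keyword columns and returns one float per completion) could be sketched like this. `call_judge_llm` is a hypothetical placeholder for whatever external API the author uses; here it is stubbed so the sketch is self-contained.

```python
def call_judge_llm(prompt: str, answer: str) -> float:
    """Placeholder judge: replace with a real API call that returns a
    score in [0, 1]. This stub simply rewards non-empty answers."""
    return 1.0 if answer.strip() else 0.0


def accuracy_reward(completions, prompts=None, **kwargs):
    """Return one reward per completion, scored by the judge."""
    scores = []
    for i, completion in enumerate(completions):
        # In conversational format each completion is a list of messages.
        if isinstance(completion, list):
            text = completion[0]["content"]
        else:
            text = completion
        prompt = prompts[i] if prompts else ""
        scores.append(call_judge_llm(str(prompt), text))
    return scores


# Usage example with conversational-style completions:
rewards = accuracy_reward([
    [{"role": "assistant", "content": "42"}],
    [{"role": "assistant", "content": ""}],
])
print(rewards)  # → [1.0, 0.0]
```

In practice the judge call should be batched and rate-limited, and it is worth caching scores, since GRPO samples several completions per prompt at every step.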
I am currently using my own dataset to run GRPO in a vertical domain. It is an unstructured training task, and the goal is to build an assistant for scientific research projects. But training keeps reporting gradient overflow. What could be the reason? Is it related to my hardware? I have four V100 32 GB GPUs, which only support SDPA and fp16.
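For context (not a definitive diagnosis): V100s have no bf16 support, so fp16 with a dynamic loss scaler is the only mixed-precision option, and "gradient overflow" messages usually mean the scaler is skipping steps after detecting inf/NaN gradients. Common mitigations are a lower learning rate and tighter gradient clipping, sketched here as a hedged config fragment (field names follow the Transformers `TrainingArguments` that TRL's `GRPOConfig` inherits; the values are illustrative, not recommendations):

```python
from trl import GRPOConfig

# Illustrative values only; tune for your own task and hardware.
config = GRPOConfig(
    output_dir="grpo-out",
    fp16=True,           # V100 cannot use bf16
    max_grad_norm=0.5,   # tighter gradient clipping
    learning_rate=1e-6,  # a lower LR often tames fp16 overflow
)
```

If overflows persist for many consecutive steps (rather than occasionally at the start of training), it is worth inspecting the reward values and advantages for outliers as well.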