Unstructured data grpo training #3441
Is your dataset in one of the formats the GRPO trainer accepts (standard or conversational)?
It is in conversational format and contains a system prompt, a user prompt, and a completion. Is this suitable for GRPO?
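For reference, a single row in TRL's conversational format might look like the sketch below. The column name `prompt` follows TRL's dataset-format conventions; the message contents are illustrative placeholders. Note that GRPO generates its own completions during training, so a pre-existing `completion` column is typically not needed.

```python
# One illustrative row in TRL's conversational format.
example = {
    "prompt": [
        {"role": "system", "content": "You are a research assistant."},
        {"role": "user", "content": "Summarize the key finding."},
    ],
    # GRPO samples completions from the policy model itself, so a
    # "completion" column is generally not required in the dataset.
}

# Basic sanity checks on the structure.
assert isinstance(example["prompt"], list)
assert all({"role", "content"} <= set(m) for m in example["prompt"])
print(example["prompt"][0]["role"])
```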
Also, I use another LLM's API to implement the accuracy reward function. Is this appropriate?
I believe yes! As long as it follows the conversational format, it should be fine for training with GRPO.
I can interpret your sentence in two ways. Could you please help me figure out which one you mean? If neither applies, can you elaborate, please?
Thank you for your reply. I use the same LLM API to evaluate the accuracy score of all of the model's answers. I think a regular rule-based function may not be suitable for evaluating the quality of the model's answers in my task. Is this reasonable, or do I need to find a suitable rule to define the accuracy function?
Without knowing the nature of your task, it's impossible to say. What's more, you're asking questions that are more scientific than technical, and given how young the literature on this subject is, I doubt anyone can give you a definitive answer. The best you can do is experiment and share your results.
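As a starting point for experimenting, an LLM-as-judge reward function compatible with TRL's GRPO reward-function interface (a callable that receives `completions` plus keyword columns and returns one float per completion) could be sketched like this. `call_judge_llm` is a hypothetical placeholder for whatever external API the author uses; here it is stubbed so the sketch is self-contained.

```python
def call_judge_llm(prompt: str, answer: str) -> float:
    """Placeholder judge: replace with a real API call that returns a
    score in [0, 1]. This stub simply rewards non-empty answers."""
    return 1.0 if answer.strip() else 0.0


def accuracy_reward(completions, prompts=None, **kwargs):
    """Return one reward per completion, scored by the judge."""
    scores = []
    for i, completion in enumerate(completions):
        # In conversational format each completion is a list of messages.
        if isinstance(completion, list):
            text = completion[0]["content"]
        else:
            text = completion
        prompt = prompts[i] if prompts else ""
        scores.append(call_judge_llm(str(prompt), text))
    return scores


# Usage example with conversational-style completions:
rewards = accuracy_reward([
    [{"role": "assistant", "content": "42"}],
    [{"role": "assistant", "content": ""}],
])
print(rewards)  # → [1.0, 0.0]
```

In practice the judge call should be batched and rate-limited, and it is worth caching scores, since GRPO samples several completions per prompt at every step.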
I am currently using my own dataset to run GRPO in a vertical domain. It is an unstructured training task, and the goal is to build an assistant for scientific research projects. But training keeps reporting gradient overflow. What could be the reason? Is it related to my hardware? I have four V100 32 GB GPUs, which only support SDPA and fp16.
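For context (not a definitive diagnosis): V100s have no bf16 support, so fp16 with a dynamic loss scaler is the only mixed-precision option, and "gradient overflow" messages usually mean the scaler is skipping steps after detecting inf/NaN gradients. Common mitigations are a lower learning rate and tighter gradient clipping, sketched here as a hedged config fragment (field names follow the Transformers `TrainingArguments` that TRL's `GRPOConfig` inherits; the values are illustrative, not recommendations):

```python
from trl import GRPOConfig

# Illustrative values only; tune for your own task and hardware.
config = GRPOConfig(
    output_dir="grpo-out",
    fp16=True,           # V100 cannot use bf16
    max_grad_norm=0.5,   # tighter gradient clipping
    learning_rate=1e-6,  # a lower LR often tames fp16 overflow
)
```

If overflows persist for many consecutive steps (rather than occasionally at the start of training), it is worth inspecting the reward values and advantages for outliers as well.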