Description
When I use DPO to train the 7B StarCoder model, I hit an OOM error. I am using 16 A100 GPUs with ZeRO-3, via TRL and transformers. The OOM happens when the code reaches `AutoModelForCausalLM.from_pretrained`, but QwenCoder does not have this problem. Are there any special settings in the model's architecture that make it unsuitable for DPO (Direct Preference Optimization)?
The code is:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, HfArgumentParser

# DPOTrainingArguments, down_file, and MyDPOTrainer are defined elsewhere in my code
parser = HfArgumentParser(DPOTrainingArguments)
args = parser.parse_args_into_dataclasses()[0]
down_file(args.model_path, args.pretrained_model)  # fetch the checkpoint locally

# Policy and reference model are both loaded here
model = AutoModelForCausalLM.from_pretrained(args.model_path)
model_ref = AutoModelForCausalLM.from_pretrained(args.model_path)
tokenizer = AutoTokenizer.from_pretrained(args.model_path)

dpo_trainer = MyDPOTrainer(
    model,
    model_ref,
    args=args,
    # ... remaining arguments omitted
)
dpo_trainer.train()
```
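One likely cause, independent of the model architecture: under ZeRO-3, each process materializes the full fp32 weights at `from_pretrained` time (twice here, once for the policy and once for the reference model) before DeepSpeed ever shards them, so the load itself can OOM. Below is a minimal sketch of a sharded load, assuming a ZeRO-3 DeepSpeed config file at `ds_config.json` and the `bigcode/starcoderbase-7b` checkpoint as stand-ins for your own paths. Creating an `HfDeepSpeedConfig` before calling `from_pretrained` switches transformers to `deepspeed.zero.Init`, so each rank only loads its own shard, and `torch_dtype=torch.bfloat16` halves the footprint relative to the default fp32 load:

```python
import torch
from transformers import AutoModelForCausalLM
from transformers.integrations import HfDeepSpeedConfig

# Must be created BEFORE from_pretrained and kept alive; with a ZeRO-3
# config it makes transformers load weights via deepspeed.zero.Init,
# so each rank materializes only its own shard of the 7B weights.
ds_config = HfDeepSpeedConfig("ds_config.json")  # hypothetical ZeRO-3 config path

model_path = "bigcode/starcoderbase-7b"  # assumed checkpoint; use args.model_path
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,  # half the memory of the default fp32 load
)
model_ref = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
)
```

Note that if ZeRO-3 is passed through the `deepspeed` field of a `TrainingArguments`-derived class, transformers sets this up when the arguments are parsed; the explicit `HfDeepSpeedConfig` matters when ZeRO-3 is configured only through the launcher.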