Description
When I use DPO to train the 7B StarCoder model, I hit an OOM error. I am using 16 A100 GPUs with ZeRO-3, via TRL and transformers. The OOM happens when the code reaches `AutoModelForCausalLM.from_pretrained`, but QwenCoder does not have this problem. Are there any special settings in the model's architecture that make it unsuitable for DPO (Direct Preference Optimization)?
The code is:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, HfArgumentParser

# DPOTrainingArguments, down_file, and MyDPOTrainer are defined elsewhere in my code
parser = HfArgumentParser(DPOTrainingArguments)
args = parser.parse_args_into_dataclasses()[0]
down_file(args.model_path, args.pretrained_model)  # fetch the checkpoint locally

# Policy and reference model are both loaded here
model = AutoModelForCausalLM.from_pretrained(args.model_path)
model_ref = AutoModelForCausalLM.from_pretrained(args.model_path)
tokenizer = AutoTokenizer.from_pretrained(args.model_path)

dpo_trainer = MyDPOTrainer(
    model,
    model_ref,
    args=args,
    # ... remaining arguments omitted
)
dpo_trainer.train()
```
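One likely cause, independent of the model architecture: under ZeRO-3, each process materializes the full fp32 weights at `from_pretrained` time (twice here, once for the policy and once for the reference model) before DeepSpeed ever shards them, so the load itself can OOM. Below is a minimal sketch of a sharded load, assuming a ZeRO-3 DeepSpeed config file at `ds_config.json` and the `bigcode/starcoderbase-7b` checkpoint as stand-ins for your own paths. Creating an `HfDeepSpeedConfig` before calling `from_pretrained` switches transformers to `deepspeed.zero.Init`, so each rank only loads its own shard, and `torch_dtype=torch.bfloat16` halves the footprint relative to the default fp32 load:

```python
import torch
from transformers import AutoModelForCausalLM
from transformers.integrations import HfDeepSpeedConfig

# Must be created BEFORE from_pretrained and kept alive; with a ZeRO-3
# config it makes transformers load weights via deepspeed.zero.Init,
# so each rank materializes only its own shard of the 7B weights.
ds_config = HfDeepSpeedConfig("ds_config.json")  # hypothetical ZeRO-3 config path

model_path = "bigcode/starcoderbase-7b"  # assumed checkpoint; use args.model_path
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,  # half the memory of the default fp32 load
)
model_ref = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
)
```

Note that if ZeRO-3 is passed through the `deepspeed` field of a `TrainingArguments`-derived class, transformers sets this up when the arguments are parsed; the explicit `HfDeepSpeedConfig` matters when ZeRO-3 is configured only through the launcher.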