Closed
See `accelerate/src/accelerate/utils/deepspeed.py`, lines 264 to 279 at commit `7013365`.
When using DeepSpeed, `accelerator.backward()` calls the DeepSpeed engine's `backward()` and then `engine.step()`.
When `gradient_accumulation_steps > 1`, the accumulation boundaries seen by Accelerate and by DeepSpeed may differ: `engine.step()` relies on DeepSpeed's own `gradient_accumulation_boundary` state, which is not synchronized with Accelerate's `sync_gradients` state.
If this issue does exist, then the `Trainer` in transformers likely has the same problem.
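To make the failure mode concrete, here is a minimal pure-Python sketch of how two independently tracked accumulation counters can disagree. The `boundaries` helper and the `offset` parameter are hypothetical illustrations, not the real Accelerate or DeepSpeed API; the point is only that two counters with the same period but different starting offsets mark different steps as boundaries.

```python
def boundaries(num_steps, accumulation_steps, offset=0):
    """Return the set of micro-step indices treated as accumulation
    boundaries by a counter that started counting at `offset`."""
    return {
        step
        for step in range(num_steps)
        if (step - offset + 1) % accumulation_steps == 0
    }

accumulation_steps = 4
num_steps = 12

# Accelerate's view: counting micro-steps from the start of training.
accelerate_boundaries = boundaries(num_steps, accumulation_steps, offset=0)

# DeepSpeed's view: its internal counter is off by one micro-step,
# e.g. after a skipped batch or a mid-cycle dataloader resume.
deepspeed_boundaries = boundaries(num_steps, accumulation_steps, offset=1)

print(sorted(accelerate_boundaries))  # → [3, 7, 11]
print(sorted(deepspeed_boundaries))   # → [0, 4, 8]
print(accelerate_boundaries == deepspeed_boundaries)  # → False
```

Once the offsets diverge, `sync_gradients` is `True` on steps where DeepSpeed does not actually apply the optimizer update, and vice versa, which is exactly the mismatch described above.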