
There might be a bug when using DeepSpeed with gradient_accumulation_steps > 1 #3552

Closed
@hiawui

Description


In Accelerate's DeepSpeed wrapper (`DeepSpeedEngineWrapper`), `backward` is implemented as:

```python
def backward(self, loss, **kwargs):
    # runs backpropagation and handles mixed precision
    self.engine.backward(loss, **kwargs)

    # Deepspeed's `engine.step` performs the following operations:
    # - gradient accumulation check
    # - gradient clipping
    # - optimizer step
    # - zero grad
    # - checking overflow
    # - lr_scheduler step (only if engine.lr_scheduler is not None)
    self.engine.step()
    # and this plugin overrides the above calls with no-ops when Accelerate runs under
    # Deepspeed, but allows normal functionality for non-Deepspeed cases thus enabling a simple
    # training loop that works transparently under many training regimes.
```

When using DeepSpeed, accelerator.backward() calls DeepSpeed engine.backward() and engine.step().

When gradient_accumulation_steps > 1, the accumulation boundary that Accelerate tracks (its sync_gradients state) and the boundary that the DeepSpeed engine tracks internally might not agree.

https://github.com/deepspeedai/DeepSpeed/blob/master/deepspeed/runtime/engine.py#L2362-L2378

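For concreteness, here is a minimal, self-contained sketch of how the two boundary decisions can diverge. It only simulates the counters: the Accelerate-style decision is assumed to force a sync on the last batch of the dataloader, and the DeepSpeed-style decision is assumed to depend only on an internal micro-step counter, as suggested by the linked engine.py code; the exact conditions in both libraries differ in detail.

```python
# Standalone simulation (not Accelerate/DeepSpeed code): compare the two
# notions of "gradient accumulation boundary" over one epoch.
accumulation_steps = 4
num_batches = 10  # deliberately not a multiple of accumulation_steps

micro_steps = 0  # DeepSpeed-style counter, bumped on every engine.step() call
for batch_idx in range(num_batches):
    is_last_batch = batch_idx == num_batches - 1

    # Accelerate-style sync_gradients: step count OR end of dataloader
    accelerate_boundary = (batch_idx + 1) % accumulation_steps == 0 or is_last_batch

    # DeepSpeed-style is_gradient_accumulation_boundary(): micro-step counter only
    deepspeed_boundary = (micro_steps + 1) % accumulation_steps == 0

    flag = "  <-- mismatch" if accelerate_boundary != deepspeed_boundary else ""
    print(f"batch {batch_idx}: accelerate={accelerate_boundary} deepspeed={deepspeed_boundary}{flag}")

    micro_steps += 1  # DeepSpeed increments this inside engine.step()
```

With 10 batches and accumulation_steps=4, the last batch is a boundary for the Accelerate-style decision (end of dataloader) but not for the micro-step counter.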

DeepSpeed's engine.step() has its own gradient_accumulation_boundary state, which is not synchronized with Accelerate's sync_gradients state.
If this issue does exist, the Trainer in transformers likely has the same problem, since it also goes through accelerator.backward():

https://github.com/huggingface/transformers/blob/main/src/transformers/trainer.py#L2511-L2513

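One possible mitigation, sketched below under assumptions rather than as a verified fix: explicitly push Accelerate's sync_gradients flag into the DeepSpeed engine via DeepSpeed's set_gradient_accumulation_boundary() override before calling accelerator.backward(). This assumes that, with the DeepSpeed plugin active, the model returned by accelerator.prepare() is the DeepSpeedEngine, and that the installed DeepSpeed version exposes that method (hence the hasattr guard).

```python
from accelerate import Accelerator


def train(accelerator: Accelerator, model, optimizer, dataloader):
    """Sketch of a loop that forwards Accelerate's sync_gradients to DeepSpeed.

    Assumes `accelerator.prepare()` was already called, so under the DeepSpeed
    plugin `model` is the DeepSpeedEngine.
    """
    for batch in dataloader:
        with accelerator.accumulate(model):
            loss = model(**batch).loss

            # Keep DeepSpeed's internal boundary flag in lock-step with Accelerate.
            # set_gradient_accumulation_boundary() is DeepSpeed's hook for overriding
            # its automatic (micro_steps-based) boundary detection.
            if hasattr(model, "set_gradient_accumulation_boundary"):
                model.set_gradient_accumulation_boundary(accelerator.sync_gradients)

            accelerator.backward(loss)  # under DeepSpeed: engine.backward() + engine.step()
            optimizer.step()            # no-op wrapper when DeepSpeed drives the step
            optimizer.zero_grad()       # likewise a no-op under DeepSpeed
```

Whether Accelerate itself should do this synchronization inside DeepSpeedEngineWrapper.backward() is essentially the question this issue raises.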
