Fix: no_grad with AMP bug #20921

Open · baskrahmer wants to merge 9 commits into master from fix/no-grad-amp-bug

Conversation

@baskrahmer (Contributor) commented on Jun 20, 2025

Fixes #20644

Note, however, that this would affect performance for other users, so the question is whether it is worth optimizing for this edge case, which is fundamentally a torch bug.

cc @Borda


📚 Documentation preview 📚: https://pytorch-lightning--20921.org.readthedocs.build/en/20921/
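
For reference, a minimal sketch of the torch behaviour behind #20644 (this assumes a CUDA device and a torch version still affected by the autocast-cache issue; it is illustration only, not part of the changes in this PR):

import torch

# Sketch of the underlying issue: a forward pass run under no_grad() inside an
# autocast region fills the autocast weight cache with casts that carry no
# autograd history; a later forward pass in the same region reuses those cached
# casts, so the loss ends up detached from the parameters.
model = torch.nn.Linear(8, 8).cuda()
x = torch.randn(4, 8, device="cuda")

with torch.autocast("cuda", dtype=torch.half):  # cache_enabled defaults to True
    with torch.no_grad():
        model(x)             # fills the cast cache without grad tracking
    loss = model(x).sum()
print(loss.requires_grad)    # False on affected versions: no graph was built

# The workaround this PR applies: disable the cast cache.
with torch.autocast("cuda", dtype=torch.half, cache_enabled=False):
    with torch.no_grad():
        model(x)
    loss = model(x).sum()
print(loss.requires_grad)    # True: the graph is built as expected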

@github-actions bot added the pl (Generic label for PyTorch Lightning package) label on Jun 20, 2025
@baskrahmer force-pushed the fix/no-grad-amp-bug branch from 08508b6 to d18fb08 on June 20, 2025, 13:46
@baskrahmer marked this pull request as ready for review on June 20, 2025, 15:33
Comment on lines +115 to +117
return torch.autocast(
self.device, dtype=(torch.bfloat16 if self.precision == "bf16-mixed" else torch.half), cache_enabled=False
)
Member

Suggested change
- return torch.autocast(
-     self.device, dtype=(torch.bfloat16 if self.precision == "bf16-mixed" else torch.half), cache_enabled=False
- )
+ dtype = torch.bfloat16 if self.precision == "bf16-mixed" else torch.half
+ return torch.autocast(self.device, dtype=dtype, cache_enabled=False)

@Borda (Member) commented on Jun 23, 2025

> Note, however, that this would affect performance for other users, so the question is whether it is worth optimizing for this edge case, which is fundamentally a torch bug.

Then we should report it and offer a fix in torch.
Then, if it is accepted and released, we should have a version switch in our codebase, so newer torch versions won't need this workaround while older ones keep it. Does it...
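
A sketch of what such a version switch could look like (the 2.9.0 cut-off and the helper name are invented for illustration, since no upstream fix exists yet):

from packaging.version import Version

import torch

# Hypothetical version switch: keep the cache_enabled=False workaround only on
# torch versions that still ship the autocast-cache bug (the cut-off is made up).
_NEEDS_AUTOCAST_CACHE_WORKAROUND = Version(torch.__version__.split("+")[0]) < Version("2.9.0")

def autocast_context(device: str, dtype: torch.dtype) -> torch.autocast:
    if _NEEDS_AUTOCAST_CACHE_WORKAROUND:
        return torch.autocast(device, dtype=dtype, cache_enabled=False)
    return torch.autocast(device, dtype=dtype)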

BTW, have you measured the performance drop?
cc: @lantiga

@baskrahmer (Contributor, Author) commented

@Borda it is a long-standing issue in torch. I can try to make a fix if I have some time, but I think it could be complex.

But I agree with you that ideally it should be fixed in torch. I just wanted to open this PR to show what a workaround on our end would look like. Shall I close it?

I haven't measured the performance drop since it will vary strongly across architectures and probably also hardware setups.
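
For anyone who wants a rough number on their own setup, a hypothetical micro-benchmark (assuming a CUDA device; nothing here is part of the PR) that times training steps with the autocast weight cache enabled vs disabled:

import time

import torch

# Hypothetical micro-benchmark: the model applies the same Linear layer several
# times per forward pass so the autocast cast cache actually gets reused.
def time_autocast(cache_enabled: bool, steps: int = 100) -> float:
    layer = torch.nn.Linear(2048, 2048).cuda()
    x = torch.randn(64, 2048, device="cuda")

    def step() -> None:
        with torch.autocast("cuda", dtype=torch.half, cache_enabled=cache_enabled):
            out = x
            for _ in range(8):
                out = layer(out)
        out.sum().backward()
        layer.zero_grad(set_to_none=True)

    for _ in range(10):  # warm-up
        step()
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(steps):
        step()
    torch.cuda.synchronize()
    return time.perf_counter() - start

print("cache_enabled=True :", time_autocast(True))
print("cache_enabled=False:", time_autocast(False))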

Labels: pl (Generic label for PyTorch Lightning package)
Projects: None yet
Development: successfully merging this pull request may close the issue "Computation graph not being built" (#20644).
2 participants