Issues: deepspeedai/DeepSpeed
Functorch support: RuntimeError: In order to use an autograd.Function with functorch transforms
#7323 opened Jun 1, 2025 by ifiaposto
Trainer.train(resume_from_checkpoint=...) fails when using auto tensor parallel
#7320 opened May 29, 2025 by Peter-Chou
Labels: bug, training
Does DeepSpeed support the training of large-scale recommender systems (RecSys)?
#7318 opened May 29, 2025 by Lenan22
[BUG]RuntimeError: Distributed package doesn't have NCCL built in. Training on huawei NPU
#7312 opened May 26, 2025 by whisper0055
Labels: bug, training
[BUG] DeepCompile: MemoryProfiling error /pytorch/build/aten/src/ATen/RegisterCUDA.cpp:7280: SymIntArrayRef expected to contain only concrete integers
#7311 opened May 26, 2025 by unavailableun
Labels: bug, training
[BUG] Installation doesn't work (Unable to import torch, pre-compiling ops will be disabled.)
#7286 opened May 15, 2025 by EugeoSynthesisThirtyTwo
Labels: install, windows
[BUG] DeepSpeed Inference Error for Llama 3 Models AssertionError: Merging tensors is not allowed here!
#7282 opened May 14, 2025 by tokestermw
Labels: bug, inference
[BUG]Install deepspeed on the npu machine, and an error is reported during verification
#7281 opened May 14, 2025 by lgy1027
Labels: bug, training
[BUG] how to achieve hybrid data and pipeline parallelism?
#7280 opened May 12, 2025 by Malena-yy
Labels: bug, training
[BUG] Deepspeed-Inference: support AutoTP for Llama-4 models
#7277 opened May 10, 2025 by songdezhao
Labels: bug, inference
[BUG] Install to Windows - fatal error LNK1181
#7276 opened May 9, 2025 by lokanaft
Labels: bug, windows
[BUG] Qwen3: model loading failed when using meta device
#7275 opened May 9, 2025 by songdezhao
Labels: bug, inference
[REQUEST] New Integration - NeptuneMonitor
#7274 opened May 8, 2025 by LeoRoccoBreedt
Labels: enhancement
[BUG]"DeepSpeedZeRoOffload missing '_restore_from_bit16_weights' method when loading checkpoints"
#7272 opened May 6, 2025 by calliope-pro
Labels: bug, inference
[REQUEST] Equivalent of FSDP ignore_params or ignore_modules for DeepSpeed Zero 3
#7271 opened May 6, 2025 by kmehant
Labels: enhancement
[BUG] Assert Error: assert buffer.grad is not None & RuntimeError: element 1 of tensors does not require grad and does not have a grad_fn During pipeline parallelism
#7270 opened May 3, 2025 by mmkjj
Labels: bug, training
[BUG][autotuner.py:700:model_info_profile_run] The model is not runnable with DeepSpeed with error = (
#7269 opened May 3, 2025 by SeekPoint
Labels: bug, training
[BUG] Hanging problem when LoRA sft Qwen2.5Omni with multi-turn video-audio samples with ds-z3 or ds-z3-offload
#7264 opened Apr 30, 2025 by Luffy-ZY-Wang
Labels: bug, training
[BUG] - Multiple 5090s failing on deepspeed.initialize()
#7261 opened Apr 29, 2025 by Oruli
Labels: bug, training
[BUG] AttributeError: 'UnembedParameter' object has no attribute 'dtype'
#7260 opened Apr 29, 2025 by lambda7xx
Labels: bug, inference
[BUG] AutoTP fails for Qwen2.5 models when tp size > 2
#7259 opened Apr 28, 2025 by HollowMan6
Labels: bug, training