[BE]: Update cusparselt to 0.7.1 #155232

Closed

Conversation

Skylion007
Collaborator

Needed to support sparse operations on Blackwell; the new release also adds features to the library and reduces its size compared to 0.7.
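
For context, the main eager-mode path that exercises cuSPARSELt in PyTorch is 2:4 semi-structured sparsity. A minimal sketch, not part of this PR, assuming a recent PyTorch build on a CUDA GPU that supports the 2:4 pattern; the shapes, dtype, and mask are purely illustrative:

import torch
from torch.sparse import to_sparse_semi_structured

# Zero out two of every four elements per row so the tensor satisfies the 2:4 pattern.
A = torch.rand(128, 128, device="cuda", dtype=torch.float16)
mask = torch.tensor([1, 1, 0, 0], device="cuda", dtype=torch.float16).tile(128, 32)
A_sparse = to_sparse_semi_structured(A * mask)

B = torch.rand(128, 128, device="cuda", dtype=torch.float16)
C = A_sparse @ B  # sparse matmul; dispatches to cuSPARSELt when it is available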

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Jun 5, 2025

pytorch-bot bot commented Jun 5, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/155232

Note: Links to docs will display an error until the docs builds have been completed.

❌ 6 New Failures, 2 Unrelated Failures

As of commit 6e68752 with merge base da1f898:

NEW FAILURES - The following jobs have failed:

UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@Skylion007 Skylion007 force-pushed the skylion007/update-cusparselt-0-7-1 branch from d78ec5a to 4686a22 on June 5, 2025 16:22
@Skylion007 Skylion007 requested review from eqy, albanD, atalman and nWEIdia June 5, 2025 16:41
@Skylion007 Skylion007 marked this pull request as ready for review June 5, 2025 16:47
@Skylion007 Skylion007 requested review from a team and jeffdaily as code owners June 5, 2025 16:47
@Skylion007 Skylion007 added the better-engineering Relatively self-contained tasks for better engineering contributors label Jun 5, 2025
@Skylion007 Skylion007 requested review from ngimel and malfet June 5, 2025 16:51
@Skylion007 Skylion007 force-pushed the skylion007/update-cusparselt-0-7-1 branch 2 times, most recently from cd8626d to e238d23 on June 6, 2025 16:13
@Skylion007
Collaborator Author

@nWEIdia Any thoughts here?

@nWEIdia
Collaborator

nWEIdia commented Jun 6, 2025

@nWEIdia Any thoughts here?

This may have implications (and potentially complications) for the binary size of the upcoming v2.8.
Could you please help figure out the size increase? It would also be great to learn how the community is using the cuSPARSELt library.
If the binary size increase is acceptable, I think we can merge this PR. But @tinglvv is also working on the CUDA 12.9.1 update, which may (or may not) affect the binary size, so I would prefer to investigate this after the CUDA 12.9.1 update, especially on the binary size front.

cc @atalman @malfet @ptrblck @tinglvv
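
One way to eyeball the size question discussed above is to sum the on-disk footprint of the NVIDIA CUDA wheels (cuSPARSELt included) installed next to PyTorch. A rough sketch, not from this PR, assuming those wheels unpack under site-packages/nvidia/:

import os
import sysconfig

# Walk the installed NVIDIA wheel directory and total the file sizes.
nvidia_dir = os.path.join(sysconfig.get_paths()["purelib"], "nvidia")
total = 0
for root, _, files in os.walk(nvidia_dir):
    for name in files:
        total += os.path.getsize(os.path.join(root, name))
print(f"{total / 2**20:.1f} MiB under {nvidia_dir}")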

@Skylion007
Collaborator Author

Skylion007 commented Jun 6, 2025

@nWEIdia Any thoughts here?

This may have implications (and potentially complications) for the binary size of the upcoming v2.8. Could you please help figure out the size increase? It would also be great to learn how the community is using the cuSPARSELt library. If the binary size increase is acceptable, I think we can merge this PR. But @tinglvv is also working on the CUDA 12.9.1 update, which may (or may not) affect the binary size, so I would prefer to investigate this after the CUDA 12.9.1 update, especially on the binary size front.

cc @atalman @malfet @ptrblck @tinglvv

We dynamically link cuSPARSELt, and NVIDIA distributes it as a separate wheel, so it shouldn't cause any significant binary size increase in our wheel.
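
A quick way to confirm that the separate wheel is what gets picked up (rather than a copy bundled into the PyTorch wheel) is to check for the NVIDIA package itself. A small sketch, not from this PR, assuming the CUDA 12 package name nvidia-cusparselt-cu12:

from importlib import metadata

try:
    # The CUDA 12 PyTorch wheels pull cuSPARSELt in as its own pip package.
    print("nvidia-cusparselt-cu12:", metadata.version("nvidia-cusparselt-cu12"))
except metadata.PackageNotFoundError:
    print("cuSPARSELt wheel not installed; PyTorch may fall back to a bundled or system copy")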

@nWEIdia
Collaborator

nWEIdia commented Jun 6, 2025

@nWEIdia Any thoughts here?

This may have implications (and potentially complications) for the binary size of the upcoming v2.8. Could you please help figure out the size increase? It would also be great to learn how the community is using the cuSPARSELt library. If the binary size increase is acceptable, I think we can merge this PR. But @tinglvv is also working on the CUDA 12.9.1 update, which may (or may not) affect the binary size, so I would prefer to investigate this after the CUDA 12.9.1 update, especially on the binary size front.
cc @atalman @malfet @ptrblck @tinglvv

We dynamically link cuSPARSELt, and NVIDIA distributes it as a separate wheel, so it shouldn't cause any significant binary size increase in our wheel.

Ah yes, I just recalled that separation, thanks! In that case, LGTM.

@tinglvv
Collaborator

tinglvv commented Jun 6, 2025

LGTM if CI is green. I will also upgrade cuSPARSELt for the CUDA 12.9 builds in my PRs.

@Skylion007
Collaborator Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jun 6, 2025
@Skylion007 Skylion007 closed this Jun 6, 2025
@Skylion007 Skylion007 reopened this Jun 6, 2025
@Skylion007
Collaborator Author

Just need @atalman to upload the binaries for this one.

@pytorchmergebot
Collaborator

Merge failed

Reason: Approvers from one of the following sets are needed:

  • OSS CI (alband, dagitses, pytorch/pytorch-dev-infra)
  • superuser (pytorch/metamates)
  • Core Reviewers (mruberry, lezcano, Skylion007, ngimel, peterbell10, ...)
  • Core Maintainers (soumith, gchanan, ezyang, dzhulgakov, malfet, ...)
Details for Dev Infra team: raised by workflow job

Failing merge rule: Core Maintainers

@Skylion007
Collaborator Author

@pytorchbot rebase

@Skylion007 Skylion007 added the ciflow/binaries_wheel Trigger binary build and upload jobs for wheel on the PR label Jun 7, 2025
@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased skylion007/update-cusparselt-0-7-1 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout skylion007/update-cusparselt-0-7-1 && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the skylion007/update-cusparselt-0-7-1 branch from e238d23 to eb24121 on June 7, 2025 14:28
@@ -165,6 +165,7 @@ if [[ $CUDA_VERSION == 12* ]]; then
'$ORIGIN/../../nvidia/curand/lib'
'$ORIGIN/../../nvidia/cusolver/lib'
'$ORIGIN/../../nvidia/cusparse/lib'
'$ORIGIN/../../nvidia/cusparselt/lib'
Collaborator Author

@Skylion007 Skylion007 Jun 8, 2025

NVIDIA moved the cuSPARSELt library under the /nvidia/ directory between 0.6.3 and 0.7.
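
A small sketch, not from this PR, to verify that layout on a machine with the wheel installed; it simply lists the shared objects under the path the new RPATH entry above targets:

import glob
import os
import sysconfig

# Prints an empty list if the cuSPARSELt wheel is not installed in this environment.
site_packages = sysconfig.get_paths()["purelib"]
print(glob.glob(os.path.join(site_packages, "nvidia", "cusparselt", "lib", "libcusparseLt*")))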

@malfet
Contributor

malfet commented Jun 9, 2025

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

@malfet
Contributor

malfet commented Jun 9, 2025

@pytorchbot merge -i

thatgeeman pushed a commit to thatgeeman/pytorch-docathon that referenced this pull request Jun 15, 2025
Needed to support sparse operations on Blackwell; the new release also adds features to the library and reduces its size compared to 0.7.

Pull Request resolved: pytorch#155232
Approved by: https://github.com/nWEIdia, https://github.com/malfet
Labels
better-engineering Relatively self-contained tasks for better engineering contributors
ciflow/binaries_wheel Trigger binary build and upload jobs for wheel on the PR
ciflow/trunk Trigger trunk jobs on your pull request
Merged
open source
topic: not user facing topic category