[BE]: Update cusparselt to 0.7.1 #155232

Closed

Conversation

Skylion007
Collaborator

Needed to support sparse operations on Blackwell; the new release also adds features to the library and reduces its size compared to 0.7.
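
For context, the main eager-mode path that exercises cuSPARSELt in PyTorch is 2:4 semi-structured sparsity. A minimal sketch, not part of this PR, assuming a recent PyTorch build on a CUDA GPU that supports the 2:4 pattern; the shapes, dtype, and mask are purely illustrative:

import torch
from torch.sparse import to_sparse_semi_structured

# Zero out two of every four elements per row so the tensor satisfies the 2:4 pattern.
A = torch.rand(128, 128, device="cuda", dtype=torch.float16)
mask = torch.tensor([1, 1, 0, 0], device="cuda", dtype=torch.float16).tile(128, 32)
A_sparse = to_sparse_semi_structured(A * mask)

B = torch.rand(128, 128, device="cuda", dtype=torch.float16)
C = A_sparse @ B  # sparse matmul; dispatches to cuSPARSELt when it is available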

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Jun 5, 2025

pytorch-bot bot commented Jun 5, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/155232

Note: Links to docs will display an error until the docs builds have been completed.

❌ 6 New Failures, 2 Unrelated Failures

As of commit 6e68752 with merge base da1f898:

NEW FAILURES - The following jobs have failed:

UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@Skylion007 Skylion007 force-pushed the skylion007/update-cusparselt-0-7-1 branch from d78ec5a to 4686a22 on June 5, 2025 16:22
@Skylion007 Skylion007 requested review from eqy, albanD, atalman and nWEIdia June 5, 2025 16:41
@Skylion007 Skylion007 marked this pull request as ready for review June 5, 2025 16:47
@Skylion007 Skylion007 requested review from a team and jeffdaily as code owners June 5, 2025 16:47
@Skylion007 Skylion007 added the better-engineering Relatively self-contained tasks for better engineering contributors label Jun 5, 2025
@Skylion007 Skylion007 requested review from ngimel and malfet June 5, 2025 16:51
@Skylion007 Skylion007 force-pushed the skylion007/update-cusparselt-0-7-1 branch 2 times, most recently from cd8626d to e238d23 on June 6, 2025 16:13
@Skylion007
Collaborator Author

@nWEIdia Any thoughts here?

@nWEIdia
Collaborator

nWEIdia commented Jun 6, 2025

@nWEIdia Any thoughts here?

This may have implications (and potentially complications) for the binary size of the upcoming v2.8.
Could you please help figure out the size increase? It would also be great to learn how the community is using the cuSPARSELt library.
If the binary size increase is acceptable, I think we can merge this PR. But @tinglvv is also working on the CUDA 12.9.1 update, which may (or may not) affect the binary size, so I would prefer to investigate this after the CUDA 12.9.1 update, especially on the binary size front.

cc @atalman @malfet @ptrblck @tinglvv
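
One way to eyeball the size question discussed above is to sum the on-disk footprint of the NVIDIA CUDA wheels (cuSPARSELt included) installed next to PyTorch. A rough sketch, not from this PR, assuming those wheels unpack under site-packages/nvidia/:

import os
import sysconfig

# Walk the installed NVIDIA wheel directory and total the file sizes.
nvidia_dir = os.path.join(sysconfig.get_paths()["purelib"], "nvidia")
total = 0
for root, _, files in os.walk(nvidia_dir):
    for name in files:
        total += os.path.getsize(os.path.join(root, name))
print(f"{total / 2**20:.1f} MiB under {nvidia_dir}")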

@Skylion007
Collaborator Author

Skylion007 commented Jun 6, 2025

@nWEIdia Any thoughts here?

This may have implications (and potentially complications) for the binary size of the upcoming v2.8. Could you please help figure out the size increase? It would also be great to learn how the community is using the cuSPARSELt library. If the binary size increase is acceptable, I think we can merge this PR. But @tinglvv is also working on the CUDA 12.9.1 update, which may (or may not) affect the binary size, so I would prefer to investigate this after the CUDA 12.9.1 update, especially on the binary size front.

cc @atalman @malfet @ptrblck @tinglvv

We dynamically link cuSPARSELt, and NVIDIA distributes it as a separate wheel, so it shouldn't cause any significant binary size increase in our wheel.
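
A quick way to confirm that the separate wheel is what gets picked up (rather than a copy bundled into the PyTorch wheel) is to check for the NVIDIA package itself. A small sketch, not from this PR, assuming the CUDA 12 package name nvidia-cusparselt-cu12:

from importlib import metadata

try:
    # The CUDA 12 PyTorch wheels pull cuSPARSELt in as its own pip package.
    print("nvidia-cusparselt-cu12:", metadata.version("nvidia-cusparselt-cu12"))
except metadata.PackageNotFoundError:
    print("cuSPARSELt wheel not installed; PyTorch may fall back to a bundled or system copy")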

@nWEIdia
Collaborator

nWEIdia commented Jun 6, 2025

@nWEIdia Any thoughts here?

This may have implications (and potentially complications) for the binary size of the upcoming v2.8. Could you please help figure out the size increase? It would also be great to learn how the community is using the cuSPARSELt library. If the binary size increase is acceptable, I think we can merge this PR. But @tinglvv is also working on the CUDA 12.9.1 update, which may (or may not) affect the binary size, so I would prefer to investigate this after the CUDA 12.9.1 update, especially on the binary size front.
cc @atalman @malfet @ptrblck @tinglvv

We dynamically link cuSPARSELt, and NVIDIA distributes it as a separate wheel, so it shouldn't cause any significant binary size increase in our wheel.

Ah yes, I just recalled that separation, thanks! In that case, LGTM.

@tinglvv
Collaborator

tinglvv commented Jun 6, 2025

LGTM if CI is green. I will also upgrade cuSPARSELt for the CUDA 12.9 builds in my PRs.

@Skylion007
Collaborator Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jun 6, 2025
@Skylion007 Skylion007 closed this Jun 6, 2025
@Skylion007 Skylion007 reopened this Jun 6, 2025
@Skylion007
Collaborator Author

Just need @atalman to upload the binaries for this one.

@pytorchmergebot
Collaborator

Merge failed

Reason: Approvers from one of the following sets are needed:

  • OSS CI (alband, dagitses, pytorch/pytorch-dev-infra)
  • superuser (pytorch/metamates)
  • Core Reviewers (mruberry, lezcano, Skylion007, ngimel, peterbell10, ...)
  • Core Maintainers (soumith, gchanan, ezyang, dzhulgakov, malfet, ...)
Details for Dev Infra team: raised by workflow job

Failing merge rule: Core Maintainers

@Skylion007
Collaborator Author

@pytorchbot rebase

@Skylion007 Skylion007 added the ciflow/binaries_wheel Trigger binary build and upload jobs for wheel on the PR label Jun 7, 2025
@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased skylion007/update-cusparselt-0-7-1 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout skylion007/update-cusparselt-0-7-1 && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the skylion007/update-cusparselt-0-7-1 branch from e238d23 to eb24121 on June 7, 2025 14:28
@@ -165,6 +165,7 @@ if [[ $CUDA_VERSION == 12* ]]; then
'$ORIGIN/../../nvidia/curand/lib'
'$ORIGIN/../../nvidia/cusolver/lib'
'$ORIGIN/../../nvidia/cusparse/lib'
'$ORIGIN/../../nvidia/cusparselt/lib'
Collaborator Author

@Skylion007 Skylion007 Jun 8, 2025

NVIDIA moved the cuSPARSELt library under the /nvidia/ directory between 0.6.3 and 0.7.
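
A small sketch, not from this PR, to verify that layout on a machine with the wheel installed; it simply lists the shared objects under the path the new RPATH entry above targets:

import glob
import os
import sysconfig

# Prints an empty list if the cuSPARSELt wheel is not installed in this environment.
site_packages = sysconfig.get_paths()["purelib"]
print(glob.glob(os.path.join(site_packages, "nvidia", "cusparselt", "lib", "libcusparseLt*")))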

@malfet
Contributor

malfet commented Jun 9, 2025

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

@malfet
Contributor

malfet commented Jun 9, 2025

@pytorchbot merge -i

thatgeeman pushed a commit to thatgeeman/pytorch-docathon that referenced this pull request Jun 15, 2025
Needed to support sparse operations on Blackwell; the new release also adds features to the library and reduces its size compared to 0.7.

Pull Request resolved: pytorch#155232
Approved by: https://github.com/nWEIdia, https://github.com/malfet
Labels
better-engineering Relatively self-contained tasks for better engineering contributors
ciflow/binaries_wheel Trigger binary build and upload jobs for wheel on the PR
ciflow/trunk Trigger trunk jobs on your pull request
Merged
open source
topic: not user facing topic category