Migrate to new cubecl multi tensor handle changes #3136

Open

wingertge wants to merge 18 commits into main

Conversation

wingertge
Contributor

Pull Request Template

Checklist

  • Confirmed that the cargo run-checks command has been executed.
  • Made sure the book is up to date with changes in this PR.

Related Issues/PRs

Update PR for tracel-ai/cubecl#661

Changes

Updates to the new matmul signature and fixes some bugs in the quantize kernel.

Testing

Tests pass

@wingertge wingertge changed the title Migrate to new cubecl changes Migrate to new cubecl multi tensor handle changes May 2, 2025
Member

@laggui laggui left a comment

Thanks for fixing the current limitations with the quantization kernels!

Just to give you a bit more (historical) context (might help fill in the blanks lol):

The current implementation dates back a long time; it hasn't changed much since the first draft that came around the initial release of CubeCL. It was also written specifically with wgpu in mind, which only supported f32 and u32 (hence the packing / unpacking), because the CUDA backend was barely a thing at the time lol.
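For anyone unfamiliar with that workaround, the packing amounts to something like the sketch below (hypothetical helpers, not the actual burn/cubecl kernel code): four i8 values squeezed into one u32 so an f32/u32-only backend like early wgpu can store them.

```rust
// Hypothetical illustration of i8 <-> u32 packing, not the actual burn/cubecl code.
fn pack_i8x4(vals: [i8; 4]) -> u32 {
    vals.iter()
        .enumerate()
        // Reinterpret each i8 as its raw byte and place it in its 8-bit lane.
        .fold(0u32, |acc, (i, &v)| acc | ((v as u8 as u32) << (8 * i)))
}

fn unpack_i8x4(packed: u32) -> [i8; 4] {
    // Extract each 8-bit lane and reinterpret the byte back as i8.
    [0u32, 1, 2, 3].map(|i| ((packed >> (8 * i)) & 0xFF) as u8 as i8)
}

fn main() {
    let packed = pack_i8x4([1, -2, 3, -4]);
    assert_eq!(unpack_i8x4(packed), [1, -2, 3, -4]);
}
```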

I was just discussing this recently with Max and the plan was to add int8 support. This is exactly in line with your changes, and the additions in cubecl will make things a lot easier. Thank you 🙏

(from the linked cubecl PR)
Adds multi-tensor allocations so quantization params can be allocated along with strided tensors. The multiple tensors are allocated into the same buffer, but with separate strides.

This is awesome! It makes a lot more sense. The initial work actually dealt with multiple handles on the burn-ir side, but it was messy, and there was no easy way to do this at the time in cubecl. Tensor operations and transformations (e.g., layout changes) will be easier to handle with such a separation, especially as we add more quantization levels (e.g., per-block).
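As a mental model of what that enables (a hedged sketch with made-up names, not the actual cubecl API), the quantized values and their quantization parameters become separate logical views into one shared allocation:

```rust
// Hedged sketch, not the real cubecl API: several logical tensors share one
// backing buffer and differ only by offset, shape, and strides.
struct TensorView {
    buffer_id: usize, // hypothetical id of the shared allocation
    offset: usize,    // element offset into that buffer
    shape: Vec<usize>,
    strides: Vec<usize>,
}

// A per-tensor-quantized tensor as two views into the same buffer:
// the packed i8 values followed by the scale parameter.
fn quantized_views(num_values: usize) -> (TensorView, TensorView) {
    let values = TensorView { buffer_id: 0, offset: 0, shape: vec![num_values], strides: vec![1] };
    let scale = TensorView {
        buffer_id: 0,
        offset: num_values, // real code would pad this to the device's alignment limit
        shape: vec![1],
        strides: vec![1],
    };
    (values, scale)
}

fn main() {
    let (values, scale) = quantized_views(1024);
    println!("values at offset {}, scale at offset {}", values.offset, scale.offset);
}
```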

TL;DR: thanks for the contribution(s) 😄 the changes LGTM

```rust
let cube_count = calculate_cube_count_elemwise(num_elems / line_size as usize, cube_dim);

match scheme {
    QuantizationScheme::PerTensor(QuantizationMode::Symmetric, QuantizationType::QInt8) => {
```
Member

@laggui laggui May 2, 2025

This will change with #3042 as the quantization scheme will be more like a config struct.

But no worries, we'll make sure that this PR is merged first so you can avoid the possible conflict hell 😅

/edit: sorry, I guess I was wrong. I'll fix the conflicts now that it was merged on cubecl!
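Roughly, the direction described there is a move from matching nested enum variants to reading fields off a config struct; the shapes below are hypothetical, not the actual #3042 types:

```rust
// Hypothetical before/after shapes, not the actual #3042 types.
#[derive(Clone, Copy)]
enum QuantizationMode { Symmetric }
#[derive(Clone, Copy)]
enum QuantizationType { QInt8 }

// Before: the scheme as a nested enum, handled variant-by-variant.
#[allow(dead_code)]
enum QuantizationScheme {
    PerTensor(QuantizationMode, QuantizationType),
}

// After: a config-style struct whose fields are read directly, so new
// options extend fields instead of multiplying match arms.
struct QuantizationConfig {
    mode: QuantizationMode,
    dtype: QuantizationType,
}

fn main() {
    let _cfg = QuantizationConfig { mode: QuantizationMode::Symmetric, dtype: QuantizationType::QInt8 };
}
```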

Member

@laggui laggui May 9, 2025

Btw, I resolved the merge conflicts earlier to bring this up to date, since we introduced some conflicts on main 🙂

On CUDA, matmul and quantization work, but dequantization doesn't yet (it hasn't been modified to match the changes, so that's expected; we also need to keep the quantized output scale around).

On wgpu there seem to be alignment issues, e.g.

  thread 'tests::cube::kernel::quantization::tests::should_quantize_dequantize_symmetric_multiple' panicked at /Users/admin/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/wgpu-25.0.0/src/backend/wgpu_core.rs:1223:26:
  wgpu error: Validation Error
  
  Caused by:
    In Device::create_bind_group
      Buffer offset 520 does not respect device's requested `min_storage_buffer_offset_alignment` limit 256

Contributor Author

Ah, it seems like the alignment for copy is incorrect for offsets; we need to align to the min storage buffer offset alignment instead. I'll have to fix that in cubecl.
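For reference, aligning to that limit is just rounding the offset up to the next multiple of min_storage_buffer_offset_alignment (a sketch of the idea, not the actual cubecl fix):

```rust
// Sketch of the idea, not the actual cubecl fix: round a buffer offset up
// to the next multiple of the device's alignment limit.
fn align_offset(offset: u64, min_alignment: u64) -> u64 {
    offset.div_ceil(min_alignment) * min_alignment
}

fn main() {
    // The failing case from the wgpu error above: offset 520 with a 256-byte limit.
    assert_eq!(align_offset(520, 256), 768);
}
```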

@laggui laggui mentioned this pull request May 9, 2025

codecov bot commented May 12, 2025

Codecov Report

Attention: Patch coverage is 55.50661% with 202 lines in your changes missing coverage. Please review.

Project coverage is 82.10%. Comparing base (ecddf9e) to head (6216a46).
Report is 49 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...es/burn-cubecl/src/kernel/quantization/quantize.rs | 21.18% | 93 Missing ⚠️ |
| .../burn-cubecl/src/kernel/quantization/dequantize.rs | 20.00% | 68 Missing ⚠️ |
| crates/burn-cubecl/src/kernel/contiguous.rs | 19.23% | 21 Missing ⚠️ |
| crates/burn-cubecl/src/kernel/matmul/base.rs | 0.00% | 6 Missing ⚠️ |
| crates/burn-cubecl/src/tensor/base.rs | 88.63% | 5 Missing ⚠️ |
| ...tes/burn-cubecl/src/kernel/quantization/qtensor.rs | 0.00% | 2 Missing ⚠️ |
| crates/burn-cubecl/src/ops/base.rs | 90.90% | 2 Missing ⚠️ |
| crates/burn-tensor/src/tensor/element/base.rs | 0.00% | 2 Missing ⚠️ |
| crates/burn-cubecl-fusion/src/shared/tensor.rs | 0.00% | 1 Missing ⚠️ |
| crates/burn-cubecl/src/ops/qtensor.rs | 99.00% | 1 Missing ⚠️ |

... and 1 more

❌ Your patch check has failed because the patch coverage (55.50%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3136      +/-   ##
==========================================
- Coverage   82.20%   82.10%   -0.10%     
==========================================
  Files         962      963       +1     
  Lines      122541   122833     +292     
==========================================
+ Hits       100732   100854     +122     
- Misses      21809    21979     +170     

☔ View full report in Codecov by Sentry.

@wingertge wingertge marked this pull request as ready for review May 12, 2025 19:25
Contributor

This PR has been marked as stale because it has not been updated for over a month.

@github-actions github-actions bot added stale The issue or pr has been open for too long and removed stale The issue or pr has been open for too long labels Jun 16, 2025