Support u32 indices in HashJoinExec #16179

Open
Dandandan opened this issue May 24, 2025 · 3 comments
Labels
enhancement New feature or request

Comments

@Dandandan
Contributor

Dandandan commented May 24, 2025

Is your feature request related to a problem or challenge?

Currently we always store the row indices into the build-side batch as u64, both in the HashTable and in the next Vec.
If the build side has fewer than u32::MAX items (about 4.29 billion, i.e. most of the time), we can store the indices as u32 instead, which halves their size and should make them fit more easily in the CPU cache.
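To make the size argument concrete, here is a minimal sketch of the arithmetic. It assumes the chain vector holds one index per build-side row; `next_vec_bytes` is a hypothetical helper for illustration, not a DataFusion function.

```rust
use std::mem::size_of;

/// Bytes needed for a chain vector that stores one index of type `T`
/// per build-side row. Hypothetical helper, for illustration only.
fn next_vec_bytes<T>(num_rows: usize) -> usize {
    num_rows * size_of::<T>()
}

/// Bytes saved by narrowing the per-row index from u64 to u32.
fn bytes_saved(num_rows: usize) -> usize {
    next_vec_bytes::<u64>(num_rows) - next_vec_bytes::<u32>(num_rows)
}
```

For a build side of 10 million rows this gives 80 MB of indices with u64 and 40 MB with u32. Whether the smaller footprint translates into a measurable speedup is what the benchmarks would need to show.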

Describe the solution you'd like

  • Implement optimization to store indices as u32 if possible.
  • Run benchmarks

Describe alternatives you've considered

No response

Additional context

No response

@jonathanc-n
Contributor

take

@jonathanc-n
Contributor

jonathanc-n commented May 26, 2025

@Dandandan For this, is the preferred solution to add a generic parameter and, when the hash join exec/stream is created, choose its type based on the size of the build side? Or could an enum also work? What do you think?

@Dandandan
Contributor Author

@Dandandan For this, is the preferred solution to add a generic parameter and, when the hash join exec/stream is created, choose its type based on the size of the build side? Or could an enum also work? What do you think?

I think we'll likely have to:

  • add a generic type to JoinHashMap<T>
  • make a Box<dyn JoinHashMap> based on the num_rows (num_rows <= u32::MAX => u32 else u64)
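The two steps above could look roughly like the following. This is a simplified sketch under stated assumptions, not the actual DataFusion code: `JoinIndex`, `JoinHashMapType` as defined here, `ChainedJoinHashMap`, and `make_join_hash_map` are hypothetical names, and a std `HashMap` keyed by the hash value stands in for the real hash table. Indices are stored as row + 1 so that 0 can mark the end of a chain.

```rust
use std::collections::HashMap;

/// Hypothetical index abstraction: a width that row offsets are stored in.
trait JoinIndex: Copy {
    fn from_usize(v: usize) -> Self;
    fn to_usize(self) -> usize;
}
impl JoinIndex for u32 {
    fn from_usize(v: usize) -> Self { v as u32 }
    fn to_usize(self) -> usize { self as usize }
}
impl JoinIndex for u64 {
    fn from_usize(v: usize) -> Self { v as u64 }
    fn to_usize(self) -> usize { self as usize }
}

/// Object-safe interface so callers do not depend on the index width.
trait JoinHashMapType {
    fn insert(&mut self, hash: u64, row: usize);
    fn matches(&self, hash: u64) -> Vec<usize>;
    fn index_bytes(&self) -> usize;
}

/// `head` maps a hash to the latest row + 1 inserted with that hash;
/// `next[row]` holds the previous row + 1 with the same hash (0 = end).
struct ChainedJoinHashMap<T: JoinIndex> {
    head: HashMap<u64, T>,
    next: Vec<T>,
}

impl<T: JoinIndex> ChainedJoinHashMap<T> {
    fn with_capacity(num_rows: usize) -> Self {
        Self {
            head: HashMap::with_capacity(num_rows),
            next: vec![T::from_usize(0); num_rows],
        }
    }
}

impl<T: JoinIndex> JoinHashMapType for ChainedJoinHashMap<T> {
    fn insert(&mut self, hash: u64, row: usize) {
        // Make this row the new chain head and link it to the old head.
        if let Some(prev) = self.head.insert(hash, T::from_usize(row + 1)) {
            self.next[row] = prev;
        }
    }

    fn matches(&self, hash: u64) -> Vec<usize> {
        let mut rows = Vec::new();
        let mut cur = self.head.get(&hash).map_or(0, |i| i.to_usize());
        // Walk the chain until the 0 sentinel is reached.
        while cur != 0 {
            rows.push(cur - 1);
            cur = self.next[cur - 1].to_usize();
        }
        rows
    }

    fn index_bytes(&self) -> usize {
        std::mem::size_of::<T>()
    }
}

/// Pick the index width once, from the number of build-side rows.
/// The largest stored value is `num_rows` (row + 1), so it must fit in u32.
fn make_join_hash_map(num_rows: usize) -> Box<dyn JoinHashMapType> {
    if num_rows as u64 <= u32::MAX as u64 {
        Box::new(ChainedJoinHashMap::<u32>::with_capacity(num_rows))
    } else {
        Box::new(ChainedJoinHashMap::<u64>::with_capacity(num_rows))
    }
}
```

With this shape the width is decided once per join and the probe code only sees the trait object. The cost is a dynamic dispatch per call; an enum over the two concrete types, as suggested earlier in the thread, would avoid the box at the price of a match in each method.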


2 participants