Create version of LexicographicalComparator that compares fixed number of columns (~ -15%) #7530

Merged: 8 commits merged into apache:main on May 20, 2025

Dandandan (Contributor) commented May 20, 2025

Which issue does this PR close?

Closes #7531

Rationale for this change

Fixing the number of compared columns at compile time helps the compiler optimize the comparison code. Benchmark results:

lexsort (f32, f32) 2^10 time:   [27.356 µs 27.408 µs 27.478 µs]
                        change: [-14.744% -14.479% -14.183%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe

lexsort (f32, f32) 2^12 time:   [126.85 µs 127.12 µs 127.48 µs]
                        change: [-18.023% -16.578% -15.558%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

lexsort (f32, f32) nulls 2^10
                        time:   [28.484 µs 28.540 µs 28.617 µs]
                        change: [-12.483% -12.066% -11.487%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  6 (6.00%) high mild
  3 (3.00%) high severe

lexsort (f32, f32) nulls 2^12
                        time:   [131.92 µs 132.41 µs 132.98 µs]
                        change: [-13.547% -12.884% -12.255%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe

lexsort (bool, bool) 2^12
                        time:   [63.203 µs 63.744 µs 64.269 µs]
                        change: [-21.048% -19.908% -18.651%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low mild
  6 (6.00%) high mild

lexsort (bool, bool) nulls 2^12
                        time:   [81.474 µs 82.081 µs 82.698 µs]
                        change: [-15.395% -14.487% -13.577%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

lexsort (f32, f32) 2^12 limit 10
                        time:   [21.673 µs 21.805 µs 22.024 µs]
                        change: [-23.179% -22.869% -22.513%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe

lexsort (f32, f32) 2^12 limit 100
                        time:   [23.409 µs 23.435 µs 23.463 µs]
                        change: [-21.842% -21.631% -21.449%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

lexsort (f32, f32) 2^12 limit 1000
                        time:   [46.207 µs 46.322 µs 46.517 µs]
                        change: [-18.231% -17.913% -17.639%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  6 (6.00%) high mild
  1 (1.00%) high severe

lexsort (f32, f32) 2^12 limit 2^12
                        time:   [126.26 µs 126.40 µs 126.55 µs]
                        change: [-16.076% -15.776% -15.496%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

lexsort (f32, f32) nulls 2^12 limit 10
                        time:   [46.130 µs 46.411 µs 46.704 µs]
                        change: [-22.598% -22.052% -21.523%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

lexsort (f32, f32) nulls 2^12 limit 100
                        time:   [47.388 µs 47.680 µs 47.976 µs]
                        change: [-21.878% -21.258% -20.598%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high mild

lexsort (f32, f32) nulls 2^12 limit 1000
                        time:   [51.974 µs 52.394 µs 52.828 µs]
                        change: [-21.723% -21.073% -20.451%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

lexsort (f32, f32) nulls 2^12 limit 2^12
                        time:   [131.29 µs 131.71 µs 132.27 µs]
                        change: [-12.588% -12.314% -12.051%] (p = 0.00 < 0.05)
                        Performance has improved.
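
To illustrate the idea behind the rationale, here is a minimal sketch of a comparator whose column count is a const generic parameter. It is not the code from this PR: `ColumnCmp`, `DynComparator`, `FixedComparator`, `column_cmp`, and `sort_indices_fixed` are hypothetical names, and the real `LexicographicalComparator` in arrow-rs is structured differently. The sketch only shows that, when `N` is known at compile time, the per-column loop has a fixed trip count that the compiler may be able to unroll.

```rust
use std::cmp::Ordering;

/// Hypothetical per-column comparator: orders row `a` against row `b`.
type ColumnCmp = Box<dyn Fn(usize, usize) -> Ordering>;

/// Dynamic variant: the number of columns is only known at run time.
struct DynComparator {
    columns: Vec<ColumnCmp>,
}

impl DynComparator {
    fn compare(&self, a: usize, b: usize) -> Ordering {
        for cmp in &self.columns {
            match cmp(a, b) {
                Ordering::Equal => continue,
                other => return other,
            }
        }
        Ordering::Equal
    }
}

/// Fixed variant: `N` is a compile-time constant, so the loop below
/// has a known trip count.
struct FixedComparator<const N: usize> {
    columns: [ColumnCmp; N],
}

impl<const N: usize> FixedComparator<N> {
    fn compare(&self, a: usize, b: usize) -> Ordering {
        for cmp in &self.columns {
            match cmp(a, b) {
                Ordering::Equal => continue,
                other => return other,
            }
        }
        Ordering::Equal
    }
}

/// Builds a comparator over an owned column of `i32` values (illustrative only).
fn column_cmp(values: Vec<i32>) -> ColumnCmp {
    Box::new(move |a, b| values[a].cmp(&values[b]))
}

/// Sorts the row indices `0..len` lexicographically using the fixed comparator.
fn sort_indices_fixed<const N: usize>(len: usize, cmp: &FixedComparator<N>) -> Vec<usize> {
    let mut indices: Vec<usize> = (0..len).collect();
    indices.sort_unstable_by(|&a, &b| cmp.compare(a, b));
    indices
}

fn main() {
    let col0 = vec![2, 1, 2, 1];
    let col1 = vec![0, 9, -5, 3];

    let fixed: FixedComparator<2> = FixedComparator {
        columns: [column_cmp(col0.clone()), column_cmp(col1.clone())],
    };
    let dynamic = DynComparator {
        columns: vec![column_cmp(col0), column_cmp(col1)],
    };

    let order = sort_indices_fixed(4, &fixed);
    println!("{:?}", order);

    // Both variants implement the same ordering; only the loop bound differs.
    assert_eq!(fixed.compare(0, 2), dynamic.compare(0, 2));
}
```

A caller would typically dispatch on the run-time column count (for example, a `match` that picks `FixedComparator<1>`, `FixedComparator<2>`, and so on) and fall back to the dynamic variant for larger counts. That dispatch is likewise an assumption of this sketch.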

What changes are included in this PR?

Are there any user-facing changes?

@github-actions bot added the "arrow" label (Changes to the arrow crate) on May 20, 2025
@Dandandan changed the title from "Create version of LexicographicalComparator that compares fixed number of columns" to "Create version of LexicographicalComparator that compares fixed number of columns (~ -15%)" on May 20, 2025
@Dandandan requested a review from Copilot on May 20, 2025 08:14
@Dandandan requested a review from Copilot on May 20, 2025 08:26
alamb (Contributor) commented May 20, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr 2 16:34:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing unroll_sort (b4fe5af) to 741121b diff
BENCH_NAME=lexsort
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench lexsort
BENCH_FILTER=
BENCH_BRANCH_NAME=unroll_sort
Results will be posted here when complete

alamb (Contributor) commented May 20, 2025

🤖: Benchmark completed

group                                                                                                            main                                   unroll_sort
-----                                                                                                            ----                                   -----------
lexsort_rows([dict(100,str_opt(50)), dict(100,str_opt(50)), dict(100,str_opt(50)), str(16)]): 32768              1.00      4.1±0.03ms        ? ?/sec    1.00      4.1±0.01ms        ? ?/sec
lexsort_rows([dict(100,str_opt(50)), dict(100,str_opt(50)), dict(100,str_opt(50)), str(16)]): 4096               1.00    404.8±4.10µs        ? ?/sec    1.00    406.2±1.13µs        ? ?/sec
lexsort_rows([dict(100,str_opt(50)), dict(100,str_opt(50)), dict(100,str_opt(50)), str_opt(50)]): 32768          1.02      4.6±0.05ms        ? ?/sec    1.00      4.6±0.02ms        ? ?/sec
lexsort_rows([dict(100,str_opt(50)), dict(100,str_opt(50)), dict(100,str_opt(50)), str_opt(50)]): 4096           1.00    422.2±1.68µs        ? ?/sec    1.01    427.4±0.77µs        ? ?/sec
lexsort_rows([dict(100,str_opt(50)), dict(100,str_opt(50))]): 32768                                              1.00   1216.2±4.78µs        ? ?/sec    1.01   1228.3±2.88µs        ? ?/sec
lexsort_rows([dict(100,str_opt(50)), dict(100,str_opt(50))]): 4096                                               1.00    133.7±0.39µs        ? ?/sec    1.00    133.8±0.86µs        ? ?/sec
lexsort_rows([i32, i32_list, str(16)]): 32768                                                                    1.01      5.3±0.02ms        ? ?/sec    1.00      5.2±0.02ms        ? ?/sec
lexsort_rows([i32, i32_list, str(16)]): 4096                                                                     1.01    545.7±3.36µs        ? ?/sec    1.00    542.0±1.87µs        ? ?/sec
lexsort_rows([i32, i32_opt]): 32768                                                                              1.00      2.6±0.01ms        ? ?/sec    1.01      2.6±0.01ms        ? ?/sec
lexsort_rows([i32, i32_opt]): 4096                                                                               1.00    261.1±0.53µs        ? ?/sec    1.00    261.9±0.39µs        ? ?/sec
lexsort_rows([i32, str(16)]): 32768                                                                              1.00      2.9±0.01ms        ? ?/sec    1.00      2.9±0.01ms        ? ?/sec
lexsort_rows([i32, str(16)]): 4096                                                                               1.00    286.2±1.40µs        ? ?/sec    1.01    288.7±0.67µs        ? ?/sec
lexsort_rows([i32, str_list(4)]): 32768                                                                          1.11     21.0±0.39ms        ? ?/sec    1.00     18.9±0.42ms        ? ?/sec
lexsort_rows([i32, str_list(4)]): 4096                                                                           1.01   1108.4±8.00µs        ? ?/sec    1.00   1099.7±8.33µs        ? ?/sec
lexsort_rows([i32, str_list_opt(4)]): 32768                                                                      1.04     18.2±0.39ms        ? ?/sec    1.00     17.5±0.39ms        ? ?/sec
lexsort_rows([i32, str_list_opt(4)]): 4096                                                                       1.01   1095.4±5.38µs        ? ?/sec    1.00  1081.7±10.87µs        ? ?/sec
lexsort_rows([i32, str_opt(16)]): 32768                                                                          1.01      3.1±0.01ms        ? ?/sec    1.00      3.0±0.01ms        ? ?/sec
lexsort_rows([i32, str_opt(16)]): 4096                                                                           1.00    310.0±0.73µs        ? ?/sec    1.00    309.5±0.57µs        ? ?/sec
lexsort_rows([i32_list_opt, i32_opt]): 32768                                                                     1.01      5.1±0.02ms        ? ?/sec    1.00      5.0±0.02ms        ? ?/sec
lexsort_rows([i32_list_opt, i32_opt]): 4096                                                                      1.01    541.3±2.95µs        ? ?/sec    1.00    537.3±0.86µs        ? ?/sec
lexsort_rows([i32_opt, dict(100,str_opt(50))]): 32768                                                            1.00      2.7±0.01ms        ? ?/sec    1.00      2.7±0.01ms        ? ?/sec
lexsort_rows([i32_opt, dict(100,str_opt(50))]): 4096                                                             1.01    273.3±2.04µs        ? ?/sec    1.00    271.6±1.84µs        ? ?/sec
lexsort_rows([i32_opt, i32_list]): 32768                                                                         1.00      4.7±0.01ms        ? ?/sec    1.00      4.7±0.01ms        ? ?/sec
lexsort_rows([i32_opt, i32_list]): 4096                                                                          1.00    513.0±0.82µs        ? ?/sec    1.00    511.0±3.77µs        ? ?/sec
lexsort_rows([i32_opt, i32_list_opt, str_opt(50)]): 32768                                                        1.01      5.5±0.03ms        ? ?/sec    1.00      5.4±0.04ms        ? ?/sec
lexsort_rows([i32_opt, i32_list_opt, str_opt(50)]): 4096                                                         1.01    544.6±1.66µs        ? ?/sec    1.00    539.1±1.68µs        ? ?/sec
lexsort_rows([i32_opt, i32_list_opt]): 32768                                                                     1.01      5.0±0.02ms        ? ?/sec    1.00      5.0±0.01ms        ? ?/sec
lexsort_rows([i32_opt, i32_list_opt]): 4096                                                                      1.00    549.2±1.48µs        ? ?/sec    1.00    546.7±2.56µs        ? ?/sec
lexsort_rows([str_list(4), i32]): 32768                                                                          1.09     20.7±0.61ms        ? ?/sec    1.00     19.1±0.38ms        ? ?/sec
lexsort_rows([str_list(4), i32]): 4096                                                                           1.01   1052.6±9.26µs        ? ?/sec    1.00   1041.7±7.23µs        ? ?/sec
lexsort_rows([str_list_opt(4), i32]): 32768                                                                      1.07     18.0±0.42ms        ? ?/sec    1.00     16.8±0.37ms        ? ?/sec
lexsort_rows([str_list_opt(4), i32]): 4096                                                                       1.01   1014.1±7.37µs        ? ?/sec    1.00  1001.2±12.25µs        ? ?/sec
lexsort_rows([str_opt(16), str(16), str_opt(16), str_opt(16), str_opt(16)]): 32768                               1.01      5.4±0.03ms        ? ?/sec    1.00      5.3±0.02ms        ? ?/sec
lexsort_rows([str_opt(16), str(16), str_opt(16), str_opt(16), str_opt(16)]): 4096                                1.01    538.5±1.00µs        ? ?/sec    1.00    534.4±1.42µs        ? ?/sec
lexsort_rows([str_opt(16), str(16)]): 32768                                                                      1.01      3.5±0.02ms        ? ?/sec    1.00      3.4±0.01ms        ? ?/sec
lexsort_rows([str_opt(16), str(16)]): 4096                                                                       1.01    348.9±0.41µs        ? ?/sec    1.00    346.5±0.64µs        ? ?/sec
lexsort_rows([str_opt(16), str_opt(50), str(16)]): 32768                                                         1.00      4.3±0.02ms        ? ?/sec    1.01      4.4±0.03ms        ? ?/sec
lexsort_rows([str_opt(16), str_opt(50), str(16)]): 4096                                                          1.00    410.2±1.51µs        ? ?/sec    1.00    410.3±0.89µs        ? ?/sec
lexsort_to_indices([dict(100,str_opt(50)), dict(100,str_opt(50)), dict(100,str_opt(50)), str(16)]): 32768        1.00     10.1±0.03ms        ? ?/sec    1.01     10.3±0.05ms        ? ?/sec
lexsort_to_indices([dict(100,str_opt(50)), dict(100,str_opt(50)), dict(100,str_opt(50)), str(16)]): 4096         1.00    923.5±3.45µs        ? ?/sec    1.06    974.9±5.72µs        ? ?/sec
lexsort_to_indices([dict(100,str_opt(50)), dict(100,str_opt(50)), dict(100,str_opt(50)), str_opt(50)]): 32768    1.01     10.8±0.06ms        ? ?/sec    1.00     10.6±0.06ms        ? ?/sec
lexsort_to_indices([dict(100,str_opt(50)), dict(100,str_opt(50)), dict(100,str_opt(50)), str_opt(50)]): 4096     1.00    928.7±3.01µs        ? ?/sec    1.05    971.8±1.77µs        ? ?/sec
lexsort_to_indices([dict(100,str_opt(50)), dict(100,str_opt(50))]): 32768                                        1.01   1151.2±4.60µs        ? ?/sec    1.00   1142.9±3.61µs        ? ?/sec
lexsort_to_indices([dict(100,str_opt(50)), dict(100,str_opt(50))]): 4096                                         1.01    124.9±0.35µs        ? ?/sec    1.00    124.0±0.36µs        ? ?/sec
lexsort_to_indices([i32, i32_list, str(16)]): 32768                                                              1.21      2.4±0.00ms        ? ?/sec    1.00   1965.3±8.71µs        ? ?/sec
lexsort_to_indices([i32, i32_list, str(16)]): 4096                                                               1.19    235.8±0.60µs        ? ?/sec    1.00    197.9±0.67µs        ? ?/sec
lexsort_to_indices([i32, i32_opt]): 32768                                                                        1.22      2.4±0.01ms        ? ?/sec    1.00  1936.8±10.20µs        ? ?/sec
lexsort_to_indices([i32, i32_opt]): 4096                                                                         1.18    235.1±2.01µs        ? ?/sec    1.00    199.2±0.28µs        ? ?/sec
lexsort_to_indices([i32, str(16)]): 32768                                                                        1.22      2.4±0.00ms        ? ?/sec    1.00  1940.9±13.40µs        ? ?/sec
lexsort_to_indices([i32, str(16)]): 4096                                                                         1.22    237.2±0.34µs        ? ?/sec    1.00    195.0±0.37µs        ? ?/sec
lexsort_to_indices([i32, str_list(4)]): 32768                                                                    1.23      2.4±0.00ms        ? ?/sec    1.00  1942.0±47.44µs        ? ?/sec
lexsort_to_indices([i32, str_list(4)]): 4096                                                                     1.21    236.1±0.31µs        ? ?/sec    1.00    194.5±0.40µs        ? ?/sec
lexsort_to_indices([i32, str_list_opt(4)]): 32768                                                                1.23      2.4±0.00ms        ? ?/sec    1.00   1938.1±4.98µs        ? ?/sec
lexsort_to_indices([i32, str_list_opt(4)]): 4096                                                                 1.22    236.3±0.70µs        ? ?/sec    1.00    194.2±0.49µs        ? ?/sec
lexsort_to_indices([i32, str_opt(16)]): 32768                                                                    1.23      2.4±0.00ms        ? ?/sec    1.00   1934.6±2.86µs        ? ?/sec
lexsort_to_indices([i32, str_opt(16)]): 4096                                                                     1.21    236.6±0.41µs        ? ?/sec    1.00    195.3±0.22µs        ? ?/sec
lexsort_to_indices([i32_list_opt, i32_opt]): 32768                                                               1.02      7.1±0.02ms        ? ?/sec    1.00      7.0±0.01ms        ? ?/sec
lexsort_to_indices([i32_list_opt, i32_opt]): 4096                                                                1.03    745.4±1.37µs        ? ?/sec    1.00    721.5±0.98µs        ? ?/sec
lexsort_to_indices([i32_opt, dict(100,str_opt(50))]): 32768                                                      1.09      3.4±0.01ms        ? ?/sec    1.00      3.1±0.00ms        ? ?/sec
lexsort_to_indices([i32_opt, dict(100,str_opt(50))]): 4096                                                       1.10    348.7±0.58µs        ? ?/sec    1.00    316.7±0.63µs        ? ?/sec
lexsort_to_indices([i32_opt, i32_list]): 32768                                                                   1.09      4.2±0.01ms        ? ?/sec    1.00      3.8±0.01ms        ? ?/sec
lexsort_to_indices([i32_opt, i32_list]): 4096                                                                    1.10    427.0±0.72µs        ? ?/sec    1.00    387.6±0.71µs        ? ?/sec
lexsort_to_indices([i32_opt, i32_list_opt, str_opt(50)]): 32768                                                  1.03      4.8±0.01ms        ? ?/sec    1.00      4.7±0.02ms        ? ?/sec
lexsort_to_indices([i32_opt, i32_list_opt, str_opt(50)]): 4096                                                   1.05    487.7±0.98µs        ? ?/sec    1.00    466.1±1.10µs        ? ?/sec
lexsort_to_indices([i32_opt, i32_list_opt]): 32768                                                               1.07      4.5±0.01ms        ? ?/sec    1.00      4.2±0.01ms        ? ?/sec
lexsort_to_indices([i32_opt, i32_list_opt]): 4096                                                                1.07    464.1±0.81µs        ? ?/sec    1.00    433.2±3.38µs        ? ?/sec
lexsort_to_indices([str_list(4), i32]): 32768                                                                    1.16     11.8±0.24ms        ? ?/sec    1.00     10.1±0.12ms        ? ?/sec
lexsort_to_indices([str_list(4), i32]): 4096                                                                     1.09    784.6±3.06µs        ? ?/sec    1.00    720.5±1.74µs        ? ?/sec
lexsort_to_indices([str_list_opt(4), i32]): 32768                                                                1.14     13.4±0.41ms        ? ?/sec    1.00     11.7±0.18ms        ? ?/sec
lexsort_to_indices([str_list_opt(4), i32]): 4096                                                                 1.07    936.8±4.48µs        ? ?/sec    1.00    875.9±4.61µs        ? ?/sec
lexsort_to_indices([str_opt(16), str(16), str_opt(16), str_opt(16), str_opt(16)]): 32768                         1.09      6.2±0.01ms        ? ?/sec    1.00      5.7±0.01ms        ? ?/sec
lexsort_to_indices([str_opt(16), str(16), str_opt(16), str_opt(16), str_opt(16)]): 4096                          1.09    602.1±1.61µs        ? ?/sec    1.00    552.3±0.94µs        ? ?/sec
lexsort_to_indices([str_opt(16), str(16)]): 32768                                                                1.10      6.2±0.02ms        ? ?/sec    1.00      5.7±0.02ms        ? ?/sec
lexsort_to_indices([str_opt(16), str(16)]): 4096                                                                 1.12    604.0±1.33µs        ? ?/sec    1.00    541.1±4.90µs        ? ?/sec
lexsort_to_indices([str_opt(16), str_opt(50), str(16)]): 32768                                                   1.09      6.3±0.02ms        ? ?/sec    1.00      5.8±0.02ms        ? ?/sec
lexsort_to_indices([str_opt(16), str_opt(50), str(16)]): 4096                                                    1.09    605.7±1.38µs        ? ?/sec    1.00    557.5±1.11µs        ? ?/sec

alamb (Contributor) commented May 20, 2025

Benchmarks look pretty good to me 👍

@Dandandan merged commit 0b75873 into apache:main on May 20, 2025
17 checks passed