Skip to content

Update tpch, clickbench, sort_tpch to mark failed queries #16182

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

ding-young
Copy link
Contributor

@ding-young ding-young commented May 24, 2025

Which issue does this PR close?

Rationale for this change

To track whether certain queries fail on limited memory, we need to continue executing benchmark queries even when previous one fails. Also, it would be better to mark & visualize the failure with compare.py.

What changes are included in this PR?

  • tpch, clickbench, sort_tpch executes next query when current query fails on any iteration. (It will not execute the next iteration of failed query)
  • In terminal, Error message is printed as [], and it will print the ids of failed queries.
  • New field success added to output json, updated compare.py to compare the elapsed time of successful results, not failed ones.

Are these changes tested?

Since it's benchmark suite, I tested it manually.

// Sample output in terminal
Query 18 failed: Resources exhausted: Additional allocation failed with top memory consumers (across reservations) as:
  HashJoinInput[1]#1239(can spill: false) consumed 9.9 MB,
  HashJoinInput[3]#1246(can spill: false) consumed 9.7 MB,
  HashJoinInput[2]#1251(can spill: false) consumed 9.6 MB,
  HashJoinInput[0]#1235(can spill: false) consumed 9.2 MB,
  HashJoinInput[0]#1266(can spill: false) consumed 2.6 MB.
Error: Failed to allocate additional 665.3 KB for HashJoinInput[0] with 9.2 MB already allocated for this reservation - 278.5 KB remain available for the total pool
Query 19 iteration 0 took 584.6 ms and returned 1 rows
Query 19 iteration 1 took 546.7 ms and returned 1 rows
Query 19 avg time: 565.66 ms
Query 20 iteration 0 took 587.3 ms and returned 186 rows
Query 20 iteration 1 took 551.7 ms and returned 186 rows
Query 20 avg time: 569.48 ms
Query 21 iteration 0 took 784.4 ms and returned 411 rows
Query 21 iteration 1 took 773.0 ms and returned 411 rows
Query 21 avg time: 778.71 ms
Query 22 iteration 0 took 295.4 ms and returned 7 rows
Query 22 iteration 1 took 299.7 ms and returned 7 rows
Query 22 avg time: 297.55 ms
Failed Queries: 4, 7, 9, 10, 13, 16, 18
// compare.py 
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  results ┃  results ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 409.33ms │ 433.54ms │  1.06x slower │
│ QQuery 2     │ 343.53ms │ 323.37ms │ +1.06x faster │
│ QQuery 3     │ 499.94ms │ 491.93ms │     no change │
│ QQuery 4     │     FAIL │ 350.74ms │  incomparable │
│ QQuery 5     │ 529.90ms │ 542.79ms │     no change │
│ QQuery 6     │ 242.10ms │ 253.38ms │     no change │
│ QQuery 7     │     FAIL │ 528.23ms │  incomparable │
│ QQuery 8     │ 634.55ms │ 638.16ms │     no change │
│ QQuery 9     │     FAIL │ 689.83ms │  incomparable │
│ QQuery 10    │     FAIL │ 601.86ms │  incomparable │
│ QQuery 11    │ 205.05ms │ 205.76ms │     no change │
│ QQuery 12    │ 421.38ms │ 418.29ms │     no change │
│ QQuery 13    │     FAIL │ 418.92ms │  incomparable │
│ QQuery 14    │ 346.10ms │ 345.50ms │     no change │
│ QQuery 15    │ 452.58ms │ 448.31ms │     no change │
│ QQuery 16    │     FAIL │ 256.87ms │  incomparable │
│ QQuery 17    │ 762.77ms │ 627.49ms │ +1.22x faster │
│ QQuery 18    │     FAIL │ 830.38ms │  incomparable │
│ QQuery 19    │ 546.70ms │ 534.35ms │     no change │
│ QQuery 20    │ 551.65ms │ 465.91ms │ +1.18x faster │
│ QQuery 21    │ 772.99ms │ 789.56ms │     no change │
│ QQuery 22    │ 295.41ms │ 301.68ms │     no change │
└──────────────┴──────────┴──────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary      ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (results)   │ 7013.98ms │
│ Total Time (results)   │ 6820.03ms │
│ Average Time (results) │  467.60ms │
│ Average Time (results) │  454.67ms │
│ Queries Faster         │         3 │
│ Queries Slower         │         1 │
│ Queries with No Change │        11 │
│ Queries with Failure   │         7 │
└────────────────────────┴───────────┘

Are there any user-facing changes?

@ding-young ding-young force-pushed the memory-limit-bench branch 2 times, most recently from 2a9d7a8 to e484c52 Compare May 29, 2025 14:39
@ding-young ding-young marked this pull request as ready for review May 29, 2025 14:51
@2010YOUY01 2010YOUY01 requested a review from Copilot May 31, 2025 04:49
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

The PR updates several benchmark modules to mark failed queries and continue executing subsequent queries when a failure occurs. Key changes include:

  • Adding a new "success" field to track query failure status.
  • Updating benchmark execution in tpch, sort_tpch, imdb, and clickbench to mark failures instead of aborting.
  • Modifying compare.py to exclude failed queries from average time calculations and reporting the list of failed queries.

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
benchmarks/src/util/run.rs Added "success" field and mark_failed method to track query results.
benchmarks/src/util/mod.rs Re-exported QueryResult to support new benchmark result structure.
benchmarks/src/tpch/run.rs Updated query execution loop to handle failures and record failed query IDs.
benchmarks/src/sort_tpch.rs Adjusted benchmark loop to mark failed queries and print failure details.
benchmarks/src/imdb/run.rs Applied similar failure handling as in tpch and sort_tpch modules.
benchmarks/src/clickbench.rs Refactored benchmark_query logic to include failure handling and iterations abstraction.
benchmarks/compare.py Updated average time calculations and display to exclude failed queries.

Copy link
Contributor

@2010YOUY01 2010YOUY01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, this is quite helpful for us to work towards robust memory-limited execution. We need this feature since now many queries are failing under memory limit, if they can stably finish in the future we can consider making it a CI test.

I tested tpch/clickbench/sort_tpch, and first two can run as expected, sort_tpch benchmark seems need a fix in the runner 🤔

Benchmark commands:

cargo run --profile release-nonlto --bin tpch -- benchmark datafusion --iterations 5 --path /Users/yongting/Code/datafusion/benchmarks/data/tpch_sf10 --prefer_hash_join true --format parquet -o /Users/yongting/Code/datafusion/benchmarks/results/pr-16182/tpch_sf1.json --memory-limit 1G

cargo run --profile release-nonlto --bin dfbench -- clickbench --iterations 5 --path /Users/yongting/Code/datafusion/benchmarks/data/hits.parquet --queries-path /Users/yongting/Code/datafusion/benchmarks/queries/clickbench/queries.sql -o /Users/yongting/Code/datafusion/benchmarks/results/pr-16182/clickbench_1.json --memory-limit 1G

 cargo run --profile release-nonlto --bin dfbench -- sort-tpch --iterations 5 --path /Users/yongting/Code/datafusion/benchmarks/data/tpch_sf10 -o /Users/yongting/Code/datafusion/benchmarks/results/pr-16182/sort_tpch.json --memory-limit 2G

sort_tpch's error message

ongting@Mac ~/C/d/benchmarks (pr-16182 *)> cargo run --profile release-nonlto --bin dfbench -- sort-tpch --iterations 5 --path /Users/yongting/Code/datafusion/benchmarks/data/tpch_sf10 -o /Users/yongting/Code/datafusion/benchmarks/results/pr-16182/sort_tpch.json --memory-limit 2G
    Finished `release-nonlto` profile [optimized] target(s) in 0.12s
     Running `/Users/yongting/Code/datafusion/target/release-nonlto/dfbench sort-tpch --iterations 5 --path /Users/yongting/Code/datafusion/benchmarks/data/tpch_sf10 -o /Users/yongting/Code/datafusion/benchmarks/results/pr-16182/sort_tpch.json --memory-limit 2G`
Q1 iteration 0 took 1645.0 ms and returned 59986052 rows
Q1 iteration 1 took 1674.9 ms and returned 59986052 rows
Q1 iteration 2 took 1635.0 ms and returned 59986052 rows
Q1 iteration 3 took 1672.7 ms and returned 59986052 rows
Q1 iteration 4 took 1604.2 ms and returned 59986052 rows
Q1 avg time: 1646.32 ms
Q2 iteration 0 took 1282.9 ms and returned 59986052 rows
Q2 iteration 1 took 1286.9 ms and returned 59986052 rows
Q2 iteration 2 took 1313.8 ms and returned 59986052 rows
Q2 iteration 3 took 1300.1 ms and returned 59986052 rows
Q2 iteration 4 took 1290.9 ms and returned 59986052 rows
Q2 avg time: 1294.90 ms
Q3 iteration 0 took 6326.5 ms and returned 59986052 rows
Q3 iteration 1 took 6289.8 ms and returned 59986052 rows
Q3 iteration 2 took 6223.8 ms and returned 59986052 rows
Q3 iteration 3 took 6329.6 ms and returned 59986052 rows
Q3 iteration 4 took 6333.3 ms and returned 59986052 rows
Q3 avg time: 6300.60 ms
Q4 iteration 0 took 1574.2 ms and returned 59986052 rows
Q4 iteration 1 took 1578.4 ms and returned 59986052 rows
Q4 iteration 2 took 1586.4 ms and returned 59986052 rows
Q4 iteration 3 took 1608.0 ms and returned 59986052 rows
Q4 iteration 4 took 1578.5 ms and returned 59986052 rows
Q4 avg time: 1585.13 ms

thread 'main' panicked at /Users/yongting/Code/datafusion/benchmarks/src/sort_tpch.rs:312:32:
called `Result::unwrap()` on an `Err` value: ResourcesExhausted("Additional allocation failed with top memory consumers (across reservations) as:\n  ExternalSorterMerge[2]#586(can spill: false) consumed 182.5 MB,\n  ExternalSorterMerge[10]#602(can spill: false) consumed 179.3 MB,\n  ExternalSorterMerge[5]#592(can spill: false) consumed 176.1 MB,\n  ExternalSorterMerge[4]#590(can spill: false) consumed 175.3 MB,\n  ExternalSorterMerge[0]#582(can spill: false) consumed 173.0 MB.\nError: Failed to allocate additional 248.1 KB for ExternalSorterMerge[8] with 0.0 B already allocated for this reservation - 37.8 KB remain available for the total pool")

Other than that I left some minor suggestions, looking forward to your feedback!

@ding-young ding-young force-pushed the memory-limit-bench branch from e484c52 to 3d5ab62 Compare May 31, 2025 08:28
@@ -294,7 +297,7 @@ impl RunOpt {

let mut stream = execute_stream(physical_plan.clone(), state.task_ctx())?;
while let Some(batch) = stream.next().await {
row_count += batch.unwrap().num_rows();
row_count += batch?.num_rows();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replaced unwrap() with ? so that sort_tpch does not terminate early. Thanks for catching that !

@ding-young
Copy link
Contributor Author

Thank you for review. I’ve applied your suggestions, and I’ll run it a few more times and check the output display once again before requesting a final review. If there are any other benchmarks I didn’t touch that would benefit from similar changes, please let me know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update tpch, clickbench, sort_tpch to mark failed queries
2 participants