-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Update tpch, clickbench, sort_tpch to mark failed queries #16182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
2a9d7a8
to
e484c52
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
The PR updates several benchmark modules to mark failed queries and continue executing subsequent queries when a failure occurs. Key changes include:
- Adding a new "success" field to track query failure status.
- Updating benchmark execution in tpch, sort_tpch, imdb, and clickbench to mark failures instead of aborting.
- Modifying compare.py to exclude failed queries from average time calculations and reporting the list of failed queries.
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
benchmarks/src/util/run.rs | Added "success" field and mark_failed method to track query results. |
benchmarks/src/util/mod.rs | Re-exported QueryResult to support new benchmark result structure. |
benchmarks/src/tpch/run.rs | Updated query execution loop to handle failures and record failed query IDs. |
benchmarks/src/sort_tpch.rs | Adjusted benchmark loop to mark failed queries and print failure details. |
benchmarks/src/imdb/run.rs | Applied similar failure handling as in tpch and sort_tpch modules. |
benchmarks/src/clickbench.rs | Refactored benchmark_query logic to include failure handling and iterations abstraction. |
benchmarks/compare.py | Updated average time calculations and display to exclude failed queries. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, this is quite helpful for us to work towards robust memory-limited execution. We need this feature since now many queries are failing under memory limit, if they can stably finish in the future we can consider making it a CI test.
I tested tpch/clickbench/sort_tpch, and first two can run as expected, sort_tpch
benchmark seems need a fix in the runner 🤔
Benchmark commands:
cargo run --profile release-nonlto --bin tpch -- benchmark datafusion --iterations 5 --path /Users/yongting/Code/datafusion/benchmarks/data/tpch_sf10 --prefer_hash_join true --format parquet -o /Users/yongting/Code/datafusion/benchmarks/results/pr-16182/tpch_sf1.json --memory-limit 1G
cargo run --profile release-nonlto --bin dfbench -- clickbench --iterations 5 --path /Users/yongting/Code/datafusion/benchmarks/data/hits.parquet --queries-path /Users/yongting/Code/datafusion/benchmarks/queries/clickbench/queries.sql -o /Users/yongting/Code/datafusion/benchmarks/results/pr-16182/clickbench_1.json --memory-limit 1G
cargo run --profile release-nonlto --bin dfbench -- sort-tpch --iterations 5 --path /Users/yongting/Code/datafusion/benchmarks/data/tpch_sf10 -o /Users/yongting/Code/datafusion/benchmarks/results/pr-16182/sort_tpch.json --memory-limit 2G
sort_tpch
's error message
ongting@Mac ~/C/d/benchmarks (pr-16182 *)> cargo run --profile release-nonlto --bin dfbench -- sort-tpch --iterations 5 --path /Users/yongting/Code/datafusion/benchmarks/data/tpch_sf10 -o /Users/yongting/Code/datafusion/benchmarks/results/pr-16182/sort_tpch.json --memory-limit 2G
Finished `release-nonlto` profile [optimized] target(s) in 0.12s
Running `/Users/yongting/Code/datafusion/target/release-nonlto/dfbench sort-tpch --iterations 5 --path /Users/yongting/Code/datafusion/benchmarks/data/tpch_sf10 -o /Users/yongting/Code/datafusion/benchmarks/results/pr-16182/sort_tpch.json --memory-limit 2G`
Q1 iteration 0 took 1645.0 ms and returned 59986052 rows
Q1 iteration 1 took 1674.9 ms and returned 59986052 rows
Q1 iteration 2 took 1635.0 ms and returned 59986052 rows
Q1 iteration 3 took 1672.7 ms and returned 59986052 rows
Q1 iteration 4 took 1604.2 ms and returned 59986052 rows
Q1 avg time: 1646.32 ms
Q2 iteration 0 took 1282.9 ms and returned 59986052 rows
Q2 iteration 1 took 1286.9 ms and returned 59986052 rows
Q2 iteration 2 took 1313.8 ms and returned 59986052 rows
Q2 iteration 3 took 1300.1 ms and returned 59986052 rows
Q2 iteration 4 took 1290.9 ms and returned 59986052 rows
Q2 avg time: 1294.90 ms
Q3 iteration 0 took 6326.5 ms and returned 59986052 rows
Q3 iteration 1 took 6289.8 ms and returned 59986052 rows
Q3 iteration 2 took 6223.8 ms and returned 59986052 rows
Q3 iteration 3 took 6329.6 ms and returned 59986052 rows
Q3 iteration 4 took 6333.3 ms and returned 59986052 rows
Q3 avg time: 6300.60 ms
Q4 iteration 0 took 1574.2 ms and returned 59986052 rows
Q4 iteration 1 took 1578.4 ms and returned 59986052 rows
Q4 iteration 2 took 1586.4 ms and returned 59986052 rows
Q4 iteration 3 took 1608.0 ms and returned 59986052 rows
Q4 iteration 4 took 1578.5 ms and returned 59986052 rows
Q4 avg time: 1585.13 ms
thread 'main' panicked at /Users/yongting/Code/datafusion/benchmarks/src/sort_tpch.rs:312:32:
called `Result::unwrap()` on an `Err` value: ResourcesExhausted("Additional allocation failed with top memory consumers (across reservations) as:\n ExternalSorterMerge[2]#586(can spill: false) consumed 182.5 MB,\n ExternalSorterMerge[10]#602(can spill: false) consumed 179.3 MB,\n ExternalSorterMerge[5]#592(can spill: false) consumed 176.1 MB,\n ExternalSorterMerge[4]#590(can spill: false) consumed 175.3 MB,\n ExternalSorterMerge[0]#582(can spill: false) consumed 173.0 MB.\nError: Failed to allocate additional 248.1 KB for ExternalSorterMerge[8] with 0.0 B already allocated for this reservation - 37.8 KB remain available for the total pool")
Other than that I left some minor suggestions, looking forward to your feedback!
e484c52
to
3d5ab62
Compare
@@ -294,7 +297,7 @@ impl RunOpt { | |||
|
|||
let mut stream = execute_stream(physical_plan.clone(), state.task_ctx())?; | |||
while let Some(batch) = stream.next().await { | |||
row_count += batch.unwrap().num_rows(); | |||
row_count += batch?.num_rows(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I replaced unwrap()
with ?
so that sort_tpch does not terminate early. Thanks for catching that !
Thank you for review. I’ve applied your suggestions, and I’ll run it a few more times and check the output display once again before requesting a final review. If there are any other benchmarks I didn’t touch that would benefit from similar changes, please let me know. |
Which issue does this PR close?
Rationale for this change
To track whether certain queries fail on limited memory, we need to continue executing benchmark queries even when previous one fails. Also, it would be better to mark & visualize the failure with
compare.py
.What changes are included in this PR?
tpch
,clickbench
,sort_tpch
executes next query when current query fails on any iteration. (It will not execute the next iteration of failed query)success
added to output json, updatedcompare.py
to compare the elapsed time of successful results, not failed ones.Are these changes tested?
Since it's benchmark suite, I tested it manually.
Are there any user-facing changes?