-
Notifications
You must be signed in to change notification settings - Fork 14
feat: allow re-using a database across benchmark runs #1174
Conversation
@coryan I debated introducing a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like the feature, thanks for implementing it. The code could be a smidge cleaner, made some suggestions below, WDYT?
Reviewed 3 of 3 files at r1.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @mr-salty)
google/cloud/spanner/benchmarks/multiple_rows_cpu_benchmark.cc, line 73 at r1 (raw file):
virtual std::string AdditionalDdlStatement() = 0; virtual Status SetUp(Config const& config, cs::Database const& database, bool fill) = 0;
This change fills slightly odd to me, basically in all cases passing fill == false
has the same effect as not calling the function... Maybe we can just not call the function?
google/cloud/spanner/benchmarks/multiple_rows_cpu_benchmark.cc, line 160 at r1 (raw file):
auto db = create_future.get(); if (!db) { if (user_specified_database &&
FYI: alternatively we could do a admin_client.GetDatabase()
if user_specified_database==true
and only create the database if that fails. Both are fine by me, just something for you to consider.
google/cloud/spanner/benchmarks/multiple_rows_cpu_benchmark.cc, line 1501 at r1 (raw file):
std::string AdditionalDdlStatement() override { return {}; } Status SetUp(Config const&, cs::Database const&, bool fill) override { fill_ = fill;
I do not think you need to save the value here. The main reason to have RunAllExperiment
is to smoke test all the benchmarks in the CI builds, so calling it with true
all the time is good enough.
google/cloud/spanner/benchmarks/single_row_throughput_benchmark.cc, line 48 at r1 (raw file):
virtual void SetUp(Config const& config, cloud_spanner::Database const& database, bool fill) = 0;
Same comment I think fill==false
really has the same effect as not calling the function.
Codecov Report
@@ Coverage Diff @@
## master #1174 +/- ##
==========================================
- Coverage 95.71% 95.68% -0.03%
==========================================
Files 175 175
Lines 13527 13567 +40
==========================================
+ Hits 12947 12982 +35
- Misses 580 585 +5
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 3 files reviewed, 3 unresolved discussions (waiting on @coryan)
google/cloud/spanner/benchmarks/multiple_rows_cpu_benchmark.cc, line 73 at r1 (raw file):
Previously, coryan (Carlos O'Ryan) wrote…
This change fills slightly odd to me, basically in all cases passing
fill == false
has the same effect as not calling the function... Maybe we can just not call the function?
Done. (see comment in the other file)
google/cloud/spanner/benchmarks/multiple_rows_cpu_benchmark.cc, line 160 at r1 (raw file):
Previously, coryan (Carlos O'Ryan) wrote…
FYI: alternatively we could do a
admin_client.GetDatabase()
ifuser_specified_database==true
and only create the database if that fails. Both are fine by me, just something for you to consider.
yeah, I thought about it but this seemed simpler to me.
google/cloud/spanner/benchmarks/multiple_rows_cpu_benchmark.cc, line 1501 at r1 (raw file):
Previously, coryan (Carlos O'Ryan) wrote…
I do not think you need to save the value here. The main reason to have
RunAllExperiment
is to smoke test all the benchmarks in the CI builds, so calling it withtrue
all the time is good enough.
It still seems better to propagate it, just in case people use all
with manual tests. I reworked it so just keep track of whether SetUp
was called and call the experiment SetUp
only if so.
google/cloud/spanner/benchmarks/single_row_throughput_benchmark.cc, line 48 at r1 (raw file):
Previously, coryan (Carlos O'Ryan) wrote…
Same comment I think
fill==false
really has the same effect as not calling the function.
Yeah, I did this file first, and didn't look close enough at what SetUp()
was doing, so I passed the parameter to skip part of it, and essentially did the same thing in the multiple_rows one for "futurproofing". But, I see even in this case the code outside if (fill)
is not necessary, so I will just take your suggestion and not call SetUp()
at all.
If the user specifies a non-empty `--database <database_name>` on the command line, it will be re-used across runs. The database will be created and populated if it doesn't already exist (i.e. on the first run) but it will not be dropped at the end of the run as it normally would be. This saves a lot of setup time if you're running the benchmark repeatedly. Otherwise, the behavior is unchanged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 3 files reviewed, 4 unresolved discussions (waiting on @coryan)
google/cloud/spanner/benchmarks/multiple_rows_cpu_benchmark.cc, line 182 at r2 (raw file):
exit_status = EXIT_FAILURE; } else { status = experiment->Run(config, database);
oops, I completely didn't mean to put this inside the 'if' - let me fix it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 3 files reviewed, 3 unresolved discussions (waiting on @coryan)
google/cloud/spanner/benchmarks/multiple_rows_cpu_benchmark.cc, line 182 at r2 (raw file):
Previously, mr-salty (Todd Derr) wrote…
oops, I completely didn't mean to put this inside the 'if' - let me fix it!
fixed (and tested)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, this will be a really useful thing to have.
Reviewed 1 of 2 files at r2, 1 of 1 files at r3.
Reviewable status:complete! all files reviewed, all discussions resolved
yeah, it really helped me immensely when debugging that deadlock issue the other night... well, luckily (for my motivation to do this) it wasn't until later that I realized I could have used a much smaller |
…gle-cloud-cpp-spanner#1174) * feat: allow re-using a database across benchmark runs If the user specifies a non-empty `--database <database_name>` on the command line, it will be re-used across runs. The database will be created and populated if it doesn't already exist (i.e. on the first run) but it will not be dropped at the end of the run as it normally would be. This saves a lot of setup time if you're running the benchmark repeatedly. Otherwise, the behavior is unchanged. * address review comments. * the experiments need to run when using an existing database
If the user specifies a non-empty
--database <database_name>
on the commandline, it will be re-used across runs. The database will be created and
populated if it doesn't already exist (i.e. on the first run) but it
will not be dropped at the end of the run as it normally would be. This
saves a lot of setup time if you're running the benchmark repeatedly.
Otherwise, the behavior is unchanged.
This change is