Skip to content
This repository was archived by the owner on Dec 8, 2021. It is now read-only.

feat: allow re-using a database across benchmark runs #1174

Merged
merged 3 commits into from
Jan 9, 2020
Merged

feat: allow re-using a database across benchmark runs #1174

merged 3 commits into from
Jan 9, 2020

Conversation

mr-salty
Copy link
Contributor

@mr-salty mr-salty commented Jan 8, 2020

If the user specifies a non-empty --database <database_name> on the command
line, it will be re-used across runs. The database will be created and
populated if it doesn't already exist (i.e. on the first run) but it
will not be dropped at the end of the run as it normally would be. This
saves a lot of setup time if you're running the benchmark repeatedly.

Otherwise, the behavior is unchanged.


This change is Reviewable

@mr-salty mr-salty requested a review from coryan January 8, 2020 23:55
@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Jan 8, 2020
@mr-salty
Copy link
Contributor Author

mr-salty commented Jan 9, 2020

@coryan I debated introducing a--reuse_database flag and having the program use a non-random name, I don't have too much of a preference either way, LMK if you feel differently.

Copy link
Contributor

@coryan coryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the feature, thanks for implementing it. The code could be a smidge cleaner, made some suggestions below, WDYT?

Reviewed 3 of 3 files at r1.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @mr-salty)


google/cloud/spanner/benchmarks/multiple_rows_cpu_benchmark.cc, line 73 at r1 (raw file):

  virtual std::string AdditionalDdlStatement() = 0;
  virtual Status SetUp(Config const& config, cs::Database const& database,
                       bool fill) = 0;

This change fills slightly odd to me, basically in all cases passing fill == false has the same effect as not calling the function... Maybe we can just not call the function?


google/cloud/spanner/benchmarks/multiple_rows_cpu_benchmark.cc, line 160 at r1 (raw file):

  auto db = create_future.get();
  if (!db) {
    if (user_specified_database &&

FYI: alternatively we could do a admin_client.GetDatabase() if user_specified_database==true and only create the database if that fails. Both are fine by me, just something for you to consider.


google/cloud/spanner/benchmarks/multiple_rows_cpu_benchmark.cc, line 1501 at r1 (raw file):

  std::string AdditionalDdlStatement() override { return {}; }
  Status SetUp(Config const&, cs::Database const&, bool fill) override {
    fill_ = fill;

I do not think you need to save the value here. The main reason to have RunAllExperiment is to smoke test all the benchmarks in the CI builds, so calling it with true all the time is good enough.


google/cloud/spanner/benchmarks/single_row_throughput_benchmark.cc, line 48 at r1 (raw file):

  virtual void SetUp(Config const& config,
                     cloud_spanner::Database const& database, bool fill) = 0;

Same comment I think fill==false really has the same effect as not calling the function.

@codecov
Copy link

codecov bot commented Jan 9, 2020

Codecov Report

Merging #1174 into master will decrease coverage by 0.02%.
The diff coverage is 66.12%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1174      +/-   ##
==========================================
- Coverage   95.71%   95.68%   -0.03%     
==========================================
  Files         175      175              
  Lines       13527    13567      +40     
==========================================
+ Hits        12947    12982      +35     
- Misses        580      585       +5
Impacted Files Coverage Δ
...ogle/cloud/spanner/benchmarks/benchmarks_config.cc 96.26% <0%> (-0.91%) ⬇️
.../spanner/benchmarks/multiple_rows_cpu_benchmark.cc 92.62% <64.86%> (-0.43%) ⬇️
...nner/benchmarks/single_row_throughput_benchmark.cc 92.73% <70.83%> (-0.8%) ⬇️
google/cloud/spanner/internal/transaction_impl.h 100% <0%> (ø) ⬆️
google/cloud/spanner/transaction.h 100% <0%> (ø) ⬆️
...anner/integration_tests/client_integration_test.cc 98.47% <0%> (ø) ⬆️
google/cloud/spanner/value.h 92.8% <0%> (+0.1%) ⬆️
google/cloud/spanner/samples/samples.cc 86.63% <0%> (+0.19%) ⬆️
google/cloud/spanner/client.cc 97.41% <0%> (+0.86%) ⬆️
...ud/spanner/integration_tests/client_stress_test.cc 83.33% <0%> (+1.75%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fbdab30...560fe62. Read the comment docs.

Copy link
Contributor Author

@mr-salty mr-salty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 3 files reviewed, 3 unresolved discussions (waiting on @coryan)


google/cloud/spanner/benchmarks/multiple_rows_cpu_benchmark.cc, line 73 at r1 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

This change fills slightly odd to me, basically in all cases passing fill == false has the same effect as not calling the function... Maybe we can just not call the function?

Done. (see comment in the other file)


google/cloud/spanner/benchmarks/multiple_rows_cpu_benchmark.cc, line 160 at r1 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

FYI: alternatively we could do a admin_client.GetDatabase() if user_specified_database==true and only create the database if that fails. Both are fine by me, just something for you to consider.

yeah, I thought about it but this seemed simpler to me.


google/cloud/spanner/benchmarks/multiple_rows_cpu_benchmark.cc, line 1501 at r1 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

I do not think you need to save the value here. The main reason to have RunAllExperiment is to smoke test all the benchmarks in the CI builds, so calling it with true all the time is good enough.

It still seems better to propagate it, just in case people use all with manual tests. I reworked it so just keep track of whether SetUp was called and call the experiment SetUp only if so.


google/cloud/spanner/benchmarks/single_row_throughput_benchmark.cc, line 48 at r1 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

Same comment I think fill==false really has the same effect as not calling the function.

Yeah, I did this file first, and didn't look close enough at what SetUp() was doing, so I passed the parameter to skip part of it, and essentially did the same thing in the multiple_rows one for "futurproofing". But, I see even in this case the code outside if (fill) is not necessary, so I will just take your suggestion and not call SetUp() at all.

If the user specifies a non-empty `--database <database_name>` on the command
line, it will be re-used across runs. The database will be created and
populated if it doesn't already exist (i.e. on the first run) but it
will not be dropped at the end of the run as it normally would be.  This
saves a lot of setup time if you're running the benchmark repeatedly.

Otherwise, the behavior is unchanged.
Copy link
Contributor Author

@mr-salty mr-salty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 3 files reviewed, 4 unresolved discussions (waiting on @coryan)


google/cloud/spanner/benchmarks/multiple_rows_cpu_benchmark.cc, line 182 at r2 (raw file):

      exit_status = EXIT_FAILURE;
    } else {
      status = experiment->Run(config, database);

oops, I completely didn't mean to put this inside the 'if' - let me fix it!

Copy link
Contributor Author

@mr-salty mr-salty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 3 files reviewed, 3 unresolved discussions (waiting on @coryan)


google/cloud/spanner/benchmarks/multiple_rows_cpu_benchmark.cc, line 182 at r2 (raw file):

Previously, mr-salty (Todd Derr) wrote…

oops, I completely didn't mean to put this inside the 'if' - let me fix it!

fixed (and tested)

Copy link
Contributor

@coryan coryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, this will be a really useful thing to have.

Reviewed 1 of 2 files at r2, 1 of 1 files at r3.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

@mr-salty mr-salty merged commit fd06727 into googleapis:master Jan 9, 2020
@mr-salty mr-salty deleted the bm_reuse branch January 9, 2020 22:34
@mr-salty
Copy link
Contributor Author

mr-salty commented Jan 9, 2020

yeah, it really helped me immensely when debugging that deadlock issue the other night... well, luckily (for my motivation to do this) it wasn't until later that I realized I could have used a much smaller --table_size to reduce the overhead ;) but this is still better regardless.

devjgm pushed a commit to devjgm/google-cloud-cpp that referenced this pull request May 7, 2020
…gle-cloud-cpp-spanner#1174)

* feat: allow re-using a database across benchmark runs

If the user specifies a non-empty `--database <database_name>` on the command
line, it will be re-used across runs. The database will be created and
populated if it doesn't already exist (i.e. on the first run) but it
will not be dropped at the end of the run as it normally would be.  This
saves a lot of setup time if you're running the benchmark repeatedly.

Otherwise, the behavior is unchanged.

* address review comments.

* the experiments need to run when using an existing database
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants