Skip to content
This repository was archived by the owner on Dec 8, 2021. It is now read-only.

feat: add Managed Backup feature #1358

Merged
merged 17 commits into from
Mar 12, 2020
Merged

feat: add Managed Backup feature #1358

merged 17 commits into from
Mar 12, 2020

Conversation

tmatsuo
Copy link
Contributor

@tmatsuo tmatsuo commented Mar 11, 2020

This is a squashed commit for Spanner Managed backup feature. The code is once reviewed internally.


This change is Reviewable

@tmatsuo tmatsuo requested a review from coryan March 11, 2020 20:45
@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Mar 11, 2020
@codecov
Copy link

codecov bot commented Mar 11, 2020

Codecov Report

Merging #1358 into master will decrease coverage by 0.15%.
The diff coverage is 92.9%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1358      +/-   ##
==========================================
- Coverage   94.72%   94.57%   -0.16%     
==========================================
  Files         178      185       +7     
  Lines       13813    15133    +1320     
==========================================
+ Hits        13085    14312    +1227     
- Misses        728      821      +93
Impacted Files Coverage Δ
...le/cloud/spanner/internal/database_admin_logging.h 100% <ø> (ø) ⬆️
...e/cloud/spanner/internal/database_admin_metadata.h 100% <ø> (ø) ⬆️
google/cloud/spanner/database_admin_connection.h 100% <ø> (ø)
google/cloud/spanner/database_admin_client.h 100% <ø> (ø) ⬆️
google/cloud/spanner/internal/time_utils_test.cc 100% <100%> (ø)
google/cloud/spanner/internal/time_utils.h 100% <100%> (ø)
...oud/spanner/mocks/mock_database_admin_connection.h 94.11% <100%> (+5.22%) ⬆️
...ud/spanner/internal/database_admin_logging_test.cc 100% <100%> (ø) ⬆️
google/cloud/spanner/backup.cc 100% <100%> (ø)
...e/cloud/spanner/testing/mock_database_admin_stub.h 100% <100%> (ø) ⬆️
... and 39 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4252adc...32ba4e1. Read the comment docs.

@tmatsuo
Copy link
Contributor Author

tmatsuo commented Mar 11, 2020

The code coverage is low probably because I didn't add unit tests for database_admin_client. Adding them now.

// Tests for Backup are taking long time. To run them, set
// RUN_SLOW_INTEGRATION_TESTS environment variable to yes.
if (run_slow_integration_tests == "yes") {
auto backup_future = client.CreateBackup(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably want to check if these need a !emulator guard.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Added the guard.

google::cloud::spanner_testing::RandomBackupName(generator);

std::cout << "\nRunning spanner_create_backup sample\n";
CreateBackup(database_admin_client, project_id, instance_id, database_id,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably want to check if these need a !emulator guard.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, added too

Copy link
Contributor

@coryan coryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack on the code coverage.

I know this code was reviewed before, but I found a number of documentation nits and some opportunities for refactoring in the samples.

Did you notice the UBSAN build failure? Worthy of further research I would think.

The API checks have a false positive, but also a real API breakage. We should consider using non-pure virtuals for the new functions in DatabaseAdminConnection /cc: @devjgm FYI

Reviewed 28 of 28 files at r1.
Reviewable status: all files reviewed, 26 unresolved discussions (waiting on @tmatsuo)


google/cloud/spanner/backup.h, line 31 at r1 (raw file):

 * This class identifies a Cloud Spanner Backup
 *
 * A Cloud Spanner backup is identified by `Instance` and `backup_id`.

maybe: "by an Instance and a backup_id.


google/cloud/spanner/database_admin_client.h, line 216 at r1 (raw file):

  /**
   * Create a new database by restoring from a completed backup. The new

Please move the long description to its own paragraph:

/**
 * Create a new database by .... backup.
 *
 * The new database must be ...

google/cloud/spanner/database_admin_client.h, line 230 at r1 (raw file):

  /**
   * Create a new database by restoring from a completed backup. The new

As above, split the description.


google/cloud/spanner/database_admin_client.h, line 355 at r1 (raw file):

   * belong to the `[a-z0-9_-]` character set.
   *
   * The expire_time must be at least 6 hours and at most 366 days from the time

nit: @p expire_time


google/cloud/spanner/database_admin_client.h, line 356 at r1 (raw file):

   *
   * The expire_time must be at least 6 hours and at most 366 days from the time
   * the CreateBackup request is processed.

nit: backticks around CreateBackup, and if you use CreateBackup() (with the parenthesis) Doxygen with auto link.


google/cloud/spanner/database_admin_client.h, line 371 at r1 (raw file):

   * Retrieve metadata information about a Backup.
   *
   * @par Idempotency

We did not describe the idempotency of the previous functions, and we should, please fix.


google/cloud/spanner/database_admin_client.h, line 382 at r1 (raw file):

  /**
   * Deletes a pending or completed Backup.

Please describe the idempotency of this operation.


google/cloud/spanner/database_admin_client.h, line 405 at r1 (raw file):

   * @param filter A filter expression that filters backups listed in the
   *  response. See [this
   * documentation](https://cloud.google.com/spanner/docs/reference/rpc/google.spanner.admin.database.v1#google.spanner.admin.database.v1.ListBackupsRequest)

consider moving that link to a link definition (I think those work)


google/cloud/spanner/database_admin_client.h, line 414 at r1 (raw file):

  /**
   * Update backup's expire_time.

nit: @p expire_time so it renders correctly and links to the argument.


google/cloud/spanner/database_admin_client.h, line 459 at r1 (raw file):

   */
  ListBackupOperationsRange ListBackupOperations(Instance in,
                                                 std::string filter = "");

nit: maybe std::string filter = {} but I think the compiler optimizes both to calling the default constructor.


google/cloud/spanner/database_admin_connection.h, line 50 at r1 (raw file):

/**
 * An input range to stream all the backup operations in a Cloud Spanner

I think "all the backup operations" is not completely accurate as you can provide filters? Maybe "stream backup operations in Cloud Spanner", the "a Cloud Spanner" is also odd;


google/cloud/spanner/database_admin_connection.h, line 65 at r1 (raw file):

/**
 * An input range to stream all the database operations in a Cloud Spanner

Same comments as above.


google/cloud/spanner/database_admin_connection.h, line 184 at r1 (raw file):

    std::chrono::system_clock::time_point expire_time;
  };
  /// Wrap the arguments for `GetBackup()`.

nit: insert blank line


google/cloud/spanner/database_admin_connection.h, line 189 at r1 (raw file):

    std::string backup_full_name;
  };
  /// Wrap the arguments for `DeleteBackup()`.

nit: insert blank line


google/cloud/spanner/database_admin_connection.h, line 194 at r1 (raw file):

    std::string backup_full_name;
  };
  /// Wrap the arguments for `ListBackups()`.

nit: insert blank line


google/cloud/spanner/database_admin_connection.h, line 200 at r1 (raw file):

    std::string filter;
  };
  struct RestoreDatabaseParams {

nit: insert blank line


google/cloud/spanner/database_admin_connection.h, line 200 at r1 (raw file):

    std::string filter;
  };
  struct RestoreDatabaseParams {

Also, missing comment for this class.


google/cloud/spanner/database_admin_connection.h, line 206 at r1 (raw file):

    std::string backup_full_name;
  };
  /// Wrap the arguments for `UpdateBackup()`.

nit: insert blank line


google/cloud/spanner/database_admin_connection.h, line 210 at r1 (raw file):

    google::spanner::admin::database::v1::UpdateBackupRequest request;
  };
  /// Wrap the arguments for `ListBackupOperations()`.

nit: insert blank line


google/cloud/spanner/database_admin_connection.h, line 216 at r1 (raw file):

    std::string filter;
  };
  /// Wrap the arguments for `ListDatabaseOperations()`.

nit: insert blank line


google/cloud/spanner/database_admin_connection.cc, line 162 at r1 (raw file):

    gcsa::ListDatabasesRequest request;
    request.set_parent(p.instance.FullName());
    request.clear_page_token();

This change seems unrelated?


google/cloud/spanner/internal/time_utils.h, line 36 at r1 (raw file):

  if (!ts) {
    return Status(StatusCode::kInvalidArgument,
                  "Given time_point is too extreme");

nit: this path is never tested, is ts a StatusOr<T>? If so, maybe we can just return its status?


google/cloud/spanner/samples/samples.cc, line 748 at r1 (raw file):

// [END spanner_cancel_backup_create]

void CreateBackupAndCancelCommand(std::vector<std::string> const& argv) {

FYI: I think you have enough of these functions that take the same " " parameters that you could refactor them and save some code. Up to you.


google/cloud/spanner/samples/samples.cc, line 775 at r1 (raw file):

//! [list-backups] [END spanner_list_backups]

void ListBackupsCommand(std::vector<std::string> const& argv) {

I think you can use the make_database_command_entry helper for this one.


google/cloud/spanner/samples/samples.cc, line 806 at r1 (raw file):

//! [list-backup-operations] [END spanner_list_backup_operations]

void ListBackupOperationsCommand(std::vector<std::string> const& argv) {

I think you can use the make_database_command_entry helper for this one.

Copy link
Contributor Author

@tmatsuo tmatsuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I addressed the review comments, and also added unit tests for the new functions. PTAL

Reviewable status: 20 of 30 files reviewed, 14 unresolved discussions (waiting on @coryan and @devbww)


google/cloud/spanner/backup.h, line 31 at r1 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

maybe: "by an Instance and a backup_id.

Done.


google/cloud/spanner/database_admin_client.h, line 216 at r1 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

Please move the long description to its own paragraph:

/**
 * Create a new database by .... backup.
 *
 * The new database must be ...

Done.


google/cloud/spanner/database_admin_client.h, line 230 at r1 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

As above, split the description.

Done.


google/cloud/spanner/database_admin_client.h, line 371 at r1 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

We did not describe the idempotency of the previous functions, and we should, please fix.

Done.


google/cloud/spanner/database_admin_client.h, line 382 at r1 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

Please describe the idempotency of this operation.

Done.


google/cloud/spanner/database_admin_client.h, line 405 at r1 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

consider moving that link to a link definition (I think those work)

Done.


google/cloud/spanner/database_admin_connection.h, line 50 at r1 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

I think "all the backup operations" is not completely accurate as you can provide filters? Maybe "stream backup operations in Cloud Spanner", the "a Cloud Spanner" is also odd;

Done.


google/cloud/spanner/database_admin_connection.h, line 65 at r1 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

Same comments as above.

Done.


google/cloud/spanner/database_admin_connection.h, line 200 at r1 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

Also, missing comment for this class.

Done.


google/cloud/spanner/database_admin_connection.cc, line 162 at r1 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

This change seems unrelated?

Reverted


google/cloud/spanner/integration_tests/database_admin_integration_test.cc, line 175 at r1 (raw file):

Previously, devbww (Bradley White) wrote…

You probably want to check if these need a !emulator guard.

Added the guard


google/cloud/spanner/internal/time_utils.h, line 36 at r1 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

nit: this path is never tested, is ts a StatusOr<T>? If so, maybe we can just return its status?

Done


google/cloud/spanner/samples/samples.cc, line 775 at r1 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

I think you can use the make_database_command_entry helper for this one.

The number of arguments is unfortunately different.


google/cloud/spanner/samples/samples.cc, line 806 at r1 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

I think you can use the make_database_command_entry helper for this one.

Done.


google/cloud/spanner/samples/samples.cc, line 2581 at r1 (raw file):

Previously, devbww (Bradley White) wrote…

You probably want to check if these need a !emulator guard.

Added the guard

@tmatsuo
Copy link
Contributor Author

tmatsuo commented Mar 12, 2020

Oops, DatabaseAdminClientTest.CreateBackupCancel is not fixed yet. I may have misunderstood why it's failing.

Copy link
Contributor

@coryan coryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some test failures that are kind of flaky, can you take a look before we merge? The code looks good otherwise.

Did we decide about the API breakage? Fixing the baseline is necessary, but we also may need to change the documentation for this bug and/or change the new member functions *Connection to not be pure virtual.

Reviewed 10 of 10 files at r2.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @devbww and @tmatsuo)


google/cloud/spanner/database_admin_connection.h, line 282 at r2 (raw file):

  /// RPC.
  virtual future<StatusOr<google::spanner::admin::database::v1::Backup>>
      CreateBackup(CreateBackupParams) = 0;

To avoid the API breakage we can should make this function (and all the other new member functions in this class) virtual, but provide an implementation in the .cc file that returns Status(StatusCode::kUnimplemented, "not implemented"). We do not need to do that for classes in the internal namespace.


google/cloud/spanner/database_admin_connection_test.cc, line 710 at r2 (raw file):

        return make_status_or(op);
      });
  EXPECT_CALL(*mock, CancelOperation(_, _))

I think this test has a race condition, this mock is never called if GetOperation() is called and returns "quickly"... see below for an idea on how to fix it.


google/cloud/spanner/database_admin_connection_test.cc, line 717 at r2 (raw file):

        return google::cloud::Status();
      });
  EXPECT_CALL(*mock, GetOperation(_, _))

Change the code here to be something like:

  promise<void> release_get_operation;
  EXPECT_CALL(*mock, GetOperation(_, _))
      .WillOnce([&release_get_operation](grpc::ClientContext&,
                         google::longrunning::GetOperationRequest const& r) {
        EXPECT_EQ("test-operation-name", r.name());
        release_get_operation.get_future().get();
        google::longrunning::Operation op;
        op.set_name(r.name());
        op.set_done(false);                    // note that we will ask for at least one more call...
        gcsa::Backup backup;
        backup.set_name("test-backup");
        op.mutable_response()->PackFrom(backup);
        return make_status_or(op);
      })
      .WillOnce([&release_get_operation](grpc::ClientContext&,
                         google::longrunning::GetOperationRequest const& r) {
        EXPECT_EQ("test-operation-name", r.name());
        release_get_operation.get_future().get();
        google::longrunning::Operation op;
        op.set_name(r.name());
        op.set_done(true);                    // and succeed the second time
        gcsa::Backup backup;
        backup.set_name("test-backup");
        op.mutable_response()->PackFrom(backup);
        return make_status_or(op);
      })
;

that will stop this function from returning until you call release_get_operation.set_value(), which we can do after cancel() is called...


google/cloud/spanner/database_admin_connection_test.cc, line 733 at r2 (raw file):

  Database dbase("test-project", "test-instance", "test-db");
  auto fut = conn->CreateBackup({dbase, "test-backup", {}});
  fut.cancel();

So this becomes:

fut.cancel();
release_get_operation.set_value();

google/cloud/spanner/internal/time_utils.h, line 35 at r2 (raw file):

  auto const ts = MakeTimestamp(time_point);
  if (!ts) {
    return ts.status();

consider: if (!ts) return std::move(ts).status(); to avoid a copy and shorter code.

Copy link
Contributor Author

@tmatsuo tmatsuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I changed the new functions to virtual (not pure virtual) functions. I think I also fixed the flaky test. PTAL

Reviewable status: 25 of 30 files reviewed, 7 unresolved discussions (waiting on @coryan and @devbww)


google/cloud/spanner/database_admin_connection.h, line 282 at r2 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

To avoid the API breakage we can should make this function (and all the other new member functions in this class) virtual, but provide an implementation in the .cc file that returns Status(StatusCode::kUnimplemented, "not implemented"). We do not need to do that for classes in the internal namespace.

Done.


google/cloud/spanner/database_admin_connection_test.cc, line 710 at r2 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

I think this test has a race condition, this mock is never called if GetOperation() is called and returns "quickly"... see below for an idea on how to fix it.

Yeah, I also came up with using promise :)


google/cloud/spanner/database_admin_connection_test.cc, line 717 at r2 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

Change the code here to be something like:

  promise<void> release_get_operation;
  EXPECT_CALL(*mock, GetOperation(_, _))
      .WillOnce([&release_get_operation](grpc::ClientContext&,
                         google::longrunning::GetOperationRequest const& r) {
        EXPECT_EQ("test-operation-name", r.name());
        release_get_operation.get_future().get();
        google::longrunning::Operation op;
        op.set_name(r.name());
        op.set_done(false);                    // note that we will ask for at least one more call...
        gcsa::Backup backup;
        backup.set_name("test-backup");
        op.mutable_response()->PackFrom(backup);
        return make_status_or(op);
      })
      .WillOnce([&release_get_operation](grpc::ClientContext&,
                         google::longrunning::GetOperationRequest const& r) {
        EXPECT_EQ("test-operation-name", r.name());
        release_get_operation.get_future().get();
        google::longrunning::Operation op;
        op.set_name(r.name());
        op.set_done(true);                    // and succeed the second time
        gcsa::Backup backup;
        backup.set_name("test-backup");
        op.mutable_response()->PackFrom(backup);
        return make_status_or(op);
      })
;

that will stop this function from returning until you call release_get_operation.set_value(), which we can do after cancel() is called...

Done.


google/cloud/spanner/database_admin_connection_test.cc, line 733 at r2 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

So this becomes:

fut.cancel();
release_get_operation.set_value();

Done.


google/cloud/spanner/internal/time_utils.h, line 35 at r2 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

consider: if (!ts) return std::move(ts).status(); to avoid a copy and shorter code.

Done.

@tmatsuo
Copy link
Contributor Author

tmatsuo commented Mar 12, 2020

It's weird, but with a longer maximum delay, the test became stable.

@tmatsuo
Copy link
Contributor Author

tmatsuo commented Mar 12, 2020

Hmm I can not reproduce the failure with RUNS_PER_TEST=100, but apparently this doesn't solve the issue on kokoro.
Now I'm trying to explicitly verify the expectations on the mock. This version runs green 100 times locally.

@tmatsuo
Copy link
Contributor Author

tmatsuo commented Mar 12, 2020

I ended up trying to suppress the leak.

Copy link
Contributor

@coryan coryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 5 files at r3, 1 of 1 files at r5.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @devbww)


google/cloud/spanner/database_admin_connection_test.cc, line 704 at r5 (raw file):

  auto mock = std::make_shared<MockDatabaseAdminStub>();
  // Suppress a false leak.
  Mock::AllowLeak(mock.get());

FYI: I know what is going on... the leak is because the thread created by client is still running at the end of this test function, and has a reference to mock. Suppressing the leak is probably the best you can do. You may want to add a TODO() for #127 because that would remove the need for this.

@devjgm
Copy link
Contributor

devjgm commented Mar 12, 2020

The API checks have a false positive, but also a real API breakage. We should consider using non-pure virtuals for the new functions in DatabaseAdminConnection /cc: @devjgm FYI

I'm fine w/ the use of non-pure virtual functions as you have them now.

That said, I'll offer an opinion that I think it's a shame that all the other methods are pure virtual and these new ones are not. That leaves a seemingly arbitrary difference in the code that may be confusing to readers. Why are some methods pure and others not? Yes, the reason is to avoid breaking folks who subclass the connection, but really, I think that's a usecase that we'd rather not even support.

I think the only reason we have virtual methods at all is to enable gmock. And in that case, we provide our own mock subclass that users should use. And if users are using our mock connection, then the addition of a new pure-virtual method in the connection base class wouldn't break anyone.

So, one possible alternative is that we could document our "connection" interfaces saying that they are NOT intended for users to create their own arbitrary subclasses. They exist for the purpose of using gmock, and callers should use our provided gmock subclasses in their own tests. This will avoid them breaking as we add new methods.

@coryan
Copy link
Contributor

coryan commented Mar 12, 2020

The API checks have a false positive, but also a real API breakage. We should consider using non-pure virtuals for the new functions in DatabaseAdminConnection /cc: @devjgm FYI

I'm fine w/ the use of non-pure virtual functions as you have them now.

I think we should move this discussion to a bug or feature request, as we both seem to be Okay with the code as-is.

That said, I'll offer an opinion that I think it's a shame that all the other methods are pure virtual and these new ones are not.

Agree.

That leaves a seemingly arbitrary difference in the code that may be confusing to readers. Why are some methods pure and others not? Yes, the reason is to avoid breaking folks who subclass the connection, but really, I think that's a usecase that we'd rather not even support.

I think the only reason we have virtual methods at all is to enable gmock.

Or any other mocking tool presumably? Applications may use FakeIt:

https://github.com/eranpeer/FakeIt

or FSeam:

http://freeyoursoul.online/fseam-a-mocking-framework-that-requires-no-change-in-code-part-2/

or something else we do not know about. They may have also decorated a Connection to integrate with OpenCensus, or OpenTelemetry (I have a hunch this is the perfect place to inject spans).

All of which is to say: we do not know how things are used, and I think there are legitimate uses for inheriting from this classes, a blanket prohibition seems less than ideal.

Copy link
Contributor

@coryan coryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r6.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @devbww)

@tmatsuo tmatsuo merged commit 88adb64 into googleapis:master Mar 12, 2020
@tmatsuo tmatsuo deleted the backup branch March 12, 2020 17:58
devjgm pushed a commit to devjgm/google-cloud-cpp that referenced this pull request May 7, 2020
…#1358)

* feat: add Managed Backup feature

* testing: update the ABI dump

* Made the new functions virtual functions (not pure virtual)

* explicitly set done(false) on the operation

* explicitely verify the expectations on the mock

* suppress a false leak
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants