Skip to content

[ENH] Bootstrap a wal3 log from existing content. #4560

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
May 19, 2025

Conversation

rescrv
Copy link
Contributor

@rescrv rescrv commented May 15, 2025

Description of changes

This adds a "bootstrap" call to wal3 that copies the data semi-atomically. It is reasoned to be
safe to do concurrent with all other log operations, but may leave a FragmentSeqNo(1) lying around
in the event that a log is initialized at the same time it's bootstrapping. This should never
happen in our system so thankfully we can just leave the hole.

See #4558 for the reasoning.

Test plan

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Documentation Changes

N/A

Copy link

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)

@rescrv rescrv force-pushed the rescrv/wal3-bootstrap-impl branch from 51295b2 to cad8641 Compare May 16, 2025 19:44
@rescrv rescrv requested review from codetheweb and sanketkedia May 16, 2025 22:00
@rescrv rescrv marked this pull request as ready for review May 16, 2025 22:00
Copy link
Contributor

propel-code-bot bot commented May 16, 2025

Add Bootstrap Functionality to wal3 Log from Existing Content

This PR introduces a new bootstrap method to the wal3 log writer, enabling initialization of a new log from an existing set of records. The function provides a semi-atomic process to copy data into a new log, with checks to avoid bootstrapping over already-initialized or in-use logs, and is designed to be idempotent and safe for recovery if partially completed. Two integration tests are added to validate both populated and empty bootstrap scenarios.

Key Changes:
• Implemented LogWriter::bootstrap async method for initializing a log from supplied messages/offset.
• Updated fragment upload logic to avoid creating empty fragments.
• Adjusted internal error handling around atomic file creation and manifest updates during bootstrap.
• Added two integration tests: one for bootstrapping a log with records and one for empty bootstraps.
• Minor update: Derived Default for Limits struct in reader module.

Affected Areas:
• rust/wal3/src/writer.rs
• rust/wal3/src/reader.rs
• rust/wal3/tests/test_k8s_integration_83_bootstrap.rs
• rust/wal3/tests/test_k8s_integration_84_bootstrap_empty.rs

Potential Impact:

Functionality: Enables initializing Rust-based logs from arbitrary existing data, facilitating system migrations and recovery scenarios.

Performance: No significant runtime path changes; only affects manual/admin/bootstrap operations.

Security: No sensitive information exposed, but as with all initialization routines, care must be taken to avoid race conditions.

Scalability: No impact on log scalability or concurrency in normal operation; bootstrap is a one-time admin operation.

Review Focus:
• Correctness and atomicity of the bootstrap process.
• Error handling and rollback on partial failure.
• Test coverage for edge cases, such as empty log or concurrent initialization.

Testing Needed

• Run the new integration tests to verify both non-empty and empty bootstrapping.
• Exercise concurrent initialization (if feasible) to observe contention/error handling.
• Manually test idempotency by re-running bootstrap under failure scenarios.

Code Quality Assessment

rust/wal3/src/writer.rs: Well-reasoned, clear inline documentation, explains invariants and known tradeoffs.

rust/wal3/src/reader.rs: Minor improvement with Default derivation; otherwise unchanged.

rust/wal3/tests/test_k8s_integration_83_bootstrap.rs: Good scenario and data validation.

rust/wal3/tests/test_k8s_integration_84_bootstrap_empty.rs: Test coverage for empty bootstraps.

Best Practices

Modularization:
• New API surface (bootstrap) is clearly separated

Atomicity:
• Partial failure recovery considered, idempotency noted

Testing:
• Relevant integration tests for new code

Documentation:
• Clear doc-comments and inline reasoning

Error Handling:
• Explicit error paths, safe to re-run on failure

Potential Issues

• Very rare edge case where a 'hole' (FragmentSeqNo(1)) is left if initialization and bootstrap race, but this is predicated not to occur in expected workflows.
• If bootstrap is used outside migration/one-shot admin scenarios, it could subvert concurrency guarantees.
• Any changes to upload_parquet or manifest error handling in the future should re-examine bootstrap robustness.

This summary was automatically generated by @propel-code-bot

rescrv added 2 commits May 19, 2025 09:41
The space of cases I have to handle:

panic(BenignRace, AlreadyInitialized, Uninitialized) -> cannot double-initialize manifest
panic(BenignRace, AlreadyInitialized, Failure) -> cannot double-initialize manifest
panic(BenignRace, AlreadyInitialized, Success) -> cannot double-initialize manifest
panic(Success, AlreadyInitialized, Uninitialized) -> cannot double-initialize manifest
panic(Success, AlreadyInitialized, Failure) -> cannot double-initialize manifest
panic(Success, AlreadyInitialized, Success) -> cannot double-initialize manifest

panic(BenignRace, Uninitialized, Failure) -> failed to install recovered manifest
panic(BenignRace, Success, Failure) -> failed to install recovered manifest
panic(Success, Uninitialized, Failure) -> failed to install recovered manifest
panic(Success, Success, Failure) -> failed to install recovered manifest

panic(BenignRace, Uninitialized, Uninitialized) -> cannot have manifest become uninitialized
panic(BenignRace, Success, Uninitialized) -> cannot have manifest become uninitialized
panic(Success, Uninitialized, Uninitialized) -> cannot have manifest become uninitialized
panic(Success, Success, Uninitialized) -> cannot have manifest become uninitialized

Committing so I don't lose it.
/// and I'd prefer to manually inspect failures than get the automation right to do it always
/// automatically. Bootstrap is intended only to last as long as there is a migration from the
/// go to the rust log services.
pub async fn bootstrap<D: MarkDirty>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be good to have a basic test for this

@rescrv rescrv force-pushed the rescrv/wal3-bootstrap branch from cd14dcb to 307051a Compare May 19, 2025 16:42
@rescrv rescrv force-pushed the rescrv/wal3-bootstrap-impl branch from fc8b17c to dbb2ec3 Compare May 19, 2025 16:42
Base automatically changed from rescrv/wal3-bootstrap to main May 19, 2025 22:06
@rescrv rescrv merged commit df23ae7 into main May 19, 2025
70 checks passed
@rescrv rescrv deleted the rescrv/wal3-bootstrap-impl branch May 19, 2025 23:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants