Skip to content

[ENH]: soft delete databases, add FinishDatabaseDeletion gRPC method to hard delete databases #4627

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 29, 2025

Conversation

codetheweb
Copy link
Contributor

@codetheweb codetheweb commented May 23, 2025

Description of changes

Databases are now always soft deleted instead of being hard deleted. Databases are hard deleted when the new FinishDatabaseDeletion method is called (the garbage collector calls this in the next PR).

Test plan

How are these changes tested?

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Updated existing test to cover soft delete behavior.

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?

n/a

Copy link

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)

@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection branch from 738703d to af4051e Compare May 23, 2025 22:55
@codetheweb codetheweb force-pushed the feat-gc-soft-delete-database branch from b099250 to c84c9a6 Compare May 23, 2025 22:55
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection branch from af4051e to ed65c55 Compare May 23, 2025 22:55
@codetheweb codetheweb force-pushed the feat-gc-soft-delete-database branch from c84c9a6 to add6683 Compare May 23, 2025 22:56
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection branch from ed65c55 to 230d5ac Compare May 23, 2025 22:56
@codetheweb codetheweb force-pushed the feat-gc-soft-delete-database branch from add6683 to 42b9bff Compare May 23, 2025 22:56
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection branch from 230d5ac to cf6858c Compare May 23, 2025 23:36
@codetheweb codetheweb force-pushed the feat-gc-soft-delete-database branch 2 times, most recently from 156ccde to 0ee676b Compare May 23, 2025 23:40
@codetheweb codetheweb changed the title [ENH]: soft delete databases, hard delete from garbage collector [ENH]: soft delete databases, add FinishDatabaseDeletion gRPC method to hard delete databases May 23, 2025
@codetheweb codetheweb force-pushed the feat-gc-soft-delete-database branch from 0ee676b to 6ff85b5 Compare May 23, 2025 23:42
return nil
})
}

func (tc *Catalog) GetAllDatabases(ctx context.Context, ts types.Timestamp) ([]*model.Database, error) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method was never used

@codetheweb codetheweb marked this pull request as ready for review May 23, 2025 23:43
@codetheweb codetheweb requested a review from sanketkedia May 23, 2025 23:43
Copy link
Contributor

propel-code-bot bot commented May 23, 2025

Implement Soft Delete for Databases with Hard Delete via New gRPC Method

This PR transitions database deletion in the system to a soft delete model, where databases are marked as deleted but remain in storage. A new gRPC method FinishDatabaseDeletion is introduced, allowing for controlled hard deletion (permanent removal), which will typically be invoked by the garbage collector. Associated collection deletions are also soft, and tests are updated to verify the new workflow. All major components and gRPC/protobuf definitions are updated to support these changes.

Key Changes:
• Adds an is_deleted flag to database model and transitions all deletions to soft delete (set flag, do not drop row).
• Removes existing methods and code logic for direct/hard deletes, replacing with soft delete logic.
• Introduces FinishDatabaseDeletion gRPC/proto endpoint (and implementation through all layers) to perform hard deletes on eligible databases (i.e., those soft-deleted and without collections, after a cutoff time).
• Propagates soft delete logic to database-listing and get methods (filters out deleted DBs).
• Updates and expands test cases to validate both soft and hard database deletion behaviors.
• Adjusts Rust API (chroma_types) and sysdb backend to support new deletion flow and error types.

Affected Areas:
• Go: sysdb coordinator, dao/database.go, dbmodel/database.go, collection and coordinator services
• Rust: sysdb.rs, api_types.rs
• Protobuf IDL: coordinator.proto
• Tests: tenant_database_service_test.go

Potential Impact:

Functionality: All client- and internal-facing database deletions become soft deletes; actual data persists until explicitly purged by hard delete. Deletion visibility (list/lookup) is changed: soft-deleted DBs become invisible to normal operations.

Performance: Negligible change under normal operation; possible minor overhead in list/get queries due to filtering. Hard deletion operates in batches to avoid transaction limits.

Security: Soft deleted databases remain recoverable and might be accessible via manual DB manipulation; eventual hard delete cleans them up as expected.

Scalability: Batch-oriented hard delete in FinishDatabaseDeletion supports cleaning up many entries at once without running into DB limits. Soft delete avoids immediate DB/IO cost.

Review Focus:
• Correctness and transactionality of soft and hard delete logic in both Go and Rust layers.
• Query correctness: only non-deleted databases are returned by list/get methods.
• Thoroughness of test coverage for new behavior and edge cases.
• Proto/gRPC compatibility between Go and Rust.
• Potential backward compatibility issues for existing API clients.

Testing Needed

• Run API and system tests to verify database and collection deletion workflows (deletion, attempted access, and garbage collection).
• Test edge cases where a database is soft deleted but contains soft- or hard-deleted collections.
• Validate new gRPC endpoint FinishDatabaseDeletion from client/garbage-collector perspective.
• Check list/get endpoints filter soft-deleted resources as intended.

Code Quality Assessment

go/pkg/sysdb/coordinator/table_catalog.go: Removed unused methods, updated workflow for soft delete and ensured cascading soft deletes for collections.

go/pkg/sysdb/metastore/db/dbmodel/database.go: Interface updated cleanly; model change (is_deleted) matches DB expectations.

rust/sysdb/src/sysdb.rs: New trait implementations for hard deletion; error handling for new workflows; consistent with existing structure.

rust/types/src/api_types.rs: New error types and trait implementations; maintains idiomatic usage.

idl/chromadb/proto/coordinator.proto: Backward-compatible addition of new methods/messages; comments are clear.

go/pkg/sysdb/metastore/db/dao/database.go: Refactored and extended with new methods; clear separation between soft and hard delete; code is readable and robust.

Best Practices

Database Schema:
• Uses soft delete flags instead of immediate row removal
• Batch operations for safe deletion

API Design:
• Adds new endpoints in a backward-compatible way
• Updates error handling and surface across service boundaries

Test Coverage:
• Expands tests for both normal and edge cases
• Aligned with updated business logic

Error Handling:
• Introduces specific errors/types for new workflows

Potential Issues

• Soft-deleted databases still exist at the storage layer until the hard delete runs; operators should be aware this may affect storage utilization.
• Hard deletion runs in batches (limit 1000 per call); there may be a delay between soft and hard deletion depending on how/when GC runs.
• If soft-deleted databases have lingering collections or if collection deletion fails, hard deletion is deferred.
• API clients relying on immediate delete behavior should verify/updatetheir expectations.

This summary was automatically generated by @propel-code-bot

@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection branch from cf6858c to 4baa298 Compare May 27, 2025 17:22
@codetheweb codetheweb force-pushed the feat-gc-soft-delete-database branch from 6ff85b5 to c375ff5 Compare May 27, 2025 17:22
Copy link
Contributor

@Sicheng-Pan Sicheng-Pan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm

@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection branch from 4baa298 to 114728f Compare May 27, 2025 23:03
@codetheweb codetheweb force-pushed the feat-gc-soft-delete-database branch from c375ff5 to 5dccc28 Compare May 27, 2025 23:04
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection branch from 114728f to 24d7092 Compare May 27, 2025 23:13
@codetheweb codetheweb force-pushed the feat-gc-soft-delete-database branch from 5dccc28 to 4b46394 Compare May 27, 2025 23:13
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection branch from 24d7092 to e949e65 Compare May 28, 2025 00:17
@codetheweb codetheweb force-pushed the feat-gc-soft-delete-database branch from 4b46394 to 8bfdba2 Compare May 28, 2025 00:17
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection branch from e949e65 to 0da7dfa Compare May 28, 2025 19:43
@codetheweb codetheweb force-pushed the feat-gc-soft-delete-database branch from 8bfdba2 to 005c7a9 Compare May 28, 2025 19:43
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection branch from 0da7dfa to 7de8859 Compare May 28, 2025 21:14
@codetheweb codetheweb force-pushed the feat-gc-soft-delete-database branch from 005c7a9 to cd722cd Compare May 28, 2025 21:14
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection branch 2 times, most recently from 719d31b to cdc60db Compare May 28, 2025 22:09
@codetheweb codetheweb force-pushed the feat-gc-soft-delete-database branch from cd722cd to 30b2b88 Compare May 28, 2025 22:09
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection branch from cdc60db to e167434 Compare May 29, 2025 18:50
@codetheweb codetheweb force-pushed the feat-gc-soft-delete-database branch from 30b2b88 to a897895 Compare May 29, 2025 18:50
@codetheweb codetheweb force-pushed the feat-gc-hard-delete-collection branch from e167434 to 3f81298 Compare May 29, 2025 21:12
@codetheweb codetheweb force-pushed the feat-gc-soft-delete-database branch 2 times, most recently from 89363e7 to 5e896ee Compare May 29, 2025 21:43
@codetheweb codetheweb changed the base branch from feat-gc-hard-delete-collection to graphite-base/4627 May 29, 2025 22:38
@codetheweb codetheweb force-pushed the feat-gc-soft-delete-database branch from 5e896ee to 05cf039 Compare May 29, 2025 22:38
@codetheweb codetheweb force-pushed the graphite-base/4627 branch from 3f81298 to cdf5e60 Compare May 29, 2025 22:38
@codetheweb codetheweb changed the base branch from graphite-base/4627 to main May 29, 2025 22:38
@codetheweb codetheweb force-pushed the feat-gc-soft-delete-database branch from 05cf039 to 52d3755 Compare May 29, 2025 22:39
@codetheweb codetheweb merged commit 771fd96 into main May 29, 2025
72 checks passed
Copy link
Contributor Author

Merge activity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants