Skip to content

[ENH] Clean up client manager into manager/assigner - make log client use it #4640

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
May 29, 2025

Conversation

HammadB
Copy link
Collaborator

@HammadB HammadB commented May 28, 2025

Description of changes

Summarize the changes made by this PR.

  • Remove set_system pattern in lieu of passing system into try_from_config - aligning with the common. pattern since we touched this
  • Refactor client manager into client manager + assigner thats generic over its clients
  • Use this in the old call site and in the log, the log uses assignment for write path, and reads all for read path

Test plan

How are these changes tested?

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Documentation Changes

None

Copy link
Collaborator Author

HammadB commented May 28, 2025

Copy link

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)

Copy link

Please tag your PR title with one of: [ENH | BUG | DOC | TST | BLD | PERF | TYP | CLN | CHORE]. See https://docs.trychroma.com/contributing#contributing-code-and-ideas

1 similar comment
Copy link

Please tag your PR title with one of: [ENH | BUG | DOC | TST | BLD | PERF | TYP | CLN | CHORE]. See https://docs.trychroma.com/contributing#contributing-code-and-ideas

@@ -69,7 +69,7 @@ tokio = { version = "1.41", features = ["fs", "macros", "rt-multi-thread"] }
tokio-util = "0.7.12"
tonic = "0.12"
tonic-health = "0.12.3"
tower = "0.5"
tower = { version = "0.4.13", features = ["discover"] }
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some users of this were on an older version and depend on it, keep that version for now.

@HammadB HammadB changed the title pro [ENH] Clean up client manager into manager/assigner - make log client use it May 28, 2025
let client_manager = ClientManager::new(
node_name_to_client.clone(),
client_assigner.clone(),
config.connections_per_node,
Copy link
Collaborator Author

@HammadB HammadB May 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will move these other config into the ClientOptions

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will address in another PR

.collection_id
.to_string(),
)
.map_err(|e| ExecutorError::Internal(e.boxed()))?;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forced to box this error due to a cyclic dep since ExecutorError is in types and the crate that defines this error depends on types

let client_assigner = ClientAssigner::new(assignment_policy, 1);
let client_manager = ClientManager::new(
client_assigner.clone(),
1,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replication factor is always one for log

} else {
None

// Alt client manager setup
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be made a bit cleaner, will leave that for another pr since the other use already did it this way

memberlist_provider.subscribe(client_manager_handle.receiver());
let _memberlist_provider_handle = system.start_component(memberlist_provider);

// let alt_client = if let Some(alt_host) = my_config.alt_host.as_ref() {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will remove

.unwrap_or_default()
.drain(..)
.next()
// fallback to the standard client if no alt client is available,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a temporary choice, when we remove alt client we can remove this

}
}
tracing::info!("using standard client for {collection_id}");
&mut self.client
self.client.clone()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grpc client clones are cheap

},
)
.await
let mut combined_response = Vec::new();
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please ensure this is correct

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks correct to me. I read it twice over.

@HammadB HammadB marked this pull request as ready for review May 28, 2025 03:21
Copy link
Contributor

Major Refactor: Client Manager Abstraction, Memberlist Integration, & System Hydration Cleanup

This PR significantly refactors and centralizes client (gRPC) management logic for both the distributed executor and log components by introducing a generic, reusable ClientManager and ClientAssigner abstraction within the chroma-memberlist crate. The PR eliminates legacy patterns such as set_system, streamlines system hydration across components, and migrates all systems that relied on ad hoc or redundant client management (including log and executor) to use the new abstractions. It also adapts interfaces and configs to support the new generic lifecycle and dependency requirements, and updates configuration to facilitate easier swapping of assignment policies and memberlist providers. Support and wiring for System everywhere is modernized, and integration test and config changes reflect all these improvements.

Key Changes:
• Introduced a new, generic ClientManager<T> and ClientAssigner<T> system in chroma-memberlist for type-safe management/assignment of gRPC clients.
• Refactored the distributed executor, log client, and test logic to rely on the new managed client system instead of direct references or custom node-to-client maps.
• Made client management generic over client type, allowing both log and query services to use the same infra.
• Eliminated legacy set_system/Option<System> patterns in favor of constructor injection everywhere.
• Adapted configs, especially GrpcLogConfig, to hold assignment and memberlist provider settings (with sensible defaults).
• Upgraded dependency handling/tower versioning and workspace usage.
• Removed legacy client_manager module from frontend (now in memberlist crate).
• Reworked and expanded tests to work with the new manager/assigner.
• Updated Cargo.toml/Cargo.lock as appropriate for new dependencies and organization.

Affected Areas:
• chroma-memberlist (new client_manager)
• chroma-frontend (executor/distributed)
• chroma-log (grpc_log, config, lib)
• chroma-worker (compactor/compaction_manager, server, lib, config)
• types/api_types.rs (ExecutorError simplification)
• Cargo.toml, Cargo.lock (dependency updates)
• Various component and config wiring across log, worker, frontend

Potential Impact:

Functionality: Should improve reliability and flexibility in rolling out new node membership, assignment policies, and dynamic membership scenarios for both log and query services. Removes old system hydration footguns.

Performance: No expected regression; centralization could improve efficiency, but extra indirection may have minor effect. gRPC client clone noted as cheap.

Security: No direct security changes, but more robust system wiring and centralized error paths could reduce some edge error handling flaws.

Scalability: Enables more robust scaling with clean assignment and memberlist policy swapping. Generalizes client replica management and flexibility.

Review Focus:
• Correctness of the client manager/assigner generic abstractions: race/locking, type safety.
• All sites now using new manager; no direct NodeNameToClient usages left.
• Construction (hydration) of all major components (worker, frontend, log, compactor) -- especially System injection consistency.
• Backward compatibility: configuration defaults, assignment/memberlist config override points.
• Cargo.toml and lockfile workspace sanity, no duplicate tower/tonic/futures versions creeping in.
• Log clients and distributed executor: assignment and fallback logic (no client found, alt client selection).

Testing Needed

• Ensure that all tests pass and that membership changes propagate correctly to clients in log and executor paths.
• Test assignment policy swaps, especially with non-default configs.
• Run distributed queries and log operations (write and read paths) across node churn, multi-node clusters.
• Regression test for system startup/shutdown and graceful teardown (signal handling etc.).
• Test client fallback paths for missing alt log clients (log client assignment logic).

Code Quality Assessment

rust/memberlist/src/client_manager.rs: High complexity, but clearly structured; heavy use of generics and trait bounds is correctly applied.

rust/log/src/grpc_log.rs: Now cleaner with manager/assigner usage; alt client handing more robust but could use future cleanup.

rust/frontend/src/executor/distributed.rs: Greatly simplified, clearer separation of assignment vs. concrete GRPC client handling.

rust/worker/src/compactor/compaction_manager.rs: Simplified system wiring, less boilerplate and set_system/set_dispatcher hacks.

Cargo.toml/Cargo.lock: Workspace improvements and dependency pinning all look correct for new crate structure.

Best Practices

Dependency Injection:
• Moves all hydration to constructor style, eliminates Options in stateful core types

Error Handling:
• Boxing errors where cyclic deps exist; error paths are clear and most new APIs are Result-returning

Testing:
• Extensive test updates, including multi-member handling, churn, and unit tests for new manager/assigner

Configuration:
• Config structs now use sensible defaults and allow flexible overrides

Abstraction:
• Extracts reusable generic patterns (ClientManager, ClientAssigner)
• Reduces code duplication and centralizes relevant ownership

Potential Issues

• If any user relies on direct low-level node_name-to-client manipulation, those code paths will now be broken.
• Some logging and assignment-fallback code is intentionally temporary (marked as such) and will need cleanup in a future pass.
• Any custom deployment scenario with non-default memberlist/assignment policies will need to validate the config path now passes through correctly.
• Backward compatibility with older versions of tower in downstream consumers could require followup (see review comment).
• Some APIs now require (Config, System) tuples vs. just Config -- upgrading is required across the board.

This summary was automatically generated by @propel-code-bot

fn new_from_channel(channel: GrpcTraceService<Channel>) -> Self {
QueryExecutorClient::new(channel)
}
fn max_decoding_message_size(self, max_size: usize) -> Self {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised this works with a trait, but I guess it works only on concrete types.

},
)
.await
let mut combined_response = Vec::new();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks correct to me. I read it twice over.

@HammadB HammadB enabled auto-merge (squash) May 29, 2025 00:09
@HammadB HammadB merged commit 258f7d6 into main May 29, 2025
72 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants