Feature: Expose clustering API in Rust #611

Open
2 of 3 tasks
henryjandrews opened this issue May 28, 2025 · 0 comments
Labels
enhancement New feature or request

Comments

@henryjandrews

Describe what you are looking for

Title: Feature Request: Expose Clustering API in Rust Bindings

Is your feature request related to a problem? Please describe.
Currently, the Rust bindings for usearch provide excellent access to core functionalities like index creation, adding items, and performing nearest neighbor searches. However, a direct, high-level API for the clustering capabilities that are present in the Python and C++ interfaces (e.g., index.cluster() in Python or the cluster() function with index_dense_clustering_config_t in C++) doesn't seem to be explicitly exposed in the Rust bindings.

Our use case involves grouping a large number of items based on their embedding similarity. While we can (and currently do) build an index and then perform iterative searches from seed items to form clusters, having direct access to usearch's internal, optimized clustering algorithms would be highly beneficial. This would potentially simplify our implementation and leverage usearch's performance for the clustering step itself.

Describe the solution you'd like
We would like the usearch developers to consider exposing a clustering API in the Rust bindings. It could look something like this:

  • A method on the Index struct, perhaps index.cluster(&ClusterConfig) or similar.
  • The ClusterConfig struct would allow specifying parameters analogous to those available in C++/Python, such as:
    • Minimum/maximum number of clusters (or min/max members per cluster).
    • Number of threads for clustering.
    • Configuration for how centroids are determined or how merging happens (e.g., merge_smallest_k from C++).
  • The API should return a data structure that allows easy access to:
    • Cluster assignments for each item (or keys of items belonging to each cluster).
    • Centroid keys/vectors for each cluster.
    • Cluster sizes.
    • Optionally, support for sub-clustering or accessing the cluster graph (similar to clustering.network in Python).
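To make the requested shape concrete, here is a minimal sketch of what the types might look like. Everything in it is hypothetical: `ClusterConfig`, `Clustering`, their fields, and the convention that `0` means "let the library decide" are illustrative names and assumptions, not part of the current usearch Rust crate. The result type is modeled as a simple mapping from item key to centroid key, from which members and sizes are derived.

```rust
use std::collections::BTreeMap;

/// Hypothetical configuration, loosely analogous to the C++/Python options.
/// Field names are assumptions; 0 is assumed to mean "let the library decide".
#[derive(Debug, Clone, PartialEq, Default)]
pub struct ClusterConfig {
    pub min_clusters: usize,
    pub max_clusters: usize,
    pub threads: usize,
}

/// Hypothetical result type: each item key maps to the key of its centroid.
#[derive(Debug, Default)]
pub struct Clustering {
    assignments: BTreeMap<u64, u64>,
}

impl Clustering {
    /// Build a result from (item key, centroid key) pairs.
    pub fn from_assignments(pairs: Vec<(u64, u64)>) -> Self {
        Self { assignments: pairs.into_iter().collect() }
    }

    /// Centroid key for a given item, if the item was clustered.
    pub fn centroid_of(&self, key: u64) -> Option<u64> {
        self.assignments.get(&key).copied()
    }

    /// Item keys grouped by centroid key, in ascending key order.
    pub fn members(&self) -> BTreeMap<u64, Vec<u64>> {
        let mut groups: BTreeMap<u64, Vec<u64>> = BTreeMap::new();
        for (&item, &centroid) in &self.assignments {
            groups.entry(centroid).or_default().push(item);
        }
        groups
    }

    /// Number of items assigned to each centroid.
    pub fn sizes(&self) -> BTreeMap<u64, usize> {
        self.members().into_iter().map(|(c, m)| (c, m.len())).collect()
    }
}
```

With types along these lines, a call such as `index.cluster(&ClusterConfig { max_clusters: 100, ..Default::default() })` could return a `Clustering`. That method does not exist today; it is the request.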

Describe alternatives you've considered

  1. Manual Clustering via Iterative Search (Current Approach): We build an index per pre-defined metadata group and then perform iterative search() calls from seed items. We then apply our own similarity thresholds and logic to form fine-grained groups. This works, but a built-in usearch clustering method could be more optimized and reduce boilerplate code on our end.
  2. Using FFI to call C++ clustering functions: This would add significant complexity to our build process and code, and we'd prefer to use idiomatic Rust if possible.
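For reference, the manual approach in alternative 1 is roughly the following. This is a simplified, self-contained sketch rather than our production code: the brute-force `search` function is a stand-in for the ANN `search()` call on a usearch index, and the function names, `k`, and `threshold` are illustrative assumptions.

```rust
/// Cosine similarity; returns 0.0 if either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Stand-in for an ANN query (assumption: in practice this would be a
/// search on the usearch index). Brute-force top-k by cosine similarity.
fn search(vectors: &[Vec<f32>], query: usize, k: usize) -> Vec<(usize, f32)> {
    let mut hits: Vec<(usize, f32)> = vectors
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine(&vectors[query], v)))
        .collect();
    hits.sort_by(|a, b| b.1.total_cmp(&a.1)); // most similar first
    hits.truncate(k);
    hits
}

/// Greedy seed-based grouping: every unassigned item becomes a seed, and its
/// unassigned neighbors at or above `threshold` join the seed's cluster.
/// Returns one cluster label per item.
fn greedy_cluster(vectors: &[Vec<f32>], k: usize, threshold: f32) -> Vec<usize> {
    const UNASSIGNED: usize = usize::MAX;
    let mut labels = vec![UNASSIGNED; vectors.len()];
    let mut next_label = 0;
    for seed in 0..vectors.len() {
        if labels[seed] != UNASSIGNED {
            continue;
        }
        labels[seed] = next_label;
        for (i, similarity) in search(vectors, seed, k) {
            if similarity >= threshold && labels[i] == UNASSIGNED {
                labels[i] = next_label;
            }
        }
        next_label += 1;
    }
    labels
}
```

The result depends on seed order and on the chosen `k` and `threshold`, which is part of why a built-in clustering method would be preferable.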

Additional context
We are using usearch in a Rust-based server application for log analysis, where we group semantically similar log entries based on their embeddings. The performance and efficiency of usearch for ANN search are excellent, and extending this to its clustering capabilities in Rust would be a great addition.

We've seen from the documentation that Python (index.cluster()) and C++ (index_dense_clustering_config_t and the cluster function) have robust clustering features. Bringing this parity to the Rust ecosystem would be very valuable for Rust developers using usearch.

Thank you for considering this feature request!

Can you contribute to the implementation?

  • I can contribute

Is your feature request specific to a certain interface?

Other bindings

Contact Details

[email protected]

Is there an existing issue for this?

  • I have searched the existing issues

Code of Conduct

  • I agree to follow this project's Code of Conduct
@henryjandrews henryjandrews added the enhancement New feature or request label May 28, 2025
Projects
None yet
Development

No branches or pull requests

1 participant