Title: Feature Request: Expose Clustering API in Rust Bindings
Is your feature request related to a problem? Please describe.
Currently, the Rust bindings for usearch provide excellent access to core functionality such as index creation, adding items, and nearest neighbor search. However, the clustering capabilities available in the Python and C++ interfaces (e.g., index.cluster() in Python, or the cluster() function with index_dense_clustering_config_t in C++) do not appear to be exposed in the Rust bindings.
Our use case involves grouping a large number of items by embedding similarity. We currently build an index and then run iterative searches from seed items to form clusters. Direct access to usearch's internal, optimized clustering algorithms would likely simplify our implementation and let the clustering step itself benefit from usearch's performance.
Describe the solution you'd like
We would like the usearch developers to consider exposing a clustering API in the Rust bindings. This could look similar to:
- A method on the Index struct, perhaps index.cluster(&ClusterConfig) or similar.
- The ClusterConfig struct would allow specifying parameters analogous to those available in C++/Python, such as:
  - Minimum/maximum number of clusters (or min/max members per cluster).
  - Number of threads for clustering.
  - Configuration for how centroids are determined or how merging happens (e.g., merge_smallest_k from C++).
- The API should return a data structure that allows easy access to:
  - Cluster assignments for each item (or keys of items belonging to each cluster).
  - Centroid keys/vectors for each cluster.
  - Cluster sizes.
- Optionally, support for sub-clustering or accessing the cluster graph (similar to clustering.network in Python).
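To make the request more concrete, here is a rough sketch of what the configuration and result types might look like. Everything here is hypothetical: ClusterConfig, Clustering, and all field and method names are our own illustrations and do not exist in the usearch crate. The Key type is assumed to be u64. The sketch only models the data shapes, not the clustering itself.

```rust
use std::collections::HashMap;

/// Hypothetical key type (assumed to be a 64-bit integer).
pub type Key = u64;

/// Hypothetical configuration; 0 means "let the library decide".
#[derive(Debug, Clone, PartialEq, Default)]
pub struct ClusterConfig {
    pub min_clusters: usize,
    pub max_clusters: usize,
    pub threads: usize,
}

/// Hypothetical result type: maps each member key to its centroid key.
#[derive(Debug, Clone, Default)]
pub struct Clustering {
    pub assignments: HashMap<Key, Key>,
}

impl Clustering {
    /// Builds a result from (member, centroid) pairs.
    pub fn from_assignments(pairs: impl IntoIterator<Item = (Key, Key)>) -> Self {
        Clustering { assignments: pairs.into_iter().collect() }
    }

    /// Distinct centroid keys, sorted ascending.
    pub fn centroids(&self) -> Vec<Key> {
        let mut keys: Vec<Key> = self.assignments.values().copied().collect();
        keys.sort_unstable();
        keys.dedup();
        keys
    }

    /// Keys of all members assigned to `centroid`, sorted ascending.
    pub fn members_of(&self, centroid: Key) -> Vec<Key> {
        let mut members: Vec<Key> = self
            .assignments
            .iter()
            .filter(|(_, c)| **c == centroid)
            .map(|(m, _)| *m)
            .collect();
        members.sort_unstable();
        members
    }

    /// Number of members per centroid.
    pub fn sizes(&self) -> HashMap<Key, usize> {
        let mut sizes = HashMap::new();
        for centroid in self.assignments.values() {
            *sizes.entry(*centroid).or_insert(0) += 1;
        }
        sizes
    }
}
```

With types along these lines, a call such as index.cluster(&ClusterConfig::default()) could return a Clustering that we query for centroids, members, and sizes. The real shape is of course up to the maintainers.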
Describe alternatives you've considered
Manual Clustering via Iterative Search (Current Approach): We build one index per pre-defined metadata group and then perform iterative search() calls from seed items, applying our own similarity thresholds and logic to form fine-grained groups. This works, but a built-in usearch clustering method would likely be better optimized and would reduce boilerplate code on our end.
Using FFI to call C++ clustering functions: This would add significant complexity to our build process and code, and we'd prefer to use idiomatic Rust if possible.
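For reference, the manual approach from the first alternative can be sketched roughly as follows. This is a simplified illustration, not our production code: the function names are made up for this example, and a brute-force cosine comparison stands in for the index search() call so that the snippet is self-contained.

```rust
/// Cosine similarity between two equal-length vectors.
/// Returns 0.0 if either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Greedy seed-based grouping: each still-unassigned item becomes a seed
/// and claims every later unassigned item whose similarity to the seed
/// is at least `threshold`. Returns one cluster label per input item.
fn greedy_clusters(vectors: &[Vec<f32>], threshold: f32) -> Vec<usize> {
    let mut labels: Vec<Option<usize>> = vec![None; vectors.len()];
    let mut next_label = 0;
    for seed in 0..vectors.len() {
        if labels[seed].is_some() {
            continue;
        }
        labels[seed] = Some(next_label);
        // In the real pipeline this scan is replaced by an ANN search
        // from the seed; here it is a brute-force comparison.
        for other in (seed + 1)..vectors.len() {
            if labels[other].is_none()
                && cosine(&vectors[seed], &vectors[other]) >= threshold
            {
                labels[other] = Some(next_label);
            }
        }
        next_label += 1;
    }
    labels
        .into_iter()
        .map(|label| label.expect("every item is labelled by the loop above"))
        .collect()
}
```

The result depends on the order in which seeds are visited and on the chosen threshold, which is part of why a built-in clustering routine would be preferable.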
Additional context
We are using usearch in a Rust-based server application for log analysis, where we group semantically similar log entries based on their embeddings. The performance and efficiency of usearch for ANN search are excellent, and extending this to its clustering capabilities in Rust would be a great addition.
The documentation shows that Python (index.cluster()) and C++ (index_dense_clustering_config_t and the cluster function) already offer clustering features. Bringing the Rust bindings to parity would be very valuable for Rust developers using usearch.
Thank you for considering this feature request!
Can you contribute to the implementation?
I can contribute
Is your feature request specific to a certain interface?
Other bindings
Contact Details
[email protected]