[ENH] Disallow empty string ids during add #4488

Merged
merged 1 commit into from
May 8, 2025
61 changes: 61 additions & 0 deletions rust/frontend/src/impls/utils.rs
@@ -8,6 +8,8 @@ use chroma_types::{
pub(crate) enum ToRecordsError {
    #[error("Inconsistent number of IDs, embeddings, documents, URIs and metadatas")]
    InconsistentLength,
    #[error("Empty ID, ID must have at least one character")]
    EmptyId,
}

impl ChromaError for ToRecordsError {
@@ -47,6 +49,9 @@ pub(crate) fn to_records<
    let mut records = Vec::with_capacity(len);

    for id in ids {
        if id.is_empty() {
Collaborator

This seems inconsistent with the rest of our code, where we use #[validate].

            return Err(ToRecordsError::EmptyId);
        }
        let embedding = embeddings_iter.next().flatten();
        let document = documents_iter.next().flatten();
        let uri = uris_iter.next().flatten();
@@ -87,3 +92,59 @@ pub(crate) fn to_records<

    Ok((records, total_bytes))
}

#[cfg(test)]
mod tests {
    use chroma_types::Operation;

    use super::*;

    #[test]
    fn test_to_records_empty_id() {
        let ids = vec![String::from("")];
        let embeddings = vec![Some(vec![1.0, 2.0, 3.0])];
        let result = to_records::<
            chroma_types::UpdateMetadataValue,
            Vec<(String, chroma_types::UpdateMetadataValue)>,
        >(ids, Some(embeddings), None, None, None, Operation::Add);
        assert!(matches!(result, Err(ToRecordsError::EmptyId)));
    }
Contributor


[TestCoverage]

You should add a test case for IDs containing valid non-empty values (including whitespace or unusual Unicode), to ensure is_empty() is the correct check and not overly restrictive. This will prevent regressions if the ID requirements change.
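To make the concern concrete, here is a minimal standalone sketch of what `is_empty()` does and does not reject. It uses only the standard library; `is_rejected` is a hypothetical stand-in for the check added in this PR, not a function from the codebase.

```rust
/// Hypothetical stand-in for the PR's check: rejects only the zero-length string.
fn is_rejected(id: &str) -> bool {
    id.is_empty()
}

fn main() {
    // Zero-length: the only case `is_empty()` catches.
    assert!(is_rejected(""));

    // Whitespace-only IDs have non-zero length, so they pass the check.
    assert!(!is_rejected(" "));
    assert!(!is_rejected("\t\n"));

    // Unusual Unicode also passes: a zero-width space is three bytes in UTF-8.
    assert!(!is_rejected("\u{200B}"));
    assert_eq!("\u{200B}".len(), 3);

    // A stricter check (an assumption, not what the PR does) could trim first.
    assert!(" ".trim().is_empty());
}
```

Under this check, whitespace-only and invisible-character IDs are accepted. A test along these lines would pin down whether that is the intended behavior.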


    #[test]
    fn test_normal_ids() {
        let ids = vec![String::from("1"), String::from("2"), String::from("3")];
        let embeddings = vec![
            Some(vec![1.0, 2.0, 3.0]),
            Some(vec![4.0, 5.0, 6.0]),
            Some(vec![7.0, 8.0, 9.0]),
        ];
        let documents = vec![
            Some(String::from("document 1")),
            Some(String::from("document 2")),
            Some(String::from("document 3")),
        ];
        let result = to_records::<
            chroma_types::UpdateMetadataValue,
            Vec<(String, chroma_types::UpdateMetadataValue)>,
        >(
            ids,
            Some(embeddings),
            Some(documents),
            None,
            None,
            Operation::Add,
        );
        assert!(result.is_ok());
        let records = result.unwrap().0;
        assert_eq!(records.len(), 3);
        assert_eq!(records[0].id, "1");
        assert_eq!(records[1].id, "2");
        assert_eq!(records[2].id, "3");
        assert_eq!(records[0].embedding, Some(vec![1.0, 2.0, 3.0]));
        assert_eq!(records[1].embedding, Some(vec![4.0, 5.0, 6.0]));
        assert_eq!(records[2].embedding, Some(vec![7.0, 8.0, 9.0]));
        assert_eq!(records[0].document, Some(String::from("document 1")));
        assert_eq!(records[1].document, Some(String::from("document 2")));
        assert_eq!(records[2].document, Some(String::from("document 3")));
    }
}