## Description of changes
_Summarize the changes made by this PR._
- Improvements & Bug fixes
  - Return prefix as part of `get_range` method
- New functionality
  - Wire up regex evaluation process in filter operator
## Test plan
_How are these changes tested?_
- [ ] Tests pass locally with `pytest` for python, `yarn test` for js, `cargo test` for rust
## Documentation Changes
_Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the [docs section](https://github.com/chroma-core/chroma/tree/main/docs/docs.trychroma.com)?_
<!-- Summary by @propel-code-bot -->
---
**Implementing Regex Matching in Filter Operator**
This PR implements regex evaluation within the filter operator, enabling document-level regex matching. It changes the blockfile reader APIs to return prefixes as part of query results, and it uses full-text indexes to evaluate regexes more efficiently. The implementation includes optimizations for exact pattern matching that avoid re-scanning documents when possible.
**Key Changes:**
• Add regex pattern evaluation in filter operator using `ChromaRegex` from `chroma_types`
• Update blockfile reader APIs to return prefixes along with keys and values
• Implement `NgramLiteralProvider` for `FullTextIndexReader` to optimize regex searches
• Add benchmark for regex matching with various patterns
• Return prefix as part of `get_range` and `get_range_stream` methods in blockstore
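To illustrate the prefix change listed above, here is a minimal sketch of a range scan that returns the prefix with each hit. `ToyBlockfile` and its method signatures are hypothetical stand-ins built on a `BTreeMap`; they are not the blockstore's actual types or API. The idea shown is only that a scan spanning several prefixes (for example, several n-grams) needs the prefix in each result so the caller can tell the hits apart.

```rust
use std::collections::BTreeMap;
use std::ops::RangeInclusive;

/// Hypothetical in-memory stand-in for a blockfile.
/// Entries are ordered by (prefix, key), as in a sorted key-value store.
pub struct ToyBlockfile {
    entries: BTreeMap<(String, u32), String>,
}

impl ToyBlockfile {
    pub fn new() -> Self {
        Self { entries: BTreeMap::new() }
    }

    pub fn insert(&mut self, prefix: &str, key: u32, value: &str) {
        self.entries.insert((prefix.to_string(), key), value.to_string());
    }

    /// Scan a range of prefixes and a range of keys.
    /// Each hit is returned as (prefix, key, value), so a caller scanning
    /// several prefixes at once knows which prefix produced which entry.
    pub fn get_range(
        &self,
        prefixes: RangeInclusive<&str>,
        keys: RangeInclusive<u32>,
    ) -> Vec<(&str, u32, &str)> {
        self.entries
            .iter()
            .filter(|((p, k), _)| prefixes.contains(&p.as_str()) && keys.contains(k))
            .map(|((p, k), v)| (p.as_str(), *k, v.as_str()))
            .collect()
    }
}
```

Without the prefix in the tuple, the two hits from `"abc"..="abd"` in a scan like `bf.get_range("abc"..="abd", 0..=10)` would be indistinguishable by origin.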
**Affected Areas:**
• Worker filter operator implementation
• Blockstore reader/writer API
• `FullText` index implementation
• Distributed segment implementation
• Metadata segment implementation
**Potential Impact:**
**Functionality**: Adds regex matching capability for document filtering, providing users with more powerful query options
**Performance**: Optimizes regex matching by using full-text indexes for preliminary candidate selection before regex validation, improving search performance for complex patterns
**Security**: No significant security implications
**Scalability**: Efficient regex implementation should maintain good performance as document collections grow
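The two-phase approach described under Performance (index-based candidate selection, then validation) can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `TrigramIndex` and `filter_docs` are hypothetical names, the index is a plain in-memory map, and the validation step is a caller-supplied closure so that any regex engine could be plugged in (the sketch itself uses only the standard library).

```rust
use std::collections::{BTreeSet, HashMap};

/// Toy trigram index (hypothetical): maps each 3-character window
/// to the ids of the documents that contain it.
pub struct TrigramIndex {
    postings: HashMap<String, BTreeSet<u32>>,
}

impl TrigramIndex {
    pub fn build(docs: &[(u32, &str)]) -> Self {
        let mut postings: HashMap<String, BTreeSet<u32>> = HashMap::new();
        for (id, text) in docs {
            let chars: Vec<char> = text.chars().collect();
            for w in chars.windows(3) {
                let gram: String = w.iter().collect();
                postings.entry(gram).or_default().insert(*id);
            }
        }
        Self { postings }
    }

    /// Ids of documents containing every trigram of `literal`.
    /// This is a superset of the documents that contain the literal itself.
    /// Returns None when the literal is shorter than three characters,
    /// in which case the index cannot prune anything.
    pub fn candidates(&self, literal: &str) -> Option<BTreeSet<u32>> {
        let chars: Vec<char> = literal.chars().collect();
        let mut result: Option<BTreeSet<u32>> = None;
        for w in chars.windows(3) {
            let gram: String = w.iter().collect();
            let ids = self.postings.get(&gram).cloned().unwrap_or_default();
            result = Some(match result {
                None => ids,
                Some(acc) => acc.intersection(&ids).copied().collect(),
            });
        }
        result
    }
}

/// Phase 1: prune with the index using a literal required by the pattern.
/// Phase 2: run the (expensive) `verify` check only on surviving documents.
pub fn filter_docs<F>(
    docs: &[(u32, &str)],
    index: &TrigramIndex,
    literal: &str,
    verify: F,
) -> Vec<u32>
where
    F: Fn(&str) -> bool,
{
    let candidates = index.candidates(literal);
    docs.iter()
        .filter(|(id, _)| candidates.as_ref().map_or(true, |c| c.contains(id)))
        .filter(|(_, text)| verify(*text))
        .map(|(id, _)| *id)
        .collect()
}
```

Because the candidate set is only a superset of the true matches, the validation pass is still required for correctness; the index just reduces how many documents it has to touch.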
**Review Focus:**
• Regex implementation efficiency in filter.rs
• Error handling for regex pattern compilation and matching
• API changes to return prefix in blockstore methods
• Benchmark methodology for regex performance evaluation
<details>
<summary><strong>Testing Needed</strong></summary>
• Benchmark different regex patterns against large document collections
• Test complex regex patterns with edge cases
• Verify consistent results between optimized and brute-force implementations
• Test with documents of varying sizes and content
</details>
<details>
<summary><strong>Code Quality Assessment</strong></summary>
**rust/worker/src/execution/operators/filter.rs**: Well-structured implementation with good error handling, but has a potential performance bottleneck with sequential document fetching
**rust/blockstore/src/memory/reader_writer.rs**: Multiple unwrap() calls could be replaced with expect() for better error messages
**rust/index/src/fulltext/types.rs**: Possible lifetime parameter issues in the lookup_ngram_range method
</details>
<details>
<summary><strong>Best Practices</strong></summary>
**Performance**:
• Consider parallelizing document fetches with `buffer_unordered` for concurrent I/O
**Error Handling**:
• Replace unwrap() with expect() for better error messages
• Consider proper error propagation instead of unwrapping
</details>
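As a generic illustration of the error-handling suggestions above, the sketch below contrasts `expect()` with propagating a typed error through `?`. The names (`LookupError`, `get_or_panic`, `get_checked`, `sum_values`) are hypothetical and unrelated to the code under review; only standard-library calls are used.

```rust
use std::collections::HashMap;
use std::fmt;

/// Hypothetical error type for a failed lookup.
#[derive(Debug, PartialEq)]
pub enum LookupError {
    MissingKey(String),
}

impl fmt::Display for LookupError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LookupError::MissingKey(k) => write!(f, "key not found: {k}"),
        }
    }
}

impl std::error::Error for LookupError {}

/// Still panics on a missing key, but the message says what was expected.
pub fn get_or_panic(map: &HashMap<String, u32>, key: &str) -> u32 {
    *map.get(key).expect("key should have been inserted before lookup")
}

/// Returns an error instead of panicking, leaving the decision to the caller.
pub fn get_checked(map: &HashMap<String, u32>, key: &str) -> Result<u32, LookupError> {
    map.get(key)
        .copied()
        .ok_or_else(|| LookupError::MissingKey(key.to_string()))
}

/// `?` forwards the first lookup failure to the caller unchanged.
pub fn sum_values(map: &HashMap<String, u32>, keys: &[&str]) -> Result<u32, LookupError> {
    let mut total = 0;
    for key in keys {
        total += get_checked(map, key)?;
    }
    Ok(total)
}
```

`expect()` mainly improves the panic message, while the `Result`-returning variant removes the panic path altogether, which is usually preferable on code paths that handle user-supplied input.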
<details>
<summary><strong>Possible Issues</strong></summary>
• Potential panics from unwrap() calls in several places
• Sequential document fetching in filter operator might cause performance issues with large result sets
• Lifetime parameter conflicts in lookup_ngram_range method
• Fallback bitmap handling in assertions could cause test failures
</details>
---
*This summary was automatically generated by @propel-code-bot*