[ENH] Implement literal expression for regex #4421

Merged

Conversation

Sicheng-Pan
Contributor

@Sicheng-Pan Sicheng-Pan commented May 1, 2025

Description of changes

Summarize the changes made by this PR.

  • Improvements & Bug fixes
    • N/A
  • New functionality
    • Implement custom internal representations for regular expressions. They will be used with the full-text search index to filter documents.

Test plan

How are these changes tested?

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?


github-actions bot commented May 1, 2025

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality (Readability, Modularity, Intuitiveness)?

Contributor Author

Sicheng-Pan commented May 1, 2025

@Sicheng-Pan Sicheng-Pan force-pushed the sicheng/05-01-_enh_implement_literal_expression_for_regex branch from a9a95c8 to af29420 Compare May 2, 2025 00:09
@Sicheng-Pan Sicheng-Pan force-pushed the sicheng/05-01-_enh_implement_literal_expression_for_regex branch from 8872466 to 3675914 Compare May 5, 2025 18:32
@Sicheng-Pan Sicheng-Pan force-pushed the sicheng/05-01-_enh_implement_literal_expression_for_regex branch from 9593947 to df27c54 Compare May 6, 2025 18:25
@Sicheng-Pan Sicheng-Pan mentioned this pull request May 6, 2025
1 task
@Sicheng-Pan Sicheng-Pan marked this pull request as ready for review May 7, 2025 00:20
@Sicheng-Pan Sicheng-Pan mentioned this pull request May 8, 2025
1 task
Contributor

Implementation of Literal Expression for Regular Expressions

This PR adds custom internal representations for regular expressions in the Rust codebase to be used with full text search for document filtering. The implementation includes a ChromaHir enum that creates a simplified representation of regex patterns, along with validation logic to ensure patterns are specific enough for efficient filtering.

Key Changes:
• Added ChromaHir enum to represent regex patterns in a custom, simplified format
• Implemented LiteralExpr to extract literal expressions from regex patterns
• Added validation to ensure regex patterns contain at least one 3-character literal string
• Integrated regex validation in the document filtering system
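
The literal-length check described above can be sketched as follows. This is a minimal illustration, not the PR's code: the PR text does not show the definition of ChromaHir or LiteralExpr, so the `SimpleHir` enum and the `requires_literal` function below are hypothetical names invented for this example, and the real validation may differ.

```rust
/// Hypothetical, simplified pattern tree for illustration only.
/// This is NOT the definition of `ChromaHir` from the PR.
#[derive(Debug, Clone, PartialEq)]
pub enum SimpleHir {
    /// Matches the empty string.
    Empty,
    /// Matches exactly this string.
    Literal(String),
    /// Matches any single character (like `.`).
    AnyChar,
    /// Matches each part in sequence.
    Concat(Vec<SimpleHir>),
    /// Matches any one of the branches.
    Alternation(Vec<SimpleHir>),
    /// Matches `sub` at least `min` times.
    Repetition { min: u32, sub: Box<SimpleHir> },
}

/// Returns true when every match of `hir` is guaranteed to contain a
/// contiguous literal run of at least `n` characters.
/// The check is conservative: it may return false for some patterns
/// that do in fact always contain such a literal.
pub fn requires_literal(hir: &SimpleHir, n: usize) -> bool {
    match hir {
        SimpleHir::Empty | SimpleHir::AnyChar => false,
        SimpleHir::Literal(s) => s.chars().count() >= n,
        SimpleHir::Concat(parts) => {
            // Adjacent literals form one longer literal, so track the
            // length of the current run and reset it at any non-literal.
            let mut run = 0usize;
            for part in parts {
                match part {
                    SimpleHir::Literal(s) => {
                        run += s.chars().count();
                        if run >= n {
                            return true;
                        }
                    }
                    other => {
                        run = 0;
                        if requires_literal(other, n) {
                            return true;
                        }
                    }
                }
            }
            false
        }
        // Every branch must guarantee a literal, since any one may match.
        SimpleHir::Alternation(branches) => {
            !branches.is_empty() && branches.iter().all(|b| requires_literal(b, n))
        }
        // A repetition that may occur zero times guarantees nothing.
        SimpleHir::Repetition { min, sub } => *min > 0 && requires_literal(sub, n),
    }
}
```

Under these assumptions, a tree for `foo.*` passes the check with `n = 3` because the literal `foo` is always present, while trees for `ab.c` or `foo|ba` are rejected because no 3-character literal is guaranteed in every match.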

Affected Areas:
• rust/types/src/regex/ (new module with 3 new files)
• rust/types/src/where_parsing.rs
• rust/types/src/lib.rs

This summary was automatically generated by @propel-code-bot

Contributor Author

Sicheng-Pan commented May 9, 2025

Merge activity

  • May 9, 12:59 PM EDT: A user started a stack merge that includes this pull request via Graphite.
  • May 9, 12:59 PM EDT: @Sicheng-Pan merged this pull request with Graphite.

@Sicheng-Pan Sicheng-Pan merged commit 8ef2d2c into main May 9, 2025
71 checks passed
itaismith pushed a commit that referenced this pull request May 23, 2025
@Sicheng-Pan Sicheng-Pan deleted the sicheng/05-01-_enh_implement_literal_expression_for_regex branch May 28, 2025 23:26