
DOCS: partners/chroma: Fix documentation around chroma query filter syntax #31058


Merged

hesreallyhim
Contributor

Thank you for contributing to LangChain!

  • PR title: "package: description"
    • Where "package" is whichever of langchain, community, core, etc. is being modified. Use "docs: ..." for purely docs changes, "infra: ..." for CI changes.
    • Example: "community: add foobar LLM"

Description:

  • I'm starting to put together some PRs to fix the typing of the langchain-chroma filter and where_document query parameters, as mentioned in:

#30879
#30507

The typing of dict[str, str] is both too restrictive (it marks valid filter expressions as ill-typed) and too permissive (it allows illegal filter expressions). That's not what this PR addresses, though.

This PR just removes from the documentation some filter examples that are illegal or syntactically incorrect: (a) dictionaries whose operator keys, such as $contains, are missing their quotation marks; and (b) dictionaries with multiple entries, such as {"foo": "bar", "qux": "baz"}, which are illegal in Chroma filter syntax and raise an exception. A Chroma filter dictionary must have exactly one key.

Again, this is just the documentation issue, which is the lowest-hanging fruit. I also think we need to update the types for filter and where_document to be, at the very least, dict[str, Any], or, since we have access to Chroma's types, the Where and WhereDocument types. That change has a wider blast radius, though, so I'm starting small.
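To illustrate the one-key rule, the sketch below contrasts an illegal multi-key filter with an equivalent filter that combines the conditions explicitly with $and, and shows a where_document filter whose operator key is a quoted string. The helper to_chroma_where is hypothetical (it is not part of langchain-chroma or chromadb) and assumes plain equality conditions, which Chroma accepts as shorthand for $eq.

```python
from typing import Any


def to_chroma_where(conditions: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: turn a flat {field: value} mapping into a
    Chroma-style filter in which every dict has exactly one key."""
    if not conditions:
        raise ValueError("at least one condition is required")
    # One single-key dict per condition, e.g. {"foo": "bar"}.
    clauses = [{field: value} for field, value in conditions.items()]
    if len(clauses) == 1:
        # A single condition is already a valid one-key filter.
        return clauses[0]
    # Several conditions must be combined explicitly with "$and".
    return {"$and": clauses}


# Illegal in Chroma: two top-level keys in one filter dict.
bad_filter = {"foo": "bar", "qux": "baz"}

# Legal: the same constraint expressed with "$and".
good_filter = to_chroma_where(bad_filter)
print(good_filter)  # {'$and': [{'foo': 'bar'}, {'qux': 'baz'}]}

# where_document operators are ordinary string keys, so they must be quoted;
# writing {$contains: "search_string"} is a Python syntax error.
where_document = {"$contains": "search_string"}
```

A dict like good_filter would then be passed as the filter argument of a langchain-chroma search method such as similarity_search; that call is omitted here because it needs a configured vector store.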

This PR does not fix the issues mentioned above; it just gets the ball rolling and cleans up the documentation.

Additional guidelines:

  • Make sure optional dependencies are imported within a function.
  • Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests.
  • Most PRs should not touch more than one package.
  • Changes should be backwards compatible.
  • If you are adding something to community, do not re-import it in langchain.

If no one reviews your PR within a few days, please @-mention one of baskaryan, eyurtsev, ccurme, vbarda, hwchase17.

vercel bot commented Apr 28, 2025

The latest updates on your projects. Learn more about Vercel for Git.

1 Skipped Deployment
Name: langchain | Status: ⬜️ Ignored | Updated (UTC): Apr 28, 2025 5:28pm

@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. Ɑ: vector store Related to vector store module 🤖:docs Changes to documentation and examples, like .md, .rst, .ipynb files. Changes to the docs/ folder labels Apr 28, 2025
@dosubot dosubot bot added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Apr 30, 2025
@ccurme ccurme merged commit 918c950 into langchain-ai:master Apr 30, 2025
20 checks passed
@hesreallyhim
Contributor Author

Thanks @ccurme. I was going to follow up on this by changing the type of the Chroma filter from dict[str, str] to the actual types available through chromadb (Where and WhereDocument), because dict[str, str] is really not accurate. What I found, though, is that my IDE still couldn't validate whether a filter expression was correct, because Where is a nested/recursive type. One alternative is to change the type to dict[str, Any], since it seems better to be overly permissive than overly restrictive in this case. At the end of the day, though, I don't think it's possible to give these parameters the right type without some other validation strategy.
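One such strategy is runtime validation. The sketch below is a hypothetical validate_where function (not Chroma's own validator) that recursively checks a simplified subset of the metadata filter grammar: every dict has exactly one key, $and/$or take a list of sub-filters, and a field maps to either a scalar or a one-key operator dict. The operator sets, the numeric-only rule for $gt/$gte/$lt/$lte, and the requirement that $and/$or receive at least two sub-filters are assumptions about Chroma's rules; where_document operators are not covered. With a check like this, the parameter annotation could stay a permissive dict[str, Any].

```python
from typing import Any

# Hypothetical, simplified subset of Chroma's metadata filter grammar.
LOGICAL_OPS = {"$and", "$or"}
EQUALITY_OPS = {"$eq", "$ne"}
ORDER_OPS = {"$gt", "$gte", "$lt", "$lte"}
LIST_OPS = {"$in", "$nin"}
SCALARS = (str, int, float, bool)
NUMBERS = (int, float)


def validate_where(where: Any) -> None:
    """Raise ValueError if `where` is not a well-formed filter (subset only)."""
    # Rule from the PR description: every filter dict has exactly one key.
    if not isinstance(where, dict) or len(where) != 1:
        raise ValueError(f"filter must be a dict with exactly one key: {where!r}")
    key, value = next(iter(where.items()))

    if key in LOGICAL_OPS:
        # Assumed: $and / $or take a list of at least two sub-filters.
        if not isinstance(value, list) or len(value) < 2:
            raise ValueError(f"{key} expects a list of at least two filters")
        for sub_filter in value:
            validate_where(sub_filter)  # the recursive part of the type
        return

    if key.startswith("$"):
        raise ValueError(f"unexpected operator in field position: {key}")

    if isinstance(value, SCALARS):
        return  # {"field": value} is shorthand for {"field": {"$eq": value}}

    if not isinstance(value, dict) or len(value) != 1:
        raise ValueError(f"condition on {key!r} must be a scalar or a one-key dict")
    op, operand = next(iter(value.items()))
    if op in EQUALITY_OPS and isinstance(operand, SCALARS):
        return
    if op in ORDER_OPS and isinstance(operand, NUMBERS):
        return
    if op in LIST_OPS and isinstance(operand, list) and all(
        isinstance(item, SCALARS) for item in operand
    ):
        return
    raise ValueError(f"invalid operator or operand for {key!r}: {value!r}")


# Usage: a nested filter passes, the multi-key filter from the docs fails.
validate_where({"$and": [{"foo": "bar"}, {"year": {"$gte": 2020}}]})
try:
    validate_where({"foo": "bar", "qux": "baz"})
except ValueError as err:
    print(err)
```

The recursion on $and/$or is the part the IDE could not check statically against Where, which is why this sketch moves the check to call time.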


2 participants