-
Notifications
You must be signed in to change notification settings - Fork 1.1k
feat: Adding Docling RAG demo #5109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
94c60d6
to
043fb6d
Compare
5ce78e6
to
b249b87
Compare
Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
…t unique chunk-id Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
…ieval Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
09c73a7
to
267b740
Compare
Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
shuchu
approved these changes
Apr 2, 2025
Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
franciscojavierarceo
pushed a commit
that referenced
this pull request
Apr 7, 2025
# [0.48.0](v0.47.0...v0.48.0) (2025-04-07) ### Bug Fixes * Enhance integration logos display and styling in the UI ([#5221](#5221)) ([5799257](5799257)) * Fix space typo in push.md docs ([#5184](#5184)) ([81677b2](81677b2)) * Fixed integration tests for qdrant and milvus ([#5224](#5224)) ([d6b080d](d6b080d)) * Formatting trino ([760ec0e](760ec0e)) * Multiple fixes in retrieval of online documents ([#5168](#5168)) ([66ddd3e](66ddd3e)) * Operator route creation for Feast UI in OpenShift ([e3946b4](e3946b4)) * Remove entity_rows parameter from retrieve_online_documents_v2 call ([#5225](#5225)) ([2a2e304](2a2e304)) * Styling ([#5222](#5222)) ([34c393c](34c393c)) * typo in the chart ([bd3448b](bd3448b)) * Update milvus-quickstart and feature_store.yaml with correct Milvus Config ([#5200](#5200)) ([306acca](306acca)) * Update Qdrant online store paths in repo_config.py ([#5207](#5207)) ([ab35b0b](ab35b0b)), closes [#5206](#5206) * Update the doc ([#5194](#5194)) ([726464e](726464e)) * Updated the operator-rabc example to test RBAC from a Kubernete pod ([#5147](#5147)) ([d23a1a5](d23a1a5)) ### Features * add `real`(float32) type for trino offline store ([#4749](#4749)) ([0947f96](0947f96)) * Add async DynamoDB timeout and retry configuration ([#5178](#5178)) ([2f3bcf5](2f3bcf5)) * Add CronJob capability to the Operator (feast apply & materialize-incremental) ([#5217](#5217)) ([285c0dc](285c0dc)) * Add RAG tutorial and Use Cases documentation ([#5226](#5226)) ([99f4004](99f4004)) * Added CLI for features, get historical and online features ([#5197](#5197)) ([4ab9f74](4ab9f74)) * Added export support in feast UI ([#5198](#5198)) ([b079553](b079553)) * Added global registry search support in Feast UI ([#5195](#5195)) ([f09ea49](f09ea49)) * Added UI for Features list ([#5192](#5192)) ([cc7fd47](cc7fd47)) * Adding blog on RAG with Milvus ([#5161](#5161)) ([b9e2e6c](b9e2e6c)) * Adding Docling RAG demo ([#5109](#5109)) ([569404b](569404b)) * Allow transformations on writes to output list of entities ([#5209](#5209)) ([955521a](955521a)) * Cache get_any_feature_view results ([#5175](#5175)) ([924b8a3](924b8a3)) * Clickhouse offline store ([#4725](#4725)) ([86794c2](86794c2)) * Enable keyword search for Milvus ([#5199](#5199)) ([ac44967](ac44967)) * Enable transformations on PDFs ([#5172](#5172)) ([3674971](3674971)) * Enable users to use Entity Query as CTE during historical retrieval ([#5202](#5202)) ([fe69eaf](fe69eaf)) * helm support more deployment config ([d575372](d575372)) * Improved CLI file structuring ([#5201](#5201)) ([972ed34](972ed34)) * Kickoff Transformation implementationtransformation code base ([#5181](#5181)) ([0083303](0083303)) * Make keep-alive timeout configurable for async DynamoDB connections ([#5167](#5167)) ([7f3e528](7f3e528)) * Operator mounts the odh-trusted-ca-bundle configmap when deployed on RHOAI or ODH ([d4d7b0d](d4d7b0d)) * Spark Transformation ([#5185](#5185)) ([be3d85c](be3d85c))
j-wine
pushed a commit
to j-wine/feast
that referenced
this pull request
Jun 7, 2025
* feat: Adding Docling RAG demo Signed-off-by: Francisco Javier Arceo <[email protected]> * updated demo Signed-off-by: Francisco Javier Arceo <[email protected]> * cleaned up notebook Signed-off-by: Francisco Javier Arceo <[email protected]> * adding chunk id Signed-off-by: Francisco Javier Arceo <[email protected]> * adding quickstart demo that is WIP and updating docling-demo to export unique chunk-id Signed-off-by: Francisco Javier Arceo <[email protected]> * adding current tentative exmaple repo Signed-off-by: Francisco Javier Arceo <[email protected]> * adding current temporary work Signed-off-by: Francisco Javier Arceo <[email protected]> * updating demo script to rename things Signed-off-by: Francisco Javier Arceo <[email protected]> * updated quickstart Signed-off-by: Francisco Javier Arceo <[email protected]> * added comment Signed-off-by: Francisco Javier Arceo <[email protected]> * checking in progress Signed-off-by: Francisco Javier Arceo <[email protected]> * checking in progress for now, still have some issues with vector retrieval Signed-off-by: Francisco Javier Arceo <[email protected]> * okay think i have most things working Signed-off-by: Francisco Javier Arceo <[email protected]> * removing commenting and unnecessary code Signed-off-by: Francisco Javier Arceo <[email protected]> * uploading demo Signed-off-by: Francisco Javier Arceo <[email protected]> * uploading other files Signed-off-by: Francisco Javier Arceo <[email protected]> * updated repo exaxmple Signed-off-by: Francisco Javier Arceo <[email protected]> * checking in current notebook, almost there Signed-off-by: Francisco Javier Arceo <[email protected]> * fixed linter Signed-off-by: Francisco Javier Arceo <[email protected]> * fixed transformation logic: Signed-off-by: Francisco Javier Arceo <[email protected]> * removed print Signed-off-by: Francisco Javier Arceo <[email protected]> * added README with description Signed-off-by: Francisco Javier Arceo <[email protected]> * removing print Signed-off-by: Francisco Javier Arceo <[email protected]> * updating Signed-off-by: Francisco Javier Arceo <[email protected]> * updating metadata file Signed-off-by: Francisco Javier Arceo <[email protected]> * updated readme and adding dataset Signed-off-by: Francisco Javier Arceo <[email protected]> --------- Signed-off-by: Francisco Javier Arceo <[email protected]> Signed-off-by: Jacob Weinhold <[email protected]>
j-wine
pushed a commit
to j-wine/feast
that referenced
this pull request
Jun 7, 2025
# [0.48.0](feast-dev/feast@v0.47.0...v0.48.0) (2025-04-07) ### Bug Fixes * Enhance integration logos display and styling in the UI ([feast-dev#5221](feast-dev#5221)) ([5799257](feast-dev@5799257)) * Fix space typo in push.md docs ([feast-dev#5184](feast-dev#5184)) ([81677b2](feast-dev@81677b2)) * Fixed integration tests for qdrant and milvus ([feast-dev#5224](feast-dev#5224)) ([d6b080d](feast-dev@d6b080d)) * Formatting trino ([760ec0e](feast-dev@760ec0e)) * Multiple fixes in retrieval of online documents ([feast-dev#5168](feast-dev#5168)) ([66ddd3e](feast-dev@66ddd3e)) * Operator route creation for Feast UI in OpenShift ([e3946b4](feast-dev@e3946b4)) * Remove entity_rows parameter from retrieve_online_documents_v2 call ([feast-dev#5225](feast-dev#5225)) ([2a2e304](feast-dev@2a2e304)) * Styling ([feast-dev#5222](feast-dev#5222)) ([34c393c](feast-dev@34c393c)) * typo in the chart ([bd3448b](feast-dev@bd3448b)) * Update milvus-quickstart and feature_store.yaml with correct Milvus Config ([feast-dev#5200](feast-dev#5200)) ([306acca](feast-dev@306acca)) * Update Qdrant online store paths in repo_config.py ([feast-dev#5207](feast-dev#5207)) ([ab35b0b](feast-dev@ab35b0b)), closes [feast-dev#5206](feast-dev#5206) * Update the doc ([feast-dev#5194](feast-dev#5194)) ([726464e](feast-dev@726464e)) * Updated the operator-rabc example to test RBAC from a Kubernete pod ([feast-dev#5147](feast-dev#5147)) ([d23a1a5](feast-dev@d23a1a5)) ### Features * add `real`(float32) type for trino offline store ([feast-dev#4749](feast-dev#4749)) ([0947f96](feast-dev@0947f96)) * Add async DynamoDB timeout and retry configuration ([feast-dev#5178](feast-dev#5178)) ([2f3bcf5](feast-dev@2f3bcf5)) * Add CronJob capability to the Operator (feast apply & materialize-incremental) ([feast-dev#5217](feast-dev#5217)) ([285c0dc](feast-dev@285c0dc)) * Add RAG tutorial and Use Cases documentation ([feast-dev#5226](feast-dev#5226)) ([99f4004](feast-dev@99f4004)) * Added CLI for features, get historical and online features ([feast-dev#5197](feast-dev#5197)) ([4ab9f74](feast-dev@4ab9f74)) * Added export support in feast UI ([feast-dev#5198](feast-dev#5198)) ([b079553](feast-dev@b079553)) * Added global registry search support in Feast UI ([feast-dev#5195](feast-dev#5195)) ([f09ea49](feast-dev@f09ea49)) * Added UI for Features list ([feast-dev#5192](feast-dev#5192)) ([cc7fd47](feast-dev@cc7fd47)) * Adding blog on RAG with Milvus ([feast-dev#5161](feast-dev#5161)) ([b9e2e6c](feast-dev@b9e2e6c)) * Adding Docling RAG demo ([feast-dev#5109](feast-dev#5109)) ([569404b](feast-dev@569404b)) * Allow transformations on writes to output list of entities ([feast-dev#5209](feast-dev#5209)) ([955521a](feast-dev@955521a)) * Cache get_any_feature_view results ([feast-dev#5175](feast-dev#5175)) ([924b8a3](feast-dev@924b8a3)) * Clickhouse offline store ([feast-dev#4725](feast-dev#4725)) ([86794c2](feast-dev@86794c2)) * Enable keyword search for Milvus ([feast-dev#5199](feast-dev#5199)) ([ac44967](feast-dev@ac44967)) * Enable transformations on PDFs ([feast-dev#5172](feast-dev#5172)) ([3674971](feast-dev@3674971)) * Enable users to use Entity Query as CTE during historical retrieval ([feast-dev#5202](feast-dev#5202)) ([fe69eaf](feast-dev@fe69eaf)) * helm support more deployment config ([d575372](feast-dev@d575372)) * Improved CLI file structuring ([feast-dev#5201](feast-dev#5201)) ([972ed34](feast-dev@972ed34)) * Kickoff Transformation implementationtransformation code base ([feast-dev#5181](feast-dev#5181)) ([0083303](feast-dev@0083303)) * Make keep-alive timeout configurable for async DynamoDB connections ([feast-dev#5167](feast-dev#5167)) ([7f3e528](feast-dev@7f3e528)) * Operator mounts the odh-trusted-ca-bundle configmap when deployed on RHOAI or ODH ([d4d7b0d](feast-dev@d4d7b0d)) * Spark Transformation ([feast-dev#5185](feast-dev#5185)) ([be3d85c](feast-dev@be3d85c)) Signed-off-by: Jacob Weinhold <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What this PR does / why we need it:
This PR introduces enhancements to the RAG (Retrieval-Augmented Generation) demo, adds support for PDF transformation with Docling, and updates various components for better feature handling and testing.
examples/rag-docling/*
docling-demo.ipynb
) demonstrating the use of Docling for text extraction from PDFs.docling-quickstart.ipynb
) showing how to use Feast to ingest and retrieve text data from the online store.sdk/python/feast/feature_store.py
_get_feature_view_and_df_for_online_write
to handle feature view transformations differently for singleton and non-singleton cases.sdk/python/tests/unit/online_store/test_online_retrieval.py
sdk/python/tests/unit/online_store/test_online_writes.py
TestOnlineWritesWithTransform
to validate PDF handling and feature transformations during online writes.sdk/python/tests/unit/test_on_demand_python_transformation.py
docling_transform_docs
to handle multiple inputs and validate the transformation logic.Which issue(s) this PR fixes:
N/A
Misc:
N/A