feat: Kickoff Transformation implementationtransformation code base #5181

Merged
franciscojavierarceo merged 6 commits into master from transformation-base
Mar 23, 2025

Conversation

HaoXuAI
Collaborator

@HaoXuAI HaoXuAI commented Mar 22, 2025

What this PR does / why we need it:

Created a Transformation interface. It still works with the current pandas_transformation, python_transformation, etc.

The next step is to refactor the BatchMaterializationEngine so that it works for both Materialization and Transformation.
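
For readers skimming the thread, here is a minimal sketch of what a shared Transformation interface can look like. It is not the code merged in this PR: the class names `Transformation` and `TransformationMode` appear in the diff, but the constructor arguments, the `transform` method, and the `PythonTransformation` subclass shown here are assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from enum import Enum
from typing import Any, Callable


class TransformationMode(Enum):
    # Hypothetical subset of modes, for illustration only.
    PYTHON = "python"
    PANDAS = "pandas"


class Transformation(ABC):
    """Sketch of a base class that pairs a user-defined function with its mode."""

    def __init__(self, mode: TransformationMode, udf: Callable[[Any], Any]):
        self.mode = mode
        self.udf = udf

    @abstractmethod
    def transform(self, inputs: Any) -> Any:
        """Apply the UDF to the inputs in a mode-specific way."""


class PythonTransformation(Transformation):
    """Assumed subclass: the UDF maps a dict of column lists to a dict of column lists."""

    def __init__(self, udf: Callable[[dict], dict]):
        super().__init__(TransformationMode.PYTHON, udf)

    def transform(self, inputs: dict) -> dict:
        return self.udf(inputs)


def double_amount(inputs: dict) -> dict:
    # Example UDF: doubles every value in the "amount" column.
    return {"amount_x2": [v * 2 for v in inputs["amount"]]}


t = PythonTransformation(double_amount)
result = t.transform({"amount": [1, 2, 3]})
```

The point of the shape is that callers can hold any `Transformation` and call one method, regardless of whether the underlying UDF is pandas-based or plain Python.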

Which issue(s) this PR fixes:

#4584
#4277 (comment)
#4696

Misc

@HaoXuAI HaoXuAI requested a review from a team as a code owner March 22, 2025 22:25
@HaoXuAI
Collaborator Author

HaoXuAI commented Mar 22, 2025

Reference comment: #5130 (comment)

@HaoXuAI
Collaborator Author

HaoXuAI commented Mar 22, 2025

One unstable unit test seems to be related to HF @franciscojavierarceo :
FAILED sdk/python/tests/unit/test_on_demand_python_transformation.py::TestOnDemandTransformationsWithWrites::test_docling_transform - huggingface_hub.errors.LocalEntryNotFoundError: An error happened while trying to locate the file on the Hub and we cannot find the requested files in the local cache. Please check your connection and try again or make sure your Internet connection is on.

A rerun produced a new error:
FAILED sdk/python/tests/unit/test_on_demand_python_transformation.py::TestOnDemandTransformationsWithWrites::test_docling_transform - TypeError: a bytes-like object is required, not 'list'

@franciscojavierarceo
Member

Got it, I can make a patch for that. 👍

@HaoXuAI
Collaborator Author

HaoXuAI commented Mar 22, 2025

> Got it, I can make a patch for that. 👍

No rush, take your time. This pr is ready for review now :)

@@ -678,9 +687,6 @@ def _construct_random_input(
    ) -> dict[str, Union[list[Any], Any]]:
        rand_dict_value: dict[ValueType, Union[list[Any], Any]] = {
            ValueType.BYTES: [str.encode("hello world")],
            ValueType.PDF_BYTES: [

This is what's making the unit tests fail.

This is used in infer_features to validate the function schema / types actually.

Collaborator Author

Ah, I think I accidentally deleted it when resolving a merge conflict. Let me add it back.
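
To make the point about schema validation concrete, here is a rough sketch of how a table of sample values can be used to probe a UDF and infer its output types. This is an illustration under assumptions, not the actual `infer_features` logic: `SAMPLE_VALUES`, `infer_output_types`, and the string type names are hypothetical, and the real table is keyed by `ValueType`.

```python
from typing import Any, Callable

# Hypothetical sample values, one list per input type name.
SAMPLE_VALUES: dict[str, list[Any]] = {
    "BYTES": [str.encode("hello world")],
    "STRING": ["hello world"],
    "INT64": [1],
    "DOUBLE": [1.0],
}


def infer_output_types(
    udf: Callable[[dict], dict], input_schema: dict[str, str]
) -> dict[str, str]:
    """Call the UDF on sample inputs and report the Python type of each output column."""
    sample_input = {
        name: SAMPLE_VALUES[type_name] for name, type_name in input_schema.items()
    }
    output = udf(sample_input)
    return {name: type(values[0]).__name__ for name, values in output.items()}


def payload_length(inputs: dict) -> dict:
    # Example UDF: expects a list of bytes objects in the "payload" column.
    return {"length": [len(b) for b in inputs["payload"]]}


inferred = infer_output_types(payload_length, {"payload": "BYTES"})
```

In a setup like this, a sample entry that is missing or has the wrong shape only surfaces when the UDF is probed, which is consistent with the kind of test failure reported above.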


on_demand_feature_view_obj = OnDemandFeatureView(
    name=name if name is not None else user_function.__name__,
    sources=sources,
    schema=schema,
    feature_transformation=transformation,

shouldn't we keep this to be backwards compatible?

Collaborator Author

It is backwards compatible. I put the feature_transformation extraction logic inside the OnDemandFeatureView initialization, since the decorator doesn't pass in the feature_transformation param.
So users can do either:

@on_demand_feature_view(...)
def udf(): ...

or

odfv = OnDemandFeatureView(feature_transformation=Transformation(...))
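
A minimal sketch of the pattern described here, where the constructor builds the transformation itself when the caller only supplies a UDF. All names ending in `Sketch` (and the `udf`/`mode` parameters) are hypothetical stand-ins, not the signatures in this PR.

```python
from typing import Callable, Optional


class SketchTransformation:
    """Hypothetical stand-in for a Transformation that wraps a UDF."""

    def __init__(self, udf: Callable, mode: str = "python"):
        self.udf = udf
        self.mode = mode


class OnDemandFeatureViewSketch:
    def __init__(
        self,
        name: str,
        udf: Optional[Callable] = None,
        mode: str = "python",
        feature_transformation: Optional[SketchTransformation] = None,
    ):
        # Extraction happens here, so the decorator path does not need to
        # construct or pass a transformation object itself.
        if feature_transformation is None:
            if udf is None:
                raise ValueError("Provide either udf or feature_transformation")
            feature_transformation = SketchTransformation(udf=udf, mode=mode)
        self.name = name
        self.feature_transformation = feature_transformation


def on_demand_feature_view_sketch(name: Optional[str] = None, mode: str = "python"):
    def decorator(user_function: Callable) -> OnDemandFeatureViewSketch:
        return OnDemandFeatureViewSketch(
            name=name if name is not None else user_function.__name__,
            udf=user_function,
            mode=mode,
        )

    return decorator


# Path 1: decorator, no transformation object passed in.
@on_demand_feature_view_sketch()
def udf(inputs):
    return inputs


# Path 2: explicit transformation object.
explicit = OnDemandFeatureViewSketch(
    name="explicit",
    feature_transformation=SketchTransformation(udf=lambda x: x, mode="pandas"),
)
```

Both paths end up with a populated `feature_transformation`, which is what keeps existing decorator-based definitions working.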

from feast.transformation.mode import TransformationMode


class Transformation(ABC):
Member

@franciscojavierarceo franciscojavierarceo Mar 23, 2025

nit: I'd add a docstring here. ChatGPT should be a nice friend.

Collaborator Author

That's a good point.

    owner: Optional[str] = "",
):
    def mainify(obj):
        # Needed to allow dill to properly serialize the udf. Otherwise, clients will need to have a file with the same

Hmm, we encountered some serialization issues with dill in the past. @Rostifar do you remember what they were?

Collaborator Author

I have run into dill issues as well. I created a new issue for it:
#5182
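
For context on the `mainify` helper in the hunk above, here is a small sketch of the general idea, assuming the helper works by rebinding the function's `__module__`. The body below is an assumption (the diff only shows the first comment line), and the module name `feature_repo.features` is made up. The motivation, as I understand it, is that dill typically serializes functions whose `__module__` is `__main__` by value, while functions from importable modules are stored as a reference to that module, which the loading side must then be able to import.

```python
def mainify(obj) -> None:
    # Rebind the object's module so a by-value serializer treats it as if it
    # had been defined in __main__ (assumed behaviour, for illustration).
    if obj.__module__ != "__main__":
        obj.__module__ = "__main__"


def my_udf(inputs):
    return inputs


# Simulate a UDF that was defined in some other module (hypothetical name).
my_udf.__module__ = "feature_repo.features"
module_before = my_udf.__module__

mainify(my_udf)
module_after = my_udf.__module__
```

The sketch deliberately stops short of calling dill, so it only demonstrates the attribute rewrite, not the serialization behaviour tracked in the linked issue.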

inferred_value = feature_value[0]
if singleton and isinstance(inferred_value, list):
Member

@franciscojavierarceo franciscojavierarceo Mar 23, 2025

I just added this FYI as it's an edge case I didn't consider before, so please feel free to add it back in!

Collaborator Author

Strange, I didn't modify this. Not sure why it snuck in.

Member

@franciscojavierarceo franciscojavierarceo left a comment

This looks great! Some small nits and some notes about fixing stuff I recently added, which will fix the unit tests. Otherwise looks awesome. 👏🚀🤠

@HaoXuAI
Collaborator Author

HaoXuAI commented Mar 23, 2025

> This looks great! Some small nits and some notes about fixing stuff I recently added, which will fix the unit tests. Otherwise looks awesome. 👏🚀🤠

Good catch, thanks!

@HaoXuAI
Collaborator Author

HaoXuAI commented Mar 23, 2025

@franciscojavierarceo finally passed all tests!

@franciscojavierarceo
Member

> @franciscojavierarceo finally passed all tests!

[image]

Member

@franciscojavierarceo franciscojavierarceo left a comment

Awesome work @HaoXuAI 🚀🚀🚀

We can probably update the docs in a follow-up PR before the next release?

@franciscojavierarceo franciscojavierarceo merged commit 0083303 into master Mar 23, 2025
31 of 32 checks passed
@franciscojavierarceo franciscojavierarceo deleted the transformation-base branch March 26, 2025 13:29
@franciscojavierarceo franciscojavierarceo restored the transformation-base branch March 26, 2025 13:29
franciscojavierarceo pushed a commit that referenced this pull request Apr 7, 2025
# [0.48.0](v0.47.0...v0.48.0) (2025-04-07)

### Bug Fixes

* Enhance integration logos display and styling in the UI ([#5221](#5221)) ([5799257](5799257))
* Fix space typo in push.md docs ([#5184](#5184)) ([81677b2](81677b2))
* Fixed integration tests for qdrant and milvus ([#5224](#5224)) ([d6b080d](d6b080d))
* Formatting trino ([760ec0e](760ec0e))
* Multiple fixes in retrieval of online documents ([#5168](#5168)) ([66ddd3e](66ddd3e))
* Operator route creation for Feast UI in OpenShift ([e3946b4](e3946b4))
* Remove entity_rows parameter from retrieve_online_documents_v2 call ([#5225](#5225)) ([2a2e304](2a2e304))
* Styling ([#5222](#5222)) ([34c393c](34c393c))
* typo in the chart ([bd3448b](bd3448b))
* Update milvus-quickstart and feature_store.yaml with correct Milvus Config ([#5200](#5200)) ([306acca](306acca))
* Update Qdrant online store paths in repo_config.py ([#5207](#5207)) ([ab35b0b](ab35b0b)), closes [#5206](#5206)
* Update the doc ([#5194](#5194)) ([726464e](726464e))
* Updated the operator-rabc example to test RBAC from a Kubernete pod ([#5147](#5147)) ([d23a1a5](d23a1a5))

### Features

* add `real`(float32) type for trino offline store ([#4749](#4749)) ([0947f96](0947f96))
* Add async DynamoDB timeout and retry configuration ([#5178](#5178)) ([2f3bcf5](2f3bcf5))
* Add CronJob capability to the Operator (feast apply & materialize-incremental) ([#5217](#5217)) ([285c0dc](285c0dc))
* Add RAG tutorial and Use Cases documentation ([#5226](#5226)) ([99f4004](99f4004))
* Added CLI for features, get historical and online features ([#5197](#5197)) ([4ab9f74](4ab9f74))
* Added export support in feast UI ([#5198](#5198)) ([b079553](b079553))
* Added global registry search support in Feast UI ([#5195](#5195)) ([f09ea49](f09ea49))
* Added UI for Features list ([#5192](#5192)) ([cc7fd47](cc7fd47))
* Adding blog on RAG with Milvus ([#5161](#5161)) ([b9e2e6c](b9e2e6c))
* Adding Docling RAG demo ([#5109](#5109)) ([569404b](569404b))
* Allow transformations on writes to output list of entities ([#5209](#5209)) ([955521a](955521a))
* Cache get_any_feature_view results ([#5175](#5175)) ([924b8a3](924b8a3))
* Clickhouse offline store ([#4725](#4725)) ([86794c2](86794c2))
* Enable keyword search for Milvus ([#5199](#5199)) ([ac44967](ac44967))
* Enable transformations on PDFs ([#5172](#5172)) ([3674971](3674971))
* Enable users to use Entity Query as CTE during historical retrieval ([#5202](#5202)) ([fe69eaf](fe69eaf))
* helm support more deployment config ([d575372](d575372))
* Improved CLI file structuring ([#5201](#5201)) ([972ed34](972ed34))
* Kickoff Transformation implementationtransformation code base ([#5181](#5181)) ([0083303](0083303))
* Make keep-alive timeout configurable for async DynamoDB connections ([#5167](#5167)) ([7f3e528](7f3e528))
* Operator mounts the odh-trusted-ca-bundle configmap when deployed on RHOAI or ODH ([d4d7b0d](d4d7b0d))
* Spark Transformation ([#5185](#5185)) ([be3d85c](be3d85c))
j-wine pushed a commit to j-wine/feast that referenced this pull request Jun 7, 2025
…east-dev#5181)

* transformation code base

Signed-off-by: HaoXuAI <[email protected]>

* add back master change to resovle unit test error

Signed-off-by: HaoXuAI <[email protected]>

* add back master change to resovle unit test error

Signed-off-by: HaoXuAI <[email protected]>

* fix linthing

Signed-off-by: HaoXuAI <[email protected]>

* fix linthing

Signed-off-by: HaoXuAI <[email protected]>

* add back master change to resovle unit test error

Signed-off-by: HaoXuAI <[email protected]>

---------

Signed-off-by: HaoXuAI <[email protected]>
Signed-off-by: Jacob Weinhold <[email protected]>
j-wine pushed a commit to j-wine/feast that referenced this pull request Jun 7, 2025
# [0.48.0](feast-dev/feast@v0.47.0...v0.48.0) (2025-04-07)

Signed-off-by: Jacob Weinhold <[email protected]>