feat: Enable transformations on PDFs #5172
Merged
franciscojavierarceo merged 29 commits into master on Mar 21, 2025
Signed-off-by: Francisco Javier Arceo <[email protected]>
…t unique chunk-id Signed-off-by: Francisco Javier Arceo <[email protected]>
…ieval Signed-off-by: Francisco Javier Arceo <[email protected]>
HaoXuAI reviewed on Mar 21, 2025
    ) -> dict[str, Union[list[Any], Any]]:
        rand_dict_value: dict[ValueType, Union[list[Any], Any]] = {
            ValueType.BYTES: [str.encode("hello world")],
            ValueType.PDF_BYTES: [
Collaborator
Is it necessary to have a new type? Maybe just use BYTES?
Member
Author
Yes, running type inference on raw bytes will fail when using a transformation expecting PDF content.
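To make the point above concrete, here is a minimal sketch of why a dedicated sample value helps. It assumes that output types are inferred by running the transformation once on sample inputs chosen per declared type. `SketchValueType`, `SAMPLE_VALUES`, `count_pdf_lines`, and `infer_output_types` are hypothetical names for illustration and are not Feast APIs; the placeholder payload only starts with the PDF header and is not a complete PDF.

```python
from enum import Enum
from typing import Any, Callable


class SketchValueType(Enum):
    """Hypothetical stand-in for a value-type enum; not the real Feast one."""
    BYTES = 1
    PDF_BYTES = 2


# Placeholder payload that merely starts with the PDF magic header.
# It is not a complete, valid PDF document.
FAKE_PDF = b"%PDF-1.4\n% placeholder body\n%%EOF\n"

# One sample input list per declared type, used only for the dry run.
SAMPLE_VALUES: dict[SketchValueType, list[Any]] = {
    SketchValueType.BYTES: [str.encode("hello world")],
    SketchValueType.PDF_BYTES: [FAKE_PDF],
}

Transformation = Callable[[dict[str, list[Any]]], dict[str, list[Any]]]


def count_pdf_lines(inputs: dict[str, list[Any]]) -> dict[str, list[Any]]:
    """Toy transformation that rejects anything that does not look like a PDF."""
    counts = []
    for blob in inputs["document"]:
        if not blob.startswith(b"%PDF-"):
            raise ValueError("input is not a PDF")
        counts.append(len(blob.splitlines()))
    return {"line_count": counts}


def infer_output_types(
    udf: Transformation, declared: dict[str, SketchValueType]
) -> dict[str, str]:
    """Run the transformation once on sample data and record output type names."""
    sample = {name: SAMPLE_VALUES[value_type] for name, value_type in declared.items()}
    result = udf(sample)
    return {name: type(values[0]).__name__ for name, values in result.items()}
```

In this sketch, declaring the input as `SketchValueType.PDF_BYTES` lets the dry run succeed, while declaring it as `SketchValueType.BYTES` feeds `b"hello world"` to the transformation and raises `ValueError`.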
HaoXuAI (Collaborator) reviewed on Mar 21, 2025 and left a comment:
Overall looks good, just not sure if PDF_BYTES is a primitive value or not.
Member
Author
We need it for ODFV validation. Otherwise, testing with raw bytes fails when the transformation expects a PDF.
Collaborator
Sounds good.
HaoXuAI approved these changes on Mar 21, 2025
franciscojavierarceo pushed a commit that referenced this pull request on Apr 7, 2025
# [0.48.0](v0.47.0...v0.48.0) (2025-04-07)

### Bug Fixes

* Enhance integration logos display and styling in the UI ([#5221](#5221)) ([5799257](5799257))
* Fix space typo in push.md docs ([#5184](#5184)) ([81677b2](81677b2))
* Fixed integration tests for qdrant and milvus ([#5224](#5224)) ([d6b080d](d6b080d))
* Formatting trino ([760ec0e](760ec0e))
* Multiple fixes in retrieval of online documents ([#5168](#5168)) ([66ddd3e](66ddd3e))
* Operator route creation for Feast UI in OpenShift ([e3946b4](e3946b4))
* Remove entity_rows parameter from retrieve_online_documents_v2 call ([#5225](#5225)) ([2a2e304](2a2e304))
* Styling ([#5222](#5222)) ([34c393c](34c393c))
* typo in the chart ([bd3448b](bd3448b))
* Update milvus-quickstart and feature_store.yaml with correct Milvus Config ([#5200](#5200)) ([306acca](306acca))
* Update Qdrant online store paths in repo_config.py ([#5207](#5207)) ([ab35b0b](ab35b0b)), closes [#5206](#5206)
* Update the doc ([#5194](#5194)) ([726464e](726464e))
* Updated the operator-rabc example to test RBAC from a Kubernete pod ([#5147](#5147)) ([d23a1a5](d23a1a5))

### Features

* add `real`(float32) type for trino offline store ([#4749](#4749)) ([0947f96](0947f96))
* Add async DynamoDB timeout and retry configuration ([#5178](#5178)) ([2f3bcf5](2f3bcf5))
* Add CronJob capability to the Operator (feast apply & materialize-incremental) ([#5217](#5217)) ([285c0dc](285c0dc))
* Add RAG tutorial and Use Cases documentation ([#5226](#5226)) ([99f4004](99f4004))
* Added CLI for features, get historical and online features ([#5197](#5197)) ([4ab9f74](4ab9f74))
* Added export support in feast UI ([#5198](#5198)) ([b079553](b079553))
* Added global registry search support in Feast UI ([#5195](#5195)) ([f09ea49](f09ea49))
* Added UI for Features list ([#5192](#5192)) ([cc7fd47](cc7fd47))
* Adding blog on RAG with Milvus ([#5161](#5161)) ([b9e2e6c](b9e2e6c))
* Adding Docling RAG demo ([#5109](#5109)) ([569404b](569404b))
* Allow transformations on writes to output list of entities ([#5209](#5209)) ([955521a](955521a))
* Cache get_any_feature_view results ([#5175](#5175)) ([924b8a3](924b8a3))
* Clickhouse offline store ([#4725](#4725)) ([86794c2](86794c2))
* Enable keyword search for Milvus ([#5199](#5199)) ([ac44967](ac44967))
* Enable transformations on PDFs ([#5172](#5172)) ([3674971](3674971))
* Enable users to use Entity Query as CTE during historical retrieval ([#5202](#5202)) ([fe69eaf](fe69eaf))
* helm support more deployment config ([d575372](d575372))
* Improved CLI file structuring ([#5201](#5201)) ([972ed34](972ed34))
* Kickoff Transformation implementationtransformation code base ([#5181](#5181)) ([0083303](0083303))
* Make keep-alive timeout configurable for async DynamoDB connections ([#5167](#5167)) ([7f3e528](7f3e528))
* Operator mounts the odh-trusted-ca-bundle configmap when deployed on RHOAI or ODH ([d4d7b0d](d4d7b0d))
* Spark Transformation ([#5185](#5185)) ([be3d85c](be3d85c))
jfw-ppi pushed a commit to jfw-ppi/feast that referenced this pull request on Jun 7, 2025
Signed-off-by: Jacob Weinhold <[email protected]>
jfw-ppi pushed a commit to jfw-ppi/feast that referenced this pull request on Jun 7, 2025
# [0.48.0](feast-dev/feast@v0.47.0...v0.48.0) (2025-04-07)
Signed-off-by: Jacob Weinhold <[email protected]>
What this PR does / why we need it:
This PR updates feature transformations, online store handling, and support for PDFs during transformation.
* `sdk/python/feast/infra/online_stores/milvus_online_store/milvus.py`: Updated `online_read` to include a new condition for updating the `feature_name_feast_primitive_type_map` based on the `table.schema`, and included `string_val` in the list of value types.
* `sdk/python/feast/feature_store.py`: Updated `_get_feature_view_and_df_for_online_write` to handle feature view transformations differently for singleton and non-singleton cases.
* `sdk/python/feast/on_demand_feature_view.py`: Added `PDF_BYTES` in the random input construction.
* `sdk/python/feast/transformation/python_transformation.py`
* `sdk/python/feast/types.py`: Added `PDF_BYTES` to the primitive feast types and updated related mappings and enums.
* `sdk/python/feast/value_type.py`: Added `PDF_BYTES` to the `ValueType` enum.
* `sdk/python/tests/unit/online_store/test_online_retrieval.py`: Added `test_milvus_stored_writes_with_explode` to validate storing and retrieving exploded document embeddings with Milvus online store. Renamed `test_milvus_lite_get_online_documents_v2` to `test_milvus_lite_retrieve_online_documents_v2`.
* `sdk/python/tests/unit/test_on_demand_python_transformation.py`

Which issue(s) this PR fixes:
#5173
Misc:
N/A
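
The description above names `test_milvus_stored_writes_with_explode`, which covers "exploded" document embeddings. As a rough illustration only, the sketch below assumes that "explode" means one input document row becoming several chunk rows, each with its own chunk id. `chunk_text`, `explode_documents`, and the column names are hypothetical and not taken from the Feast code base; embeddings are left out to keep the example dependency-free.

```python
import hashlib
from typing import Any


def chunk_text(text: str, size: int) -> list[str]:
    """Split text into consecutive pieces of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def explode_documents(
    inputs: dict[str, list[Any]], size: int = 20
) -> dict[str, list[Any]]:
    """Turn each (document_id, text) row into one output row per chunk."""
    out: dict[str, list[Any]] = {"document_id": [], "chunk_id": [], "chunk_text": []}
    for doc_id, text in zip(inputs["document_id"], inputs["text"]):
        for index, chunk in enumerate(chunk_text(text, size)):
            # Derive a stable id from the document id and the chunk position,
            # so ids are distinct across chunks of the same document.
            digest = hashlib.sha256(f"{doc_id}:{index}".encode()).hexdigest()[:12]
            out["document_id"].append(doc_id)
            out["chunk_id"].append(digest)
            out["chunk_text"].append(chunk)
    return out
```

A single 45-character document with `size=20` yields three output rows here, which is the non-singleton shape that a write path would then need to store row by row.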