feat: Adding support for dictionary writes to online store #4156
feat: Adding support for dictionary writes to online store #4156jeremyary merged 10 commits intofeast-dev:masterfrom
Conversation
Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
sdk/python/feast/feature_store.py
Outdated
| feature_view_name: str, | ||
| df: pd.DataFrame, | ||
| df: Optional[pd.DataFrame] = None, | ||
| input_dict: Optional[Dict] = None, |
There was a problem hiding this comment.
Wouldn't this make more sense as a single param with a Union type?
There was a problem hiding this comment.
I could do a type check I just thought I'd be explicit about it first...because if I make it a single parameter then it would make sense to change the name of the input argument, which means then I'd have to make a breaking change or do a staged rollout.
There was a problem hiding this comment.
We can do a staggered release for this and get feedback from folks but I wouldn't want to make any breaking changes.
There was a problem hiding this comment.
yeah, df name is really awkward. maybe call the new one something other than input_dict so that it can become the universal one after we deprecate and remove df? just inputs for example.
There was a problem hiding this comment.
Yeah I like that. I'll update.
|
hey quick question, how will the change optimize feature writes? 🤔 |
I think this is the same question for me :) |
@HaoXuAI @sudohainguyen I added some notes to the PR to be more explicit about the plan here. |
Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
|
@franciscojavierarceo btw, what is |
Signed-off-by: Francisco Javier Arceo <[email protected]>
I haven't spent any time using pyarrow but definitely open to it. |
Signed-off-by: Francisco Javier Arceo <[email protected]>
| feature_view_name: str, | ||
| df: pd.DataFrame, | ||
| df: Optional[pd.DataFrame] = None, | ||
| inputs: Optional[Union[Dict[str, List[Any]], pd.DataFrame]] = None, |
…#4156) * feat: Adding support for dictionary writes to online store Signed-off-by: Francisco Javier Arceo <[email protected]> * Simple approach Signed-off-by: Francisco Javier Arceo <[email protected]> * lint Signed-off-by: Francisco Javier Arceo <[email protected]> * adding error if both are missing Signed-off-by: Francisco Javier Arceo <[email protected]> * rename dict to input_dict Signed-off-by: Francisco Javier Arceo <[email protected]> * updated input arg to test Signed-off-by: Francisco Javier Arceo <[email protected]> * Renaming function argument Signed-off-by: Francisco Javier Arceo <[email protected]> * updated docstring Signed-off-by: Francisco Javier Arceo <[email protected]> * updated type signature Signed-off-by: Francisco Javier Arceo <[email protected]> * updated type to be more explicit Signed-off-by: Francisco Javier Arceo <[email protected]> --------- Signed-off-by: Francisco Javier Arceo <[email protected]>
…#4156) * feat: Adding support for dictionary writes to online store Signed-off-by: Francisco Javier Arceo <[email protected]> * Simple approach Signed-off-by: Francisco Javier Arceo <[email protected]> * lint Signed-off-by: Francisco Javier Arceo <[email protected]> * adding error if both are missing Signed-off-by: Francisco Javier Arceo <[email protected]> * rename dict to input_dict Signed-off-by: Francisco Javier Arceo <[email protected]> * updated input arg to test Signed-off-by: Francisco Javier Arceo <[email protected]> * Renaming function argument Signed-off-by: Francisco Javier Arceo <[email protected]> * updated docstring Signed-off-by: Francisco Javier Arceo <[email protected]> * updated type signature Signed-off-by: Francisco Javier Arceo <[email protected]> * updated type to be more explicit Signed-off-by: Francisco Javier Arceo <[email protected]> --------- Signed-off-by: Francisco Javier Arceo <[email protected]> Signed-off-by: Theodor Mihalache <[email protected]>
# [0.38.0](v0.37.0...v0.38.0) (2024-05-24) ### Bug Fixes * Add vector database doc ([#4165](#4165)) ([37f36b6](37f36b6)) * Change checkout action back to v3 from v5 which isn't released yet ([#4147](#4147)) ([9523fff](9523fff)) * Change numpy version <1.25 dependency to <2 in setup.py ([#4085](#4085)) ([2ba71ff](2ba71ff)), closes [#4084](#4084) * Changed the code the way mysql container is initialized. ([#4140](#4140)) ([8b5698f](8b5698f)), closes [#4126](#4126) * Correct nightly install command, move all installs to uv ([#4164](#4164)) ([c86d594](c86d594)) * Default value is not set in Redis connection string using environment variable ([#4136](#4136)) ([95acfb4](95acfb4)), closes [#3669](#3669) * Get container host addresses from testcontainers (java) ([#4125](#4125)) ([9184dde](9184dde)) * Get rid of empty string `name_alias` during feature view projection deserialization ([#4116](#4116)) ([65056ce](65056ce)) * Helm chart `feast-feature-server`, improve Service template name ([#4161](#4161)) ([dedc164](dedc164)) * Improve the code related to on-demand-featureview. ([#4203](#4203)) ([d91d7e0](d91d7e0)) * Integration tests for async sdk method ([#4201](#4201)) ([08c44ae](08c44ae)) * Make sure schema is used when calling `get_table_query_string` method for Snowflake datasource ([#4131](#4131)) ([c1579c7](c1579c7)) * Make sure schema is used when generating `from_expression` for Snowflake ([#4177](#4177)) ([5051da7](5051da7)) * Pass native input values to `get_online_features` from feature server ([#4117](#4117)) ([60756cb](60756cb)) * Pass region to S3 client only if set (Java) ([#4151](#4151)) ([b8087f7](b8087f7)) * Pgvector patch ([#4108](#4108)) ([ad45bb4](ad45bb4)) * Update doc ([#4153](#4153)) ([e873636](e873636)) * Update master-only benchmark bucket name due to credential update ([#4183](#4183)) ([e88f1e3](e88f1e3)) * Updating the instructions for quickstart guide. ([#4120](#4120)) ([0c30e96](0c30e96)) * Upgrading the test container so that local tests works with updated d… ([#4155](#4155)) ([93ddb11](93ddb11)) ### Features * Add a Kubernetes Operator for the Feast Feature Server ([#4145](#4145)) ([4a696dc](4a696dc)) * Add delta format to `FileSource`, add support for it in ibis/duckdb ([#4123](#4123)) ([2b6f1d0](2b6f1d0)) * Add materialization support to ibis/duckdb ([#4173](#4173)) ([369ca98](369ca98)) * Add optional private key params to Snowflake config ([#4205](#4205)) ([20f5419](20f5419)) * Add s3 remote storage export for duckdb ([#4195](#4195)) ([6a04c48](6a04c48)) * Adding DatastoreOnlineStore 'database' argument. ([#4180](#4180)) ([e739745](e739745)) * Adding get_online_features_async to feature store sdk ([#4172](#4172)) ([311efc5](311efc5)) * Adding support for dictionary writes to online store ([#4156](#4156)) ([abfac01](abfac01)) * Elasticsearch vector database ([#4188](#4188)) ([bf99640](bf99640)) * Enable other distance metrics for Vector DB and Update docs ([#4170](#4170)) ([ba9f4ef](ba9f4ef)) * Feast/IKV datetime edgecase errors ([#4211](#4211)) ([bdae562](bdae562)) * Feast/IKV documenation language changes ([#4149](#4149)) ([690a621](690a621)) * Feast/IKV online store contrib plugin integration ([#4068](#4068)) ([f2b4eb9](f2b4eb9)) * Feast/IKV online store documentation ([#4146](#4146)) ([73601e4](73601e4)) * Feast/IKV upgrade client version ([#4200](#4200)) ([0e42150](0e42150)) * Incorporate substrait ODFVs into ibis-based offline store queries ([#4102](#4102)) ([c3a102f](c3a102f)) * Isolate input-dependent calculations in `get_online_features` ([#4041](#4041)) ([2a6edea](2a6edea)) * Make arrow primary interchange for online ODFV execution ([#4143](#4143)) ([3fdb716](3fdb716)) * Move data source validation entrypoint to offline store ([#4197](#4197)) ([a17725d](a17725d)) * Upgrading python version to 3.11, adding support for 3.11 as well. ([#4159](#4159)) ([4b1634f](4b1634f)), closes [#4152](#4152) [#4114](#4114) ### Reverts * Reverts "fix: Using version args to install the correct feast version" ([#4112](#4112)) ([b66baa4](b66baa4)), closes [#3953](#3953)
# [0.38.0](v0.37.0...v0.38.0) (2024-05-24) ### Bug Fixes * Add vector database doc ([#4165](#4165)) ([37f36b6](37f36b6)) * Change checkout action back to v3 from v5 which isn't released yet ([#4147](#4147)) ([9523fff](9523fff)) * Change numpy version <1.25 dependency to <2 in setup.py ([#4085](#4085)) ([2ba71ff](2ba71ff)), closes [#4084](#4084) * Changed the code the way mysql container is initialized. ([#4140](#4140)) ([8b5698f](8b5698f)), closes [#4126](#4126) * Correct nightly install command, move all installs to uv ([#4164](#4164)) ([c86d594](c86d594)) * Default value is not set in Redis connection string using environment variable ([#4136](#4136)) ([95acfb4](95acfb4)), closes [#3669](#3669) * Get container host addresses from testcontainers (java) ([#4125](#4125)) ([9184dde](9184dde)) * Get rid of empty string `name_alias` during feature view projection deserialization ([#4116](#4116)) ([65056ce](65056ce)) * Helm chart `feast-feature-server`, improve Service template name ([#4161](#4161)) ([dedc164](dedc164)) * Improve the code related to on-demand-featureview. ([#4203](#4203)) ([d91d7e0](d91d7e0)) * Integration tests for async sdk method ([#4201](#4201)) ([08c44ae](08c44ae)) * Make sure schema is used when calling `get_table_query_string` method for Snowflake datasource ([#4131](#4131)) ([c1579c7](c1579c7)) * Make sure schema is used when generating `from_expression` for Snowflake ([#4177](#4177)) ([5051da7](5051da7)) * Pass native input values to `get_online_features` from feature server ([#4117](#4117)) ([60756cb](60756cb)) * Pass region to S3 client only if set (Java) ([#4151](#4151)) ([b8087f7](b8087f7)) * Pgvector patch ([#4108](#4108)) ([ad45bb4](ad45bb4)) * Update doc ([#4153](#4153)) ([e873636](e873636)) * Update master-only benchmark bucket name due to credential update ([#4183](#4183)) ([e88f1e3](e88f1e3)) * Updating the instructions for quickstart guide. ([#4120](#4120)) ([0c30e96](0c30e96)) * Upgrading the test container so that local tests works with updated d… ([#4155](#4155)) ([93ddb11](93ddb11)) ### Features * Add a Kubernetes Operator for the Feast Feature Server ([#4145](#4145)) ([4a696dc](4a696dc)) * Add delta format to `FileSource`, add support for it in ibis/duckdb ([#4123](#4123)) ([2b6f1d0](2b6f1d0)) * Add materialization support to ibis/duckdb ([#4173](#4173)) ([369ca98](369ca98)) * Add optional private key params to Snowflake config ([#4205](#4205)) ([20f5419](20f5419)) * Add s3 remote storage export for duckdb ([#4195](#4195)) ([6a04c48](6a04c48)) * Adding DatastoreOnlineStore 'database' argument. ([#4180](#4180)) ([e739745](e739745)) * Adding get_online_features_async to feature store sdk ([#4172](#4172)) ([311efc5](311efc5)) * Adding support for dictionary writes to online store ([#4156](#4156)) ([abfac01](abfac01)) * Elasticsearch vector database ([#4188](#4188)) ([bf99640](bf99640)) * Enable other distance metrics for Vector DB and Update docs ([#4170](#4170)) ([ba9f4ef](ba9f4ef)) * Feast/IKV datetime edgecase errors ([#4211](#4211)) ([bdae562](bdae562)) * Feast/IKV documenation language changes ([#4149](#4149)) ([690a621](690a621)) * Feast/IKV online store contrib plugin integration ([#4068](#4068)) ([f2b4eb9](f2b4eb9)) * Feast/IKV online store documentation ([#4146](#4146)) ([73601e4](73601e4)) * Feast/IKV upgrade client version ([#4200](#4200)) ([0e42150](0e42150)) * Incorporate substrait ODFVs into ibis-based offline store queries ([#4102](#4102)) ([c3a102f](c3a102f)) * Isolate input-dependent calculations in `get_online_features` ([#4041](#4041)) ([2a6edea](2a6edea)) * Make arrow primary interchange for online ODFV execution ([#4143](#4143)) ([3fdb716](3fdb716)) * Move data source validation entrypoint to offline store ([#4197](#4197)) ([a17725d](a17725d)) * Upgrading python version to 3.11, adding support for 3.11 as well. ([#4159](#4159)) ([4b1634f](4b1634f)), closes [#4152](#4152) [#4114](#4114) ### Reverts * Reverts "fix: Using version args to install the correct feast version" ([#4112](#4112)) ([b66baa4](b66baa4)), closes [#3953](#3953)
What this PR does / why we need it:
This PR enables support for passing a Python
Dictobject to write to the online store. This is a first step along the path to optimizing feature writes to the online store.Note, this won't speed up anything and, in fact, introduces minor latency to serialize the dict into a DataFrame. In a subsequent PR I will directly use the dictionary object as it propagates through the Feast backend layer to write rather than any of the existing pandas code. When I do this next step, I will measure the performance benefits to highlight the latency gains.
Which issue(s) this PR fixes:
N/A
Fixes
N/A