feat: Support compute engine to use multi feature views as source #5482
Signed-off-by: HaoXuAI <[email protected]>
    entities: List[str]
    ttl: Optional[timedelta]
    source: DataSource
    sink_source: Optional[DataSource] = None
Maybe we should just call it sink?
Sounds good, will update it
franciscojavierarceo left a comment
Some small nits but this mostly lgtm
Can you add a page to the docs before merging this PR? It would be great to share with the community how to use it.
    from feast.infra.compute_engines.dag.node import DAGNode


    def topo_sort(root: DAGNode) -> List[DAGNode]:
Why not call it topological_sort?
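For readers unfamiliar with the function under review, here is a minimal sketch of a depth-first topological sort over a DAG rooted at a single node. The `Node` class and its `inputs` attribute are hypothetical stand-ins, not Feast's actual `DAGNode`, and the PR's implementation may differ.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    """Hypothetical stand-in for a DAG node; `inputs` are upstream nodes."""
    name: str
    inputs: List["Node"] = field(default_factory=list)


def topological_sort(root: Node) -> List[Node]:
    """Return every node reachable from `root`, dependencies first."""
    visited: Set[int] = set()
    ordered: List[Node] = []

    def visit(node: Node) -> None:
        # Track nodes by identity so a shared upstream node is emitted once.
        if id(node) in visited:
            return
        visited.add(id(node))
        for upstream in node.inputs:
            visit(upstream)
        # Post-order append: all inputs are already in `ordered`.
        ordered.append(node)

    visit(root)
    return ordered
```

This sketch assumes the graph is acyclic; a cycle would simply be cut at the first revisit rather than reported.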
    else None
    )
    source_views = [
        FeatureView.from_proto(FeatureViewProto(spec=view_spec, meta=None))
The from_proto() method recursively calls itself for each source view without any depth limit. Cycle detection exists in FeatureResolver, but it only runs when you use the compute engine, whereas proto deserialization happens much earlier, during API calls and registry loading.
We might need to handle this in FeatureView.from_proto().
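One way to guard the recursive deserialization, sketched under assumptions: the `specs` mapping (view name to source view names), `load_view`, and `MAX_DEPTH` are all hypothetical and stand in for the proto structures; this is not what `FeatureView.from_proto()` does today.

```python
from typing import Dict, List, Tuple

# Hypothetical limit on nesting depth; the value is an assumption.
MAX_DEPTH = 32


def load_view(
    name: str,
    specs: Dict[str, List[str]],
    _path: Tuple[str, ...] = (),
) -> dict:
    """Recursively build a nested view, failing fast on cycles or deep nesting."""
    # `_path` holds the chain of views currently being deserialized.
    if name in _path:
        cycle = " -> ".join(_path + (name,))
        raise ValueError(f"cyclic source_views dependency: {cycle}")
    if len(_path) >= MAX_DEPTH:
        raise ValueError(f"source_views nesting exceeds {MAX_DEPTH} levels")
    sources = [load_view(s, specs, _path + (name,)) for s in specs.get(name, [])]
    return {"name": name, "source_views": sources}
```

Because the check lives in the loader itself, it would run at registry-load time rather than only when the compute engine resolves the graph.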
Also, do we not need to store metadata for nested feature views? Is meta=None intended?
We need both cycle detection and de-duplication during serialization. It may not cause issues with a few feature views, but with many feature views it could cause slowness.
A -> [B, C]
B -> [D, E]
C -> [D, E]
When serializing FeatureViewA, FeatureViewD and FeatureViewE get serialized twice.
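The duplication in the A/B/C/D/E example can be avoided with a per-call cache. The following is a rough sketch, not the PR's code: `serialize_view`, the `graph` mapping, and the `calls` log are hypothetical, and the sketch assumes the graph is acyclic.

```python
from typing import Dict, List


def serialize_view(
    name: str,
    graph: Dict[str, List[str]],
    cache: Dict[str, dict],
    calls: List[str],
) -> dict:
    """Serialize a view and its sources, reusing results for shared sources."""
    if name in cache:
        return cache[name]
    # Record each view that actually gets serialized (for illustration only).
    calls.append(name)
    serialized = {
        "name": name,
        "source_views": [
            serialize_view(source, graph, cache, calls)
            for source in graph.get(name, [])
        ],
    }
    cache[name] = serialized
    return serialized


graph = {"A": ["B", "C"], "B": ["D", "E"], "C": ["D", "E"]}
calls: List[str] = []
serialize_view("A", graph, {}, calls)
print(calls)  # each of A, B, C, D, E appears exactly once
```

With the cache, D and E are serialized while handling B and then reused when C is reached.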
Right, makes sense.
I don't have any metadata required for the compute engine at the moment. What do you think would be useful?
    ttl: Optional[timedelta]
    batch_source: DataSource
    stream_source: Optional[DataSource]
    source_views: Optional[List["FeatureView"]]
In __eq__, I think we also need to compare source_views; otherwise two FeatureViews with different source dependencies will be considered equal.
Same for __copy__.
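To illustrate the suggestion, here is a minimal sketch on a hypothetical `View` class (not Feast's `FeatureView`, which compares many more fields) showing `source_views` taking part in both equality and copying.

```python
import copy
from typing import List, Optional


class View:
    """Hypothetical, stripped-down view with only a name and source views."""

    def __init__(self, name: str, source_views: Optional[List["View"]] = None):
        self.name = name
        self.source_views = source_views

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, View):
            return NotImplemented
        # Including source_views means different dependencies compare unequal.
        return self.name == other.name and self.source_views == other.source_views

    def __copy__(self) -> "View":
        # Copy the list so the copy does not share it with the original.
        sources = list(self.source_views) if self.source_views is not None else None
        return View(self.name, sources)


a = View("stats", [View("hourly")])
b = View("stats", [View("daily")])
print(a == b)             # False: same name, different sources
print(copy.copy(a) == a)  # True
```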
Merged it; will add a new PR following the suggestions.
# [0.51.0](v0.50.0...v0.51.0) (2025-07-21)

### Bug Fixes

* FeatureView serialization with cycle detection ([#5502](#5502)) ([f287ca5](f287ca5))
* Fix current version in publish workflow ([#5499](#5499)) ([0af6e94](0af6e94))
* Fix NPM authentication ([#5506](#5506)) ([9f85892](9f85892))
* Fix verify wheels workflow for macos14 ([#5486](#5486)) ([07174cc](07174cc))
* Fixed error thrown for invalid project name on features api ([#5525](#5525)) ([4a9a5d0](4a9a5d0))
* Fixed ODFV on-write transformations ([271ef74](271ef74))
* Move Install OS X dependencies before python setup ([#5488](#5488)) ([35f211c](35f211c))
* Normalize current version by removing 'v' prefix if present ([#5500](#5500)) ([43f3d52](43f3d52))
* Skip macOS 14 with Python 3.10 due to gettext library ([#5490](#5490)) ([41d4977](41d4977))
* Standalone Web UI Publish Workflow ([#5498](#5498)) ([c47b134](c47b134))

### Features

* Added endpoints to allow user to get data for all projects ([4e06965](4e06965))
* Added grpc and rest endpoint for features ([#5519](#5519)) ([0a75696](0a75696))
* Added relationship support to all API endpoints ([#5496](#5496)) ([bea83e7](bea83e7))
* Continue updating doc ([#5523](#5523)) ([ea53b2b](ea53b2b))
* Hybrid offline store ([#5510](#5510)) ([8f1af55](8f1af55))
* Populate created and updated timestamp on data sources ([af3056b](af3056b))
* Provide ready-to-use Python definitions in api ([37628d9](37628d9))
* Snowflake source. fetch MAX in a single query ([#5387](#5387)) ([b49cea1](b49cea1))
* Support compute engine to use multi feature views as source ([#5482](#5482)) ([b9ac90b](b9ac90b))
* Support pagination and sorting on registry apis ([#5495](#5495)) ([c4b6fbe](c4b6fbe))
* Update doc ([#5521](#5521)) ([2808ce1](2808ce1))
What this PR does / why we need it:
Non-breaking, backward-compatible changes.
Support multiple feature views as source. E.g.,
Diagram:

APIs:
transformation udf you specified yourself, or the default join operation. The default join operation is an inner join on each FeatureView's features, and a left join on the entity df. This unlocks the request to join multiple data sources, such as SparkSource + SnowflakeSource.
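The default join semantics described above can be sketched in plain Python over lists of row dictionaries. This is an illustration only: `inner_join`, `left_join`, and `default_join` are hypothetical helpers, the sketch ignores timestamps and point-in-time correctness, and it assumes at least one source view and a single shared entity key.

```python
from typing import Dict, List

Row = Dict[str, object]


def inner_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Keep only rows whose key appears on both sides."""
    right_by_key: Dict[object, List[Row]] = {}
    for row in right:
        right_by_key.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in right_by_key.get(l[key], [])]


def left_join(entities: List[Row], features: List[Row], key: str) -> List[Row]:
    """Keep every entity row; fill feature columns with None when unmatched."""
    feature_cols = [c for c in features[0] if c != key] if features else []
    by_key: Dict[object, List[Row]] = {}
    for row in features:
        by_key.setdefault(row[key], []).append(row)
    joined: List[Row] = []
    for entity in entities:
        matches = by_key.get(entity[key])
        if matches:
            joined.extend({**entity, **m} for m in matches)
        else:
            joined.append({**entity, **dict.fromkeys(feature_cols)})
    return joined


def default_join(entity_df: List[Row], views: List[List[Row]], key: str) -> List[Row]:
    """Inner-join the source views' features, then left-join onto the entity rows."""
    combined = views[0]
    for view in views[1:]:
        combined = inner_join(combined, view, key)
    return left_join(entity_df, combined, key)
```

Note the consequence of the inner join between views: an entity present in only one source view ends up with no features at all after the final left join.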
You can do this with the following setups:
Which issue(s) this PR fixes:
#5444 (comment)
Misc
TODO:
stateful store.