fix: Redundant feature materialization and premature incremental materialization timestamp updates#3789
Conversation
|
great, I have been working on the similar PR, but you did it first and way better 💯 |
sdk/python/feast/infra/materialization/contrib/bytewax/bytewax_materialization_engine.py
Show resolved
Hide resolved
|
@james-crabtree-sp Thanks again for putting this PR together. I think in order to merge, you'll need to sign your commits: https://docs.feast.dev/project/development-guide#signing-off-commits |
sdk/python/feast/infra/materialization/contrib/bytewax/bytewax_materialization_engine.py
Show resolved
Hide resolved
I think they are all signed off already. I see that the DCO task that checks this completed successfully https://github.com/feast-dev/feast/pull/3789/checks?check_run_id=17541354092 |
Sorry for the confusion, I was referring to the requirements to merge, and linked to the wrong documentation. This is the requirement that I was referring to: Merging is blocked |
|
hey @james-crabtree-sp this is a great PR, you need any hands to make it merged? I see there's a lot of commits in the past unverified if you don't mind I can create another PR to mimic this one |
0631425 to
baf108c
Compare
Signed-off-by: James Crabtree <[email protected]>
Signed-off-by: James Crabtree <[email protected]>
Signed-off-by: James Crabtree <[email protected]>
) * SAASMLOPS-809 fix bytewax workers so they only process a single file * SAASMLOPS-809 fix newlines Signed-off-by: James Crabtree <[email protected]>
* SAASMLOPS-833 add configurable job timeout * SAASMLOPS-833 fix whitespace Signed-off-by: James Crabtree <[email protected]>
Signed-off-by: James Crabtree <[email protected]>
Signed-off-by: James Crabtree <[email protected]>
Signed-off-by: James Crabtree <[email protected]>
baf108c to
fabd417
Compare
|
@whoahbot @sudohainguyen Ahh, I understand now. Commits are now verified |
exactly 🤣 looks good now but you need to run code format though |
fabd417 to
c35b0c1
Compare
Noticed that too. Weird rebase issue. Should be fixed now |
…error Signed-off-by: James Crabtree <[email protected]>
Signed-off-by: James Crabtree <[email protected]>
9920df7 to
938b903
Compare
|
@james-crabtree-sp I've been adopting your changes on my side so far do we need to wait the job to be done, like forever without timeout? |
Well, this PR adds the configurable active_deadline_seconds setting, which can be used to impose a timeout on the kubernetes jobs. Does that answer your question? |
cool clear |
# [0.35.0](v0.34.0...v0.35.0) (2024-01-13) ### Bug Fixes * Add async refresh to prevent synchronous refresh in main thread ([#3812](#3812)) ([9583ed6](9583ed6)) * Adopt connection pooling for HBase ([#3793](#3793)) ([b3852bf](b3852bf)) * Bytewax engine create configmap from object ([#3821](#3821)) ([25e9775](25e9775)) * Fix warnings from deprecated paths and update default log level ([#3757](#3757)) ([68a8737](68a8737)) * improve parsing bytewax job status ([5983f40](5983f40)) * make bytewax settings unexposed ([ae1bb8b](ae1bb8b)) * Make generated temp table name escaped ([#3797](#3797)) ([175d796](175d796)) * Pin numpy version to avoid spammy deprecation messages ([774ed33](774ed33)) * Redundant feature materialization and premature incremental materialization timestamp updates ([#3789](#3789)) ([417b16b](417b16b)), closes [#6](#6) [#7](#7) * Resolve hbase hotspot issue when materializing ([#3790](#3790)) ([7376db8](7376db8)) * Set keepalives_idle None by default ([#3756](#3756)) ([8717e9b](8717e9b)) * Set upper bound for bigquery client due to its breaking changes ([2151c39](2151c39)) * UI project cannot handle fallback routes ([#3766](#3766)) ([96ece0f](96ece0f)) * update dependencies versions due to conflicts ([5dc0b24](5dc0b24)) * Update jackson and remove unnecessary logging ([#3809](#3809)) ([018d0ea](018d0ea)) * upgrade the pyarrow to latest v14.0.1 for CVE-2023-47248. ([052182b](052182b)) ### Features * Add get online feature rpc to gprc server ([#3815](#3815)) ([01db8cc](01db8cc)) * Add materialize and materialize-incremental rest endpoints ([#3761](#3761)) ([fa600fe](fa600fe)), closes [#3760](#3760) * add redis sentinel support ([3387a15](3387a15)) * add redis sentinel support ([4337c89](4337c89)) * add redis sentinel support format lint ([aad8718](aad8718)) * Add support for `table_create_disposition` in bigquery job for offline store ([#3762](#3762)) ([6a728fe](6a728fe)) * Add support for in_cluster config and additional labels for bytewax materialization ([#3754](#3754)) ([2192e65](2192e65)) * Apply cache to load proto registry for performance ([#3702](#3702)) ([709c709](709c709)) * Make bytewax job write as mini-batches ([#3777](#3777)) ([9b0e5ce](9b0e5ce)) * Optimize bytewax pod resource with zero-copy ([9cf9d96](9cf9d96)) * Support GCS filesystem for bytewax engine ([#3774](#3774)) ([fb6b807](fb6b807))
# [0.35.0](feast-dev/feast@v0.34.0...v0.35.0) (2024-01-13) ### Bug Fixes * Add async refresh to prevent synchronous refresh in main thread ([feast-dev#3812](feast-dev#3812)) ([9583ed6](feast-dev@9583ed6)) * Adopt connection pooling for HBase ([feast-dev#3793](feast-dev#3793)) ([b3852bf](feast-dev@b3852bf)) * Bytewax engine create configmap from object ([feast-dev#3821](feast-dev#3821)) ([25e9775](feast-dev@25e9775)) * Fix warnings from deprecated paths and update default log level ([feast-dev#3757](feast-dev#3757)) ([68a8737](feast-dev@68a8737)) * improve parsing bytewax job status ([5983f40](feast-dev@5983f40)) * make bytewax settings unexposed ([ae1bb8b](feast-dev@ae1bb8b)) * Make generated temp table name escaped ([feast-dev#3797](feast-dev#3797)) ([175d796](feast-dev@175d796)) * Pin numpy version to avoid spammy deprecation messages ([774ed33](feast-dev@774ed33)) * Redundant feature materialization and premature incremental materialization timestamp updates ([feast-dev#3789](feast-dev#3789)) ([417b16b](feast-dev@417b16b)), closes [feast-dev#6](feast-dev#6) [feast-dev#7](feast-dev#7) * Resolve hbase hotspot issue when materializing ([feast-dev#3790](feast-dev#3790)) ([7376db8](feast-dev@7376db8)) * Set keepalives_idle None by default ([feast-dev#3756](feast-dev#3756)) ([8717e9b](feast-dev@8717e9b)) * Set upper bound for bigquery client due to its breaking changes ([2151c39](feast-dev@2151c39)) * UI project cannot handle fallback routes ([feast-dev#3766](feast-dev#3766)) ([96ece0f](feast-dev@96ece0f)) * update dependencies versions due to conflicts ([5dc0b24](feast-dev@5dc0b24)) * Update jackson and remove unnecessary logging ([feast-dev#3809](feast-dev#3809)) ([018d0ea](feast-dev@018d0ea)) * upgrade the pyarrow to latest v14.0.1 for CVE-2023-47248. ([052182b](feast-dev@052182b)) ### Features * Add get online feature rpc to gprc server ([feast-dev#3815](feast-dev#3815)) ([01db8cc](feast-dev@01db8cc)) * Add materialize and materialize-incremental rest endpoints ([feast-dev#3761](feast-dev#3761)) ([fa600fe](feast-dev@fa600fe)), closes [feast-dev#3760](feast-dev#3760) * add redis sentinel support ([3387a15](feast-dev@3387a15)) * add redis sentinel support ([4337c89](feast-dev@4337c89)) * add redis sentinel support format lint ([aad8718](feast-dev@aad8718)) * Add support for `table_create_disposition` in bigquery job for offline store ([feast-dev#3762](feast-dev#3762)) ([6a728fe](feast-dev@6a728fe)) * Add support for in_cluster config and additional labels for bytewax materialization ([feast-dev#3754](feast-dev#3754)) ([2192e65](feast-dev@2192e65)) * Apply cache to load proto registry for performance ([feast-dev#3702](feast-dev#3702)) ([709c709](feast-dev@709c709)) * Make bytewax job write as mini-batches ([feast-dev#3777](feast-dev#3777)) ([9b0e5ce](feast-dev@9b0e5ce)) * Optimize bytewax pod resource with zero-copy ([9cf9d96](feast-dev@9cf9d96)) * Support GCS filesystem for bytewax engine ([feast-dev#3774](feast-dev#3774)) ([fb6b807](feast-dev@fb6b807)) Signed-off-by: tokoko <[email protected]>
…rialization timestamp updates (feast-dev#3789) * SAASMLOPS-767 wait for jobs to complete Signed-off-by: James Crabtree <[email protected]> * SAASMLOPS-805 Stopgap change to fix duplicate materialization of data Signed-off-by: James Crabtree <[email protected]> * SAASMLOPS-805 save BYTEWAX_REPLICAS=1 Signed-off-by: James Crabtree <[email protected]> * SAASMLOPS-809 fix bytewax workers so they only process a single file (feast-dev#6) * SAASMLOPS-809 fix bytewax workers so they only process a single file * SAASMLOPS-809 fix newlines Signed-off-by: James Crabtree <[email protected]> * SAASMLOPS-833 add configurable job timeout (feast-dev#7) * SAASMLOPS-833 add configurable job timeout * SAASMLOPS-833 fix whitespace Signed-off-by: James Crabtree <[email protected]> * develop Run large materializations in batches of pods Signed-off-by: James Crabtree <[email protected]> * master Set job_batch_size at least equal to max_parallelism Signed-off-by: James Crabtree <[email protected]> * master clarity max_parallelism description Signed-off-by: James Crabtree <[email protected]> * master resolve bug that causes materialization to continue after job error Signed-off-by: James Crabtree <[email protected]> * master resolve bug causing pod logs to not be printed Signed-off-by: James Crabtree <[email protected]> --------- Signed-off-by: James Crabtree <[email protected]> Signed-off-by: Attila Toth <[email protected]>
# [0.35.0](feast-dev/feast@v0.34.0...v0.35.0) (2024-01-13) ### Bug Fixes * Add async refresh to prevent synchronous refresh in main thread ([feast-dev#3812](feast-dev#3812)) ([9583ed6](feast-dev@9583ed6)) * Adopt connection pooling for HBase ([feast-dev#3793](feast-dev#3793)) ([b3852bf](feast-dev@b3852bf)) * Bytewax engine create configmap from object ([feast-dev#3821](feast-dev#3821)) ([25e9775](feast-dev@25e9775)) * Fix warnings from deprecated paths and update default log level ([feast-dev#3757](feast-dev#3757)) ([68a8737](feast-dev@68a8737)) * improve parsing bytewax job status ([5983f40](feast-dev@5983f40)) * make bytewax settings unexposed ([ae1bb8b](feast-dev@ae1bb8b)) * Make generated temp table name escaped ([feast-dev#3797](feast-dev#3797)) ([175d796](feast-dev@175d796)) * Pin numpy version to avoid spammy deprecation messages ([774ed33](feast-dev@774ed33)) * Redundant feature materialization and premature incremental materialization timestamp updates ([feast-dev#3789](feast-dev#3789)) ([417b16b](feast-dev@417b16b)), closes [feast-dev#6](feast-dev#6) [feast-dev#7](feast-dev#7) * Resolve hbase hotspot issue when materializing ([feast-dev#3790](feast-dev#3790)) ([7376db8](feast-dev@7376db8)) * Set keepalives_idle None by default ([feast-dev#3756](feast-dev#3756)) ([8717e9b](feast-dev@8717e9b)) * Set upper bound for bigquery client due to its breaking changes ([2151c39](feast-dev@2151c39)) * UI project cannot handle fallback routes ([feast-dev#3766](feast-dev#3766)) ([96ece0f](feast-dev@96ece0f)) * update dependencies versions due to conflicts ([5dc0b24](feast-dev@5dc0b24)) * Update jackson and remove unnecessary logging ([feast-dev#3809](feast-dev#3809)) ([018d0ea](feast-dev@018d0ea)) * upgrade the pyarrow to latest v14.0.1 for CVE-2023-47248. ([052182b](feast-dev@052182b)) ### Features * Add get online feature rpc to gprc server ([feast-dev#3815](feast-dev#3815)) ([01db8cc](feast-dev@01db8cc)) * Add materialize and materialize-incremental rest endpoints ([feast-dev#3761](feast-dev#3761)) ([fa600fe](feast-dev@fa600fe)), closes [feast-dev#3760](feast-dev#3760) * add redis sentinel support ([3387a15](feast-dev@3387a15)) * add redis sentinel support ([4337c89](feast-dev@4337c89)) * add redis sentinel support format lint ([aad8718](feast-dev@aad8718)) * Add support for `table_create_disposition` in bigquery job for offline store ([feast-dev#3762](feast-dev#3762)) ([6a728fe](feast-dev@6a728fe)) * Add support for in_cluster config and additional labels for bytewax materialization ([feast-dev#3754](feast-dev#3754)) ([2192e65](feast-dev@2192e65)) * Apply cache to load proto registry for performance ([feast-dev#3702](feast-dev#3702)) ([709c709](feast-dev@709c709)) * Make bytewax job write as mini-batches ([feast-dev#3777](feast-dev#3777)) ([9b0e5ce](feast-dev@9b0e5ce)) * Optimize bytewax pod resource with zero-copy ([9cf9d96](feast-dev@9cf9d96)) * Support GCS filesystem for bytewax engine ([feast-dev#3774](feast-dev#3774)) ([fb6b807](feast-dev@fb6b807)) Signed-off-by: Attila Toth <[email protected]>
What this PR does / why we need it:
This PR fixes 2 issues encountered while attempting to run very large materializations (Several billion rows as part of an initial run of online materialization).
Which issue(s) this PR fixes:
Fixes #3786
Fixes #3787
Fixes #3788