-
Notifications
You must be signed in to change notification settings - Fork 2.4k
RepeatUnion improvements: CALCITE-3673 + CALCITE-4054 #2690
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RepeatUnion improvements: CALCITE-3673 + CALCITE-4054 #2690
Conversation
@zabetak you contributed a lot on the original implementation of this (experimental) feature. If you have a bit of time, could you please take a look at this patch to address some existing issues on the current implementation? Thanks. |
Hey @rubenada , I don't have much time the following 1-2 weeks to dive into this :/ I had a quick look but cannot have a clear picture in 10-15 min. I am afraid that putting the burden of adding/removing the transient table to the About CALCITE-3673, can't we remove the table while close the enumerator of the |
The repeat union closes the enumerator of the iteration enumerable after every iteration. So this would also close the enumerator of the
The table parameter of |
Thanks for your feedback @zabetak . No worries, there is no rush for this PR (as long as it makes it for 1.30 :P )
Yes, that could be a possible solution. However, there is a small problem, this This is a minor issue though, in fact from the two tickets involved in this PR, CALCITE-3673 is the "minor" one (the transient table left behind in the schema, but the application does not crash). On the contrary, CALCITE-4054 is much more serious because we have a NPE. The problem here is when the transient table is added to the schema; and as I see it, the only possible solution to guarantee that the table will be added on time, and will be accessible for all the operators inside RepeatUnion's inputs is putting the addTable logic inside the RepeatUnion itself (this guarantees fixing CALCITE-4054). Following this logic, my understanding is that analogously the removeTable logic should be put in the RepeatUnion's It is true that this change puts some extra burden on the RepeatUnion, and it makes it less generic and potentially reusable on other use cases (and this was one of our main concerns when we first designed this feature). However, I think it is more important that the primary RepeatUnion use case (recursive union) is able to perform flawless. In any case, at this point this main (and only?) RepeatUnion use case is very tight to its transient table (see |
It's true that the Apart from that, it might worth brainstorming a bit if the problem could be solved by handling the addition/removal of transient tables at another level and not during the execution of some specific operator; I don't have ideas at the moment 😆 |
@zabetak @thomasrebele I would like to get these issues fixed in the next release 1.30. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some minor comments that may be worth addressing now otherwise LGTM.
/** | ||
* Transient table where repeat union's intermediate results will be stored. | ||
*/ | ||
protected final RelOptTable transientTable; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to update the class javadoc based on this change to explain how is this table supposed to be used? Also can this be null (optional) in case some concrete implementations do not want to make use of it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Field converted into nullable.
I do not think the class javadoc needs to be change, I prefer to keep its high-level description as it is (the info about the table can be found in the field javadoc or looking at its usage e.g. in EnumerableRepeatUnion)
+ " EnumerableTableSpool(readType=[LAZY], writeType=[LAZY], table=[[#DELTA#]])\n" | ||
+ " EnumerableCalc(expr#0..8=[{inputs}], empid=[$t4], name=[$t6])\n" | ||
+ " EnumerableCorrelate(correlation=[$cor1], joinType=[inner], requiredColumns=[{1}])\n" | ||
// It is important to have EnumerableCorrelate + #DELTA# table scan on its right |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An explanation of why is it important would be helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added clarification in the comment
It would be nice if the SchemaPlus.removeTable(String) could be avoided. As there is no easy way around this: LGTM. |
51cb089
to
f7e7834
Compare
[CALCITE-4054] RepeatUnion containing a Correlate with a transientScan on its RHS causes NPE
f7e7834
to
48009f1
Compare
Thanks for the feedback. I have addressed the suggestions and rebased. |
This PR solves some known issues around the RepeatUnion operator, which was implemented via CALCITE-2812.
These issues are:
[CALCITE-3673] ListTransientTable should not leave tables in the schema
[CALCITE-4054] RepeatUnion containing a Correlate with a transientScan on its RHS causes NPE
The root cause of both is the same: how / when the transient table (temporary table that holds RepeatUnion intermediate results) is added / removed in the schema.
As a solution, it is proposed that this responsibility shall be placed in the RepeatUnion operator itself:
EnumerableRepeatUnion#implement
; this will ensure that the table will be ready for its inputs' operators (fixes CALCITE-4054).EnumerableDefaults#repeatUnion#close
(fixes CALCITE-3673).A new method
SchemaPlus#removeTable
needs to be added for the second point. A (no-op) default implementation is provided in the interface for backwards compatibility, to be removed in a future version.