Skip to content

RepeatUnion improvements: CALCITE-3673 + CALCITE-4054 #2690

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

rubenada
Copy link
Contributor

This PR solves some known issues around the RepeatUnion operator, which was implemented via CALCITE-2812.

These issues are:
[CALCITE-3673] ListTransientTable should not leave tables in the schema
[CALCITE-4054] RepeatUnion containing a Correlate with a transientScan on its RHS causes NPE

The root cause of both is the same: how / when the transient table (temporary table that holds RepeatUnion intermediate results) is added / removed in the schema.

As a solution, it is proposed that this responsibility shall be placed in the RepeatUnion operator itself:

  • It will add the table to the schema at the beginning of its process, see EnumerableRepeatUnion#implement; this will ensure that the table will be ready for its inputs' operators (fixes CALCITE-4054).
  • It will remove the table from the schema at the end of its process, see EnumerableDefaults#repeatUnion#close (fixes CALCITE-3673).

A new method SchemaPlus#removeTable needs to be added for the second point. A (no-op) default implementation is provided in the interface for backwards compatibility, to be removed in a future version.

@rubenada
Copy link
Contributor Author

@zabetak you contributed a lot on the original implementation of this (experimental) feature. If you have a bit of time, could you please take a look at this patch to address some existing issues on the current implementation? Thanks.

@zabetak
Copy link
Member

zabetak commented Jan 15, 2022

Hey @rubenada , I don't have much time the following 1-2 weeks to dive into this :/ I had a quick look but cannot have a clear picture in 10-15 min. I am afraid that putting the burden of adding/removing the transient table to the EnumerableRepeatUnion operator will make the latter less generic since it implies that we will not be able to use it with other sub-plans.

About CALCITE-3673, can't we remove the table while close the enumerator of the ListTransientTable?

@thomasrebele
Copy link
Contributor

The repeat union closes the enumerator of the iteration enumerable after every iteration. So this would also close the enumerator of the ListTransientTable, therefore making the table unavailable for the iterations after the first one. Alternatively we could introduce a TableSpoolScope operator that initializes the table and removes it when it is closed:

TableSpoolScope(delta) // <- creates the ListTransientScan and removes the table from the schema at the end
  RepeatUnion(iterations=..., all=...)
     TableSpool(delta)
        TableScan

The table parameter of EnumerableRepeatUnion could be made optional, allowing to keep the same behavior of the operator as before. @zabetak, could you be more specific about the uses with other sub-plans that you have in mind?

@rubenada
Copy link
Contributor Author

Thanks for your feedback @zabetak . No worries, there is no rush for this PR (as long as it makes it for 1.30 :P )

About CALCITE-3673, can't we remove the table while close the enumerator of the ListTransientTable?

Yes, that could be a possible solution. However, there is a small problem, this close method can be called multiple times in the life time of the RepeatUnion (e.g. multiple iterations of the iterativeRel; or in case this iterativeRel contains a correlation, etc.). So the removeTable would be called multiple times, only the first one being effective.

This is a minor issue though, in fact from the two tickets involved in this PR, CALCITE-3673 is the "minor" one (the transient table left behind in the schema, but the application does not crash).

On the contrary, CALCITE-4054 is much more serious because we have a NPE. The problem here is when the transient table is added to the schema; and as I see it, the only possible solution to guarantee that the table will be added on time, and will be accessible for all the operators inside RepeatUnion's inputs is putting the addTable logic inside the RepeatUnion itself (this guarantees fixing CALCITE-4054). Following this logic, my understanding is that analogously the removeTable logic should be put in the RepeatUnion's close (i.e. the operator will take responsibility for adding/removing its temporary table). This solution guarantees a proper (and singly executed) add/remove table.

It is true that this change puts some extra burden on the RepeatUnion, and it makes it less generic and potentially reusable on other use cases (and this was one of our main concerns when we first designed this feature). However, I think it is more important that the primary RepeatUnion use case (recursive union) is able to perform flawless. In any case, at this point this main (and only?) RepeatUnion use case is very tight to its transient table (see RelBuilder#repeatUnion, without the valid table the RepeatUnion cannot be created). If in the future this changes, and we see another use case for the RepeatUnion that does not require the transient table, we could re-work the operator, e.g. we could make the new RepeatUnion's transientTable nullable (to be used in the recursive union scenario; and null in this theoretical new scenario), and adapt the code generation in EnumerableRepeatUnion#implement (if transientTable != null generate the addTable / removeTable code, otherwise not); but at this point IMO it is more important to fix the current use case than thinking about operator re-usage in potential new use cases (bearing in mind that nothing is set on stone, all the operators involved in this feature are @Experimental, so they could be re-worked in the future).

@zabetak
Copy link
Member

zabetak commented Jan 17, 2022

It's true that the RelBuilder API around RepeatUnion is tight to a table (it may not even be a TransientTable but let's not focus on this right now) but the javadoc of RepeatUnion and its implementation (EnumerableRepeatUnion) do not mention at all tables, spools, or other operators. If you want to change the contract maybe it would be better to send an email to the dev list to see what people think and if there are others using RepeatUnion in some other way.

Apart from that, it might worth brainstorming a bit if the problem could be solved by handling the addition/removal of transient tables at another level and not during the execution of some specific operator; I don't have ideas at the moment 😆

@rubenada
Copy link
Contributor Author

@zabetak @thomasrebele I would like to get these issues fixed in the next release 1.30.
The proposed PR might not be perfect, but I'm not sure we will have much time for a deeper refactoring. On the positive side, this patch fixes 2 problems, and maintains RelBuilder API (so it is backwards compatible), so I think it is an improvement compared to the current status (and it is no hindrance to potential further work / re-designs around this feature in future versions). Wdyt?

Copy link
Member

@zabetak zabetak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some minor comments that may be worth addressing now otherwise LGTM.

Comment on lines 67 to 73
/**
* Transient table where repeat union's intermediate results will be stored.
*/
protected final RelOptTable transientTable;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to update the class javadoc based on this change to explain how is this table supposed to be used? Also can this be null (optional) in case some concrete implementations do not want to make use of it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Field converted into nullable.
I do not think the class javadoc needs to be change, I prefer to keep its high-level description as it is (the info about the table can be found in the field javadoc or looking at its usage e.g. in EnumerableRepeatUnion)

+ " EnumerableTableSpool(readType=[LAZY], writeType=[LAZY], table=[[#DELTA#]])\n"
+ " EnumerableCalc(expr#0..8=[{inputs}], empid=[$t4], name=[$t6])\n"
+ " EnumerableCorrelate(correlation=[$cor1], joinType=[inner], requiredColumns=[{1}])\n"
// It is important to have EnumerableCorrelate + #DELTA# table scan on its right
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An explanation of why is it important would be helpful.

Copy link
Contributor Author

@rubenada rubenada Feb 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added clarification in the comment

@thomasrebele
Copy link
Contributor

It would be nice if the SchemaPlus.removeTable(String) could be avoided. As there is no easy way around this: LGTM.

@rubenada rubenada force-pushed the CALCITE-4054_CALCITE-3673_FixRepeatUnion branch from 51cb089 to f7e7834 Compare February 14, 2022 15:05
[CALCITE-4054] RepeatUnion containing a Correlate with a transientScan on its RHS causes NPE
@rubenada rubenada force-pushed the CALCITE-4054_CALCITE-3673_FixRepeatUnion branch from f7e7834 to 48009f1 Compare February 14, 2022 15:07
@rubenada
Copy link
Contributor Author

Thanks for the feedback. I have addressed the suggestions and rebased.
If no new remarks come up, I plan to merge the PR in the coming days.

@rubenada rubenada added the LGTM-will-merge-soon Overall PR looks OK. Only minor things left. label Feb 15, 2022
@rubenada rubenada merged commit 9c4f3bb into apache:master Feb 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
LGTM-will-merge-soon Overall PR looks OK. Only minor things left.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants