Skip to content

[CALCITE-6435] SqlToRel conversion of IN expressions may lead to incorrect simplifications #3835

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 11, 2024

Conversation

kgyrtkirk
Copy link
Member

Conversion path for comparisions generated from IN expressions was handling types differently. This may have lead to some over-simplification in some cases.

Altered the conversion to do the full SqlToRex conversion steps for these generated nodes as well. Added an extra safeguard check to RexSimplify to prevent the bug from being triggered.

@kgyrtkirk kgyrtkirk force-pushed the literal-type-mismtach3 branch from b1d979c to 84069b2 Compare June 27, 2024 17:19

if (prevLiteral != null
&& literal.getType().equals(prevLiteral.getType())
&& !literal.equals(prevLiteral)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so two literals can be equal but their types can be different?
one could argue that this is a bug in SqlLiteral::equals.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the situation was caused by two literals which had the same value; but different types (VARCHAR; CHAR(n))

actually the predicate based simplification also does this (beyond other things); but there are situations in which its disabled.

Copy link
Contributor

@YiwenWu YiwenWu Jun 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think VARCHAR and CHAR should be considered equivalent if RexLiteral.value is the same

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CHAR(N) indicates that the literal has to be padded with spaces, whereas VARCHAR indicates that trailing spaces are to be ignored. So two literals with identical strings but different types can have different values.

@@ -1030,7 +1030,7 @@ private void checkGroupBySingleSortLimit(boolean approx) {
+ "intervals=[[1900-01-09T00:00:00.000Z/2992-01-10T00:00:00.000Z]], "
+ "filter=[AND("
+ "=($3, 'High Top Dried Mushrooms'), "
+ "SEARCH($87, Sarg['Q2', 'Q3']:CHAR(2)), "
+ "SEARCH($87, Sarg['Q2':VARCHAR, 'Q3':VARCHAR]:VARCHAR), "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this change surprising.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was surprised too ; however this is the same kind of expressions SqlToRel would produce if the input would have been a set of comparisison ; so the IN rewrite now matches that

@kgyrtkirk kgyrtkirk force-pushed the literal-type-mismtach3 branch from 84069b2 to f08ddbf Compare June 28, 2024 07:27
@mihaibudiu
Copy link
Contributor

Let's wait a bit to see whether anyone else has comments.

…rrect simplifications

Conversion path for comparisions generated from IN expressions was handling types differently.
This may have lead to some over-simplification in some cases.

Altered the conversion to do the full SqlToRex conversion steps for these generated nodes as well.
Added an extra safeguard check to RexSimplify to prevent the bug from being triggered.
@kgyrtkirk kgyrtkirk force-pushed the literal-type-mismtach3 branch from f08ddbf to 0a4a974 Compare June 29, 2024 06:45
Copy link

@YiwenWu
Copy link
Contributor

YiwenWu commented Jun 29, 2024

I think it can be simpler to modify the judgment of RexSimplify#simplifyAnd2ForUnknownAsFalse?

if (prevLiteral != null && !literal.equalsWithTypeName(prevLiteral)) {
     return rexBuilder.makeLiteral(false);
 }

where RexLiteral#equalsWithTypeName compares to TypeName instead of Type in RexLiteral#equals . In this case, varchar Sebastian and char Sebastian can be equivalent.

@kgyrtkirk
Copy link
Member Author

I think it can be simpler to modify the judgment of RexSimplify#simplifyAnd2ForUnknownAsFalse?

if (prevLiteral != null && !literal.equalsWithTypeName(prevLiteral)) {
     return rexBuilder.makeLiteral(false);
 }

where RexLiteral#equalsWithTypeName compares to TypeName instead of Type in RexLiteral#equals . In this case, varchar Sebastian and char Sebastian can be equivalent.

I don't see any such method in
RexLiteral did I missed it?

I would be kinda against of the introduction of such method because it would place custom type comparision logic inside a Rex implementation...

I might have not mentioned this here...but
weakening these in RexSimplify has no adverse effects because predicate based simplification can do this too(and more) ; I think these map based things can even be removed....

@@ -1753,7 +1756,7 @@ private <C extends Comparable<C>> RexNode simplifyAnd2ForUnknownAsFalse(
if (literal2 == null) {
continue;
}
if (!literal1.equals(literal2)) {
if (literal1.getType().equals(literal2.getType()) && !literal1.equals(literal2)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm puzzled about the necessity of adding extra type equals checks. What specific issues might arise without them in SQL simplification?

// TODO: remove requireNonNull when checkerframework issue resolved
ensureSqlType(requireNonNull(pair.left, "pair.left").getType(),
bb.convertExpression(pair.right)))));
RexUtil.composeConjunction(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the focus solely on the SQL IN case, or might there be a need for adjustments elsewhere too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not really; this conjunction is there for complex expressions; which will be there for NOT IN as well:

(a,b) NOT IN ((1,2), (3,4))
NOT ((a=1 && b=2) || (a=3 && b=4) )

@YiwenWu
Copy link
Contributor

YiwenWu commented Jun 30, 2024

I think it can be simpler to modify the judgment of RexSimplify#simplifyAnd2ForUnknownAsFalse?

if (prevLiteral != null && !literal.equalsWithTypeName(prevLiteral)) {
     return rexBuilder.makeLiteral(false);
 }

where RexLiteral#equalsWithTypeName compares to TypeName instead of Type in RexLiteral#equals . In this case, varchar Sebastian and char Sebastian can be equivalent.

I don't see any such method in RexLiteral did I missed it?

I would be kinda against of the introduction of such method because it would place custom type comparision logic inside a Rex implementation...

I might have not mentioned this here...but weakening these in RexSimplify has no adverse effects because predicate based simplification can do this too(and more) ; I think these map based things can even be removed....

I think it can be simpler to modify the judgment of RexSimplify#simplifyAnd2ForUnknownAsFalse?

if (prevLiteral != null && !literal.equalsWithTypeName(prevLiteral)) {
     return rexBuilder.makeLiteral(false);
 }

where RexLiteral#equalsWithTypeName compares to TypeName instead of Type in RexLiteral#equals . In this case, varchar Sebastian and char Sebastian can be equivalent.

I don't see any such method in RexLiteral did I missed it?

I would be kinda against of the introduction of such method because it would place custom type comparision logic inside a Rex implementation...

I might have not mentioned this here...but weakening these in RexSimplify has no adverse effects because predicate based simplification can do this too(and more) ; I think these map based things can even be removed....

Currently, there is no RexLiteral#equalsWithTypeName method.

It seems two distinct approaches. This PR handles CHAR to VARCHAR conversion early in SqlToRelConverter , enabling direct conjunction in the simplification stage. Alternatively, refine the simplifyAnd2ForUnknownAsFalse method to bypass false judgments for RexLiteral with identical value and typeName(SqlTypeName).

There is no problem based on the approach of this PR. Only two small points that need to be confirmed:

  1. Addressing range concerns, the CHAR to VARCHAR conversion is specific to IN conditions?
  2. Is an update to simplifyAnd2ForUnknownAsFalse necessary, what scenarios would benefit from this change?

@kgyrtkirk
Copy link
Member Author

  1. Addressing range concerns, the CHAR to VARCHAR conversion is specific to IN conditions?

the issue arised from the fact that SqlToRel had a different route to produce = checks when IN was rewritten to a set of OR conditions; this PR only makes that extra route go away - and shows the converter the full = expression to do the same conversion as it would have done in case it would have been part of the original query

  1. Is an update to simplifyAnd2ForUnknownAsFalse necessary, what scenarios would benefit from this change?

not mandatory for the current bug ; as 1 solves that ; but I would rather restrict the cases in which RexSimplify could over-simplify things; as its a quite frequently called part.

@kgyrtkirk kgyrtkirk added the LGTM-will-merge-soon Overall PR looks OK. Only minor things left. label Jul 9, 2024
Copy link
Member

@caicancai caicancai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kgyrtkirk kgyrtkirk merged commit 73846cc into apache:main Jul 11, 2024
35 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
LGTM-will-merge-soon Overall PR looks OK. Only minor things left.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants