Skip to content

[CALCITE-5997] OFFSET operator is incorrectly unparsed #3423

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 20, 2023

Conversation

mihaibudiu
Copy link
Contributor

No description provided.

@mihaibudiu
Copy link
Contributor Author

@tanclary I think you have added the "offset" field, so perhaps you want to review this PR. Thank you.

@@ -78,7 +78,13 @@ public SqlItemOperator(String name,
SqlWriter writer, SqlCall call, int leftPrec, int rightPrec) {
call.operand(0).unparse(writer, leftPrec, 0);
final SqlWriter.Frame frame = writer.startList("[", "]");
call.operand(1).unparse(writer, 0, 0);
if (offset == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to add something similar for the other operators

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

safe_offset, safe_ordinal etc

Copy link
Contributor Author

@mihaibudiu mihaibudiu Sep 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, Calcite currently treats a simple array index as a SAFE_ORDINAL index, as the comment on SqlOperator ITEM says:

   /** The SQL standard calls the ARRAY variant a
   * <array element reference>. Index is 1-based. The standard says
   * to raise "data exception - array element error" but we currently return
   * null. */
   public static final SqlOperator ITEM =
      new SqlItemOperator("ITEM", OperandTypes.ARRAY_OR_MAP, 1, true);

The last true argument is for the safe parameter.

If we take this approach, a raw index is unparsed as a SAFE_ORDINAL, reflecting the implementation semantics.
If you want to preserve the original syntax we need to carry additional information in the SqlItemOperator.
What is the right solution?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words, should the following test pass?

@Test void testOffset() {
    sql("SELECT ARRAY[2,4,6][2]")
        .ok("SELECT (ARRAY[2, 4, 6])[SAFE_ORDINAL(2)]");
    sql("SELECT ARRAY[2,4,6][OFFSET(2)]")
        .ok("SELECT (ARRAY[2, 4, 6])[OFFSET(2)]");
    sql("SELECT ARRAY[2,4,6][SAFE_OFFSET(2)]")
        .ok("SELECT (ARRAY[2, 4, 6])[SAFE_OFFSET(2)]");
    sql("SELECT ARRAY[2,4,6][SAFE_ORDINAL(2)]")
        .ok("SELECT (ARRAY[2, 4, 6])[SAFE_ORDINAL(2)]");
  }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would imagine you do not want to unparse with SAFE_ORDINAL unless you are using BigQuery as it is dialect-specific syntax (to my knowledge). The comment you mentioned- it seems like a good time to address that as well if possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The BigQuerySqlDialect already handles this correctly.
The bug I am fixing was caused by SqlOperatorTest, where tests look like this:

final SqlOperatorFixture f0 = fixture();
    f0.setFor(SqlLibraryOperators.SAFE_ORDINAL);
    final SqlOperatorFixture f = f0.withLibrary(SqlLibrary.BIG_QUERY);
    f.checkScalar("ARRAY[2,4,6][SAFE_ORDINAL(3)]", "6", "INTEGER");    

Notice that this test only sets up the library, but not the dialect. In a new test fixture that I am writing, which unparses and reparses the query before executing, this test fails because the query is not unparsed correctly. In this case the unparsing does not go through the dialect.

I don't see a way to set the dialect in the SqlOperatorFixture. Do you have a suggestion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moreover, the SQL parser will happily parse the SAFE_* functions without the big query dialect.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you can create the fixture with a specified parser config (see withParserConfig()) which I believe certain dialects like BigQuery have default parser configs. Perhaps you could pull on this thread a bit, the functionality may already be there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but I still need to unparse such a query when the dialect is NOT specified.
The parser accepts SAFE_ORDINAL, so the unparser should too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see what you're saying. Yeah I believe when I originally added this cluster of functions that I essentially pair the OFFSET operator with ITEM because at the time they seemed identical. Admittedly I did not give enough thought to the unparsing so maybe treating them more independently would help also, given that that approach seems to work for most other BQ-specific operators.

@mihaibudiu
Copy link
Contributor Author

I hope I have found an adequate solution.
The BIG_QUERY dialect handled unparsing correctly, but since SAFE_ORDINAL and other related functions can be parsed with a standard SQL parser, there must be a way to unparse them even when the BIG_QUERY dialect is not being used.


public SqlItemOperator(String name,
SqlSingleOperandTypeChecker operandTypeChecker,
int offset, boolean safe) {
int offset, boolean safe, boolean simple) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think String funcId is more valuable than boolean simple.

// Technically these functions are extensions to standard SQL,
// however, the standard SQL parser accepts them,
// so there must be a way to unparse them.
String function = safe ? "SAFE_OFFSET" : "OFFSET";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have String funcId, we can write the code

if (funcId != null) {
      final SqlWriter.Frame offsetFrame = writer.startFunCall(funcId);
      call.operand(1).unparse(writer, 0, 0);
      writer.endFunCall(offsetFrame);
}
else {
      call.operand(1).unparse(writer, 0, 0);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is much better, thank you.

@@ -725,6 +725,37 @@ protected static SortedSet<String> keywords(@Nullable String dialect) {
+ ".*");
}

/** Test case for <a href="https://issues.apache.org/jira/browse/CALCITE-5997">[CALCITE-5997]</a>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

IMO, a tag should contain Jira summary.

@JiajunBernoulli
Copy link
Contributor

I think the current PR looks good.

@tanclary, Do you have any suggestions?

@tanclary
Copy link
Contributor

@JiajunBernoulli @mihaibudiu No, I think this looks good. I appreciate you opening the bug/fix. If you squash down to one commit I would be happy to merge.

@mihaibudiu
Copy link
Contributor Author

@JiajunBernoulli @mihaibudiu No, I think this looks good. I appreciate you opening the bug/fix. If you squash down to one commit I would be happy to merge.

done

@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

100.0% 100.0% Coverage
0.0% 0.0% Duplication

@JiajunBernoulli JiajunBernoulli added the LGTM-will-merge-soon Overall PR looks OK. Only minor things left. label Sep 19, 2023
@JiajunBernoulli JiajunBernoulli merged commit 9ce7077 into apache:main Sep 20, 2023
@mihaibudiu mihaibudiu deleted the issue5997 branch September 20, 2023 18:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
LGTM-will-merge-soon Overall PR looks OK. Only minor things left.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants