CALCITE-6804

Anti-join with WHERE NOT EXISTS syntax has corrupted condition

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 1.39.0

    Description

      Queries like:

      SELECT * FROM
      (
        SELECT field1 FROM table1 JOIN table2 ON table1.field1 = table2.field2
      ) selected
      WHERE NOT EXISTS (select 1 from table3 where table3.field3 = selected.field1)
      

      are being converted into

      SELECT * FROM
      (
        SELECT field1 FROM table1 JOIN table2 ON table1.field1 = table2.field2
      ) selected
      WHERE NOT EXISTS (select 1 from table3 where table3.field3 = table2.<random_field>)
      

      Example I added to RelToSqlConverterTest:

        @Test void testAntiJoinWithWhereNotExists() {
          final String sql = "SELECT * FROM (select * from (select e1.\"product_id\"\n"
              + "FROM \"foodmart\".\"product\" e1 LEFT JOIN \"foodmart\".\"product\" e3 on e1.\"product_id\" = e3.\"product_id\") s where true) selected where not exists\n"
              + "(select 1 from \"foodmart\".\"product\" e2 where e2.\"product_id\" = selected.\"product_id\")";
      
      
          // Expected SQL: the correlated column must refer to the sub-query alias "t".
          final String expected = "SELECT *\n"
              + "FROM (SELECT \"product\".\"product_id\"\n"
              + "FROM \"foodmart\".\"product\"\n"
              + "LEFT JOIN \"foodmart\".\"product\" AS \"product0\" ON \"product\".\"product_id\" = \"product0\".\"product_id\""
              + ") AS \"t\"\n"
              + "WHERE EXISTS (SELECT *\nFROM \"foodmart\".\"product\"\nWHERE \"product_id\" = \"t\".\"product_id\")";
          sql(sql).ok(expected);
        }
      
      Expected: is "SELECT ...) AS \"t\"\nWHERE EXISTS (... WHERE \"product_id\" = \"t\".\"product_id\")"
          but: was "SELECT ...) AS \"t\"\nWHERE EXISTS (... WHERE \"product_id\" = \"product1\".\"product_class_id\")"
      

      product1 is an alias generated for one of the sub-queries, and product_class_id is a field of the table scan that was wrongly used to resolve the reference.
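
      The mismatch above is consistent with a field being looked up by position in the wrong row type. The following is a toy model only, not Calcite code: the Scope class is hypothetical, and the two column lists are assumptions chosen to mirror the test (the sub-query exposing a single column, and the table scan listing product_class_id first). It only illustrates how the same ordinal can render as two different qualified names.

```java
import java.util.List;

/** Toy model (not Calcite code): rendering a field ordinal against a scope. */
final class AliasResolutionSketch {

  /** Hypothetical stand-in for an alias and the ordered field names it exposes. */
  static final class Scope {
    final String alias;
    final List<String> fields;

    Scope(String alias, List<String> fields) {
      this.alias = alias;
      this.fields = fields;
    }

    /** Renders the field at the given ordinal as "alias"."field". */
    String render(int ordinal) {
      return "\"" + alias + "\".\"" + fields.get(ordinal) + "\"";
    }
  }

  public static void main(String[] args) {
    // Assumption: the sub-query aliased "t" projects exactly one column.
    Scope subQuery = new Scope("t", List.of("product_id"));
    // Assumption: the underlying table scan lists product_class_id first.
    Scope tableScan = new Scope("product1", List.of("product_class_id", "product_id"));

    int correlatedOrdinal = 0; // the outer field referenced inside the EXISTS sub-query

    System.out.println(subQuery.render(correlatedOrdinal));  // prints "t"."product_id"
    System.out.println(tableScan.render(correlatedOrdinal)); // prints "product1"."product_class_id"
  }
}
```

      Under these assumptions, resolving ordinal 0 against the table scan instead of the sub-query reproduces the shape of the wrong condition reported by the test.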

      My high-level understanding is that a query with WHERE NOT EXISTS syntax is represented as a LogicalFilter, so the relevant pieces of code (such as AliasReplacementShuttle and visitAntiOrSemiJoin) are not invoked.

      The visit of the Filter node then builds the alias context incorrectly.

      Directions I am trying:

      • Duplicate the anti-join visit logic within the scope of the Filter visit.
      • Add an explicit rule that converts the Filter to a Join manually. This feels artificial, because the planner would have to be triggered, and that requires a convention.
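
      The second direction can be pictured with a small, self-contained sketch. Everything here is hypothetical: Node, Scan, NotExistsFilter, AntiJoin and rewrite are illustrative names, not Calcite's RelNode classes or planner rules, and conditions are kept as plain strings. It only shows the shape of the rewrite, in which a Filter carrying a NOT EXISTS sub-query becomes an anti-join between the Filter's input and the sub-query.

```java
/** Toy plan rewrite (hypothetical classes, not Calcite's API). */
final class FilterToAntiJoinSketch {

  interface Node {
    String describe();
  }

  static final class Scan implements Node {
    final String table;

    Scan(String table) {
      this.table = table;
    }

    public String describe() {
      return "Scan(" + table + ")";
    }
  }

  /** A filter whose predicate is NOT EXISTS over a correlated sub-query. */
  static final class NotExistsFilter implements Node {
    final Node input;
    final Node subQuery;
    final String correlation;

    NotExistsFilter(Node input, Node subQuery, String correlation) {
      this.input = input;
      this.subQuery = subQuery;
      this.correlation = correlation;
    }

    public String describe() {
      return "Filter(NOT EXISTS(" + subQuery.describe() + " WHERE " + correlation + "), "
          + input.describe() + ")";
    }
  }

  static final class AntiJoin implements Node {
    final Node left;
    final Node right;
    final String condition;

    AntiJoin(Node left, Node right, String condition) {
      this.left = left;
      this.right = right;
      this.condition = condition;
    }

    public String describe() {
      return "AntiJoin(" + left.describe() + ", " + right.describe() + ", ON " + condition + ")";
    }
  }

  /** Rewrites a NotExistsFilter into an AntiJoin; any other node is returned unchanged. */
  static Node rewrite(Node node) {
    if (node instanceof NotExistsFilter) {
      NotExistsFilter filter = (NotExistsFilter) node;
      return new AntiJoin(rewrite(filter.input), rewrite(filter.subQuery), filter.correlation);
    }
    return node;
  }

  public static void main(String[] args) {
    Node plan = new NotExistsFilter(
        new Scan("selected"), new Scan("table3"), "table3.field3 = selected.field1");
    System.out.println(rewrite(plan).describe());
  }
}
```

      In this toy form the correlation condition is carried over unchanged, so both sides of the anti-join stay visible when aliases are assigned. Whether a real rule can be applied at this point without a planner run is exactly the open question raised in the second bullet.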

            People

              antonkw Anton Kovalevsky