Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-6691

Qualify expressions not updated when merging two Project

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 1.39.0
    • None

    Description

      Given a query like the below

      WITH t0 AS (
        SELECT
          name,
          salary
          FROM mytable
      ),
      t1 AS (
        SELECT
          t0.name
          FROM t0
          qualify row_number() over (partition by t0.name order by t0.salary desc) = 1
      )
      select name from t1
      

      We end up with the wrong query plan below:

      [Logical plan]
      LogicalProject(name=[$0])
        LogicalFilter(condition=[$1])
          LogicalProject(name=[$5], QualifyExpression=[=(ROW_NUMBER() OVER (PARTITION BY $0 ORDER BY $1 DESC), 1)])
            LogicalTableScan(table=[[mytable]])
      

      Under PARTITION BY $0, the $0 should be referencing t0.name which on the select list on the same node is name=[$5]. So something is off.


      After some debugging I can trace back the issue.

      Here is the query plan unsquashed:

      [Logical plan]
      LogicalProject(name=[$0])
        LogicalProject(name=[$0])
          LogicalFilter(condition=[$1])
            LogicalProject(name=[$0], QualifyExpression=[=(ROW_NUMBER() OVER (PARTITION BY $0 ORDER BY $1 DESC), 1)])
              LogicalProject(name=[$5], salary=[$6])
                LogicalTableScan(table=[[mytable]])
      

      And here it is squashed:

      [Logical plan]
      LogicalProject(name=[$0])
        LogicalFilter(condition=[$1])
          LogicalProject(name=[$5], QualifyExpression=[=(ROW_NUMBER() OVER (PARTITION BY $0 ORDER BY $1 DESC), 1)])
            LogicalTableScan(table=[[mytable]])
      

      This is the relevant section before squashing:

      (1)  LogicalProject(name=[$0], QualifyExpression=[=(ROW_NUMBER() OVER (PARTITION BY $0 ORDER BY $1 DESC), 1)])
      (2)    LogicalProject(name=[$5], salary=[$6])
      (3)      LogicalTableScan(table=[[mytable]])
      

      When it's generating (1) from SqlToRelConverter, Calcite realizes it has an opportunity to squash (1) and (2) together. It will keep (1) and delete (2). But to do that it will first have to replace references from (1) and rewrite them to push past the parent project (2) (See RelOptUtil#pushPastProject).

      It does this in steps, first it goes over the projected columns (in this case only name) then over any qualify expressions.

      The rough logic is straightforward. It takes name=[$0] from (1) and replaces that with whatever sits on index $0 on (2). So name=[$0] becomes name=[$5]. Then it removes (2) from the query plan.

      At this point we have this:

      (1)  LogicalProject(name=[$5], QualifyExpression=[=(ROW_NUMBER() OVER (PARTITION BY $0 ORDER BY $1 DESC), 1)])
      (3)      LogicalTableScan(table=[[mytable]])
      

      Now Calcite will do the same for the QualifyExpression which still has the wrong indexes. This is where it goes wrong. It eagerly removed (2) from the plan so there's no way for it to even know it there was another projection in there. It won't do anything and the indexes for QualifyExpression are now all wrong.

      More specifically right in this section inside RelBuilder#Project, frame.rel will not be an instance of Project because we just removed that project (it was (2)). It will be an instance of TableScan at this point.

      // Do not merge projection when top projection has correlation variables
          bloat:
          if (frame.rel instanceof Project
              && config.bloat() >= 0
              && variables.isEmpty()) {
            final Project project = (Project) frame.rel;
      

      I suppose the fix would be to delay removing the parent Project from the plan to after we've processed the entire Project. But I don't know that much about the project to try it. I'm happy to give it a shot if any of the maintainers point me in the right direction.

      Attachments

        Issue Links

          Activity

            People

              julianhyde Julian Hyde
              linorosa Lino Rosa
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: