  Calcite / CALCITE-6846

Support basic DPhyp join reorder algorithm


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.38.0
    • Fix Version/s: 1.39.0
    • Component/s: core

    Description

      Adds support for the basic DPhyp (dynamic programming over hypergraphs) join reorder algorithm.
      For example:

      SELECT
          i_item_id
      FROM store_sales, customer_demographics, date_dim, item, promotion
      WHERE ss_sold_date_sk = d_date_sk AND
          ss_item_sk = i_item_sk AND
          ss_cdemo_sk = cd_demo_sk AND
          ss_promo_sk = p_promo_sk 

      The plan tree after the filter conditions are pushed down into the joins:

      LogicalProject(i_item_id=[$61])
        LogicalJoin(condition=[=($7, $82)], joinType=[inner])
          LogicalJoin(condition=[=($1, $60)], joinType=[inner])
            LogicalJoin(condition=[=($22, $32)], joinType=[inner])
              LogicalJoin(condition=[=($3, $23)], joinType=[inner])
                LogicalTableScan(table=[[tpcds, store_sales]])
                LogicalTableScan(table=[[tpcds, customer_demographics]])
              LogicalTableScan(table=[[tpcds, date_dim]])
            LogicalTableScan(table=[[tpcds, item]])
          LogicalTableScan(table=[[tpcds, promotion]])
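      The placement of each condition in the plan above can be reproduced with a simple rule. The Java sketch below rests on three assumptions. First, the FROM list becomes a left-deep chain of inner joins in FROM-clause order, as the plan shows. Second, every WHERE conjunct is an equality between two tables, with each column assigned to its table by the TPC-DS prefix (for example, ss_ for store_sales). Third, a conjunct goes to the lowest join whose inputs cover both tables. FilterPushdownSketch and its methods are hypothetical and are not Calcite's FilterJoinRule.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch, not Calcite's FilterJoinRule: place each two-table equality
 * conjunct on the lowest join of a left-deep chain built in FROM-clause order.
 * Join k (k >= 1) is the join that adds FROM[k] to everything on its left.
 */
final class FilterPushdownSketch {
  static final List<String> FROM =
      List.of("store_sales", "customer_demographics", "date_dim", "item", "promotion");

  /** The lowest join whose inputs include both tables: the later FROM position. */
  static int targetJoin(String table1, String table2) {
    return Math.max(FROM.indexOf(table1), FROM.indexOf(table2));
  }

  /** Maps join number to its condition; each conjunct is {table, column, table, column}. */
  static Map<Integer, String> place(String[][] conjuncts) {
    Map<Integer, String> byJoin = new TreeMap<>();
    for (String[] c : conjuncts) {
      // Several conjuncts landing on the same join are combined with AND.
      byJoin.merge(targetJoin(c[0], c[2]), c[1] + " = " + c[3], (a, b) -> a + " AND " + b);
    }
    return byJoin;
  }

  public static void main(String[] args) {
    // The WHERE conjuncts of the example query, in source order.
    String[][] conjuncts = {
      {"store_sales", "ss_sold_date_sk", "date_dim", "d_date_sk"},
      {"store_sales", "ss_item_sk", "item", "i_item_sk"},
      {"store_sales", "ss_cdemo_sk", "customer_demographics", "cd_demo_sk"},
      {"store_sales", "ss_promo_sk", "promotion", "p_promo_sk"},
    };
    // Join 1 is the innermost join, join 4 the outermost.
    place(conjuncts).forEach((join, cond) -> System.out.println("join " + join + ": " + cond));
  }
}
```

      Running main assigns ss_cdemo_sk = cd_demo_sk to join 1 and ss_promo_sk = p_promo_sk to join 4. This matches the innermost-to-outermost order of the conditions in the plan above.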

      Converting the joins into a single HyperGraph gives:

      LogicalProject(i_item_id=[$61])
        HyperGraph(edges=[{0}——INNER——{1},{0}——INNER——{2},{0}——INNER——{3},{0}——INNER——{4}])
          LogicalTableScan(table=[[tpcds, store_sales]])
          LogicalTableScan(table=[[tpcds, customer_demographics]])
          LogicalTableScan(table=[[tpcds, date_dim]])
          LogicalTableScan(table=[[tpcds, item]])
          LogicalTableScan(table=[[tpcds, promotion]]) 
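      The HyperGraph edge list can be derived by mapping each field that a join condition references back to the input that owns it. The Java sketch below assumes the input offsets implied by the filtered plan: store_sales at $0, customer_demographics at $23, date_dim at $32, item at $60 and promotion at $82. Because that tree is left-deep, every join's field indexes coincide with these global offsets. HyperEdgeSketch and its methods are hypothetical, not Calcite's HyperGraph API. When one side of a condition references several inputs, side() returns a multi-node set, which is what turns an edge into a hyperedge.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/**
 * Hypothetical sketch, not Calcite's HyperGraph API: find the endpoints of each
 * inner-join edge by mapping the fields a condition references to their inputs.
 */
final class HyperEdgeSketch {
  // First field index of each input: store_sales, customer_demographics, date_dim, item, promotion.
  static final int[] OFFSETS = {0, 23, 32, 60, 82};
  // Separator used in the HyperGraph line (two U+2014 dash characters on each side).
  static final String INNER = "\u2014\u2014INNER\u2014\u2014";

  /** Index of the input that owns a field of the concatenated row. */
  static int inputOf(int field) {
    int input = 0;
    while (input + 1 < OFFSETS.length && field >= OFFSETS[input + 1]) {
      input++;
    }
    return input;
  }

  /** Set of inputs referenced by one side of a condition, formatted like {0} or {1, 2}. */
  static String side(int... fields) {
    TreeSet<Integer> inputs = new TreeSet<>();
    for (int f : fields) {
      inputs.add(inputOf(f));
    }
    return inputs.toString().replace('[', '{').replace(']', '}');
  }

  /** Formats the edges for equi-conditions =($left, $right), listed innermost first. */
  static String edges(int[][] conditions) {
    List<String> result = new ArrayList<>();
    for (int[] c : conditions) {
      result.add(side(c[0]) + INNER + side(c[1]));
    }
    return String.join(",", result);
  }

  public static void main(String[] args) {
    // Conditions of the four LogicalJoin nodes, innermost first.
    int[][] conditions = {{3, 23}, {22, 32}, {1, 60}, {7, 82}};
    System.out.println(edges(conditions));
  }
}
```

      Running main prints the same edge list as the HyperGraph line. Every edge has store_sales ({0}) on one side, so the graph is a star.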

      After DPhyp join reordering (with field trimming and Project pushdown), the plan is:

      LogicalProject(i_item_id=[$1])
        LogicalJoin(condition=[=($0, $2)], joinType=[inner])
          LogicalProject(ss_cdemo_sk=[$0], i_item_id=[$2])
            LogicalJoin(condition=[=($1, $3)], joinType=[inner])
              LogicalProject(ss_cdemo_sk=[$1], ss_sold_date_sk=[$2], i_item_id=[$4])
                LogicalJoin(condition=[=($0, $3)], joinType=[inner])
                  LogicalProject(ss_item_sk=[$0], ss_cdemo_sk=[$1], ss_sold_date_sk=[$3])
                    LogicalJoin(condition=[=($2, $4)], joinType=[inner])
                      LogicalProject(ss_item_sk=[$1], ss_cdemo_sk=[$3], ss_promo_sk=[$7], ss_sold_date_sk=[$22])
                        LogicalTableScan(table=[[tpcds, store_sales]])
                      LogicalProject(p_promo_sk=[$0])
                        LogicalTableScan(table=[[tpcds, promotion]])
                  LogicalProject(i_item_sk=[$0], i_item_id=[$1])
                    LogicalTableScan(table=[[tpcds, item]])
              LogicalProject(d_date_sk=[$0])
                LogicalTableScan(table=[[tpcds, date_dim]])
          LogicalProject(cd_demo_sk=[$0])
            LogicalTableScan(table=[[tpcds, customer_demographics]]) 
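      Field trimming shifts column positions, so each join condition in the reordered plan refers to positions in the trimmed inputs. The hypothetical TrimSketch class below (not Calcite's Mappings API) reproduces that remapping for an equality between one field of each input. Each side is described by the fields its Project keeps, in the ascending order shown above, and by the original index of the compared field in that side's untrimmed input.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch, not Calcite's Mappings API: rewrite an equi-join condition
 * after both inputs have been trimmed by a Project.
 */
final class TrimSketch {
  /** Position of oldField among the kept fields; keptFields must be sorted ascending. */
  static int newIndex(int[] keptFields, int oldField) {
    int pos = Arrays.binarySearch(keptFields, oldField);
    if (pos < 0) {
      throw new IllegalArgumentException("field $" + oldField + " was trimmed away");
    }
    return pos;
  }

  /** Condition on the join of the trimmed inputs; indexes are into the untrimmed inputs. */
  static String remap(int[] leftKept, int leftField, int[] rightKept, int rightField) {
    int left = newIndex(leftKept, leftField);
    // Right-hand fields follow all kept left-hand fields in the join's row type.
    int right = leftKept.length + newIndex(rightKept, rightField);
    return "=($" + left + ", $" + right + ")";
  }

  public static void main(String[] args) {
    // store_sales keeps $1, $3, $7, $22; promotion keeps p_promo_sk ($0).
    System.out.println(remap(new int[] {1, 3, 7, 22}, 7, new int[] {0}, 0)); // =($2, $4)
    // The Project above that join keeps $0, $1, $3; item keeps i_item_sk and i_item_id.
    System.out.println(remap(new int[] {0, 1, 3}, 0, new int[] {0, 1}, 0)); // =($0, $3)
  }
}
```

      The same call yields =($1, $3) for the date_dim join, where the left side keeps {1, 2, 4} and compares field 2. It yields =($0, $2) for the customer_demographics join, where the left side keeps {0, 2}. Both match the plan.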

      The main DPhyp enumeration process is implemented in the PR. For now, it can only process inner joins, and hypergraph simplification has not been implemented yet.
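      Moerkotte and Neumann's DPhyp paper describes the enumeration with four procedures: EmitCsg, EnumerateCsgRec, EmitCsgCmp and EnumerateCmpRec. The Java sketch below is a minimal, hypothetical rendering of them. It is restricted to simple edges, where each edge joins two single relations, which is all the example query needs; real hyperedges would also require the paper's generalized neighborhood. Instead of costing plans, it only counts csg-cmp pairs, that is, pairs of disjoint connected relation sets linked by an edge. It is not Calcite's implementation. Relations are numbered as in the HyperGraph line.

```java
/**
 * Hypothetical sketch of DPhyp's four enumeration procedures, restricted to simple
 * edges. Relations are numbered 0..n-1; a set of relations is a bitmask.
 */
final class DphypSketch {
  private final long[] adj;   // adj[v]: relations that share a join edge with v
  private final boolean[] dp; // dp[S]: a plan for relation set S exists (the "dpTable")
  private int pairs;          // number of csg-cmp pairs emitted so far

  DphypSketch(int n, int[][] edges) {
    adj = new long[n];
    for (int[] e : edges) {
      adj[e[0]] |= 1L << e[1];
      adj[e[1]] |= 1L << e[0];
    }
    dp = new boolean[1 << n];
  }

  /** Runs the enumeration once and returns the number of csg-cmp pairs emitted. */
  int solve() {
    int n = adj.length;
    for (int v = 0; v < n; v++) {
      dp[1 << v] = true; // base plans: single relations
    }
    for (int v = n - 1; v >= 0; v--) { // start nodes in descending order
      emitCsg(1L << v);
      enumerateCsgRec(1L << v, upTo(v));
    }
    return pairs;
  }

  boolean hasFullPlan() { return dp[dp.length - 1]; }

  /** B_v: all relations with index <= v. */
  private static long upTo(int v) { return (1L << (v + 1)) - 1; }

  /** N(S, X): relations adjacent to S that are in neither S nor X. */
  private long neighborhood(long s, long x) {
    long result = 0;
    for (long rest = s; rest != 0; rest &= rest - 1) {
      result |= adj[Long.numberOfTrailingZeros(rest)];
    }
    return result & ~s & ~x;
  }

  private boolean connected(long s1, long s2) { return (neighborhood(s1, 0) & s2) != 0; }

  // The loops below visit the non-empty subsets of nb in increasing numeric order,
  // so every subset is visited before any of its supersets.
  private void enumerateCsgRec(long s1, long x) {
    long nb = neighborhood(s1, x);
    for (long sub = -nb & nb; sub != 0; sub = (sub - nb) & nb) {
      if (dp[(int) (s1 | sub)]) {
        emitCsg(s1 | sub);
      }
    }
    for (long sub = -nb & nb; sub != 0; sub = (sub - nb) & nb) {
      enumerateCsgRec(s1 | sub, x | nb);
    }
  }

  private void emitCsg(long s1) {
    long x = s1 | upTo(Long.numberOfTrailingZeros(s1));
    long nb = neighborhood(s1, x);
    for (int v = 63 - Long.numberOfLeadingZeros(nb); v >= 0; v--) { // descending
      if ((nb & (1L << v)) == 0) continue;
      long s2 = 1L << v;
      if (connected(s1, s2)) {
        emitCsgCmp(s1, s2);
      }
      enumerateCmpRec(s1, s2, x | (nb & upTo(v)));
    }
  }

  private void enumerateCmpRec(long s1, long s2, long x) {
    long nb = neighborhood(s2, x);
    for (long sub = -nb & nb; sub != 0; sub = (sub - nb) & nb) {
      if (dp[(int) (s2 | sub)] && connected(s1, s2 | sub)) {
        emitCsgCmp(s1, s2 | sub);
      }
    }
    for (long sub = -nb & nb; sub != 0; sub = (sub - nb) & nb) {
      enumerateCmpRec(s1, s2 | sub, x | nb);
    }
  }

  private void emitCsgCmp(long s1, long s2) {
    // A real optimizer would build and cost the join of the plans for s1 and s2 here.
    pairs++;
    dp[(int) (s1 | s2)] = true;
  }

  public static void main(String[] args) {
    // Numbering as in the HyperGraph: store_sales (0) joins each of the other tables.
    DphypSketch star = new DphypSketch(5, new int[][] {{0, 1}, {0, 2}, {0, 3}, {0, 4}});
    System.out.println(star.solve() + " pairs, full plan: " + star.hasFullPlan());
  }
}
```

      For the five-relation star, every pair splits a single dimension table off a connected set containing store_sales, so main reports 4 * 2^3 = 32 pairs and a plan for the full set. A three-relation chain yields 4 pairs.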


            People

              Assignee: Silun Dong (dongsl)
              Reporter: Silun Dong (dongsl)