Details
-
New Feature
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
None
Description
JoinConditionOrExpansionRule transforms a join with OR conditions into a UNION ALL of multiple joins.
For example, the SQL:
SELECT * FROM emp JOIN dept ON emp.deptno = dept.deptno OR emp.deptno = dept.mgr
Will be transformed into:
SELECT * FROM emp JOIN dept ON emp.deptno = dept.deptno UNION ALL SELECT * FROM emp JOIN dept ON emp.deptno = dept.mgr
The original plan:
LogicalJoin(condition=[OR(=(deptno, deptno), =(deptno, mgr))]) LogicalTableScan(table=emp) LogicalTableScan(table=dept)
Is transformed to:
LogicalUnion(all=[true])
LogicalJoin(condition=[=(deptno, deptno)])
LogicalTableScan(table=emp)
LogicalTableScan(table=dept)
LogicalJoin(condition=[=(deptno, mgr)])
LogicalTableScan(table=emp)
LogicalTableScan(table=dept)
Benefits:
- Each individual join can use hash-join algorithm, which is more efficient than nested-loop joins required for OR conditions.
- Each join branch can independently choose its optimal join implementation.
- The joins can be executed in parallel.
- Individual joins may be able to use indexes that would not be usable with OR conditions.
The rule only fires when:
- The OR condition contains only simple equi-join conditions (comparing columns between left and right inputs).
Limitations and considerations:
- May not improve performance if individual joins produce large results that need to be union-ed.
- Multiple hash tables may cause memory pressure.
- The optimizer should consider statistics and cost estimates when applying this rule.
supplement:
- The inner join will perform the conversion as described above.
- The outer join will perform the conversion more complexly.
left join = inner join + anti join
Project[*] └── Join[OR(t1.id=t2.id, t1.age=t2.age), left] ├── TableScan[t1] └── TableScan[t2] to Project[*] └── UnionAll ├── Join[t1.id=t2.id, inner] │ ├── TableScan[t1] │ └── TableScan[t2] ├── Join[t1.age=t2.age AND t1.id≠t2.id, inner] │ ├── TableScan[t1] │ └── TableScan[t2] └── Project[t1-side cols + NULLs] └── Join[t1.id=t2.id, anti] ├── Join[t1.age=t2.age, anti] │ ├── TableScan[t1] │ └── TableScan[t2] └── TableScan[t2]
full join = inner join + anti join + right anti join
Project[*] └── Join[OR(t1.id=t2.id, t1.age=t2.age), full] ├── TableScan[t1] └── TableScan[t2] to Project[*] └── UnionAll ├── Join[t1.id=t2.id, inner] │ ├── TableScan[t1] │ └── TableScan[t2] ├── Join[t1.age=t2.age AND t1.id≠t2.id, inner] │ ├── TableScan[t1] │ └── TableScan[t2] ├── Project[t1-side cols + NULLs] │ └── Join[t1.id=t2.id, anti] │ ├── Join[t1.age=t2.age, anti] │ │ ├── TableScan[t1] │ │ └── TableScan[t2] │ └── TableScan[t2] └── Project[NULLs + t2-side cols] └── Join[t2.id=t1.id, anti] ├── Join[t2.age=t1.age, anti] │ ├── TableScan[t2] │ └── TableScan[t1] └── TableScan[t1]
Attachments
Attachments
Issue Links
- is related to
-
CALCITE-4990 Add SetOpFilterMergeRule for Converting UNION with same inputs but different filters to single input with OR Filter
-
- Open
-
- relates to
-
CALCITE-6966 Change JoinConditionOrExpansionRule name and accept more predicates that will allow the expansion to be performed
-
- Closed
-
- links to