Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Description
Whether an aggregate is pushed down below a union should be decided by cost.
SQL:
select ename from emp where deptno = 10 intersect select ename from emp where deptno = 20
We then apply the rules INTERSECT_TO_DISTINCT (updated version) and AGGREGATE_UNION_TRANSPOSE in the HEP planner.
This gives the following logical plan:
LogicalProject(ENAME=[$0])
  LogicalFilter(condition=[=($1, 2)])
    LogicalAggregate(group=[{0}], agg#0=[$SUM0($1)])
      LogicalUnion(all=[true])
        LogicalAggregate(group=[{0}], agg#0=[COUNT()])
          LogicalProject(ENAME=[$1])
            LogicalFilter(condition=[=($7, 10)])
              LogicalTableScan(table=[[CATALOG, SALES, EMP]])
        LogicalAggregate(group=[{0}], agg#0=[COUNT()])
          LogicalProject(ENAME=[$1])
            LogicalFilter(condition=[=($7, 20)])
              LogicalTableScan(table=[[CATALOG, SALES, EMP]])
Then we apply the same two rules in the Volcano planner.
Final physical plan:
EnumerableProject(ENAME=[$0]): rowcount = 1.0, cumulative cost = {43.72500000000001 rows, 68.4 cpu, 0.0 io}, id = 85
  EnumerableFilter(condition=[=($1, 2)]): rowcount = 1.0, cumulative cost = {42.72500000000001 rows, 67.4 cpu, 0.0 io}, id = 84
    EnumerableAggregate(group=[{0}], agg#0=[COUNT()]): rowcount = 1.0, cumulative cost = {41.72500000000001 rows, 66.4 cpu, 0.0 io}, id = 83
      EnumerableUnion(all=[true]): rowcount = 4.2, cumulative cost = {40.60000000000001 rows, 66.4 cpu, 0.0 io}, id = 82
        EnumerableProject(ENAME=[$1]): rowcount = 2.1, cumulative cost = {18.200000000000003 rows, 31.1 cpu, 0.0 io}, id = 79
          EnumerableFilter(condition=[=($7, 10)]): rowcount = 2.1, cumulative cost = {16.1 rows, 29.0 cpu, 0.0 io}, id = 78
            EnumerableTableScan(table=[[CATALOG, SALES, EMP]]): rowcount = 14.0, cumulative cost = {14.0 rows, 15.0 cpu, 0.0 io}, id = 69
        EnumerableProject(ENAME=[$1]): rowcount = 2.1, cumulative cost = {18.200000000000003 rows, 31.1 cpu, 0.0 io}, id = 81
          EnumerableFilter(condition=[=($7, 20)]): rowcount = 2.1, cumulative cost = {16.1 rows, 29.0 cpu, 0.0 io}, id = 80
            EnumerableTableScan(table=[[CATALOG, SALES, EMP]]): rowcount = 14.0, cumulative cost = {14.0 rows, 15.0 cpu, 0.0 io}, id = 69
In the best plan chosen by the Volcano planner, the inputs of the union have no aggregate, so the aggregate was not pushed down.
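The cumulative costs in the physical plan above can be checked by hand: each operator's cumulative cost is its own cost plus the cumulative cost of its inputs. The sketch below redoes that arithmetic. The per-operator self costs are not printed in the plan; they are inferred here as differences between adjacent cumulative costs, and the `Cost` class is a hypothetical helper for illustration, not Calcite's `RelOptCost`.

```java
/** Recomputes the cumulative costs of the physical plan from per-operator self costs. */
final class PlanCostCheck {

    /** Hypothetical value type for illustration only (rows and cpu; io is 0.0 throughout). */
    static final class Cost {
        final double rows;
        final double cpu;

        Cost(double rows, double cpu) {
            this.rows = rows;
            this.cpu = cpu;
        }

        Cost plus(Cost other) {
            return new Cost(rows + other.rows, cpu + other.cpu);
        }

        @Override
        public String toString() {
            return "{" + rows + " rows, " + cpu + " cpu}";
        }
    }

    // Self costs inferred from the plan as (cumulative cost) - (cumulative cost of inputs).
    static final Cost SCAN = new Cost(14.0, 15.0);           // id = 69
    static final Cost BRANCH_FILTER = new Cost(2.1, 14.0);   // ids 78 and 80
    static final Cost BRANCH_PROJECT = new Cost(2.1, 2.1);   // ids 79 and 81
    static final Cost UNION = new Cost(4.2, 4.2);            // id = 82
    static final Cost AGGREGATE = new Cost(1.125, 0.0);      // id = 83
    static final Cost TOP_FILTER = new Cost(1.0, 1.0);       // id = 84
    static final Cost TOP_PROJECT = new Cost(1.0, 1.0);      // id = 85

    /** Cumulative cost of one union input: scan, then filter, then project. */
    static Cost branch() {
        return SCAN.plus(BRANCH_FILTER).plus(BRANCH_PROJECT);
    }

    /** Cumulative cost at the union: both inputs plus the union's own cost. */
    static Cost atUnion() {
        Cost b = branch();
        return b.plus(b).plus(UNION);
    }

    /** Cumulative cost at the root of the plan. */
    static Cost total() {
        return atUnion().plus(AGGREGATE).plus(TOP_FILTER).plus(TOP_PROJECT);
    }

    public static void main(String[] args) {
        System.out.println("branch  = " + branch());
        System.out.println("union   = " + atUnion());
        System.out.println("total   = " + total());
    }
}
```

Up to floating-point rounding, the sums agree with the plan: 18.2 rows per branch, 40.6 rows at the union, and 43.725 rows and 68.4 cpu at the root.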
DAG:
Calcite does not currently support distributed planning. In a distributed plan, an aggregate is split into two stages. If the first (partial) stage reduces the row count substantially, pushing the aggregate down is worthwhile because it reduces the amount of data shuffled across the network. Even without distributed planning, improving the current rule is worthwhile: Calcite already has rules that push aggregates down, so the decision whether to apply them can be left to the Volcano planner.
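To illustrate why the benefit of a two-stage aggregate depends on the data, here is a minimal sketch in plain Java. It does not use any Calcite API; the class name, the "workers" and the sample rows are all assumptions made up for the example. Each worker computes a partial COUNT per key, only the partial results are shuffled, and the final stage sums them (the same COUNT-then-$SUM0 shape as in the logical plan).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical simulation of a two-stage COUNT; not part of Calcite. */
final class TwoStageAggSketch {

    /** Stage one: partial COUNT per key, computed locally on one worker. */
    static Map<String, Long> partialCount(List<String> localRows) {
        Map<String, Long> counts = new TreeMap<>();
        for (String key : localRows) {
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }

    /** Stage two: sums the partial counts received from all workers. */
    static Map<String, Long> finalSum(List<Map<String, Long>> partials) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((key, count) -> totals.merge(key, count, Long::sum));
        }
        return totals;
    }

    /** Runs both stages over all workers. */
    static Map<String, Long> twoStageCount(List<List<String>> workers) {
        List<Map<String, Long>> partials = new ArrayList<>();
        for (List<String> rows : workers) {
            partials.add(partialCount(rows));
        }
        return finalSum(partials);
    }

    /** Rows shuffled when raw rows are sent to a single aggregate. */
    static int shuffledRowsSingleStage(List<List<String>> workers) {
        int n = 0;
        for (List<String> rows : workers) {
            n += rows.size();
        }
        return n;
    }

    /** Rows shuffled when each worker pre-aggregates before the shuffle. */
    static int shuffledRowsTwoStage(List<List<String>> workers) {
        int n = 0;
        for (List<String> rows : workers) {
            n += partialCount(rows).size();
        }
        return n;
    }

    public static void main(String[] args) {
        // Many duplicate keys per worker: pre-aggregation shrinks the shuffle.
        List<List<String>> repeated = List.of(
            List.of("SMITH", "SMITH", "SMITH", "ALLEN"),
            List.of("SMITH", "ALLEN", "ALLEN", "ALLEN"));
        // All keys distinct: pre-aggregation shuffles just as many rows.
        List<List<String>> distinct = List.of(
            List.of("SMITH", "ALLEN"),
            List.of("WARD", "JONES"));

        for (List<List<String>> workers : List.of(repeated, distinct)) {
            System.out.println(twoStageCount(workers)
                + " single-stage shuffle=" + shuffledRowsSingleStage(workers)
                + " two-stage shuffle=" + shuffledRowsTwoStage(workers));
        }
    }
}
```

With repeated keys the two-stage variant shuffles 4 rows instead of 8; with all-distinct keys it shuffles the same 4 rows and only adds work. That difference is why the decision is better left to a cost-based planner than applied unconditionally.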
Attachments
Issue Links
- Blocked: CALCITE-7086 Implement a rule that performs the inverse operation of AggregateCaseToFilterRule (Resolved)