Skip to content

[CALCITE-3830] The ‘approximate’ field should be considered when computing the digest of AggregateCall #1835

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

cshuo
Copy link

@cshuo cshuo commented Feb 27, 2020

In planner optimization, the digest of Aggregate node contains digest of its AggregateCall, i.e. AggregateCall.toString, but currently the 'approximate' filed of AggregateCall is not considered in toString() method, which may leads to the situation two different relNodes are considered as identical in planner optimizing phase.

Here is an example:

select * from (
  select a, count(distinct b) from T group by a
  union all
  select a, approx_count_distinct(b) from T group by a
)

After applying a rule, the plan is:

LogicalSink(name=[_DataStreamTable_1], fields=[a, EXPR$1], __id__=[96])
+- LogicalProject(a=[$0], EXPR$1=[$1], __id__=[94])
   +- LogicalUnion(all=[true], __id__=[92])
      :- LogicalAggregate(group=[{0}], EXPR$1=[COUNT(DISTINCT $1)], __id__=[89])
      :  +- LogicalTableScan(table=[[default, _DataStreamTable_2]], __id__=[100])
      +- LogicalAggregate(group=[{0}], EXPR$1=[COUNT(DISTINCT $1)], __id__=[89])
         +- LogicalTableScan(table=[[default, _DataStreamTable_2]], __id__=[100])

As showing in the example, after optimizing, these two Aggregates are considered as identical (both with 89 as relNode ID).

@cshuo
Copy link
Author

cshuo commented Feb 27, 2020

Many plan tests containing 'APPROX_COUNT_DISTINCT' may be failing, I'm fixing this...

// currently approximate = true is only for 'APPROX_COUNT_DISTINCT'
if (approximate && distinct) {
buf.append("APPROX_COUNT_DISTINCT");
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with Julian, we should not modify the agg call name which is a unique identifier of the function, better to add some distinct flag to the Aggregate digest instead:

public RelWriter explainTerms(RelWriter pw) {
    // We skip the "groups" element if it is a singleton of "group".
    super.explainTerms(pw)
        .item("group", groupSet)
        .itemIf("groups", groupSets, getGroupType() != Group.SIMPLE)
        .itemIf("aggs", aggCalls, pw.nest());
    if (!pw.nest()) {
      for (Ord<AggregateCall> ord : Ord.zip(aggCalls)) {
        pw.item(Util.first(ord.e.name, "agg#" + ord.i), ord.e);
      }
    }
    return pw;
  }

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to add some distinct flag to AggregateCall rather than Aggregate, because 'approximate' is a property of AggregateCall, not Aggregate. What do you think, Danny?

Copy link
Contributor

@danny0405 danny0405 Feb 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either is fine to me, go with your choice. BTW, that format of the approximate call would be like ? approximate call_name(args) ? or call_name()... filter $0 approximate ?

@danny0405 danny0405 added the LGTM-will-merge-soon Overall PR looks OK. Only minor things left. label Feb 28, 2020
@danny0405 danny0405 closed this in cdd141d Feb 28, 2020
danny0405 pushed a commit that referenced this pull request Mar 2, 2020
…uting the digest of AggregateCall

Add 'APPROXIMATE' keyword for the agg call to make the digest different
with call that isn't approximate.

close #1835

(cherry picked from commit cdd141d)
Aaaaaaron pushed a commit to Aaaaaaron/calcite that referenced this pull request Jun 21, 2020
…uting the digest of AggregateCall

Add 'APPROXIMATE' keyword for the agg call to make the digest different
with call that isn't approximate.

close apache#1835

(cherry picked from commit cdd141d)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
LGTM-will-merge-soon Overall PR looks OK. Only minor things left.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants