Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
1.39.0
Description
Given a query such as the below:
WITH window_cte AS ( SELECT foo, SUM(baz) OVER (PARTITION BY foo) AS window_agg FROM sample_data ) SELECT SUM(window_agg) AS bazzy FROM window_cte
When `baz` is optional, Calcite will wrap the inner function within a `CASE` statement. Something like:
CASE WHEN ( COUNT(baz) OVER ( PARTITION BY foo RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) ) > 0 THEN COALESCE( SUM(baz) OVER ( PARTITION BY foo ORDER BY RAND () NULLS LAST RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ), 0 ) ELSE NULL END
Nothing intrinsically wrong with that substitution. However because of it the inner aggregation function `SUM(baz) OVER...` has become deeply nested under the `CASE` statement.
Where that becomes an issue is inside `SqlImplemtor#Result#hasNested`. That function is supposed to look for nested aggregates, but it only looks at the top level. In this case it'll stop at the `CASE` statement, erroneously concluding there's no nested aggregate. In the end it'll collapse the cte from the original query into a `SUM(CASE(... SUM(...)))` which will fail running in dialects such as Spark.
Attachments
Issue Links
- links to