Apache Calcite Overview 
Julian Hyde
Page 1 © Hortonworks Inc. 2014 
Kylin Meetup (eBay, San Jose) 
December 4th, 2014
Apache Calcite 
Apache incubator project since May, 2014 
Originally named Optiq 
Query planning framework 
Relational algebra, rewrite rules, cost model 
Extensible 
Packaging 
Library (JDBC server optional) 
Community-authored rules, adapters 
Adoption 
Embedded: Lingual (SQL interface to Cascading), Apache Drill, Apache Hive, Apache Kylin 
Adapters: Splunk, Spark, MongoDB, JDBC, CSV, JSON, Web tables, In-memory, Phoenix 
Conventional DB architecture 
Calcite architecture 
Expression tree
SELECT p."product_name", COUNT(*) AS c
FROM "splunk"."splunk" AS s
JOIN "mysql"."products" AS p
ON s."product_id" = p."product_id"
WHERE s."action" = 'purchase'
GROUP BY p."product_name"
ORDER BY c DESC
[Diagram: operator tree for this query, listed from leaves to root]
scan (Table: splunk) in Splunk
scan (Table: products) in MySQL
join (Key: product_id)
filter (Condition: action = 'purchase')
group (Key: product_name, Agg: count)
sort (Key: c DESC)
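The tree on this slide can be sketched in plain Java. This is a minimal illustration, assuming the usual direct translation of the query (scan, join, filter, group, sort). `PlanNode` and `ExpressionTreeDemo` are hypothetical names made up for this sketch and are not Calcite classes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model of an operator tree; these are NOT Calcite classes.
class PlanNode {
    final String op;              // operator name, e.g. "scan" or "join"
    final String detail;          // label from the slide, e.g. "Key: product_id"
    final List<PlanNode> inputs;  // child operators (empty for a scan)

    PlanNode(String op, String detail, PlanNode... inputs) {
        this.op = op;
        this.detail = detail;
        this.inputs = Arrays.asList(inputs);
    }

    // Renders the tree root-first, indenting each level by two spaces.
    String explain() {
        StringBuilder sb = new StringBuilder();
        explain(sb, 0);
        return sb.toString();
    }

    private void explain(StringBuilder sb, int depth) {
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        sb.append(op).append(" [").append(detail).append("]\n");
        for (PlanNode input : inputs) {
            input.explain(sb, depth + 1);
        }
    }
}

class ExpressionTreeDemo {
    // Builds the unoptimized tree: the filter sits above the join.
    static PlanNode initialPlan() {
        PlanNode splunk = new PlanNode("scan", "Table: splunk");
        PlanNode products = new PlanNode("scan", "Table: products");
        PlanNode join = new PlanNode("join", "Key: product_id", splunk, products);
        PlanNode filter = new PlanNode("filter", "Condition: action = 'purchase'", join);
        PlanNode group = new PlanNode("group", "Key: product_name, Agg: count", filter);
        return new PlanNode("sort", "Key: c DESC", group);
    }

    public static void main(String[] args) {
        System.out.print(initialPlan().explain());
    }
}
```

Running `ExpressionTreeDemo` prints the operators root-first, one per line, with each input indented under its consumer.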
Expression tree (optimized)
SELECT p."product_name", COUNT(*) AS c
FROM "splunk"."splunk" AS s
JOIN "mysql"."products" AS p
ON s."product_id" = p."product_id"
WHERE s."action" = 'purchase'
GROUP BY p."product_name"
ORDER BY c DESC
[Diagram: optimized operator tree, listed from leaves to root]
scan (Table: splunk) in Splunk
filter (Condition: action = 'purchase') in Splunk, below the join
scan (Table: products) in MySQL
join (Key: product_id)
group (Key: product_name, Agg: count)
sort (Key: c DESC)
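Why the optimized tree is cheaper can be shown with a little arithmetic. The sketch below assumes that the optimization moves the filter below the join, as the slide title and the rule on the following slide suggest. The row counts, the selectivity, and the cost measure are all made up for illustration and do not reflect Calcite's cost model.

```java
// Hypothetical row counts and a deliberately crude cost measure;
// this is not Calcite's cost model.
class PushdownCostDemo {
    // Rows arriving at the join from the Splunk side.
    static double splunkRowsAtJoin(double splunkRows, double selectivity,
                                   boolean filterBelowJoin) {
        return filterBelowJoin ? splunkRows * selectivity : splunkRows;
    }

    // Crude cost: total rows the join has to read from both inputs.
    static double joinInputCost(double splunkRows, double productRows,
                                double selectivity, boolean filterBelowJoin) {
        return splunkRowsAtJoin(splunkRows, selectivity, filterBelowJoin) + productRows;
    }

    public static void main(String[] args) {
        double splunkRows = 1_000_000;  // assumed size of the Splunk table
        double productRows = 1_000;     // assumed size of the products table
        double selectivity = 0.25;      // assumed fraction with action = 'purchase'
        System.out.println("filter above join: "
            + joinInputCost(splunkRows, productRows, selectivity, false));
        System.out.println("filter below join: "
            + joinInputCost(splunkRows, productRows, selectivity, true));
    }
}
```

With these assumed numbers the join reads about a quarter as many rows once the filter runs first. If the filter runs inside Splunk, fewer rows also have to leave Splunk.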
Defining a rule 
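import org.apache.calcite.plan.RelOptRule;
import org.apache.calcite.plan.RelOptRuleCall;
import org.apache.calcite.rel.core.Filter;
import org.apache.calcite.rel.core.Join;

class FilterIntoJoinRule extends RelOptRule {
  public FilterIntoJoinRule() {
    // Match a Filter whose input is a Join with any inputs.
    super(
        operand(Filter.class,
            operand(Join.class, any())));
  }

  public void onMatch(RelOptRuleCall call) {
    Filter filter = call.rel(0);  // the matched Filter
    Join join = call.rel(1);      // the Join beneath it
    Filter newFilter = ...;
    Join newJoin = ...;
    // Register the rewritten tree as equivalent to the matched one.
    call.transformTo(newJoin);
  }
}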
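[Diagram: the tree before the rule fires (Filter above Join over inputs R1 and R2) and after (Filter' and Join' over the same inputs R1 and R2)]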
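The match-then-transform shape of the rule can be tried out without Calcite. The sketch below is a toy under stated assumptions: `Rel`, `ScanRel`, `JoinRel`, `FilterRel` and `ToyFilterIntoJoinRule` are hypothetical classes invented here, the join is assumed to be an inner join, and the filter is assumed to read columns of a single table. It does not show how Calcite builds `newFilter` and `newJoin`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical toy classes that mimic the shape of the rule; NOT the Calcite API.
abstract class Rel {
    final List<Rel> inputs;
    Rel(Rel... inputs) { this.inputs = Arrays.asList(inputs); }
}

class ScanRel extends Rel {
    final String table;
    ScanRel(String table) { super(); this.table = table; }
    @Override public String toString() { return "Scan(" + table + ")"; }
}

class JoinRel extends Rel {
    final String key;
    JoinRel(String key, Rel left, Rel right) { super(left, right); this.key = key; }
    @Override public String toString() {
        return "Join(" + key + ", " + inputs.get(0) + ", " + inputs.get(1) + ")";
    }
}

class FilterRel extends Rel {
    final String table;      // the single table whose columns the condition reads
    final String condition;
    FilterRel(String table, String condition, Rel input) {
        super(input);
        this.table = table;
        this.condition = condition;
    }
    @Override public String toString() {
        return "Filter(" + condition + ", " + inputs.get(0) + ")";
    }
}

class ToyFilterIntoJoinRule {
    // Same pattern as operand(Filter.class, operand(Join.class, any())):
    // a Filter directly above a Join.
    static boolean matches(Rel rel) {
        return rel instanceof FilterRel && rel.inputs.get(0) instanceof JoinRel;
    }

    // Returns an equivalent tree with the filter moved onto the join input that
    // scans the filter's table, or the original tree if the rule does not apply.
    static Rel apply(Rel rel) {
        if (!matches(rel)) {
            return rel;
        }
        FilterRel filter = (FilterRel) rel;
        JoinRel join = (JoinRel) filter.inputs.get(0);
        Rel left = join.inputs.get(0);
        Rel right = join.inputs.get(1);
        if (isScanOf(left, filter.table)) {
            return new JoinRel(join.key,
                new FilterRel(filter.table, filter.condition, left), right);
        }
        if (isScanOf(right, filter.table)) {
            return new JoinRel(join.key, left,
                new FilterRel(filter.table, filter.condition, right));
        }
        return rel;
    }

    private static boolean isScanOf(Rel rel, String table) {
        return rel instanceof ScanRel && ((ScanRel) rel).table.equals(table);
    }
}
```

Applying the toy rule to a filter on `splunk` above a join of `splunk` and `products` yields a join whose left input is the filtered scan. Unlike this sketch, a real planner rule does not replace the tree in place: `call.transformTo` registers the new tree as an alternative, and the planner chooses among alternatives by cost.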
Calcite – APIs and SPIs 
Relational algebra 
RelNode (operator) 
• Scan 
• Filter 
• Project 
• Union 
• Aggregate 
• … 
RelDataType (type) 
RexNode (expression) 
RelTrait (physical property) 
• RelConvention (calling-convention) 
• RelCollation (sortedness) 
• TBD (bucketedness/distribution)
JDBC driver
Cost, statistics 
RelOptCost 
RelOptCostFactory 
RelMetadataProvider 
• RelMdColumnUniqueness
• RelMdDistinctRowCount 
• RelMdSelectivity 
SQL parser 
SqlNode 
SqlParser 
SqlValidator 
Transformation rules 
RelOptRule 
• MergeFilterRule 
• PushAggregateThroughUnionRule
• RemoveCorrelationForScalarProjectRule
• 100+ more 
Unification (materialized view) 
Column trimming 
Metadata 
Schema 
Table 
Function 
• TableFunction 
• TableMacro
Thank you! 
@julianhyde 
http://calcite.incubator.apache.org/ 
