One planner fits all
(How Apache Calcite makes it easier to write a DBMS)
Lightning talk at XLDB 2015
Stanford, California
Julian Hyde (Hortonworks)
“One size fits all” is an idea whose time has come and gone
–Mike Stonebraker (2005)
• Hadoop and other open source technologies have deconstructed the DBMS
• Query parser/API + catalog + authorization + algorithms + scheduler + engine + data format + storage
image credit: http://oliviaobryon.com
Interesting
Boring
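[Figure: a conventional DB compared with Apache Calcite as a DB framework]
Conventional DB: parser, algebra, catalog, data, algorithms
Apache Calcite (DB framework): parser, algebra; several engines, each with its own data
Extension points: Schema SPI; operators, rules, statistics, cost model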
SELECT products.name, COUNT(*)
FROM sales
JOIN products USING (productId)
WHERE sales.discount IS NOT NULL
GROUP BY products.name
ORDER BY COUNT(*) DESC
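To make the query concrete, here is a minimal sketch that runs it against an in-memory SQLite database from Python. It does not use Calcite. The sample rows and the `saleId` column are hypothetical, added only for illustration; the table names, the other column names, and the query text come from the slide.

```python
import sqlite3

# Hypothetical sample data; only the table names, productId, name and
# discount come from the slide's query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (productId INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales (saleId INTEGER PRIMARY KEY, productId INTEGER, discount REAL);
    INSERT INTO products VALUES (1, 'apple'), (2, 'pear');
    INSERT INTO sales VALUES
        (10, 1, 0.10), (11, 1, NULL), (12, 1, 0.05),
        (13, 2, 0.20), (14, 2, NULL);
""")

QUERY = """
    SELECT products.name, COUNT(*)
    FROM sales
    JOIN products USING (productId)
    WHERE sales.discount IS NOT NULL
    GROUP BY products.name
    ORDER BY COUNT(*) DESC
"""

# Count discounted sales per product, most frequent first.
rows = conn.execute(QUERY).fetchall()
print(rows)  # [('apple', 2), ('pear', 1)]
```

Sales with a NULL discount are dropped before counting, which is why each product loses one row in this sample.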
[Figure: query plans before and after FilterIntoJoinRule]
Translate SQL to relational algebra:
scan [products] + scan [sales] → join → filter → aggregate → sort
After FilterIntoJoinRule:
scan [products] + (scan [sales] → filter’) → join’ → aggregate → sort
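The rewrite performed by FilterIntoJoinRule can be illustrated with a toy model. The sketch below is not Calcite's API: the `Scan`, `Join`, and `Filter` classes, the `filter_into_join` function, and the sample data are all hypothetical. It assumes an inner join and a filter of the form `column IS NOT NULL` that reads from a single table, and it leaves the aggregate and sort steps out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Scan:
    table: str

@dataclass(frozen=True)
class Join:          # inner equi-join on a shared key column
    left: object
    right: object
    key: str

@dataclass(frozen=True)
class Filter:        # models "table.column IS NOT NULL"
    input: object
    table: str
    column: str

def tables_of(node):
    """Return the set of base tables below a plan node."""
    if isinstance(node, Scan):
        return {node.table}
    if isinstance(node, Filter):
        return tables_of(node.input)
    return tables_of(node.left) | tables_of(node.right)

def filter_into_join(node):
    """Rewrite Filter(Join(l, r)) so the filter runs on the input it reads from."""
    if isinstance(node, Filter) and isinstance(node.input, Join):
        join = node.input
        if node.table in tables_of(join.left):
            return Join(Filter(join.left, node.table, node.column), join.right, join.key)
        if node.table in tables_of(join.right):
            return Join(join.left, Filter(join.right, node.table, node.column), join.key)
    return node  # rule does not apply; leave the plan unchanged

# Hypothetical sample data used to check that both plans agree.
DATA = {
    "sales": [{"productId": 1, "discount": 0.1},
              {"productId": 1, "discount": None},
              {"productId": 2, "discount": 0.2}],
    "products": [{"productId": 1, "name": "apple"},
                 {"productId": 2, "name": "pear"}],
}

def evaluate(node):
    """Naive interpreter for the toy plan nodes."""
    if isinstance(node, Scan):
        return [dict(r) for r in DATA[node.table]]
    if isinstance(node, Filter):
        return [r for r in evaluate(node.input) if r[node.column] is not None]
    left, right = evaluate(node.left), evaluate(node.right)
    return [{**l, **r} for l in left for r in right if l[node.key] == r[node.key]]

before = Filter(Join(Scan("sales"), Scan("products"), "productId"), "sales", "discount")
after = filter_into_join(before)
print(after)
print(evaluate(before) == evaluate(after))
```

The rewritten plan filters `sales` before joining, so the join sees fewer rows while producing the same result. For outer joins this push-down is not always valid, which is why the sketch restricts itself to inner joins.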
Relational algebra
• Robust
• Allows re-use
• Complex cost-based optimization
• Multiple front-ends & back-ends
• Not just for “flat” relations
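As a rough illustration of the cost-based optimization bullet, the sketch below compares two equivalent plans for the earlier query and picks the cheaper one. This is not Calcite's cost model: the row counts, the one-in-ten selectivity, and the "rows read per operator" cost formula are all assumptions chosen for the example.

```python
# Assumed statistics (hypothetical numbers, purely illustrative).
ROWS = {"sales": 1_000_000, "products": 1_000}
DISCOUNTED_ONE_IN = 10  # assume one sale in ten has a non-NULL discount

def join_cost(left_rows, right_rows):
    # Crude hash-join style estimate: read each input once.
    return left_rows + right_rows

def filter_cost(input_rows):
    # A filter inspects every input row once.
    return input_rows

def cost_filter_after_join():
    # Assumes every sale matches exactly one product, so the join
    # emits as many rows as there are sales.
    join_output = ROWS["sales"]
    return join_cost(ROWS["sales"], ROWS["products"]) + filter_cost(join_output)

def cost_filter_before_join():
    filtered_sales = ROWS["sales"] // DISCOUNTED_ONE_IN
    return filter_cost(ROWS["sales"]) + join_cost(filtered_sales, ROWS["products"])

plans = {
    "filter after join": cost_filter_after_join(),
    "filter before join": cost_filter_before_join(),
}
best = min(plans, key=plans.get)
print(plans)  # {'filter after join': 2001000, 'filter before join': 1101000}
print(best)   # filter before join
```

Under these assumed statistics, filtering first is cheaper because the join reads far fewer rows from `sales`. With different statistics the comparison could come out differently, which is the point of making the choice cost-based.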
[Figure: front-ends (SQL, other QL, API) → algebra and rules → back-ends (Engine A, Engine B)]
Thank you!
Download: http://calcite.incubator.apache.org
Use Calcite to build your next database!
Calcite powers Apache Hive, Drill, Phoenix, Kylin
An Apache Incubator project since May 2014
@julianhyde
What’s in the box?
• SQL parser & AST
• JDBC/ODBC framework
• Built-in operators (project, filter, …)
• In-memory engine
• 100+ rules
• Planning engines
• Adapters (CSV, JDBC, Mongo, …)
• Streaming SQL
• Materialized views