Flink's SQL Engine:
Let's Open the Engine Room!
Timo Walther, Principal Software Engineer
2024-03-19
About me
Open Source
• Long-term committer since 2014 (before ASF)
• Member of the project management committee (PMC)
• Top-5 contributor by commits, #1 contributor by additions
• Among core architects of Flink SQL
Career
• Early Software Engineer @ DataArtisans (acquired by Alibaba)
• SDK Team, SQL Team Lead @ Ververica
• Co-Founder @ Immerok (acquired by Confluent)
• Principal Software Engineer @ Confluent
What is Apache Flink®?
Building Blocks for Stream Processing
Streams
• Pipeline
• Distribute
• Join
• Enrich
• Control
• Replay
State
• Store
• Buffer
• Cache
• Model
• Grow
• Expire
Time
• Synchronize
• Progress
• Wait
• Timeout
• Fast-forward
• Replay
Snapshots
• Backup
• Version
• Fork
• A/B test
• Time-travel
• Restore
Flink SQL
Flink SQL in a Nutshell
Properties
• Abstract the building blocks for stream processing
• Operator topology is determined by planner and optimizer
• Business logic is declared in ANSI SQL
• Internally, the engine works on binary data
• Conceptually a table, but a changelog under the hood!
SELECT 'Hello World';
SELECT * FROM (VALUES (1), (2), (3));
SELECT * FROM MyTable;
SELECT * FROM Orders o JOIN Payments p ON o.id = p.order;
How do I Work with Streams in Flink SQL?
• You don’t. You work with dynamic tables!
• A concept similar to materialized views
CREATE TABLE Transactions
(name STRING, amount INT)
WITH (…)
CREATE TABLE Revenue
(name STRING, total INT)
WITH (…)
INSERT INTO Revenue
SELECT name, SUM(amount)
FROM Transactions
GROUP BY name
Transactions (name, amount): Alice 56 | Bob 10 | Alice 89
Revenue (name, total): Alice 145 | Bob 10
So, is Flink SQL a database? No, bring your own data!
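The WITH (…) clauses above are elided. As a sketch of what "bring your own data" can look like in open-source Flink, the Transactions table could be backed by a Kafka topic roughly like this (the connector options are standard Kafka connector options, but the topic and broker names are made up):
-- Hypothetical connector configuration for the Transactions source table
CREATE TABLE Transactions (
  name STRING,
  amount INT
) WITH (
  'connector' = 'kafka',
  'topic' = 'transactions',                        -- made-up topic name
  'properties.bootstrap.servers' = 'broker:9092',  -- made-up broker address
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'json'
);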
Stream-Table Duality - Example
An applied changelog becomes a real (materialized) table.
Transactions (name, amount): Alice 56 | Bob 10 | Alice 89
changelog of Transactions: +I[Alice, 56] +I[Bob, 10] +I[Alice, 89]
changelog produced by the query: +I[Alice, 56] +I[Bob, 10] -U[Alice, 56] +U[Alice, 145]
materialization as Revenue (name, total): Alice 56 → 145 | Bob 10
CREATE TABLE Revenue
(name STRING, total INT)
WITH (…)
INSERT INTO Revenue
SELECT name, SUM(amount)
FROM Transactions
GROUP BY name
CREATE TABLE Transactions
(name STRING, amount INT)
WITH (…)
Stream-Table Duality - Example
An applied changelog becomes a real (materialized) table.
Transactions (name, amount): Alice 56 | Bob 10 | Alice 89
changelog of Transactions: +I[Alice, 56] +I[Bob, 10] +I[Alice, 89]
changelog produced by the query: +I[Alice, 56] +I[Bob, 10] -U[Alice, 56] +U[Alice, 145] (with the PRIMARY KEY, the -U[Alice, 56] update-before is no longer needed)
materialization as Revenue (name, total): Alice 56 → 145 | Bob 10
CREATE TABLE Revenue
(PRIMARY KEY(name) …)
WITH (…)
INSERT INTO Revenue
SELECT name, SUM(amount)
FROM Transactions
GROUP BY name
CREATE TABLE Transactions
(name STRING, amount INT)
WITH (…)
Save ~50% of traffic if the downstream system supports upserting!
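As a sketch of how such an upserting sink could be declared in open-source Flink (the upsert-kafka connector requires a primary key; topic and broker names are made up):
-- Hypothetical upsert sink; the engine can omit -U (update-before) messages for it
CREATE TABLE Revenue (
  name STRING,
  total INT,
  PRIMARY KEY (name) NOT ENFORCED
) WITH (
  'connector' = 'upsert-kafka',
  'topic' = 'revenue',                             -- made-up topic name
  'properties.bootstrap.servers' = 'broker:9092',  -- made-up broker address
  'key.format' = 'json',
  'value.format' = 'json'
);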
Let's Open the Engine Room!
SQL Declaration
SQL Declaration – Variant 1 – Basic
-- Example tables
CREATE TABLE Transaction (ts TIMESTAMP(3), tid BIGINT, amount INT);
CREATE TABLE Payment (ts TIMESTAMP(3), tid BIGINT, type STRING);
CREATE TABLE Matched (tid BIGINT, amount INT, type STRING);
-- Join two tables based on key within time and store in target table
INSERT INTO Matched
SELECT T.tid, T.amount, P.type
FROM Transaction T JOIN Payment P ON T.tid = P.tid
WHERE P.ts BETWEEN T.ts AND T.ts + INTERVAL '10' MINUTES;
SQL Declaration – Variant 2 – Watermarks
-- Add watermarks
CREATE TABLE Transaction (…, WATERMARK FOR ts AS ts - INTERVAL '5' SECONDS);
CREATE TABLE Payment (…, WATERMARK FOR ts AS ts - INTERVAL '5' SECONDS);
CREATE TABLE Matched (tid BIGINT, amount INT, type STRING);
-- Join two tables based on key within time and store in target table
INSERT INTO Matched
SELECT T.tid, T.amount, P.type
FROM Transaction T JOIN Payment P ON T.tid = P.tid
WHERE P.ts BETWEEN T.ts AND T.ts + INTERVAL '10' MINUTES;
SQL Declaration – Variant 3 – Updating
-- Transactions can be aborted -> Result is updating
CREATE TABLE Transaction (…, PRIMARY KEY (tid) NOT ENFORCED) WITH ('changelog.mode' = 'upsert');
CREATE TABLE Payment (ts TIMESTAMP(3), tid BIGINT, type STRING);
CREATE TABLE Matched (…, PRIMARY KEY (tid) NOT ENFORCED) WITH ('changelog.mode' = 'upsert');
-- Join two tables based on key within time and store in target table
INSERT INTO Matched
SELECT T.tid, T.amount, P.type
FROM Transaction T JOIN Payment P ON T.tid = P.tid
WHERE P.ts BETWEEN T.ts AND T.ts + INTERVAL '10' MINUTES;
Every Keyword can trigger an Avalanche
CREATE TABLE
• Defines the connector (e.g., Kafka) and the changelog mode (i.e. append/retract/upsert)
→ For the engine: What needs to be processed? What should be produced to the target table?
WATERMARK FOR
• Defines a completeness marker (i.e. "I have seen everything up to time t.")
→ For the engine: Can intermediate results be discarded? Can I trigger result computation?
PRIMARY KEY
• Defines a uniqueness constraint (i.e. "An event with key k occurs once or updates the previous event with that key.")
→ For the engine: How do I guarantee the constraint for the target table?
SELECT, JOIN, WHERE
→ For the engine: What needs to be done? How much freedom do I have? How much can I optimize?
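A quick way to see which avalanche a given declaration triggers is to ask the engine for its plan. A minimal sketch against the tables from the variants above (plain EXPLAIN works in open-source Flink; newer versions and the Table API can additionally expose changelog-mode details):
-- Prints the syntax tree and the optimized plans for this statement
EXPLAIN
INSERT INTO Matched
SELECT T.tid, T.amount, P.type
FROM Transaction T JOIN Payment P ON T.tid = P.tid
WHERE P.ts BETWEEN T.ts AND T.ts + INTERVAL '10' MINUTES;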
SQL Planning
Parsing & Validation
{SQL String, Catalogs, Modules, Session Config} → Calcite Logical Tree
• Break the SQL text into a tree
• Lookup catalogs, databases, tables, views, functions,
and their types
• Resolve all identifiers for columns and fields
e.g. SELECT pi, pi.pi FROM pi.pi.pi;
• Validate input/output columns and arguments/return types
Main drivers: FlinkSqlParserImpl, SqlValidatorImpl, SqlToRelConverter
Output: SqlNode then RelNode
Parsing & Validation
{SQL String, Catalogs, Modules, Session Config} → Calcite Logical Tree
Rule-based Logical Rewrites
→ Calcite Logical Tree
• Rewrite subqueries to joins (e.g.: EXISTS or IN)
• Apply query decorrelation
• Simplify expressions
• Constant folding (e.g.: functions with literal args)
• Initial filter push down
Main drivers: FlinkStreamProgram, FlinkStreamRuleSets
Output: RelNode
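As a hedged illustration of the first rewrite, a query like the following over the example tables is typically turned into a join during this phase (the payment type value is made up):
-- The IN subquery is typically rewritten into a join on tid
SELECT T.tid, T.amount
FROM Transaction T
WHERE T.tid IN (SELECT P.tid FROM Payment P WHERE P.type = 'CREDIT');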
Parsing & Validation
{SQL String, Catalogs, Modules, Session Config} → Calcite Logical Tree
Rule-based Logical Rewrites
→ Calcite Logical Tree
• rewrite subqueries to joins (e.g. EXISTS or IN)
• apply query decorrelation
• simplify expressions
• constant folding (e.g. functions with literal args)
• filter push down (e.g. all the way into the source)
Main drivers: FlinkStreamProgram, FlinkBatchProgram
Output: RelNode
Cost-based Logical Optimization
→ Flink Logical Tree
• Projection/filter push down (e.g. all the way into the source)
• Push aggregate through join
• Reduce aggregate functions (e.g.: AVG -> SUM/COUNT)
• Remove unnecessary sort, aggregate, union, etc.
• …
Main drivers: FlinkStreamProgram, FlinkStreamRuleSets
Output: FlinkLogicalRel (RelNode)
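For example, the aggregate-reduction rule means an AVG over the earlier Transactions table is typically planned in terms of SUM and COUNT (a sketch, not the exact internal rewrite):
-- AVG(amount) is typically reduced to SUM(amount) / COUNT(amount) by the optimizer
SELECT name, AVG(amount) AS avg_amount
FROM Transactions
GROUP BY name;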
Parsing & Validation
{SQL String, Catalogs, Modules, Session Config} → Calcite Logical Tree
Rule-based Logical Rewrites
→ Calcite Logical Tree
• rewrite subqueries to joins (e.g. EXISTS or IN)
• apply query decorrelation
• simplify expressions
• constant folding (e.g. functions with literal args)
• filter push down (e.g. all the way into the source)
Main drivers: FlinkStreamProgram, FlinkBatchProgram
Output: RelNode
Cost-based Logical Optimization
→ Flink Logical Tree
• watermark push down (e.g.: all the way into the source)
• projection push down (e.g.: all the way into the source)
• push aggregate through join
• reduce aggregate functions (e.g.: AVG -> SUM/COUNT)
• remove unnecessary sort, aggregate, union, etc.
Main drivers: FlinkStreamProgram, FlinkStreamRuleSets
Output: RelNode (FlinkLogicalRel)
Flink Rule-based Logical Rewrites
→ Flink Logical Tree
• Watermark push down, more projection push down
• Transpose calc past rank to reduce rank input fields
• Transform over window to top-n node
• …
• Sync timestamp columns with watermarks
Main drivers: FlinkStreamProgram, FlinkStreamRuleSets
Output: FlinkLogicalRel (RelNode)
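The over-window-to-top-n rewrite targets the Top-N pattern documented for Flink SQL; a sketch over the earlier Transactions table:
-- ROW_NUMBER over a partition, filtered on the rank, is recognized as a Rank (Top-N) node
SELECT name, amount
FROM (
  SELECT name, amount,
         ROW_NUMBER() OVER (PARTITION BY name ORDER BY amount DESC) AS row_num
  FROM Transactions
) WHERE row_num <= 3;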
Parsing & Validation
{SQL String, Catalogs, Modules, Session Config} → Calcite Logical Tree
Rule-based Logical Rewrites
→ Calcite Logical Tree
• rewrite subqueries to joins (e.g. EXISTS or IN)
• apply query decorrelation
• simplify expressions
• constant folding (e.g. functions with literal args)
• filter push down (e.g. all the way into the source)
Main drivers: FlinkStreamProgram, FlinkBatchProgram
Output: RelNode
Cost-based Logical Optimization
→ Flink Logical Tree
• watermark push down (e.g.: all the way into the source)
• projection push down (e.g.: all the way into the source)
• push aggregate through join
• reduce aggregate functions (e.g.: AVG -> SUM/COUNT)
• remove unnecessary sort, aggregate, union, etc.
Main drivers: FlinkStreamProgram, FlinkStreamRuleSets
Output: RelNode (FlinkLogicalRel)
Flink Rule-based Logical Rewrites
→ Flink Logical Tree
• watermark push down, more projection push down
• transpose calc past rank to reduce rank input fields
• transform over window to top-n node
• …
• sync timestamp columns with watermarks
Main drivers: FlinkStreamProgram, FlinkStreamRuleSets
Output: RelNode (FlinkLogicalRel)
Flink Physical Optimization and Rewrites
→ Flink Physical Tree
• Convert to matching physical node
• Add changelog normalize node
• Push watermarks past these special operators
• …
• Changelog mode inference (i.e. is update before necessary?)
Main drivers: FlinkStreamProgram, FlinkStreamRuleSets
Output: FlinkPhysicalRel (RelNode)
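The changelogMode annotations shown in the physical trees of the examples below can be requested explicitly; a sketch for recent Flink versions (the set of supported EXPLAIN details depends on the version):
-- Shows the changelog mode (insert/update/delete) produced by each operator
EXPLAIN CHANGELOG_MODE
INSERT INTO Matched
SELECT T.tid, T.amount, P.type
FROM Transaction T JOIN Payment P ON T.tid = P.tid
WHERE P.ts BETWEEN T.ts AND T.ts + INTERVAL '10' MINUTES;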
Physical Tree → Stream Execution Nodes
AsyncCalc Calc ChangelogNormalize Correlate Deduplicate DropUpdateBefore Exchange
Expand GlobalGroupAggregate GlobalWindowAggregate GroupAggregate
GroupTableAggregate GroupWindowAggregate IncrementalGroupAggregate IntervalJoin
Join Limit LocalGroupAggregate LocalWindowAggregate LookupJoin Match
OverAggregate Rank Sink Sort SortLimit TableSourceScan TemporalJoin TemporalSort
Union Values WatermarkAssigner WindowAggregate WindowAggregateBase
WindowDeduplicate WindowJoin WindowRank WindowTableFunction
• Recipes for DAG subparts (i.e. StreamExecNodes are templates for stream transformations)
• JSON serializable and stability guarantees (i.e. CompiledPlan)
→ End of the SQL stack: next are JobGraph (incl. concrete runtime classes) and ExecutionGraph (incl. cluster information)
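Because stream execution nodes are JSON serializable, the physical plan of a statement can be persisted and re-executed; a sketch of the CompiledPlan statements in open-source Flink (the file path is made up, and exact syntax may vary by version):
-- Persist the compiled plan of the statement ...
COMPILE PLAN 'file:///tmp/matched.json' FOR
INSERT INTO Matched
SELECT T.tid, T.amount, P.type
FROM Transaction T JOIN Payment P ON T.tid = P.tid
WHERE P.ts BETWEEN T.ts AND T.ts + INTERVAL '10' MINUTES;
-- ... and execute exactly this plan later, e.g. across an upgrade
EXECUTE PLAN 'file:///tmp/matched.json';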
Examples
Example 1 – No watermarks, no updates
INSERT INTO Matched
SELECT T.tid, T.amount, P.type
FROM Transaction T JOIN Payment P ON T.tid = P.tid
WHERE P.ts BETWEEN T.ts AND T.ts + INTERVAL '10' MINUTES;
Logical Tree
LogicalSink(table=[Matched], fields=[tid, amount, type])
+- LogicalProject(tid=[$1], amount=[$2], type=[$5])
+- LogicalFilter(condition=[AND(>=($3, $0), <=($3, +($0, 600000)))])
+- LogicalJoin(condition=[=($1, $4)], joinType=[inner])
:- LogicalTableScan(table=[Transaction])
+- LogicalTableScan(table=[Payment])
Physical Tree
Sink(table=[Matched], fields=[tid, amount, type], changelogMode=[NONE])
+- Calc(select=[tid, amount, type], changelogMode=[I])
+- Join(joinType=[InnerJoin], where=[AND(=(tid, tid0), >=(ts0, ts), <=(ts0, +(ts, 600000)))], …, changelogMode=[I])
:- Exchange(distribution=[hash[tid]], changelogMode=[I])
: +- TableSourceScan(table=[Transaction], fields=[ts, tid, amount], changelogMode=[I])
+- Exchange(distribution=[hash[tid]], changelogMode=[I])
+- TableSourceScan(table=[Payment], fields=[ts, tid, type], changelogMode=[I])
[Diagram: three parallel pipelines, each with two Kafka Readers → Join → Calc → Kafka Writer]
Example 1 – No watermarks, no updates
INSERT INTO Matched
SELECT T.tid, T.amount, P.type
FROM Transaction T JOIN Payment P ON T.tid = P.tid
WHERE P.ts BETWEEN T.ts AND T.ts + INTERVAL '10' MINUTES;
[Diagram: one pipeline instance in detail. The Kafka Readers for the left and right side each track an offset and feed the Join, which keeps left-side and right-side txn state keyed by txn id; +I[<T>] and +I[<P>] are joined into +I[<T>, <P>], the Calc projects it to +I[tid, amount, type], and the Kafka Writer emits it as Message<k, v>.]
Example 2 – With watermarks, no updates
Logical Tree
LogicalSink(table=[Matched], fields=[tid, amount, type])
+- LogicalProject(tid=[$1], amount=[$2], type=[$5])
+- LogicalFilter(condition=[AND(>=($3, $0), <=($3, +($0, 600000)))])
+- LogicalJoin(condition=[=($1, $4)], joinType=[inner])
:- LogicalWatermarkAssigner(rowtime=[ts], watermark=[-($0, 2000)])
: +- LogicalTableScan(table=[Transaction])
+- LogicalWatermarkAssigner(rowtime=[ts], watermark=[-($0, 2000)])
+- LogicalTableScan(table=[Payment])
Physical Tree
Sink(table=[Matched], fields=[tid, amount, type], changelogMode=[NONE])
+- Calc(select=[tid, amount, type], changelogMode=[I])
+- IntervalJoin(joinType=[InnerJoin], windowBounds=[leftLowerBound=…, leftTimeIndex=…, …], …, changelogMode=[I])
:- Exchange(distribution=[hash[tid]], changelogMode=[I])
: +- TableSourceScan(table=[Transaction, watermark=[-(ts, 2000), …]], fields=[ts, tid, amount], changelogMode=[I])
+- Exchange(distribution=[hash[tid]], changelogMode=[I])
+- TableSourceScan(table=[Payment, watermark=[-(ts, 2000), …]], fields=[ts, tid, type], changelogMode=[I])
INSERT INTO Matched
SELECT T.tid, T.amount, P.type
FROM Transaction T JOIN Payment P ON T.tid = P.tid
WHERE P.ts BETWEEN T.ts AND T.ts + INTERVAL '10' MINUTES;
[Diagram: three parallel pipelines, each with two Kafka Readers → Join → Calc → Kafka Writer]
Example 2 – With watermarks, no updates
INSERT INTO Matched
SELECT T.tid, T.amount, P.type
FROM Transaction T JOIN Payment P ON T.tid = P.tid
WHERE P.ts BETWEEN T.ts AND T.ts + INTERVAL '10' MINUTES;
[Diagram: as in Example 1, but the Kafka Readers now also emit watermarks (e.g. W[12:00] and W[11:55]); the Interval Join keeps left-side and right-side txn state keyed by txn id plus timers, forwards the minimum watermark W[11:55], and joins +I[<T>] and +I[<P>] into +I[<T>, <P>]; the Calc projects +I[tid, amount, type] and the Kafka Writer emits Message<k, v>.]
Example 3 – With updates, no watermarks
INSERT INTO Matched
SELECT T.tid, T.amount, P.type
FROM Transaction T JOIN Payment P ON T.tid = P.tid
WHERE P.ts BETWEEN T.ts AND T.ts + INTERVAL '10' MINUTES;
Logical Tree
LogicalSink(table=[Matched], fields=[tid, amount, type])
+- LogicalProject(tid=[$1], amount=[$2], type=[$5])
+- LogicalFilter(condition=[AND(>=($3, $0), <=($3, +($0, 600000)))])
+- LogicalJoin(condition=[=($1, $4)], joinType=[inner])
:- LogicalTableScan(table=[Transaction])
+- LogicalTableScan(table=[Payment])
Physical Tree
Sink(table=[Matched], fields=[tid, amount, type], changelogMode=[NONE])
+- Calc(select=[tid, amount, type], changelogMode=[I,UA,D])
+- Join(joinType=[InnerJoin], where=[…], …, leftInputSpec=[JoinKeyContainsUniqueKey], changelogMode=[I,UA,D])
:- Exchange(distribution=[hash[tid]], changelogMode=[I,UA,D])
: +- ChangelogNormalize(key=[tid], changelogMode=[I,UA,D])
: +- Exchange(distribution=[hash[tid]], changelogMode=[I,UA,D])
: +- TableSourceScan(table=[Transaction], fields=[ts, tid, amount], changelogMode=[I,UA,D])
+- Exchange(distribution=[hash[tid]], changelogMode=[I])
+- TableSourceScan(table=[Payment], fields=[ts, tid, type], changelogMode=[I])
[Diagram: three parallel pipelines, each with two Kafka Readers → Join → Calc → Kafka Writer]
Example 3 – With updates, no watermarks
INSERT INTO Matched
SELECT T.tid, T.amount, P.type
FROM Transaction T JOIN Payment P ON T.tid = P.tid
WHERE P.ts BETWEEN T.ts AND T.ts + INTERVAL '10' MINUTES;
[Diagram: the Transaction side now passes through a ChangelogNormalize operator that keeps the last seen row per txn id and normalizes the upsert stream before it reaches the parallel Join instances (left-side and right-side txn state keyed by txn id); the first +I[<T>] joined with +I[<P>] yields +I[<T>, <P>] and +I[tid, amount, type] as Message<k, v>, while a later +U[<T>] yields +U[<T>, <P>] and +U[tid, amount, type], emitted by the Kafka Writer.]
Flink SQL on Confluent Cloud
Open Source Flink but Cloud-Native
Serverless
• Complexities of infrastructure management are abstracted away
• Automatic upgrades
• Stable APIs
Zero Knobs
• Auto-scaling
• Auto-watermarking
Usage-based
• Scale-to-zero
• Pay only for what you use
-- No connection or credentials required
CREATE TABLE MyTable (
uid BIGINT,
name STRING,
PRIMARY KEY (uid) NOT ENFORCED);
-- No tuning required
SELECT * FROM MyTable;
-- Stable for long running statements
INSERT INTO OtherTable SELECT * FROM MyTable;
Open Source Flink but Cloud-Native & Complete
One Unified Platform
• Kafka and Flink fully integrated
• Automatic inference or manual creation of topics
• Metadata management via Schema Registry – bidirectional for Avro, JSON, Protobuf
• Consistent semantics across storage and processing - changelogs with append-only, upsert, retract
Confluent Cloud should feel like a database. But for streaming!
[Screenshots: Confluent Cloud CLI and Confluent Cloud SQL Workspace]
Thank you!
Feel free to follow:
@twalthr