Webinar Think Right - Shift Left - 19-03-2025.pptx
1. Create reusable Data Products with
Data Warehouse / Data Lake integration:
Think Right, Shift Left
Olivier Laplace: Staff Solutions Engineer - Confluent
[email protected]
19 March 2025
2. The Rise of Event Streaming
2010: Apache Kafka created at LinkedIn by Confluent founders
2014: Confluent founded
2017: Confluent Cloud
2023
3. SELF-MANAGED SOFTWARE
Confluent Platform
The Enterprise Distribution of Apache Kafka
Deploy on-premises or in your private cloud
VM
FULLY MANAGED CLOUD SERVICE
Confluent Cloud
Cloud-native Data Streaming Platform built by the
founders of Apache Kafka
Available on the leading public cloud marketplaces
Deploy Confluent where your business requires it
5. …But Without A Data Streaming Platform, Bad Data Lands And Spreads
Across Your Organization
Just like leaving muddy tracks in your lakehouse!
Data Warehouse: Scalable and high performance for queries and historical analyses
Data Lake: Scalable and flexible for storing unstructured data
“Lakehouse”: Combines the advantages of DWH and DL
6. Today’s Data Pipeline Approaches Are the Root of Your Data
Problems
Domain 1
Database
Domain 2
Database
Custom
Apps
Data Lake
Lake House
Data Mart
Data
Warehouse
ML/AI
Reports &
Dashboards
SaaS
Apps
OPERATIONAL
SYSTEMS
ETL/ELT PIPELINES
ANALYTICAL
SYSTEMS
7. DATA WAREHOUSE / DATA LAKE
ML/AI
Dashboards
OPERATIONAL DATA
Poor decision making
with stale data
5 / 30 / 60 min batch ingestion
Poor lineage and governance
and increasing pipeline sprawl
Cascading data pollution and failures
Time
Batch 1
Process
Batch 2
Process
Batch 3
Process
Batch 4
Process
Complex remodelling and reprocessing = $$$
‘JUST-ENOUGH’ CLEANSED
DATA
READY-TO-USE BUSINESS
DATA
RAW DATA DUMPS
Reports
Problem 1: ELT Pipelines Are Brittle, Slow and Inefficient
8. Domain 1
Database
Domain 2
Database
Custom Apps
Data Lake
Lake House
Data Mart
Data
Warehouse
ML/AI
Reports &
Dashboards
SaaS Apps
OPERATIONAL
SYSTEMS
ETL/ELT PIPELINES
ANALYTICAL
SYSTEMS
Problem 2: For 50 Years Data Has Moved in One Direction…
9. Domain 1
Database
Domain 2
Database
Data Lake
Lake House
Data Mart
Data
Warehouse
ML/AI
Reports &
Dashboards
OPERATIONAL
SYSTEMS
ETL/ELT PIPELINES
ANALYTICAL
SYSTEMS
REVERSE ETL
More batch tools are bolted on
to reverse the flow of data – from
data warehouses and data lakes
back to operational systems and
apps – for “real-time” use cases
...But Modern Applications Need Data to Flow ‘Upstream’ Too
Custom Apps SaaS Apps
10. In Summary, Batch Pipelines Pose Significant Challenges
STALE DATA
A giant mess of monolithic point-to-point connections with data fidelity and governance challenges
due to batch ingest and duplicative processing at the destination
Operational
Databases and Apps
ELT
ETL
Raw Cleansed
Business-
ready
Raw Cleansed
Data Warehouse / Data Lake
rETL
rETL
ML/AI
Reports &
Dashboards
EXPENSIVE (RE)PROCESSING MANUAL BREAK FIX
SILOED AND REDUNDANT DATASETS
11. Operational
Databases And Apps
Business-
ready
Data Warehouse / Data Lake
PROCESS
GOVERN
STREAM
Universal
Data Products
Operational Databases, SaaS Apps,
Custom Apps, AI Systems…
Cleansed
Microservices
ML/AI
Reports &
Dashboards
Cleansed
CONNECT
CONNECT
CONNECT
Shift Left to Unlock Faster Data Value for Analytics and AI
ROI POSITIVE
REAL-TIME RELIABLE
REUSABLE
Build your data once, make it trustworthy and use it anywhere by shifting the processing
and governance of your data to the source
12. Stream
Governance
This Is Possible Because We Unify Important Standards
Kafka
The standard for
operational streaming
Flink
The standard for
stream processing
Iceberg and Delta Lake
The standard table
formats for analytics
13. Domain 1
Database
Domain 2
Database
OPERATIONAL
SYSTEMS
Data Lake
Lake House
ML/AI/GenAI
Models
Data
Warehouse
ANALYTICAL
SYSTEMS
Custom
Apps
SaaS Apps
Data Cleansing, Aggregation, Normalization
Generated Insights Flow Back to Applications
This Unification Ensures High Value Data for Analytics and AI Is
Always Fresh
14. GA Today!
The Confluent Data Streaming Platform Advantage
Streaming
Continuously capture and share
real-time data everywhere - to
your data warehouse, data lake and
operational systems and apps
Schema Management
Reduce faulty data downstream
by enforcing quality checks
and controls in the pipeline
with data contracts
Flink Stream Processing
Continuously optimize the
treatment of data, the moment
it’s created, for well-curated
reusable data products
Data Portal
Enable anyone with the right
access controls to effortlessly
explore and use real-time
data products for greater
data autonomy
Tableflow
Simplify representing
your operational data as a
ready-to-use Iceberg table
in just one click
Stream Lineage
Understand the complex
data relationships and the
data journey to ensure trustworthiness
15. How we do it
Write Your Data Once, Read It as a Stream or Table
In-stream processing
Data Stream Data Product
Schema Registry
Tableflow
(Iceberg/Delta)
Third Party Compute Engines
Databases
Log data & messaging
systems
Custom Apps &
Microservices
Operational Apps &
Data Systems
Stream (Kafka)
Event-Driven
Design
Decoupled
Architecture
Connect
Connect
Connect
Data Warehouses /
Data Lakes
Stream (Kafka)
COMING
SOON
READ
AS
READ
AS
Stream
Lineage
Stream
Catalog
Data
Portal
Immutable
Logs
16. OPERATIONAL ESTATE ANALYTICAL ESTATE
Apache Kafka is the standard to
connect and organize business data as
data streams
Apache Iceberg / Delta = standards
for managing tables that feed
the analytical estate
17. STREAM INGEST PREP
Convert to
parquet
Schema
evolution
Type conversion
Compaction
Data quality
rules
Sync metadata to
catalogue
Ingest
Workflow
Silver & Gold
Tables
Business-specific
rules and logic
CDC
materialization
Deduplication
Filtering
Raw
Tables
Object
Storage
S3
GCS
ABS
Current state: Converting streams to tables
is a lot of manual work
18. SERVE
External Catalog
Or
Direct Access
of Metadata
Ready-to-use
Iceberg/Delta
Tables
3rd Party
Compute
Engines
Confluent’s Tableflow simplifies converting streaming data
to Apache Iceberg tables
STREAM + INGEST + PREP
AUTOMATIC
✓Convert to
parquet
✓Schema
evolution
✓Type mapping /
conversion
✓CDC
materialization
• Compaction
• Data quality /
rules (SR)
• Sync metadata to
catalogs
19. Incrementally evolve your data integration approach
from batch pipelines…
Databases
Custom Apps
SaaS
Data Lake
DWH
Data Lake
Queries
Analytics
Interactions
Batch pipelines (ETL, ELT)
Processing and
governance
Processing and
governance
Processing and
governance
20. ... to Confluent Data Streaming Platform,
one use case at a time
Databases
Custom Apps
SaaS
Data Lake
DWH
Data Lake
Queries
Analytics
Interactions
Processing and
governance
Processing and
governance
Processing and
governance
Batch pipelines (ETL, ELT)
21. Transform to a Real-Time Streaming Data Architecture Across
Your Enterprise (at Your Pace)
Databases
Custom Apps
SaaS
Data Lake
DWH
Data Lake
Queries
Analytics
Interactions
Connect
Govern
Process
22. VP of Data,
Global Small
Business
Platform
“Your insights on the ‘Shift Left’ philosophy and the integration of
Kafka, Flink, Tableflow, Iceberg, and stream governance are spot on.
The amount of pain that can be prevented by managing data from a single
logical location is incredible...simplifying regulatory compliance,
promoting schema evolution (vs proliferation) and reducing data
duplication.
Low-latency data streams with efficient bulk-query tables, give you the
flexibility to address a wide variety of use cases without a wide variety of
systems.”
23. Other Customer Testimonials
Data Platform Lead,
Sports Technology company
“[Data cleaning] It’s a pricey way of pushing it down to the Deltalake. De-duplication within Confluent is a cheaper way of doing it. We can only do it once.”
Data Strategy Supervisor, Auto
Parts Retailer
“I love the vision of this [Shift Left]. This is how we would make datasets more
discoverable. I knew that Confluent had an integration with Alation but it's
awesome to hear that you have other ways [Data Portal] of enabling those
capabilities.”
Digital Solution Architect
Integration Specialist,
Global Manufacturer
“What I'm hearing is that, moving data from left is good, but we also have
flink in-stream processing for the transformation in a manner that is
presentable for the right side consistently. In-stream processing is a value
add for the data quality.”
Editor's Notes
#2:The rise of Event Streaming can be traced back to 2010, when Apache Kafka was created by the future Confluent founders in Silicon Valley. From there, Kafka began spreading throughout Silicon Valley and across the US West coast. [CLICK] Then, in 2014, Confluent was created with the goal to turn Kafka into an enterprise-ready software stack and cloud offering, after which the adoption of Kafka started to really accelerate. [CLICK] Fast forward to 2020, tens of thousands of companies across the world and across all kinds of industries are using Kafka for event streaming.
What I am telling my family and friends is: You are a Kafka user, whether you know it or not. When you use a smartphone, shop online, make a payment, read the news, listen to music, drive a car, book a flight—it’s very likely that this is powered by Kafka behind the scenes. Kafka is applied even to use cases that I personally would have never predicted, like by scientists for research on astrophysics, where Kafka is used for automatically coordinating globally-distributed, large telescopes to record interstellar phenomena! Not even the sky’s the limit!
#4:While for some teams a data product is nothing new, for many organizations it’s quite an abstract concept. So, I want to take a moment to address how to apply product thinking to your data.
Imagine all the entities in your business - customers, accounts, claims, inventory, shipments… Each of these entities is a data product.
And, instead of querying data from dead rows in a database, each of these data products are live. They are being continuously enriched, continuously governed and continuously shared so your teams can build with trustworthy data assets faster and drive greater reuse, unlocking the full potential of data the moment it’s created.
#5:You have access to Schema Registry and schema validation (at the topic level, ensuring broker/registry coordination by verifying that the schemas tied to incoming messages are both valid and assigned to the specific destination topic), but also specific features like Stream Catalog, which lets you apply tags to data and offers self-service data discovery so your teams can classify, organise and search for specific data or data streams.
Also, Stream Lineage can help you understand data relationships with interactive insights and an end-to-end map of all your data streams.
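As a minimal sketch of the broker-side schema validation piece (assuming a Confluent Server or Confluent Cloud Dedicated cluster where the confluent.value.schema.validation topic setting is available, and a hypothetical orders topic), validation can be switched on when the topic is created:

```python
# Sketch: enable broker-side schema validation on a new topic.
# Assumes a Confluent Server / Confluent Cloud Dedicated cluster (where the
# confluent.value.schema.validation topic config is available); the topic name
# and bootstrap servers are hypothetical placeholders.
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})

orders_topic = NewTopic(
    "orders",
    num_partitions=6,
    replication_factor=3,
    config={
        # Reject any record whose value is not tied to a valid, registered schema
        "confluent.value.schema.validation": "true",
    },
)

futures = admin.create_topics([orders_topic])
for topic, future in futures.items():
    future.result()  # raises if topic creation failed
    print(f"Created topic {topic} with broker-side schema validation enabled")
```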
#6:We recently added a managed Flink offering to our platform, allowing our customers to deploy Flink compute pools in a serverless way to filter, join and enrich their data streams. When you create a Kafka topic, Flink instantly sees it and can process the data, and the processed results can be exported to another Kafka topic without any manual action, whereas with self-managed Flink you have to build that integration on your own.
And if we compare Flink’s adoption with that of Kafka during a similar period of time, what you can see is that these two open source projects are on similar trajectories - just shifted by a few years - and Flink is already seen as the de facto standard for real-time data processing.
We are delivering specific workshops around Flink to compare it with other solutions like Kafka Streams or ksqlDB, as well as more hands-on workshops to test the product; we could organise one in the future if that’s something you are interested in.
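To make the managed Flink workflow above concrete, here is an illustrative sketch of the kind of continuous filter job involved. Topic names, fields and addresses are hypothetical; on Confluent Cloud you would paste the same SQL into a managed Flink workspace, while the PyFlink wrapper below assumes a self-managed Flink with the Kafka SQL connector available, just to keep the sketch self-contained:

```python
# Sketch: continuously filter a raw Kafka topic into a cleansed one with Flink SQL.
# Hypothetical topic/field names; assumes a self-managed Flink with the Kafka
# SQL connector on the classpath.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Source: raw order events as they land in Kafka
t_env.execute_sql("""
    CREATE TABLE orders_raw (
        order_id STRING,
        customer_id STRING,
        amount DECIMAL(10, 2),
        currency STRING
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'orders_raw',
        'properties.bootstrap.servers' = 'localhost:9092',
        'format' = 'json',
        'scan.startup.mode' = 'earliest-offset'
    )
""")

# Sink: a cleansed stream that downstream teams can reuse
t_env.execute_sql("""
    CREATE TABLE orders_eur (
        order_id STRING,
        customer_id STRING,
        amount DECIMAL(10, 2)
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'orders_eur',
        'properties.bootstrap.servers' = 'localhost:9092',
        'format' = 'json'
    )
""")

# Continuous query: keep only EUR orders above a threshold
t_env.execute_sql("""
    INSERT INTO orders_eur
    SELECT order_id, customer_id, amount
    FROM orders_raw
    WHERE currency = 'EUR' AND amount > 100
""").wait()  # block so the streaming job keeps running
```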
#7:This is exactly the problem Confluent solves for - with a radically transformative approach.
Confluent Data Streaming Platform enables you to construct a unified real-time knowledge base of your customer by tapping into feeds of information as they change.
With native predictive and Generative AI capabilities built into our platform, we can make your data AI-ready in real-time. Bridging your legacy and AI stack with fully managed connectors, an open API, native stream processing and governance and our hybrid and multicloud capabilities, Confluent can secure and govern all your AI data and set it in motion to unlock the power of AI in your organization.
So, how do we do it?
#8:We want to start by saying that we love Data Warehouses, Data Lakes and Lakehouses. These systems have played and will continue to play a big role as central components in the tech stack of countless organizations for many years to come.
#9:But what’s also undeniable is that most organizations use these systems to store huge volumes of raw, unprocessed data directly, before cleaning it. And bad data, once it lands in those systems, quickly proliferates across the different teams that rely on them, and troubleshooting and fixing the issue becomes very challenging. This has been a consistent theme over the past few decades, from the days of on-prem systems like Teradata to cloud-based data warehouses, data lakes and lakehouses from vendors like Databricks or Snowflake.
Think of these systems almost like a beautiful data lakehouse on a beautiful alpine lake. It’s expensive but powerful and loved in your organization. But it is so well loved that usage has gotten out of control. Duplication of data and compute is rampant, and customers need to get more efficient.
What’s actually happening with their lakehouse is that the users are spending the day on the lake and at the beach. They’re collecting dirt and mud on their clothes and on their shoes, and today they’re walking all of that mud right in the front door of their beautiful lakehouse and tracking it across all the rooms, the carpets, the chairs and the furniture.
Data teams are spending tremendous amounts of time and money cleaning and re-cleaning the same data over and over in their lakehouse. They’re not using their lakehouse for the AI use cases they thought they were buying; they’re spending their time on data cleaning that is done far more efficiently in streaming.
#10:The root of your data challenges stems from your current data integration approaches, particularly the ETL/ELT pipelines. Pipelines are essential plumbing that extracts and processes data, but the reality is that these batch-based pipelines cause more problems than they solve.
Given the recent popularity of ELT pipelines, let’s dive in to take a look at how it actually works.
#11:When data is extracted and processed in batches, you end up with low-fidelity snapshots, frustrating inconsistencies and stale information, and any downstream use of that data is based on outdated inputs.
<click>
The second challenge is the cost and complexity of remodeling and reprocessing:
Teams often create local versions of the data so they can apply processing logic to meet various use case demands. But as data gets reprocessed over and over again, there’s a high cost - both in terms of compute and wasted hours maintaining different data sets that don’t match one another.
In addition, when data arrives in micro-batches, you often have to stitch together the incremental changes with additional processing logic. Now, this can be simple if the data comes from just one source or is used in just one downstream location. But that’s not the case. So, it’s up to the engineer to figure out how to make these incremental updates everywhere that data is used and that is incredibly hard to do.
<click>
Third is data quality and trustworthiness of the data. If the application changes or the schema drifts, you end up with garbage data in the data warehouse / data lake. This means all the systems and applications that depend on this data are building off dirty data. And fixing it is usually a multi-step manual and tedious process.
<click>
The other common challenge we hear from data engineers is pipeline inheritance. Often, the data engineer ends up inheriting pipelines but there’s no knowledge or documentation about the pipeline; no clear lineage. As a result, engineers are fearful of changing existing pipelines and instead prefer to add just one more. And that worsens the pipeline sprawl.
#12:If you look at what’s happened over the past 50 years, data has moved “left to right” from operational to analytical systems across your org. And there was a good reason for this: data used to be generated by operational databases and by custom and SaaS apps like Salesforce and SAP. Data from these systems was in varying formats and needed to be collected and integrated into a cohesive structure, which led to the emergence of data warehouses and lakehouses.
#13:But the reality is that the data warehouse and data lake are used for far more than just dashboards and reports. This data is now required back in the operational estate to power “real-time” and GenAI applications and chatbots that your app developer teams in the operational domain build to support customers.
So, you tack on more tools such as reverse ETL pipelines and add more brittle point-to-point integrations that send data from analytical systems back to operational systems and applications. This means not only are you creating more technical debt, but your operational systems and apps are also relying on dirty and stale data.
#14:[Instructions for sellers: This is an optional slide. Pull this slide into your presentation if your customer is struggling with high costs due to multiple ETL and ELT pipelines/vendors in their data stack]
This is the landscape of data integration and processing you’re working with, with the ELT paradigm.
Tools like Fivetran, or Stitch or Airbyte capture data from all various sources and load raw or partially transformed data into the cloud data warehouse or data lake.
Once that data is in the warehouse, your data teams use tools like dbt to transform the data and use SQL to start building the data flow logic.
With the data transformed in the data warehouse, you create additional pipelines to connect to one of the many data visualization / BI software depending on the use case.
And given your operational systems now need the same view of data as your analytical systems in real-time, there’s an emergence of Reverse ETL (rETL) tools, such as Hightouch or Grouparoo… that focus on reversing the pattern of data movement - from your data warehouses or data lakes, back to the operational systems, such as your databases and SaaS applications.
Not to mention that you have to bolt on data governance tools like Collibra or data catalog tools like Alation on top of this already complex stack to govern data that’s now on the move across multiple silos of tools.
#15:This is why it’s so hard to unlock the value of data from your data warehouse and data lakes - because there’s tedious data preparation required to get access to high quality, trustworthy data.
Data is often stale and unreliable. You have duplicate data that should, in reality, “agree” with one another but don’t. The right data can’t be found and is instead recreated (incorrectly) over and over again, increasing your compute costs, data quality issues and maintenance complexity. You spend lots of time and resources in acquiring and preparing data and worst of all, you’re designing your customer experiences to rely on slow batch-based processes and unreliable data.
There’s a great deal of complexity in acquiring and preparing data, that’s increasing your costs and impeding your ability to be more agile and innovative.
#16:We believe that there’s a much better way to unlock data value and that begins with shifting the processing and governance of your data to data streaming, so you can build your data once, build it right and reuse it anywhere within milliseconds of its creation. By shifting left, you can eliminate the data inconsistency challenges, reduce the duplication of processing and associated costs, prevent data quality issues before they become problematic downstream and maximize the ROI of your data warehouse and data lakes.
With Confluent, we turn your data problems on their head and ensure that your data downstream is always fresh and up to date; that it’s trustworthy, reliable, discoverable and instantly usable, so your teams can build new applications more easily.
So, how do we do it?
#17:The reason we’re able to do this is that we unify three different open source standards in our platform.
<CLICK> Firstly, Kafka has been and will continue to be the standard for operational systems like databases, SaaS and custom apps to communicate with each other through data streams.
<CLICK> Secondly, Flink has become the de facto stream processing engine to process, clean, and enrich data streams on-the-fly
<CLICK> Iceberg and Delta Lake have quickly become the open source standards for open table formats across numerous compute engines like Spark, BigQuery, Snowflake and Trino, among others.
<CLICK> In our platform, this is all underpinned by a layer of Stream Governance which means that your data across both the operational and analytical domains is secured and unified within one platform.
This in turn, means that you have the flexibility to offer multiple entry points or “APIs” for your data teams to work on secure and trusted data. For example application developer teams can build applications in the language of their choosing like Python or Java
Your Flink developers can work with Flink using familiar languages like SQL to deduplicate and cleanse data and create ready-to-use data products.
Your data engineering teams can work in Iceberg to take those ready-to-use datasets, enrich and transform them further, and use them with any Iceberg-compliant compute engine of their choice.
#18:Ultimately, this means that you have fresh, high-value datasets for analytics and AI that always flow freely in both directions, where
Operational systems can continuously feed cleansed, aggregated and normalized data into analytical platforms
Insights generated in analytical systems can be immediately pushed back to operational systems.
Let’s walk through an example of an e-commerce use-case where:
A customer's browsing behavior is instantly captured
Machine learning models immediately analyze this behavior
Personalized recommendations are generated in milliseconds
These recommendations are instantly fed back to the customer facing app
This is just one example of how you can create a continuous loop of data generation, analysis, and action - transforming how your business can understand and respond to your customers.
#19:Streaming: Instead of data ingested and processed in batches, Confluent delivers a fundamentally different paradigm where your pipeline is a network of event streams that’s continuously flowing everywhere it’s needed - whether it’s to your data warehouse, data lakes or your operational systems and apps. So, every downstream consumer has a consistent view of the most up to date data. This is what makes it extremely suitable as a real-time data pipeline.
Schema Management: With Confluent, you can create data contracts - an explicit agreement between the producer and the consumer of data that formalizes the expected structure and semantics and enforces policies in the pipeline, so bad data doesn’t seep in.
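As a rough sketch of what that contract looks like from the producer side (using the confluent-kafka Python client; the topic name, schema fields and endpoints are hypothetical):

```python
# Sketch: produce records against a registered Avro schema so the data contract
# is enforced at write time. Topic name, fields and endpoints are hypothetical.
from confluent_kafka import Producer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import SerializationContext, MessageField

ORDER_SCHEMA = """
{
  "type": "record",
  "name": "Order",
  "fields": [
    {"name": "order_id", "type": "string"},
    {"name": "customer_id", "type": "string"},
    {"name": "amount", "type": "double"}
  ]
}
"""

schema_registry = SchemaRegistryClient({"url": "http://localhost:8081"})
serialize_order = AvroSerializer(schema_registry, ORDER_SCHEMA)
producer = Producer({"bootstrap.servers": "localhost:9092"})

order = {"order_id": "o-123", "customer_id": "c-42", "amount": 99.90}

# Serialization fails if the record does not match the declared schema, and
# registration fails if the schema breaks the subject's compatibility rules,
# so malformed data never reaches downstream consumers.
producer.produce(
    topic="orders",
    value=serialize_order(order, SerializationContext("orders", MessageField.VALUE)),
)
producer.flush()
```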
Flink Stream Processing: We deliver native stream processing with Flink, the de facto standard for stream processing. This means data can be continuously transformed, filtered, aggregated and enriched with other data sets to create new, well-curated views of the data. This eliminates the cost and complexity of redundant processing and enables your teams to build your data once, build it right and drive greater reuse.
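For example, the deduplication mentioned in the customer testimonials can be expressed with Flink’s standard ROW_NUMBER() pattern. This sketch uses hypothetical topic and field names and submits the SQL through PyFlink; the same statement could be run in a Confluent Cloud Flink workspace instead:

```python
# Sketch: derive a deduplicated "data product" stream from a raw topic using
# Flink's ROW_NUMBER() deduplication pattern. Topic and field names are
# hypothetical; assumes the Kafka SQL connector is available.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

t_env.execute_sql("""
    CREATE TABLE payments_raw (
        payment_id STRING,
        customer_id STRING,
        amount DECIMAL(10, 2),
        proc_time AS PROCTIME()
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'payments_raw',
        'properties.bootstrap.servers' = 'localhost:9092',
        'format' = 'json',
        'scan.startup.mode' = 'earliest-offset'
    )
""")

# Keep only the first occurrence of each payment_id, so duplicates produced by
# at-least-once upstream delivery never reach downstream consumers.
t_env.sql_query("""
    SELECT payment_id, customer_id, amount
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY payment_id ORDER BY proc_time ASC
               ) AS row_num
        FROM payments_raw
    )
    WHERE row_num = 1
""").execute().print()
```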
Data Portal: Confluent’s Data Portal enables your engineers to securely search, discover and explore existing data assets across the organization, effortlessly request access to these streams directly within the GUI and easily build and manage data products to power real-time pipelines or applications. This helps improve developer productivity and agility and bring new applications to market faster.
Stream Lineage: Stream Lineage gives you that view of the journey of the data on Confluent - it’s like Google Maps for your real-time data. You get both a bird’s-eye view and a drill-down magnification for answering complex data relationships and questions in visual graphs, so you can learn details about your data quickly and make more informed decisions about its trustworthiness. You can clearly see where streams are coming from, where they are going to and how they are being consumed. This means your teams no longer have to fear changing or evolving existing pipelines, as all of the information is neatly catalogued within Confluent.
Tableflow: And finally, we’re building native integrations into Iceberg, so every stream, with a simple click, can have a corresponding Iceberg representation that’s ready-to-use. This integration greatly simplifies access to the data you need in exactly the format your analytics query engines need it in, saving you a ton of money and manual work that’s required to make data accessible to the S3 ecosystem.
#20:Here’s a single view of how all of this works together.
We have the best ecosystem of zero code connectors for data streaming. We also have various CDC connectors to synchronize your change data downstream. Once that data is in Confluent, you can apply quality controls with schema registry, use Flink stream processing for data enrichment and transformation while your data is in-flight and create ready-to-use data products that can be simultaneously consumed by your operational and analytical systems and applications. And with the simplification of operational data access in the analytical estate with Tableflow, we bring a truly unified view of your data, as a stream or as a table - bringing about a true convergence of the operational and analytical data estates.
#21:I want to touch upon the simplification and unification of data access across the operational and analytical data estates.
On the operational side, Kafka has become the data standard for real-time data streams. Kafka is great because it works easily with any data format, from any system, anywhere to power real-time use cases. And at Confluent, we’ve simplified a lot of the operational burden of using Kafka.
On the analytical side - there are various data lakes, warehouses, etc. that generally like things in table format. And, Apache Iceberg has emerged as a widely adopted standard to manage the tables of data that can feed into data lakes, warehouses.
As you might imagine, users of Kafka want to be able to use Iceberg to feed their data lakes, etc. with streaming data.
#22:Today, if you want to share the data with the vibrant ecosystem of S3 tools, you would have to do a bit of work.
Kafka streams need to be copied over into the lakehouse or your data warehouse and you have to manually map each stream into a table. You’d have to spin up infrastructure to consume the data out of Kafka. You’d have to convert the data into a universally accessible format like parquet. Then you’d have to make sure schemas are being properly applied and the types are being properly converted. And that’s just the work to get it into the data lake. After that, there's even more work to make the data performant. This is a ton of cost and compute just to get the data in a raw state in your data lake. It’s wasteful, brittle and error-prone.
___________________________ NOTES FOR AE/SE ON ICEBERG INTEGRATION___________________________
Setting up the infrastructure to consume and stream the data from Apache Kafka, including:
Configuring consumer groups or connectors
Ensuring the consumer groups are balanced and properly sized for the throughput and number of partitions in your topic
Feeding the data through a series of jobs that:
Convert data into a universally accessible format like parquet
Hook into Schema Registry or governance tooling that understands the expected schema, evolves the schema if necessary, and handles type conversions.
Constantly compacting and cleaning up the small files that are generated from continuous streaming data as they land in object storage to maintain acceptable read performance
If the data is a change log, materializing and applying the changes so the data is more useful to downstream users
This is just streaming and ingesting the data into object storage, which then provides raw tables, potentially in Iceberg format, into your lakehouse, where you’ll need to prep your data with filtering, enrichment, deduplication, and much more to get it ready to use for analytical purposes. At the end of all of that come the tables that are actually ready for consumption.
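To make the “manual work” concrete, here is a rough sketch of just the first step: a hand-rolled Spark Structured Streaming job that lands a Kafka topic as Parquet in object storage. Topic, schema and bucket names are hypothetical, the spark-sql-kafka connector package is assumed to be on the classpath, and compaction, schema evolution and catalog sync are still left entirely to you:

```python
# Sketch of the hand-rolled ingestion that Tableflow replaces: consume a Kafka
# topic with Spark Structured Streaming and land it as Parquet in object storage.
# Topic, schema and bucket names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

spark = SparkSession.builder.appName("manual-kafka-to-parquet").getOrCreate()

order_schema = StructType([
    StructField("order_id", StringType()),
    StructField("customer_id", StringType()),
    StructField("amount", DoubleType()),
])

# Read the raw bytes from Kafka
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "orders")
    .option("startingOffsets", "earliest")
    .load()
)

# Parse JSON payloads into typed columns
orders = (
    raw.select(from_json(col("value").cast("string"), order_schema).alias("order"))
    .select("order.*")
)

# Land Parquet files in object storage; small-file compaction, schema evolution
# and catalog sync all remain manual follow-up work.
query = (
    orders.writeStream.format("parquet")
    .option("path", "s3a://analytics-landing/orders/")
    .option("checkpointLocation", "s3a://analytics-landing/_checkpoints/orders/")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```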
#23:In the analytical estate, it’s clear that Apache Iceberg is becoming the dominant standard for analytical data.
To automate a lot of this data work required for analytics, we’re adopting this emerging standard and natively integrating it into Confluent, so every stream, with a simple click, can have a corresponding Iceberg representation that’s ready-to-use. We’re making it easier than ever before to specify schemas, metadata, rules, semantic context and access the data you need in exactly the format your analytics query engines need it in.
This will help you save a ton of money and manual work required to make your data accessible to the S3 ecosystem – and by extension accessible in Iceberg tables and data lakes.
With the simplification of data access across both the operational and analytical estates, we bring a truly unified view of your data, as a stream or as a table - bringing about a true convergence of the operational and analytical data estates.
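Once a stream has an Iceberg representation, any Iceberg-aware engine can read it. A minimal sketch with PyIceberg against an Iceberg REST catalog follows; the endpoint, credentials and table name are hypothetical placeholders for whatever Tableflow exposes in your environment:

```python
# Sketch: read a materialized topic as an Iceberg table from a third-party
# engine, here PyIceberg against an Iceberg REST catalog. Endpoint, credential
# and table name are hypothetical placeholders.
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "tableflow",
    **{
        "type": "rest",
        "uri": "https://<iceberg-rest-endpoint>/iceberg",  # REST catalog URI
        "credential": "<api-key>:<api-secret>",
        "warehouse": "<cluster-id>",
    },
)

orders = catalog.load_table("orders_namespace.orders")

# Pull a snapshot of the table into pandas for ad-hoc analysis
df = orders.scan().to_pandas()
print(df.head())
```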
#24:We understand that shifting your processing and governance left can be a journey that happens incrementally and you can’t do this in one go.
#25:You can start the shift left journey one use case at a time or one data warehouse or data lake at a time. Pick a use case that your teams spend a lot of time on and begin shifting the processing and governance for that use case.
For scenarios where migrations are not immediately possible or desirable, you can use Confluent to augment existing systems and maintain a hybrid architecture, providing real-time interoperability with a modern architecture while slowly decommissioning your legacy approach.
~~~~ Discovery for the AE/ SE to find the right set of use cases / workloads to shift-left ~~~~~~
Where are you doing the most repetitive processing in Snowflake, where your license bill is escalating?
Where are you struggling with out-of-date data, data inconsistencies or data quality issues?
Look at Stream Catalog and identify which data “products” are already being sent to Confluent. Determine whether those data sets are required for other use cases and are currently being reprocessed in Snowflake or Databricks. Drive reuse of existing data sets and bring in new data sets for this new use case to increase product attach and consumption.
#26:You can then rinse and repeat this process for your other workloads and use cases and scale this out across your organization and over time, reduce the number of batch pipelines, the complexity of data preparation and the cost of your data warehouses or data lakes.