Webinar Think Right - Shift Left - 19-03-2025.pptx
1. Create reusable Data Products with
Data Warehouse / Data Lake integration:
Think Right, Shift Left
Olivier Laplace: Staff Solutions Engineer - Confluent
[email protected]
19 March 2025
2. The Rise of Event Streaming
2010: Apache Kafka created at LinkedIn by Confluent founders
2014: Confluent founded
2017: Confluent Cloud
2023
3. SELF-MANAGED SOFTWARE
Confluent Platform
The Enterprise Distribution of Apache Kafka
Deploy on-premises or in your private cloud
VM
FULLY MANAGED CLOUD SERVICE
Confluent Cloud
Cloud-native Data Streaming Platform built by the
founders of Apache Kafka
Available on the leading public cloud marketplaces
Deploy Confluent where your business requires it
5. …But Without A Data Streaming Platform, Bad Data Lands And Spreads
Across Your Organization
Just like leaving muddy tracks in your lakehouse!
Data Warehouse: Scalable and high performance for queries and historical analyses
Data Lake: Scalable and flexible for storing unstructured data
“Lakehouse”: Combines the advantages of DWH and DL
6. Today’s Data Pipeline Approaches Are the Root of Your Data
Problems
Domain 1
Database
Domain 2
Database
Custom
Apps
Data Lake
Lake House
Data Mart
Data
Warehouse
ML/AI
Reports &
Dashboards
SaaS
Apps
OPERATIONAL
SYSTEMS
ETL/ELT PIPELINES
ANALYTICAL
SYSTEMS
7. DATA WAREHOUSE / DATA LAKE
ML/AI
Dashboards
OPERATIONAL DATA
Poor decision making
with stale data
5 / 30 / 60 min batch ingestion
Poor lineage and governance
and increasing pipeline sprawl
Cascading data pollution and failures
Time
Batch 1
Process
Batch 2
Process
Batch 3
Process
Batch 4
Process
Complex remodelling and reprocessing = $$$
‘JUST-ENOUGH’ CLEANSED
DATA
READY-TO-USE BUSINESS
DATA
RAW DATA DUMPS
Reports
Problem 1: ELT Pipelines Are Brittle, Slow and Inefficient
8. Domain 1
Database
Domain 2
Database
Custom Apps
Data Lake
Lake House
Data Mart
Data
Warehouse
ML/AI
Reports &
Dashboards
SaaS Apps
OPERATIONAL
SYSTEMS
ETL/ELT PIPELINES
ANALYTICAL
SYSTEMS
Problem 2: For 50 Years Data Has Moved in One Direction…
9. Domain 1
Database
Domain 2
Database
Data Lake
Lake House
Data Mart
Data
Warehouse
ML/AI
Reports &
Dashboards
OPERATIONAL
SYSTEMS
ETL/ELT PIPELINES
ANALYTICAL
SYSTEMS
REVERSE ETL
More batch tools are bolted on
to reverse the flow of data – from
data warehouses and data lakes
back to operational systems and
apps – for “real-time” use cases
...But Modern Applications Need Data to Flow ‘Upstream’ Too
Custom Apps SaaS Apps
10. In Summary, Batch Pipelines Pose Significant Challenges
STALE DATA
A giant mess of monolithic point-to-point connections with data fidelity and governance challenges
due to batch ingest and duplicative processing at the destination
Operational
Databases and Apps
ELT
ETL
Raw Cleansed
Business-
ready
Raw Cleansed
Data Warehouse / Data Lake
rETL
rETL
ML/AI
Reports &
Dashboards
EXPENSIVE (RE)PROCESSING MANUAL BREAK FIX
SILOED AND REDUNDANT DATASETS
11. Operational
Databases And Apps
Business-
ready
Data Warehouse / Data Lake
PROCESS
GOVERN
STREAM
Universal
Data Products
Operational Databases, SaaS Apps,
Custom Apps, AI Systems…
Cleansed
Microservices
ML/AI
Reports &
Dashboards
Cleansed
CONNECT
CONNECT
CONNECT
Shift Left to Unlock Faster Data Value for Analytics and AI
ROI POSITIVE
REAL-TIME RELIABLE
REUSABLE
Build your data once, make it trustworthy and use it anywhere by shifting the processing
and governance of your data to the source
12. Stream
Governance
This Is Possible Because We Unify Important Standards
Kafka
The standard for
operational streaming
Flink
The standard for
stream processing
Iceberg and Delta Lake
The standard table
formats for analytics
13. Domain 1
Database
Domain 2
Database
OPERATIONAL
SYSTEMS
Data Lake
Lake House
ML/AI/GenAI
Models
Data
Warehouse
ANALYTICAL
SYSTEMS
Custom
Apps
SaaS Apps
Data Cleansing, Aggregation, Normalization
Generated Insights Flow Back to Applications
This Unification Ensures High Value Data for Analytics and AI Is
Always Fresh
14. GA Today!
The Confluent Data Streaming Platform Advantage
Streaming
Continuously capture and share
real-time data everywhere - to
your data warehouse, data lake and
operational systems and apps
Schema Management
Reduce faulty data downstream
by enforcing quality checks
and controls in the pipeline
with data contracts
Flink Stream Processing
Continuously optimize the
treatment of data, the moment
it’s created, for well-curated
reusable data products
Data Portal
Enable anyone with the right
access controls to effortlessly
explore and use real-time
data products for greater
data autonomy
Tableflow
Simplify representing
your operational data as a
ready-to-use Iceberg table
in just one click
Stream Lineage
Understand the complex
data relationships and the
data journey to ensure trustworthiness
15. How we do it
Write Your Data Once, Read It as a Stream or Table
In-stream processing
Data Stream Data Product
Schema Registry
Tableflow
(Iceberg/Delta)
Third Party Compute Engines
Databases
Log data & messaging
systems
Custom Apps &
Microservices
Operational Apps &
Data Systems
Stream (Kafka)
Event-Driven
Design
Decoupled
Architecture
Connect
Connect
Connect
Data Warehouses /
Data Lakes
Stream (Kafka)
COMING
SOON
READ
AS
READ
AS
Stream
Lineage
Stream
Catalog
Data
Portal
Immutable
Logs
16. OPERATIONAL ESTATE ANALYTICAL ESTATE
Apache Kafka is the standard to
connect and organize business data as
data streams
Apache Iceberg / Delta = standards
for managing tables that feed
the analytical estate
17. STREAM INGEST PREP
Convert to
parquet
Schema
evolution
Type conversion
Compaction
Data quality
rules
Sync metadata to
catalogue
Ingest
Workflow
Silver & Gold
Tables
Business-specific
rules and logic
CDC
materialization
Deduplication
Filtering
Raw
Tables
Object
Storage
S3
GCS
ABS
Current state: Converting streams to tables
is a lot of manual work
18. SERVE
External Catalog
Or
Direct Access
of Metadata
Ready-to-use
Iceberg/Delta
Tables
3rd Party
Compute
Engines
Confluent’s Tableflow simplifies converting streaming data
to Apache Iceberg tables
STREAM + INGEST + PREP
AUTOMATIC
✓Convert to
parquet
✓Schema
evolution
✓Type mapping /
conversion
✓CDC
materialization
• Compaction
• Data quality /
rules (SR)
• Sync metadata to
catalogs
19. Incrementally evolve your data integration approach
from batch pipelines…
Databases
Custom Apps
SaaS
Data Lake
DWH
Data Lake
Queries
Analytics
Interactions
Batch pipelines (ETL, ELT)
Processing and
governance
Processing and
governance
Processing and
governance
20. ... to Confluent Data Streaming Platform,
one use case at a time
Databases
Custom Apps
SaaS
Data Lake
DWH
Data Lake
Queries
Analytics
Interactions
Processing and
governance
Processing and
governance
Processing and
governance
Batch pipelines (ETL, ELT)
21. Transform to a Real-Time Streaming Data Architecture Across
Your Enterprise (at Your Pace)
Databases
Custom Apps
SaaS
Data Lake
DWH
Data Lake
Queries
Analytics
Interactions
Connect
Govern
Process
22. VP of Data,
Global Small
Business
Platform
“Your insights on the ‘Shift Left’ philosophy and the integration of
Kafka, Flink, Tableflow, Iceberg, and stream governance are spot on.
The amount of pain that can be prevented by managing data from a single
logical location is incredible...simplifying regulatory compliance,
promoting schema evolution (vs proliferation) and reducing data
duplication.
Low-latency data streams with efficient bulk-query tables, give you the
flexibility to address a wide variety of use cases without a wide variety of
systems.”
23. Other Customer Testimonials
Data Platform Lead,
Sports Technology company
“[Data cleaning] It’s a pricey way of pushing it down to the Deltalake. De-duplication within Confluent is a cheaper way of doing it. We can only do it once.”
Data Strategy Supervisor, Auto
Parts Retailer
“I love the vision of this [Shift Left]. This is how we would make datasets more
discoverable. I knew that Confluent had an integration with Alation but it's
awesome to hear that you have other ways [Data Portal] of enabling those
capabilities.”
Digital Solution Architect
Integration Specialist,
Global Manufacturer
“What I'm hearing is that, moving data from left is good, but we also have
flink in-stream processing for the transformation in a manner that is
presentable for the right side consistently. In-stream processing is a value
add for the data quality.”
Editor's Notes
#2:The rise of Event Streaming can be traced back to 2010, when Apache Kafka was created by the future Confluent founders in Silicon Valley. From there, Kafka began spreading throughout Silicon Valley and across the US West coast. [CLICK] Then, in 2014, Confluent was created with the goal to turn Kafka into an enterprise-ready software stack and cloud offering, after which the adoption of Kafka started to really accelerate. [CLICK] Fast forward to 2020, tens of thousands of companies across the world and across all kinds of industries are using Kafka for event streaming.
What I am telling my family and friends is: You are a Kafka user, whether you know it or not. When you use a smartphone, shop online, make a payment, read the news, listen to music, drive a car, book a flight—it’s very likely that this is powered by Kafka behind the scenes. Kafka is applied even to use cases that I personally would have never predicted, like by scientists for research on astrophysics, where Kafka is used for automatically coordinating globally-distributed, large telescopes to record interstellar phenomena! Not even the sky’s the limit!
#4:While for some teams a data product is nothing new, for many organizations it’s quite an abstract concept. So, I want to take a moment to address how to apply product thinking to your data.
Imagine all the entities in your business - customers, accounts, claims, inventory, shipments… Each of these entities is a data product.
And, instead of querying data from dead rows in a database, each of these data products are live. They are being continuously enriched, continuously governed and continuously shared so your teams can build with trustworthy data assets faster and drive greater reuse, unlocking the full potential of data the moment it’s created.
#5:You have access to Schema Registry and schema validation (at the topic level, ensuring broker/registry coordination by verifying that the schemas tied to incoming messages are both valid and assigned to the specific destination topic), but also specific features like Stream Catalog, which lets you apply tags to data and offers self-service data discovery so your teams can classify, organise and search for specific data or data streams.
Also, Stream Lineage can help you understand data relationships with interactive insights and an end-to-end map of all your data streams.
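As a minimal sketch of the broker-side schema validation piece (assuming a Confluent Server or Confluent Cloud Dedicated cluster where the confluent.value.schema.validation topic setting is available, and a hypothetical orders topic), validation can be switched on when the topic is created:

```python
# Sketch: enable broker-side schema validation on a new topic.
# Assumes a Confluent Server / Confluent Cloud Dedicated cluster (where the
# confluent.value.schema.validation topic config is available); the topic name
# and bootstrap servers are hypothetical placeholders.
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})

orders_topic = NewTopic(
    "orders",
    num_partitions=6,
    replication_factor=3,
    config={
        # Reject any record whose value is not tied to a valid, registered schema
        "confluent.value.schema.validation": "true",
    },
)

futures = admin.create_topics([orders_topic])
for topic, future in futures.items():
    future.result()  # raises if topic creation failed
    print(f"Created topic {topic} with broker-side schema validation enabled")
```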
#6:We recently added a managed Flink offering to our platform, allowing our customers to deploy Flink compute pools in a serverless way to filter, join and enrich their data streams. When you create a Kafka topic, Flink instantly sees it and can process the data, and the processed results can be exported to another Kafka topic without any manual action, whereas with self-managed Flink you have to build that integration on your own.
And if we compare Flink’s adoption with that of Kafka during a similar period of time, what you can see is that these two open source projects are on similar trajectories - just shifted by a few years - and Flink is already seen as the de facto standard for real-time data processing.
We are delivering specific workshops around Flink to compare it with other solutions like Kafka Streams or ksqlDB, as well as more hands-on workshops to test the product; we could organise one in the future if that’s something you are interested in.
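To make the managed Flink workflow above concrete, here is an illustrative sketch of the kind of continuous filter job involved. Topic names, fields and addresses are hypothetical; on Confluent Cloud you would paste the same SQL into a managed Flink workspace, while the PyFlink wrapper below assumes a self-managed Flink with the Kafka SQL connector available, just to keep the sketch self-contained:

```python
# Sketch: continuously filter a raw Kafka topic into a cleansed one with Flink SQL.
# Hypothetical topic/field names; assumes a self-managed Flink with the Kafka
# SQL connector on the classpath.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Source: raw order events as they land in Kafka
t_env.execute_sql("""
    CREATE TABLE orders_raw (
        order_id STRING,
        customer_id STRING,
        amount DECIMAL(10, 2),
        currency STRING
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'orders_raw',
        'properties.bootstrap.servers' = 'localhost:9092',
        'format' = 'json',
        'scan.startup.mode' = 'earliest-offset'
    )
""")

# Sink: a cleansed stream that downstream teams can reuse
t_env.execute_sql("""
    CREATE TABLE orders_eur (
        order_id STRING,
        customer_id STRING,
        amount DECIMAL(10, 2)
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'orders_eur',
        'properties.bootstrap.servers' = 'localhost:9092',
        'format' = 'json'
    )
""")

# Continuous query: keep only EUR orders above a threshold
t_env.execute_sql("""
    INSERT INTO orders_eur
    SELECT order_id, customer_id, amount
    FROM orders_raw
    WHERE currency = 'EUR' AND amount > 100
""").wait()  # block so the streaming job keeps running
```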
#7:This is exactly the problem Confluent solves for - with a radically transformative approach.
Confluent Data Streaming Platform enables you to construct a unified real-time knowledge base of your customer by tapping into feeds of information as they change.
With native predictive and Generative AI capabilities built into our platform, we can make your data AI-ready in real-time. Bridging your legacy and AI stack with fully managed connectors, an open API, native stream processing and governance and our hybrid and multicloud capabilities, Confluent can secure and govern all your AI data and set it in motion to unlock the power of AI in your organization.
So, how do we do it?
#8:We want to start by saying that we love Data Warehouses, Data Lakes and Lakehouses. These systems have played and will continue to play a big role as central components in the tech stack of countless organizations for many years to come.
#9:But what’s also undeniable is that most organizations use these systems to store huge volumes of raw, unprocessed data directly, before cleaning it. And bad data, once it lands in those systems, quickly proliferates across the different teams that rely on them, and troubleshooting and fixing the issue becomes very challenging. This has been a consistent theme over the past few decades, from the days of on-prem systems like Teradata to cloud-based data warehouses, data lakes and lakehouses from vendors like Databricks or Snowflake.
Think of these systems almost like a beautiful data lakehouse on a beautiful alpine lake. It’s expensive but powerful and loved in your organization. But it is so well loved that usage has gotten out of control. Duplication of data and compute is rampant, and customers need to get more efficient.
What’s actually happening with their lakehouse is that the users are spending the day on the lake and at the beach. They’re collecting dirt and mud on their clothes and on their shoes, and today they’re walking all of that mud right in the front door of their beautiful lakehouse and tracking it across all the rooms, the carpets, the chairs and the furniture.
Data teams are spending tremendous amounts of time and money cleaning and re-cleaning the same data over and over in their lakehouse. They’re not using their lakehouse for the AI use cases they thought they were buying; they’re spending their time on data cleaning that is done far more efficiently in streaming.
#10:The root of your data challenges stems from your current data integration approaches, particularly the ETL/ELT pipelines. Pipelines are essential plumbing that extracts and processes data, but the reality is that these batch-based pipelines cause more problems than they solve.
Given the recent popularity of ELT pipelines, let’s dive in to take a look at how it actually works.
#11:When data is extracted and processed in batches, you end up with low-fidelity snapshots, frustrating inconsistencies and stale information, and any downstream use of that data is based on outdated inputs.
<click>
The second challenge is the cost and complexity of remodeling and reprocessing:
Teams often create local versions of the data so they can apply processing logic to meet various use case demands. But as data gets reprocessed over and over again, there’s a high cost - both in terms of compute and wasted hours maintaining different data sets that don’t match one another.
In addition, when data arrives in micro-batches, you often have to stitch together the incremental changes with additional processing logic. Now, this can be simple if the data comes from just one source or is used in just one downstream location. But that’s not the case. So, it’s up to the engineer to figure out how to make these incremental updates everywhere that data is used and that is incredibly hard to do.
<click>
Third is data quality and trustworthiness of the data. If the application changes or the schema drifts, you end up with garbage data in the data warehouse / data lake. This means all the systems and applications that depend on this data are building off dirty data. And fixing it is usually a multi-step manual and tedious process.
<click>
The other common challenge we hear from data engineers is pipeline inheritance. Often, the data engineer ends up inheriting pipelines but there’s no knowledge or documentation about the pipeline; no clear lineage. As a result, engineers are fearful of changing existing pipelines and instead prefer to add just one more. And that worsens the pipeline sprawl.
#12:If you look at what’s happened over the past 50 years, data has moved “left to right” from operational to analytical systems across your org. And there was a good reason for this: data used to be generated by operational databases and by custom and SaaS apps like Salesforce and SAP. Data from these systems was in varying formats and needed to be collected and integrated into a cohesive structure, which led to the emergence of data warehouses and lakehouses.
#13:But the reality is that the data warehouse and data lake are used for far more than just dashboards and reports. This data is now required back in the operational estate to power “real-time” and GenAI applications and chatbots that your app developer teams in the operational domain build to support customers.
So, you tack on more tools such as reverse ETL pipelines and add more brittle point-to-point integrations that send data from analytical systems back to operational systems and applications. This means not only are you creating more technical debt, but your operational systems and apps are also relying on dirty and stale data.
#14:[Instructions for sellers: This is an optional slide. Pull this slide into your presentation if your customer is struggling with high costs due to multiple ETL and ELT pipelines/vendors in their data stack]
This is the landscape of data integration and processing you’re working with, with the ELT paradigm.
Tools like Fivetran, or Stitch or Airbyte capture data from all various sources and load raw or partially transformed data into the cloud data warehouse or data lake.
Once that data is in the warehouse, your data teams use tools like dbt to transform the data and use SQL to start building the data flow logic.
With the data transformed in the data warehouse, you create additional pipelines to connect to one of the many data visualization / BI software depending on the use case.
And given your operational systems now need the same view of data as your analytical systems in real-time, there’s an emergence of Reverse ETL (rETL) tools, such as Hightouch or Grouparoo… that focus on reversing the pattern of data movement - from your data warehouses or data lakes, back to the operational systems, such as your databases and SaaS applications.
Not to mention that you have to bolt on data governance tools like Collibra or data catalog tools like Alation on top of this already complex stack to govern data that’s now on the move across multiple silos of tools.
#15:This is why it’s so hard to unlock the value of data from your data warehouse and data lakes - because there’s tedious data preparation required to get access to high quality, trustworthy data.
Data is often stale and unreliable. You have duplicate data that should, in reality, “agree” with one another but don’t. The right data can’t be found and is instead recreated (incorrectly) over and over again, increasing your compute costs, data quality issues and maintenance complexity. You spend lots of time and resources in acquiring and preparing data and worst of all, you’re designing your customer experiences to rely on slow batch-based processes and unreliable data.
There’s a great deal of complexity in acquiring and preparing data, that’s increasing your costs and impeding your ability to be more agile and innovative.
#16:We believe that there’s a much better way to unlock data value and that begins with shifting the processing and governance of your data to data streaming, so you can build your data once, build it right and reuse it anywhere within milliseconds of its creation. By shifting left, you can eliminate the data inconsistency challenges, reduce the duplication of processing and associated costs, prevent data quality issues before they become problematic downstream and maximize the ROI of your data warehouse and data lakes.
With Confluent, we turn your data problems on their head and ensure that your data downstream is always fresh and up to date; that it’s trustworthy, reliable, discoverable and instantly usable, so your teams can build new applications more easily.
So, how do we do it?
#17:The reason we’re able to do this is that we unify three different open source standards in our platform.
<CLICK> Firstly, Kafka has been and will continue to be the standard for operational systems like databases, SaaS and custom apps to communicate with each other through data streams.
<CLICK> Secondly, Flink has become the de facto stream processing engine to process, clean, and enrich data streams on-the-fly
<CLICK> Iceberg and Delta Lake have quickly become the open source standards for open table formats across numerous compute engines like Spark, BigQuery, Snowflake and Trino, among others.
<CLICK> In our platform, this is all underpinned by a layer of Stream Governance which means that your data across both the operational and analytical domains is secured and unified within one platform.
This in turn, means that you have the flexibility to offer multiple entry points or “APIs” for your data teams to work on secure and trusted data. For example application developer teams can build applications in the language of their choosing like Python or Java
Your Flink developers can work with Flink using familiar languages like SQL to deduplicate and cleanse data and create ready-to-use data products.
Your data engineering teams can work in Iceberg to take those ready-to-use datasets, enrich and transform them further, and use them with any Iceberg-compliant compute engine of their choice.
#18:Ultimately, this means that you have fresh, high-value datasets for analytics and AI that always flow freely in both directions, where
Operational systems can continuously feed cleansed, aggregated and normalized data into analytical platforms
Insights generated in analytical systems can be immediately pushed back to operational systems.
Let’s walk through an example of an e-commerce use-case where:
A customer's browsing behavior is instantly captured
Machine learning models immediately analyze this behavior
Personalized recommendations are generated in milliseconds
These recommendations are instantly fed back to the customer facing app
This is just one example of how you can create a continuous loop of data generation, analysis, and action - transforming how your business can understand and respond to your customers.
#19:Streaming: Instead of data ingested and processed in batches, Confluent delivers a fundamentally different paradigm where your pipeline is a network of event streams that’s continuously flowing everywhere it’s needed - whether it’s to your data warehouse, data lakes or your operational systems and apps. So, every downstream consumer has a consistent view of the most up to date data. This is what makes it extremely suitable as a real-time data pipeline.
Schema Management: With Confluent, you can create data contracts - an explicit agreement between the producer and the consumer of data that formalizes the expected structure and semantics and enforces policies in the pipeline, so bad data doesn’t seep in.
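As a rough sketch of what that contract looks like from the producer side (using the confluent-kafka Python client; the topic name, schema fields and endpoints are hypothetical):

```python
# Sketch: produce records against a registered Avro schema so the data contract
# is enforced at write time. Topic name, fields and endpoints are hypothetical.
from confluent_kafka import Producer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import SerializationContext, MessageField

ORDER_SCHEMA = """
{
  "type": "record",
  "name": "Order",
  "fields": [
    {"name": "order_id", "type": "string"},
    {"name": "customer_id", "type": "string"},
    {"name": "amount", "type": "double"}
  ]
}
"""

schema_registry = SchemaRegistryClient({"url": "http://localhost:8081"})
serialize_order = AvroSerializer(schema_registry, ORDER_SCHEMA)
producer = Producer({"bootstrap.servers": "localhost:9092"})

order = {"order_id": "o-123", "customer_id": "c-42", "amount": 99.90}

# Serialization fails if the record does not match the declared schema, and
# registration fails if the schema breaks the subject's compatibility rules,
# so malformed data never reaches downstream consumers.
producer.produce(
    topic="orders",
    value=serialize_order(order, SerializationContext("orders", MessageField.VALUE)),
)
producer.flush()
```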
Flink Stream Processing: We deliver native stream processing with Flink, the de facto standard for stream processing. This means data can be continuously transformed, filtered, aggregated and enriched with other data sets to create new, well-curated views of the data. This eliminates the cost and complexity of redundant processing and enables your teams to build your data once, build it right and drive greater reuse.
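For example, the deduplication mentioned in the customer testimonials can be expressed with Flink’s standard ROW_NUMBER() pattern. This sketch uses hypothetical topic and field names and submits the SQL through PyFlink; the same statement could be run in a Confluent Cloud Flink workspace instead:

```python
# Sketch: derive a deduplicated "data product" stream from a raw topic using
# Flink's ROW_NUMBER() deduplication pattern. Topic and field names are
# hypothetical; assumes the Kafka SQL connector is available.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

t_env.execute_sql("""
    CREATE TABLE payments_raw (
        payment_id STRING,
        customer_id STRING,
        amount DECIMAL(10, 2),
        proc_time AS PROCTIME()
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'payments_raw',
        'properties.bootstrap.servers' = 'localhost:9092',
        'format' = 'json',
        'scan.startup.mode' = 'earliest-offset'
    )
""")

# Keep only the first occurrence of each payment_id, so duplicates produced by
# at-least-once upstream delivery never reach downstream consumers.
t_env.sql_query("""
    SELECT payment_id, customer_id, amount
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY payment_id ORDER BY proc_time ASC
               ) AS row_num
        FROM payments_raw
    )
    WHERE row_num = 1
""").execute().print()
```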
Data Portal: Confluent’s Data Portal enables your engineers to securely search, discover and explore existing data assets across the organization, effortlessly request access to these streams directly within the GUI and easily build and manage data products to power real-time pipelines or applications. This helps improve developer productivity and agility and bring new applications to market faster.
Stream Lineage: Stream Lineage gives you that view of the journey of the data on Confluent - it’s like Google Maps for your real-time data. You get both a bird’s-eye view and a drill-down magnification for answering complex data relationships and questions in visual graphs, so you can learn details about your data quickly and make more informed decisions about its trustworthiness. You can clearly see where streams are coming from, where they are going to and how they are being consumed. This means your teams no longer have to fear changing or evolving existing pipelines, as all of the information is neatly catalogued within Confluent.
Tableflow: And finally, we’re building native integrations into Iceberg, so every stream, with a simple click, can have a corresponding Iceberg representation that’s ready-to-use. This integration greatly simplifies access to the data you need in exactly the format your analytics query engines need it in, saving you a ton of money and manual work that’s required to make data accessible to the S3 ecosystem.
#20:Here’s a single view of how all of this works together.
We have the best ecosystem of zero code connectors for data streaming. We also have various CDC connectors to synchronize your change data downstream. Once that data is in Confluent, you can apply quality controls with schema registry, use Flink stream processing for data enrichment and transformation while your data is in-flight and create ready-to-use data products that can be simultaneously consumed by your operational and analytical systems and applications. And with the simplification of operational data access in the analytical estate with Tableflow, we bring a truly unified view of your data, as a stream or as a table - bringing about a true convergence of the operational and analytical data estates.
#21:I want to touch upon the simplification and unification of data access across the operational and analytical data estates.
On the operational side, Kafka has become the data standard for real-time data streams. Kafka is great because it works easily with any data format, from any system, anywhere to power real-time use cases. And at Confluent, we’ve simplified a lot of the operational burden of using Kafka.
On the analytical side - there are various data lakes, warehouses, etc. that generally like things in table format. And, Apache Iceberg has emerged as a widely adopted standard to manage the tables of data that can feed into data lakes, warehouses.
As you might imagine, users of Kafka want to be able to use Iceberg to feed their data lakes, etc. with streaming data.
#22:Today, if you want to share the data with the vibrant ecosystem of S3 tools, you would have to do a bit of work.
Kafka streams need to be copied over into the lakehouse or your data warehouse and you have to manually map each stream into a table. You’d have to spin up infrastructure to consume the data out of Kafka. You’d have to convert the data into a universally accessible format like parquet. Then you’d have to make sure schemas are being properly applied and the types are being properly converted. And that’s just the work to get it into the data lake. After that, there's even more work to make the data performant. This is a ton of cost and compute just to get the data in a raw state in your data lake. It’s wasteful, brittle and error-prone.
___________________________ NOTES FOR AE/SE ON ICEBERG INTEGRATION___________________________
Setting up the infrastructure to consume and stream the data from Apache Kafka, including:
Configuring consumer groups or connectors
Ensuring the consumer groups are balanced and properly sized for the throughput and number of partitions in your topic
Feeding the data through a series of jobs that:
Convert data into a universally accessible format like parquet
Hook into Schema Registry or governance tooling that understands the expected schema, evolves the schema if necessary, and handles type conversions.
Constantly compacting and cleaning up the small files that are generated from continuous streaming data as they land in object storage to maintain acceptable read performance
If the data is a change log, materializing and applying the changes so the data is more useful to downstream users
This is just streaming and ingesting the data into object storage, which then provides raw tables, potentially in Iceberg format, into your lakehouse, where you’ll need to prep your data with filtering, enrichment, deduplication, and much more to get it ready to use for analytical purposes. At the end of all of that come the tables that are actually ready for consumption.
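To make the “manual work” concrete, here is a rough sketch of just the first step: a hand-rolled Spark Structured Streaming job that lands a Kafka topic as Parquet in object storage. Topic, schema and bucket names are hypothetical, the spark-sql-kafka connector package is assumed to be on the classpath, and compaction, schema evolution and catalog sync are still left entirely to you:

```python
# Sketch of the hand-rolled ingestion that Tableflow replaces: consume a Kafka
# topic with Spark Structured Streaming and land it as Parquet in object storage.
# Topic, schema and bucket names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

spark = SparkSession.builder.appName("manual-kafka-to-parquet").getOrCreate()

order_schema = StructType([
    StructField("order_id", StringType()),
    StructField("customer_id", StringType()),
    StructField("amount", DoubleType()),
])

# Read the raw bytes from Kafka
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "orders")
    .option("startingOffsets", "earliest")
    .load()
)

# Parse JSON payloads into typed columns
orders = (
    raw.select(from_json(col("value").cast("string"), order_schema).alias("order"))
    .select("order.*")
)

# Land Parquet files in object storage; small-file compaction, schema evolution
# and catalog sync all remain manual follow-up work.
query = (
    orders.writeStream.format("parquet")
    .option("path", "s3a://analytics-landing/orders/")
    .option("checkpointLocation", "s3a://analytics-landing/_checkpoints/orders/")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```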
#23:In the analytical estate, it’s clear that Apache Iceberg is becoming the dominant standard for analytical data.
To automate a lot of this data work required for analytics, we’re adopting this emerging standard and natively integrating it into Confluent, so every stream, with a simple click, can have a corresponding Iceberg representation that’s ready-to-use. We’re making it easier than ever before to specify schemas, metadata, rules, semantic context and access the data you need in exactly the format your analytics query engines need it in.
This will help you save a ton of money and manual work required to make your data accessible to the S3 ecosystem – and by extension accessible in Iceberg tables and data lakes.
With the simplification of data access across both the operational and analytical estates, we bring a truly unified view of your data, as a stream or as a table - bringing about a true convergence of the operational and analytical data estates.
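Once a stream has an Iceberg representation, any Iceberg-aware engine can read it. A minimal sketch with PyIceberg against an Iceberg REST catalog follows; the endpoint, credentials and table name are hypothetical placeholders for whatever Tableflow exposes in your environment:

```python
# Sketch: read a materialized topic as an Iceberg table from a third-party
# engine, here PyIceberg against an Iceberg REST catalog. Endpoint, credential
# and table name are hypothetical placeholders.
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "tableflow",
    **{
        "type": "rest",
        "uri": "https://<iceberg-rest-endpoint>/iceberg",  # REST catalog URI
        "credential": "<api-key>:<api-secret>",
        "warehouse": "<cluster-id>",
    },
)

orders = catalog.load_table("orders_namespace.orders")

# Pull a snapshot of the table into pandas for ad-hoc analysis
df = orders.scan().to_pandas()
print(df.head())
```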
#24:We understand that shifting your processing and governance left can be a journey that happens incrementally and you can’t do this in one go.
#25:You can start the shift left journey one use case at a time or one data warehouse or data lake at a time. Pick a use case that your teams spend a lot of time on and begin shifting the processing and governance for that use case.
For scenarios where migrations are not immediately possible or desirable, you can use Confluent to augment existing systems and maintain a hybrid architecture, providing real-time interoperability with a modern architecture while slowly decommissioning your legacy approach.
~~~~ Discovery for the AE/ SE to find the right set of use cases / workloads to shift-left ~~~~~~
Where are you doing the most repetitive processing in Snowflake, where your license bill is escalating?
Where are you struggling with out-of-date data, data inconsistencies or data quality issues?
Look at Stream Catalog and identify which data “products” are already being sent to Confluent. Determine whether those data sets are required for other use cases and are currently being reprocessed in Snowflake or Databricks. Drive reuse of existing data sets and bring in new data sets for this new use case to increase product attach and consumption.
#26:You can then rinse and repeat this process for your other workloads and use cases and scale this out across your organization and over time, reduce the number of batch pipelines, the complexity of data preparation and the cost of your data warehouses or data lakes.