Building Streaming Applications with
Streaming SQL
Technical Lead, WSO2
Mohanadarshan Vivekanandalingam
Lead Solutions Engineer, WSO2
Vanjikumaran Sivajothy
ALL ABOUT
STREAMING APPS &
WSO2 STREAM PROCESSOR
Agenda
Streaming Application
An Application that provides
analytical operators to
orchestrate data flow, calculate
analytics, and detect patterns on
event data from multiple,
disparate live data sources to
allow developers to build
applications that sense, think, and
act in real-time.
- Forrester
Challenges!
Write Code
Complex Deployment
(5-6 nodes)
Inability to Change
Fast
WSO2 Streaming
Analytics
Solutions
- Streaming SQL + graphical
editor
- 2 node minimum HA
deployment (scale beyond
with distributed deployment)
- Templated SQL scripts + drag
& drop UI
WSO2 Stream Processor
● Lightweight, lean & cloud native
● Easy to learn Streaming SQL
● High performance analytics with just 2 nodes (HA)
● Native support for streaming Machine Learning
● Long term aggregations without batch analytics
● Highly scalable deployment with exactly-once processing
● Tools for development and monitoring
● Tools for business users to write their own rules
Overview of WSO2 Stream Processor
WSO2 Stream Processor
• Editor/Studio - Developer environment
• Worker/Resource - Resource node
• Dashboard
– Portal - Business dashboard
– Business Rules Manager - Management
console for business users
– Status Dashboard - Monitoring dashboard
• Manager - Job manager for distributed
processing
Profiles
WSO2 Analytics - Users
Use Case:
Online Shopping Application
Use Case
Sweet Factory Management
Use Case
• Monitor supply, production and sales
• Optimize resource utilization
• Detect and alert failures
• Predict demand
• Manage processing rules online
• Visualize real-time performance
Sweet Factory Management
Let’s start from the basics
Phases in Stream Processing
Receive
- Define Streams
- Configure
- Event Sources
- Event Mappings
- Define Mappings
Analyze
- Stream Processing
- Long-term Incremental
Processing
- Complex Event Processing
- Machine Learning
- Storage Integration
Report
- Define Streams
- Configure
- Event Sinks
- Event Mappings
- Publish results
- View results in dashboard
Stream Processing
With WSO2 Stream Processor
Siddhi Streaming App
- Process events in streaming manner
- Isolated unit with set of queries, input and
output streams
- SQL-like Query Language
from Sales#window.time(1 hour)
select region, brand, avg(quantity) as AvgQuantity
group by region, brand
insert into LastHourSales ;
(Diagram: within the Stream Processor, a Siddhi app connects input streams to output streams through Filter, Aggregate, Join, Transform, and Pattern queries, extensible via Siddhi extensions.)
Developer Tools
Developer Studio
Supports both Drag & Drop
& Source Editor
Editor
Debugger
Simulation
Testing
Stream Processor Studio
• Writing Siddhi applications
– Syntax highlighting
– Auto completion
– Error reporting
– Documentation support
• Debugging Siddhi apps
– Inspect events
– Inspect query states
Developer Environment
Stream Processor Studio
• Testing Siddhi apps via Event Simulation
– Send Event by Event
– Simulate Random Data
– Simulate via CSV file
– Simulate from Database
• Support for running and testing on Python
– as PySiddhi
• IDE Tools
– IntelliJ IDEA Plugin
Developer Environment
Let’s write some queries
Create a Siddhi App
@app:name('Sweet-Factory-Analytics')
Name the Siddhi application
Define Input Streams
@app:name('Sweet-Factory-Analytics')
define stream RawMaterialStream(name string, amount double);
define stream SweetProductionStream (name string, amount double);
Define input event streams
Define Event Sources
@app:name('Sweet-Factory-Analytics')
@source(type = 'http', @map(type = 'json'))
define stream RawMaterialStream(name string, amount double);
Consume and publish events via
MQTT, HTTP, TCP, Kafka, JMS, RabbitMQ, etc.
Define Event Sources
Default Mapping
@app:name('Sweet-Factory-Analytics')
@source(type = 'http', @map(type = 'json'))
define stream RawMaterialStream(name string, amount double);
Using default mapping
{
"event": {
"name": "sugar",
"amount": 75.5
}
}
Supported Mapping
Text, XML, JSON, Binary, KeyValue,
WSO2Event, etc.
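The default mapping logic can be mirrored outside the product. Below is a rough Python sketch (not part of WSO2 SP) of building a default-mapped JSON payload; the endpoint URL shape in the comment is hypothetical and depends on the source configuration.

```python
import json

def to_default_mapping(name, amount):
    # Wrap attributes in the {"event": {...}} envelope that the
    # default @map(type = 'json') mapping expects.
    return json.dumps({"event": {"name": name, "amount": amount}})

payload = to_default_mapping("sugar", 75.5)
# The payload could then be POSTed to the HTTP source, e.g.
# requests.post("http://<host>:<port>/RawMaterialStream", data=payload)
# (URL is hypothetical; it depends on the deployed source).
```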
Define Event Sources
Custom Mapping
@app:name('Sweet-Factory-Analytics')
@source(type = 'http', @map(type = 'json'))
define stream RawMaterialStream
(name string, amount double);
@source(type = 'http', @map(type = 'json',
@attributes( name = '$.item.id',
amount = '$.item.amount')))
define stream SweetProductionStream (name string, amount double);
Using custom mapping
{
"item": {
"id": "toffees",
"amount": 30.0
}
}
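As an illustration of what the `@attributes` JSONPath mapping does, here is a minimal Python sketch (plain dict access stands in for the JSONPath engine; the function name is hypothetical):

```python
def map_custom(payload):
    # Mimic @attributes(name = '$.item.id', amount = '$.item.amount'):
    # pull stream attributes out of the custom JSON shape.
    return {"name": payload["item"]["id"],
            "amount": payload["item"]["amount"]}

event = map_custom({"item": {"id": "toffees", "amount": 30.0}})
```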
Define Event Sinks
To Log Events
@app:name('Sweet-Factory-Analytics')
@source(type = 'http', @map(type = 'json'))
@sink(type = 'log', level = 'info')
define stream RawMaterialStream(name string, amount double);
@source(type = 'http', @map(type = 'json',
@attributes( name = '$.item.id',
amount = '$.item.amount')))
@sink(type = 'log', level = 'info')
define stream SweetProductionStream (name string, amount double);
Logger sink
to log events on console
Use Case 1:
Production at each factory should not
drop below 5000 units per hour!
1.1 Monitor and Identify events that
indicate low production
Total Amount Produced
define stream SweetProductionStream (name string, amount double);
from SweetProductionStream
select sum(amount) as hourlyTotal
insert into LowProductionAlertStream ;
Calculate the running total amount
produced (no window)
Total Amount Produced in the Last Hour
define stream SweetProductionStream (name string, amount double);
from SweetProductionStream#window.time(1 hour)
select sum(amount) as hourlyTotal
insert into LowProductionAlertStream ;
Calculate total amount
produced for the last hour
Amount Produced Per Product
define stream SweetProductionStream (name string, amount double);
from SweetProductionStream#window.time(1 hour)
select name, sum(amount) as hourlyTotal
group by name
insert into LowProductionAlertStream ;
Calculate total amount
produced for each product
Identify Low Production Rates
define stream SweetProductionStream (name string, amount double);
from SweetProductionStream#window.time(1 hour)
select name, sum(amount) as hourlyTotal
group by name
having hourlyTotal < 5000
insert into LowProductionAlertStream ;
Filter events where produced
amount is less than 5000
Consider Working Hours for Calculation
define stream SweetProductionStream (name string, amount double);
from SweetProductionStream#window.time(1 hour)
select name, sum(amount) as hourlyTotal,
time:extract(currentTimeMillis(), 'HOUR') as currentHour
group by name
having hourlyTotal < 5000 and
currentHour > 9 and currentHour < 17
insert into LowProductionAlertStream ;
Use functions to extract the
hour of event arrival time
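The `time:extract(..., 'HOUR')` call and the working-hours filter can be sketched in Python for clarity (UTC is assumed here, whereas Siddhi follows the JVM's default time zone, so results may differ by the local offset):

```python
from datetime import datetime, timezone

def extract_hour(epoch_ms):
    # Rough equivalent of time:extract(currentTimeMillis(), 'HOUR'):
    # hour-of-day from epoch milliseconds, UTC assumed.
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc).hour

def within_working_hours(epoch_ms):
    # The having clause keeps events strictly between hour 9 and 17.
    hour = extract_hour(epoch_ms)
    return 9 < hour < 17
```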
Rate Limit Low Production Alerts
define stream SweetProductionStream (name string, amount double);
from SweetProductionStream#window.time(1 hour)
select name, sum(amount) as hourlyTotal,
time:extract(currentTimeMillis(), 'HOUR') as currentHour
group by name
having hourlyTotal < 5000 and
currentHour > 9 and currentHour < 17
output last every 15 min
insert into LowProductionAlertStream ;
Send alerts every 15 minutes
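A simplified model of `output last every 15 min` in Python (illustrative only: Siddhi emits on an internal timer, while this sketch checks on event arrival; the class name is hypothetical):

```python
class OutputLastEvery:
    # Emit at most one matching event per interval, suppressing the rest.
    def __init__(self, interval_ms=15 * 60 * 1000):
        self.interval_ms = interval_ms
        self.last_emit_ms = None

    def on_event(self, event, now_ms):
        if self.last_emit_ms is None or now_ms - self.last_emit_ms >= self.interval_ms:
            self.last_emit_ms = now_ms
            return event   # emitted as an alert
        return None        # suppressed until the interval elapses
```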
1.2 Send alerts to factory managers
via email
Send Alerts via Email
@sink(type = 'email', to = 'manager@sf.com',
subject = 'Low Production of {{name}}!',
@map(type = 'text', @payload("""
Hi Manager,
Production of {{name}} has gone down to {{hourlyTotal}}
in the last hour!
From Sweet Factory""")))
define stream LowProductionAlertStream (name string, hourlyTotal double,
currentHour int);
Context-sensitive
email
Use Case 2 :
Raw material storage at the factories
should be closely monitored
2.1 Store raw material shipment
details in a data store
Data Store Integration
● Allows performing operations on the data store while
processing events on the fly:
store, retrieve, remove, and modify
● Provides a REST endpoint to Query Data Store
● Query Optimizations using Primary and Indexing Keys
● Search ● Insert ● Delete ● Update ● Insert/Update
Store Raw Material Info
@source(type = 'http', @map(type = 'json'))
define stream RawMaterialStream(name string, amount double);
define table LatestShipmentDetailTable (name string, amount double);
In-memory table to
store last shipment of raw
material
Store Data
With Primary Key & Index
@source(type = 'http', @map(type = 'json'))
define stream RawMaterialStream(name string, amount double);
@primaryKey('name')
@index('amount')
define table LatestShipmentDetailTable (name string, amount double);
Support for Primary Key and
Index for fast data access
Store in External Data Store
@source(type = 'http', @map(type = 'json'))
define stream RawMaterialStream(name string, amount double);
@store(type = 'rdbms', … )
@primaryKey('name')
@index('amount')
define table LatestShipmentDetailTable (name string, amount double);
Table backed by
RDBMS, MongoDB, HBase, Cassandra, Solr,
Hazelcast, etc.
Insert Events into Table
@source(type = 'http', @map(type = 'json'))
define stream RawMaterialStream(name string, amount double);
@store(type = 'rdbms', … )
@primaryKey('name')
@index('amount')
define table LatestShipmentDetailTable (name string, amount double);
from RawMaterialStream
select name, amount
insert into LatestShipmentDetailTable ;
Insert into table from stream
Update-Insert Events into Table
@source(type = 'http', @map(type = 'json'))
define stream RawMaterialStream(name string, amount double);
@store(type = 'rdbms', … )
@primaryKey('name')
@index('amount')
define table LatestShipmentDetailTable (name string, amount double);
from RawMaterialStream
select name, amount
update or insert into LatestShipmentDetailTable
on LatestShipmentDetailTable.name == name ;
Update or Insert into
the table with stream
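The `update or insert into ... on LatestShipmentDetailTable.name == name` semantics amount to an upsert keyed on the primary key, which a plain Python dict illustrates (a sketch, not the product's storage code):

```python
def update_or_insert(table, event):
    # Upsert keyed on 'name': keep only the latest shipment per material,
    # like the @primaryKey('name') table in the Siddhi app.
    table[event["name"]] = event["amount"]

shipments = {}
update_or_insert(shipments, {"name": "sugar", "amount": 75.5})
update_or_insert(shipments, {"name": "sugar", "amount": 40.0})   # updates
update_or_insert(shipments, {"name": "flour", "amount": 120.0})  # inserts
```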
2.2 Calculate hourly to yearly raw
material storage by each type
Streaming Data Summarization
Aggregations Over Long Time Periods
• Incremental Aggregation for every
– Seconds, Minutes, Hours, Days, …, Year
• Support for out-of-order event arrival
• Fast data retrieval from memory and disk for
real-time updates
(Diagram: incremental aggregation maintains running buckets for the current second, minute, and hour; each coarser bucket is built by rolling up the completed finer buckets.)
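The incremental roll-up idea described above can be sketched in Python (illustrative only; function and variable names are hypothetical):

```python
from collections import defaultdict

def roll_up(events, granularity_ms):
    # Sum amounts into fixed time buckets. Coarser granularities are
    # computed from the next-finer level's buckets rather than from raw
    # events -- the core idea of incremental aggregation.
    totals = defaultdict(float)
    for ts_ms, amount in events:
        totals[ts_ms // granularity_ms * granularity_ms] += amount
    return dict(totals)

raw = [(1_000, 5.0), (30_000, 3.0), (61_000, 2.0)]   # (epoch ms, amount)
per_minute = roll_up(raw, 60_000)
per_hour = roll_up(per_minute.items(), 3_600_000)    # built from minutes
```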
Define Aggregation
define stream RawMaterialStream(name string, amount double);
define aggregation RawMaterialAggregation
from RawMaterialStream
select name, sum(amount) as totalAmount, avg(amount) as
averageAmount
group by name
aggregate every min … year
Calculate total and average amount
for every granularity from
minute up to year
Define Aggregation ...
define stream RawMaterialStream(name string, amount double);
@store(type = 'rdbms', … )
define aggregation RawMaterialAggregation
from RawMaterialStream
select name, sum(amount) as totalAmount, avg(amount) as
averageAmount
group by name
aggregate every min … year
Like tables, aggregations can be stored in
RDBMS, MongoDB, HBase, Cassandra, Solr, Hazelcast, etc.
Data Retrieval API
• Can perform data search on Data
Stores or pre-defined Aggregations.
• Supports both REST and Java APIs
Retrieve Summarized Data
Perform REST Call
curl -X POST https://localhost:9443/stores/query \
  -H "content-type: application/json" \
  -u "admin:admin" \
  -d '{"appName": "Sweet-Factory-Analytics-3", "query": "from RawMaterialAggregation on name == '\''caramel'\'' within '\''2018-**-** **:**:**'\'' per '\''minutes'\'' select name, totalAmount, averageAmount"}'
2.3 Visualize summary results in a
dashboard
Portal
Dashboard & Widgets for Business Users
• Generate dashboard and widgets
• Fine grained permissions
– Dashboard level
– Widget level
– Data level
• Localization support
• Inter widget communication
• Shareable dashboards with widget state persistence
Dashboard & Widgets
Use Case 3 :
Warehouse managers should be
alerted if there will be a shortage of
raw material for future production
cycles
3.1 Check if the current raw material
input rate is enough for production
Join Raw Material with Production Input
define stream RawMaterialStream (name string, amount double);
define stream ProductionInputStream (name string, amount double);
from ProductionInputStream#window.time(1 hour) as p
join RawMaterialStream#window.time(1 hour) as r
on r.name == p.name
select r.name, sum(r.amount) as totalRawMaterial,
sum(p.amount) as totalConsumption
group by r.name
having (totalConsumption - totalRawMaterial)*100.0 / totalRawMaterial > 5
insert into RawMaterialInputRateAlertStream ;
Identify 5% increase by
joining two streams
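The having clause on the joined windows reduces to a percentage check, sketched below in Python (function name hypothetical):

```python
def excess_consumption(total_raw_material, total_consumption, threshold_pct=5.0):
    # Alert when consumption within the window exceeds raw material
    # arrivals by more than threshold_pct percent.
    return (total_consumption - total_raw_material) * 100.0 / total_raw_material > threshold_pct
```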
Join with External Window
define window RawMaterialWindow (name string, amount double) time(1 hour);
define stream ProductionInputStream (name string, amount double);
from ProductionInputStream#window.time(1 hour) as p
join RawMaterialWindow as r
on r.name == p.name
select r.name, sum(r.amount) as totalRawMaterial,
sum(p.amount) as totalConsumption
group by r.name
having (totalConsumption - totalRawMaterial)*100.0 / totalRawMaterial > 5
insert into RawMaterialInputRateAlertStream ;
Joining a stream with a
defined window
3.2 Predict the future raw material
requirement
Real-time Predictions
Using Machine Learning
• Use pre-created machine learning
models and perform predictions.
– PMML, TensorFlow, etc
• Streaming Machine Learning
– Clustering, Classification,
Regression
– Markov Models, Anomaly
detection, etc...
Using Pre-built PMML Model
define stream ProductionInputStream
(name string, currentHourAmount double,
previousHourAmount double);
from ProductionInputStream#pmml:predict('file/model.pmml', name,
previousHourAmount, currentHourAmount )
select name, nextHourAmount, getEventTime() as currentTime
insert into PredictedProdInputStream ;
Predict required raw materials using a
static model
Online Machine Learning
define stream ProductionInputStream (currentHourAmount double,
previousHourAmount double );
define stream ProductionInputResultsStream ( currentHourAmount double,
previousHourAmount double, nextHourAmount double );
from ProductionInputResultsStream#streamingml:updateAMRulesRegressor
(currentHourAmount, previousHourAmount, nextHourAmount )
select *
insert into TrainOutputStream;
from ProductionInputStream#streamingml:AMRulesRegressor
(currentHourAmount, previousHourAmount )
select currentHourAmount, previousHourAmount, prediction as nextHourAmount
insert into PredictedProdInputStream;
Predict required raw materials while
learning in a streaming manner.
3.3 Check predicted raw material
availability with warehouse stocks
and alert if insufficient
Predict & Alert
define window RawMaterialWindow (name string, amount double) time(1 hour);
define stream ProductionInputResultsStream ( currentHourAmount double,
previousHourAmount double, nextHourAmount double );
from ProductionInputResultsStream#streamingml:updateAMRulesRegressor
(currentHourAmount, previousHourAmount, nextHourAmount )
select *
insert into TrainOutputStream;
from PredictedProdInputStream as p join RawMaterialWindow as r
on r.name == p.name
select r.name, p.predictedAmount, sum(r.amount) as totalRawMaterial
having totalRawMaterial < predictedAmount
insert into RawMaterialInputRateAlertStream ;
Use Case 4 :
Factory Managers should be alerted if
production does not start within 15
min from raw material arrival
Non-occurrence through Patterns
define stream RawMaterialStream (name string, amount double);
define stream ProductionInputStream (name string, amount double);
from every (e1 = RawMaterialStream)
-> not ProductionInputStream[name == e1.name and
amount == e1.amount] for 15 min
select e1.name, e1.amount
insert into ProductionStartDelayed ;
Identify non-occurrence pattern
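The non-occurrence pattern (`-> not ... for 15 min`) boils down to checking that no matching event arrived within the timeout, which the following Python sketch illustrates (names hypothetical; real Siddhi evaluates this incrementally as events stream in):

```python
def production_start_delayed(raw_arrival_ms, production_start_times,
                             timeout_ms=15 * 60 * 1000):
    # True when no matching production event occurred within
    # timeout_ms after the raw material arrival.
    return not any(
        raw_arrival_ms <= t <= raw_arrival_ms + timeout_ms
        for t in production_start_times
    )
```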
Use Case 5 :
Alert factory managers if rate of
production continuously decreases for
X time period
5.1 Identify production trends over a
time period
Identify Trends
define stream SweetProductionStream(name string, amount double);
from SweetProductionStream#window.timeBatch(1 min)
select name, sum(amount) as amount, currentTimeMillis() as timestamp
group by name
insert into LastMinProdStream;
partition with (name of LastMinProdStream)
begin
from every e1=LastMinProdStream,
e2=LastMinProdStream[timestamp - e1.timestamp < 10 * 60000
and e1.amount > amount]*,
e3=LastMinProdStream[timestamp - e1.timestamp > 10 * 60000
and e2[last].amount > amount]
select e1.name, e1.amount as initialAmount, e3.amount as finalAmount
insert into ContinousProdReductionStream ;
end;
Identify decreasing trends
for 10 mins
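The `e1, e2*, e3` pattern above matches a strictly decreasing run of per-minute totals spanning at least ten minutes. A rough Python equivalent over a list of (timestamp, amount) samples (illustrative; names hypothetical):

```python
def decreasing_for(samples, min_span_ms=10 * 60 * 1000):
    # Amounts must strictly decrease from the first sample, and the
    # decreasing run must span more than min_span_ms.
    if len(samples) < 2:
        return False
    first_ts = samples[0][0]
    for (_, prev_amount), (ts, amount) in zip(samples, samples[1:]):
        if amount >= prev_amount:
            return False
        if ts - first_ts > min_span_ms:
            return True
    return False

trend = [(0, 100.0), (5 * 60_000, 90.0), (11 * 60_000, 80.0)]
```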
5.2 Make this rule configurable
Business Rules Manager
• Hide Siddhi app creation complexity from business users
• Build rules via a simple web-based UI
– From scratch
Build custom filters on event streams
– From a template
Build rules from developer created template
Dashboard for Rule Management
Business Rules Manager
Template as Business Rules
define stream SweetProductionStream(...);
…
partition with (name of LastMinProdStream)
begin
from every e1=LastMinProdStream,
e2=LastMinProdStream[timestamp - e1.timestamp < $TimeInMin * 60000
and e1.amount > amount]*,
e3=LastMinProdStream[timestamp - e1.timestamp > $TimeInMin * 60000
and e2[last].amount > amount]
select e1.name, e1.amount as initialAmount, e3.amount as finalAmount
insert into ContinousProdReductionStream ;
end;
Identify decreasing trend for
X mins
Deploying Stream Processor
Minimum HA with 2 Nodes
• High Performance
– Process around 100k
events/sec
– Just 2 nodes
– While most others need 5+
• Zero Downtime
• Zero Event Loss
• Simple deployment with RDBMS
– No ZooKeeper, Kafka, etc.
• Multi Data Center Support
(Diagram: two Stream Processor nodes run the same Siddhi apps, consuming from event sources and persisting state to a shared event store.)
• Exactly-once processing
• Fault tolerance
• Highly scalable
• No back pressure
• Distributed development configurations via annotations
• Pluggable distribution options (YARN, Kubernetes, etc.)
Distributed Deployment
Distributed Deployment with Kafka
(Diagram: Siddhi apps deployed across worker nodes, chained through Kafka topics between the event source and the event sink.)
Siddhi App for Distributed Deployment
@source(type = 'kafka', ..., @map(type = 'json'))
define stream ProductionStream (name string, amount double, factoryId int);
@dist(parallel = '4', execGroup = 'gp1')
from ProductionStream[amount > 100]
select *
insert into HighProductionStream ;
@dist(parallel = '2', execGroup = 'gp2')
partition with (factoryId of HighProductionStream)
begin
from HighProductionStream#window.timeBatch(1 min)
select factoryId, sum(amount) as amount
group by factoryId
insert into ProdRateStream ;
end;
(Diagram: the source feeds four parallel Filter instances in execGroup gp1, which feed two Partition instances in execGroup gp2.)
Extensions & Supporting
Tools
Analytics Extension Store
https://store.wso2.com/store/assets/analyticsextension/list
Monitoring Streaming Apps
Status Dashboard
• Understand system performance via
– Throughput
– Latency
– CPU, memory utilizations
• Monitor in various scales
– Node Level
– Siddhi App Level
– Siddhi Query Level
Monitor resource nodes and Siddhi apps
Inbuilt Support for
Analytics Use Cases
HTTP Analytics
Message Tracing
Analytics Solutions
• Finance and Banking
• Retail
• Location
• Operational
• Smart Energy
• Social Media
• System and Network
• Healthcare
Available Options
Running Siddhi on the Edge
● Lightweight and lean
● Out-of-the-box support for consuming events from Android
sensors
● Support for Python
○ https://github.com/wso2/PySiddhi/
In Android & Raspberry Pi
WSO2 Stream Processor 4.2.0 Release.
Distribution URL: https://wso2.com/analytics
Download, Try & Comment
THANK YOU
wso2.com

More Related Content

PDF
SQL on everything, in memory
PDF
Streaming SQL
PDF
Streaming SQL
PDF
Why you care about
 relational algebra (even though you didn’t know it)
PDF
Streaming SQL
PDF
Apache Calcite: One Frontend to Rule Them All
PDF
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
PPTX
Cost-based query optimization in Apache Hive
SQL on everything, in memory
Streaming SQL
Streaming SQL
Why you care about
 relational algebra (even though you didn’t know it)
Streaming SQL
Apache Calcite: One Frontend to Rule Them All
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
Cost-based query optimization in Apache Hive

What's hot (20)

PDF
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
PPTX
Salesforce Summer 14 Release
PDF
Harnessing the power of YARN with Apache Twill
PDF
The Mechanics of Testing Large Data Pipelines (QCon London 2016)
PDF
Streaming SQL
PPTX
Apache Beam (incubating)
PDF
Implementing Server Side Data Synchronization for Mobile Apps
PPTX
Omid: scalable and highly available transaction processing for Apache Phoenix
PDF
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
PDF
Implementing data sync apis for mibile apps @cloudconf
PPTX
Apache Beam: A unified model for batch and stream processing data
PPTX
Flink Forward SF 2017: David Hardwick, Sean Hester & David Brelloch - Dynami...
PDF
Server side data sync for mobile apps with silex
PDF
Hive 3 a new horizon
PDF
WSO2 Product Release Webinar: WSO2 Complex Event Processor 4.0
PDF
Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data...
PDF
WSO2 Complex Event Processor
PDF
Unified Stream & Batch Processing with Apache Flink (Hadoop Summit Dublin 2016)
PDF
Joining the Club: Using Spark to Accelerate Big Data at Dollar Shave Club
PDF
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
Salesforce Summer 14 Release
Harnessing the power of YARN with Apache Twill
The Mechanics of Testing Large Data Pipelines (QCon London 2016)
Streaming SQL
Apache Beam (incubating)
Implementing Server Side Data Synchronization for Mobile Apps
Omid: scalable and highly available transaction processing for Apache Phoenix
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Implementing data sync apis for mibile apps @cloudconf
Apache Beam: A unified model for batch and stream processing data
Flink Forward SF 2017: David Hardwick, Sean Hester & David Brelloch - Dynami...
Server side data sync for mobile apps with silex
Hive 3 a new horizon
WSO2 Product Release Webinar: WSO2 Complex Event Processor 4.0
Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data...
WSO2 Complex Event Processor
Unified Stream & Batch Processing with Apache Flink (Hadoop Summit Dublin 2016)
Joining the Club: Using Spark to Accelerate Big Data at Dollar Shave Club
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Ad

Similar to Building Streaming Applications with Streaming SQL (20)

PDF
Patterns for Building Streaming Apps
PDF
[WSO2Con USA 2018] Patterns for Building Streaming Apps
PDF
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...
PPTX
WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Da...
PDF
WSO2 Analytics Platform: The one stop shop for all your data needs
PDF
[WSO2Con EU 2017] Streaming Analytics Patterns for Your Digital Enterprise
PDF
[WSO2Con Asia 2018] Patterns for Building Streaming Apps
PDF
WSO2 Analytics Platform - The one stop shop for all your data needs
PPTX
Api Statistics- The Scalable Way
PDF
Timely Year Two: Lessons Learned Building a Scalable Metrics Analytic System
PDF
KPI definition with Business Activity Monitor 2.0
PPTX
Data to Insight in a Flash: Introduction to Real-Time Analytics with WSO2 Com...
PDF
Complex Event Processor 3.0.0 - An overview of upcoming features
PPTX
QSpiders - Installation and Brief Dose of Load Runner
PPTX
Introduction to WSO2 Data Analytics Platform
PPTX
Azure Stream Analytics : Analyse Data in Motion
PPTX
Performance eng prakash.sahu
PDF
WSO2 Product Release Webinar - WSO2 Complex Event Processor
PPTX
Microsoft Windows Server AppFabric
PPTX
Implementing Real-Time IoT Stream Processing in Azure
Patterns for Building Streaming Apps
[WSO2Con USA 2018] Patterns for Building Streaming Apps
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...
WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Da...
WSO2 Analytics Platform: The one stop shop for all your data needs
[WSO2Con EU 2017] Streaming Analytics Patterns for Your Digital Enterprise
[WSO2Con Asia 2018] Patterns for Building Streaming Apps
WSO2 Analytics Platform - The one stop shop for all your data needs
Api Statistics- The Scalable Way
Timely Year Two: Lessons Learned Building a Scalable Metrics Analytic System
KPI definition with Business Activity Monitor 2.0
Data to Insight in a Flash: Introduction to Real-Time Analytics with WSO2 Com...
Complex Event Processor 3.0.0 - An overview of upcoming features
QSpiders - Installation and Brief Dose of Load Runner
Introduction to WSO2 Data Analytics Platform
Azure Stream Analytics : Analyse Data in Motion
Performance eng prakash.sahu
WSO2 Product Release Webinar - WSO2 Complex Event Processor
Microsoft Windows Server AppFabric
Implementing Real-Time IoT Stream Processing in Azure
Ad

Recently uploaded (20)

PDF
Company Presentation pada Perusahaan ADB.pdf
PPTX
DISORDERS OF THE LIVER, GALLBLADDER AND PANCREASE (1).pptx
PDF
Report The-State-of-AIOps 20232032 3.pdf
PPTX
Logistic Regression ml machine learning.pptx
PPTX
artificial intelligence deeplearning-200712115616.pptx
PPTX
Trading Procedures (1).pptxcffcdddxxddsss
PPTX
Introduction to Basics of Ethical Hacking and Penetration Testing -Unit No. 1...
PPTX
Bharatiya Antariksh Hackathon 2025 Idea Submission PPT.pptx
PPTX
LESSON-1-NATURE-OF-MATHEMATICS.pptx patterns
PPTX
Global journeys: estimating international migration
PPT
Reliability_Chapter_ presentation 1221.5784
PPT
Chapter 3 METAL JOINING.pptnnnnnnnnnnnnn
PPTX
lec_5(probability).pptxzzjsjsjsjsjsjjsjjssj
PPT
Miokarditis (Inflamasi pada Otot Jantung)
PDF
Linux OS guide to know, operate. Linux Filesystem, command, users and system
PPTX
IB Computer Science - Internal Assessment.pptx
PPTX
Presentation1.pptxvhhh. H ycycyyccycycvvv
PPTX
Measurement of Afordability for Water Supply and Sanitation in Bangladesh .pptx
PDF
Data Science Trends & Career Guide---ppt
PDF
“Getting Started with Data Analytics Using R – Concepts, Tools & Case Studies”
Company Presentation pada Perusahaan ADB.pdf
DISORDERS OF THE LIVER, GALLBLADDER AND PANCREASE (1).pptx
Report The-State-of-AIOps 20232032 3.pdf
Logistic Regression ml machine learning.pptx
artificial intelligence deeplearning-200712115616.pptx
Trading Procedures (1).pptxcffcdddxxddsss
Introduction to Basics of Ethical Hacking and Penetration Testing -Unit No. 1...
Bharatiya Antariksh Hackathon 2025 Idea Submission PPT.pptx
LESSON-1-NATURE-OF-MATHEMATICS.pptx patterns
Global journeys: estimating international migration
Reliability_Chapter_ presentation 1221.5784
Chapter 3 METAL JOINING.pptnnnnnnnnnnnnn
lec_5(probability).pptxzzjsjsjsjsjsjjsjjssj
Miokarditis (Inflamasi pada Otot Jantung)
Linux OS guide to know, operate. Linux Filesystem, command, users and system
IB Computer Science - Internal Assessment.pptx
Presentation1.pptxvhhh. H ycycyyccycycvvv
Measurement of Afordability for Water Supply and Sanitation in Bangladesh .pptx
Data Science Trends & Career Guide---ppt
“Getting Started with Data Analytics Using R – Concepts, Tools & Case Studies”

Building Streaming Applications with Streaming SQL

  • 1. Building Streaming Applications with Streaming SQL Technical Lead, WSO2 Mohanadarshan Vivekanandalingam Lead Solutions Engineer, WSO2 Vanjikumaran Sivajothy
  • 2. ALL ABOUT STREAMING APPS & WSO2 STREAM PROCESSOR Agenda
  • 3. Streaming Application An Application that provides analytical operators to orchestrate data flow, calculate analytics, and detect patterns on event data from multiple, disparate live data sources to allow developers to build applications that sense, think, and act in real-time. - Forrester
  • 4. Challenges ! Write Code Complex Deployment (5-6 nodes) Inability to Change Fast
  • 6. Solutions - Streaming SQL + graphical editor - 2 node minimum HA deployment (scale beyond with distributed deployment) - Templated SQL scripts + drag & drop UI
  • 8. ● Lightweight, lean & cloud native ● Easy to learn Streaming SQL ● High performance analytics with just 2 nodes (HA) ● Native support for streaming Machine Learning ● Long term aggregations without batch analytics ● Highly scalable deployment with exactly-once processing ● Tools for development and monitoring ● Tools for business users to write their own rules Overview of WSO2 Stream Processor
  • 9. WSO2 Stream Processor • Editor/Studio - Developer environment • Worker/Resource - Resource node • Dashboard – Portal - Business dashboard – Business Rules Manager - Management console for business users – Status Dashboard - Monitoring dashboard • Manager - Job manager for distributed processing Profiles
  • 13. Use Case • Monitor supply, production and sales • Optimize resource utilization • Detect and alert failures • Predict demand • Manage processing rules online • Visualize real-time performance Sweet Factory Management
  • 14. Let’s start from the basics
  • 15. Phases in Stream Processing Receive - Define Streams - Configure - Event Sources - Event Mappings - Define Mappings Analyze - Stream Processing - Long-term Incremental Processing - Complex Event Processing - Machine Learning - Storage Integration Report - Define Streams - Configure - Event Sinks - Event Mappings - Publish results - View results in dashboard
  • 16. Streaming Processing With WSO2 Stream Processor Siddhi Streaming App - Process events in streaming manner - Isolated unit with set of queries, input and output streams - SQL Like Query Language from Sales#window.time(1 hour) select region, brand, avg(quantity) as AvgQuantity group by region, brand insert into LastHourSales ; Stream Processor Siddhi App { Siddhi } Input Streams Output Streams Filter Aggregate JoinTransform Pattern Siddhi Extensions
  • 18. Developer Studio Supports both Drag n Drop & Source Editor Editor Debugger Simulation Testing
  • 20. Stream Processor Studio • Writing Siddhi applications – Syntax highlighting – Auto completion – Error reporting – Documentation support • Debugging Siddhi apps – Inspect events – Inspect query states Developer Environment
  • 21. Stream Processor Studio • Testing Siddhi apps via Event Simulation – Send Event by Event – Simulate Random Data – Simulate via CSV file – Simulate from Database • Support for running and testing on Python – as PySiddhi • IDE Tools – Intellij Idea Plugin Developer Environment
  • 23. Create a Siddhi App @app:name(‘Sweet-Factory-Analytics’) Name the Siddhi application
  • 24. Define Input Streams @app:name(‘Sweet-Factory-Analytics’) define stream RawMaterialStream(name string, amount double); define stream SweetProductionStream (name string, amount double); Define input event streams
  • 25. Define Event Sources @app:name(‘Sweet-Factory-Analytics’) @source(type = ‘http’, @map(type = ‘json’)) define stream RawMaterialStream(name string, amount double); Consume and publish events via MQTT, HTTP, TCP, Kafka, JMS, RabbitMQ, etc.
  • 26. Define Event Sources Default Mapping @app:name(‘Sweet-Factory-Analytics’) @source(type = ‘http’, @map(type = ‘json’)) define stream RawMaterialStream(name string, amount double); Using default mapping { "event": { "name": "sugar", "amount": 75.5 } } Supported Mapping Text, XML, JSON, Binary, KeyValue, WSO2Event, etc.
  • 27. Define Event Sources Custom Mapping @app:name(‘Sweet-Factory-Analytics’) @source(type = ‘http’, @map(type = ‘json’)) define stream RawMaterialStream (name string, amount double); @source(type = ‘http’, @map(type = ‘json’, @attributes( name = ‘$.item.id’, amount = ‘$.item.amount’))) define stream SweetProductionStream (name string, amount double); Using custom mapping { "item": { "id": "toffees", "amount": 30.0 } }
  • 28. Define Event Sinks To Log Events @app:name(‘Sweet-Factory-Analytics’) @source(type = ‘http’, @map(type = ‘json’)) @sink(type =‘log’, level = ‘info’) define stream RawMaterialStream(name string, amount double); @source(type = ‘http’, @map(type = ‘json’, @attributes( name = ‘$.item.id’, amount = ‘$.item.amount’))) @sink(type =‘log’, level = ‘info’) define stream SweetProductionStream (name string, amount double); Logger sink to log events on console
• 29. Use Case 1 : Production at each factory should not fall below 5000 units per hour!
  • 30. 1.1 Monitor and Identify events that indicate low production
• 31. Total Amount Produced define stream SweetProductionStream (name string, amount double); from SweetProductionStream select sum(amount) as hourlyTotal insert into LowProductionAlertStream; Calculate the total amount produced over all time (no window)
• 32. Total Amount Produced in the Last Hour define stream SweetProductionStream (name string, amount double); from SweetProductionStream#window.time(1 hour) select sum(amount) as hourlyTotal insert into LowProductionAlertStream; Calculate the total amount produced over the last hour
• 33. Amount Produced Per Product define stream SweetProductionStream (name string, amount double); from SweetProductionStream#window.time(1 hour) select name, sum(amount) as hourlyTotal group by name insert into LowProductionAlertStream; Calculate the total amount produced for each product
• 34. Identify Low Production Rates define stream SweetProductionStream (name string, amount double); from SweetProductionStream#window.time(1 hour) select name, sum(amount) as hourlyTotal group by name having hourlyTotal < 5000 insert into LowProductionAlertStream; Filter events where the produced amount is less than 5000
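The sliding-window-plus-`having` logic above can be sketched in plain Python. This is an illustrative approximation (Siddhi's engine is incremental and far more efficient): keep one hour of `(timestamp, name, amount)` events, expire the old ones on each arrival, and report per-name totals that fall below 5000.

```python
from collections import deque

WINDOW_MS = 60 * 60 * 1000  # #window.time(1 hour)

class HourlyTotals:
    def __init__(self):
        self.events = deque()  # (timestamp_ms, name, amount), oldest first

    def on_event(self, ts, name, amount):
        self.events.append((ts, name, amount))
        # Expire events older than one hour (sliding window semantics).
        while self.events and self.events[0][0] <= ts - WINDOW_MS:
            self.events.popleft()
        totals = {}
        for _, n, a in self.events:
            totals[n] = totals.get(n, 0.0) + a
        # 'having hourlyTotal < 5000'
        return {n: t for n, t in totals.items() if t < 5000}
```

A run of two events totaling 5500 units in the hour produces no alert; once the early events slide out of the window the total can drop back under the threshold.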
• 35. Consider Working Hours for Calculation define stream SweetProductionStream (name string, amount double); from SweetProductionStream#window.time(1 hour) select name, sum(amount) as hourlyTotal, time:extract(currentTimeMillis(), 'HOUR') as currentHour group by name having hourlyTotal < 5000 and currentHour > 9 and currentHour < 17 insert into LowProductionAlertStream; Use functions to extract the hour from the event arrival time
• 36. Rate Limit Low Production Alerts define stream SweetProductionStream (name string, amount double); from SweetProductionStream#window.time(1 hour) select name, sum(amount) as hourlyTotal, time:extract(currentTimeMillis(), 'HOUR') as currentHour group by name having hourlyTotal < 5000 and currentHour > 9 and currentHour < 17 output last every 15 min insert into LowProductionAlertStream; Emit at most the latest alert every 15 minutes
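`output last every 15 min` can be approximated in Python as follows. This is an illustrative, event-driven sketch only (Siddhi actually uses an internal scheduler to flush on time, independent of arrivals): buffer alerts and release only the latest one once each interval has elapsed.

```python
INTERVAL_MS = 15 * 60 * 1000  # 'output last every 15 min'

class OutputLast:
    def __init__(self, interval_ms=INTERVAL_MS):
        self.interval = interval_ms
        self.next_emit = None  # deadline for the next emission
        self.last = None       # latest suppressed alert

    def on_alert(self, ts, alert):
        """Return the alert to emit, or None if it is suppressed."""
        if self.next_emit is None:
            self.next_emit = ts + self.interval
        self.last = alert
        if ts >= self.next_emit:
            self.next_emit = ts + self.interval
            emitted, self.last = self.last, None
            return emitted
        return None
```

Intermediate alerts within the interval are dropped; only the most recent one survives to the next emission point.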
  • 37. 1.2 Send alerts to factory managers via email
• 38. Send Alerts via Email @sink(type = 'email', to = '[email protected]', subject = 'Low Production of {{name}}!', @map(type = 'text', @payload("""Hi Manager, Production of {{name}} has gone down to {{hourlyTotal}} in the last hour! From Sweet Factory"""))) define stream LowProductionAlertStream (name string, hourlyTotal double, currentHour int); Context-sensitive email
  • 39. Use Case 2 : Raw material storage at the factories should be closely monitored
  • 40. 2.1 Store raw material shipment details in a data store
• 41. Data Store Integration ● Allows operations on the data store while processing events on the fly: store, retrieve, remove, and modify ● Provides a REST endpoint to query the data store ● Query optimizations using primary and indexing keys ● Search ● Insert ● Delete ● Update ● Insert/Update
• 42. Store Raw Material Info @source(type = 'http', @map(type = 'json')) define stream RawMaterialStream(name string, amount double); define table LatestShipmentDetailTable (name string, amount double); In-memory table to store the last shipment of each raw material
• 43. Store Data With Primary Key & Index @source(type = 'http', @map(type = 'json')) define stream RawMaterialStream(name string, amount double); @primaryKey('name') @index('amount') define table LatestShipmentDetailTable (name string, amount double); Support for primary key and index for fast data access
• 44. Store in External Data Store @source(type = 'http', @map(type = 'json')) define stream RawMaterialStream(name string, amount double); @store(type = 'rdbms', … ) @primaryKey('name') @index('amount') define table LatestShipmentDetailTable (name string, amount double); Table backed by RDBMS, MongoDB, HBase, Cassandra, Solr, Hazelcast, etc.
• 45. Insert Events into Table @source(type = 'http', @map(type = 'json')) define stream RawMaterialStream(name string, amount double); @store(type = 'rdbms', … ) @primaryKey('name') @index('amount') define table LatestShipmentDetailTable (name string, amount double); from RawMaterialStream select name, amount insert into LatestShipmentDetailTable; Insert into the table from the stream
• 46. Update-Insert Events into Table @source(type = 'http', @map(type = 'json')) define stream RawMaterialStream(name string, amount double); @store(type = 'rdbms', … ) @primaryKey('name') @index('amount') define table LatestShipmentDetailTable (name string, amount double); from RawMaterialStream select name, amount update or insert into LatestShipmentDetailTable on LatestShipmentDetailTable.name == name; Update or insert into the table from the stream
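The `update or insert ... on` behavior is a classic upsert keyed on the primary key. A minimal Python sketch of the semantics (the real table lives in an external store and matches rows via the `on` condition):

```python
class LatestShipmentTable:
    """Toy keyed table: primary key 'name' maps to the latest amount."""
    def __init__(self):
        self.rows = {}

    def update_or_insert(self, name, amount):
        # Matching key -> row is updated; no match -> row is inserted.
        self.rows[name] = amount
```

Two shipments of "sugar" leave one row holding the latest amount, which is exactly why the slide calls it the *latest* shipment detail table.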
  • 47. 2.2 Calculate hourly to yearly raw material storage by each type
• 48. Streaming Data Summarization Aggregations Over Long Time Periods • Incremental aggregation for every – second, minute, hour, day, …, year • Support for out-of-order event arrival • Fast data retrieval from memory and disk for real-time updates
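The incremental idea can be sketched in Python: each event updates running `(sum, count)` pairs in time buckets at every granularity, so totals and averages over long ranges never require replaying raw events. This is an illustrative, in-memory simplification of Siddhi's incremental aggregation (which also handles out-of-order events and persistent stores).

```python
MIN_MS, HOUR_MS = 60_000, 3_600_000

class IncrementalAggregation:
    def __init__(self):
        # granularity -> {(name, bucket_start_ms): (running_sum, count)}
        self.buckets = {"minutes": {}, "hours": {}}

    def on_event(self, ts, name, amount):
        for granularity, width in (("minutes", MIN_MS), ("hours", HOUR_MS)):
            key = (name, ts - ts % width)  # align timestamp to bucket start
            total, count = self.buckets[granularity].get(key, (0.0, 0))
            self.buckets[granularity][key] = (total + amount, count + 1)

    def average(self, granularity, name, bucket_start):
        total, count = self.buckets[granularity][(name, bucket_start)]
        return total / count
```

Three events landing in two different minute buckets still roll up into one hour bucket, mirroring the second/minute/hour hierarchy on the slide.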
• 49. Define Aggregation define stream RawMaterialStream(name string, amount double); define aggregation RawMaterialAggregation from RawMaterialStream select name, sum(amount) as totalAmount, avg(amount) as averageAmount group by name aggregate every min … year Calculate the total and average amount at every granularity from minute up to year
• 50. Define Aggregation ... define stream RawMaterialStream(name string, amount double); @store(type = 'rdbms', … ) define aggregation RawMaterialAggregation from RawMaterialStream select name, sum(amount) as totalAmount, avg(amount) as averageAmount group by name aggregate every min … year Like tables, aggregations can be stored in RDBMS, MongoDB, HBase, Cassandra, Solr, Hazelcast, etc.
  • 51. Data Retrieval API • Can perform data search on Data Stores or pre-defined Aggregations. • Supports both REST and Java APIs
• 52. Retrieve Summarized Data Perform REST Call curl -X POST https://localhost:9443/stores/query -H "content-type: application/json" -u "admin:admin" -d '{"appName" : "Sweet-Factory-Analytics-3", "query" : "from RawMaterialAggregation on name == \"caramel\" within \"2018-**-** **:**:**\" per \"minutes\" select name, totalAmount, averageAmount ;" }'
  • 53. 2.3 Visualize summary results in a dashboard
• 54. Portal Dashboard & Widgets for Business Users • Generate dashboards and widgets • Fine-grained permissions – Dashboard level – Widget level – Data level • Localization support • Inter-widget communication • Shareable dashboards with widget state persistence
  • 56. Use Case 3 : Warehouse managers should be alerted if there will be a shortage of raw material for future production cycles
  • 57. 3.1 Check if the current raw material input rate is enough for production
• 58. Join Raw Material with Production Input define stream RawMaterialStream (name string, amount double); define stream ProductionInputStream (name string, amount double); from ProductionInputStream#window.time(1 hour) as p join RawMaterialStream#window.time(1 hour) as r on r.name == p.name select r.name, sum(r.amount) as totalRawMaterial, sum(p.amount) as totalConsumption group by r.name having (totalConsumption - totalRawMaterial)*100.0 / totalRawMaterial > 5 insert into RawMaterialInputRateAlertStream; Detect consumption exceeding raw material input by more than 5% by joining the two streams
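The `having` condition on this join is a simple percentage check. An illustrative Python sketch of just that arithmetic, with the window sums assumed to be already computed:

```python
def shortfall_alert(total_raw: float, total_consumption: float,
                    threshold_pct: float = 5.0) -> bool:
    """True when consumption exceeds raw material input by more than threshold_pct."""
    return (total_consumption - total_raw) * 100.0 / total_raw > threshold_pct
```

For example, 106 units consumed against 100 units supplied is a 6% overshoot and triggers the alert, while 104 against 100 does not.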
  • 59. Join with External Window define window RawMaterialWindow (name string, amount double) time(1 hour); define stream ProductionInputStream (name string, amount double); from ProductionInputStream#window.time(1 hour) as p join RawMaterialWindow as r on r.name == p.name select r.name, sum(r.amount) as totalRawMaterial, sum(p.amount) as totalConsumption group by r.name having (totalConsumption - totalRawMaterial)*100.0 / totalRawMaterial > 5 insert into RawMaterialInputRateAlertStream ; Joining a stream with a defined window
  • 60. 3.2 Predict the future raw material requirement
  • 61. Real-time Predictions Using Machine Learning • Use pre-created machine learning models and perform predictions. – PMML, TensorFlow, etc • Streaming Machine Learning – Clustering, Classification, Regression – Markov Models, Anomaly detection, etc...
• 62. Using Pre-built PMML Model define stream ProductionInputStream (name string, currentHourAmount double, previousHourAmount double); from ProductionInputStream#pmml:predict('file/model.pmml', name, previousHourAmount, currentHourAmount) select name, nextHourAmount, getEventTime() as currentTime insert into PredictedProdInputStream; Predict required raw materials using a static model
  • 63. Online Machine Learning define stream ProductionInputStream (currentHourAmount double, previousHourAmount double ); define stream ProductionInputResultsStream ( currentHourAmount double, previousHourAmount double, nextHourAmount double ); from ProductionInputResultsStream#streamingml:updateAMRulesRegressor (currentHourAmount, previousHourAmount, nextHourAmount ) select * insert into TrainOutputStream; from ProductionInputStream#streamingml:AMRulesRegressor (currentHourAmount, previousHourAmount ) select currentHourAmount, previousHourAmount, prediction as nextHourAmount insert into PredictedProdInputStream; Predict required raw materials while learning in a streaming manner.
  • 64. 3.3 Check predicted raw material availability with warehouse stocks and alert if insufficient
• 65. Predict & Alert define window RawMaterialWindow (name string, amount double) time(1 hour); define stream ProductionInputResultsStream ( currentHourAmount double, previousHourAmount double, nextHourAmount double ); from ProductionInputResultsStream#streamingml:updateAMRulesRegressor (currentHourAmount, previousHourAmount, nextHourAmount ) select * insert into TrainOutputStream; from PredictedProdInputStream as p join RawMaterialWindow as r on r.name == p.name select r.name, p.nextHourAmount, sum(r.amount) as totalRawMaterial group by r.name having totalRawMaterial < p.nextHourAmount insert into RawMaterialInputRateAlertStream;
  • 66. Use Case 4 : Factory Managers should be alerted if production does not start within 15 min from raw material arrival
• 67. Non-occurrence through Patterns define stream RawMaterialStream (name string, amount double); define stream ProductionInputStream (name string, amount double); from every (e1 = RawMaterialStream) -> not ProductionInputStream[name == e1.name and amount == e1.amount] for 15 min select e1.name, e1.amount insert into ProductionStartDelayed; Identify a non-occurrence pattern
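The non-occurrence pattern can be sketched in Python as a pending-arrival tracker. This is an illustrative approximation (Siddhi fires the alert automatically when the 15-minute timer expires; here a caller polls for overdue arrivals): remember each raw-material arrival, clear it when matching production starts, and report arrivals still pending after the timeout.

```python
TIMEOUT_MS = 15 * 60 * 1000  # 'for 15 min'

class ProductionStartMonitor:
    def __init__(self):
        self.pending = {}  # (name, amount) -> arrival timestamp

    def on_raw_material(self, ts, name, amount):
        self.pending[(name, amount)] = ts

    def on_production_input(self, name, amount):
        # Matching production started in time: cancel the pending alert.
        self.pending.pop((name, amount), None)

    def overdue(self, now):
        """Arrivals with no matching production within the timeout."""
        return [k for k, ts in self.pending.items() if now - ts >= TIMEOUT_MS]
```

An arrival of flour followed by matching production is cleared; sugar with no production event becomes overdue once 15 minutes pass.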
  • 68. Use Case 5 : Alert factory managers if rate of production continuously decreases for X time period
  • 69. 5.1 Identify production trends over a time period
• 70. Identify Trends define stream SweetProductionStream(name string, amount double); from SweetProductionStream#window.timeBatch(1 min) select name, sum(amount) as amount, currentTimeMillis() as timestamp group by name insert into LastMinProdStream; partition with (name of LastMinProdStream) begin from every e1=LastMinProdStream, e2=LastMinProdStream[timestamp - e1.timestamp < 10 * 60000 and e1.amount > amount]*, e3=LastMinProdStream[timestamp - e1.timestamp > 10 * 60000 and e2[last].amount > amount] select e1.name, e1.amount as initialAmount, e3.amount as finalAmount insert into ContinuousProdReductionStream; end; Identify decreasing trends over 10 minutes
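The trend pattern above can be sketched procedurally. This illustrative Python function works on one partition's per-minute totals (oldest first) and reports the initial and final amounts when a strictly decreasing run covers more than the 10-minute span, loosely mirroring the e1/e2*/e3 sequence:

```python
SPAN_MS = 10 * 60_000  # 10-minute decreasing span

def detect_decreasing_trend(samples):
    """samples: list of (timestamp_ms, amount), oldest first.
    Returns (initial, final) amounts once a strictly decreasing run spans
    more than SPAN_MS, else None."""
    run_start = 0
    for i in range(1, len(samples)):
        if samples[i][1] >= samples[i - 1][1]:
            run_start = i  # decrease broken; restart the run here
        elif samples[i][0] - samples[run_start][0] > SPAN_MS:
            return samples[run_start][1], samples[i][1]
    return None
```

Eleven consecutive minutes of falling totals yield the `(initial, final)` pair; any uptick resets the run, just as the `e1.amount > amount` conditions require strictly decreasing values.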
  • 71. 5.2 Make this rule configurable
• 72. Business Rules Manager • Hides Siddhi app creation complexity from business users • Build rules via a simple web-based UI – From scratch: build custom filters on event streams – From a template: build rules from developer-created templates Dashboard for rule management
• 75. Template as Business Rules define stream SweetProductionStream(...); … partition with (name of LastMinProdStream) begin from every e1=LastMinProdStream, e2=LastMinProdStream[timestamp - e1.timestamp < $TimeInMin * 60000 and e1.amount > amount]*, e3=LastMinProdStream[timestamp - e1.timestamp > $TimeInMin * 60000 and e2[last].amount > amount] select e1.name, e1.amount as initialAmount, e3.amount as finalAmount insert into ContinuousProdReductionStream; end; Identify a decreasing trend over X minutes
• 77. Minimum HA with 2 Nodes • High performance – Process around 100k events/sec – Just 2 nodes, while most others need 5+ • Zero downtime • Zero event loss • Simple deployment with an RDBMS – No ZooKeeper, Kafka, etc. • Multi data center support
• 78. • Exactly-once processing • Fault tolerance • Highly scalable • No back pressure • Distributed deployment configurations via annotations • Pluggable distribution options (YARN, Kubernetes, etc.) Distributed Deployment
• 79. Distributed Deployment with Kafka: Siddhi apps connected via Kafka topics between event sources and event sinks
• 80. Siddhi App for Distributed Deployment @source(type = 'kafka', ..., @map(type = 'json')) define stream ProductionStream (name string, amount double, factoryId int); @dist(parallel = '4', execGroup = 'gp1') from ProductionStream[amount > 100] select * insert into HighProductionStream; @dist(parallel = '2', execGroup = 'gp2') partition with (factoryId of HighProductionStream) begin from HighProductionStream#window.timeBatch(1 min) select factoryId, sum(amount) as amount group by factoryId insert into ProdRateStream; end;
• 85. Status Dashboard • Understand system performance via – Throughput – Latency – CPU and memory utilization • Monitor at various scales – Node level – Siddhi app level – Siddhi query level Monitor resource nodes and Siddhi apps
  • 92. • Finance and Banking • Retail • Location • Operational • Smart Energy • Social Media • System and Network • Healthcare Available Options
• 93. Running Siddhi on the Edge ● Lightweight and lean ● OOTB support for consuming events from Android sensors ● Support for Python ○ https://github.com/wso2/PySiddhi/ On Android & Raspberry Pi
  • 94. WSO2 Stream Processor 4.2.0 Release. Distribution URL: https://wso2.com/analytics Download, Try & Comment