The Future of Apache Storm
Hadoop Summit 2016, San Jose, CA
P. Taylor Goetz, Hortonworks
@ptgoetz
About Me
• Tech Staff @ Hortonworks
• PMC Chair, Apache Storm
• ASF Member
• PMC, Apache Incubator, Apache Arrow, Apache
Kylin, Apache Apex
• Mentor/PPMC, Apache Eagle (Incubating), Apache
Mynewt (Incubating), Apache Metron (Incubating),
Apache Gossip (Incubating)
Apache Storm 0.9.x
Storm moves to Apache
Apache Storm 0.9.x
• First official Apache Release
• Storm becomes an Apache TLP
• 0mq to Netty for inter-worker communication
• Expanded Integration (Kafka, HDFS, HBase)
• Dependency conflict reduction (It was a start ;) )
Apache Storm 0.10.x
Enterprise Readiness
Apache Storm 0.10.x
• Security, Multi-Tenancy
• Enable Rolling Upgrades
• Flux (declarative topology wiring/configuration)
• Partial Key Groupings
Apache Storm 0.10.x
• Improved logging (Log4j 2)
• Streaming Ingest to Apache Hive
• Azure Event Hubs Integration
• Redis Integration
• JDBC Integration
Apache Storm 1.0
Maturity and Improved Performance
Release Date: April 12, 2016
Pacemaker
Heartbeat Server
Pacemaker
• Replaces Zookeeper for Heartbeats
• In-Memory key-value store
• Allows Scaling to 2k-3k+ Nodes
• Secure: Kerberos/Digest Authentication
Pacemaker
• Compared to Zookeeper:
• Less Memory/CPU
• No Disk
• Spared the overhead of maintaining consistency
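The idea of an in-memory heartbeat store can be illustrated with a small, self-contained sketch. This is not Pacemaker's code or protocol; the class name, method names, and timeout values are hypothetical and chosen only for illustration. It shows why no disk or consensus is needed: a heartbeat is just the latest timestamp per worker, and a lost entry is replaced by the next beat.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of an in-memory heartbeat store (not Pacemaker's actual implementation).
class HeartbeatStoreSketch {
    // Worker id -> time of the most recent heartbeat; held only in memory.
    private final Map<String, Long> lastBeat = new ConcurrentHashMap<>();

    // Record a heartbeat; a newer beat simply overwrites the older one.
    void beat(String workerId, long nowMillis) {
        lastBeat.put(workerId, nowMillis);
    }

    // Return the ids of workers whose last heartbeat is older than the timeout, sorted by id.
    List<String> timedOut(long nowMillis, long timeoutMillis) {
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastBeat.entrySet()) {
            if (nowMillis - entry.getValue() > timeoutMillis) {
                stale.add(entry.getKey());
            }
        }
        Collections.sort(stale);
        return stale;
    }

    public static void main(String[] args) {
        HeartbeatStoreSketch store = new HeartbeatStoreSketch();
        store.beat("worker-1", 0L);
        store.beat("worker-2", 4_000L);
        System.out.println(store.timedOut(6_000L, 5_000L)); // prints [worker-1]
    }
}
```

Because every heartbeat replaces the previous one, losing the map on restart costs at most one heartbeat interval, which is the trade-off behind skipping disk writes and consistency guarantees.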
Distributed Cache API
Distributed Cache API
• Topology resources:
• Dictionaries, ML Models, Geolocation Data, etc.
• Typically packaged in topology jar
• Fine for small files
• Large files negatively impact topology startup time
• Immutable: Changes require repackaging and deployment
Distributed Cache API
• Allows sharing of files (BLOBs) among topologies
• Files can be updated from the command line
• Allows for files from several KB to several GB in size
• Files can change over the lifetime of the topology
• Allows for compression (e.g. zip, tar, gzip)
Distributed Cache API
• Two implementations: LocalFsBlobStore and HdfsBlobStore
• Local implementation supports Replication Factor (not needed for
HDFS-backed implementation)
• Both support ACLs
Distributed Cache API
Creating a blob:
storm blobstore create --file dict.txt --acl o::rwa --repl-fctr 2 key1
Making it available to a topology:
storm jar topo.jar my.topo.Class test_topo -c topology.blobstore.map='{"key1":{"localname":"dict.txt", "uncompress":"false"}}'
High Availability Nimbus
Before HA Nimbus
[Diagram: a single Nimbus and ZooKeeper, with four Supervisors, each running Workers]
HA Nimbus
[Diagram: three Nimbus instances, one marked Leader, with Pacemaker (ZooKeeper) and four Supervisors, each running Workers]
HA Nimbus - Failover
[Diagram: the Leader Nimbus is marked X (failed) and Leader Election takes place]
HA Nimbus - Failover
[Diagram: a different Nimbus instance is now marked Leader; the failed instance is still marked X]
HA Nimbus
• Increase overall availability of Nimbus
• Nimbus hosts can join/leave at any time
• Leverages Distributed Cache API
• Topology JAR, Config, and Serialized Topology uploaded to
Distributed Cache
• Replication guarantees availability of all files
Native Streaming Windows
Streaming Windows
• Specify Length - Duration or Tuple Count
• Slide Interval - How often to advance the window
Sliding Windows
Windows can overlap
[Diagram: tuples {…} along a time axis; Window 1 and Window 2 overlap]
Tumbling Windows
Windows do not overlap
[Diagram: tuples {…} along a time axis; Window 1, Window 2, and Window 3 follow one another without overlap]
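The two window parameters (length and slide interval) can be sketched in plain Java for the tuple-count case. This is an illustration only, not Storm's windowing API; the class and method names are hypothetical, the input is a finite list rather than a live stream, and only complete windows are produced.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of count-based windows (not Storm's windowing implementation).
class WindowSketch {
    // length = tuples per window; slide = how many tuples to advance before the next window starts.
    static <T> List<List<T>> windows(List<T> tuples, int length, int slide) {
        if (length <= 0 || slide <= 0) {
            throw new IllegalArgumentException("length and slide must be positive");
        }
        List<List<T>> result = new ArrayList<>();
        // Only full windows are emitted; a trailing partial window is dropped.
        for (int start = 0; start + length <= tuples.size(); start += slide) {
            result.add(new ArrayList<>(tuples.subList(start, start + length)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> tuples = Arrays.asList(1, 2, 3, 4, 5, 6);
        // Sliding: slide < length, so consecutive windows share tuples.
        System.out.println(windows(tuples, 4, 2)); // prints [[1, 2, 3, 4], [3, 4, 5, 6]]
        // Tumbling: slide == length, so windows never overlap.
        System.out.println(windows(tuples, 2, 2)); // prints [[1, 2], [3, 4], [5, 6]]
    }
}
```

Seen this way, a tumbling window is the special case of a sliding window whose slide interval equals its length.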
Streaming Windows
• Timestamps (Event Time, Ingestion Time and Processing Time)
• Out of Order Tuples, Late Tuples
• Watermarks
• Window State Checkpointing
State Management
Stateful Bolts with Automatic Checkpointing
What you see.
[Diagram: a topology of Spout, Stateful Bolt 1, Bolt, and Stateful Bolt 2]
State Management
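import java.util.Map;
import org.apache.storm.state.KeyValueState;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.base.BaseStatefulBolt;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class WordCountBolt extends BaseStatefulBolt<KeyValueState<String, Integer>> {
    private KeyValueState<String, Integer> wordCounts;
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    // Receives the bolt's key-value state and keeps a reference to it
    @Override
    public void initState(KeyValueState<String, Integer> state) {
        this.wordCounts = state;
    }

    // Reads the current count for the word (default 0), increments it, and stores it
    @Override
    public void execute(Tuple tuple) {
        String word = tuple.getString(0);
        Integer count = wordCounts.get(word, 0);
        count++;
        wordCounts.put(word, count);
        collector.emit(new Values(word, count));
    }
}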
Initialize State: initState()
Read/Update State: execute()
State Management
Automatic Checkpointing
Checkpointing/Snapshotting
• Asynchronous Barrier Snapshotting (ABS) algorithm [1]
• Chandy-Lamport Algorithm [2]
[1] http://arxiv.org/pdf/1506.08603v1.pdf
[2] http://research.microsoft.com/en-us/um/people/lamport/pubs/chandy.pdf
State Management
Checkpointing/Snapshotting: What you see.
[Diagram: Spout, Stateful Bolt 1 (execute/update state), Bolt (execute), Stateful Bolt 2 (execute/update state)]
Checkpointing/Snapshotting: What you get.
[Diagram: the same topology plus a Checkpoint Spout, an ACKER, and a State Store, with $chkpt and ACK messages between components]
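A heavily simplified, single-component sketch of the checkpoint idea follows. It is not Storm's implementation and not the ABS or Chandy-Lamport algorithm in full: there is only one component, no acking, and no alignment of markers across multiple input streams. The class and method names are hypothetical, and an in-memory map stands in for the State Store.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical single-component sketch of checkpoint and recovery (not Storm's implementation).
class CheckpointSketch {
    private Map<String, Integer> counts = new HashMap<>();         // live state
    private Map<String, Integer> lastCheckpoint = new HashMap<>(); // stand-in for the state store

    // Normal tuple processing: update the live state.
    void execute(String word) {
        counts.merge(word, 1, Integer::sum);
    }

    // A checkpoint marker arrived in stream order: snapshot everything processed so far.
    void checkpoint() {
        lastCheckpoint = new HashMap<>(counts);
    }

    // Simulated recovery after a failure: discard updates made since the last checkpoint.
    void restore() {
        counts = new HashMap<>(lastCheckpoint);
    }

    int count(String word) {
        return counts.getOrDefault(word, 0);
    }

    public static void main(String[] args) {
        CheckpointSketch bolt = new CheckpointSketch();
        bolt.execute("storm");
        bolt.execute("storm");
        bolt.checkpoint();
        bolt.execute("storm");
        System.out.println(bolt.count("storm")); // prints 3
        bolt.restore();
        System.out.println(bolt.count("storm")); // prints 2
    }
}
```

Because the marker travels in the same stream as the data, the snapshot reflects exactly the tuples that preceded it; tuples processed after the marker are the ones that would be replayed after a restore.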
Automatic Back Pressure
Automatic Back Pressure
• In previous Storm versions, the only way to throttle topologies was to
enable ACKing and set topology.spout.max.pending.
• If you don’t require at-least-once guarantees, this imposed a
significant performance penalty.**
** In Storm 1.0 this penalty is drastically reduced (more on this later)
Automatic Back Pressure
• High/Low Watermarks (expressed as % of buffer size)
• Back Pressure thread monitors buffers
• If High Watermark reached, slow down Spouts
• If Low Watermark reached, stop throttling
• All Spouts Supported
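The high/low watermark rule can be sketched as a small state machine. This is an illustration under stated assumptions, not Storm's back pressure thread: the class name is hypothetical, and the watermark values used in the example (0.9 and 0.4 of buffer capacity) are illustrative.

```java
// Hypothetical sketch of high/low watermark throttling (not Storm's implementation).
class BackpressureSketch {
    private final double highWatermark; // fraction of buffer capacity that starts throttling
    private final double lowWatermark;  // fraction of buffer capacity that stops throttling
    private boolean throttled = false;

    BackpressureSketch(double highWatermark, double lowWatermark) {
        if (lowWatermark >= highWatermark) {
            throw new IllegalArgumentException("low watermark must be below high watermark");
        }
        this.highWatermark = highWatermark;
        this.lowWatermark = lowWatermark;
    }

    // Called by a monitor with the current buffer occupancy; returns true while spouts should be slowed.
    boolean update(int queued, int capacity) {
        double fill = (double) queued / capacity;
        if (!throttled && fill >= highWatermark) {
            throttled = true;   // buffer nearly full: slow down spouts
        } else if (throttled && fill <= lowWatermark) {
            throttled = false;  // buffer drained enough: stop throttling
        }
        return throttled;
    }

    public static void main(String[] args) {
        BackpressureSketch monitor = new BackpressureSketch(0.9, 0.4);
        System.out.println(monitor.update(50, 100)); // prints false
        System.out.println(monitor.update(95, 100)); // prints true
        System.out.println(monitor.update(60, 100)); // prints true
        System.out.println(monitor.update(30, 100)); // prints false
    }
}
```

The gap between the two watermarks is deliberate: with a single threshold, a buffer hovering near it would switch throttling on and off on every check.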
Resource Aware Scheduler
(RAS)
Resource Aware Scheduler
• Specify the resource requirements (Memory/CPU) for individual
topology components (Spouts/Bolts)
• Memory: On-Heap / Off-Heap (if off-heap is used)
• CPU: Point system based on number of cores
• Resource requirements are per component instance (parallelism
matters)
Resource Aware Scheduler
• CPU and Memory availability described in storm.yaml on each supervisor node. E.g.:
supervisor.memory.capacity.mb: 3072.0
supervisor.cpu.capacity: 400.0
• Convention for CPU capacity is to use 100 for each CPU core
Resource Aware Scheduler
Setting component resource requirements:
TopologyBuilder builder = new TopologyBuilder();
SpoutDeclarer spout = builder.setSpout("sp1", new TestSpout(), 10);
// Set the CPU requirement (100 points = one core)
spout.setCPULoad(20);
// Set the on-heap and off-heap memory requirements (MB)
spout.setMemoryLoad(64, 16);
BoltDeclarer bolt1 = builder.setBolt("b1", new MyBolt(), 3).shuffleGrouping("sp1");
// Set the CPU requirement. It is not necessary to set both CPU and memory.
// For requirements not set, a default value will be used.
bolt1.setCPULoad(15);
BoltDeclarer bolt2 = builder.setBolt("b2", new MyBolt(), 2).shuffleGrouping("b1");
// Set only the on-heap memory requirement (MB)
bolt2.setMemoryLoad(100);
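To make "parallelism matters" concrete, the arithmetic for the spout above can be worked through in plain Java. This is a back-of-the-envelope sketch, not the scheduler's algorithm: the class and method names are hypothetical, and the single-node fit check is an assumption for illustration, since a real scheduler may spread instances across several supervisors.

```java
// Hypothetical arithmetic sketch (not the Resource Aware Scheduler's algorithm).
class ResourceDemandSketch {
    // Requirements are per component instance, so total demand scales with parallelism.
    static double totalDemand(double perInstance, int parallelism) {
        return perInstance * parallelism;
    }

    // Simplified check: would the whole demand fit on one supervisor node?
    static boolean fits(double cpuDemand, double memDemandMb, double cpuCapacity, double memCapacityMb) {
        return cpuDemand <= cpuCapacity && memDemandMb <= memCapacityMb;
    }

    public static void main(String[] args) {
        // Spout "sp1": 10 instances, each needing 20 CPU points and 64 + 16 MB of memory.
        double spoutCpu = totalDemand(20, 10);
        double spoutMemMb = totalDemand(64 + 16, 10);
        System.out.println(spoutCpu + " CPU points, " + spoutMemMb + " MB"); // prints 200.0 CPU points, 800.0 MB
        // Supervisor capacity from the storm.yaml example: 400.0 CPU points, 3072.0 MB.
        System.out.println(fits(spoutCpu, spoutMemMb, 400.0, 3072.0)); // prints true
    }
}
```

Under the 100-points-per-core convention, the spout's 200 CPU points correspond to two full cores, or half of the example supervisor's CPU capacity.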
Storm Usability Improvements
Enhanced Debugging and Monitoring of Topologies
Dynamic Log Level Settings
Dynamic Log Levels
• Set log level setting for a running topology
• Via Storm UI and CLI
• Optional timeout after which changes will be reverted
• Logs searchable from Storm UI/Logviewer
Dynamic Log Levels
Via Storm UI:
Dynamic Log Levels
Via Storm CLI:
./bin/storm set_log_level [topology name] -l [logger_name]=[LEVEL]:[TIMEOUT]
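The shape of the -l argument can be illustrated with a small parser. This is a sketch of the format shown above, not the Storm CLI's own parsing code: the class name is hypothetical, the timeout is assumed to be a whole number of seconds, and 0 is used here to mean that no timeout was given.

```java
// Hypothetical parser for the logger=LEVEL[:TIMEOUT] format (not the Storm CLI's code).
class LogLevelSpecSketch {
    final String logger;
    final String level;
    final int timeoutSeconds; // assumption: seconds; 0 means no timeout was given

    private LogLevelSpecSketch(String logger, String level, int timeoutSeconds) {
        this.logger = logger;
        this.level = level;
        this.timeoutSeconds = timeoutSeconds;
    }

    static LogLevelSpecSketch parse(String spec) {
        int eq = spec.indexOf('=');
        if (eq <= 0 || eq == spec.length() - 1) {
            throw new IllegalArgumentException("expected logger=LEVEL[:TIMEOUT], got: " + spec);
        }
        String logger = spec.substring(0, eq);
        String rest = spec.substring(eq + 1);
        int colon = rest.indexOf(':');
        if (colon < 0) {
            return new LogLevelSpecSketch(logger, rest, 0); // timeout omitted
        }
        String level = rest.substring(0, colon);
        int timeout = Integer.parseInt(rest.substring(colon + 1));
        return new LogLevelSpecSketch(logger, level, timeout);
    }

    public static void main(String[] args) {
        LogLevelSpecSketch spec = parse("ROOT=DEBUG:30");
        System.out.println(spec.logger + " " + spec.level + " " + spec.timeoutSeconds); // prints ROOT DEBUG 30
    }
}
```

For example, ROOT=DEBUG:30 would read as "set the ROOT logger to DEBUG and revert after the timeout", matching the optional timeout described on the previous slide.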
Tuple Sampling
• No more debug bolts or Trident functions!
• In Storm UI: Select a Topology or component and click “Debug”
• Specify a sampling percentage (% of tuples to be sampled)
• Click on the “Events” link to view the sample log.
Distributed Log Search
• Search across all log files for a specific topology
• Search in archived (ZIP) logs
• Results include matches from all Supervisor nodes
Dynamic Worker Profiling
• Request worker profile data from Storm UI:
• Heap Dumps
• JStack Output
• JProfile Recordings
• Download generated files for off-line analysis
• Restart workers from UI
Supervisor Health Checks
• Identify Supervisor nodes that are in a bad state
• Automatically decommission bad nodes
• Simple shell script
• You define what constitutes “Unhealthy”
New Integrations
• Cassandra
• Solr
• Elasticsearch
• MQTT
Integration Improvements
• Kafka
• HDFS Spout
• Avro Integration for HDFS
• HBase
• Hive
Before I forget...
Performance
Up to 16x faster throughput.
Realistically 3x -- Highly dependent on use case and fault tolerance settings
> 60% Latency Reduction
Bear in mind performance varies
widely depending on the use case.
Consider the origin and motivation
behind any third party benchmark.
The most important benchmarks
are the ones you do.
Storm 1.1.0
Summer 2016
Apache Storm v1.1.0
• Revamped metrics API
• Focus on user-defined metrics
• More metrics available in Storm UI
• Enhanced metrics integration with Apache Ambari
What’s next for Storm?
2.0 and Beyond
Clojure to Java
Broadening the contributor base
Clojure to Java
Alibaba JStorm Contribution
Streaming SQL
Currently Beta/WIP
Apache Beam (incubating) Integration
• Unified API for dealing with bounded/unbounded data sources (i.e. batch/streaming)
• One API. Multiple implementations (execution engines). Called “Runners” in Beamspeak.
Twitter Heron
2 years late.
Twitter Heron vs. Storm
• Heron was benchmarked against a pre-Apache, work-in-progress version of Storm.
• 2 years late to the game (the Apache Storm community has been anything but idle)
• Storm is now ahead in terms of features and on par with respect to performance.
• Apache is about collaboration, and the Storm community is
committed to advancing innovation in stream processing.
Is Apache Storm Dead?
Competitors may say so. But hundreds of successful production deployments, and
a vibrant, growing developer community tell a very different story.
Thank you!
Questions?
P. Taylor Goetz, Hortonworks
@ptgoetz
