(Big Data)²
How YARN Timeline Service v.2 Unlocks 360-Degree Platform Insights at Scale
Sangjin Lee @sjlee (Twitter)
Li Lu (Hortonworks)
Vrushali Channapattan @vrushalivc (Twitter)
Outline
• Why v.2?
• Highlights
• Developing for Timeline Service v.2
• Setting up Timeline Service v.2
• Milestones
• Demo
Why v.2?
• YARN Timeline Service v 1.x
• Gained good adoption: Tez, Hive, Pig, etc.
• Keeps improving with v 1.5 APIs and storage implementation
• Still facing some fundamental challenges...
Why v.2?
• Scalability and reliability challenges
• Single instance of Timeline Server
• Storage (single local LevelDB instance)
• Usability
• Flow
• Metrics and configuration as first-class citizens
• Metrics aggregation up the entity hierarchy
Highlights
v.1 → v.2
• Single writer/reader Timeline Server → Distributed writer/collector architecture
• Single local LevelDB storage* → Scalable storage (HBase)
• v.1 entity model → New v.2 entity model
• No aggregation → Metrics aggregation
• REST API → Richer query REST API
Architecture
• Separation of writers (“collectors”) and readers
• Distributed collectors: one collector for each app
• Dedicated RM collector for RM-generated data
• Collector discovery via RM
• Pluggable storage with HBase as default storage
Distributed collectors & readers
[Diagram: each AM writes app metrics/events to a per-app timeline collector hosted in the NM on its worker node; NMs on worker nodes running containers write container events/metrics to that same collector; the RM writes app/container events through its own timeline collector. All collectors write to Storage (write flow); a pool of timeline readers serves user queries from Storage (read flow).]
Collector discovery
[Diagram: ① the RM starts the AM container on a worker node; ② the NM hosting the timeline collector reports the app id => collector address mapping to the RM via node heartbeat; ③ the RM hands the collector address to the AM’s timeline client in the allocate response.]
New entity model
• Flows and flow runs as parents of YARN application entities
• First-class configuration (key-value pairs)
• First-class metrics (single-value or time series), as sketched below
• Designed to handle multi-cluster environments out of the box
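A minimal sketch of what such an entity might look like with the v.2 client-side records (the entity type, id, and metric name below are illustrative, not from the talk):

import org.apache.hadoop.yarn.api.records.timelineservice.TimelineEntity;
import org.apache.hadoop.yarn.api.records.timelineservice.TimelineMetric;

// hypothetical entity representing one container
TimelineEntity entity = new TimelineEntity();
entity.setType("MY_CONTAINER"); // illustrative type
entity.setId("container_1_1");  // illustrative id
// first-class configuration: simple key-value pairs
entity.addConfig("Foo", "bar");
// first-class metric: a single value now, or a time series as values accumulate
TimelineMetric metric = new TimelineMetric();
metric.setId("A");
metric.addValue(System.currentTimeMillis(), 10);
entity.addMetric(metric);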
What is a flow?
• A flow is a group of YARN
applicaSons that are launched as
parts of a logical app
• Oozie, Scalding, Pig, etc.
• name:
“frequent_visitor_stat”
• run id: 1466097809000
• version: “b9b9068”
Configuration and metrics
• Now explicit top-level attributes of entities
• Fine-grained updates and queries made possible
• “update metric A to value x”
• “query entities where config A = B”

container 1_1 (before the update)
  metric: A = 10
  metric: B = 100
  config: "Foo" = "bar"

container 1_1 (after “update metric A to 50”)
  metric: A = 50
  metric: B = 100
  config: "Foo" = "bar"
HBase Storage
• Scalable backend
• Row Key structure
• efficient range scans
• KeyPrefixRegionSplitPolicy
• Filter pushdown
• Coprocessors for flow aggregation (“readless” aggregation)
• Cell tags for metadata (application id, aggregation operation)
• Cell timestamps generated during put
• left-shifted with the app id added to avoid overwrites
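The last point is a uniqueness trick; a sketch of the idea, where the multiplier and the per-app suffix are illustrative rather than the implementation's exact constants:

// illustrative constants, not the implementation's exact values
final long TS_MULTIPLIER = 1_000_000L;
long putTimestampMillis = System.currentTimeMillis();
long appIdSequenceNum = 42L; // derived from the application id in practice
// left-shift (scale) the timestamp and add an app-derived suffix so two collectors
// putting at the same millisecond don't overwrite each other's cells
long cellTs = putTimestampMillis * TS_MULTIPLIER + (appIdSequenceNum % TS_MULTIPLIER);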
Tables in HBase
• flow run
• application
• entity
• flow activity
• app to flow
table: flow run
Row key:
  clusterId!userName!flowName!inverted(flowRunId)
most recent flow run stored first
coprocessor enabled
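Here “inverted” means the run id is subtracted from the maximum long value, so newer runs get lexicographically smaller row keys and are stored (and scanned) first; a one-line sketch:

long flowRunId = 1466097809000L; // e.g. the run id from the earlier flow example
long invertedFlowRunId = Long.MAX_VALUE - flowRunId; // newer run => smaller key => stored first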
table: application
Row key:
  clusterId!userName!flowName!inverted(flowRunId)!AppId
applications within a flow run stored together
most recent flow run stored first
table: entity
Row key:
  userName!clusterId!flowName!inverted(flowRunId)!AppId!entityType!entityId
entities within an application within a flow run stored together per type
• for example, all containers within a YARN application will be stored together
pre-split table
stores per-entity information such as info, relatesTo, relatedTo, events, metrics, and config
table: flow activity
Row key:
  clusterId!inverted(TopOfTheDay)!userName!flowName
shows the flows that ran on a given day
stores per-flow information like the number of runs, the run ids, and versions
table: appToFlow
Row key:
  clusterId!appId
stores the mapping of appId to flowName and flowRunId
Metrics aggregation
• Application level
• Rolls up sub-application metrics
• Performed in real time, in memory, in the collectors
• Flow run level
• Rolls up app-level metrics
• Performed in HBase region servers via coprocessors
• Offline aggregation (TBD)
• Rolls up on user, queue, and flow offline periodically
• Phoenix tables
[Diagram: worked aggregation example]
App aggregation (in collector):
  Container 1_1 “bytes”: 23 + Container 1_2 “bytes”: 135 → App1 “bytes”: 158
  Container 2_1 “bytes”: 50 → App2 “bytes”: 50
  Container 3_1 “bytes”: 64 → App3 “bytes”: 64
Flow aggregation (in HBase):
  App1 + App2 → flow1 “bytes”: 208
  App3 → flow2 “bytes”: 64
Offline aggregation:
  flow1 + flow2 → user1 “bytes”: 272, queue1 “bytes”: 272
[Diagram, shown in two steps: FlowRun aggregation via the HBase coprocessor — the individual app metrics cells stored in HBase are combined by the coprocessor into a single FlowRun metric sum.]
Reader REST API: paths
• URLs under /ws/v2/timeline
• Canonical REST-style URLs: /ws/v2/timeline/clusters/cluster_name/users/user_name/flows/flow_name/runs/run_id
• Path elements may be omitted if they can be inferred
• flow context can be inferred from the app id
• default cluster is assumed if cluster is omitted
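For example, assuming the hypothetical cluster, user, and flow names below, both URLs address the same flow run:

/ws/v2/timeline/clusters/yarn-prod/users/alice/flows/frequent_visitor_stat/runs/1466097809000
/ws/v2/timeline/users/alice/flows/frequent_visitor_stat/runs/1466097809000 (default cluster assumed)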
Reader REST API: query params
• limit, createdTimeStart, createdTimeEnd: constrain the entities returned
• fields (ALL | EVENTS | INFO | CONFIGS | METRICS | RELATES_TO | IS_RELATED_TO): limit the contents to return
• metricsToRetrieve, confsToRetrieve: further limit the contents to return
• metricsLimit: limits the number of values in a time series
Reader REST API: query params
• relatesTo, isRelatedTo: filters by association
• *Filters: filters by info, config, metric, event, …
• Supports complex filters including operators
• metricFilter=(((metric1 eq 50) AND (metric2 gt 40)) OR (metric1 lt 20))
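Putting paths and parameters together, a hypothetical query for the runs of the flow above, returning only metrics with at most 5 time-series values each, might look like (parameter casing as on these slides):

/ws/v2/timeline/users/alice/flows/frequent_visitor_stat/runs?limit=10&fields=METRICS&metricsLimit=5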
Developing: TimelineClient
In your application master:
// create TimelineClient v.2 style
TimelineClient client = TimelineClient.createTimelineClient(appId);
client.init(conf);
client.start();
// bind it to AM/RM client to receive the collector address
amRMClient.registerTimelineClient(client);
// create and write timeline entities (each entity needs a type and an id;
// the values below are illustrative)
TimelineEntity entity = new TimelineEntity();
entity.setType("MY_ENTITY");
entity.setId("entity_1");
client.putEntities(entity);
// when the app is complete, stop the timeline client
client.stop();
Developing: Flow context
In your app submitter:
ApplicationSubmissionContext appContext =
app.getApplicationSubmissionContext();
// set the flow context as YARN application tags
Set<String> tags = new HashSet<>();
tags.add(TimelineUtils.generateFlowNameTag("distributed grep"));
tags.add(TimelineUtils.generateFlowVersionTag(
"3df8b0d6100530080d2e0decf9e528e57c42a90a"));
tags.add(TimelineUtils.generateFlowRunIdTag(System.currentTimeMillis()));
appContext.setApplicationTags(tags);
Setting up Timeline Service v.2
• Set up the HBase cluster (1.1.x)
• Add the timeline service jar to HBase
• Install the flow run coprocessor
• Create tables via the TimelineSchemaCreator utility
• Configure the YARN cluster
• Enable Timeline Service v.2
• Add hbase-site.xml for the timeline collectors and readers
• Start the timeline reader daemon
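A minimal sketch of the YARN-side configuration and the table-creation step (property names and the schema creator invocation follow the v.2 alpha documentation; treat them as illustrative and check the docs for your build):

<!-- yarn-site.xml -->
<property>
  <name>yarn.timeline-service.version</name>
  <value>2.0f</value>
</property>
<property>
  <name>yarn.timeline-service.enabled</name>
  <value>true</value>
</property>

# create the timeline service tables in the HBase cluster
bin/hadoop org.apache.hadoop.yarn.server.timelineservice.storage.TimelineSchemaCreator -create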
Milestone 1 ("Alpha 1")
• Merge discussion (YARN-2928) in progress as we speak!
✓ Complete end-to-end read/write flow
✓ Real-time application and flow aggregation
✓ New entity model
✓ HBase storage
✓ Rich REST API
✓ Integration with Distributed Shell and MapReduce
✓ YARN generic events and system metrics
Milestones - Future
• Milestone 2 (“Alpha 2”)
• Integration with new YARN UI
• Integration with more frameworks
• Beta
• Freeze API and storage schema
• Security
• Collectors as containers
• Storage fault tolerance
• Production-ready
• Migration-ready
Demo
Contributors
• Li Lu, Junping Du, Vinod Kumar Vavilapalli (Hortonworks)
• Varun Saxena, Naganarasimha G. R. (Huawei)
• Sangjin Lee, Vrushali Channapattan, Joep Rottinghuis (Twitter)
• Zhijie Shen (now at Facebook)
• The HBase and Phoenix community!
Thank you!