Introduction to Apache Cassandra
1
Me 
Robert Stupp 
Freelancer, Coder, Architect 
@snazy snazy@snazy.de 
Contributor to Apache Cassandra, 
3.0 UDFs (CASSANDRA-7395 + related) 
Databases, Network, Backend 
2
Agenda 
Apache Cassandra History 
Design Principles 
Outstanding differences 
CQL Intro 
Access C* 
Clusters 
Cassandra Future 
3
Apache Cassandra 
History 
4
Apache Cassandra 
started at Facebook 
inspired by Amazon Dynamo and Google Bigtable
Note: Facebook initially had 
two data centers. 
5
Cassandra 2.1 released in Sep 2014
6
Apache Cassandra 
Design Principles 
7
Hardware failures 
can and will occur! 
Cassandra handles failures. 
From single node to whole data center. 
From client to server. 
8
The complicated part 
when learning Cassandra, 
is to understand 
Cassandra’s simplicity 
9
Keep it simple 
all nodes are equal 
master-less architecture 
no name nodes 
no SPOF (single point of failure) 
no read before modify 
(prevent race conditions) 
10
Keep it running 
No need to take cluster down … e.g. 
during maintenance 
during software update 
Rolling restart is your friend 
11
Outstanding 
Differences 
12
Cassandra 
Highly scalable 
runs with a few nodes 
up to clusters of 1000+ nodes!
Linear scalability (proven!) 
Multi datacenter aware (world-wide!) 
No SPOF 
13
Cassandra @ Apple 
14
Linear Scalability 
15
Scaling Cassandra 
More data? 
-> add more nodes 
Faster access? 
-> add more nodes 
16
Read / Write 
performance 
Reads are fast 
Writes are even faster 
17
Durability 
Writes are durable - period. 
18
Availability @ Netflix
Chaos Monkey
kills nodes randomly
19
Availability @ Netflix
Chaos Gorilla
kills regions randomly
20
Availability @ Netflix
Chaos Kong
kills whole data centers
21
Availability @ 
Netflix 
http://de.slideshare.net/planetcassandra/active-active-c-behind-the-scenes-at-netflix
22
32 node cluster (Raspberry Pis)
@DataStax 
23
Most outstanding 
Great documentation 
Many blog posts 
Many presentations 
Many videos 
Regular webinars 
Huge, active and healthy community 
24
Data Distribution 
25
DHT 
Data is organized in a 
"Distributed Hash Table"
(hash over row key) 
26
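A minimal sketch of the "hash over row key" idea: the row key is hashed to a token, and the token decides which node owns the row. It assumes a ring of eight nodes that split the token space evenly and uses MD5 as a stand-in hash. Cassandra's real partitioners and token assignment differ, and the names `token_for` and `owner_of` are hypothetical.

```python
import hashlib

NUM_NODES = 8          # assumption: eight nodes with equally sized token ranges
TOKEN_SPACE = 2 ** 32  # assumption: a small token space, for illustration only


def token_for(row_key: str) -> int:
    """Hash the row key to a token in [0, TOKEN_SPACE)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def owner_of(row_key: str) -> int:
    """Return the index of the node whose token range contains the key."""
    range_size = TOKEN_SPACE // NUM_NODES
    return token_for(row_key) // range_size


if __name__ == "__main__":
    for key in ("alice", "bob", "carol"):
        print(key, "-> node", owner_of(key))
```

Because the owner is derived from the key alone, any node can work out where a row lives without asking a master.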
DHT
[Figure: ring of eight nodes, numbered 0 to 7]
27
Replication 
28
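Replication on a ring can be sketched as follows: the node that owns a row keeps the first copy, and the next nodes clockwise keep the others. This is only an illustration under the assumption of eight nodes and a plain ring walk. Cassandra's replication strategies, especially the rack and data center aware ones, are more involved, and `replicas_for` is a hypothetical name.

```python
from typing import List


def replicas_for(primary: int, replication_factor: int, num_nodes: int = 8) -> List[int]:
    """Return the owning node plus its clockwise successors on the ring."""
    if not 1 <= replication_factor <= num_nodes:
        raise ValueError("replication factor must be between 1 and num_nodes")
    # Wrap around the end of the ring with the modulo operator.
    return [(primary + i) % num_nodes for i in range(replication_factor)]


if __name__ == "__main__":
    print(replicas_for(primary=2, replication_factor=2))
    print(replicas_for(primary=6, replication_factor=3))
```

With a replication factor of 3, a row owned by node 6 is also stored on nodes 7 and 0.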
Replication Factor 2
[Figure: ring of nodes 0 to 7 showing the replicas of Row A and Row B]
29
Replication Factor 3
[Figure: ring of nodes 0 to 7 showing the replicas of Row A and Row B]
30
Consistency 
Consistency defined per request 
Several consistency levels (CLs) 
for different needs 
31
Eventual consistency 
is not 
hopefully consistent 
EC means there’s a time gap until updates 
are consistently readable 
32
Consistency Levels 
ANY (only for writes) 
ONE, LOCAL_ONE, 
TWO, THREE, (not recommended) 
ALL, (not recommended) 
QUORUM, LOCAL_QUORUM, EACH_QUORUM 
SERIAL, LOCAL_SERIAL 
33
Consistency 
Data is always replicated 
CL defines how many replicas must 
fulfill the request 
34
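The relation between consistency level and replica count can be written down as simple arithmetic. The sketch below covers a single data center, and the function names are hypothetical. The rule it encodes is the usual one: a read is guaranteed to see the latest write when the replicas contacted for the write and for the read must overlap, that is when W + R > RF.

```python
def quorum(replication_factor: int) -> int:
    """Majority of the replicas."""
    return replication_factor // 2 + 1


def required_acks(consistency_level: str, replication_factor: int) -> int:
    """Number of replicas that must fulfill a request at the given level."""
    levels = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return levels[consistency_level]


def reads_see_latest_write(write_cl: str, read_cl: str, replication_factor: int) -> bool:
    """True when the write and read replica sets are guaranteed to overlap."""
    w = required_acks(write_cl, replication_factor)
    r = required_acks(read_cl, replication_factor)
    return w + r > replication_factor


if __name__ == "__main__":
    print(reads_see_latest_write("QUORUM", "QUORUM", 3))
    print(reads_see_latest_write("ONE", "ONE", 3))
```

Since the level is chosen per request, an application can mix cheap ONE reads with QUORUM reads where it needs them.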
Write
[Figure: ring of nodes 0 to 7 handling a write]
35
Write
[Figure: ring of nodes 0 to 7 handling a write]
36
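Multi DC setup
[Figure: DC 1 and DC 2]
37
Multi DC replication
[Figure: a write replicated to DC 1 and DC 2]
38
Multi DC replication
[Figure: a write replicated to DC 1 and DC 2]
39
Multi DC replication
[Figure: a write replicated to DC 1 and DC 2]
40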
Replication & 
Consistency 
Define # of replicas 
using replication factor 
Define required consistency 
per request 
41
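With several data centers, the replication factor is set per data center and the consistency level decides where acknowledgements have to come from. The sketch below assumes two data centers with three replicas each. The names `acks_needed`, `DC1` and `DC2` are hypothetical, and only the data center aware levels are modeled.

```python
from typing import Dict


def quorum(replicas: int) -> int:
    """Majority of the given number of replicas."""
    return replicas // 2 + 1


def acks_needed(level: str, replicas_per_dc: Dict[str, int], local_dc: str) -> Dict[str, int]:
    """Map each scope to the number of replica acknowledgements required.

    The key "*" means "counted across all data centers".
    """
    if level == "LOCAL_ONE":
        return {local_dc: 1}
    if level == "LOCAL_QUORUM":
        return {local_dc: quorum(replicas_per_dc[local_dc])}
    if level == "EACH_QUORUM":
        return {dc: quorum(rf) for dc, rf in replicas_per_dc.items()}
    if level == "QUORUM":
        return {"*": quorum(sum(replicas_per_dc.values()))}
    raise ValueError(f"unsupported consistency level: {level}")


if __name__ == "__main__":
    layout = {"DC1": 3, "DC2": 3}  # assumption: replication factor 3 in each DC
    for cl in ("LOCAL_ONE", "LOCAL_QUORUM", "EACH_QUORUM", "QUORUM"):
        print(cl, acks_needed(cl, layout, local_dc="DC1"))
```

LOCAL_QUORUM never waits for the remote data center, which is why it is a common choice for latency sensitive requests in multi DC clusters.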
CQL Introduction 
CQL = Cassandra query language 
42
“CQL is SQL 
minus joins, 
minus subqueries, 
plus collections” 
(plus user types, 
plus tuple types) 
43
Why CQL? 
Introduces a schema to Cassandra 
Familiar syntax 
Easy to understand 
DML operations are atomic 
44
Data model 
(hierarchical view) 
Keyspace (schema) 
Table (column family) 
Row 
partition key (part of primary key) 
static columns 
clustering key (part of primary key) 
columns 
45
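The hierarchy above can be pictured with plain Python containers. This is a sketch of the logical view only, not of Cassandra's storage format, and the `Table` class is hypothetical: the partition key selects a partition, static columns are stored once per partition, and rows inside a partition are ordered by clustering key.

```python
from collections import defaultdict


class Table:
    """Toy model: partition key -> static columns + rows by clustering key."""

    def __init__(self):
        self._partitions = defaultdict(lambda: {"static": {}, "rows": {}})

    def upsert(self, partition_key, clustering_key, columns, static=None):
        """Insert or update one row; static columns are shared by the partition."""
        partition = self._partitions[partition_key]
        partition["rows"].setdefault(clustering_key, {}).update(columns)
        if static:
            partition["static"].update(static)

    def read_partition(self, partition_key):
        """Return the rows of one partition, sorted by clustering key."""
        partition = self._partitions.get(partition_key)
        if partition is None:
            return []
        return [
            {**partition["static"], "clustering_key": ck, **cols}
            for ck, cols in sorted(partition["rows"].items())
        ]


if __name__ == "__main__":
    readings = Table()
    readings.upsert("sensor-1", 2, {"value": 20.5}, static={"location": "roof"})
    readings.upsert("sensor-1", 1, {"value": 19.0})
    for row in readings.read_partition("sensor-1"):
        print(row)
```

A keyspace would then simply be a dictionary of such tables.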
CQL / DDL 
Similar to SQL 
CREATE TABLE … 
ALTER TABLE … 
DROP TABLE … 
46
CQL / DML 
Similar to SQL 
INSERT … 
UPDATE … 
DELETE … 
SELECT … 
47
CQL / BATCH 
Group related modifications 
(INSERT, UPDATE, DELETE) 
Atomic operation 
48
CQL types 
boolean, int (32bit), bigint (64bit), 
float, double, 
decimal ("BigDecimal"), 
varint ("BigInteger"), 
ascii, text (= varchar), blob, 
inet, timestamp, uuid, timeuuid 
49
CQL collection 
types 
list < foo > 
set < foo > 
map < foo , bar > 
Since C* 2.1 collections can contain 
any type - even other collections. 
50
CQL composite 
types 
user types (C* 2.1) 
are composite types with named fields 
tuple types (C* 2.1) 
are unstructured lists of values 
51
CQL / user types 
CREATE TYPE address ( 
street text, 
zip int, 
city text); 
CREATE TABLE users ( 
username text, 
addresses map<text, frozen<address>>,
... 
52
Cassandra 
Data Modeling 
Access by key 
no access by arbitrary WHERE clause 
Duplicate data (it’s ok!) 
Aggregate data 
Build application maintained indexes 
53
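"Duplicate data" and "application maintained indexes" can be sketched like this: the application writes the same record once per access path, so that every lookup is an access by key. The two dictionaries stand in for two tables, and all names and values are hypothetical.

```python
# One "table" per question the application asks.
users_by_username = {}
users_by_email = {}


def save_user(username: str, email: str, full_name: str) -> None:
    """Write the same record under both keys (duplication is intended)."""
    record = {"username": username, "email": email, "full_name": full_name}
    users_by_username[username] = dict(record)
    users_by_email[email] = dict(record)


def find_by_username(username: str):
    """Access by key, no scan and no arbitrary WHERE clause."""
    return users_by_username.get(username)


def find_by_email(email: str):
    """Second access path, served by the duplicated data."""
    return users_by_email.get(email)


if __name__ == "__main__":
    save_user("alice", "alice@example.com", "Alice Example")
    print(find_by_username("alice"))
    print(find_by_email("alice@example.com"))
```

The price is that every write path has to update all copies, which is the trade the slide calls ok.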
RDBMS modeling 
54
C* modeling 
55
Data Modeling 
with RDBMS 
Driven by 
"How can I store 
something right?" 
"What answers 
do I have?" 
56
Data Modeling 
with NoSQL 
Driven by 
"How can I access 
something right?" 
"What questions 
do I have?" 
57
Data Modeling 
Basics 
Work top-down. Think about: 
What does the application do? 
What are the access patterns? 
Now design data model 
58
Data Modeling 
http://de.slideshare.net/planetcassandra/cassandra-day-sv-2014-fundamentals-of-apache-cassandra-data-modeling
http://de.slideshare.net/planetcassandra/data-modeling-with-travis-price
59
Accessing 
Cassandra 
60
Command Line 
cqlsh 
CQL shell 
nodetool 
node/cluster administration 
61
GUI: DevCenter 
Visual query tool 
62
Stress test? 
Cassandra 2.1 comes with improved 
stress tool 
Simulate read+write workload 
Uses configurable data 
Works against older C* versions, too 
63
DataStax APLv2 
Open Source Drivers 
for Java 
for Python 
for C# 
for Scala / Spark 
https://github.com/datastax/ 
or http://www.datastax.com/download 
64
Native protocol 
C*’s own net protocol for clients 
Request multiplexing 
Schema change notifications 
Cluster change notifications 
65
Third Party Drivers 
for huge number of languages 
66
Mappers 
High level mappers exist at least for 
Java 
Special case: Scala 
due to its strong+complex type 
model (DataStax OSS Spark driver) 
67
Spark + Hadoop 
Yes - works really well
Note: Spark can be up to 100x faster than Hadoop MapReduce for in-memory workloads
68
Clusters 
69
Cluster sizes 
C* works with a few nodes 
C* works with several hundred / 
thousand nodes 
70
Cluster setup 
Configure for multiple data centers 
Plan for multi-DC setup :) 
71
Cluster experience 
Remember: A single Cassandra
cluster works over multiple data
centers all over the world
"Disaster-proven"
Hurricanes 
Amazon DC outages 
72
Apache Cassandra 
Future 
73
Cassandra 3.0 
(in development) 
User Defined Functions 
Aggregate functions 
Functional indexes 
Workload recording + playback 
Better SSTables, Fully off-heap row cache, Better 
serial consistency 
Indexes w/ high cardinality 
Subject to change!!!
74
Get active!
75
Cassandra Community 
http://cassandra.apache.org/ 
http://planetcassandra.org/ - Blog 
http://www.slideshare.net/planetcassandra/presentations
http://de.slideshare.net/DataStax/presentations
76
Cassandra Community 
https://www.youtube.com/user/PlanetCassandra
https://www.youtube.com/user/DataStax 
http://www.datastax.com/dev/blog/ 
http://www.datastax.com/docs/ 
Users Mailing List 
user@cassandra.apache.org
77
Free C* Training! 
http://planetcassandra.org/cassandra-training/ 
78
Get involved! 
Ask questions, 
submit RFEs or experiences to 
user mailing list 
user@cassandra.apache.org 
Answers arrive quickly! 
79
Live Demo 
User Defined Functions 
80
C* 3.0 UDFs 
Users create functions using 
CREATE FUNCTION … 
LANGUAGE … 
AS … 
Java, JavaScript, Scala, Groovy, 
JRuby, Jython 
Functions work on all nodes 
81
C* 3.0 UDFs 
Example 
CREATE FUNCTION sin(input double) 
RETURNS double 
LANGUAGE javascript 
AS 'Math.sin(input)'; 
This is JavaScript!
82
UDFs for what? 
Own aggregation code - e.g. 
SELECT sum(value) FROM table 
WHERE …; 
Functional indexes - e.g. 
CREATE INDEX idx 
ON table ( myFunction(colname) ); 
Targeted for C* 3.0
83
Thanks 
for your attention 
Download Apache Cassandra at 
http://cassandra.apache.org/ 
Robert Stupp 
@snazy 
snazy@snazy.de 
de.slideshare.net/RobertStupp 
84
Q & A 
85
BACKUP SLIDES 
User-Defined-Functions 
Demo 
87