SALSA - Flash-Optimized Software-Defined Storage

Nikolas Ioannou, Ioannis Koltsidas, Roman Pletka, Sasa Tomic, Thomas Weigold
IBM Research – Zurich
Flash Memory Summit 2015 Santa Clara, CA
New Market Category of Big Data Flash
§  Multiple workloads don’t really need the write performance
and endurance of “good” Flash
–  In certain environments data is actually immutable
§  What matters is high density, low cost, and good read
performance
–  Current Flash architectures are not a good fit
eBay: “We could live with 1/3rd the number of writes that normal flash supports as long as we could get it for 1/4th the price.”
§  IDC just introduced a new market category of Big Data Flash
(March 2015)
§  Content repositories, media and streaming services, Big Data
and analytics, NoSQL, Object storage, Web infrastructure.
© 2015 International Business Machines Corporation
At <1$/GB for raw Flash, total acquisition cost becomes the
same as an HDD-based solution, with much lower TCO.
- IDC
Low-cost Flash technology (c-MLC, TLC)
Can’t we just use low-cost SSDs?
§ Low-cost Flash suffers from high write latency, low endurance
- E.g., TLC, 3D-NAND, c-MLC
Raw low-cost SSDs are
practically unusable in a
real datacenter
§ Low-cost SSDs have limited resources, simple controllers to keep the
cost as low as possible (~ $0.4 /GB!)
§ Therefore, they only employ simple Flash management
- Sufficiently good read performance
- But, limited write endurance, terrible write performance
[Figure: Read Performance (4kB random reads). Latency (usec) vs. kIOPS. Annotation: > 50k IOPS @ 300 usec.]
[Figure: Write Latency (4kB random writes). Write Latency (usec) for MLC PCI-e Card, SATA TLC SSD and 15k RPM HDD. Data labels in the chart: 53, 1800 and 2200 usec. Annotation: Almost as slow as an HDD!]
The characteristics of write performance
[Figure: Write Bandwidth with varying block sizes. Write Bandwidth (MB/s) for 256MB seq (sequential I/O on newly formatted drive) and for 4kB, 1MB, 64MB, 256MB, 512MB, 1024MB and 1536MB rnd (multiple overwrites of the drive with random I/O).]
SoftwAre Log-Structured Array
What?
A Flash-optimized I/O stack that elevates the performance and endurance of consumer-level
SSDs to enterprise standards.
Why?
Offer cost-effective all-Flash storage in public and private clouds, mainly for read-dominated
workloads, complementing our high-end FlashSystem offerings.
How?
1.  Use high-density, low-cost, off-the-shelf Flash SSDs
2.  Move complexity from hardware to software to reduce cost
3.  Optimize end-to-end for low Write Amplification
4.  Employ aggressive Data Reduction
5.  Natively support Object Storage
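Point 3 refers to write amplification: the ratio of bytes physically written to Flash to bytes written by the host. The following is a minimal sketch of that metric, not code from SALSA. The function names are hypothetical, the numbers are illustrative rather than measurements from this talk, and the second function assumes a simplified steady state in which every garbage-collected block holds the same fraction of valid data.

```python
def write_amplification(host_bytes: int, flash_bytes: int) -> float:
    """Ratio of bytes physically written to Flash to bytes the host wrote."""
    if host_bytes <= 0:
        raise ValueError("host_bytes must be positive")
    return flash_bytes / host_bytes


def gc_write_amplification(valid_fraction: float) -> float:
    """Write amplification when each reclaimed block is `valid_fraction` valid.

    Reclaiming a block relocates its valid data, so only
    (1 - valid_fraction) of the block becomes room for new host data.
    """
    if not 0.0 <= valid_fraction < 1.0:
        raise ValueError("valid_fraction must be in [0, 1)")
    return 1.0 / (1.0 - valid_fraction)


# Illustrative values only (hypothetical, not from the slides).
print(write_amplification(host_bytes=100, flash_bytes=250))  # 2.5
print(gc_write_amplification(0.5))                           # 2.0
```

The lower the valid fraction of the blocks chosen for garbage collection, the closer write amplification gets to 1, which is why the access pattern presented to the SSD matters.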
Squeeze the most capacity out of Flash
SALSA
✓  Implements state-of-the-art Flash management in software
✓  Runs on Linux, exposes standard interfaces
-  File systems and applications run unmodified on top of SALSA
✓  Is ideal for cost-optimized scale-out storage systems like GPFS, CEPH
-  SALSA enables SDS on low-cost SSDs, offering high performance and endurance
SALSA Overview
Runs on Linux, Intel x86 and Power8
Interfaces
- Block
- Object
- I/O memory (RDMA)
SALSA Software Stack
Logical Layer
- Storage Virtualization
- Quality of Service
- Data Reduction
- Workload Isolation
- Thin Provisioning
- Compression
- De-duplication
- Recurring Pattern Detection
Physical Layer
- Log-Structured Array
- Capacity Management
- Traffic Shaping
- Load Balancing
- I/O handling
- Log-structured organization
- Flash-friendly access patterns
- State-of-the-art Garbage Collection
- Zero Read-Modify-Writes
- RAID5-equivalent protection
- Small footprint
Low-cost, high-density consumer SSDs (SATA; TLC, 3D NAND, c-MLC)
- Limited resources (FPGA, CPU, RAM)
- Light Flash Management, simple GC
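To make the physical-layer ideas concrete (log-structured organization, garbage collection, no in-place updates), here is a toy Python model. It is a sketch under stated assumptions, not SALSA's implementation: the class name, the greedy "fewest valid blocks" victim selection, and the rule of keeping two free segments in reserve are all choices made for illustration.

```python
import random


class ToyLogStructuredArray:
    """Toy model: logical blocks are appended to fixed-size segments, never updated in place."""

    def __init__(self, num_segments: int, blocks_per_segment: int):
        if num_segments < 4:
            raise ValueError("need at least 4 segments")
        self.blocks_per_segment = blocks_per_segment
        # Each segment lists the logical block held in each slot (None = invalidated).
        self.segments = [[] for _ in range(num_segments)]
        self.free = list(range(1, num_segments))  # segment 0 starts as the open segment
        self.open_seg = 0
        self.location = {}  # logical block address -> (segment, slot)
        self.host_writes = 0
        self.physical_writes = 0

    def write(self, lba: int) -> None:
        """Host write: append, then reclaim space until two segments are free."""
        self.host_writes += 1
        self._append(lba)
        while len(self.free) < 2:
            self._garbage_collect()

    def _append(self, lba: int) -> None:
        if len(self.segments[self.open_seg]) == self.blocks_per_segment:
            self.open_seg = self.free.pop()  # open segment is full, take a fresh one
        old = self.location.get(lba)
        if old is not None:
            seg, slot = old
            self.segments[seg][slot] = None  # invalidate the previous copy
        target = self.segments[self.open_seg]
        target.append(lba)
        self.location[lba] = (self.open_seg, len(target) - 1)
        self.physical_writes += 1

    def _garbage_collect(self) -> None:
        closed = [s for s in range(len(self.segments))
                  if s != self.open_seg and s not in self.free]
        # Greedy policy (an assumption): reclaim the segment with the fewest valid blocks.
        victim = min(closed, key=lambda s: sum(b is not None for b in self.segments[s]))
        valid = [b for b in self.segments[victim] if b is not None]
        if len(valid) == self.blocks_per_segment:
            raise RuntimeError("no reclaimable space: logical capacity too large")
        for lba in valid:
            self._append(lba)  # relocation counts as a physical write
        self.segments[victim] = []
        self.free.append(victim)

    def write_amplification(self) -> float:
        return self.physical_writes / self.host_writes


# 8 segments of 4 blocks, but only 16 logical blocks: the rest acts as spare space.
lsa = ToyLogStructuredArray(num_segments=8, blocks_per_segment=4)
rng = random.Random(0)
for _ in range(1000):
    lsa.write(rng.randrange(16))
print(lsa.write_amplification())
```

In this model every physical write is an append to the open segment, regardless of how random the host's addresses are; the price is the relocation traffic counted in `physical_writes`.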
SALSA Stack
[Diagram labels]
- Interfaces (IF): Block, I/O Memory, Object
- Logical Layer: Volume 1, Volume 2, Volume 3; Thin-provisioned space
- Physical Layer: Segment, Grain, Write Destage Buffer, Parity Generation, Parallel writes to SSDs
- Other labels: De-duplication, Recurring Pattern Detection, Heat Segregation, Write Stream Separation, Global Garbage Collection, Globally Shared Overprovisioning Space
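The parity generation step of a RAID5-equivalent scheme can be illustrated with plain XOR parity over one stripe. This is a generic sketch, not SALSA's code: the function names, the chunk contents, and the four-data-chunk stripe are assumptions for illustration. When a whole stripe is assembled in a write buffer before it is written, parity is computed from data already in memory, so no old data or old parity has to be read back.

```python
from functools import reduce


def xor_parity(chunks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized chunks."""
    if not chunks:
        raise ValueError("need at least one chunk")
    size = len(chunks[0])
    if any(len(c) != size for c in chunks):
        raise ValueError("all chunks in a stripe must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover the single missing chunk from the surviving chunks and the parity."""
    return xor_parity(surviving + [parity])


# Hypothetical stripe: 4 data chunks plus 1 parity chunk.
stripe = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_parity(stripe)

# Simulate losing the third chunk and rebuilding it.
rebuilt = rebuild_missing(stripe[:2] + stripe[3:], parity)
print(rebuilt == stripe[2])  # True
```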
SALSA Stack in Linux
[Diagram labels]
- User space: Device Mapper Frontend; SALSA Configuration & Tooling
- Kernel: Device Mapper Kernel (Configuration & RAS)
- Kernel: Linux Block Layer
- Kernel: SALSA Logical Layer (Linux Device Mapper devices): /dev/mapper/vol0, /dev/mapper/vol1, /dev/mapper/vol2
- Kernel: SALSA Physical Layer (Linux Device Mapper device): /dev/mapper/array0; Garbage Collection I/O
- Kernel: Linux Block Layer; SSD Device Driver
- Hardware
Experiments – Block Storage
§ Using SALSA in a commodity Linux server to create an array out of 5 SSDs
-  With RAID5-equivalent parity protection
§ Comparing against RAID0, RAID5 on the same SSDs
[Figure: Random (100/0 R/W) - Reads. Read Latency (msec) vs. Read Throughput (kIOPS) for RAID0, RAID5 and SALSA.]
[Figure: Random (80/20 R/W) – Total IOPS. Latency (msec) vs. Throughput (kIOPS) for RAID0, RAID5 and SALSA.]
Annotations on the charts: 41x, 13x.
SALSA dramatically improves performance in the presence of writes
CEPH on SALSA
§  3-node x86 cluster
§  10 Gbit Ethernet network
§  2 x 1TB TLC SSDs per node
§  Replication factor of 3
§  Mixed read/write random I/O
[Diagram: Node 1, Node 2 and Node 3 under CEPH; each node runs XFS on SALSA over its two SSDs.]
[Figure: Throughput (KB/s) vs. Time (seconds) for Baseline (CEPH on raw SSDs) and CEPH on SALSA. Annotation: 39x at steady state.]
SALSA can enable CEPH on Flash with high performance at a low cost!
Performance – Virtualized TPC-E
SALSA vs. RAID5
using 5 x 1TB TLC SSDs
TPC-E
§  OLTP benchmark that simulates the workload of a brokerage firm
§  Running against DB2 in KVM guest
§  90% Reads / 10% Writes
[Diagram: Linux Guest running DB2 on a Host with SALSA over five SSDs.]
[Figure: TPC-E Transactional Throughput. Transactions Per Second (tps) vs. Time (sec) for RAID5 and SALSA. Annotation: 3.8x higher transactional throughput.]
[Figure: TPC-E Transactional Latency. Transaction Latency (msec) vs. Time (sec) for RAID5 and SALSA. Annotation: 6.4x lower transactional latency.]
Endurance
§  Test using an off-the-shelf low-cost SSD (0.4 $/GB).
§  We measured the wear of the device, as reported by vendor-specific S.M.A.R.T. attributes.
§  Comparing the wear incurred by SALSA to the wear incurred using the raw device
[Figure: Device Wear vs. Full Device Writes for Raw and SALSA. Annotation: 4.6x.]
SALSA prolongs the SSD lifetime by 4.6 times!
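A comparison of this kind reduces to wear accumulated per full device write. The sketch below shows the arithmetic only; the function names are hypothetical and the readings are placeholder numbers, not the measurements behind the chart.

```python
def wear_per_full_write(wear_units: float, full_device_writes: float) -> float:
    """Average wear (in whatever unit the S.M.A.R.T. attribute reports) per full device write."""
    if full_device_writes <= 0:
        raise ValueError("full_device_writes must be positive")
    return wear_units / full_device_writes


def lifetime_gain(raw_rate: float, managed_rate: float) -> float:
    """How many times longer the device lasts if wear accrues at managed_rate instead of raw_rate."""
    if managed_rate <= 0:
        raise ValueError("managed_rate must be positive")
    return raw_rate / managed_rate


# Placeholder readings after 8 full device writes (hypothetical values).
raw = wear_per_full_write(wear_units=60.0, full_device_writes=8.0)
managed = wear_per_full_write(wear_units=20.0, full_device_writes=8.0)
print(lifetime_gain(raw, managed))  # 3.0
```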
Conclusion
§  Low-cost Flash is in high demand
§  Many workloads could benefit tremendously from capacity-optimized Flash
§  SALSA is a Flash-optimized storage virtualization stack for Linux
-  Shifts the complexity of the FTL to software
-  Transforms user access patterns to be as Flash-friendly as possible
-  Elevates the performance and endurance of low-cost SSDs to enterprise standards
§  File systems & applications do not need to be modified
[Diagram labels: Write Amplification; Performance; Capacity; Device Lifetime]
Questions?
www.research.ibm.com/labs/zurich/cci/