Server-Class Energy and Performance Evaluations
Erez Zadok
[email protected]
File systems and Storage Lab
Stony Brook University
http://green.filesystems.org/
Invited talk, ACM SYSTOR 2010 (05/26/2010)
Motivation
• For every $1 spent on hardware, $0.50 is spent on power and cooling [IDC 2007]
• Energy use in U.S. data centers = 1–2% of total U.S. energy use [EPA 2007]
   Growth rate of 2x every 5 years
• Even more energy is consumed outside the data center [Forrester 2008]

Goals: build performance- and energy-efficient systems, and evaluate the efficacy of file systems in achieving this goal.
Overview
• Motivation
• Related Work
• Experimental Methodology
• Evaluation Results
 Machine 1 (M1) Results
 Machine 2 (M2) Results
• Conclusion and Future Work
Techniques
• Reduce P_idle: Right Sizing (hardware-based)
   CPU DVFS
   Machine ACPI states (standby, hibernate, off, etc.)
   Opportunistic disk spin-down (sketched below)
   DRPM
   Virtualization/VMs
• Reduce P_dynamic: Work Reduction (software-based)
   Aggregation, Localization
   Compression, Deduplication
   Reconfiguration: applications/services, file systems, RAID levels, etc.
• The two approaches are complementary
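To make "opportunistic spin-down" concrete, here is a minimal sketch (not from the talk): it watches a disk's I/O counters and puts the drive into standby after a fixed idle period. The device name, polling interval, and idle threshold are assumptions for illustration; it relies on the Linux /proc/diskstats layout and the stock hdparm utility.

```python
#!/usr/bin/env python3
"""Opportunistic disk spin-down sketch (illustrative only).

Assumes Linux /proc/diskstats and hdparm; device name and
idle threshold below are hypothetical.
"""
import subprocess
import time

DEVICE = "sdb"          # hypothetical data disk
IDLE_THRESHOLD = 120    # seconds of no I/O before spinning down
POLL_INTERVAL = 5       # seconds between checks

def io_counters(dev: str) -> tuple:
    """Return (reads completed, writes completed) for dev from /proc/diskstats."""
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] == dev:
                return int(fields[3]), int(fields[7])
    raise ValueError(f"device {dev} not found")

def main() -> None:
    last = io_counters(DEVICE)
    idle_since = time.monotonic()
    spun_down = False
    while True:
        time.sleep(POLL_INTERVAL)
        cur = io_counters(DEVICE)
        if cur != last:                       # any I/O resets the idle clock
            last, idle_since, spun_down = cur, time.monotonic(), False
        elif not spun_down and time.monotonic() - idle_since > IDLE_THRESHOLD:
            # hdparm -y puts the drive into low-power standby immediately.
            subprocess.run(["hdparm", "-y", f"/dev/{DEVICE}"], check=False)
            spun_down = True

if __name__ == "__main__":
    main()
```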
Right Sizing Techniques
• Techniques to increase disk sleep time
   Massive Array of Idle Disks (MAID) [Colarelli 2002]
   Popular Data Concentration (PDC) [Pinheiro 2004]
   Write off-loading [Narayanan 2008]
   GreenFS [Joukov 2008]
   Scale-down of Hadoop clusters [Leverich 2009]
Work Reduction Techniques
• Grouping/replication and prediction
   FS2 [Huang 2005]
   EEFS [Li 2006]
   Predictive Data Grouping [Essary 2008]
• Energy-aware prefetching [Manzanares 2006]
• Hybrid: low-power hardware with intelligent data structures
   FAWN [Andersen 2009]
Benchmarking Studies
• Benchmarks
   SPECpower: metric = operations/second/watt
   JouleSort: metric = sorted records/joule (the two metrics are related algebraically; see below)
• Benchmark studies
   RAID evaluation [Gurumurthi 2003]
   Compression evaluation [Kothiyal 2009]
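Both metrics normalize useful work by energy. Note that SPECpower's operations/second/watt reduces to operations per joule, the same form as JouleSort's records per joule:

```latex
\frac{\text{ops}/\text{s}}{\text{W}}
  = \frac{\text{ops}/\text{s}}{\text{J}/\text{s}}
  = \frac{\text{ops}}{\text{J}}
\qquad\qquad
\text{JouleSort: } \frac{\text{sorted records}}{\text{J}}
```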
Overview
• Motivation
• Related Work
• Experimental Methodology
• Evaluation Results
• Conclusion and Future Work
Experimental Methodology
• Workloads (4)
   FileBench-emulated workloads: Web server, Database server, File server, Mail server
• File Systems (4)
   Types: Ext2, Ext3, ReiserFS, XFS
   Mount options: noatime, notail, journaling modes
   Format options: inode size, block size, allocation/block-group count
• Hardware (2 machines)
We ran a total of 248 benchmarks, totaling 414 clock hours! (The configuration matrix is sketched below.)
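The sweep is essentially a cross-product of these dimensions. A minimal sketch of enumerating it follows; the option lists come from the slides, but the talk's 248 runs were a hand-picked subset, so the count produced here is only illustrative.

```python
"""Enumerate the benchmark configuration matrix (illustrative sketch).

Option lists are taken from the slides; the talk ran a curated
subset of this cross-product, so the count here differs from 248.
"""
import itertools

WORKLOADS = ["webserver", "dbserver", "fileserver", "mailserver"]
FILESYSTEMS = {
    "ext2":     ["default", "noatime", "blk-2k"],
    "ext3":     ["default", "noatime", "blk-2k",
                 "data=ordered", "data=writeback", "data=journal"],
    "reiserfs": ["default", "noatime", "notail"],
    "xfs":      ["default", "noatime", "inode-size-1k", "blk-2k"],
}
MACHINES = ["M1", "M2"]

configs = [
    (machine, workload, fs, opt)
    for machine, workload in itertools.product(MACHINES, WORKLOADS)
    for fs, opts in FILESYSTEMS.items()
    for opt in opts
]
print(f"{len(configs)} candidate configurations")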
FileBench
• Developed at Sun Microsystems, 2005
   Used for performance analysis of the Solaris OS
• Rich language to emulate complex workloads (sketched below)
• Ships with several emulated workloads
   Based on application traces
   Recommended parameters for server workloads
• Superior to many simpler benchmarks
   E.g., Bonnie, Postmark, the Andrew Benchmark
• We now maintain FileBench and release new versions
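To give a flavor of the workload language, the sketch below writes a tiny mail-server-style personality and runs it. The WML follows the FileBench 1.4.x varmail workload, but treat the exact attribute names and the filebench invocation as assumptions to verify against your FileBench release.

```python
"""Run a minimal FileBench workload (sketch; WML syntax modeled on
the 1.4.x varmail personality -- verify against your release)."""
import subprocess
import tempfile

WML = """
set $dir=/tmp/fbtest
set $nfiles=1000
set $filesize=16k
set $nthreads=16

define fileset name=mailset,path=$dir,size=$filesize,entries=$nfiles,dirwidth=1000000,prealloc=80

define process name=mailproc,instances=1
{
  thread name=mailthread,memsize=10m,instances=$nthreads
  {
    flowop createfile name=create1,filesetname=mailset,fd=1
    flowop appendfilerand name=append1,iosize=16k,fd=1
    flowop fsync name=fsync1,fd=1
    flowop closefile name=close1,fd=1
    flowop openfile name=open1,filesetname=mailset,fd=1
    flowop readwholefile name=read1,fd=1,iosize=1m
    flowop closefile name=close2,fd=1
  }
}

run 60
"""

with tempfile.NamedTemporaryFile("w", suffix=".f", delete=False) as f:
    f.write(WML)
    path = f.name

# FileBench prints a per-flowop breakdown and an "IO Summary" line
# (ops/s, throughput) at the end of the run.
subprocess.run(["filebench", "-f", path], check=True)
```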
FileBench Workloads

Server     Avg. file  Avg. dir.  No. of   I/O size   No. of    R/W
workload   size       depth      files    (R/W)      threads   ratio
Mail       16KB       FLAT       50,000   1MB/16KB   100       1:1
Database   0.5GB      FLAT       10       2KB/2KB    200+10    20:1
Web        32KB       3.3        20,000   1MB/16KB   100       10:1
File       256KB      3.6        50,000   1MB/16KB   100       1:2
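A quick back-of-the-envelope on total dataset size (file count times average file size, against the 2GB of RAM used on both machines) helps explain which workloads later behave as memory-intensive and which as disk-intensive:

```latex
\text{Mail: } 50{,}000 \times 16\,\text{KB} \approx 0.8\,\text{GB}, \quad
\text{Web: } 20{,}000 \times 32\,\text{KB} \approx 0.64\,\text{GB}
  \quad (\text{fit in RAM}) \\
\text{Database: } 10 \times 0.5\,\text{GB} = 5\,\text{GB}, \quad
\text{File: } 50{,}000 \times 256\,\text{KB} \approx 12.8\,\text{GB}
  \quad (\text{exceed RAM})
```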
File System Properties

Feature             Ext2           Ext3           ReiserFS       XFS
Disk layout         Linear         Linear         B+ tree        B+ tree
Allocation unit /   Fixed-size     Fixed-size     Fixed-size     Variable-size extents
strategy            blocks         blocks         blocks         (delayed allocation)
No. of files        Fixed          Fixed          Variable       Variable
Journaling modes    None           Ordered,       Ordered,       Writeback
                                   writeback,     writeback,
                                   data           data, none
Special feature     Block groups   Block groups   Tail packing   Allocation groups

We used CentOS 5.3 with Linux 2.6.18-128.1.16.el5.centos.plus. (Example format and mount commands are sketched below.)
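The format and mount options above map onto standard tool flags. A hedged sketch of preparing one configuration from the matrix follows; the device path and mount point are placeholders, and while these are the stock mke2fs/mkfs.xfs/mount options, verify them against your distribution.

```python
"""Prepare one file-system configuration from the test matrix
(sketch; /dev/sdb1 and /mnt/test are placeholders)."""
import subprocess

DEV, MNT = "/dev/sdb1", "/mnt/test"

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Format examples for the options studied in the talk:
run(["mke2fs", "-j", "-b", "2048", DEV])            # Ext3 with 2KB blocks ("BLK-2K")
# run(["mkfs.xfs", "-f", "-i", "size=1024", DEV])   # XFS with 1KB inodes ("inode-size-1K")
# run(["mkfs.reiserfs", "-q", DEV])                 # ReiserFS defaults

# Mount examples:
run(["mount", "-o", "data=writeback", DEV, MNT])    # Ext3 writeback journaling
# run(["mount", "-o", "notail", DEV, MNT])          # ReiserFS without tail packing
# run(["mount", "-o", "noatime", DEV, MNT])         # suppress atime updates
```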
Hardware Setup

[Diagram: the Linux server draws A/C power through a WattsUP Pro ES meter; the meter reports server power readings to the test harness over USB. A meter-logging sketch follows.]
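Reading the meter is a simple serial loop. The sketch below assumes pyserial and the meter's USB-serial interface; the device node, baud rate, and record layout are assumptions, so consult the WattsUP protocol documentation before parsing watts out of each line.

```python
"""Log raw records from the power meter's USB-serial port (sketch).

Assumes pyserial; the device node and baud rate are placeholders,
and the record format is meter-specific -- see the WattsUP docs.
"""
import time
import serial  # pip install pyserial

PORT = "/dev/ttyUSB0"  # hypothetical device node for the meter

with serial.Serial(PORT, baudrate=115200, timeout=2) as meter, \
        open("power.log", "w") as log:
    while True:
        raw = meter.readline().decode("ascii", errors="replace").strip()
        if raw:  # timestamp each record; parse watts per the meter's docs
            log.write(f"{time.time():.1f} {raw}\n")
            log.flush()
```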
Machine Configurations

Feature            M1                M2
Machine age        3+ years (2007)   < 1 year (2009)
CPU model          Intel Xeon        Intel Nehalem (E5530)
CPU speed          2.8GHz            2.4GHz
No. of CPUs        2 dual-core       1 quad-core
DVFS               No                Yes
L1 cache size      16KB              128KB
L2 cache size      2MB               1MB
L3 cache size      None              8MB
FSB speed          800MHz            1066MHz
RAM size           2048MB            24GB (used 2GB)
RAM type           DIMM              DIMM
Disk RPM           15K               7.2K
Disk type          SCSI              SATA
Avg. seek time     3.2/3.6ms         10.5/12.5ms
Disk cache         8MB               16MB
Overview
• Motivation
• Related Work
• Experimental Methodology
• Evaluation Results
 Machine 1 (M1) Results
 Machine 2 (M2) Results
• Conclusion and Future Work
Mail Server (M1)
[Bar charts: Performance (ops/sec) and Energy Efficiency (ops/kjoule) for each file-system configuration; higher is better.]
Mail Server (M1)
[Annotated Performance (ops/sec) and Energy Efficiency (ops/kjoule) charts; callout deltas of 7%, 29%, 42%, 17–50%, and 3.5x appear between configurations.]
Key observations:
• ReiserFS-notail is the best configuration for this workload
• Tail packing, on by default in ReiserFS, hurts small-file reads
• XFS bottleneck: lookup
• Ext2 bottleneck: fsync
• Linearity between performance and energy efficiency (see the derivation below)
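The linearity follows directly from the definitions: with average power nearly constant across file-system configurations, energy efficiency is proportional to throughput.

```latex
\frac{\text{ops}}{\text{kJ}}
  = \frac{(\text{ops/s}) \cdot t}{\bar{P} \cdot t / 1000}
  = \frac{1000 \cdot \text{ops/s}}{\bar{P}\ [\text{W}]}
  \;\propto\; \text{ops/s}
  \quad \text{when } \bar{P} \approx \text{constant}
```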
Database Server (M1)
[Bar chart: Performance (ops/sec) across file-system configurations; ~30% and ~2x deltas annotated.]
Key observations:
• Except for Ext2, the default file systems perform similarly
• A 2KB block size, matching the database's 2KB I/O size, boosts efficiency by ~2x
• Journaling helps random writes
Web Server (M1)
[Bar chart: Performance (1000 ops/sec) across file-system configurations; callout deltas of 8%, 22%, 2x, 2.25x, 2.5x, and 9x.]
Key observations:
• Ext2: lack of a journal, plus common inode (atime) updates
• ReiserFS: atime updates take the expensive BKL to search the 'stat' item
• With tail packing on, small files become fragmented
File Server (M1)
[Bar chart: Performance (ops/sec) across file-system configurations; callout deltas of 4%, 22–28%, 37%, 43%, and 91%.]
Workload traits: deep directories, a metadata/data mix, and a large average file size.
File System Selection Matrix (M1)

• Newer hardware can yield different results
• The optimal file system often varies with changes in software, workload, and hardware

Workload          Best File System       Improvement Range (vs. all default FS)
                  (Combination)          Ops/sec        Ops/joule
Web Server        XFS (inode-size-1K)    8% – 9.4x      6% – 7.5x
File Server       ReiserFS (default)     0% – 1.9x      0% – 2.0x
Mail Server       ReiserFS (notail)      29% – 5.8x     28% – 5.7x
Database Server   XFS/Ext3 (BLK-2K)      2.0x – 2.4x    2.0x – 2.4x

This recommendation matters, but …
Overview
• Motivation
• Related Work
• Experimental Methodology
• Evaluation Results
 Machine 1 (M1) Results
 Machine 2 (M2) Results
• Conclusion and Future Work
Mail Server (M1 vs. M2)

[Paired bar charts: Performance (ops/sec) on M1 (top) and M2 (bottom).]
Key observations:
• Memory-intensive workload: M2 improves on M1 by 35% – 3x for all default configurations
• Ext2 on M2: the 2x larger disk cache overcomes the fsync bottleneck
• Largely the same trend on M2 as on M1; best configs are ReiserFS-notail (M1) and Ext3-default (M2)
• Difference from M1: increasing the allocation-group count decreases performance (~5–10%)
Database Server (M1 vs. M2)

[Paired bar charts: Performance (ops/sec) on M1 (top) and M2 (bottom).]
Key observations:
• Disk-intensive workload: M2 shows a 35% – 86% degradation vs. M1, consistent with its slower 7.2K RPM SATA disk
• The performance trend remains the same across M1 and M2
• Best configs on both machines: Ext3 and XFS with 2KB blocks (BLK-2K)
• On M2, the 2KB block size increases performance by ~1.5x
Overview
• Motivation
• Related Work
• Experimental Methodology
• Evaluation Results
• Conclusion and Future Work
Ongoing Work
• We are evaluating the end-to-end impact of workloads on NFSv4 servers
• Several workloads
• Mixed clients and servers
   Same hardware
   OSes: Linux (Ubuntu, CentOS), FreeBSD, OpenSolaris
Results: Web Server, Server-wise
Peak throughput (ops/sec), client OS (rows) vs. server OS (columns):

Client        CentOS-server  Ubuntu-server  FreeBSD-server  OpenSolaris-server
CentOS            744            1973            475              180
Ubuntu            794            2467            900              178
FreeBSD           730            1576            959              186
OpenSolaris       621            1048            653              176
LocalFS           397             450            887              201
Results: Mail Server, Server-wise
Peak throughput (ops/sec), client OS (rows) vs. server OS (columns):

Client        CentOS-server  Ubuntu-server  FreeBSD-server  OpenSolaris-server
CentOS           2560            2270           1262              329
Ubuntu           2668            2356           1052              394
FreeBSD          2527            2347           1356              316
OpenSolaris      2447            2197           1692              273
LocalFS           457             636            471              254
Scaling Web Server Performance
[Line chart: operations per second (0–20,000) vs. number of files (10,000–160,000) for Ext2, Ext3, ReiserFS, and XFS.]
Conclusions
• The Bad
   Software has gotten too complex
   Workloads drive performance and energy use
   Results also depend on hardware, software, and configurations
• The Good
   Significant savings are possible
   Small savings accumulate over the long run
   Commercial and research opportunities exist
• The Ugly
   We need workload-specific software
Ongoing/Future Work
• Study multiple dimensions
   New file systems, disk schedulers, RAID, LVM, etc.
   Client/server systems
   Disk types: SAS, SSD, etc.
   Cluster storage, SANs, OSes
• Develop auto-configuration tools (see the sketch below)
• Develop workload-specific storage stacks
   I/O schedulers, file systems, caching
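One possible starting point for an auto-configuration tool is a simple lookup over the M1 selection matrix from earlier in the talk. The table data is from this talk; the interface around it is hypothetical.

```python
"""Toy auto-configuration lookup over the M1 selection matrix
(data from the talk; the API shape is hypothetical)."""

# workload -> (file system, format/mount options)
SELECTION_MATRIX = {
    "webserver":  ("xfs",      {"mkfs": ["-i", "size=1024"]}),  # inode-size-1K
    "fileserver": ("reiserfs", {}),                             # defaults
    "mailserver": ("reiserfs", {"mount": ["-o", "notail"]}),    # no tail packing
    "dbserver":   ("ext3",     {"mkfs": ["-b", "2048"]}),       # BLK-2K (XFS also viable)
}

def recommend(workload: str):
    """Return the recommended (fs, options) pair for a known workload."""
    try:
        return SELECTION_MATRIX[workload]
    except KeyError:
        raise ValueError(f"no recommendation for workload {workload!r}") from None

if __name__ == "__main__":
    fs, opts = recommend("mailserver")
    print(fs, opts)  # reiserfs {'mount': ['-o', 'notail']}
```

A real tool would also have to detect the workload and the hardware generation, since the M2 results show the best configuration shifting with newer hardware.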
Server-Class Energy and Performance Evaluations
Q&A
Erez Zadok
[email protected]
File systems and Storage Lab
Stony Brook University
http://green.filesystems.org/