Presentation

Optimization of LFS with
Slack Space Recycling and
Lazy Indirect Block Update
Yongseok Oh <[email protected]>
The 3rd Annual Haifa Experimental Systems Conference
May 24, 2010
Haifa, Israel
Yongseok Oh and Donghee Lee (University of Seoul)
Jongmoo Choi (Dankook University)
Eunsam Kim and Sam H. Noh (Hongik University)
Outline
 Introduction
 Slack Space Recycling and Lazy Indirect Block Update
 Implementation of SSR-LFS
 Performance Evaluation
 Conclusions
Log-structured File System
[Figure: small writes A, B, C, D are collected in the segment buffer and then written to storage as one segment]
 LFS collects small write requests and writes them sequentially to the storage device (sketched below) [Rosenblum et al., ACM TOCS ’91]
 Advantages
■ Superior write performance for random workloads
■ Fast recovery
 Drawbacks
■ On-demand cleaning
■ Cascading meta-data update
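As an illustration only (a minimal sketch, not the SSR-LFS code; SEG_BLKS, seg_append, and the 2MB segment size are assumptions), the segment-buffer idea in C: small writes are appended in memory and flushed to the device as one large sequential write.

    #include <stdint.h>
    #include <string.h>
    #include <unistd.h>

    #define BLK_SIZE  4096
    #define SEG_BLKS  512                    /* 2MB segment, an assumed size   */

    struct segbuf {
        uint8_t data[SEG_BLKS * BLK_SIZE];   /* in-memory segment buffer       */
        int     used;                        /* blocks filled so far           */
        off_t   next_seg_off;                /* where the next segment lands   */
    };

    /* Append one logical block; when the buffer fills, write the whole
     * segment to the device with a single sequential write. */
    static void seg_append(struct segbuf *sb, int dev_fd, const void *blk)
    {
        memcpy(sb->data + (size_t)sb->used * BLK_SIZE, blk, BLK_SIZE);
        if (++sb->used == SEG_BLKS) {
            (void)pwrite(dev_fd, sb->data, sizeof sb->data, sb->next_seg_off);
            sb->next_seg_off += (off_t)sizeof sb->data;  /* log grows sequentially */
            sb->used = 0;
        }
    }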
Storage Devices
 Hard Disk Drives
■ Mechanical components – disk head, spindle motor, and platter
■ Poor random read/write performance
 Solid State Drives
■ Consist of NAND flash memory, no mechanical parts
■ High performance, low-power consumption, and shock resistance
■ Sequential writes are faster than random writes
Outline
 Introduction
 Slack Space Recycling and Lazy Indirect Block Update
 Implementation of SSR-LFS
 Performance Evaluation
 Conclusions
LFS cleaning
[Figure: the cleaner copies the valid blocks A, B of Segment 1 and C, D of Segment 2 into free Segment 3]
 To make free segments
■ The LFS cleaner copies valid blocks to another free segment
 On-demand cleaning
■ Decreases overall performance
 Background cleaning
■ Does not affect performance because it runs while the system is idle
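A minimal copy-loop sketch of the cleaning step above, over a toy in-memory segment (the per-block validity flag stands in for the segment summary; these structures are assumptions, and a real cleaner must also update the i-node and indirect pointers to the blocks' new locations):

    #include <string.h>

    #define BLK_SIZE  4096
    #define SEG_BLKS  512

    /* Toy in-memory segment: block payloads plus a per-block validity flag. */
    struct segment {
        unsigned char blk[SEG_BLKS][BLK_SIZE];
        int           valid[SEG_BLKS];
    };

    /* Copy the live blocks of a victim segment into a free segment, then
     * invalidate the victim so it can be reused as a free segment. */
    static int clean_segment(struct segment *victim, struct segment *free_seg)
    {
        int out = 0;
        for (int i = 0; i < SEG_BLKS; i++) {
            if (!victim->valid[i])
                continue;                              /* skip invalid blocks */
            memcpy(free_seg->blk[out], victim->blk[i], BLK_SIZE);
            free_seg->valid[out++] = 1;
            victim->valid[i] = 0;
        }
        return out;   /* number of blocks copied, i.e. the cleaning cost */
    }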
Hole-plugging
[Figure: the valid blocks A and B of Segment 1 are copied into the holes of Segment 2, which then holds C, A, B, D]
 Matthews et al. employed Hole-plugging in LFS [Matthews et al., ACM OSR ’97]
 The cleaner copies valid blocks into the holes of other segments
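For contrast, a sketch of hole-plugging under the same toy struct segment used in the cleaning sketch above (an assumption, not the paper's code): live blocks are copied into the invalid slots of another in-use segment rather than appended to the log head.

    /* Plug the live blocks of a victim segment into the holes (invalid slots)
     * of another in-use segment; the block map must be updated accordingly. */
    static int plug_holes(struct segment *victim, struct segment *target)
    {
        int moved = 0, t = 0;
        for (int i = 0; i < SEG_BLKS; i++) {
            if (!victim->valid[i])
                continue;                          /* only live blocks move    */
            while (t < SEG_BLKS && target->valid[t])
                t++;                               /* find the next hole       */
            if (t == SEG_BLKS)
                break;                             /* target has no holes left */
            memcpy(target->blk[t], victim->blk[i], BLK_SIZE);
            target->valid[t] = 1;
            victim->valid[i] = 0;
            moved++;
        }
        return moved;
    }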
Slack Space Recycling (SSR) Scheme
[Figure: new blocks E, F, G, H in the segment buffer are written via SSR directly into the slack space of used Segments 1 and 2]
 We propose the SSR scheme, which directly recycles slack space to avoid on-demand cleaning
 Slack space is the invalid area in a used segment
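A sketch of the recycling step itself, again over the toy struct segment from the cleaning sketch (assumed structures; pointer-map updates are omitted): new blocks are written straight into invalid slots, so no old data has to be copied out first.

    /* Slack Space Recycling: write new blocks directly into the slack space
     * (invalid slots) of a used segment instead of cleaning it first. */
    static int ssr_write(struct segment *seg,
                         const unsigned char (*newblk)[BLK_SIZE], int nblk)
    {
        int written = 0;
        for (int i = 0; i < SEG_BLKS && written < nblk; i++) {
            if (seg->valid[i])
                continue;                        /* slot still holds live data */
            memcpy(seg->blk[i], newblk[written], BLK_SIZE);
            seg->valid[i] = 1;                   /* slack slot recycled        */
            written++;
        }
        return written;   /* caller falls back to cleaning if slack runs out */
    }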
Cascading meta-data update
[Figure: updating data block A triggers updates of indirect block B, double indirect block C, and D (the i-node); the new versions A', B', C', D' are all written to Segment 2]
 Updating a data block incurs a cascading meta-data update
 Modifying one data block consumes 4 blocks (data, indirect, double indirect, and i-node)
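To make the four-block cost concrete, a toy illustration (assumed structures, not the on-disk layout): in a no-overwrite log an updated block always lands at a new address, so every pointer on the path from the i-node down to the data block must be rewritten as well.

    #include <stdio.h>

    typedef unsigned long blkaddr_t;

    static blkaddr_t log_head = 1000;                 /* toy log addresses */
    static blkaddr_t append_to_log(void) { return log_head++; }

    int main(void)
    {
        blkaddr_t data, indirect, dindirect, inode;

        data      = append_to_log();  /* the updated data block moves ...        */
        indirect  = append_to_log();  /* ... so its indirect block is rewritten  */
        dindirect = append_to_log();  /* ... so the double indirect block is too */
        inode     = append_to_log();  /* ... and finally the i-node              */

        printf("1 data-block update -> 4 block writes (%lu %lu %lu %lu)\n",
               data, indirect, dindirect, inode);
        return 0;
    }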
Lazy Indirect Block Update (LIBU) scheme
[Figure: with LIBU, updating data block A writes only A' and D' to Segment 2; the affected indirect blocks B', C', E' are inserted into the IBC (Indirect Block Cache) and tracked through the indirect map]
 We propose the LIBU scheme to reduce cascading meta-data updates
■ LIBU uses an IBC (Indirect Block Cache) to absorb frequent updates of indirect blocks
■ An indirect map is needed to terminate the cascading meta-data update
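A rough sketch of the IBC idea under stated assumptions (a fixed-size table keyed by an indirect-map index; not the actual SSR-LFS structures): the pointer change is absorbed in memory, so only the data block has to be logged immediately, and dirty indirect blocks are flushed later, e.g. at checkpoint time.

    #include <string.h>

    #define NPTRS      1024           /* pointers per indirect block (assumed) */
    #define IBC_SLOTS  256            /* cache capacity (assumed)              */

    typedef unsigned long blkaddr_t;

    struct indirect_blk { blkaddr_t ptr[NPTRS]; };

    /* One IBC entry: a cached indirect block keyed by its indirect-map index,
     * plus a dirty flag so it can be written out lazily. */
    struct ibc_entry {
        int                 imap_idx;
        int                 in_use;
        int                 dirty;
        struct indirect_blk blk;
    };

    static struct ibc_entry ibc[IBC_SLOTS];

    /* Record that slot `slot` of indirect block `imap_idx` now points at
     * `new_data_addr`. Only memory is touched here: no indirect, double
     * indirect, or i-node block is written to the log. Returns -1 when the
     * cache is full (a real cache would flush or evict an entry instead). */
    static int ibc_update(int imap_idx, int slot, blkaddr_t new_data_addr)
    {
        struct ibc_entry *free_e = 0;

        for (int i = 0; i < IBC_SLOTS; i++) {
            if (ibc[i].in_use && ibc[i].imap_idx == imap_idx) {
                ibc[i].blk.ptr[slot] = new_data_addr;   /* absorb the update */
                ibc[i].dirty = 1;
                return 0;
            }
            if (!ibc[i].in_use && !free_e)
                free_e = &ibc[i];
        }
        if (!free_e)
            return -1;
        memset(free_e, 0, sizeof *free_e);   /* a real cache would first read   */
        free_e->in_use   = 1;                /* the on-disk indirect block here */
        free_e->imap_idx = imap_idx;
        free_e->dirty    = 1;
        free_e->blk.ptr[slot] = new_data_addr;
        return 0;
    }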
Crash Recovery
[Figure: checkpointing flushes the i-node map, segment usage table, and indirect map from RAM to disk; after a power failure, recovery searches for the last checkpoint, scans the segments written since then, and rebuilds these structures and the indirect blocks in the IBC]
 For crash recovery, LFS periodically stores checkpoint information
 If a power failure occurs,
■ search for the last checkpoint
■ scan all segments written after the last checkpoint
■ rebuild the i-node map, segment usage table, indirect map, and the indirect blocks in the IBC
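The same recovery steps as a roll-forward sketch; find_last_checkpoint, next_segment_after, and replay_segment are hypothetical placeholders standing in for the real checkpoint search and segment-summary scan.

    /* Roll-forward recovery sketch; the extern helpers are assumed, not real APIs. */
    struct checkpoint { long seq; long last_segment; };

    extern struct checkpoint find_last_checkpoint(void);  /* consistent state on disk */
    extern long next_segment_after(long seg);             /* -1 when no more segments */
    extern void replay_segment(long seg);                 /* rebuild i-node map, segment
                                                              usage table, indirect map,
                                                              and indirect blocks in IBC */

    static void recover(void)
    {
        struct checkpoint cp = find_last_checkpoint();

        /* Scan every segment written after the checkpoint and roll the
         * in-memory structures forward to the state at the crash. */
        for (long seg = next_segment_after(cp.last_segment);
             seg >= 0;
             seg = next_segment_after(seg))
            replay_segment(seg);
    }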
Outline
 Introduction
 Slack Space Recycling and Lazy Indirect Block Update
 Implementation of SSR-LFS
 Performance Evaluation
 Conclusions
Implementation of SSR-LFS
 We implemented SSR-LFS (Slack Space Recycling LFS)
■ Using the FUSE (Filesystem in Userspace) framework in Linux
 SSR-LFS selectively chooses either SSR or cleaning (see the sketch below)
■ When the system is idle, it performs background cleaning
■ When the system is busy, it performs SSR or on-demand cleaning
 If the average slack size is too small, it selects on-demand cleaning
 Otherwise, it selects SSR
[Figure: Architecture of SSR-LFS: a write("/mnt/file") request passes through the kernel VFS and FUSE to the userspace SSR-LFS Core (via libfuse), which contains the IBC, buffer cache, i-node cache, Syncer, Cleaner, and Recycler]
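The selection between SSR and cleaning described above, written out as a sketch (the idle test and the slack threshold are illustrative assumptions, not the values used in SSR-LFS):

    enum action { BG_CLEANING, ON_DEMAND_CLEANING, DO_SSR };

    /* Decide what to do when free segments run low. `avg_slack_blocks` is the
     * average invalid space per candidate segment; the quarter-segment
     * threshold is only an example value. */
    static enum action choose_action(int system_idle, double avg_slack_blocks,
                                     int blocks_per_segment)
    {
        if (system_idle)
            return BG_CLEANING;            /* idle time: clean in the background */
        if (avg_slack_blocks < blocks_per_segment / 4.0)
            return ON_DEMAND_CLEANING;     /* slack too small to be worth SSR    */
        return DO_SSR;                     /* recycle slack space directly       */
    }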
Outline
 Introduction
 Slack Space Recycling and Lazy Indirect Block Update
 Implementation of SSR-LFS
 Performance Evaluation
 Conclusions
Experimental Environment
 For comparison, we used several file systems
■ Ext3-FUSE
■ Org-LFS(cleaning)
■ Org-LFS(plugging)
■ SSR-LFS
 Benchmarks
■ IO TEST that generates random write workload
■ Postmark that simulates the workload of a mail server
 Storage Devices
■ SSD – INTEL SSDSA2SH032G1GN
■ HDD – SEAGATE ST3160815AS
Random Update Performance
[Figure: execution time (sec) vs. utilization (10% to 90%) on HDD and SSD for Ext3-FUSE, Org-LFS(Plugging), Org-LFS(Cleaning), and SSR-LFS]
 SSR-LFS performs better than the others over a wide range of utilization
 On HDD
■ SSR-LFS and Org-LFS outperform Ext3-FUSE below 85% utilization
 On SSD
■ Ext3-FUSE outperforms Org-LFS because the SSD is optimized for random writes
Postmark Benchmark Result (1)
[Figure: Postmark execution time (sec) on HDD and SSD for Org-LFS, SSR-LFS, and Ext3-FUSE]
 Medium file size (16KB ~ 256KB)
■ Subdirectories 1000, # of files 100,000, # of transactions 100,000
 SSR-LFS outperforms the other file systems on both devices
 Org-LFS shows better performance than Ext3-FUSE on HDD
 Ext3-FUSE shows performance comparable to Org-LFS on SSD
Postmark Benchmark Result (2)
[Figure: Postmark execution time (sec) on HDD and SSD for Org-LFS, SSR-LFS, and Ext3-FUSE]
 Small file size (4KB ~ 16KB)
■ Subdirectories 1000, # of files 500,000, # of transactions 200,000
 Ext3-FUSE performs better than the other file systems on SSD
■ due to meta-data optimizations in Ext3 such as hash-based directories
Outline
 Introduction
 Slack Space Recycling and Lazy Indirect Block Update
 Implementation of SSR-LFS
 Performance Evaluation
 Conclusions
Conclusions
 SSR-LFS outperforms the original-style LFS over a wide range of utilization
 Future work
■ Optimization of meta-data structures
■ Cost-based selection between cleaning and SSR
 We plan to release the source code of SSR-LFS this year
■ http://embedded.uos.ac.kr
Q&A
 Thank you
Backup slides
Storage devices
[Figure: sequential (Seq) and random (Rand) throughput (MB/s) vs. request size (4KB to 16MB) for the Intel SSD (write and read) and the Seagate HDD (write and read)]
Measurement of FUSE overhead
[Figure: throughput (MB/s) of kernel Ext3 vs. Ext3-FUSE for sequential and random writes and reads on HDD and SSD]
 To identify the performance penalty of a user-space implementation
 Ext3-FUSE underperforms kernel Ext3 for almost all patterns
■ due to FUSE overhead
Postmark Benchmark Result (1)
[Figure: Postmark execution time (sec) on HDD and SSD for Org-LFS, SSR-LFS, and Ext3-FUSE, annotated with the cleaning/SSR counts: 1302 cleanings and 4142 SSRs]
 Medium file size (16KB ~ 256KB)
■ Subdirectories 1000, # of files 100,000, # of transactions 100,000
 SSR-LFS outperforms the other file systems on both devices
 Org-LFS shows better performance than Ext3-FUSE on HDD
 Ext3-FUSE shows performance comparable to Org-LFS on SSD
Postmark Benchmark Result (2)
[Figure: Postmark execution time (sec) on HDD and SSD for Org-LFS, SSR-LFS, and Ext3-FUSE, annotated with the cleaning/SSR counts: 3018 cleanings, 1692 cleanings, and 1451 SSRs]
 Small file size (4KB ~ 16KB)
■ Subdirectories 1000, # of files 500,000, # of transactions 200,000
 Ext3-FUSE performs better than the other file systems on SSD
■ due to meta-data optimizations in Ext3 such as hash-based directories
Postmark Benchmark Result (3)
[Figure: Postmark execution time (sec) on HDD and SSD for Org-LFS, SSR-LFS, and Ext3-FUSE]
 Large file size (not included in the paper)
■ File size 256KB ~ 1MB
■ Subdirectories 1000
■ # of files 10,000
■ # of transactions 10,000
Statistics of IO-TEST
[Figure: megabytes written and read by IO-TEST at 85% utilization on HDD and SSD]
Random Update Performance
 IO TEST benchmark
■ 1st stage
 Creates 64KB files until 90% utilization
 Randomly deletes files until target utilization
■ 2nd stage
 Randomly updates up to 4GB of file capacity
 Measures the elapsed time in this stage
 LFS has no free segments at this point
 The measured time therefore includes the cleaning or SSR cost
[Figure: IO-TEST creates files until the disk reaches the 90% high threshold, randomly deletes files down to the desired threshold (e.g. 20%), and then randomly updates files]
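A condensed sketch of this two-stage benchmark (the mount point, file naming, and the fs_utilization helper are placeholders, not the actual IO TEST code):

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>

    #define FILE_SZ   (64 * 1024)                  /* 64KB files in the 1st stage */
    #define UPDATE_SZ (4LL * 1024 * 1024 * 1024)   /* update up to 4GB in the 2nd */

    /* Assumed helper: fraction of the file system in use (e.g. via statvfs()). */
    extern double fs_utilization(const char *mnt);

    int main(void)
    {
        const char *mnt = "/mnt/ssrlfs";           /* assumed mount point         */
        static char buf[FILE_SZ];
        char path[256];
        long nfiles = 0;

        memset(buf, 0xab, sizeof buf);

        /* 1st stage: create 64KB files until 90% utilization ...                */
        while (fs_utilization(mnt) < 0.90) {
            snprintf(path, sizeof path, "%s/f%ld", mnt, nfiles++);
            int fd = open(path, O_CREAT | O_WRONLY, 0644);
            if (fd >= 0) {
                (void)write(fd, buf, sizeof buf);
                close(fd);
            }
        }
        /* ... then randomly delete files until the target utilization.          */
        while (fs_utilization(mnt) > 0.20) {       /* e.g. a 20% target           */
            snprintf(path, sizeof path, "%s/f%ld", mnt, rand() % nfiles);
            unlink(path);
        }

        /* 2nd stage: randomly update up to 4GB of file data and time it;        */
        /* with no free segments left, this includes the cleaning or SSR cost.   */
        time_t start = time(NULL);
        for (long long done = 0; done < UPDATE_SZ; done += FILE_SZ) {
            snprintf(path, sizeof path, "%s/f%ld", mnt, rand() % nfiles);
            int fd = open(path, O_WRONLY);
            if (fd < 0)
                continue;                          /* this file was deleted above */
            (void)write(fd, buf, sizeof buf);
            fsync(fd);
            close(fd);
        }
        printf("2nd stage took %ld seconds\n", (long)(time(NULL) - start));
        return 0;
    }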
Experimental Environment
Type        Specifics
OS          Ubuntu 9.10 (linux-2.6.31.4)
Processor   AMD 2.1GHz Opteron 1352
RAM         SAMSUNG 4GB DDR-2 800MHz ECC
SSD         INTEL SSDSA2SH032G1GN
HDD         SEAGATE ST3160815AS
Comparison of cleaning and SSR scheme
[Figure: timelines of foreground segment writes, background cleaning, and the number of free segments over time. In the cleaning case, a write request that finds no free segment is delayed by on-demand cleaning (Ton-demand cleaning) before its segment write (Tseg write), while background cleaning runs during idle time (Tidle, Tback-ground). In the SSR case, the foreground performs an SSR write (Tssr) instead of waiting for on-demand cleaning.]
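One informal way to read the two timelines (a sketch of the intended trade-off only, not the paper's cost model; the symbols follow the figure labels):

\[
T^{\mathrm{fg}}_{\mathrm{cleaning}} \;\approx\; T_{\mathrm{on\text{-}demand\ cleaning}} + T_{\mathrm{seg\ write}},
\qquad
T^{\mathrm{fg}}_{\mathrm{SSR}} \;\approx\; T_{\mathrm{SSR\ write}},
\]

so recycling helps foreground latency whenever \( T_{\mathrm{SSR\ write}} < T_{\mathrm{on\text{-}demand\ cleaning}} + T_{\mathrm{seg\ write}} \); this is also the intuition behind the cost-based selection between cleaning and SSR listed as future work.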
FIO Benchmark Result (1)
[Figure: FIO throughput (MB/s) for sequential and random writes and reads on HDD and SSD, comparing Org-LFS, SSR-LFS, and Ext3-FUSE]
 To measure the performance impact of Lazy Indirect Block Update
 1 worker - File size 1GB, Request size 4KB, Sync I/O mode
 SSR-LFS outperforms Org-LFS and Ext3-FUSE for sequential and
random writes on both devices
FIO Benchmark Result (2)
[Figure: FIO throughput (MB/s) for sequential and random writes and reads on HDD and SSD, comparing Org-LFS, SSR-LFS, and Ext3-FUSE]
 4 workers - File size 1GB, Request size 4KB, Sync mode
 The LIBU scheme has a greater impact on performance when four workers are running