Optimization of LFS with Slack Space Recycling and Lazy Indirect Block Update
Yongseok Oh <[email protected]>
The 3rd Annual Haifa Experimental Systems Conference (SYSTOR 2010), May 24, 2010, Haifa, Israel
Yongseok Oh and Donghee Lee (University of Seoul)
Jongmoo Choi (Dankook University)
Eunsam Kim and Sam H. Noh (Hongik University)

Outline
Introduction
Slack Space Recycling and Lazy Indirect Block Update
Implementation of SSR-LFS
Performance Evaluation
Conclusions

Log-structured File System
(Diagram: blocks A, B, C, D are collected in the segment buffer and written sequentially to storage)
LFS collects small write requests and writes them sequentially to the storage device [Rosenblum et al., ACM TOCS '91]
Advantages
■ Superior write performance for random workloads
■ Fast recovery
Drawbacks
■ On-demand cleaning
■ Cascading meta-data updates

Storage Devices
Hard Disk Drives
■ Mechanical components: disk head, spindle motor, and platters
■ Poor random read/write performance
Solid State Drives
■ Consist of NAND flash memory, with no mechanical parts
■ High performance, low power consumption, and shock resistance
■ Sequential writes are faster than random writes

Outline
Introduction
Slack Space Recycling and Lazy Indirect Block Update
Implementation of SSR-LFS
Performance Evaluation
Conclusions

LFS Cleaning
(Diagram: the valid blocks A, B of Segment 1 and C, D of Segment 2 are copied into free Segment 3)
To make free segments
■ The LFS cleaner copies valid blocks to a free segment
On-demand cleaning
■ Decreases overall performance
Background cleaning
■ Does not affect performance

Hole-plugging
(Diagram: the valid blocks A, B of Segment 1 are copied into the holes of Segment 2)
Matthews et al. employed hole-plugging in LFS [Matthews et al., ACM OSR '97]
The cleaner copies valid blocks into the holes of other segments

Slack Space Recycling (SSR) Scheme
(Diagram: new blocks E, F, G, H in the segment buffer are written directly into the slack space of used Segments 1 and 2)
We propose the SSR scheme, which directly recycles slack space to avoid on-demand cleaning
Slack space is the invalid area in a used segment

Cascading Meta-data Update
(Diagram: updating a data block forces new copies of the data block, its indirect block, its double indirect block, and the i-node to be written to a new segment)
An update of a data block incurs cascading meta-data updates
Modifying one data block consumes four blocks

Lazy Indirect Block Update (LIBU) Scheme
(Diagram: when a data block is updated, the updated indirect blocks are inserted into the IBC rather than rewritten immediately, and the indirect map records them)
We propose the LIBU scheme to reduce cascading meta-data updates
■ LIBU uses an IBC (Indirect Block Cache) to absorb frequent updates of indirect blocks
■ An indirect map is necessary to terminate the cascading meta-data update

Crash Recovery
(Diagram: at checkpoint time the i-node map, segment usage table, indirect map, and indirect blocks in the IBC are flushed from RAM to disk; after a power failure, recovery searches for the last checkpoint and scans the segments written after it)
For crash recovery, LFS periodically stores checkpoint information
If a power failure occurs, recovery
■ searches for the last checkpoint
■ scans all segments written after the last checkpoint
■ rebuilds the i-node map, segment usage table, indirect map, and indirect blocks in the IBC
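As a rough illustration of the roll-forward recovery listed above, the sketch below replays the segments written after the last checkpoint. It is a minimal, self-contained toy in C: the checkpoint and segment-summary structures and the redo step are hypothetical stand-ins, not the actual SSR-LFS on-disk format.

```c
/* Minimal, self-contained sketch of roll-forward recovery after a crash.
   The on-disk structures here (checkpoint, segment summary) are hypothetical
   stand-ins, not the actual SSR-LFS layout. Build: cc -o recover recover.c */
#include <stdio.h>
#include <stdint.h>

#define NSEGS 8

struct summary   { uint64_t seq; int valid; };   /* per-segment summary block   */
struct checkpoint { uint64_t last_seq; };        /* last durable checkpoint     */

static struct summary disk[NSEGS] = {            /* toy log; seq 0 = unwritten  */
    {1,1},{2,1},{3,1},{4,1},{5,1},{6,1},{0,0},{0,0}
};
static struct checkpoint cp = { .last_seq = 4 }; /* checkpoint covers seq <= 4  */

static void redo_segment(int seg)
{
    /* In the real system this re-applies the segment's blocks to the i-node
       map, segment usage table, indirect map, and the in-memory IBC. */
    printf("redo segment %d (seq %llu)\n", seg, (unsigned long long)disk[seg].seq);
}

int main(void)
{
    /* 1. locate the last checkpoint (here: already loaded into cp)          */
    /* 2. scan segments written after it, i.e. with a newer sequence number  */
    for (int seg = 0; seg < NSEGS; seg++)
        if (disk[seg].valid && disk[seg].seq > cp.last_seq)
            redo_segment(seg);                   /* 3. rebuild in-memory state */
    return 0;
}
```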
Outline
Introduction
Slack Space Recycling and Lazy Indirect Block Update
Implementation of SSR-LFS
Performance Evaluation
Conclusions

Implementation of SSR-LFS
We implemented SSR-LFS (Slack Space Recycling LFS)
■ Using the FUSE (Filesystem in Userspace) framework in Linux
SSR-LFS selectively chooses either SSR or cleaning (see the sketch below)
■ When the system is idle, it performs background cleaning
■ When the system is busy, it performs SSR or on-demand cleaning
  If the average slack size is too small, it selects on-demand cleaning
  Otherwise, it selects SSR
(Diagram: architecture of SSR-LFS; a write("/mnt/file") call passes from the kernel VFS through FUSE to libfuse in user space, where the SSR-LFS core manages the IBC, buffer cache, i-node cache, syncer, cleaner, and recycler)
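The selection policy above fits in a few lines of C. This is a minimal sketch under assumptions: the threshold value and the idle-detection and slack-measurement inputs are hypothetical placeholders, not taken from the SSR-LFS source.

```c
/* Sketch of the SSR-vs-cleaning decision described on the slide above.
   The threshold and the idle/slack inputs are hypothetical placeholders. */
#include <stdio.h>
#include <stdbool.h>

#define MIN_AVG_SLACK_BLOCKS 32.0           /* hypothetical threshold               */

enum action { BACKGROUND_CLEANING, ON_DEMAND_CLEANING, SLACK_SPACE_RECYCLING };

static bool   idle      = false;            /* stand-ins for real measurements      */
static double avg_slack = 97.5;             /* mean invalid blocks per used segment */

static enum action choose_policy(void)
{
    if (idle)
        return BACKGROUND_CLEANING;         /* idle: clean in the background        */
    if (avg_slack < MIN_AVG_SLACK_BLOCKS)
        return ON_DEMAND_CLEANING;          /* busy, slack too small: clean now     */
    return SLACK_SPACE_RECYCLING;           /* busy, enough slack: recycle it       */
}

int main(void)
{
    static const char *name[] = { "background cleaning", "on-demand cleaning", "SSR" };
    printf("chosen policy: %s\n", name[choose_policy()]);
    return 0;
}
```

In the full system this decision could also use the cost-based selection between cleaning and SSR mentioned under future work.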
Outline
Introduction
Slack Space Recycling and Lazy Indirect Block Update
Implementation of SSR-LFS
Performance Evaluation
Conclusions

Experimental Environment
For comparison, we used several file systems
■ Ext3-FUSE
■ Org-LFS (cleaning)
■ Org-LFS (plugging)
■ SSR-LFS
Benchmarks
■ IO TEST, which generates a random write workload
■ Postmark, which simulates the workload of a mail server
Storage devices
■ SSD: INTEL SSDSA2SH032G1GN
■ HDD: SEAGATE ST3160815AS

Random Update Performance
(Charts: execution time (sec) versus utilization (10-90%) on HDD and SSD for Ext3-FUSE, Org-LFS(Plugging), Org-LFS(Cleaning), and SSR-LFS)
SSR-LFS performs better than the others over a wide range of utilizations
On HDD
■ SSR-LFS and Org-LFS outperform Ext3-FUSE at utilizations below 85%
On SSD
■ Ext3-FUSE outperforms Org-LFS because the SSD is optimized for random writes

Postmark Benchmark Result (1)
(Charts: execution time on HDD and SSD for Org-LFS, SSR-LFS, and Ext3-FUSE)
Medium file size (16KB-256KB)
■ 1,000 subdirectories, 100,000 files, 100,000 transactions
SSR-LFS outperforms the other file systems on both devices
Org-LFS shows better performance than Ext3-FUSE on HDD
Ext3-FUSE shows comparable performance to Org-LFS on SSD

Postmark Benchmark Result (2)
(Charts: execution time on HDD and SSD for Org-LFS, SSR-LFS, and Ext3-FUSE)
Small file size (4KB-16KB)
■ 1,000 subdirectories, 500,000 files, 200,000 transactions
Ext3-FUSE performs better than the other file systems on SSD
■ due to meta-data optimizations of Ext3 such as hash-based directories

Outline
Introduction
Slack Space Recycling and Lazy Indirect Block Update
Implementation of SSR-LFS
Performance Evaluation
Conclusions

Conclusions
SSR-LFS outperforms the original-style LFS over a wide range of utilizations
Future work
■ Optimization of meta-data structures
■ Cost-based selection between cleaning and SSR
We plan to release the source code of SSR-LFS this year
■ http://embedded.uos.ac.kr

Q&A
Thank you

Back-up Slides

Storage Devices
(Charts: throughput (MB/s) versus request size (4KB-16MB) for sequential and random I/O: Intel SSD write and read, Seagate HDD write and read)

Measurement of FUSE Overhead
(Charts: throughput (MB/s) of kernel Ext3 versus Ext3-FUSE for sequential write, random write, sequential read, and random read on HDD and SSD)
To identify the performance penalty of a user-space implementation
Ext3-FUSE underperforms kernel Ext3 for almost all access patterns
■ Due to FUSE overhead

Postmark Benchmark Result (1)
(Charts: execution time on HDD and SSD; annotations: Org-LFS performed 1,302 cleanings, SSR-LFS performed 4,142 SSRs)
Medium file size (16KB-256KB)
■ 1,000 subdirectories, 100,000 files, 100,000 transactions
SSR-LFS outperforms the other file systems on both devices
Org-LFS shows better performance than Ext3-FUSE on HDD
Ext3-FUSE shows comparable performance to Org-LFS on SSD

Postmark Benchmark Result (2)
(Charts: execution time on HDD and SSD; annotations: Org-LFS performed 3,018 cleanings, SSR-LFS performed 1,692 cleanings and 1,451 SSRs)
Small file size (4KB-16KB)
■ 1,000 subdirectories, 500,000 files, 200,000 transactions
Ext3-FUSE performs better than the other file systems on SSD
■ due to meta-data optimizations of Ext3 such as hash-based directories

Postmark Benchmark Result (3)
(Charts: execution time on HDD and SSD for Org-LFS, SSR-LFS, and Ext3-FUSE)
Large file size (not included in the paper)
■ File size 256KB-1MB
■ 1,000 subdirectories
■ 10,000 files
■ 10,000 transactions

Statistics of IO-TEST
(Charts: megabytes written and read during IO-TEST at 85% utilization on HDD and SSD)

Random Update Performance
IO TEST benchmark
■ 1st stage
  Creates 64KB files until 90% utilization
  Randomly deletes files until the target utilization is reached
■ 2nd stage
  Randomly updates up to 4GB of file capacity
  Measures the elapsed time of this stage; since LFS has no free segments, the time includes the cleaning or SSR cost
(Diagram: files are created until the disk reaches the 90% high threshold, deleted down to the desired target utilization (20% shown), and then randomly updated)

Experimental Environment
OS: Ubuntu 9.10 (linux-2.6.31.4)
Processor: AMD Opteron 1352, 2.1GHz
RAM: SAMSUNG 4GB DDR-2 800MHz ECC
SSD: INTEL SSDSA2SH032G1GN
HDD: SEAGATE ST3160815AS

Comparison of the Cleaning and SSR Schemes
(Timing diagram: in the cleaning case, a foreground write request that finds no free segment is delayed by the on-demand cleaning time T_on-demand before the segment write T_seg, with background cleaning running during idle time T_idle; in the SSR case, the foreground write pays only the shorter T_ssr before writing, and free segments are replenished in the background over T1, T2, T3)
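To make the contrast between the two cases concrete, the self-contained sketch below models a segment as an array of blocks with valid flags: cleaning must copy the victim's valid blocks into a free segment before new data can go anywhere, while SSR writes new data directly into the slack (invalid) slots. The segment model, sizes, and helper names are illustrative assumptions, not the SSR-LFS implementation.

```c
/* Self-contained sketch contrasting on-demand cleaning with SSR.
   The segment model and sizes are hypothetical, for illustration only. */
#include <stdio.h>
#include <string.h>

#define SEG_BLOCKS 8

struct segment {
    char blocks[SEG_BLOCKS][8];   /* toy block payloads             */
    int  valid[SEG_BLOCKS];       /* 1 = live data, 0 = slack space */
};

/* On-demand cleaning: copy the victim's valid blocks into a free segment,
   paying extra copies before any new data can be written. */
static int clean(struct segment *victim, struct segment *free_seg)
{
    int n = 0;
    for (int i = 0; i < SEG_BLOCKS; i++)
        if (victim->valid[i]) {
            memcpy(free_seg->blocks[n], victim->blocks[i], 8);
            free_seg->valid[n++] = 1;
            victim->valid[i] = 0;
        }
    return n;                     /* blocks copied (the cleaning cost) */
}

/* SSR: write new data directly into the victim's slack space; valid blocks
   stay where they are, so no copying is needed. */
static int ssr_write(struct segment *victim, const char *data)
{
    for (int i = 0; i < SEG_BLOCKS; i++)
        if (!victim->valid[i]) {
            strncpy(victim->blocks[i], data, 8);
            victim->valid[i] = 1;
            return i;             /* slot that was recycled */
        }
    return -1;                    /* no slack space left */
}

int main(void)
{
    struct segment a = { .valid = {1,0,1,0,0,1,0,0} };   /* used segments with slack */
    struct segment b = { .valid = {1,0,1,0,0,1,0,0} };
    struct segment spare = {0};

    printf("cleaning copied %d valid blocks\n", clean(&a, &spare));
    printf("SSR wrote a new block into slot %d\n", ssr_write(&b, "new"));
    return 0;
}
```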
FIO Benchmark Result (1)
(Charts: throughput (MB/s) of Org-LFS, SSR-LFS, and Ext3-FUSE for sequential write, random write, sequential read, and random read on HDD and SSD)
To measure the performance impact of Lazy Indirect Block Update
1 worker: file size 1GB, request size 4KB, sync I/O mode
SSR-LFS outperforms Org-LFS and Ext3-FUSE for sequential and random writes on both devices

FIO Benchmark Result (2)
(Charts: throughput (MB/s) of Org-LFS, SSR-LFS, and Ext3-FUSE for sequential write, random write, sequential read, and random read on HDD and SSD)
4 workers: file size 1GB, request size 4KB, sync I/O mode
The LIBU scheme has a greater impact on performance when four workers are running
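The FIO runs above isolate the effect of LIBU. As a rough illustration of the underlying idea, the sketch below caches indirect blocks in an in-memory IBC so that repeated data-block updates only patch cached pointers, and the dirty indirect blocks are written out once at checkpoint time. All structures, sizes, and the flush policy here are hypothetical, not the actual SSR-LFS code.

```c
/* Minimal sketch of an Indirect Block Cache (IBC) absorbing indirect-block
   updates, as described for LIBU. Only the data block is written per update;
   cached indirect blocks are written out once at checkpoint time. */
#include <stdio.h>

#define IBC_SLOTS   16
#define PTRS_PER_IB 128

struct ib_entry {
    long ib_no;                 /* indirect block number (-1 = empty slot) */
    long ptrs[PTRS_PER_IB];     /* cached block pointers                   */
    int  dirty;
};

static struct ib_entry ibc[IBC_SLOTS];
static long next_disk_addr = 1000;          /* toy log: next free block address */

static struct ib_entry *ibc_lookup(long ib_no)
{
    for (int i = 0; i < IBC_SLOTS; i++)     /* hit: already cached               */
        if (ibc[i].ib_no == ib_no)
            return &ibc[i];
    for (int i = 0; i < IBC_SLOTS; i++)     /* miss: take an empty slot          */
        if (ibc[i].ib_no == -1) { ibc[i].ib_no = ib_no; return &ibc[i]; }
    return NULL;                            /* a real IBC would evict here       */
}

/* Update one data block: write it to the log and patch its pointer in the
   cached indirect block; no indirect/double-indirect/i-node writes yet. */
static void update_data_block(long ib_no, int slot)
{
    struct ib_entry *e = ibc_lookup(ib_no);
    if (!e) return;
    e->ptrs[slot] = next_disk_addr++;       /* new on-disk location of the data  */
    e->dirty = 1;
}

static void checkpoint_flush(void)
{
    for (int i = 0; i < IBC_SLOTS; i++)
        if (ibc[i].ib_no != -1 && ibc[i].dirty) {
            printf("flush indirect block %ld to log addr %ld\n",
                   ibc[i].ib_no, next_disk_addr++);
            ibc[i].dirty = 0;
        }
}

int main(void)
{
    for (int i = 0; i < IBC_SLOTS; i++) ibc[i].ib_no = -1;
    for (int n = 0; n < 100; n++)           /* 100 updates hit one indirect block */
        update_data_block(7, n % PTRS_PER_IB);
    checkpoint_flush();                     /* written out once, not 100 times    */
    return 0;
}
```

In SSR-LFS, the indirect map mentioned on the LIBU slide is what records where the flushed indirect blocks land, which is how the cascading meta-data update is terminated.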