About SSD
Dongjun Shin
Samsung Electronics
Outline

- SSD primer
- Optimal I/O for SSD
- Benchmarking Linux FS on SSD
  - Case study: ext4, btrfs, xfs
- Design consideration for SSD
- What’s next?
  - New interfaces for SSD
  - Parallel processing of small I/O
SSD Primer (1/2)

- Physical units of flash memory
  - Page (NAND) – unit for read & write
  - Block (NAND) – unit for erase (a.k.a. erasable block)
- Physical characteristics
  - Erase before re-write (see the remapping sketch below)
  - Sequential write within an erasable block

[Figure: the Flash Translation Layer (FTL) maps the LBA space visible to the OS onto the flash memory space; a NAND page is 2-4 kB, and a NAND block holds 64-128 NAND pages.]
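Because a flash page cannot be overwritten in place, the FTL redirects each logical page write to the next free page of the currently open erasable block and marks the old physical copy invalid. The following is a minimal C sketch of that remapping idea only; the geometry constants and structure names are illustrative, not taken from any real FTL.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Illustrative geometry: 64 pages per erasable block, 16 blocks. */
#define PAGES_PER_BLOCK 64
#define NUM_BLOCKS      16
#define NUM_LOGICAL     (PAGES_PER_BLOCK * NUM_BLOCKS / 2)

#define UNMAPPED UINT32_MAX

static uint32_t l2p[NUM_LOGICAL];                     /* logical page -> physical page      */
static uint8_t  valid[NUM_BLOCKS * PAGES_PER_BLOCK];  /* which physical pages hold live data */
static uint32_t write_ptr;                            /* next free physical page (sequential) */

/* Write one logical page: append to the open block, invalidate the old copy. */
static void ftl_write(uint32_t lpn)
{
    if (write_ptr >= NUM_BLOCKS * PAGES_PER_BLOCK) {
        /* Out of free pages: a real FTL would garbage-collect here
         * (copy a block's valid pages elsewhere, then erase the block). */
        printf("no free pages, garbage collection needed\n");
        return;
    }
    if (l2p[lpn] != UNMAPPED)
        valid[l2p[lpn]] = 0;        /* old physical copy becomes stale */
    l2p[lpn] = write_ptr;
    valid[write_ptr] = 1;
    write_ptr++;                    /* writes stay sequential within a block */
}

int main(void)
{
    memset(l2p, 0xff, sizeof(l2p)); /* all logical pages start unmapped */

    ftl_write(7);                   /* first write of logical page 7   */
    ftl_write(7);                   /* re-write: lands on a new page   */
    printf("logical 7 -> physical %u (block %u, offset %u)\n",
           l2p[7], l2p[7] / PAGES_PER_BLOCK, l2p[7] % PAGES_PER_BLOCK);
    return 0;
}
```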
SSD Primer (2/2)

- Internal organization: 2-dimensional (NxM parallelism); see the mapping sketch below
  - Similar to RAID-0 (stripe size = sector or NAND page)
  - Effective page & block size is multiplied by NxM (max)

[Figure: an SSD controller running the FTL firmware behind a host interface (e.g. SATA), driving N channels (striping) with M-way pipelining per channel; consecutive LBAs are striped across channels Ch0-Ch3 and chips Chip0-Chip3.]
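The RAID-0-like layout can be captured by a small mapping from a logical page number to (channel, way, page-within-chip). This is a hedged sketch of the idea only; the exact interleaving order inside a real SSD's firmware is not specified in the slides.

```c
#include <stdio.h>
#include <stdint.h>

/* Illustrative geometry: N channels, M ways per channel, stripe unit = 1 NAND page. */
#define N_CHANNELS 4
#define M_WAYS     2

struct chip_addr {
    unsigned channel;   /* which channel (striping)    */
    unsigned way;       /* which chip on that channel  */
    unsigned page;      /* page index within that chip */
};

/* Logical page number -> chip location, RAID-0 style: consecutive logical
 * pages go to consecutive channels first, then to the next way, so adjacent
 * pages can be accessed in parallel. */
static struct chip_addr map_page(uint32_t lpn)
{
    struct chip_addr a;
    a.channel = lpn % N_CHANNELS;
    a.way     = (lpn / N_CHANNELS) % M_WAYS;
    a.page    = lpn / (N_CHANNELS * M_WAYS);
    return a;
}

int main(void)
{
    /* N*M consecutive pages cover every chip once: a single large, aligned
     * request of that size keeps all chips busy at the same time. */
    for (uint32_t lpn = 0; lpn < N_CHANNELS * M_WAYS; lpn++) {
        struct chip_addr a = map_page(lpn);
        printf("lpn %2u -> ch %u, way %u, page %u\n",
               lpn, a.channel, a.way, a.page);
    }
    return 0;
}
```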
Optimal I/O for SSD

- Key points
  - Parallelism: the larger the I/O request, the better
  - Match with physical characteristics
    - Alignment with the page or block size of NAND*
    - Segmented sequential write (within an erasable block)
- What about Linux?
  - HDD also favors larger I/O → read-ahead, deferred aggregated write
  - Segmented FS layout → good if aligned with erasable block boundaries
  - Write optimization → FS dependent (e.g. allocation policy)

* Usually the partition layout is not aligned (1st partition at LBA 63); see the alignment check below.
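As a concrete illustration of the footnote: the classic DOS partition start at LBA 63 (512-byte sectors) is 32,256 bytes into the device, which is not a multiple of any power-of-two erase block size, while the test partition used later (LBA 16384, 8 MB) is aligned. A minimal check, with the erase block size as an assumed parameter:

```c
#include <stdio.h>
#include <stdint.h>

#define SECTOR_SIZE 512ULL

/* Returns the misalignment (in bytes) of a partition start relative to an
 * assumed erase block size; 0 means the partition is aligned. */
static uint64_t misalignment(uint64_t start_lba, uint64_t erase_block_bytes)
{
    return (start_lba * SECTOR_SIZE) % erase_block_bytes;
}

int main(void)
{
    const uint64_t erase_block = 256 * 1024;  /* assumed 256 kB erasable block */

    /* Classic DOS layout: first partition at LBA 63 -> 32256-byte offset,
     * so FS blocks straddle erase block boundaries. */
    printf("LBA 63:    misaligned by %llu bytes\n",
           (unsigned long long)misalignment(63, erase_block));

    /* Layout used in the tests: LBA 16384 -> 8 MB offset, aligned. */
    printf("LBA 16384: misaligned by %llu bytes\n",
           (unsigned long long)misalignment(16384, erase_block));
    return 0;
}
```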
Test environment (1/2)

- Hardware
  - Intel Core 2 Duo, 1GB RAM
- Software
  - Fedora 7 (kernel 2.6.24)
  - Benchmark: postmark
- Filesystems
  - No journaling: ext2
  - Journaling: ext3, ext4, reiserfs, xfs
    - ext3, ext4: data=writeback,barrier=1[,extents]
    - xfs: logbsize=128k
  - COW, log-structured: btrfs (latest unstable, 4k block), nilfs (testing-8)
- SSD
  - Vendor M (32GB, SATA): read 100MB/s, write 80MB/s
  - Test partition starts at LBA 16384 (8MB, aligned)
Test environment (2/2)

- Postmark workload
  - Ref: Evaluating Block-level Optimization through the IO Path (USENIX 2007)

  Workload | File size | # of files (work-set) | # of transactions | Total app read/write
  ---------+-----------+-----------------------+-------------------+---------------------
  LL       | 0.1-3M    | 4,250                 | 10,000            | 9G/17G
  LS       | 0.1-3M    | 1,000                 | 10,000            | 9.7G/12G
  SL       | 9-15K     | 100,000               | 100,000           | 600M/1.8G
  SS       | 9-15K     | 10,000                | 100,000           | 630M/755M*

  * Mostly write-only
Benchmark results (1/2)

- Small file size (SS, SL)

[Chart: transactions/sec (0-2500) for the SS and SL workloads across ext2, ext3, ext4, reiserfs, xfs, btrfs, nilfs.]
Benchmark results (2/2)

- Large file size (LS, LL)

[Chart: transactions/sec (0-30) for the LS and LL workloads across ext2, ext3, ext4, reiserfs, xfs, btrfs, nilfs.]
I/O statistics (1/2)

- Average size of I/O

[Chart: average I/O size in Kbytes (0-140) for read and write, per workload (SS, SL, LS, LL), across ext2, ext3, ext4, reiserfs, xfs, btrfs, nilfs.]
I/O statistics (2/2)

- Segmented sequentiality of write I/O (segment: 1MB); one possible way to compute this metric is sketched after the chart

[Chart: percentage of segmented-sequential writes per workload (SS, SL, LS, LL) across ext2, ext3, ext4, reiserfs, xfs, btrfs, nilfs; y-axis 0-20%, with 100% marked for all four workloads.]
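The slides do not spell out how "segmented sequentiality" is computed. One plausible definition, sketched below under that assumption, is the fraction of write requests whose start offset continues directly from the previous write within the same 1 MB segment.

```c
#include <stdio.h>
#include <stdint.h>

#define SEGMENT_SIZE (1024 * 1024)   /* 1 MB segments, as in the slide */

struct wreq { uint64_t offset; uint32_t len; };  /* one write request, in bytes */

/* Assumed metric: a write counts as segment-sequential if it starts exactly
 * where the previous write ended AND stays in the same 1 MB segment. */
static double segmented_sequentiality(const struct wreq *w, int n)
{
    int seq = 0;
    for (int i = 1; i < n; i++) {
        uint64_t prev_end = w[i - 1].offset + w[i - 1].len;
        int same_segment  = (w[i].offset / SEGMENT_SIZE) ==
                            (w[i - 1].offset / SEGMENT_SIZE);
        if (w[i].offset == prev_end && same_segment)
            seq++;
    }
    return n > 1 ? 100.0 * seq / (n - 1) : 0.0;
}

int main(void)
{
    /* Tiny synthetic trace: three sequential 4k writes, then a jump. */
    struct wreq trace[] = {
        { 0,       4096 },
        { 4096,    4096 },   /* sequential within segment 0 */
        { 8192,    4096 },   /* sequential within segment 0 */
        { 3 << 20, 4096 },   /* random jump to segment 3    */
    };
    printf("segmented sequentiality: %.1f%%\n",
           segmented_sequentiality(trace, 4));
    return 0;
}
```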
Case study - ext4

- Condition
  - data=ordered, allocation: default/noreservation/oldalloc

[Chart: transactions/sec (0-1200) for SS and SL with ext4-wb, ext4-ord, ext4-nores, ext4-olda.]

Observations:
1. Almost no difference between allocation policies
2. Why is data=ordered better for SL?
Case study - btrfs

- Condition
  - Block size: 4k/16k, allocation: ssd option on/off

[Chart: transactions/sec (0-1800) for SS, SL, LS, LL with btrfs-4k, btrfs-16k, btrfs-ssd-4k.]

Observations:
1. 4k is better than 16k (sequentiality = 12% vs. 2%)
2. ssd option is effective (10-40% improvement)
Case study - xfs

- Condition
  - Mount with barrier on/off

[Chart: transactions/sec (0-800) for SS, SL, LS, LL with xfs-bar and xfs-nobar.]

Observation: large barrier overhead
Design consideration for SSD

- Lessons from flash FS (e.g. logfs)
  - Sequential writing at multiple logging points
  - Wandering tree
    - Trade-off between sequentiality and amount of write (see the toy sketch below)
    - Cf. space map (Sun ZFS)
  - Need to optimize garbage collection overhead
    - Either in the FS itself or in the FTL in the SSD
- Next topic: end-to-end optimization
  - Exchange info with the SSD (trim, SSD identification)
  - Make best use of parallelism
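The "wandering tree" effect means that in a copy-on-write or log-structured design, rewriting one leaf forces every node on the path up to the root to be rewritten as well, because parents record the (now changed) physical locations of their children. A toy sketch of that write amplification follows, using made-up structures rather than logfs's actual on-disk format.

```c
#include <stdio.h>

/* Toy 3-level tree: root -> index nodes -> leaves. Every node lives in its
 * own flash page, and pages cannot be updated in place, so changing a leaf
 * means writing a new copy of the leaf, of its index node, and of the root:
 * three page writes for one logical change. */

static unsigned pages_written;

static unsigned new_page(void)          /* pretend to append a page to the log */
{
    return ++pages_written;
}

/* Update one leaf: copy-on-write every node from the leaf up to the root. */
static void update_leaf(int leaf_index)
{
    unsigned leaf_page  = new_page();   /* new leaf copy                   */
    unsigned index_page = new_page();   /* parent must point to new leaf   */
    unsigned root_page  = new_page();   /* root must point to new parent   */
    printf("leaf %d rewritten: pages %u (leaf), %u (index), %u (root)\n",
           leaf_index, leaf_page, index_page, root_page);
}

int main(void)
{
    update_leaf(5);
    update_leaf(9);
    printf("2 leaf updates -> %u page writes (%ux amplification)\n",
           pages_written, pages_written / 2);
    return 0;
}
```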
New interfaces for SSD (t13.org)

- Trim command
  - Lets the device know which LBA ranges are no longer used (see the discard sketch below)
  - This is helpful for optimizing the FTL
  - Should be passed through: FS → bio → scsi → libata
    - Passing a bio with no data
    - What about I/O reordering & I/O queuing?
- SSD identification (added to “ATA identify”)
  - Reports the size of the page and the erasable block
    - Physical or effective?
  - Useful for FS and volume manager
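On the Linux side, the user-visible counterpart of such a hint is the block layer's discard request. The sketch below uses the BLKDISCARD ioctl, which asks the kernel to tell the device that a byte range of the block device is unused; whether that reaches the SSD as a trim command depends on the kernel version and the device, and the device path and range here are purely illustrative.

```c
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>        /* BLKDISCARD */

int main(void)
{
    /* Illustrative device and range: discard 8 MB starting 1 GB into the disk.
     * WARNING: on a real device this throws away the data in that range. */
    const char *dev = "/dev/sdb";
    uint64_t range[2] = { 1ULL << 30, 8ULL << 20 };  /* {offset, length} in bytes */

    int fd = open(dev, O_WRONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    if (ioctl(fd, BLKDISCARD, &range) < 0)
        perror("BLKDISCARD");        /* older kernels / devices may not support it */
    else
        printf("told %s that %llu bytes at offset %llu are unused\n",
               dev, (unsigned long long)range[1], (unsigned long long)range[0]);
    close(fd);
    return 0;
}
```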
Parallel processing of small I/O

- Make better use of I/O queuing (TCQ or NCQ)
  - Parallel processing of small I/O (see the sketch below)
  - Desktop environment? Barrier?

[Figure: four requests A, B, C, D spread across channels Ch0-Ch3. Without I/O queuing each request is issued one at a time while the other chips sit idle (4 steps); with I/O queuing, requests that hit different channels (e.g. B and C) are serviced in parallel (2 steps).]
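One way for an application or benchmark to give the device a chance to serve small requests from different channels in parallel is to keep several of them in flight at once instead of issuing them synchronously one by one. Below is a minimal sketch with POSIX AIO; the file name and offsets are placeholders. With NCQ/TCQ, queued requests that land on different flash channels can complete in overlapping steps, as in the figure above.

```c
/* Build with: cc -o qdepth qdepth.c -lrt */
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define NREQ 4
#define IOSZ 4096

int main(void)
{
    int fd = open("testfile", O_RDONLY);   /* placeholder file */
    if (fd < 0) { perror("open"); return 1; }

    static char buf[NREQ][IOSZ];
    struct aiocb cb[NREQ];
    const struct aiocb *list[NREQ];

    /* Queue four small reads at widely separated offsets, so they are likely
     * to hit different channels/chips, then let them proceed in parallel. */
    for (int i = 0; i < NREQ; i++) {
        memset(&cb[i], 0, sizeof(cb[i]));
        cb[i].aio_fildes = fd;
        cb[i].aio_buf    = buf[i];
        cb[i].aio_nbytes = IOSZ;
        cb[i].aio_offset = (off_t)i * 64 * 1024 * 1024;   /* 64 MB apart */
        if (aio_read(&cb[i]) < 0) { perror("aio_read"); return 1; }
        list[i] = &cb[i];
    }

    /* Wait for all of them; the device, not the host, decides the order. */
    for (int done = 0; done < NREQ; ) {
        aio_suspend(list, NREQ, NULL);
        done = 0;
        for (int i = 0; i < NREQ; i++)
            if (aio_error(&cb[i]) != EINPROGRESS)
                done++;
    }
    for (int i = 0; i < NREQ; i++)
        printf("req %d returned %zd bytes\n", i, aio_return(&cb[i]));

    close(fd);
    return 0;
}
```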
Summary

- Optimization for SSD
  - Alignment is important
  - Segmented sequentiality
  - Make better use of parallelism (either small or large)
  - I/O barrier may stall the pipelined processing
- What can you do?
  - File system: alignment, allocation policy, design (e.g. COW)
  - Block layer: bio w/ hint, barrier, I/O queueing, scheduler(?)
  - Volume manager: alignment, allocation
  - Virtual memory: read-ahead
References

- T13 spec for SSD
- Introduction to SSD and flash memory
- FTL description & optimization
  - BPLRU: A Buffer Management Scheme for Improving Random Writes in Flash Storage (FAST ’08)
Appendix. I/O Pattern
- SS workload – ext4, xfs

Appendix. I/O Pattern
- SS workload – btrfs, nilfs

Appendix. I/O Pattern
- SL workload – ext4, xfs

Appendix. I/O Pattern
- SL workload – btrfs, nilfs

Appendix. I/O Pattern
- LS workload – ext4, reiserfs, xfs

Appendix. I/O Pattern
- LS workload – btrfs, nilfs
