CS703 Advanced Operating Systems
By Mr. Farhan Zaidi
Lecture No. 33
Disk Interaction
Specifying disk requests requires a lot of info:
Cylinder #, surface #, track #, sector #, transfer size, ...
Current disks provide a higher-level interface (SCSI)
The disk exports its data as a logical array of blocks [0 … N]
Disk maps logical blocks to cylinder/surface/track/sector.
Some useful facts
Disks read/write in terms of sectors, not bytes
can read/write a single sector or adjacent groups of sectors
How to write a single byte? “Read-modify-write” (sketched in code after this list)
read in the sector containing the byte
modify that byte
write the entire sector back to disk
key: if the sector is already cached, the read step can be skipped
Sector = unit of atomicity.
a sector write completes even if a crash happens in the middle
(the disk saves up enough momentum to complete it)
larger atomic units have to be synthesized by the OS
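
A minimal sketch of read-modify-write in C, assuming the disk is exposed as a file descriptor (a raw device or an image file) and a 512-byte sector; the names disk_fd, write_byte, and SECTOR_SIZE are illustrative assumptions, not a real driver API:

#include <stdint.h>
#include <unistd.h>

#define SECTOR_SIZE 512   /* assumed sector size */

/* Write a single byte at byte_offset by rewriting its whole sector. */
int write_byte(int disk_fd, off_t byte_offset, uint8_t value)
{
    uint8_t buf[SECTOR_SIZE];
    off_t  sector_start = (byte_offset / SECTOR_SIZE) * SECTOR_SIZE;
    size_t within       = (size_t)(byte_offset % SECTOR_SIZE);

    /* 1. Read in the sector containing the byte
     *    (this step can be skipped if the sector is already cached). */
    if (pread(disk_fd, buf, SECTOR_SIZE, sector_start) != SECTOR_SIZE)
        return -1;

    /* 2. Modify that byte in memory. */
    buf[within] = value;

    /* 3. Write the entire sector back to disk. */
    if (pwrite(disk_fd, buf, SECTOR_SIZE, sector_start) != SECTOR_SIZE)
        return -1;
    return 0;
}

The same pattern applies to any write smaller than a sector; the cache check in step 1 is where a buffer cache saves the extra read.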
Disk Scheduling
Because seeks are so expensive (milliseconds!), the
OS tries to schedule disk requests that are queued
waiting for the disk
FCFS (do nothing)
Reasonable when load is low
Long waiting times for long request queues
SSTF (shortest seek time first), sketched in code after this list
Minimize arm movement (seek time), maximize request rate
Favors middle blocks
SCAN (elevator)
Service requests in one direction until done, then reverse
C-SCAN
Like SCAN, but only go in one direction (typewriter)
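
A minimal C sketch of the SSTF decision, assuming the queue is just an array of pending cylinder numbers (a real driver would track full request structures); the function name and layout are illustrative:

#include <stdlib.h>

/* Pick the pending request whose cylinder is closest to the current
 * head position (shortest seek time first). Returns its index in
 * pending[], or -1 if the queue is empty. */
int sstf_pick(const int *pending, int n, int head_cyl)
{
    int best = -1;
    int best_dist = 0;

    for (int i = 0; i < n; i++) {
        int dist = abs(pending[i] - head_cyl);
        if (best == -1 || dist < best_dist) {
            best = i;
            best_dist = dist;
        }
    }
    return best;   /* note: SSTF can starve requests far from the head */
}

SCAN/C-SCAN would instead restrict the candidates to one side of the head (the current sweep direction) before picking the closest one.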
Some useful trends
Disk bandwidth and cost/bit improving exponentially
similar to CPU speed, memory size, etc.
Seek time and rotational delay improving *very* slowly
why? they require moving a physical object (the disk arm)
Some implications:
disk accesses are a huge system bottleneck & getting worse
the bandwidth increase lets the system (pre-)fetch large chunks for
about the same cost as a small chunk.
Result? Can improve performance if you can read lots of
related stuff.
How to get related stuff? Cluster together on disk
Memory size increasing faster than typical workload size
More and more of workload fits in file cache
disk traffic changes: mostly writes and new data
BSD 4.4 Fast File System (FFS)
Uses a minimum disk block size of 4096 bytes
Records the block size in the superblock (see the simplified sketch below)
Multiple file systems with different block sizes can co-reside
Improves performance in several ways
Superblock is replicated to provide fault tolerance
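
A hypothetical, heavily simplified superblock layout in C, showing how the block size (and the fragment size introduced later) can be recorded on disk; the field names are assumptions, not the real FFS struct fs:

#include <stdint.h>

/* Illustrative on-disk superblock; the real FFS superblock has many
 * more fields and is replicated across cylinder groups for fault
 * tolerance. */
struct superblock {
    uint32_t magic;        /* identifies the file system type */
    uint32_t block_size;   /* e.g. 4096 or 8192 (minimum 4096 in FFS) */
    uint32_t frag_size;    /* block_size / 2, / 4, or / 8 */
    uint32_t ncyl_groups;  /* number of cylinder groups */
    uint64_t total_blocks;
    uint64_t free_blocks;
};

Because each file system carries its own block_size here, file systems built with different block sizes can be mounted side by side.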
FFS Allocation Policies
Allocate file inodes close to their containing directories (group selection is sketched after this list).
For mkdir, select a cylinder group with a more-than-average number of free inodes.
For creat, place the inode in the same group as the parent directory.
Concentrate related file data blocks in cylinder groups.
Most files are read and written sequentially.
Place the initial blocks of a file in the same group as its inode.
How should we handle directory blocks?
Place adjacent logical blocks in the same cylinder group.
Logical block n+1 goes in the same group as block n.
Switch to a different group for each indirect block.
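
A minimal C sketch of the two inode-placement rules, assuming hypothetical per-cylinder-group free counts; the function and field names are illustrative, not real FFS code:

#include <stdint.h>

/* Illustrative per-cylinder-group bookkeeping (assumes ngroups > 0). */
struct cyl_group {
    uint32_t free_inodes;
    uint32_t free_blocks;
};

/* mkdir: pick a group with a more-than-average number of free inodes,
 * spreading new directory trees across the disk. */
int pick_group_for_mkdir(const struct cyl_group *cg, int ngroups)
{
    uint64_t total = 0;
    for (int i = 0; i < ngroups; i++)
        total += cg[i].free_inodes;
    uint64_t avg = total / ngroups;

    for (int i = 0; i < ngroups; i++)
        if (cg[i].free_inodes > avg)
            return i;            /* more-than-average free inodes */
    return 0;                    /* fallback */
}

/* creat: place a regular file's inode in the same group as its
 * parent directory, keeping related inodes close together. */
int pick_group_for_creat(int parent_dir_group)
{
    return parent_dir_group;
}

The real policy also weighs other factors (for example how many directories a group already holds); this sketch keeps only the more-than-average free-inode test from the slide.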
Representing Small Files
Internal fragmentation in the file system blocks can
waste significant space for small files.
FFS solution: optimize small files for space
efficiency.
Subdivide blocks into 2/4/8 fragments (or just frags).
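
A small worked example in C of how fragments cut the waste in a file's final partial block, assuming a 4096-byte block split into four 1024-byte fragments (illustrative numbers):

#include <stdio.h>

#define BLOCK_SIZE 4096
#define FRAG_SIZE  1024   /* block subdivided into 4 fragments */

int main(void)
{
    size_t file_size = 1500;   /* a small file */

    /* Without fragments: the tail occupies a whole block. */
    size_t tail = file_size % BLOCK_SIZE;
    size_t whole_block_waste = BLOCK_SIZE - tail;

    /* With fragments: the tail occupies only enough 1 KB frags. */
    size_t frags_needed = (tail + FRAG_SIZE - 1) / FRAG_SIZE;
    size_t frag_waste = frags_needed * FRAG_SIZE - tail;

    printf("waste without frags: %zu bytes, with frags: %zu bytes\n",
           whole_block_waste, frag_waste);   /* 2596 vs. 548 here */
    return 0;
}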
Clustering in FFS
Clustering improves bandwidth utilization for large
files read and written sequentially.
FFS can allocate contiguous runs of blocks “most of
the time” on disks with sufficient free space.
FFS consistency and recovery
Reconstructs free list and reference counts on reboot
Enforces two invariants:
directory names always reference valid inodes
no block claimed by more than one inode
Does this with three ordering rules:
write newly allocated inode to disk before name entered in
directory
remove directory name before inode deallocated
write deallocated inode to disk before its blocks are
placed on free list
File creation and deletion take 2 synchronous writes
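
A self-contained C sketch of the creation ordering (the first rule): the inode write happens before the directory entry that names it, so a crash in between leaves at worst an allocated-but-unreferenced inode. The helpers only print what a real FFS would write synchronously; all names here are assumptions:

#include <stdio.h>

/* Stand-ins for synchronous disk writes: in a real FFS these would
 * issue a write and wait for it to reach the platter. */
static void write_inode_sync(int inum)
{
    printf("sync write: inode %d\n", inum);
}

static void add_dir_entry_sync(int dir_inum, const char *name, int inum)
{
    printf("sync write: dir %d entry \"%s\" -> inode %d\n",
           dir_inum, name, inum);
}

/* File creation takes 2 synchronous writes, ordered so that a crash
 * between them never leaves a directory name referencing an invalid
 * inode (rule 1). */
static int ffs_create(int dir_inum, const char *name, int new_inum)
{
    write_inode_sync(new_inum);                    /* inode first ... */
    add_dir_entry_sync(dir_inum, name, new_inum);  /* ... then name   */
    return new_inum;
}

int main(void)
{
    ffs_create(2, "notes.txt", 17);
    return 0;
}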
Why does FFS need the third rule? Inode recovery
FFS: inode recovery
Files can be lost if a directory is destroyed or a crash happens
before the link can be set
New twist: FFS can find lost inodes
Facts:
FFS pre-allocates inodes in known locations on disk
Free inodes are initialized to all 0s.
So?
Fact 1 lets FFS find all inodes (whether or not there are
any pointers to them)
Fact 2 tells FFS that any inode with non-zero contents is
(probably) still in use.
fsck places unreferenced inodes with non-zero contents
in the lost+found directory
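
A rough C sketch of that fsck pass, assuming the pre-allocated inodes have already been read into an array and an earlier pass has marked which ones are referenced from directories; all structures and names are illustrative assumptions:

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Illustrative on-disk inode: free inodes are all zeros. */
struct dinode {
    uint16_t mode;        /* 0 when the inode is free */
    uint16_t nlink;
    uint32_t size;
    uint32_t blocks[12];
};

/* True if the inode's contents are not all zeros (probably in use). */
bool inode_in_use(const struct dinode *ip)
{
    static const struct dinode zero;   /* all-zero inode */
    return memcmp(ip, &zero, sizeof zero) != 0;
}

/* inodes[] holds every pre-allocated inode (Fact 1: found at known
 * disk locations); referenced[i] is true if some directory entry
 * names inode i. Unreferenced, non-zero inodes are reconnected. */
void reconnect_lost_inodes(const struct dinode *inodes,
                           const bool *referenced, int ninodes,
                           void (*link_into_lost_found)(int inum))
{
    for (int i = 0; i < ninodes; i++) {
        if (inode_in_use(&inodes[i]) && !referenced[i])
            link_into_lost_found(i);   /* place it in lost+found */
    }
}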