Section Ref 1 Pfn Ref 46 Mapped Views 4
User Ref 0 WaitForDel 0 Flush Count 0
File Object 86960228 ModWriteCount 0 System Views 0
Flags (8008080) File WasPurged Accessed
File: \Program Files\Debugging Tools for Windows (x86)\debugger.chw
Next, look at the file object referenced by the control area with this command:
lkd> dt nt!_FILE_OBJECT 0x86960228
+0x000 Type : 5
+0x002 Size : 128
+0x004 DeviceObject : 0x84a69a18 _DEVICE_OBJECT
+0x008 Vpb : 0x84a63278 _VPB
+0x00c FsContext : 0x9ae3e768
+0x010 FsContext2 : 0xad4a0c78
+0x014 SectionObjectPointer : 0x86724504 _SECTION_OBJECT_POINTERS
+0x018 PrivateCacheMap : 0x86b48460
+0x01c FinalStatus : 0
+0x020 RelatedFileObject : (null)
+0x024 LockOperation : 0 ''
...
The private cache map is at offset 0x18:
lkd> dt nt!_PRIVATE_CACHE_MAP 0x86b48460
+0x000 NodeTypeCode : 766
+0x000 Flags : _PRIVATE_CACHE_MAP_FLAGS
+0x000 UlongFlags : 0x1402fe
+0x004 ReadAheadMask : 0xffff
+0x008 FileObject : 0x86960228 _FILE_OBJECT
+0x010 FileOffset1 : _LARGE_INTEGER 0x146
+0x018 BeyondLastByte1 : _LARGE_INTEGER 0x14a
+0x020 FileOffset2 : _LARGE_INTEGER 0x14a
+0x028 BeyondLastByte2 : _LARGE_INTEGER 0x156
+0x030 ReadAheadOffset : [2] _LARGE_INTEGER 0x0
+0x040 ReadAheadLength : [2] 0
+0x048 ReadAheadSpinLock : 0
+0x04c PrivateLinks : _LIST_ENTRY [ 0x86b48420 - 0x86b48420 ]
+0x054 ReadAheadWorkItem : (null)
Finally, you can locate the shared cache map in the SectionObjectPointer field of the file
object and then view its contents:
lkd> dt nt!_SECTION_OBJECT_POINTERS 0x86724504
+0x000 DataSectionObject : 0x867548f0
+0x004 SharedCacheMap : 0x86b48388
+0x008 ImageSectionObject : (null)
lkd> dt nt!_SHARED_CACHE_MAP 0x86b48388
+0x000 NodeTypeCode : 767
+0x002 NodeByteSize : 320
+0x004 OpenCount : 1
+0x008 FileSize : _LARGE_INTEGER 0x125726
+0x010 BcbList : _LIST_ENTRY [ 0x86b48398 - 0x86b48398 ]
+0x018 SectionSize : _LARGE_INTEGER 0x140000
+0x020 ValidDataLength : _LARGE_INTEGER 0x125726
+0x028 ValidDataGoal : _LARGE_INTEGER 0x125726
+0x030 InitialVacbs : [4] (null)
+0x040 Vacbs : 0x867de330 -> 0x84738b30 _VACB
+0x044 FileObjectFastRef : _EX_FAST_REF
+0x048 ActiveVacb : 0x84738b30 _VACB
...
Alternatively, you can use the !fileobj command to look up and display much of this
information automatically. For example, using this command on the same file object referenced
earlier results in the following output:
lkd> !fileobj 0x86960228
\Program Files\Debugging Tools for Windows (x86)\debugger.chw
Device Object: 0x84a69a18 \Driver\volmgr
Vpb: 0x84a63278
Event signalled
Access: Read SharedRead SharedWrite
Flags: 0xc0042
Synchronous IO
Cache Supported
Handle Created
Fast IO Read
FsContext: 0x9ae3e768 FsContext2: 0xad4a0c78
Private Cache Map: 0x86b48460
CurrentByteOffset: 156
Cache Data:
Section Object Pointers: 86724504
Shared Cache Map: 86b48388 File Offset: 156 in VACB number 0
Vacb: 84738b30
Your data is at: b1e00156
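The last three lines of the !fileobj output can be reproduced with simple arithmetic. The following Python sketch is only an illustration, not the debugger's actual logic: it assumes the 256-KB cache view size, the function name locate_in_cache is hypothetical, and the view base address 0xb1e00000 is inferred from the output (b1e00156 minus the file offset 0x156).

```python
VACB_MAPPING_GRANULARITY = 256 * 1024  # assumed size of one cache view (256 KB)


def locate_in_cache(file_offset, view_base_addresses):
    """Return (vacb_index, virtual_address) for a byte offset in a cached file.

    view_base_addresses maps a VACB index to the base virtual address of the
    view that VACB describes (a stand-in for the shared cache map's Vacbs array).
    """
    index, offset_in_view = divmod(file_offset, VACB_MAPPING_GRANULARITY)
    return index, view_base_addresses[index] + offset_in_view


# Offset 0x156 from the output; base address inferred as described in the text.
index, address = locate_in_cache(0x156, {0: 0xB1E00000})
print(index, hex(address))  # prints: 0 0xb1e00156

# The SectionSize of 0x140000 bytes would need this many 256-KB views.
print(0x140000 // VACB_MAPPING_GRANULARITY)  # prints: 5
```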
10.5 File System Interfaces
The first time a file’s data is accessed for a read or write operation, the file system driver is
responsible for determining whether some part of the file is mapped in the system cache. If it’s not,
the file system driver must call the CcInitializeCacheMap function to set up the per-file data
structures described in the preceding section.
Once a file is set up for cached access, the file system driver calls one of several functions to
access the data in the file. There are three primary methods for accessing cached data, each
intended for a specific situation:
■ The copy method copies user data between cache buffers in system space and a process
buffer in user space.
■ The mapping and pinning method uses virtual addresses to read and write data directly
from and to cache buffers.
■ The physical memory access method uses physical addresses to read and write data directly
from and to cache buffers.
File system drivers must provide two versions of the file read operation—cached and
noncached—to prevent an infinite loop when the memory manager processes a page fault. When
the memory manager resolves a page fault by calling the file system to retrieve data from the file
(via the device driver, of course), it must specify this noncached read operation by setting the “no
cache” flag in the IRP.
Figure 10-10 illustrates the typical interactions between the cache manager, the memory
manager, and file system drivers in response to user read or write file I/O. The cache manager is
invoked by a file system through the copy interfaces (the CcCopyRead and CcCopyWrite paths).
To process a CcFastCopyRead or CcCopyRead read, for example, the cache manager creates a
view in the cache to map a portion of the file being read and reads the file data into the user buffer
by copying from the view. The copy operation generates page faults as it accesses each previously
invalid page in the view, and in response the memory manager initiates noncached I/O into the file
system driver to retrieve the data corresponding to the part of the file mapped to the page that
faulted.
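The interaction just described, and the reason a separate noncached read path is needed, can be modeled in a few lines. This Python sketch is a toy under stated assumptions: the class ToyFileSystem and its methods are hypothetical names, a dictionary stands in for a cache view, and a byte string stands in for the disk. It shows only that a cached read copies from the view, that touching an invalid page triggers a fault, and that the fault is resolved by a read that never re-enters the cache.

```python
PAGE_SIZE = 4096


class ToyFileSystem:
    def __init__(self, data):
        self.data = data            # stands in for the file's on-disk contents
        self.cache = {}             # page number -> bytes; stands in for a cache view
        self.noncached_reads = 0    # counts paging I/Os issued

    def noncached_read(self, page):
        """Paging I/O path: reads 'disk' directly and never touches the cache."""
        self.noncached_reads += 1
        start = page * PAGE_SIZE
        return self.data[start:start + PAGE_SIZE]

    def page_fault(self, page):
        """Memory manager stand-in: resolve the fault with a noncached read."""
        self.cache[page] = self.noncached_read(page)

    def cached_read(self, offset, length):
        """Copy-method stand-in: copy from the view, faulting pages in as needed."""
        out = bytearray()
        end = min(offset + length, len(self.data))
        while offset < end:
            page, start = divmod(offset, PAGE_SIZE)
            if page not in self.cache:      # previously invalid page in the view
                self.page_fault(page)
            chunk = self.cache[page][start:start + (end - offset)]
            out += chunk
            offset += len(chunk)
        return bytes(out)


fs = ToyFileSystem(bytes(range(256)) * 64)   # a 16-KB "file" (four pages)
fs.cached_read(4000, 200)                    # spans pages 0 and 1
print(fs.noncached_reads)                    # prints: 2
fs.cached_read(4000, 200)                    # now satisfied entirely from the cache
print(fs.noncached_reads)                    # prints: 2
```

If page_fault called cached_read instead of noncached_read, resolving a fault would touch the same invalid page again and recurse without end, which is the infinite loop the "no cache" flag prevents.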
The next three sections explain these cache access mechanisms, their purpose, and how
they’re used.
10.5.1 Copying to and from the Cache
Because the system cache is in system space, it is mapped into the address space of every
process. As with all system space pages, however, cache pages aren’t accessible from user mode
because that would be a potential security hole. (For example, a process might not have the rights
to read a file whose data is currently contained in some part of the system cache.) Thus, user
application file reads and writes to cached files must be serviced by kernel-mode routines that copy
data between the cache’s buffers in system space and the application’s buffers residing in the
process address space. The functions that file system drivers can use to perform this operation are
listed in Table 10-2.
You can examine read activity from the cache via the performance counters or
per-processor variables stored in the processor’s control block (KPRCB) listed in Table 10-3.
10.5.2 Caching with the Mapping and Pinning Interfaces
Just as user applications read and write data in files on a disk, file system drivers need to read
and write the data that describes the files themselves (the metadata, or volume structure data).
Because the file system drivers run in kernel mode, however, they could, if the cache manager
were properly informed, modify data directly in the system cache. To permit this optimization, the
cache manager provides the functions shown in Table 10-4. These functions permit the file system
drivers to find where in virtual memory the file system metadata resides, thus allowing direct
modification without the use of intermediary buffers.
If a file system driver needs to read file system metadata in the cache, it calls the cache
manager’s mapping interface to obtain the virtual address of the desired data. The cache manager
touches all the requested pages to bring them into memory and then returns control to the file
system driver. The file system driver can then access the data directly.
If the file system driver needs to modify cache pages, it calls the cache manager’s pinning
services, which keep the pages active in virtual memory so that they cannot be reclaimed.
The pages aren’t actually locked into memory (such as when a device driver locks pages for
direct memory access transfers). Most of the time, a file system driver will mark its metadata
stream “no write”, which instructs the memory manager’s mapped page writer (explained in
Chapter 9) to not write the pages to disk until explicitly told to do so. When the file system driver
unpins (releases) them, the cache manager releases its resources so that it can lazily flush any
changes to disk and release the cache view that the metadata occupied. The mapping and pinning
interfaces solve one thorny problem of implementing a file system: buffer management. Without
directly manipulating cached metadata, a file system must predict the maximum number of buffers
it will need when updating a volume’s structure. By allowing the file system to access and update
its metadata directly in the cache, the cache manager eliminates the need for buffers, simply
updating the volume structure in the virtual memory the memory manager provides. The only
limitation the file system encounters is the amount of available memory.
You can examine pinning and mapping activity in the cache via the performance counters or
per-processor variables stored in the processor’s control block (KPRCB) listed in Table 10-5.
10.5.3 Caching with the Direct Memory Access Interfaces
In addition to the mapping and pinning interfaces used to access metadata directly in the
cache, the cache manager provides a third interface to cached data: direct memory access (DMA).
The DMA functions are used to read from or write to cache pages without intervening buffers,
such as when a network file system is doing a transfer over the network.
The DMA interface returns to the file system the physical addresses of cached user data
(rather than the virtual addresses, which the mapping and pinning interfaces return), which can
then be used to transfer data directly from physical memory to a network device. Although small
amounts of data (1 KB to 2 KB) can use the usual buffer-based copying interfaces, for larger
transfers the DMA interface can result in significant performance improvements for a network
server processing file requests from remote systems.
To describe these references to physical memory, a memory descriptor list (MDL) is used.
(MDLs were introduced in Chapter 9.) The four separate functions described in Table 10-6 create
the cache manager’s DMA interface.
You can examine MDL activity from the cache via the performance counters or
per-processor variables stored in the processor’s control block (KPRCB) listed in Table 10-7.
10.6 Fast I/O
Whenever possible, reads and writes to cached files are handled by a high-speed mechanism
named fast I/O. Fast I/O is a means of reading or writing a cached file without going through the
work of generating an IRP, as described in Chapter 7. With fast I/O, the I/O manager calls the file
system driver’s fast I/O routine to see whether I/O can be satisfied directly from the cache
manager without generating an IRP.
Because the cache manager is architected on top of the virtual memory subsystem, file
system drivers can use the cache manager to access file data simply by copying to or from pages
mapped to the actual file being referenced without going through the overhead of generating an
IRP.
Fast I/O doesn’t always occur. For example, the first read or write to a file requires setting up
the file for caching (mapping the file into the cache and setting up the cache data structures, as
explained earlier in the section “Cache Data Structures”). Also, if the caller specified an
asynchronous read or write, fast I/O isn’t used because the caller might be stalled during paging
I/O operations required to satisfy the buffer copy to or from the system cache and thus not really
providing the requested asynchronous I/O operation. But even on a synchronous I/O, the file
system driver might decide that it can’t process the I/O operation by using the fast I/O mechanism,
for example, when the file in question has a locked range of bytes (as a result of calls to the
Windows LockFile and UnlockFile functions). Because the cache manager doesn’t know what
parts of which files are locked, the file system driver must check the validity of the read or write,
which requires generating an IRP. The decision tree for fast I/O is shown in Figure 10-11.
These steps are involved in servicing a read or a write with fast I/O:
1. A thread performs a read or write operation.
2. If the file is cached and the I/O is synchronous, the request passes to the fast I/O entry
point of the file system driver stack. If the file isn’t cached, the file system driver sets up the file
for caching so that the next time, fast I/O can be used to satisfy a read or write request.
3. If the file system driver’s fast I/O routine determines that fast I/O is possible, it calls the
cache manager’s read or write routine to access the file data directly in the cache. (If fast I/O isn’t
possible, the file system driver returns to the I/O system, which then generates an IRP for the I/O
and eventually calls the file system’s regular read routine.)
4. The cache manager translates the supplied file offset into a virtual address in the cache.
5. For reads, the cache manager copies the data from the cache into the buffer of the process
requesting it; for writes, it copies the data from the buffer to the cache.
6. One of the following actions occurs:
❏ For reads where FILE_FLAG_RANDOM_ACCESS wasn’t specified when the file was
opened, the read-ahead information in the caller’s private cache map is updated. Read-ahead may
also be queued for files for which the FO_RANDOM_ACCESS flag is not specified.
❏ For writes, the dirty bit of any modified page in the cache is set so that the lazy writer will
know to flush it to disk.
❏ For write-through files, any modifications are flushed to disk.
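The first three steps amount to a short chain of checks. The Python sketch below is one reading of the text, not the I/O manager's or any file system's real logic: the Request fields and the function choose_io_path are hypothetical, and a real file system driver considers more conditions than the three modeled here.

```python
from dataclasses import dataclass


@dataclass
class Request:
    is_cached: bool        # the file is already set up for caching
    is_synchronous: bool   # the caller did not request asynchronous I/O
    range_locked: bool     # the file has a locked range of bytes


def choose_io_path(req):
    """Return which path a read or write takes in this simplified model."""
    if not req.is_cached:
        # First access: the IRP path runs and the file is set up for caching.
        return "IRP (file is set up for caching)"
    if not req.is_synchronous:
        return "IRP"
    if req.range_locked:
        # The file system must validate the request, which requires an IRP.
        return "IRP"
    return "fast I/O"


print(choose_io_path(Request(is_cached=True, is_synchronous=True, range_locked=False)))
# prints: fast I/O
print(choose_io_path(Request(is_cached=True, is_synchronous=False, range_locked=False)))
# prints: IRP
```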
The performance counters or per-processor variables stored in the processor’s control block
(KPRCB) listed in Table 10-8 can be used to determine the fast I/O activity on the system.
10.7 Read-Ahead and Write-Behind
In this section, you’ll see how the cache manager implements reading and writing file data on
behalf of file system drivers. Keep in mind that the cache manager is involved in file I/O only
when a file is opened without the FILE_FLAG_NO_BUFFERING flag and then read from or
written to using the Windows I/O functions (for example, using the Windows ReadFile and
WriteFile functions). Mapped files don’t go through the cache manager, nor do files opened with
the FILE_FLAG_NO_BUFFERING flag set.
Note When an application uses the FILE_FLAG_NO_BUFFERING flag to open a file, its
file I/O must start at device-aligned offsets and be of sizes that are a multiple of the alignment size;
its input and output buffers must also be device-aligned virtual addresses. For file systems, this
usually corresponds to the sector size (512 bytes on NTFS, typically, and 2,048 bytes on CDFS).
One of the benefits of the cache manager, apart from the actual caching performance, is the fact
that it performs intermediate buffering to allow arbitrarily aligned and sized I/O.
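The alignment rules in the preceding note are easy to check mechanically. The following Python sketch assumes a 512-byte alignment by default; both function names are hypothetical. The second function shows the kind of rounding that intermediate buffering implies: an arbitrary request is widened to the aligned range that covers it.

```python
def is_valid_unbuffered_io(offset, length, buffer_address, alignment=512):
    """True if the offset, size, and buffer address are all device-aligned."""
    return all(value % alignment == 0 for value in (offset, length, buffer_address))


def aligned_span(offset, length, alignment=512):
    """Return (start, size) of the smallest aligned range covering the request."""
    start = offset - offset % alignment
    end = -(-(offset + length) // alignment) * alignment   # round up
    return start, end - start


print(is_valid_unbuffered_io(1024, 4096, 0x10000))   # prints: True
print(is_valid_unbuffered_io(100, 1000, 0x10000))    # prints: False
print(aligned_span(100, 1000))                       # prints: (0, 1536)
```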
10.7.1 Intelligent Read-Ahead
The cache manager uses the principle of spatial locality to perform intelligent read-ahead by
predicting what data the calling process is likely to read next based on the data that it is reading
currently. Because the system cache is based on virtual addresses, which are contiguous for a
particular file, it doesn’t matter whether they’re juxtaposed in physical memory. File read-ahead
for logical block caching is more complex and requires tight cooperation between file system
drivers and the block cache because that cache system is based on the relative positions of the
accessed data on the disk, and, of course, files aren’t necessarily stored contiguously on disk. You
can examine read-ahead activity by using the Cache: Read Aheads/sec performance counter or the
CcReadAheadIos system variable.
Reading the next block of a file that is being accessed sequentially provides an obvious
performance improvement, with the disadvantage that it will cause head seeks. To extend
read-ahead benefits to cases of strided data accesses (both forward and backward through a file),
the cache manager maintains a history of the last two read requests in the private cache map for
the file handle being accessed, a method known as asynchronous read-ahead with history. If a
pattern can be determined from the caller’s apparently random reads, the cache manager
extrapolates it. For example, if the caller reads page 4000 and then page 3000, the cache manager
assumes that the next page the caller will require is page 2000 and prereads it.
Note Although a caller must issue a minimum of three read operations to establish a
predictable sequence, only two are stored in the private cache map.
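Read-ahead with history can be sketched as follows. This Python model is a simplification under stated assumptions: the class ReadHistory is hypothetical, it tracks bare page numbers rather than the offset and length pairs held in the private cache map, and it predicts only when the current read continues the stride of the two stored reads, which matches the three-read minimum in the preceding note.

```python
class ReadHistory:
    """Toy per-handle history holding only the two most recent reads."""

    def __init__(self):
        self.last_two = []   # page numbers of the two most recent reads

    def record_and_predict(self, page):
        """Record a read and return the predicted next page, or None."""
        prediction = None
        if len(self.last_two) == 2:
            older, newer = self.last_two
            stride = newer - older
            # The current read is the third in the sequence; predict only if
            # it continues the stride established by the stored two.
            if stride != 0 and page - newer == stride:
                prediction = page + stride
        self.last_two = (self.last_two + [page])[-2:]
        return prediction


history = ReadHistory()
for page in (4000, 3000, 2000):
    print(page, history.record_and_predict(page))
# prints:
# 4000 None
# 3000 None
# 2000 1000
```

A backward stride works the same way as a forward one because only the difference between consecutive reads matters.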
To make read-ahead even more efficient, the Win32 CreateFile function provides a flag
indicating forward sequential file access: FILE_FLAG_SEQUENTIAL_SCAN. If this flag is set,
the cache manager doesn’t keep a read history for the caller for prediction but instead performs
sequential read-ahead. However, as the file is read into the cache’s working set, the cache manager
unmaps views of the file that are no longer active and, if they are unmodified, directs the memory
manager to place the pages belonging to the unmapped views at the front of the standby list so that
they will be quickly reused. It also reads ahead two times as much data (2 MB instead of 1 MB,
for example). As the caller continues reading, the cache manager prereads additional blocks of
data, always staying about one read (of the size of the current read) ahead of the caller.
The cache manager’s read-ahead is asynchronous because it is performed in a thread separate
from the caller’s thread and proceeds concurrently with the caller’s execution. When called to
retrieve cached data, the cache manager first accesses the requested virtual page to satisfy the
request and then queues an additional I/O request to retrieve additional data to a system worker
thread. The worker thread then executes in the background, reading additional data in anticipation
of the caller’s next read request. The preread pages are faulted into memory while the program
continues executing so that when the caller requests the data it’s already in memory.
For applications that have no predictable read pattern, the FILE_FLAG_RANDOM_ACCESS
flag can be specified when the CreateFile function is called. This flag instructs the cache
manager not to attempt to predict where the application is reading next and thus disables
read-ahead. The flag also stops the cache manager from aggressively unmapping views of the file
as the file is accessed so as to minimize the mapping/unmapping activity for the file when the
application revisits portions of the file.
10.7.2 Write-Back Caching and Lazy Writing
The cache manager implements a write-back cache with lazy write. This means that data
written to files is first stored in memory in cache pages and then written to disk later. Thus, write
operations are allowed to accumulate for a short time and are then flushed to disk all at once,
reducing the overall number of disk I/O operations.
The cache manager must explicitly call the memory manager to flush cache pages because
otherwise the memory manager writes memory contents to disk only when demand for physical
memory exceeds supply, as is appropriate for volatile data. Cached file data, however, represents
nonvolatile disk data. If a process modifies cached data, the user expects the contents to be
reflected on disk in a timely manner.
The decision about how often to flush the cache is an important one. If the cache is flushed
too frequently, system performance will be slowed by unnecessary I/O. If the cache is flushed too
rarely, you risk losing modified file data in the cases of a system failure (a loss especially irritating
to users who know that they asked the application to save the changes) and running out of physical
memory (because it’s being used by an excess of modified pages).
To balance these concerns, once per second the cache manager’s lazy writer function
executes on a system worker thread and queues one-eighth of the dirty pages in the system cache
to be written to disk. If the rate at which dirty pages are being produced is greater than the amount
the lazy writer had determined it should write, the lazy writer writes an additional number of dirty
pages that it calculates are necessary to match that rate. System worker threads from the
systemwide critical worker thread pool actually perform the I/O operations.
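The one-eighth rule and the catch-up adjustment can be illustrated with a small simulation. This Python sketch is a rough model, not the lazy writer's actual calculation: the function names are hypothetical, each loop iteration stands for one second, and "matching the rate" is interpreted as writing at least as many pages as were dirtied in the last second.

```python
def pages_to_write(dirty_pages, produced_last_second):
    """Pages one lazy writer pass queues for writing in this simplified model."""
    base = dirty_pages // 8                          # one-eighth of the dirty pages
    extra = max(0, produced_last_second - base)      # catch up with the producer
    return min(dirty_pages, base + extra)


def simulate(initial_dirty, produced_per_second, seconds):
    """Return a list of (pages written, pages still dirty), one entry per second."""
    dirty = initial_dirty
    history = []
    for _ in range(seconds):
        dirty += produced_per_second
        written = pages_to_write(dirty, produced_per_second)
        dirty -= written
        history.append((written, dirty))
    return history


print(simulate(800, 0, 2))     # prints: [(100, 700), (87, 613)]
print(simulate(800, 300, 1))   # prints: [(300, 800)]
```

With no new writes the backlog decays gradually, and under a steady write load the pass size rises to the production rate so that the backlog stops growing.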
Note The cache manager provides a means for file system drivers to track when and how
much data has been written to a file. After the lazy writer flushes dirty pages to the disk, the cache
manager notifies the file system, instructing it to update its view of the valid data length for the
file. (The cache manager and file systems separately track the valid data length for a file in
memory.)
You can examine the activity of the lazy writer by examining the cache performance counters
or per-processor variables stored in the processor’s control block (KPRCB) listed in Table 10-9.
EXPERIMENT: Watching the Cache Manager in Action
In this experiment, we’ll use Process Monitor to view the underlying file system activity,
including cache manager read-ahead and write-behind, when Windows Explorer copies a large file
(in this example, a CD-ROM image) from one local directory to another. First, configure Process
Monitor’s filter to include the source and destination file paths, the Explorer.exe and System
processes, and the ReadFile and WriteFile operations. In this example, the c:\source.iso file was
copied to c:\programming\source.iso, so the filter is configured as follows:
You should see a Process Monitor trace like the one shown here after you copy the file:
The first few entries show the initial I/O processing performed by the copy engine and the
first cache manager operations. Here are some of the things that you can see:
■ The initial 1-MB cached read from Explorer at the first entry. The size of this read depends
on an internal matrix calculation based on the file size and can vary from 128 KB to 1 MB.
Because this file was large, the copy engine chose 1 MB.
■ The 1-MB read is followed by 16 64-KB noncached reads. Noncached reads typically
indicate activity due to page faults or cache manager access. A closer look at the stack trace for
these events, which you can see by double-clicking an entry and choosing the Stack tab, reveals
that indeed the CcCopyRead cache manager routine, which is called by the NTFS driver’s read
routine, causes the memory manager to fault the source data into physical memory:
■ After these 64-KB page fault I/Os, the cache manager’s read-ahead mechanism starts
reading the file, which includes the System process’s subsequent noncached 2-MB read at the
1-MB offset. Because of the file size and Explorer’s read I/O sizes, the cache manager chose 2
MB as the optimal read-ahead size.
The stack trace for one of the read-ahead operations, shown next, confirms that one of the
cache manager’s worker threads is performing the read-ahead.
After this point, Explorer’s 1-MB reads aren’t followed by 64-KB page faults, because the
read-ahead thread stays ahead of Explorer, prefetching the file data with its 2-MB noncached
reads. Eventually, after reading about 4 MB of the file, Explorer starts performing writes to the
destination file. These are sequential, cached 64-KB writes. After about 32 MB of reads, the first
WriteFile operation from the System process occurs, shown here:
The write operation’s stack trace, shown here, indicates that the memory manager’s mapped
page writer thread was actually responsible for the write:
This occurs because for the first couple of megabytes of data, the cache manager hadn’t
started performing write-behind, so the memory manager’s mapped page writer began flushing the
modified destination file data (see Chapter 9 for more information on the mapped page writer).
To get a clearer view of the cache manager operations, remove Explorer from the Process
Monitor’s filter so that only the System process operations are visible, as shown next.
With this view, it’s much easier to see the cache manager’s 16-MB write-behind operations
(the maximum write sizes are 1 MB on client versions of Windows and 32 MB on server versions;
this experiment was performed on a server system). The Time Of Day column shows that these
operations occur almost exactly 1 second apart. The stack trace for one of the write-behind
operations, shown here, verifies that a cache manager worker thread is performing write-behind:
As an added experiment, try repeating this process with a remote copy instead (from one
Windows system to another) and by copying files of varying sizes. You’ll notice some different
behaviors by the copy engine and the cache manager, both on the receiving and sending sides.
Disabling Lazy Writing for a File
If you create a temporary file by specifying the flag FILE_ATTRIBUTE_TEMPORARY in a
call to the Windows CreateFile function, the lazy writer won’t write dirty pages to the disk unless
there is a severe shortage of physical memory or the file is explicitly flushed. This characteristic of
the lazy writer improves system performance—the lazy writer doesn’t immediately write data to a
disk that might ultimately be discarded. Applications usually delete temporary files soon after
closing them.
Forcing the Cache to Write Through to Disk
Because some applications can’t tolerate even momentary delays between writing a file and
seeing the updates on disk, the cache manager also supports write-through caching on a per–file
object basis; changes are written to disk as soon as they’re made. To turn on write-through
caching, set the FILE_FLAG_WRITE_THROUGH flag in the call to the CreateFile function.
Alternatively, a thread can explicitly flush an open file, by using the Windows FlushFileBuffers
function, when it reaches a point at which the data needs to be written to disk. You can observe
cache flush operations that are the result of write-through I/O requests or explicit calls to
FlushFileBuffers via the performance counters or per-processor variables stored in the processor’s
control block (KPRCB) shown in Table 10-10.
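An explicit flush can be sketched from a high-level language. The Python example below is a portable illustration rather than a direct use of the Windows API: the function write_durably is hypothetical, and it relies on os.fsync, which the Python documentation describes as calling the C runtime's _commit function on Windows. That _commit in turn flushes the file through FlushFileBuffers is an assumption about the runtime, not something this text establishes.

```python
import os
import tempfile


def write_durably(path, payload):
    """Write payload to path and ask the OS to flush the file before returning."""
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()               # drain Python's own user-mode buffer first
        os.fsync(f.fileno())    # then ask the OS to flush its cached data for the file


with tempfile.TemporaryDirectory() as directory:
    target = os.path.join(directory, "journal.bin")
    write_durably(target, b"committed record")
    with open(target, "rb") as f:
        print(f.read())         # prints: b'committed record'
```

The flush is requested only at the point where the data must be on disk, so ordinary writes before that point can still benefit from lazy writing.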
Flushing Mapped Files
If the lazy writer must write data to disk from a view that’s also mapped into another
process’s address space, the situation becomes a little more complicated, because the cache
manager will only know about the pages it has modified. (Pages modified by another process are
known only to that process because the modified bit in the page table entries for modified pages is
kept in the process private page tables.) To address this situation, the memory manager informs
the cache manager when a user maps a file. When such a file is flushed in the cache (for example,
as a result of a call to the Windows FlushFileBuffers function), the cache manager writes the dirty
pages in the cache and then checks to see whether the file is also mapped by another process.
When the cache manager sees that the file is, the cache manager then flushes the entire view of the
section to write out pages that the second process might have modified. If a user maps a view of a
file that is also open in the cache, when the view is unmapped, the modified pages are marked as
dirty so that when the lazy writer thread later flushes the view, those dirty pages will be written to
disk. This procedure works as long as the sequence occurs in the following order:
1. A user unmaps the view.
2. A process flushes file buffers.
If this sequence isn’t followed, you can’t predict which pages will be written to disk.
EXPERIMENT: Watching Cache Flushes
You can see the cache manager map views into the system cache and flush pages to disk by
running the Reliability and Performance Monitor and adding the Data Maps/sec and Lazy Write
Flushes/sec counters and then copying a large file from one location to another. The generally
higher line in the following screen shot shows Data Maps/sec and the other shows Lazy Write
Flushes/sec.
10.7.3 Write Throttling
The file system and cache manager must determine whether a cached write request will affect
system performance and then schedule any delayed writes. First the file system asks the cache
manager whether a certain number of bytes can be written right now without hurting performance
by using the CcCanIWrite function and blocking that write if necessary. For asynchronous I/O, the
file system sets up a callback with the cache manager for automatically writing the bytes when
writes are again permitted by calling CcDeferWrite. Otherwise, it just blocks and waits on
CcCanIWrite to continue. Once it’s notified of an impending write operation, the cache manager
determines how many dirty pages are in the cache and how much physical memory is available. If
few physical pages are free, the cache manager momentarily blocks the file system thread that’s
requesting to write data to the cache. The cache manager’s lazy writer flushes some of the dirty
pages to disk and then allows the blocked file system thread to continue. This write throttling
prevents system performance from degrading because of a lack of memory when a file system or
network server issues a large write operation.
Note The effects of write throttling are global to the system because the resource it is based
on, available physical memory, is global to the system. This means that if heavy write activity to a
slow device triggers write throttling, writes to other devices will also be throttled.
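The check-then-defer pattern described above can be modeled as follows. This Python sketch is a toy, not the cache manager's interface: the class ToyWriteThrottle and its methods are hypothetical and do not mirror the real signatures of CcCanIWrite or CcDeferWrite, and the two limits are plain constructor arguments rather than values computed at system initialization.

```python
from collections import deque


class ToyWriteThrottle:
    def __init__(self, dirty_page_threshold, min_available_pages):
        self.dirty_page_threshold = dirty_page_threshold
        self.min_available_pages = min_available_pages
        self.dirty_pages = 0
        self.deferred = deque()   # (pages, callback) pairs waiting for permission

    def can_write(self, pages, available_pages):
        """True if the write fits under the dirty threshold and memory isn't scarce."""
        return (self.dirty_pages + pages <= self.dirty_page_threshold
                and available_pages >= self.min_available_pages)

    def write(self, pages, available_pages, callback):
        """Run callback now if allowed; otherwise defer it. Returns True if run now."""
        if self.can_write(pages, available_pages):
            self.dirty_pages += pages
            callback()
            return True
        self.deferred.append((pages, callback))
        return False

    def lazy_writer_flushed(self, pages_flushed, available_pages):
        """Account for flushed pages, then release deferred writes in order."""
        self.dirty_pages = max(0, self.dirty_pages - pages_flushed)
        while self.deferred and self.can_write(self.deferred[0][0], available_pages):
            pages, callback = self.deferred.popleft()
            self.dirty_pages += pages
            callback()


log = []
throttle = ToyWriteThrottle(dirty_page_threshold=100, min_available_pages=10)
throttle.write(80, 50, lambda: log.append("first"))    # runs immediately
throttle.write(40, 50, lambda: log.append("second"))   # deferred: 120 > 100
throttle.lazy_writer_flushed(30, 50)                   # frees room; "second" runs
print(log)                                             # prints: ['first', 'second']
```

Because available memory is a single system-wide quantity, one throttle state serves every writer in this model, which is why heavy writes to one device can delay writes to another.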
The dirty page threshold is the number of pages that the system cache will allow to be dirty
before throttling cached writers. This value is computed at system initialization time and depends
on the product type (client or server). Two other values are also computed—the top dirty page
threshold and the bottom dirty page threshold. Depending on memory consumption and the rate at
which dirty pages are being processed, the lazy writer calls the internal function CcAdjustThrottle,
which, on server systems, performs dynamic adjustment of the current threshold based on the
calculated top and bottom values. This adjustment is made to preserve the read cache in cases of a
heavy write load that will inevitably overrun the cache and become throttled. Table 10-11 lists the
algorithms used to calculate the dirty page thresholds.
Write throttling is also useful for network redirectors transmitting data over slow
communication lines. For example, suppose a local process writes a large amount of data to a
remote file system over a 9600-baud line. The data isn’t written to the remote disk until the cache
manager’s lazy writer flushes the cache. If the redirector has accumulated lots of dirty pages that
are flushed to disk at once, the recipient could receive a network timeout before the data transfer
completes. By using the CcSetDirtyPageThreshold function, the cache manager allows network
redirectors to set a limit on the number of dirty cache pages they can tolerate (for each stream),
thus preventing this scenario. By limiting the number of dirty pages, the redirector ensures that a
cache flush operation won’t cause a network timeout.
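The interaction between the system-wide dirty page threshold and a per-stream limit can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Stream` class and `should_throttle` function are hypothetical names invented for illustration, and the real cache manager also weighs available memory and other factors that this model ignores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stream:
    """Hypothetical per-stream bookkeeping; not a real cache manager structure."""
    dirty_pages: int = 0
    # Limit a redirector might set through CcSetDirtyPageThreshold; None means no limit.
    dirty_page_threshold: Optional[int] = None


def should_throttle(total_dirty: int, global_threshold: int,
                    stream: Stream, pages_to_write: int) -> bool:
    """Return True if the write should wait for the lazy writer (toy model)."""
    # System-wide check against the dirty page threshold.
    if total_dirty + pages_to_write > global_threshold:
        return True
    # Per-stream check, applied only when a limit has been set.
    limit = stream.dirty_page_threshold
    if limit is not None and stream.dirty_pages + pages_to_write > limit:
        return True
    return False


# A stream on a slow link with a small per-stream limit is throttled early,
# even though the system as a whole is far below its threshold.
slow_link = Stream(dirty_pages=60, dirty_page_threshold=64)
local_file = Stream(dirty_pages=60)
print(should_throttle(1_000, 425_694, slow_link, 16))   # True
print(should_throttle(1_000, 425_694, local_file, 16))  # False
```

In this model the per-stream limit keeps the amount of data that one flush must push over the slow link small, which is the effect the redirector is after.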
EXPERIMENT: Viewing the Write-Throttle Parameters
The !defwrites kernel debugger command dumps the values of the kernel variables the cache
manager uses when determining whether it should throttle write operations, including the
number of dirty pages in the file cache (CcTotalDirtyPages):
lkd> !defwrites
*** Cache Write Throttle Analysis ***
CcTotalDirtyPages: 7 ( 28 Kb)
CcDirtyPageThreshold: 425694 ( 1702776 Kb)
MmAvailablePages: 572387 ( 2289548 Kb)
MmThrottleTop: 450 ( 1800 Kb)
MmThrottleBottom: 80 ( 320 Kb)
MmModifiedPageListHead.Total: 10477 ( 41908 Kb)
Write throttles not engaged
This output shows that the number of dirty pages is far from the number that triggers write
throttling (CcDirtyPageThreshold), so the system has not engaged in any write throttling.
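As a rough cross-check of the listing, the following sketch parses the counters from the output and compares the dirty page count with the threshold. The 4-KB page size is an assumption inferred from the ratio between the page counts and the Kb column, and `parse_defwrites` is a hypothetical helper, not part of any debugger API.

```python
import re

PAGE_SIZE_KB = 4  # assumption: 4-KB pages, consistent with the Kb column

SAMPLE = """\
CcTotalDirtyPages: 7 ( 28 Kb)
CcDirtyPageThreshold: 425694 ( 1702776 Kb)
MmAvailablePages: 572387 ( 2289548 Kb)
MmThrottleTop: 450 ( 1800 Kb)
MmThrottleBottom: 80 ( 320 Kb)
MmModifiedPageListHead.Total: 10477 ( 41908 Kb)
"""

# Matches lines of the form "Name: <pages> ( <kilobytes> Kb)".
LINE_RE = re.compile(r"^\s*([\w.]+):\s+(\d+)\s+\(\s*(\d+)\s*Kb\)", re.MULTILINE)


def parse_defwrites(text):
    """Map each counter name to a (pages, kilobytes) tuple."""
    return {name: (int(pages), int(kb))
            for name, pages, kb in LINE_RE.findall(text)}


values = parse_defwrites(SAMPLE)

# Every Kb figure in the listing equals the page count times the page size.
for name, (pages, kb) in values.items():
    assert pages * PAGE_SIZE_KB == kb, name

dirty = values["CcTotalDirtyPages"][0]
threshold = values["CcDirtyPageThreshold"][0]
print("below threshold:", dirty < threshold)
```

The comparison mirrors only the one condition discussed in the text; the debugger's own "Write throttles not engaged" verdict may take the other counters into account as well.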
10.7.4 System Threads
As mentioned earlier, the cache manager performs lazy write and read-ahead I/O operations
by submitting requests to the common critical system worker thread pool. However, it does limit
the use of these threads to one less than the total number of critical system worker threads for
small and medium memory systems (two less than the total for large memory systems).
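The limit described above amounts to a small calculation, sketched here. The function name and the string labels for memory size are hypothetical; only the "one less" and "two less" rules come from the text.

```python
def max_cache_worker_threads(critical_workers: int, memory_size: str) -> int:
    """Upper bound on critical worker threads the cache manager will use.

    memory_size is one of "small", "medium", or "large" (illustrative labels).
    """
    if memory_size not in ("small", "medium", "large"):
        raise ValueError(f"unknown memory size: {memory_size!r}")
    # Large-memory systems hold back two threads; the others hold back one.
    reserved = 2 if memory_size == "large" else 1
    return max(critical_workers - reserved, 0)


print(max_cache_worker_threads(5, "medium"))  # 4
print(max_cache_worker_threads(10, "large"))  # 8
```

Holding back at least one thread leaves capacity in the shared pool for other components that queue critical work items.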
Internally, the cache manager organizes its work requests into four lists (though these are
serviced by the same set of executive worker threads):
■ The express queue is used for read-ahead operations.
■ The regular queue is used for lazy write scans (for dirty data to flush), write-behinds, and
lazy closes.
■ The fast teardown queue is used when the memory manager is waiting for the data section
owned by the cache manager to be freed so that the file can be opened with an image section
instead, which causes CcWriteBehind to flush the entire file and tear down the shared cache map.
■ The post tick queue is used for the cache manager to internally register for a notification
after each “tick” of the lazy writer thread—in other words, at the end of each pass.
To keep track of the work items the worker threads need to perform, the cache manager
creates its own internal per-processor look-aside lists: one fixed-length list of worker
queue item structures for each processor. (Look-aside lists are discussed in Chapter 9.) The
number of worker queue items depends on system size: 32 for small-memory systems, 64 for
medium-memory systems, 128 for large-memory client systems, and 256 for large-memory server
systems. For cross-processor performance, the cache manager also allocates a global look-aside
list at the same sizes as just described.
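The sizing rule for the worker queue item lists can be written as a small lookup. This is a sketch: the function name and its parameters are hypothetical, and only the four counts come from the text.

```python
def worker_queue_item_count(memory_size: str, is_server: bool) -> int:
    """Number of worker queue items per look-aside list (illustrative)."""
    if memory_size == "small":
        return 32
    if memory_size == "medium":
        return 64
    if memory_size == "large":
        # Only large-memory systems distinguish client from server.
        return 256 if is_server else 128
    raise ValueError(f"unknown memory size: {memory_size!r}")


for size, server in [("small", False), ("medium", True),
                     ("large", False), ("large", True)]:
    print(size, "server" if server else "client",
          worker_queue_item_count(size, server))
```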
10.8 Conclusion
The cache manager provides a high-speed, intelligent mechanism for reducing disk I/O and
increasing overall system throughput. By caching on the basis of virtual blocks, the cache manager
can perform intelligent read-ahead. By relying on the global memory manager’s mapped file
primitive to access file data, the cache manager can provide the special fast I/O mechanism to
reduce the CPU time required for read and write operations and also leave all matters related to
physical memory management to the single Windows global memory manager, thus reducing
code duplication and increasing efficiency.
11. File Systems
In this chapter, we present an overview of the file system formats supported by Windows.
We then describe the types of file system drivers and their basic operation, including how they
interact with other system components, such as the memory manager and the cache manager.
Following that is a description of how to use Process Monitor from Windows Sysinternals (at
www.microsoft.com/technet/sysinternals) to troubleshoot a wide variety of file system access
problems.
In the balance of the chapter, we first describe the Common Log File System (CLFS), a
transactional logging virtual file system implemented on the native Windows file system format,
NTFS. Then we focus on the on-disk layout of NTFS and its advanced features, such as
compression, recoverability, quotas, symbolic links, transactions (which use the services provided
by CLFS), and encryption.
To fully understand this chapter, you should be familiar with the terminology introduced in
Chapter 8, including the terms volume and partition. You’ll also need to be acquainted with these
additional terms:
■ Sectors are hardware-addressable blocks on a storage medium. Hard disks for x86 systems
almost always define a 512-byte sector size; however, Windows also supports large sector disks,
a newer technology that allows access to even larger disks. Because a sector is the smallest
unit the hardware can address, if the sector size is the standard 512 bytes and the operating
system wants to modify the 632nd byte on a disk, it must write a 512-byte block of data to the
second sector on the disk.
■ File system formats define the way that file data is stored on storage media, and they affect
a file system’s features. For example, a format that doesn’t allow user permissions to be associated
with files and directories can’t support security. A file system format can also impose limits on the
sizes of files and storage devices that the file system supports. Finally, some file system formats
efficiently implement support for either large or small files or for large or small disks. NTFS and
exFAT are examples of file system formats that offer different sets of features and usage
scenarios.
■ Clusters are the addressable blocks that many file system formats use. Cluster size is
always a multiple of the sector size, as shown in Figure 11-1. File system formats use clusters to
manage disk space more efficiently; a cluster size that is larger than the sector size divides a disk
into more manageable blocks. The potential trade-off of a larger cluster size is wasted disk space,
or internal fragmentation, that results when file sizes aren’t perfect multiples of cluster sizes.
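The sector and cluster arithmetic in these definitions can be checked with a few lines of code. This is a minimal sketch; the helper names are hypothetical, and the 4096-byte cluster size in the usage lines is just an example value.

```python
def sector_for_byte(byte_ordinal: int, sector_size: int = 512) -> int:
    """1-based number of the sector that holds the given 1-based byte."""
    if byte_ordinal < 1:
        raise ValueError("byte_ordinal is 1-based")
    return (byte_ordinal - 1) // sector_size + 1


def cluster_slack(file_size: int, cluster_size: int) -> int:
    """Bytes allocated but unused in a file's last cluster."""
    remainder = file_size % cluster_size
    return 0 if remainder == 0 else cluster_size - remainder


# The 632nd byte falls in the second 512-byte sector.
print(sector_for_byte(632))        # 2

# A 5000-byte file on 4096-byte clusters occupies two clusters (8192 bytes).
print(cluster_slack(5000, 4096))   # 3192
print(cluster_slack(8192, 4096))   # 0
```

The slack value is the internal fragmentation for a single file; larger clusters raise the worst case, which is the trade-off the text describes.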