VxWorks API Reference : OS Libraries
dcacheCbio - Disk Cache Driver
dcacheDevCreate( ) - Create a disk cache
dcacheDevDisable( ) - Disable the disk cache for this device
dcacheDevEnable( ) - Reenable the disk cache
dcacheDevTune( ) - modify tunable disk cache parameters
dcacheDevMemResize( ) - set a new size to a disk cache device
dcacheShow( ) - print information about disk cache
dcacheHashTest( ) - test hash table integrity
This module implements a disk cache mechanism via the CBIO API. This is intended for use by the VxWorks DOS file system, to store frequently used disk blocks in memory. The disk cache is unaware of the particular file system format on the disk, and handles the disk as a collection of blocks of a fixed size, typically the sector size of 512 bytes.
The disk cache may be used with SCSI, IDE, ATA, floppy, or any other type of disk controller. The underlying device driver may comply either with the CBIO API or with the older block device API.
This library interfaces to device drivers implementing the block device API via the basic CBIO BLK_DEV wrapper provided by cbioLib.
Because the disk cache complies with the CBIO programming interface on both its upper and lower layers, it is both an optional and a stackable module. It can be used or omitted depending on resources available and performance required.
The disk cache module implements the CBIO API, which is used by the file system module to access the disk blocks, or to access bytes within a particular disk block. This allows the file system to use the disk cache to store file data as well as Directory and File Allocation Table blocks, on a Most Recently Used basis, thus keeping a controllable subset of these disk structures in memory. This results in minimized memory requirements for the file system, while avoiding any significant performance degradation.
The size of the disk cache, and thus the memory consumption of the disk subsystem, is configured at initialization time (see dcacheDevCreate( )), allowing the user to trade off memory consumption against performance. Additional performance tuning capabilities are available through dcacheDevTune( ).
Briefly, here are the main techniques deployed by the disk cache:
- Least Recently Used block re-use policy
- Read-ahead
- Write-behind with sorting and grouping
- Hidden writes
- Disk cache bypass for large requests
- Background disk updating (flushing changes to disk) with an adjustable update period (ioctl flushes occur without delay)
Some of these techniques are discussed in more detail below; others are described in various professional and academic publications.
The disk cache is composed internally of a number of cache blocks, each the same size as the disk's physical block (sector). These cache blocks are maintained in a list in "Most Recently Used" order; that is, blocks which are used are moved to the top of this list. When a block needs to be relinquished and made available to contain a new disk block, the Least Recently Used block is used for this purpose.
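The list discipline described above can be sketched in portable C. This is only an illustration of the Most Recently Used ordering, not the data structure dcacheCbio actually uses; the names LruCache, lruInit, and lruAccess, and the four-block cache size, are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define CACHE_BLOCKS 4          /* hypothetical cache size, in blocks */
#define NO_BLOCK     (-1L)      /* marks an unused slot */

/* Slots are kept in MRU order: index 0 is the most recently used block. */
typedef struct {
    long     blockNum[CACHE_BLOCKS];
    unsigned hits;
    unsigned misses;
} LruCache;

void lruInit(LruCache *c)
{
    int i;

    for (i = 0; i < CACHE_BLOCKS; i++)
        c->blockNum[i] = NO_BLOCK;
    c->hits = 0;
    c->misses = 0;
}

/* Access a disk block. Returns 1 on a cache hit, 0 on a miss. */
int lruAccess(LruCache *c, long blockNum)
{
    int i;
    int pos = CACHE_BLOCKS - 1;   /* on a miss, reuse the LRU slot */
    int hit = 0;

    assert(blockNum >= 0);
    for (i = 0; i < CACHE_BLOCKS; i++) {
        if (c->blockNum[i] == blockNum) {
            pos = i;
            hit = 1;
            break;
        }
    }
    /* Shift entries 0..pos-1 down by one, then put the block on top. */
    memmove(&c->blockNum[1], &c->blockNum[0], (size_t)pos * sizeof(long));
    c->blockNum[0] = blockNum;
    if (hit)
        c->hits++;
    else
        c->misses++;
    return hit;
}
```

A linear array keeps the sketch short. The hits and misses counters are the raw material for a hit ratio of the kind that dcacheShow( ) reports.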
In addition to the regular cache blocks, some of the memory allocated for cache is set aside for a "big buffer", which may range from 1/4 of the overall cache size up to 64KB. This buffer is used for:
- Combining cache blocks with adjacent disk block numbers, in order to write them to disk in groups, and save on latency and overhead
- Reading ahead a group of blocks, and then converting them to normal cache blocks
Because there is significant overhead involved in accessing the disk drive, read-ahead improves performance significantly by reading groups of blocks at once.
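The grouping of adjacent modified blocks can be illustrated with a short sketch. It assumes that the dirty block numbers are distinct and that a group may hold at most maxGroup blocks (for instance 128, the number of 512-byte blocks that fit in a 64KB buffer). The function names are hypothetical, and the real module may group blocks differently.

```c
#include <assert.h>
#include <stdlib.h>

/* qsort comparator for ascending block numbers. */
static int cmpBlock(const void *a, const void *b)
{
    long x = *(const long *)a;
    long y = *(const long *)b;

    return (x > y) - (x < y);
}

/*
 * Sort the dirty block numbers in place and return how many write
 * requests are needed when each run of consecutive block numbers is
 * written as one group of at most maxGroup blocks.
 */
int countWriteGroups(long *dirty, int n, int maxGroup)
{
    int i;
    int groups = 0;
    int runLen = 0;

    assert(maxGroup > 0);
    qsort(dirty, (size_t)n, sizeof(long), cmpBlock);
    for (i = 0; i < n; i++) {
        if (runLen > 0 && dirty[i] == dirty[i - 1] + 1 && runLen < maxGroup) {
            runLen++;            /* extends the current group */
        } else {
            groups++;            /* starts a new group */
            runLen = 1;
        }
    }
    return groups;
}
```

For the dirty blocks 7, 3, 5, 4, 20, 6, 21, sorting yields the runs 3..7 and 20..21, so two grouped writes replace seven single-block writes.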
Certain operational parameters that control the disk cache are tunable. A number of preset parameter sets are provided, selected according to the size of the cache. These should suffice for most purposes, but under certain types of workload it may be desirable to tune these parameters to better suit the particular workload patterns.
See dcacheDevTune( ) for a description of the tunable parameters. It is recommended to call dcacheShow( ) after calling dcacheDevTune( ) in order to verify that the parameters were set as requested, and to inspect the cache statistics, which may change dramatically. Note that the hit ratio is a principal indicator of cache efficiency, and should be inspected during such tuning.
A dedicated task is created to update the disk with blocks that have been modified in the cache. The time period between updates is controlled with the tunable parameter syncInterval. The task's priority should be set above the priority of any CPU-bound tasks, so as to ensure that it can wake up frequently enough to keep the disk synchronized with the cache. There is only one such task for all cache devices configured. The task name is tDcacheUpd.
The updating task also has the responsibility to invalidate disk cache blocks for removable devices which have not been used for 2 seconds or more.
There are a few global variables which control the parameters of this task, namely:
- dcacheUpdTaskPriority
- controls the default priority of the update task, and is set by default to 250.
- dcacheUpdTaskStack
- is used to set the update task stack size.
- dcacheUpdTaskOptions
- controls the task options for the update task.
All the above global parameters must be set prior to calling dcacheDevCreate( ) for the first time, with the exception of dcacheUpdTaskPriority, which may be modified at run time and takes effect almost immediately. Note that this priority is not entirely fixed: when critical disk operations are performed and the FIOFLUSH ioctl is called, the calling task temporarily lends its priority to the update task, to ensure that the flushing operation completes.
For removable devices, disk cache provides these additional features:
- disk updating
- is performed such that modified blocks will be written to disk within one second, so as to minimize the risk of losing data in case of a failure or disk removal.
- error handling
- includes a test for disk removal, so that if a disk is removed from the drive while an I/O operation is in progress, the disk removal event will be set immediately.
- disk signature
- which is a checksum of the disk's boot block, is maintained by the cache control structure, and it will be verified against the disk if it was idle for 2 seconds or more. Hence if during that idle time a disk was replaced, the change will be detected on the next disk access, and the condition will be flagged to the file system.
- NOTE
- It is very important that every removable disk has a unique volume label or volume serial number, both of which are stored in the disk's boot sector during formatting. Swapping disks which have identical boot sectors may result in failure to detect the change, leading to unpredictable behavior and possibly file system corruption.
Most Recently Used (MRU) disk blocks are stored in a collection of memory buffers called the disk cache. The purpose of the disk cache is to reduce the number of disk accesses and to accelerate disk read and write operations, by means of the following techniques:
- Most Recently Used blocks are stored in RAM, which results in the most frequently accessed data being retrieved from memory rather than from disk.
- Reading data from disk is performed in large units, relying on the read-ahead feature, one of the disk cache's tunable parameters.
- Write operations are optimized because they occur to memory first. Updating the disk then happens in an orderly manner, by delayed write, another tunable parameter.
Overall, the main performance advantage arises from a dramatic reduction in the amount of time spent by the disk drive seeking, thus maximizing the time available for the disk to read and write actual data. In other words, you get efficient use of the disk drive's available throughput. The disk cache offers a number of operational parameters that can be tuned by the user to suit a particular file system workload pattern, for example, delayed write, read ahead, and bypass threshold.
The technique of delaying writes to disk means that if the system is turned off unexpectedly, updates that have not yet been written to the disk are lost. To minimize the effect of a possible crash, the disk cache periodically updates the disk. Modified blocks of data are not kept in memory for more than a specified period of time. With a small update period, the worst-case loss of data from a crash is limited to the changes made during that period. For example, it is assumed that an update period of 2 seconds is sufficiently large to effectively optimize disk writes, yet small enough to make the potential loss of data a reasonably minor concern. It is possible to set the update period to 0, in which case all updates are flushed to disk immediately. This is essentially the equivalent of using the DOS_OPT_AUTOSYNC option in earlier dosFsLib implementations.
The disk cache allows you to trade disk performance against memory consumption: the more memory allocated to the disk cache, the higher the "hit ratio" observed, which means increasingly better performance of file system operations.
Another tunable parameter is the bypass threshold, which defines how much data constitutes a request large enough to justify bypassing the disk cache. When the application makes sufficiently large read or write requests, the disk cache is circumvented and data is transferred directly between the disk controller and the user data buffer. The use of bypassing, in conjunction with support for contiguous file allocation and access (via the FIOCONTIG ioctl( ) command and the DOS_O_CONTIG open( ) flag), should provide performance equivalent to that offered by the raw file system (rawFs).
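The two decisions described above, when a request goes around the cache and when a modified block is due for writing, can be reduced to two comparisons. The sketch below is a model only. The CachePolicy structure and both function names are hypothetical, and the use of "greater than or equal" in each test is an assumption, since the exact comparisons are not stated here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical holder for the two tunables discussed in the text. */
typedef struct {
    unsigned bypassCount;    /* bypass threshold, in blocks */
    unsigned syncInterval;   /* update period, in seconds   */
} CachePolicy;

/* Returns 1 if a request of nBlocks should go around the cache. */
int shouldBypass(const CachePolicy *p, unsigned nBlocks)
{
    assert(p != NULL);
    return nBlocks >= p->bypassCount;   /* assumed comparison */
}

/*
 * Returns 1 if a block that has been dirty for ageSeconds is due to be
 * written. With syncInterval == 0 every modified block is due at once.
 */
int flushDue(const CachePolicy *p, unsigned ageSeconds)
{
    assert(p != NULL);
    return ageSeconds >= p->syncInterval;   /* assumed comparison */
}
```

Under this model, a syncInterval of 2 means a block modified just before a crash could be lost, while one modified two or more seconds earlier would already have been scheduled for writing.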
The dcache CBIO layer is intended to operate atop an entire fixed disk device. When using the dcache layer with the dpart CBIO partition layer, it is important to place the dcache layer below the partition layer.
For example:
    +----------+
    | dosFsLib |
    +----------+
         |
    +----------+
    |  dpart   |
    +----------+
         |
    +----------+
    |  dcache  |
    +----------+
         |
    +----------+
    | blkIoDev |
    +----------+

ENABLE/DISABLE THE DISK CACHE

The function dcacheDevEnable( ) is used to enable the disk cache. The function dcacheDevDisable( ) is used to disable the disk cache. When the disk cache is disabled, all I/O bypasses the cache layer.
Each cache block can be in one of five different states at any time, and state transitions may occur only while the mutex is taken. The three basic states are:
- EMPTY
- a block does not contain any disk data
- CLEAN
- a block contains an unmodified copy of a certain disk block
- DIRTY
- a block contains a disk block which has been modified in memory.
There is also an UNSTABLE state, used only while the mutex is held, which indicates that a block is being modified in memory and its data is not valid. A block is never left in this state after the mutex is released.
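The named states can be modeled as a small state machine. The sketch below covers only the four states named in this section. The event names and the transition table are assumptions made for illustration; they are not taken from the dcacheCbio source.

```c
#include <assert.h>

typedef enum { CB_EMPTY, CB_CLEAN, CB_DIRTY, CB_UNSTABLE } CacheBlockState;

/* Hypothetical events that move a block between states. */
typedef enum { EV_FILL, EV_MODIFY, EV_FLUSH, EV_INVALIDATE } CacheBlockEvent;

/*
 * Return the state a block settles in after an event, as observed once
 * the mutex has been released. CB_UNSTABLE is rejected as an input
 * because it must never be visible outside the locked region.
 */
CacheBlockState cacheBlockNext(CacheBlockState s, CacheBlockEvent e)
{
    assert(s != CB_UNSTABLE);
    switch (e) {
    case EV_FILL:        /* disk data read into an empty block */
        return (s == CB_EMPTY) ? CB_CLEAN : s;
    case EV_MODIFY:      /* block contents changed in memory */
        return CB_DIRTY;
    case EV_FLUSH:       /* modified data written back to disk */
        return (s == CB_DIRTY) ? CB_CLEAN : s;
    case EV_INVALIDATE:  /* block contents discarded */
        return CB_EMPTY;
    }
    return s;
}
```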
Removable Device Support Details
It is worth noting that we do not trust the block driver's ability to set its readyChanged flag correctly. Some drivers set it without need; others fail to set it when a disk is in fact replaced. Hence we devised an independent approach to this issue: we assume that if a disk is replaced while the device is active, we will get an error, and we also assume that it takes at least 2 seconds to replace a disk. Hence, if the disk has been idle for more than 2 seconds, we check the checksum of its boot block against a previously registered signature.
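The idle-time check can be sketched as follows. The checksum shown (rotate left, then add) is an arbitrary stand-in, because the algorithm actually used for the boot block signature is not described here. The function names and the IDLE_CHECK_SECONDS constant are likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IDLE_CHECK_SECONDS 2   /* idle time after which the disk is re-verified */

/* Illustrative checksum over the boot block; not the real algorithm. */
uint32_t bootBlockSignature(const uint8_t *block, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(block != NULL);
    for (i = 0; i < len; i++)
        sum = ((sum << 1) | (sum >> 31)) + block[i];   /* rotate, then add */
    return sum;
}

/*
 * Returns 1 if the disk must be treated as changed: it was idle long
 * enough for a swap, and its boot block no longer matches the stored
 * signature. A busy disk is not re-read at all.
 */
int diskChanged(unsigned idleSeconds, uint32_t storedSig,
                const uint8_t *bootBlock, size_t len)
{
    if (idleSeconds < IDLE_CHECK_SECONDS)
        return 0;
    return bootBlockSignature(bootBlock, len) != storedSig;
}
```

Two disks whose boot blocks are byte-for-byte identical produce the same signature under any checksum, which is why a swap between such disks goes undetected.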
Issues to revisit or implement:
+ boot block number is hardcoded.
+ separate removable detection into a separate CBIO module below dcache
dcacheDevCreate( ) - Create a disk cache
CBIO_DEV_ID dcacheDevCreate
    (
    CBIO_DEV_ID subDev,    /* block device handle */
    char *      pRamAddr,  /* where it is in memory (NULL = KHEAP_ALLOC) */
    int         memSize,   /* amount of memory to use */
    char *      pDesc      /* device description string */
    )
This routine creates a CBIO layer disk data cache instance. The disk cache unit accesses the disk through the subordinate CBIO device driver, provided with the subDev argument.
A valid block device BLK_DEV handle may be provided instead of a CBIO handle, in which case it will be automatically converted into a CBIO device by using the wrapper functionality from cbioLib.
Memory to be used for caching disk data may be provided by the caller with pRamAddr; if pRamAddr is passed as NULL, the memory is allocated by dcacheDevCreate( ) from the common system memory pool. memSize is the amount of memory to use for disk caching; if 0 is passed, a default value is calculated based on available memory. pDesc is a string describing the device. It is used later by dcacheShow( ), and is useful when there are many cached disk devices.
A maximum of 16 disk cache devices are supported at this time.
RETURNS the disk cache device handle, or NULL if there is not enough memory to satisfy the request or the subDev handle is invalid.
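A possible call sequence is sketched below. The wrapper attachDiskCache, the 128KB cache size, and the build macro DCACHE_EXAMPLE_ON_TARGET are all hypothetical, and the header name dcacheCbio.h is an assumption. The stand-in definitions in the #else branch exist only so that the fragment compiles on a host without VxWorks; they do not reflect the real implementation.

```c
#include <assert.h>
#include <stddef.h>

#ifdef DCACHE_EXAMPLE_ON_TARGET      /* hypothetical build switch */
#include <vxWorks.h>
#include <dcacheCbio.h>              /* assumed header name */
#else
/* Host-side stand-ins (hypothetical), mirroring the prototype above. */
typedef void *CBIO_DEV_ID;
static CBIO_DEV_ID dcacheDevCreate(CBIO_DEV_ID subDev, char *pRamAddr,
                                   int memSize, char *pDesc)
{
    (void)pRamAddr;
    (void)memSize;
    (void)pDesc;
    return subDev;   /* pretend the cache handle equals the sub-device */
}
#endif

#define DISK_CACHE_BYTES (128 * 1024)   /* hypothetical cache size */

/*
 * Stack a disk cache on top of an existing CBIO or BLK_DEV handle.
 * Returns the cache handle, or NULL on failure.
 */
CBIO_DEV_ID attachDiskCache(CBIO_DEV_ID subDev)
{
    static char desc[] = "example disk";

    if (subDev == NULL)
        return NULL;
    /* NULL pRamAddr: let dcacheDevCreate( ) allocate the cache memory. */
    return dcacheDevCreate(subDev, NULL, DISK_CACHE_BYTES, desc);
}
```

The handle returned would then be passed to the next layer up, such as the partition layer or the file system, in place of the original device handle.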
dcacheDevDisable( ) - Disable the disk cache for this device
STATUS dcacheDevDisable
    (
    CBIO_DEV_ID dev    /* CBIO device handle */
    )
This function disables the cache by setting the bypass count to zero and storing the old value. If an old value has already been stored, the process is not repeated.
RETURNS OK if cache is successfully disabled or ERROR.
dcacheDevEnable( ) - Reenable the disk cache
STATUS dcacheDevEnable
    (
    CBIO_DEV_ID dev    /* CBIO device handle */
    )
This function re-enables the cache if we disabled it. If we did not disable it, then we cannot re-enable it.
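The save-and-restore behavior of the disable and enable routines can be modeled in a few lines. The CacheCtl structure, the use of -1 as a "nothing saved" marker, and the error return from an enable that was not preceded by a disable are assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int bypassCount;        /* current bypass threshold, in blocks     */
    int savedBypassCount;   /* value saved by disable, or -1 if none   */
} CacheCtl;

/* Returns 0 on success. A repeated disable keeps the first saved value. */
int cacheDisable(CacheCtl *c)
{
    assert(c != NULL);
    if (c->savedBypassCount < 0) {
        c->savedBypassCount = c->bypassCount;
        c->bypassCount = 0;            /* every request now bypasses */
    }
    return 0;
}

/* Returns 0 on success, -1 if the cache was not disabled beforehand. */
int cacheEnable(CacheCtl *c)
{
    assert(c != NULL);
    if (c->savedBypassCount < 0)
        return -1;
    c->bypassCount = c->savedBypassCount;
    c->savedBypassCount = -1;
    return 0;
}
```

Guarding the save step is what keeps a second disable call from overwriting the stored threshold with zero.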
RETURNS OK if cache is successfully enabled or ERROR.
dcacheDevTune( ) - modify tunable disk cache parameters
STATUS dcacheDevTune
    (
    CBIO_DEV_ID dev,           /* device handle */
    int         dirtyMax,      /* max # of dirty cache blocks allowed */
    int         bypassCount,   /* request size for bypassing cache */
    int         readAhead,     /* how many blocks to read ahead */
    int         syncInterval   /* how many seconds between disk updates */
    )
This function allows the user to tune some disk cache parameters to obtain better performance for a given application or workload pattern. These parameters are checked for sanity before being used, hence it is recommended to verify the actual parameters being set with dcacheShow( ).
Following is the description of each tunable parameter:
- bypassCount
- In order to achieve maximum performance, Disk Cache is bypassed for very large requests. This parameter sets the threshold number of blocks for bypassing the cache, resulting usually in the data being transferred by the low level driver directly to/from application data buffers (also known as cut-through DMA). Passing the value of 0 in this argument preserves the previous value of the associated parameter.
- syncInterval
- The Disk Cache provides a low priority task that periodically writes all modified blocks to the disk. This parameter controls the time between these updates, in seconds. The longer this period, the better the throughput that is likely to be achieved, at the risk of losing more data in the event of a failure. For removable devices this interval is fixed at 1 second. Setting this parameter to 0 results in immediate writes to disk when requested, minimizing the risk of data loss at the cost of somewhat degraded performance.
- readAhead
- In order to avoid accessing the disk in small units, the Disk Cache reads many contiguous blocks whenever a block which is absent from the cache is needed. Increasing this value increases read performance, but a value which is too large may cause frequently used blocks to be removed from the cache, resulting in a low Hit Ratio and an increased number of Seeks, which slows down performance dramatically. Passing the value of 0 in this argument preserves the previous value of the associated parameter.
- dirtyMax
- Routinely the Disk Cache keeps modified blocks in memory until it is specifically instructed to write these blocks to the disk, or until the specified time interval between disk updates has elapsed, or until the number of modified blocks is large enough to justify an update. Because the disk is updated in an ordered manner, and the blocks are written in groups when adjacent blocks have been modified, a larger dirtyMax parameter minimizes the number of Seek operations, but a value which is too large may decrease the Hit Ratio, thus degrading performance. Passing the value of 0 in this argument preserves the previous value of the associated parameter.
RETURNS OK, or ERROR if the device handle is invalid. A parameter value which is out of range is silently corrected.
SEE ALSO dcacheCbio, dcacheShow( )
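The "0 preserves the previous value" rule for dcacheDevTune( ) can be modeled as below. The TuneParams structure and tuneApply function are hypothetical. The sketch deliberately leaves out the range correction mentioned above, because the permitted ranges are not given here.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int dirtyMax;
    int bypassCount;
    int readAhead;
    int syncInterval;
} TuneParams;

/*
 * Apply new tuning values. For dirtyMax, bypassCount and readAhead a
 * value of 0 keeps the current setting. syncInterval is always taken
 * as given, because 0 is a meaningful setting for it (immediate writes).
 */
void tuneApply(TuneParams *cur, int dirtyMax, int bypassCount,
               int readAhead, int syncInterval)
{
    assert(cur != NULL);
    if (dirtyMax != 0)
        cur->dirtyMax = dirtyMax;
    if (bypassCount != 0)
        cur->bypassCount = bypassCount;
    if (readAhead != 0)
        cur->readAhead = readAhead;
    cur->syncInterval = syncInterval;
}
```

With this rule a caller can change a single parameter, for example raising only bypassCount, by passing 0 for the others, but must always restate the desired syncInterval.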
dcacheDevMemResize( ) - set a new size to a disk cache device
STATUS dcacheDevMemResize
    (
    CBIO_DEV_ID dev,       /* device handle */
    size_t      newSize    /* new cache size in bytes */
    )
This routine is used to resize the dcache layer. This routine is also useful after a disk change event, for example a PCMCIA disk swap. The routine pccardDosDevCreate( ) in pccardLib.c uses this routine for that function. This should be invoked each time a new disk is inserted on media where the device geometry could possibly change. This function will re-read all device geometry data from the block driver, carve out and initialize all cache descriptors and blocks.
RETURNS OK or ERROR if the device is invalid or if the device geometry is invalid (EINVAL) or if there is not enough memory to perform the operation.
dcacheShow( ) - print information about disk cache
void dcacheShow
    (
    CBIO_DEV_ID dev,       /* device handle */
    int         verbose    /* 1 - display state of each cache block */
    )
This routine displays various information regarding a disk cache, namely current disk parameters, cache size, tunable parameters and performance statistics. The information is displayed on the standard output.
The dev argument is the device handle. If it is NULL, all disk caches are displayed.
RETURNS N/A
dcacheHashTest( ) - test hash table integrity
void dcacheHashTest ( CBIO_DEV_ID dev )