eG Monitoring
 

Measures reported by VnxCacheTest

EMC VNX systems have two storage processors (SPs), usually suffixed “A” or “B” to tell them apart. The storage processor retrieves data from disk and writes data to disk on request. It also handles all RAID operations as well as read and write caching. The read cache uses a read-ahead mechanism that lets the storage system prefetch data from the disk, so that the data is already in the cache when the application needs it. The write cache buffers and optimizes writes by absorbing peak loads, combining small writes, and eliminating rewrites. The read and write caches and the cache pages need to be sized adequately for the storage system to perform optimally. Inadequate sizing may result in poor cache hit ratios, a high rate of direct disk accesses, and significant degradation in storage system performance. To avert such problems, it is good practice to run the VnxCacheTest periodically.

This test continuously monitors the current state, size, and usage of the read and write caches of each storage processor of the EMC VNX storage system, and proactively alerts administrators to an abnormal state, ineffective usage, and/or insufficient size of the caches. Administrators can then initiate remedial measures pre-emptively, so that problems are resolved before storage system performance is impacted.

The measures made by this test are as follows:

Measurement Description Measurement Unit Interpretation
Read_hit_ratio Indicates the percentage of read requests to this LUN that were serviced by the cache. Percent Ideally, the value of this measure should be high. A low value indicates that many read requests are serviced by direct disk accesses, which is a more expensive operation in terms of processing overheads.
Write_hit_ratio Indicates the percentage of write requests to this LUN that were serviced by the cache. Percent Ideally, the value of this measure should be high. A low value indicates that many write requests are serviced by direct disk accesses, which is a more expensive operation in terms of processing overheads.
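As an illustration of what a hit ratio expresses, the sketch below derives the percentage from two raw counters. The function and parameter names are hypothetical and are not taken from the test itself; how the test obtains these figures from the array is not described here.

```python
def hit_ratio(cache_hits: int, total_requests: int) -> float:
    """Return the percentage of requests that were serviced from cache.

    Both arguments are hypothetical counters sampled over the same interval.
    """
    if total_requests <= 0:
        # No traffic in the interval, so there is nothing to measure.
        return 0.0
    return 100.0 * cache_hits / total_requests


# 850 of 1000 reads found their data in cache; the remaining 150 went to disk.
print(hit_ratio(850, 1000))
```

A low result means most requests fall through to the disks, which is the condition the two hit-ratio measures are meant to expose.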
Dirty_cache_pages Indicates the number of dirty cache pages. Number These are pages in the write cache that have received new data from hosts but have not yet been flushed to disk. A high value (i.e., between 60% and 80% of the write cache) is good, as it increases the chance of a read being served from cache or of additional writes to the same block of data being absorbed by the cache. A very high value (i.e., one equal to or close to the total number of pages in the write cache) is a sign of bad health, as it indicates that the write cache is over-stressed.
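The interpretation of the dirty page count can be sketched as a simple classification. This is a minimal sketch, not the test's own alerting logic: the function name, the labels, and the 95% cutoff used for "close to the total" are assumptions chosen for illustration; only the 60% to 80% band comes from the guidance above.

```python
def classify_dirty_pages(dirty_pages: int, write_cache_pages: int) -> str:
    """Classify write cache pressure from the share of dirty pages."""
    if write_cache_pages <= 0:
        raise ValueError("write_cache_pages must be positive")
    pct = 100.0 * dirty_pages / write_cache_pages
    if pct >= 95.0:  # assumed cutoff for "equal to or close to the total"
        return "over-stressed"
    if pct > 80.0:   # above the good band, but not yet near the total
        return "elevated"
    if pct >= 60.0:  # the 60-80% band described as good
        return "healthy"
    return "low"


print(classify_dirty_pages(700, 1000))
print(classify_dirty_pages(990, 1000))
```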
Dirty_cache_pages_owned Indicates the number of cache pages owned. Number  
SPA_read_cache_state Indicates the current state of the read cache for Storage Processor (SP) A.   If the read cache of SP A is enabled, this measure reports the value Enabled. Otherwise, it reports the value Disabled.

The numeric values that correspond to each of the states discussed above are available in the table below:

Numeric Value Measure Value
1 Enabled
0 Disabled

Note:

By default, this measure reports the above-mentioned Measure Values to indicate the status of the read cache. The graph of this measure, however, represents the cache status using the numeric equivalents, 0 or 1.
SPA_write_cache_state Indicates the current state of the write cache for Storage Processor (SP) A.   If the write cache of SP A is enabled, this measure reports the value Enabled. Otherwise, it reports the value Disabled.

The numeric values that correspond to each of the states discussed above are available in the table below:

Numeric Value Measure Value
1 Enabled
0 Disabled

Note:

By default, this measure reports the above-mentioned Measure Values to indicate the status of the write cache. The graph of this measure, however, represents the cache status using the numeric equivalents, 0 or 1.
SPA_cache_pages Indicates the total number of pages in the cache of Storage Processor A. Number For best performance, each Storage Processor (SP) should have the maximum amount of its memory in cache and should use the default settings for the cache properties. Therefore, ideally the number of memory pages in the cache should be high, as otherwise, storage system performance will suffer.
SPA_read_cache_size Indicates the current size of the read cache of Storage Processor A. MB The read cache holds data that is expected to be accessed in the near future. If a request arrives for data that is in the cache, the request can be serviced from the cache faster than from the disks. Each request satisfied from cache eliminates the need for a disk access, reducing disk load. Typically, it is good practice to set the read cache to roughly 10% of available cache; 200 MB is the recommended minimum, and 1024 MB is the recommended maximum. For block-only VNX systems, the minimum can be set to 100 MB.

The initial read cache settings that EMC recommends for the different VNX models are listed in the table below:

EMC VNX Model Initial Read Cache Setting (in MB)
VNX5100 100
VNX5300 400
VNX5500 700
VNX5700 1024

If the workload exhibits “locality of reference” behavior, where a relatively small set of data is accessed frequently and repeatedly, the read cache can improve performance. In read-intensive environments, where more than 70 percent of all requests are reads, the read cache should be large enough to accommodate the dataset that is most frequently accessed. For sequential reads from a LUN, data that is expected to be accessed by subsequent read requests is read (prefetched) into the cache before being requested. Therefore, for optimal performance, the read cache should be large enough to accommodate prefetched data for sequential reads from each LUN. An improperly sized read cache can increase direct disk reads and hence adversely impact storage system performance.
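The sizing rule of thumb for the read cache (roughly 10% of available cache, bounded by the recommended minimum and maximum) can be sketched as follows. The function name and its parameters are hypothetical; the numbers are the ones quoted in the guidance above, and the result is only a starting point to compare against the per-model table, not a definitive setting.

```python
# Bounds quoted in the sizing guidance, in MB.
MIN_READ_CACHE_MB = 200
MIN_READ_CACHE_BLOCK_ONLY_MB = 100
MAX_READ_CACHE_MB = 1024


def suggested_read_cache_mb(available_cache_mb: int, block_only: bool = False) -> int:
    """Return about 10% of available cache, clamped to the recommended range."""
    minimum = MIN_READ_CACHE_BLOCK_ONLY_MB if block_only else MIN_READ_CACHE_MB
    target = round(available_cache_mb / 10)
    return max(minimum, min(MAX_READ_CACHE_MB, target))


print(suggested_read_cache_mb(4000))                   # 10% falls inside the range
print(suggested_read_cache_mb(500))                    # raised to the minimum
print(suggested_read_cache_mb(500, block_only=True))   # block-only minimum applies
print(suggested_read_cache_mb(20000))                  # capped at the maximum
```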
SPA_write_cache_size Indicates the current size of the write cache of Storage Processor A. MB The write cache serves as a temporary buffer where data is held before it is written to the disks. Cache writes are far faster than disk writes. Write-cached data is also consolidated into larger I/Os when possible and written to the disks more efficiently. (This reduces the expensive small writes in the case of RAID 5 LUNs.) In addition, where data is modified frequently, the data is overwritten in the cache and written to the disks only once for several updates, which reduces disk load. Consequently, the write cache absorbs write data during heavy load periods and writes it to the disks, in an optimal fashion, during light load periods. However, if the amount of write data during an I/O burst exceeds the write cache size, the cache fills, and subsequent requests must wait for cached data to be flushed and for cache pages to become available for new data. It is hence imperative that you size the write cache correctly and set the cache watermarks appropriately. Cache watermarks control the flushing behavior of the write cache. A few recommendations in this regard are given below:

  • Start with a low watermark of 60% and a high watermark of 80%. This is suitable for the majority of workloads.
  • If frequent forced flushing occurs, reduce watermark values.
  • Maintain a difference of about 20% between the low and high watermarks.
  • Avoid drastic changes to these values unless advised by EMC Support.
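The watermark recommendations above can be sketched as a small sanity check. This is an illustrative sketch only: the function names, the 5-point tolerance around the "about 20%" gap, and the 5-point adjustment step are assumptions, and nothing here reads or changes settings on a real array.

```python
from typing import List, Tuple

RECOMMENDED_GAP_PCT = 20  # "about 20%" between the low and high watermarks


def check_watermarks(low_pct: int, high_pct: int, gap_tolerance: int = 5) -> List[str]:
    """Return warnings for a low/high watermark pair; an empty list means none."""
    if not 0 <= low_pct < high_pct <= 100:
        return ["low watermark must be below high watermark, both within 0-100"]
    warnings = []
    gap = high_pct - low_pct
    if abs(gap - RECOMMENDED_GAP_PCT) > gap_tolerance:  # assumed tolerance
        warnings.append(f"gap of {gap}% is far from the recommended 20%")
    return warnings


def lower_watermarks(low_pct: int, high_pct: int, step: int = 5) -> Tuple[int, int]:
    """Reduce both watermarks by a small step while preserving their gap."""
    new_low = max(0, low_pct - step)
    return new_low, new_low + (high_pct - low_pct)


print(check_watermarks(60, 80))   # recommended starting point
print(lower_watermarks(60, 80))   # a modest reduction if forced flushing is frequent
```

The small default step reflects the advice to avoid drastic changes; larger adjustments are best made with guidance from EMC Support.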
SPA_free_mem_size Indicates the amount of physical memory of storage processor A that is currently unused. MB  
SPA_sys_buf Indicates the size of the system buffer of storage processor A. MB  
SPA_phy_sys_mem Indicates the total physical memory of storage processor A. MB  
SPB_read_cache_state Indicates the current state of the read cache of Storage Processor (SP) B.   If the read cache of SP B is enabled, this measure reports the value Enabled. Otherwise, it reports the value Disabled.

The numeric values that correspond to each of the states discussed above are available in the table below:

Numeric Value State
1 Enabled
0 Disabled

Note:

By default, this measure reports the above-mentioned States to indicate the status of the read cache. The graph of this measure, however, represents the cache status using the numeric equivalents, 0 or 1.
SPB_write_cache_state Indicates the current state of the write cache of Storage Processor (SP) B.   If the write cache of SP B is enabled, this measure reports the value Enabled. Otherwise, it reports the value Disabled.

The numeric values that correspond to each of the states discussed above are available in the table below:

Numeric Value State
1 Enabled
0 Disabled

Note:

By default, this measure reports the above-mentioned States to indicate the status of the write cache. The graph of this measure, however, represents the cache status using the numeric equivalents, 0 or 1.
SPB_cache_pages Indicates the total number of pages in the cache of Storage Processor B. Number For best performance, each Storage Processor (SP) should have the maximum amount of its memory in cache and should use the default settings for the cache properties. Therefore, ideally the number of memory pages in the cache should be high.
SPB_read_cache_size Indicates the current size of the read cache of Storage Processor B. MB The read cache holds data that is expected to be accessed in the near future. If a request arrives for data that is in the cache, the request can be serviced from the cache faster than from the disks. Each request satisfied from cache eliminates the need for a disk access, reducing disk load. Typically, it is good practice to set the read cache to roughly 10% of available cache; 200 MB is the recommended minimum, and 1024 MB is the recommended maximum. For block-only VNX systems, the minimum can be set to 100 MB.

The initial read cache settings that EMC recommends for the different VNX models are listed in the table below:

EMC VNX Model Initial Read Cache Setting (in MB)
VNX5100 100
VNX5300 400
VNX5500 700
VNX5700 1024

If the workload exhibits “locality of reference” behavior, where a relatively small set of data is accessed frequently and repeatedly, the read cache can improve performance. In read-intensive environments, where more than 70 percent of all requests are reads, the read cache should be large enough to accommodate the dataset that is most frequently accessed. For sequential reads from a LUN, data that is expected to be accessed by subsequent read requests is read (prefetched) into the cache before being requested. Therefore, for optimal performance, the read cache should be large enough to accommodate prefetched data for sequential reads from each LUN. An improperly sized read cache can increase direct disk reads and hence adversely impact storage system performance.

Since the read cache is not mirrored, allocate the same amount of read cache to both storage processors (A and B) so that the available storage processor memory is used efficiently.

SPB_write_cache_size Indicates the current size of the write cache of Storage Processor B. MB The write cache serves as a temporary buffer where data is held before it is written to the disks. Cache writes are far faster than disk writes. Write-cached data is also consolidated into larger I/Os when possible and written to the disks more efficiently. (This reduces the expensive small writes in the case of RAID 5 LUNs.) In addition, where data is modified frequently, the data is overwritten in the cache and written to the disks only once for several updates, which reduces disk load. Consequently, the write cache absorbs write data during heavy load periods and writes it to the disks, in an optimal fashion, during light load periods. However, if the amount of write data during an I/O burst exceeds the write cache size, the cache fills, and subsequent requests must wait for cached data to be flushed and for cache pages to become available for new data. It is hence imperative that you size the write cache correctly and set the cache watermarks appropriately. Cache watermarks control the flushing behavior of the write cache. A few recommendations in this regard are given below:

  • Start with a low watermark of 60% and a high watermark of 80%. This is suitable for the majority of workloads.
  • If frequent forced flushing occurs, reduce watermark values.
  • Maintain a difference of about 20% between the low and high watermarks.
  • Avoid drastic changes to these values unless advised by EMC Support.

Since the write cache is mirrored, the write cache allocation applies to both storage processors (A and B).

SPB_free_mem_size Indicates the amount of physical memory of storage processor B that is currently unused. MB  
SPB_sys_buf Indicates the size of the system buffer of storage processor B. MB  
SPB_phy_sys_mem Indicates the total physical memory of storage processor B. MB  
SP_read_cache Indicates whether the read cache of the storage processor is enabled or not.   If the read cache of the storage processor (SP) is enabled, this measure reports the value Enabled. Otherwise, it reports the value Disabled.

The numeric values that correspond to each of the states discussed above are available in the table below:

Numeric Value Measure Value
1 Enabled
0 Disabled

Note:

By default, this measure reports the above-mentioned Measure Values to indicate the status of the read cache. The graph of this measure, however, represents the cache status using the numeric equivalents, 0 or 1.
SP_write_cache Indicates whether the write cache of the storage processor is enabled or not.   If the write cache of the storage processor (SP) is enabled, this measure reports the value Enabled. Otherwise, it reports the value Disabled.

The numeric values that correspond to each of the states discussed above are available in the table below:

Numeric Value Measure Value
1 Enabled
0 Disabled

Note:

By default, this measure reports the above-mentioned Measure Values to indicate the status of the write cache. The graph of this measure, however, represents the cache status using the numeric equivalents, 0 or 1.
Cache_page_size Indicates the number of pages currently in cache. Number To service I/O requests faster, to reduce disk overloads, and to eliminate disk abuse, the read/write caches should be sized with sufficient memory pages.

Cache page size determines the minimum amount of storage processor memory used to service a single I/O operation. Given below are some guidelines to right-size your cache:

  • The default of 8 KB is fine for the majority of workloads.

    • In mixed environments, this default provides a good balance.
    • Leave it at 8 KB for unified (Block and File) configurations and for File-only configurations.
  • Increase it to the maximum of 16 KB if large-block I/O is predominant in the environment.
  • With predominant small-block access, like 2 KB and 4 KB database environments, match cache page size to the predominant I/O size.
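The page size guidelines above can be sketched as a simple decision function. This is a hedged illustration rather than an EMC-provided rule: the function name is hypothetical, treating I/O of 16 KB or more as "large-block" is an assumption, and the `unified_or_file` flag reflects one reading of the guideline about unified and File configurations.

```python
DEFAULT_PAGE_SIZE_KB = 8
MAX_PAGE_SIZE_KB = 16


def suggested_page_size_kb(predominant_io_kb: float, unified_or_file: bool = False) -> int:
    """Suggest a cache page size (KB) from the predominant I/O size (KB)."""
    if unified_or_file:
        # Guideline: leave the default in place for these configurations.
        return DEFAULT_PAGE_SIZE_KB
    if predominant_io_kb >= MAX_PAGE_SIZE_KB:  # assumed "large-block" threshold
        return MAX_PAGE_SIZE_KB
    if predominant_io_kb <= 2:
        return 2  # match small-block access, e.g. 2 KB database I/O
    if predominant_io_kb <= 4:
        return 4  # match small-block access, e.g. 4 KB database I/O
    return DEFAULT_PAGE_SIZE_KB


print(suggested_page_size_kb(4))
print(suggested_page_size_kb(64))
print(suggested_page_size_kb(64, unified_or_file=True))
```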
Write_cache_mirrored Indicates the write cache mirrored status.   Each storage processor (SP) has a write cache in its memory, which mirrors the write cache on the other SP. Because these caches mirror each other, they are always either enabled or disabled, and always the same size. On powerup, a storage system automatically enables the write cache on each SP if the write cache size is non-zero.

Using this measure, you can determine whether the write cache of both SPs is currently enabled/disabled.

If the write cache is enabled, this measure reports the value Enabled. Otherwise, it reports the value Disabled.

The numeric values that correspond to each of the states discussed above are available in the table below:

Numeric Value Measure Value
1 Enabled
0 Disabled

Note:

By default, this measure reports the above-mentioned Measure Values to indicate the status of the write cache. The graph of this measure, however, represents the cache status using the numeric equivalents, 0 or 1.