eG Monitoring
 

Measures reported by NutAHVConCacheTest

The Content Cache (also known as the "Elastic Dedupe Engine") is a deduplicated read cache that spans both the CVM's memory and SSD. When a read request arrives for data that is not in the cache (or for data matching a particular fingerprint), the data is placed into the single-touch pool of the content cache, which sits entirely in memory; there it is tracked by an LRU counter until it is evicted from the cache. Any subsequent read request "moves" the data (no data is actually moved, only cache metadata) into the memory portion of the multi-touch pool, which consists of both memory and SSD. From here there are two LRU cycles: one for the in-memory portion, from which eviction moves the data to the SSD portion of the multi-touch pool, where a new LRU counter is assigned. Any read request for data in the multi-touch pool moves that data to the top of the multi-touch pool, where it is given a new LRU counter.
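As a rough illustration of the pool mechanics described above, the following Python sketch models the single-touch and multi-touch promotion path. The class name, pool capacities, and return labels are all invented for illustration; they do not correspond to any Nutanix or eG implementation:

```python
from collections import OrderedDict

class ContentCacheSketch:
    """Toy model of the single-touch / multi-touch pool promotion
    described above. Pool sizes are arbitrary illustration values,
    not Nutanix defaults."""

    def __init__(self, single_cap=2, multi_mem_cap=2, multi_ssd_cap=2):
        self.single = OrderedDict()     # single-touch pool (memory only, LRU)
        self.multi_mem = OrderedDict()  # multi-touch pool, memory portion
        self.multi_ssd = OrderedDict()  # multi-touch pool, SSD portion
        self.caps = (single_cap, multi_mem_cap, multi_ssd_cap)

    def read(self, key):
        single_cap = self.caps[0]
        if key in self.multi_mem:            # hot hit: refresh LRU position
            self.multi_mem.move_to_end(key)
            return "hit:multi-mem"
        if key in self.multi_ssd:            # SSD hit: back to top of the pool
            self.multi_ssd.pop(key)
            self._insert_multi_mem(key)
            return "hit:multi-ssd"
        if key in self.single:               # second touch: promote
            self.single.pop(key)
            self._insert_multi_mem(key)
            return "hit:single"
        # Miss: the first touch lands in the single-touch pool.
        self.single[key] = True
        if len(self.single) > single_cap:
            self.single.popitem(last=False)  # evict the LRU entry entirely
        return "miss"

    def _insert_multi_mem(self, key):
        _, multi_mem_cap, multi_ssd_cap = self.caps
        self.multi_mem[key] = True
        if len(self.multi_mem) > multi_mem_cap:
            demoted, _ = self.multi_mem.popitem(last=False)
            self.multi_ssd[demoted] = True   # demote oldest entry to SSD
            if len(self.multi_ssd) > multi_ssd_cap:
                self.multi_ssd.popitem(last=False)
```

Note that, as in the real cache, only bookkeeping moves on a second touch: in this sketch the key is re-registered in a different pool, standing in for the metadata-only "move" described above.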

If the content cache is not sized correctly, it will not be able to hold frequently accessed data and will hence be unable to service many read requests. This increases direct disk accesses and their related overheads, thereby degrading overall storage performance. To ensure peak storage performance, therefore, cache usage should be continuously monitored, cache misses should be promptly captured, and their causes diagnosed. This is exactly what the NutAHVConCacheTest test helps administrators do.

This test closely monitors the content cache, tracks the cache hit ratio, and alerts administrators if the ratio dips below acceptable limits. In addition, the test monitors how cache memory is utilized in the single-touch and multi-touch pools, thus pointing administrators to sizing deficiencies that could be contributing to a high rate of cache misses (if any). Using the pointers provided by this test, administrators can right-size the cache and improve both cache efficiency and overall storage efficiency.
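The hit-ratio check described above can be sketched as a small helper. The 80% and 50% thresholds come from the interpretation guidance given for the CACHE_HIT_PERC measure in this section; the function and parameter names are illustrative, not part of any eG API:

```python
def cache_hit_percent(cache_hits, cache_lookups):
    """CACHE_HIT_PERC as the ratio of cache hits to cache lookups,
    expressed as a percentage; None when there were no lookups."""
    if cache_lookups == 0:
        return None
    return 100.0 * cache_hits / cache_lookups

def classify(hit_perc, good=80.0, poor=50.0):
    # Thresholds mirror the interpretation notes for CACHE_HIT_PERC:
    # over 80% is healthy, below 50% suggests ineffective cache usage.
    if hit_perc is None:
        return "no-lookups"
    if hit_perc >= good:
        return "healthy"
    if hit_perc < poor:
        return "ineffective"
    return "watch"
```

For example, 90 hits out of 100 lookups classifies as "healthy", while 40 out of 100 classifies as "ineffective" and would warrant investigating cache sizing.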

The measures made by this test are as follows:

Each measure is listed below with its description, measurement unit, and interpretation.

CACHE_HIT
Description: Indicates the number of times read requests were served from this cache during the last measurement period.
Unit: Number
Interpretation: Ideally, the value of this measure should be close to or equal to the value of the CACHE_LOOKUP measure.

CACHE_HIT_PERC
Description: Indicates the ratio of cache hits to cache lookups, expressed as a percentage.
Unit: Percent
Interpretation: Ideally, the value of this measure should be over 80%. If so, it indicates that almost all read requests were served by the cache, meaning that direct disk accesses and their related processing overheads were minimal. A value below 50% signifies ineffective cache usage; in other words, most cache lookups did not result in cache hits. A key reason for this could be an undersized cache. If the cache does not have enough memory resources to hold data, it cannot service many read requests, which increases direct disk accesses, which are I/O-intensive operations.

PMEMORY_USAGE
Description: Indicates the amount of real memory consumed by the data in the content cache.
Unit: MB
Interpretation: A consistent increase in the value of this measure could mean that cache misses are high, owing to which new data is being continuously written to the cache, consuming more memory in the process.

CACHE_LOOKUP
Description: Indicates the number of times the cache was looked up to serve read requests during the last measurement period.
Unit: Number

SMEMORY_USAGE
Description: Indicates the amount of content cache memory saved due to deduplication.
Unit: MB
Interpretation: Performance tier deduplication removes duplicate data in the content cache (SSD and memory) to reduce the footprint of an application's working set, which allows more working data to be managed in the content cache. Therefore, the higher the value of this measure, the more significant the performance improvement.

LSSD_USAGE
Description: Indicates the logical SSD space that would be used to cache data without deduplication.
Unit: MB

LMEMORY_USAGE
Description: Indicates the logical memory that would be used to cache data without deduplication.
Unit: MB

PSSD_USAGE
Description: Indicates the real SSD space used to cache data.
Unit: MB
Interpretation: If data in the single-touch pool of the content cache is accessed, it is moved to the in-memory portion of the multi-touch pool. There again it follows an LRU cycle, based on which the 'oldest' objects in memory are identified and moved to the SSD portion of the multi-touch pool. If the data in SSD is accessed, it is moved to the top of the multi-touch pool, from where it will be served. Any increase in the usage of the SSD portion of the multi-touch pool can be attributed to the addition of not-so-frequently accessed data. If the SSD is not sized correctly, data will be discarded from the pool sooner than desired; without enough data in the cache, cache misses will increase, and so will the overheads of direct disk accesses.

SSSD_USAGE
Description: Indicates the SSD space saved owing to deduplication.
Unit: MB
Interpretation: Performance tier deduplication removes duplicate data in the content cache (SSD and memory) to reduce the footprint of an application's working set, which allows more working data to be managed in the content cache. Therefore, the higher the value of this measure, the more significant the performance improvement.
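If the "saved" measures are read as the difference between the logical (pre-deduplication) and real (post-deduplication) usage figures (an assumption based on how the measures are described, not something this page states outright), the savings arithmetic can be sketched as follows; the function names are illustrative:

```python
def dedupe_savings_mb(logical_mb, physical_mb):
    """Savings estimated as logical (without dedupe) minus real
    (with dedupe) usage, clamped at zero for safety."""
    return max(0.0, logical_mb - physical_mb)

def dedupe_ratio(logical_mb, physical_mb):
    """Logical-to-physical ratio; None when no real space is in use."""
    if physical_mb == 0:
        return None
    return logical_mb / physical_mb
```

For example, if LMEMORY_USAGE is 1024 MB while PMEMORY_USAGE is 640 MB, deduplication is saving roughly 384 MB of cache memory, a 1.6:1 reduction.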