eG Monitoring
 

Measures reported by CassCacheTest

Cassandra includes integrated caching and distributes cache data around the cluster. When a node goes down, the client can read from another cached replica of the data. The integrated architecture also facilitates troubleshooting because there is no separate caching tier, and cached data matches what is in the database exactly. The integrated cache alleviates the cold start problem by saving the cache to disk periodically. Cassandra reads contents back into the cache and distributes the data when it restarts. The cluster does not start with a cold cache.
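The periodic save described above is controlled in cassandra.yaml. A sketch of the relevant settings is below — option names and defaults vary across Cassandra versions (these are pre-4.1 names), and the values shown are illustrative:

```yaml
# Illustrative cassandra.yaml fragment (pre-4.1 option names; verify against
# your Cassandra version). Saved caches are reloaded on restart, which is
# what avoids the cold-start problem.
saved_caches_directory: /var/lib/cassandra/saved_caches

# How often (in seconds) to persist the key cache to disk; 0 disables saving.
key_cache_save_period: 14400

# How often (in seconds) to persist the row cache; disabled by default
# because saving a large row cache can be expensive.
row_cache_save_period: 0
```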

The partition key cache is a cache of the partition index for a Cassandra table. Using the key cache instead of relying on the OS page cache decreases seek times. With only the key cache enabled, reading the requested data rows still incurs disk (or OS page cache) activity; with the key cache disabled, however, there are even more reads from disk.

To cache rows, if the row key is not already in the cache, Cassandra reads the first portion of the partition and puts the data in the cache. If the newly cached data does not include all cells configured by the user, Cassandra performs another read. The actual size of the row cache depends on the workload; you should benchmark your application properly to determine the best row cache size to configure.

There are two row cache options, the old serializing cache provider and a new off-heap cache (OHC) provider. The new OHC provider has been benchmarked as performing about 15% better than the older option.
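The provider and the row cache's capacity are also set in cassandra.yaml. A sketch, assuming a 3.x-era configuration (option names changed again in 4.1, and the size value here is illustrative):

```yaml
# Illustrative cassandra.yaml fragment. OHCProvider is the default off-heap
# row cache implementation from Cassandra 3.0 onward.
row_cache_class_name: org.apache.cassandra.cache.OHCProvider

# Maximum row cache capacity; 0 (the default) disables the row cache.
row_cache_size_in_mb: 256
```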

Typically, you enable either the partition key or row cache for a table.
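Per-table caching is selected through the CQL caching option. A sketch using hypothetical keyspace and table names:

```sql
-- Key cache only (hypothetical table): cache all partition keys, no rows.
ALTER TABLE myks.users
    WITH caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'};

-- Row cache only (hypothetical table): cache the first 120 rows of each partition.
ALTER TABLE myks.sessions
    WITH caching = {'keys': 'NONE', 'rows_per_partition': '120'};
```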

If the caches are not sized appropriately, frequent disk accesses may occur, causing severe disk overhead. To avoid this, administrators need to size the caches appropriately and also identify caches that are used infrequently. The CassCacheTest test helps administrators in this regard!

This test auto-discovers the caches on the target database server. For each cache discovered, this test reports the maximum cache size and how much of this size is presently occupied by cached data; this reveals whether or not the cache has enough RAM to hold additional data. Inconsistencies in cache sizing can be detected in the process and their impact on performance analyzed. This test also throws light on how well the cache services requests. Using this test, administrators can identify the cache that has serviced the fewest requests and analyze the real reason behind such poor responsiveness.

Outputs of the test: One set of results for the target Cassandra database node that is being monitored.

The measures made by this test are as follows:

| Measurement | Description | Measurement Unit | Interpretation |
| --- | --- | --- | --- |
| Total_size | Indicates the total size of this cache, i.e., the total space allocated to this cache. | MB | |
| Used_size | Indicates the amount of space that is already utilized in this cache. | MB | Ideally, this value should be well below the Total_size measure. A steady and significant rise in this value could mean that cached data is not being saved to disk frequently and/or least-used data is not being evicted properly. You may want to fine-tune these operations and then check whether cache memory usage reduces. |
| Cache_usage | Indicates the percentage of space that is already utilized in this cache. | Percent | A value close to 100% is a cause for concern, as it indicates that the cache is consuming RAM excessively. You may want to consider resizing the cache to avoid direct disk reads. |
| Hit_count | Indicates the rate at which requests were serviced by this cache during the last measurement period. | Hits/sec | |
| Hit_rate | Indicates the percentage of requests serviced by this cache without having to read from disk during the last measurement period. | Percent | A value of 85% or more is desired for this measure. Temporary dips below this number are expected directly after a large bulk update, but if the value stays low over the longer term, it can indicate data modeling issues or configuration problems, such as JNA not being installed correctly, which leaves the key cache on the heap; in theory, when combined with heap pressure, this can cause the cache to be flushed excessively. |
| Requests | Indicates the rate at which requests were made to this cache during the last measurement period. | Requests/sec | A high value is desired for this measure. |
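The percentage measures above can be derived from the raw cache statistics. A minimal sketch in Python, using illustrative sample values rather than data from a real node:

```python
# Sketch (not part of eG Monitoring): deriving the Cache_usage and Hit_rate
# percentages from raw cache statistics. The sample numbers below are
# illustrative, not taken from a real Cassandra node.

def cache_usage(used_bytes: float, total_bytes: float) -> float:
    """Percentage of the cache's allocated space that is occupied."""
    return 100.0 * used_bytes / total_bytes

def hit_rate(hits: int, requests: int) -> float:
    """Percentage of cache requests served without a disk read."""
    return 100.0 * hits / requests if requests else 0.0

usage = cache_usage(80 * 1024**2, 100 * 1024**2)  # 80.0 -> nearing capacity
rate = hit_rate(hits=9200, requests=10000)        # 92.0 -> above the 85% target
print(f"usage={usage:.1f}% hit_rate={rate:.1f}%")
```

At 80% usage and a 92% hit rate, this cache would be within the thresholds described in the table above.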