| Measurement | Description | Measurement Unit | Interpretation |
| Redis_Status |
Indicates the current status of this cache. |
|
The values reported by this measure and their numeric equivalents are listed in the table below:
| Measure Value | Numeric Value |
| Succeeded | 1 |
| Updating | 2 |
| Error | 3 |
| Unknown | 0 |
Note:
By default, this measure reports the Measure Values listed in the table above to indicate the current provisioning status. In the graph of this measure however, the same is represented using the numeric equivalents only.
Use the detailed diagnosis to know the host name, port, and version of the cache. |
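The status-to-number mapping above can be expressed as a small lookup table, for example when post-processing exported metric values (a minimal sketch; the function name is illustrative, not part of the product):

```python
# Numeric equivalents of the Redis_Status measure values,
# mirroring the table above.
STATUS_TO_NUMERIC = {
    "Succeeded": 1,
    "Updating": 2,
    "Error": 3,
    "Unknown": 0,
}

def status_numeric(value: str) -> int:
    # Unrecognized states fall back to 0 ("Unknown").
    return STATUS_TO_NUMERIC.get(value, 0)

print(status_numeric("Succeeded"))  # 1
```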
| Latency |
Indicates the time taken by this cache to respond to requests. |
Seconds |
A high value is indicative of a slow cache. This can adversely impact application performance. |
| Cmd_prcsd_per_sec |
Indicates the rate at which commands were processed by this cache. |
Commands/second |
A consistent rise in the value of this measure is a sign of good health. On the other hand, a steady drop in the value of this measure hints at processing bottlenecks. In such a situation, look up the Memory used and CPU usage measures, and cache reads and writes to figure out if there are any resource contentions. A poorly sized cache can often be sluggish when responding to requests. You may want to consider increasing the resource allocations/limits for the cache, so that the cache has more processing power at its disposal. |
| Hit_rate |
Indicates the ratio of successful cache lookups to the total number of requests received by this cache. |
Percent |
A value less than 80% is a cause for concern, as it implies that the cache has failed to service a significant share of the requests to it. In such a situation, check the value of the Server load measure to see if there is any abnormal increase in load, causing the cache server to timeout without completing requests. This can cause cache misses. You can also check the Memory used and Memory fragmentation ratio measures to see if the cache has sufficient memory for storing data. Memory contention on the cache is one of the common causes for poor cache performance. |
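The hit rate can be computed from the keyspace_hits and keyspace_misses counters exposed by the Redis INFO command; a minimal sketch (the sample counter values are hypothetical):

```python
def hit_rate(keyspace_hits: int, keyspace_misses: int) -> float:
    """Percentage of lookups served from the cache."""
    total = keyspace_hits + keyspace_misses
    if total == 0:
        return 100.0  # no lookups yet; treat as healthy
    return 100.0 * keyspace_hits / total

# Hypothetical counters: 9,000 hits and 1,000 misses give a 90% hit rate,
# which clears the 80% threshold discussed above.
print(hit_rate(9_000, 1_000))  # 90.0
```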
| Mem_used |
Indicates the amount of memory used by this cache. |
MB |
If the value of this measure is close to that of the Maximum memory measure, it means that the cache is about to exhaust its memory allocation. Without enough memory, the cache will not be able to store data. This can result in cache misses, which in turn will affect application performance. Therefore, to avoid memory contention, consider the following:
You can implement a cluster setup with multiple Redis nodes to enhance the memory capacity of the cache.
If the cache cluster is already in place, then you may want to reduce the amount of memory that is available for non-cache operations, so that more memory is available for caching. For that, reduce the maxmemory-reserved setting of the cluster.
Reduce the maxfragmentationmemory-reserved setting. This setting configures the amount of memory, in MB per instance in a cluster, that is reserved to accommodate for memory fragmentation. When you set this value, the Redis server experience is more consistent when the cache is full or close to full and the fragmentation ratio is high. However, memory reserved in this manner is unavailable for storing cached data. By reducing the value of this setting, you can make sure that more memory is available for caching.
Change the eviction policy, so that keys are evicted more aggressively when the cache runs low on memory. |
| Mem_frag_ratio |
Indicates the percentage of memory in this cache that is fragmented. |
Percent |
Fragmentation is likely to be caused when a load pattern stores data with high variation in size. For example, fragmentation might happen when data is spread across keys of 1 KB and 1 MB in size. When a 1-KB key is deleted from existing memory, a 1-MB key cannot fit into the reclaimed space, causing fragmentation. Similarly, if a 1-MB key is deleted and a 1.5-MB key is added, it cannot fit into the existing reclaimed memory. This causes unused free memory and results in more fragmentation.
The fragmentation can cause issues when:
Memory usage is close to the max memory limit for the cache, or
Used memory is higher than the Max Memory limit, potentially resulting in page faulting in memory
|
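In Redis INFO terms, the fragmentation ratio compares used_memory_rss (memory as seen by the operating system) with used_memory (memory allocated by Redis); a minimal sketch with hypothetical values:

```python
def mem_fragmentation_ratio(used_memory_rss: int, used_memory: int) -> float:
    """Ratio of OS-resident memory (RSS) to memory allocated by Redis.
    Values well above 1.0 indicate fragmentation; values below 1.0
    suggest parts of the dataset have been swapped out."""
    return used_memory_rss / used_memory

# Hypothetical: 1.5 GB resident vs 1.0 GB allocated -> ratio of 1.5
print(mem_fragmentation_ratio(1_500_000_000, 1_000_000_000))  # 1.5
```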
| Evicted_keys |
Indicates the number of keys that have been evicted from this cache. |
Number |
If the cache is under severe memory pressure, you will want to see this measure report a high value, as evicting keys frees up memory for new data. To increase the number of keys evicted, you may want to change the eviction policy. The default policy for Azure Cache for Redis is volatile-lru, which means that only keys that have a TTL value set are eligible for eviction. If no keys have a TTL value, then the system won't evict any keys. If you want the system to allow any key to be evicted under memory pressure, then you may want to consider the allkeys-lru policy. |
| Blocked_clients |
Indicates the number of client connections to this cache that were blocked. |
Number |
Ideally, the value of this measure should be 0. |
| Conctd_clients |
Indicates the number of clients currently connected to this cache. |
Number |
The maxclients setting governs the maximum number of connected clients allowed at the same time. If the value of this measure is equal to the maxclients setting, Redis closes all the new connections, returning a ‘max number of clients reached’ error. |
| Conctd_slaves |
Indicates the number of slaves connected to this cache currently. |
Number |
|
| Last_intrctn_tym |
Indicates the time since the master and slave last interacted. |
Seconds |
A high value for this measure could be a sign that there are issues in master-slave communication. If these issues persist, failover attempts may fail, and the whole purpose of an HA configuration for the cache will be defeated. Moreover, if slaves do not communicate with the master, then delays in data replication will become inevitable. Timely replication of data between the master and slaves is key to ensuring that the data replicas are always in sync with the master. If they are not, then the slaves may not be able to service cache requests effectively when the master is down. |
| Total_keys |
Indicates the number of keys in this cache's database. |
Number
|
| Rdb_last_save_time |
Indicates when data in this cache was written to disk last. |
Number |
|
| Rdb_chngs_snc_lst_sve |
Indicates the number of changes made to this cache's database since data was last saved to disk. |
Number |
|
| Rejectd_conctn |
Indicates the number of connections to this cache that were rejected. |
Number |
Ideally, the value of this measure should be 0. However, if this measure reports a non-zero value consistently, it could be because the maxclients setting is not commensurate with the connection load on the cache.
The maxclients setting governs the maximum number of connected clients allowed at the same time. If the value of the Connected clients measure is equal to the maxclients setting, then new connections are closed/rejected. |
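A simple way to watch for this condition is to compare the connected_clients value from Redis INFO against the configured maxclients limit; a minimal sketch (the function name and sample numbers are illustrative):

```python
def connection_headroom(connected_clients: int, maxclients: int) -> float:
    """Percentage of the client-connection limit still available.
    As this approaches 0, new connections are rejected with a
    'max number of clients reached' error."""
    return 100.0 * (maxclients - connected_clients) / maxclients

# Hypothetical: 9,500 clients against a 10,000-client limit -> 5% headroom
print(connection_headroom(9_500, 10_000))  # 5.0
```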
| Key_misses |
Indicates the number of failed key lookups in this cache. |
Number |
A healthy, optimal cache is one that is capable of servicing all requests to it. Ideally therefore, the value of this measure should be 0. A high value for this measure indicates that the cache has failed to service requests to it. In such a situation, check the value of the Server load measure to see if there is any abnormal increase in load, causing the cache server to timeout without completing requests. This can cause cache misses. You can also check the Memory used and Memory fragmentation ratio measures to see if the cache has sufficient memory for storing data. Memory contention on the cache is one of the common causes for poor cache performance. |
| master_link_down |
Indicates the duration for which the link between the master and slave was down. |
Seconds |
A high value is a cause for concern. The longer the link is down, the longer replication will be delayed. Failover attempts will also fail during this period, rendering the cache unavailable for servicing requests. |
| Mem_used_rss |
Indicates the amount of memory used by this cache, as seen by the operating system (resident set size). |
MB |
|
| Key_hits |
Indicates the number of successful key lookups in this cache. |
Number |
A high value is desired for this measure. |
| Miss_rate |
Indicates the percentage of key lookups in this cache that failed. |
Percent |
A value close to 100% is a cause for concern, as it implies that the cache has failed to service almost all of the requests to it. In such a situation, check the value of the Server load measure to see if there is any abnormal increase in load, causing the cache server to timeout without completing requests. This can cause cache misses. You can also check the Memory used and Memory fragmentation ratio measures to see if the cache has sufficient memory for storing data. Memory contention on the cache is one of the common causes for poor cache performance. |
| Max_conn |
Indicates the maximum number of simultaneous client connections that this cache is allowed to accept. |
Number |
If the value of the Connected clients measure is equal to that of this measure, Redis closes all the new connections, returning a ‘max number of clients reached’ error. |
| Total_cmd |
Indicates the total number of commands processed by this cache. |
Number |
|
| Server_load |
Indicates the current load on this cache server. |
Percent |
High server load means the Redis server is busy and unable to keep up with requests, leading to timeouts.
Following are some options to consider when server load is high:
Scale out to add more shards, so that load is distributed across multiple Redis processes. Also, consider scaling up to a larger cache size with more CPU cores.
Avoid client connection spikes.
Identify and eliminate long-running commands.
If your Azure Cache for Redis underwent a failover, all client connections from the node that went down are transferred to the node that is still running. The server load could spike because of the increased connections. You can try rebooting your client applications so that all the client connections get recreated and redistributed between the two nodes.
|
| Gets |
Indicates the number of get operations from this cache. |
Number |
|
| Sets |
Indicates the number of set operations from this cache. |
Number |
|
| Cache_reads |
Indicates the rate at which data was read from this cache. |
KB/Second |
These measures are good indicators of the bandwidth used by the cache. Whenever there is a bandwidth contention, you can compare the values of these measures to know whether the bulk of the bandwidth is spent reading from the cache or writing to it. |
| Cache_Writes |
Indicates the rate at which data was written to this cache. |
KB/Second |
| Cpu_usage |
Indicates the percentage of CPU resources utilized by this cache. |
Percent |
A value close to 100% indicates excessive CPU usage. This can adversely impact cache performance. You may want to determine the root cause of this excess, so that it can be removed and normalcy restored to the cache. |