eG Monitoring
 

Measures reported by SPCacheTest

SharePoint uses the Distributed Cache to store data in memory for very fast retrieval. The Distributed Cache service provides in-memory caching to several features in SharePoint Server 2013, including:

  • Newsfeeds

  • Authentication

  • OneNote client access

  • Security Trimming

  • Page load performance

Besides these features, several caches in SharePoint 2013 depend on the Distributed Cache service to function properly.

Any server in the farm that runs the Distributed Cache service is known as a cache host. A cache cluster is the group of all cache hosts in a SharePoint Server 2013 farm. A cache host joins the cache cluster when a new application server running the Distributed Cache service is added to the farm. In a cache cluster, the Distributed Cache spans all of the cache hosts and forms a single cache for the server farm. The total cache size is the sum of the memory allocated to the Distributed Cache service on each cache host.
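As a simple illustration of how the total cache size is made up, the sketch below adds the per-host allocations of a hypothetical three-host cluster. The host names and allocation figures are invented for the example; they are not read from SharePoint.

```python
# Hypothetical per-host allocations (GB) for a three-host cache cluster.
# The host names and figures are illustrative, not read from SharePoint.
host_allocations_gb = {
    "APP01": 4.0,
    "APP02": 4.0,
    "APP03": 2.5,
}


def total_cache_size_gb(allocations: dict[str, float]) -> float:
    """Cluster-wide cache size: the sum of the memory allocated on each cache host."""
    return sum(allocations.values())


if __name__ == "__main__":
    print(f"Total Distributed Cache size: {total_cache_size_gb(host_allocations_gb)} GB")
```

Running it prints a combined size of 10.5 GB. Adding another host to the dictionary raises the total by that host's allocation, which mirrors a new cache host joining the cluster.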

If the Distributed Cache cannot service requests efficiently, the performance of the services and caches that depend on it will suffer significantly. Poor cache usage also adds to SharePoint's processing overhead, because requests that the cache cannot service translate into additional database accesses. To prevent this, administrators should keep a close watch on the Distributed Cache's ability to service requests, rapidly detect poor cache usage patterns, and accurately pinpoint the cause. Is it because not enough objects are held in the Distributed Cache? If so, why? Is the cache too small? Will allocating more memory to the cache help, or should more servers be added to the cache cluster? The SPCacheTest test helps answer these questions. It continuously monitors the requests to the cache and reports how many of them were serviced by the cache (hits) and how many were not (misses), enabling administrators to ascertain how well the cache is utilized. In the event of poor cache usage, close scrutiny of these results gives administrators useful pointers to what is impeding cache usage and whether right-sizing the cache will clear the bottleneck.

Output of the test: One set of results for each SharePoint server being monitored

The measures made by this test are as follows:

Measurement: data_rate
Description: Indicates the number of cached entries transferred per second.
Measurement unit: Number

Measurement: hit_count
Description: Indicates the number of requests serviced by the cache during the last measurement period.
Measurement unit: Number
Interpretation: A high value is desired for this measure. A sudden or steady dip in this value indicates that the cache is unable to service requests, which increases direct database accesses.

Measurement: hit_ratio
Description: Indicates the percentage of requests that were serviced by the cache.
Measurement unit: Percent
Interpretation: A high value is desired for this measure. A sudden or steady drop in this value indicates poor cache usage, which in turn can increase direct database accesses and strain the database.

One of the common reasons for a low cache hit ratio is insufficient memory allocated to the cache. Without adequate memory, the cache may be unable to hold many frequently accessed objects, and may therefore fail to service many requests. In such circumstances, consider allocating more memory to the cache. Here are a few of Microsoft's recommendations for sizing the Distributed Cache:

  • The Distributed Cache service actually uses twice the amount of RAM allocated to it; the extra memory is used for housekeeping. For a small farm with fewer than 10,000 users, Microsoft recommends allocating 1GB of RAM to the Distributed Cache; the cache host can be either a dedicated server or collocated with other SharePoint services, such as the Web Application Service. For larger farms, Microsoft recommends dedicated cache servers. A medium farm with fewer than 100,000 users should allocate around 2.5GB to the cache, and a large farm with up to 500,000 users should allocate around 12GB.

  • Microsoft strongly recommends that you not allocate more than 16GB to any one cache host. A larger allocation may cause the cache service to time out during housekeeping operations and become unresponsive for several seconds at a time. If you need a cache larger than 16GB, use multiple servers in a cache cluster instead. A cache cluster can have a maximum of 16 hosts.
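The sizing recommendations above can be turned into a quick capacity check, which also helps decide between giving the cache more memory and adding servers to the cache cluster. The sketch below encodes only the figures quoted above (the per-farm-size allocations, the roughly 2x RAM overhead, the 16GB per-host limit and the 16-host cluster maximum). The function names are hypothetical, and the result is a starting estimate, not a substitute for Microsoft's planning guidance.

```python
import math

# Limits and figures quoted in the Microsoft recommendations above.
MAX_GB_PER_HOST = 16        # allocate no more than 16 GB to any one cache host
MAX_HOSTS_PER_CLUSTER = 16  # a cache cluster can contain at most 16 hosts


def recommended_cache_gb(users: int) -> float:
    """Return the suggested cache allocation (GB) for a farm with `users` users."""
    if users < 10_000:
        return 1.0   # small farm: dedicated or collocated cache host
    if users < 100_000:
        return 2.5   # medium farm: dedicated cache servers
    if users <= 500_000:
        return 12.0  # large farm: dedicated cache servers
    raise ValueError("no figure is quoted for farms above 500,000 users")


def hosts_needed(cache_gb: float) -> int:
    """Return the fewest cache hosts that keep every host at or below 16 GB."""
    hosts = max(1, math.ceil(cache_gb / MAX_GB_PER_HOST))
    if hosts > MAX_HOSTS_PER_CLUSTER:
        raise ValueError("cache size exceeds 16 hosts x 16 GB per host")
    return hosts


def ram_footprint_gb(allocated_gb: float) -> float:
    """Approximate RAM used: about twice the allocation, the extra for housekeeping."""
    return 2 * allocated_gb


if __name__ == "__main__":
    for users in (5_000, 50_000, 400_000):
        cache_gb = recommended_cache_gb(users)
        print(
            f"{users} users: allocate {cache_gb} GB on {hosts_needed(cache_gb)} host(s), "
            f"about {ram_footprint_gb(cache_gb)} GB of RAM in total"
        )
    # A 40 GB cache is too large for one host, so it is split across hosts.
    print(f"40 GB cache: {hosts_needed(40)} hosts")
```

For example, a 40GB cache exceeds the 16GB per-host limit, so hosts_needed(40) returns 3 rather than suggesting a single oversized host.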

Measurement: miss_count
Description: Indicates the number of requests that were not serviced by the cache during the last measurement period.
Measurement unit: Number
Interpretation: Ideally, the value of this measure should be low. A sudden or steady increase in this value indicates poor cache usage, which in turn can increase direct database accesses and strain the database.
Measurement: read_request
Description: Indicates the number of read requests to the cache per second during the last measurement period.
Measurement unit: Number
Interpretation: A high value for this measure, or for the write_request measure, often indicates heavy load on the distributed cache.

In such a situation, consider switching to the dedicated mode of cache deployment for better cache performance. In this mode, all services other than the Distributed Cache service are stopped on the application server that runs the Distributed Cache service, so that all critical resources on that server are at the disposal of the distributed cache. This in turn helps the cache handle the load efficiently.
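Measurement: write_request
Description: Indicates the number of write requests to the cache per second during the last measurement period.
Measurement unit: Number
Interpretation: A high value often indicates heavy load on the distributed cache; see the interpretation of read_request.

Measurement: total_read
Description: Indicates the total number of read requests received by the cache during the last measurement period.
Measurement unit: Number

Measurement: total_write
Description: Indicates the total number of write requests received by the cache during the last measurement period.
Measurement unit: Number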
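Two relationships between these measures can be checked by hand. The sketch below assumes (1) that every request in a measurement period is either a hit or a miss, so that hit_ratio = 100 x hit_count / (hit_count + miss_count), and (2) that read_request and write_request are the period totals (total_read, total_write) divided by the length of the period. Neither assumption is documented eG Monitoring behaviour, and the PeriodSample container, the function names, the 300-second period and all figures are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PeriodSample:
    """Hypothetical container for the counts reported in one measurement period."""
    hit_count: int
    miss_count: int
    total_read: int
    total_write: int
    period_seconds: float


def hit_ratio_percent(sample: PeriodSample) -> Optional[float]:
    """Percentage of requests serviced by the cache.

    Assumes every request is either a hit or a miss. Returns None when the
    period saw no requests, since the ratio is then undefined rather than 0%.
    """
    total = sample.hit_count + sample.miss_count
    if total == 0:
        return None
    return 100.0 * sample.hit_count / total


def request_rates(sample: PeriodSample) -> tuple[float, float]:
    """Average (reads/sec, writes/sec) over the measurement period."""
    if sample.period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    return (
        sample.total_read / sample.period_seconds,
        sample.total_write / sample.period_seconds,
    )


if __name__ == "__main__":
    # Hypothetical figures for three consecutive 300-second periods.
    samples = [
        PeriodSample(17_100, 900, 18_000, 3_000, 300),
        PeriodSample(16_740, 1_860, 18_600, 3_300, 300),
        PeriodSample(14_400, 9_600, 24_000, 4_500, 300),
    ]
    for s in samples:
        reads, writes = request_rates(s)
        ratio = hit_ratio_percent(s)
        print(f"hit_ratio={ratio:.1f}%  read_request~{reads:.1f}/s  write_request~{writes:.1f}/s")
```

With these made-up samples, the hit ratio falls from 95.0% to 60.0% while the read rate climbs from 60.0 to 80.0 requests per second, the kind of combined pattern that the interpretations above would flag as poor cache usage under heavy load.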