eG Monitoring
 

Measures reported by MongoThreadTest

Threads are the ‘workhorses’ of a MongoDB server. Every job that the server does for an application (reading pages, writing pages, eviction, and so on) is performed by application threads. Administrators need to be mindful of how these threads are used, as abnormal usage often heralds an overload condition or potential contention on the server. To monitor thread usage and proactively detect such problem conditions, administrators can use the Mongo Thread Statistics test. This test tracks the usage of application threads and reports the count of threads used for various activities. This way, the test points to the activities in which the maximum number of threads are actively engaged: is it fsync, reading, or writing? In the event of thread contention, these analytics help administrators figure out where most threads are being spent. Additionally, the test reveals how much time the threads take to perform cache eviction and how much time the cache waits for a thread to become available. If cache requests are not serviced quickly, these metrics help administrators find out why; for example, whether enough threads are not available to the cache.

Outputs of the test: One set of results for the MongoDB server monitored.

The measures made by this test are as follows:

Measurement | Description | Measurement Unit | Interpretation
Active_threads_in_fsync | Indicates the number of threads that were actively engaged in fsync operations during the last measurement period. | Number

As applications write data, MongoDB records the data in the storage layer and then writes the data to disk within the syncPeriodSecs interval, which is 60 seconds by default. Run fsync when you want to flush writes to disk ahead of that interval.
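As a minimal sketch of forcing such a flush from a script, the snippet below issues the fsync command against the admin database. It assumes a PyMongo-style client, that is, any object exposing `admin.command`; the helper name `flush_writes_to_disk` is hypothetical and not part of the eG test.

```python
def flush_writes_to_disk(client):
    """Ask the server to flush pending writes to disk right away.

    Assumption: `client` is a pymongo.MongoClient, or any object with the
    same `admin.command` interface. The helper name is illustrative only.
    """
    # Sends {fsync: 1} to the admin database and returns the server's reply.
    return client.admin.command("fsync")
```

With PyMongo installed, this could be called as `flush_writes_to_disk(MongoClient(uri))`, where `uri` is the connection string of the monitored server.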

Active_threads_in_read | Indicates the number of threads that were actively engaged in read operations during the last measurement period. | Number

If MongoDB users complain of slowness, compare the value of this measure with that of the Active threads in fsync and Active threads in write measures to find out which operation is hogging threads: fsync, reads, or writes. In the event of contention or slowness, these metrics tell you which activities require more threads.

Active_threads_in_write | Indicates the number of threads that were actively engaged in write operations during the last measurement period. | Number

If MongoDB users complain of slowness, compare the value of this measure with that of the Active threads in fsync and Active threads in read measures to find out which operation is hogging threads: fsync, reads, or writes. In the event of contention or slowness, these metrics tell you which activities require more threads.
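The comparison described above can be sketched as a small helper. This is an illustration only: it assumes that the three measures correspond to the "thread-state" counters in the wiredTiger section of MongoDB's serverStatus output, which this document does not confirm, and the function name `busiest_thread_activity` is hypothetical.

```python
def busiest_thread_activity(server_status):
    """Return (activity, count) for the activity holding the most threads.

    Assumption: `server_status` is the dict returned by the serverStatus
    command on a WiredTiger-backed server, and its "thread-state" counters
    correspond to the three measures described above.
    """
    state = server_status.get("wiredTiger", {}).get("thread-state", {})
    counts = {
        "fsync": state.get("active filesystem fsync calls", 0),
        "read": state.get("active filesystem read calls", 0),
        "write": state.get("active filesystem write calls", 0),
    }
    # On a tie, the first activity in the order above is returned.
    activity = max(counts, key=counts.get)
    return activity, counts[activity]


# Made-up sample values, for illustration only.
sample = {"wiredTiger": {"thread-state": {
    "active filesystem fsync calls": 1,
    "active filesystem read calls": 7,
    "active filesystem write calls": 3,
}}}
print(busiest_thread_activity(sample))  # ('read', 7)
```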

Time_taken_fr_evict_thrd | Indicates the time taken by threads to perform cache eviction. | Seconds

A high value is a cause for concern, as it implies that the threads are taking too long to evict objects from the cache and free up space in it. This could be because too few threads are engaged in eviction. Where the WiredTiger storage engine is used, you may want to fine-tune one or more of these parameters to keep eviction smooth and quick:

  • Increase the threads_min and threads_max parameters for the ‘eviction’ operation, so that more threads perform eviction, thereby reducing the time taken to evict.

  • Increase the eviction_target, so that worker threads start evicting pages from the cache later; until then, the worker threads remain available for other operations. This can ease thread contention.

  • Increase the eviction_trigger, so that application threads are not pulled into the eviction process too early. This keeps application threads available for other operations, which again can reduce thread contention.
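The tuning options above can be sketched as a WiredTiger configuration string. The helper name `build_eviction_config` and all numeric values are hypothetical; the sanity checks only reflect the ordering the parameters imply (minimum threads not above maximum, target below trigger), and suitable values depend on the workload.

```python
def build_eviction_config(threads_min, threads_max, eviction_target, eviction_trigger):
    """Build a WiredTiger configuration string for the eviction settings.

    All values passed in are illustrative; suitable numbers depend on the
    workload and should be validated on a test system first.
    """
    if not 1 <= threads_min <= threads_max:
        raise ValueError("need 1 <= threads_min <= threads_max")
    if not eviction_target < eviction_trigger:
        raise ValueError("eviction_target must be below eviction_trigger")
    return (
        f"eviction=(threads_min={threads_min},threads_max={threads_max}),"
        f"eviction_target={eviction_target},eviction_trigger={eviction_trigger}"
    )


config = build_eviction_config(4, 8, 85, 97)
print(config)
# eviction=(threads_min=4,threads_max=8),eviction_target=85,eviction_trigger=97

# Assuming a PyMongo client, the string could be applied at runtime with:
# client.admin.command({"setParameter": 1, "wiredTigerEngineRuntimeConfig": config})
```

Changing storage engine settings at runtime can affect server stability, so a change like this is best tried on a non-production server first.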

Wait_time_for_cache | Indicates the time the cache kept waiting for a thread to become available, so that requests to the cache can be serviced. | Seconds

A high value for this measure is a cause for concern, as it implies that there are not enough free threads to service cache requests. In such a situation, compare the values of the Active threads in fsync, Active threads in read, and Active threads in write measures to find out where most threads are stuck. If these measures do not report abnormal values, check the value of the Time taken for evicting threads measure. If that measure reports an abnormally high value, cache eviction could be the bottleneck.
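The troubleshooting sequence above can be sketched as a simple decision function. The function name `triage_cache_wait` and the three limits are hypothetical: this document does not define what counts as a "high" or "abnormal" value, so the thresholds must come from the administrator's own baselines.

```python
def triage_cache_wait(measures, wait_limit_secs, evict_limit_secs, thread_limit):
    """Suggest where to look when the cache waits too long for a thread.

    `measures` maps the measure names reported by this test to their values.
    The three limits are hypothetical thresholds chosen by the administrator.
    """
    if measures["Wait_time_for_cache"] <= wait_limit_secs:
        return "ok"
    activity_counts = {
        "fsync": measures["Active_threads_in_fsync"],
        "read": measures["Active_threads_in_read"],
        "write": measures["Active_threads_in_write"],
    }
    # Step 1: see whether any activity is holding an abnormal number of threads.
    busiest = max(activity_counts, key=activity_counts.get)
    if activity_counts[busiest] > thread_limit:
        return f"threads stuck in {busiest}"
    # Step 2: otherwise, check whether eviction itself is slow.
    if measures["Time_taken_fr_evict_thrd"] > evict_limit_secs:
        return "cache eviction is the likely bottleneck"
    return "no obvious cause; investigate further"


# Made-up sample values, for illustration only.
sample_measures = {
    "Wait_time_for_cache": 12.0,
    "Active_threads_in_fsync": 0,
    "Active_threads_in_read": 2,
    "Active_threads_in_write": 1,
    "Time_taken_fr_evict_thrd": 9.5,
}
print(triage_cache_wait(sample_measures, wait_limit_secs=5,
                        evict_limit_secs=3, thread_limit=10))
# cache eviction is the likely bottleneck
```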