eG Monitoring
 

Measures reported by MongoCacheTest

With WiredTiger, MongoDB utilizes both the WiredTiger internal cache and the filesystem cache.

Starting in MongoDB 3.4, the WiredTiger internal cache uses, by default, the larger of:

  • 50% of (RAM minus 1 GB), or

  • 256 MB
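The default sizing rule above can be sketched as a small calculation. This is only an illustration: the function name is hypothetical, and it assumes that "1 GB" and "256 MB" are binary units and that the 50% applies to RAM after 1 GB has been subtracted.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def default_wiredtiger_cache_bytes(total_ram_bytes: int) -> int:
    """Estimate the default internal cache size for a given amount of RAM.

    Hypothetical helper: applies the rule "larger of 50% of (RAM - 1 GB)
    or 256 MB", assuming binary units (GiB/MiB).
    """
    half_of_ram_minus_1gb = (total_ram_bytes - GIB) // 2
    return max(half_of_ram_minus_1gb, 256 * MIB)


if __name__ == "__main__":
    for ram_gib in (1, 4, 16):
        size = default_wiredtiger_cache_bytes(ram_gib * GIB)
        print(f"{ram_gib} GiB RAM -> {size / GIB:.2f} GiB default cache")
```

On hosts with very little RAM the 256 MB floor applies; on larger hosts the cache grows with RAM while leaving the rest to the filesystem cache and other MongoDB operations.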

This internal cache should be sized such that the working set of your application fits into it. If the internal cache is poorly sized, or if the working set outgrows the cache, the cache will be unable to hold additional data, thereby increasing expensive disk reads.

Likewise, if changes to cached data are not written to the disk fast enough, modified data accumulates in the cache, leaving little room for additional data; once again, direct disk reads become inevitable, degrading database performance.

Additionally, if stale data is not evicted from the cache in a timely manner, it continues to occupy cache space and consequently increases disk reads.

On the other hand, if the internal cache is set too high, then very little RAM will be left outside of this cache for aggregations, sorting, connection management, and the like. If there is insufficient RAM for these operations, then MongoDB can get killed by the operating system's out-of-memory (OOM) killer! Also, over-sizing the internal cache will considerably reduce the free memory that would otherwise be available to the filesystem cache. This can also adversely impact performance!

This is why administrators should continuously monitor the RAM usage of the WiredTiger internal cache, proactively detect excessive RAM usage by the cache, and accurately isolate its root cause. The MongoCacheTest helps with this.

This test reports the maximum cache size and how much of this size is presently occupied by cached data; this reveals whether or not the cache has enough RAM to hold additional data. Inconsistencies in cache sizing can be detected in the process and their impact on performance analyzed. Writes from cache to disk and cache evictions are also monitored, so that administrators can quickly detect bottlenecks in these processes and fine-tune them to curb cache growth.

Outputs of the test: One set of results for the MongoDB server monitored.

The measures made by this test are as follows:

For each measurement: description, measurement unit, and interpretation.
Cache_used_value
  Description: Indicates the percentage of the maximum cache size that is used by the cache.
  Measurement unit: Percent
  Interpretation: A value close to 100% is a cause for concern, as it indicates that the cache is nearly full and is consuming RAM excessively. You may want to consider resizing the cache to avoid direct disk reads. To adjust the size of the WiredTiger internal cache, use the storage.wiredTiger.engineConfig.cacheSizeGB parameter.

  Avoid increasing the WiredTiger internal cache size above its default value, as this may erode the memory resources required by the filesystem cache and other critical MongoDB operations.
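As an illustration of how a used-cache percentage like the one above could be derived, the sketch below works on a plain dictionary of cache statistics. It assumes the statistics come from the wiredTiger.cache section of MongoDB's serverStatus output and use the field names shown; how the test itself collects its values is not stated here, and the function name and sample values are hypothetical.

```python
def cache_used_percent(cache_stats: dict) -> float:
    """Return the share of the configured maximum cache that is occupied.

    Assumption: cache_stats mirrors serverStatus()["wiredTiger"]["cache"],
    where both values are byte counts.
    """
    max_bytes = cache_stats["maximum bytes configured"]
    used_bytes = cache_stats["bytes currently in the cache"]
    if max_bytes <= 0:
        raise ValueError("maximum cache size must be positive")
    return 100.0 * used_bytes / max_bytes


if __name__ == "__main__":
    # Made-up sample values: a 1 GiB cache holding 0.75 GiB of data.
    sample = {
        "maximum bytes configured": 1073741824,
        "bytes currently in the cache": 805306368,
    }
    print(f"Cache used: {cache_used_percent(sample):.1f}%")
```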
Dirty_cache_value
  Description: Indicates the percentage of the maximum cache size that is used by dirty data.
  Measurement unit: Percent
  Interpretation: Dirty data is data in the cache that has been modified but not yet flushed to disk. A steady and significant growth in this percentage represents a bottleneck, because it means that cached data is not being written to the disk fast enough.

  When writing to disk, WiredTiger writes all the data in a snapshot to disk in a consistent way across all data files. The now-durable data acts as a checkpoint in the data files. The checkpoint ensures that the data files are consistent up to and including the last checkpoint; i.e., checkpoints can act as recovery points.

  Using WiredTiger, even without journaling, MongoDB can recover from the last checkpoint; however, to recover changes made after the last checkpoint, run with journaling.

  By default, MongoDB sets checkpoints to occur in WiredTiger on user data at an interval of 60 seconds or when 2 GB of journal data has been written, whichever occurs first.

  This means that the amount of dirty data is expected to grow until the next checkpoint.

  Scaling out by adding more shards will help you reduce the amount of dirty data.
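Because dirty data normally rises until a checkpoint and then drops, a single rising sample is not a problem by itself. One way to separate that normal sawtooth from a steady growth is to compare the peak of each checkpoint interval with the peak of the previous one. The sketch below is a rough heuristic under stated assumptions: the function names are hypothetical, samples are taken at a fixed interval, and `window` is the number of samples that roughly covers one checkpoint interval.

```python
from typing import List, Sequence


def peaks_per_window(samples: Sequence[float], window: int) -> List[float]:
    """Split the samples into consecutive windows and return each window's peak."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [max(samples[i:i + window]) for i in range(0, len(samples), window)]


def is_steadily_growing(samples: Sequence[float], window: int,
                        min_rise: float = 0.0) -> bool:
    """Return True if every window's peak exceeds the previous peak by more than min_rise."""
    peaks = peaks_per_window(samples, window)
    return len(peaks) >= 2 and all(
        later - earlier > min_rise for earlier, later in zip(peaks, peaks[1:])
    )


if __name__ == "__main__":
    normal_sawtooth = [1, 3, 5, 1, 3, 5]          # peaks repeat at 5%
    growing = [1, 3, 5, 2, 5, 8, 4, 8, 12]        # peaks rise: 5%, 8%, 12%
    print(is_steadily_growing(normal_sawtooth, window=3))
    print(is_steadily_growing(growing, window=3))
```

The window size and the `min_rise` threshold are tuning choices rather than values taken from MongoDB; they would need to match the actual sampling frequency of the monitor.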
Used_cache
  Description: Indicates the amount of RAM used by the cached data.
  Measurement unit: MB
  Interpretation: Ideally, this value should be well below the maximum cache size. A steady and significant rise in this value could mean that cached data is not being written to disk frequently enough and/or that least-used data is not being evicted properly. You may want to fine-tune these operations and then check whether doing so reduces cache memory usage.
Dirty_cache
  Description: Indicates the amount of dirty data in the cache.
  Measurement unit: MB
  Interpretation: A consistent increase in this value is a cause for concern.
Data_evicted_from_cache
  Description: Indicates the rate at which data was evicted from cache.
  Measurement unit: MB/Sec
Data_read_into_cache
  Description: Indicates the rate at which data was read into cache from disk.
  Measurement unit: MB/Sec
Data_written_from_cache
  Description: Indicates the rate at which data was written from cache into disk.
  Measurement unit: MB/Sec
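Rate measures such as the three above are typically derived from cumulative counters sampled at two points in time. The sketch below shows that arithmetic only. It assumes byte counters of the kind exposed in the wiredTiger.cache section of serverStatus (for example "bytes read into cache" and "bytes written from cache"); the function name is hypothetical, and treating a decreasing counter as a server restart is a design choice of this example.

```python
MIB = 1024 * 1024


def rate_mb_per_sec(prev_bytes: int, curr_bytes: int, interval_secs: float) -> float:
    """Convert two readings of a cumulative byte counter into MB/sec."""
    if interval_secs <= 0:
        raise ValueError("interval_secs must be positive")
    delta = curr_bytes - prev_bytes
    if delta < 0:
        # Assumption: the counter restarted from zero (e.g. server restart),
        # so only the bytes counted since the reset are attributed.
        delta = curr_bytes
    return delta / MIB / interval_secs


if __name__ == "__main__":
    # Made-up readings taken 60 seconds apart.
    previous, current = 0, 60 * MIB
    print(f"{rate_mb_per_sec(previous, current, 60):.2f} MB/sec")
```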
Pages_evicted_from_cache
  Description: Indicates the rate at which pages were evicted from cache.
  Measurement unit: Pages/sec
  Interpretation: A high value is desired for this measure. If the value of Cache_used_value is very high and the value of this measure is very low, it likely means that data is not being evicted frequently enough to control cache growth. You may have to fine-tune eviction to ensure that the cache does not grow uncontrollably.

  Typically, when a MongoDB server approaches its maximum cache size, WiredTiger begins eviction to stop memory use from growing too large, approximating a least-recently-used algorithm. WiredTiger provides several configuration options for tuning how pages are evicted from the cache.

  The eviction_trigger configuration value is the occupied percentage of the total cache size that causes eviction to start. By default, WiredTiger begins evicting pages when the cache is 95% full. An application concerned about a latency spike as the cache becomes full might want to begin eviction earlier.

  The eviction_target configuration value is the overall target for eviction, expressed as a percentage of total cache size; that is, once eviction begins, it will proceed until the target percentage of bytes in the cache is reached. Note that the eviction_target configuration value is ignored until eviction is triggered.

  The eviction_dirty_target configuration value is the overall dirty byte target for eviction, expressed as a percentage of total cache size; that is, once eviction begins, it will proceed until the target percentage of dirty bytes in the cache is reached. Note that the eviction_dirty_target configuration value is ignored until eviction is triggered.

  By default, WiredTiger cache eviction is handled by a single, separate thread. In a large, busy cache, a single thread may be insufficient (especially when the eviction thread must wait for I/O). The eviction=(threads_min) and eviction=(threads_max) configuration values can be used to configure the minimum and maximum number of additional threads WiredTiger will create to keep up with the application eviction load.
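To make the relationship between these settings concrete, the sketch below assembles them into a single WiredTiger-style configuration string. It is illustrative only: the function name and the sample values are hypothetical, the check that eviction_target stays below eviction_trigger is an assumption of this example, and the string is merely built, not applied to any server. Changing eviction settings on a live MongoDB deployment should be verified against the MongoDB and WiredTiger documentation first.

```python
def build_eviction_config(trigger: int, target: int, dirty_target: int,
                          threads_min: int, threads_max: int) -> str:
    """Build a WiredTiger-style eviction configuration string.

    Percentages refer to the total cache size. Validation rules here are
    assumptions of this sketch, not documented limits.
    """
    if not 0 < target < trigger <= 100:
        raise ValueError("expected 0 < eviction_target < eviction_trigger <= 100")
    if not 0 < dirty_target <= 100:
        raise ValueError("eviction_dirty_target must be in (0, 100]")
    if not 1 <= threads_min <= threads_max:
        raise ValueError("expected 1 <= threads_min <= threads_max")
    return (
        f"eviction_trigger={trigger},"
        f"eviction_target={target},"
        f"eviction_dirty_target={dirty_target},"
        f"eviction=(threads_min={threads_min},threads_max={threads_max})"
    )


if __name__ == "__main__":
    # Hypothetical values: start evicting earlier than the 95% default.
    print(build_eviction_config(trigger=90, target=75, dirty_target=10,
                                threads_min=2, threads_max=6))
```

Starting eviction earlier (a lower trigger) trades some cache capacity for a smoother latency profile as the cache fills, which is the trade-off described above.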
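Pages_read_into_cache
  Description: Indicates the rate at which pages were read into cache from disk.
  Measurement unit: Pages/sec
  Interpretation: A high value for this measure means that pages are frequently being loaded from disk. This is expected while the cache is warming up; if it persists while Cache_used_value is also high, the working set may not fit into the cache.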
Pages_written_from_cache
  Description: Indicates the rate at which pages were written from cache into disk.
  Measurement unit: Pages/sec
  Interpretation: Typically, at configured intervals, data modified in the cache is flushed to disk, so that the data in cache and on disk are in sync. Flushing also frees up memory in the cache and controls its abnormal growth. This is why, ideally, the value of this measure should be high.
Maximum_cache
  Description: Indicates the maximum size of the cache that WiredTiger will use for all data.
  Measurement unit: GB
Page_faults
  Description: Indicates the rate at which page faults requiring disk operations occurred.
  Measurement unit: Faults/Sec
  Interpretation: Page faults refer to operations that require the database server to access data which isn’t available in active memory. The page faults counter may increase dramatically during moments of poor performance and may correlate with limited memory environments and larger data sets. Limited and sporadic page faults do not necessarily indicate an issue.

  Note that this measure will be reported only if the target MongoDB server runs on Unix/Linux.