Measures reported by MongoBgFlushingTest
By default, MongoDB instances using the MMAPv1 storage engine flush in-memory changes to disk every 60 seconds. If journaling is enabled, the MongoDB server first writes changes to the on-disk journal, and the changes are later flushed to the data files on disk. If the server crashes before all the changes are flushed to disk, journaling ensures that the changes can still be recovered from the journal. Where journaling is not enabled, however, changes that are not flushed to disk promptly can be lost, critical data included, if the MongoDB server terminates suddenly. To avoid this, administrators need to proactively detect any slowness in background flushing and promptly take measures to pre-empt the data loss that a server crash can cause. The MongoBgFlushingTest test helps with this.
This test tracks flushes to disk and reports the average time taken by the monitored server to flush in-memory writes to disk. In the process, the test proactively alerts administrators to slowness in disk writes. The test also reports the duration of the last disk write, which helps administrators determine when the slowness set in: did it creep in recently, or has it been persistent?
Note:
This test reports metrics only for those MongoDB instances that use the MMAPv1 storage engine.
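For MMAPv1 instances, MongoDB itself exposes flush statistics in the `backgroundFlushing` section of the `serverStatus` command output, including the fields `flushes`, `total_ms`, `average_ms` and `last_ms` (durations in milliseconds). The sketch below assumes, without confirmation from this document, that the test's average and last flush durations correspond to these fields; it simply converts them to the seconds used by the measures. The snapshot values are hard-coded and illustrative. In practice the section could be fetched with a driver, for example pymongo's `MongoClient().admin.command("serverStatus")["backgroundFlushing"]`.

```python
# Hypothetical sketch: derive average and last flush durations (in seconds)
# from a serverStatus "backgroundFlushing" section. Whether the monitoring
# test uses exactly these fields is an assumption; the values are illustrative.

def flush_times_seconds(background_flushing):
    """Return (average, last) flush durations in seconds.

    MongoDB reports durations in milliseconds. If average_ms is absent,
    the average is recomputed as total_ms / flushes (0.0 if no flushes yet).
    """
    flushes = background_flushing["flushes"]
    if "average_ms" in background_flushing:
        avg_ms = background_flushing["average_ms"]
    elif flushes > 0:
        avg_ms = background_flushing["total_ms"] / flushes
    else:
        avg_ms = 0.0
    last_ms = background_flushing.get("last_ms", 0.0)
    return avg_ms / 1000.0, last_ms / 1000.0


# Illustrative snapshot, not real measurements.
snapshot = {"flushes": 40, "total_ms": 6000, "average_ms": 150.0, "last_ms": 900}
avg_s, last_s = flush_times_seconds(snapshot)
print(f"average={avg_s:.3f}s last={last_s:.3f}s")
```

The fallback to `total_ms / flushes` is a defensive choice for snapshots that lack `average_ms`; both paths yield the same mean.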
Outputs of the test: One set of results for the target Mongo database server being monitored.
The measures made by this test are as follows:
| Measurement | Description | Measurement Unit | Interpretation |
|---|---|---|---|
| Flushes | Indicates the number of times the monitored server flushed writes to disk. | Number | |
| Flushes_rate | Indicates the rate at which writes were flushed to disk. | Flushes/Sec | A high value is desirable, as it indicates that changes are written to disk frequently. Frequent disk writes minimize the data loss that may occur if the server exits abnormally. |
| Avg_flush_time | Indicates the average time taken by the server to flush writes to disk. | Seconds | A value over 1 second (1000 milliseconds) is a cause for concern, as it indicates that writing to disk is taking a long time. The most common cause is that the amount of data being flushed is too large for the disk to handle. With journaling enabled, which is standard for production MongoDB deployments, write operations also go to the journal file on disk and consume disk I/O that is needed for flushing and for servicing page faults. To resolve this issue, do one or more of the following: (1) upgrade to a disk with higher IOPS, e.g., SSD or a flash array; (2) place the journal file and the data files on separate drives, to free up the disk I/O taken by the journal; (3) since spikes in background flush time occur when a large number of dirty pages must be flushed to disk at once, review the application for errors, or spread writes over a longer time span. |
| Last_flush_time | Indicates the time taken by the last disk write. | Seconds | If this value is significantly higher than Avg_flush_time, the slowness likely occurred during the last flush; investigating that flush can reveal why it took longer than normal. If the two values are close and both high, the slowness has likely been persistent. |
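The rate and threshold logic described for these measures can be sketched as follows. This is a hypothetical illustration, not the test's actual implementation: it computes a flushes-per-second rate from two samples of the cumulative flush counter, flags an average flush time above the 1-second (1000 ms) threshold, and compares the last flush with the average. The factor used to decide that the last flush is "much slower" than average (`RECENT_SPIKE_FACTOR`) is an assumed value, as is the handling of a counter reset after a server restart.

```python
# Hypothetical sketch of the interpretation logic for the flush measures.
# RECENT_SPIKE_FACTOR and the restart handling are assumptions for illustration.

AVG_FLUSH_LIMIT_S = 1.0    # average flush time above 1 second (1000 ms) is a concern
RECENT_SPIKE_FACTOR = 2.0  # assumed: last flush > 2x average counts as a recent spike


def flushes_rate(prev_flushes, curr_flushes, interval_s):
    """Flushes per second between two counter samples taken interval_s apart."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta = curr_flushes - prev_flushes
    if delta < 0:
        # Assumed: the cumulative counter restarted (e.g., server restart),
        # so the current value is the number of flushes since the reset.
        delta = curr_flushes
    return delta / interval_s


def assess(avg_flush_s, last_flush_s):
    """Return human-readable findings for one measurement period."""
    findings = []
    if avg_flush_s > AVG_FLUSH_LIMIT_S:
        findings.append("average flush time exceeds 1 second")
    if avg_flush_s > 0 and last_flush_s > RECENT_SPIKE_FACTOR * avg_flush_s:
        findings.append("last flush was much slower than average (recent slowness)")
    elif avg_flush_s > AVG_FLUSH_LIMIT_S:
        findings.append("last flush is in line with a high average (persistent slowness)")
    return findings


# Usage with illustrative numbers: 6 flushes in a 60-second interval.
print(flushes_rate(40, 46, 60))
print(assess(0.2, 0.9))
print(assess(1.5, 1.6))
```

Keeping the threshold and spike factor as named constants makes it easy to tune them to an environment's own baseline rather than treating the assumed values as fixed.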