eG Monitoring
 

Measures reported by IgnitePerStaTest

Ignite Persistence, or Native Persistence, is a set of features designed to provide persistent storage. When it is enabled, Ignite always stores all the data on disk, and loads as much data as it can into RAM for processing.

Persistence ensures that all the data in RAM is backed up and can be used if the cache fails, or for distribution to downstream systems. It is therefore important to monitor persistence regularly, so that administrators are aware of any issues and can remediate them in a timely manner.

This test monitors the persistence storage and collects key metrics, such as the allocated memory size, the number of Write-Ahead Logging (WAL) archive segments, and page reads, which can help administrators make informed decisions about the health of the storage.

Outputs of the test: One set of results for each Apache Ignite Server

The measures made by this test are as follows:

| Measurement | Description | Measurement Unit | Interpretation |
|---|---|---|---|
| pagesRead | Indicates the number of pages read since the last restart. | Number | |
| pagesReplaced | Indicates the number of pages replaced since the last restart. | Number | A large number of replaced pages may hurt performance. |
| pagesWritten | Indicates the number of pages written since the last restart. | Number | |
| dirtyPages | Indicates the number of pages in memory that have been changed but not yet synchronized to disk. These pages will be written to disk during the next checkpoint. | Number | A very high number of dirty pages is risky: if the cache goes down before the next checkpoint, a large amount of data may be lost. |
| usedCheckpointBufferPage | Indicates the total number of checkpoint buffer pages used since the last restart. | Number | |
| offHeapSize | Indicates the total memory allocated in the off-heap cache. | MB | Off-heap storage is not subject to JVM garbage collection, which makes it fast, but the allocation should be sized appropriately. |
| offheapUsedSize | Indicates the total memory used in the off-heap cache. | MB | |
| checkpointBufferSize | Indicates the total size of the checkpoint buffer. | MB | A low value is desired for this measure. |
| usedCheckpointBufferSize | Indicates the size of the checkpoint buffer that is currently in use. | MB | Take action if the value of this measure approaches the total size of the checkpoint buffer. |
| lastCheckpointDuration | Indicates the total time it took to create the last checkpoint. | Seconds | |
| totalAllocatedSize | Indicates the size of the space allocated on disk for the entire data storage. Note that when Native Persistence is disabled, this metric shows the total size of the allocated space in RAM. | MB | This metric should be actively monitored, and administrators should ensure that there is always enough memory available to support the applications and users. |
| walArchiveSegments | Indicates the number of WAL segments in the archive. | Number | The write-ahead logs should be actively managed and cleared when no longer required. |
| walBuffPollSpinsRate | Indicates the number of WAL buffer poll spins over the last time interval. | Number/sec | |
| walFsyncTimeAverage | Indicates the average duration of the write-ahead logging fsync over the last time interval. | Microseconds | |
| walLoggingRate | Indicates the average number of WAL records written per second during the last time interval. | Records/sec | |
| walTotalSize | Indicates the total size of the WAL files, including the WAL archive files. | MB | The write-ahead log size should be optimal for the performance of the system. |
| walWritingRate | Indicates the average number of MB written per second during the last time interval. | MB/sec | Ideally, this value should be high. A write rate that declines over successive measurements may be a matter of concern. |
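
Several of the interpretations above reduce to simple checks: how close usedCheckpointBufferSize is to checkpointBufferSize, how much of offHeapSize is consumed by offheapUsedSize, and whether walWritingRate keeps falling from one measurement to the next. The sketch below shows one way such checks might be written. It is only an illustration: the threshold values, the function names, and the dictionary layout are assumptions made for this example and are not part of the eG test or of Apache Ignite.

```python
from typing import Dict, List, Sequence

# Hypothetical thresholds, chosen for illustration only; tune them per deployment.
CHECKPOINT_BUFFER_WARN_RATIO = 0.8
OFFHEAP_WARN_RATIO = 0.9


def usage_ratio(used: float, total: float) -> float:
    """Return used/total, or 0.0 when total is not positive."""
    return used / total if total > 0 else 0.0


def is_declining(samples: Sequence[float]) -> bool:
    """True when every sample is strictly lower than the one before it."""
    return len(samples) >= 2 and all(b < a for a, b in zip(samples, samples[1:]))


def evaluate(measures: Dict[str, float], wal_writing_history: Sequence[float]) -> List[str]:
    """Return warnings for one set of measures (sizes in MB, rates in MB/sec)."""
    warnings: List[str] = []

    # Used checkpoint buffer approaching the total checkpoint buffer size.
    cp = usage_ratio(measures["usedCheckpointBufferSize"], measures["checkpointBufferSize"])
    if cp >= CHECKPOINT_BUFFER_WARN_RATIO:
        warnings.append(f"checkpoint buffer {cp:.0%} used")

    # Off-heap usage approaching the allocated off-heap size.
    oh = usage_ratio(measures["offheapUsedSize"], measures["offHeapSize"])
    if oh >= OFFHEAP_WARN_RATIO:
        warnings.append(f"off-heap memory {oh:.0%} used")

    # WAL write rate going down over successive measurements.
    if is_declining(wal_writing_history):
        warnings.append("walWritingRate is declining")

    return warnings


# Made-up sample values, not real measurements.
sample = {
    "usedCheckpointBufferSize": 220.0,
    "checkpointBufferSize": 256.0,
    "offheapUsedSize": 512.0,
    "offHeapSize": 2048.0,
}
for warning in evaluate(sample, [12.0, 9.5, 7.1]):
    print(warning)
```

Ratios are used instead of absolute sizes so that the same check applies to servers with differently sized buffers. The strict "every sample lower than the last" test is deliberately simple; a real deployment would likely smooth the series before judging a trend.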