eG Monitoring
 

Measures reported by BobiLSPerfTest

One of the key factors influencing the performance of the Lumira server is the usage of its JVM memory heap. This is because the Lumira server is a pure Java-based process, configured with an initial Java heap size. Naturally, therefore, a lack of adequate free JVM memory, faulty or frequent garbage collections, and JVM deadlocks can all have an adverse impact on the health of the Lumira server. Likewise, if critical services hosted on the Lumira server are not correctly configured to handle the requests they receive, Lumira server performance will again degrade. This is why the eG agent periodically runs the BobiLSPerfTest. This test enables administrators to measure JVM health and verify the configuration of the critical Lumira services, helping them rapidly detect dips in Lumira server performance and their possible causes.

Outputs of the test: One set of results for the Lumira server running on the monitored node.

The measures made by this test are as follows:

Measurement Description Measurement Unit Interpretation
freeJVMMemory Indicates the amount of memory available to the JVM for allocating new objects. GB Ideally, the value of this measure should be high.
pctFreeMem Indicates the percentage of memory available to the JVM for allocating new objects. Percent A value close to 0% is a cause for concern, as it indicates rapid erosion of the JVM memory heap. Without sufficient memory, the Lumira Server and its services will not be able to operate optimally.
cpuUsagePCT Indicates the percentage of time the CPU was used by the Lumira Server during the last 5 minutes. Percent This measure considers all processors allocated to the JVM. A value close to 100% indicates excessive CPU usage, probably owing to CPU-intensive operations performed on the JVM. If more processing power is not allocated to the JVM, the Lumira Server may hang.
stoppedSystemPCT Indicates the percentage of time that Lumira services were stopped for garbage collection in the last 5 minutes. Percent A critical stage of garbage collection requires exclusive access, and all Lumira services are halted at this time. This value should always be less than 10; a value of 10 or above indicates a low-throughput issue and requires further investigation.
pageFaultsDuringGC Indicates the number of page faults that occurred while garbage collection was running during the last five minutes. Number Any value greater than 0 indicates a system under heavy load and low memory conditions.
fullGC Indicates the rate of full garbage collections performed in the last 5 minutes. GCs/second A rapid increase in this value may indicate a system under low memory conditions.
jvmLockContention Indicates the current number of JVM lock contentions. Number This represents the number of synchronized objects that have threads waiting for access. The average value of this measure should be 0. Consistently higher values indicate threads that will not run again. You may want to take a thread dump to investigate such issues.
jvmDeadLockedThreads Indicates the number of threads that are deadlocked. Number These threads are indefinitely waiting on each other for a common set of resources. The average value should be 0. Consistently higher values warrant further investigation using thread dumps.
sessionCount Indicates the number of active sessions to the design studio. Number Design Studio is an application for building executive dashboards.
totalJVMMemory Indicates the total amount of memory available to the JVM for allocating new objects. GB Ideally, the value of this measure should be high.
currAuditEventInQueue Indicates the number of auditing events that the Lumira Server has recorded, but which have not yet been retrieved by the CMS Auditor. Number If this number increases without bound, it could indicate that auditing has not been configured properly or that the system is heavily loaded and is generating auditing events faster than the auditor can retrieve them. When stopping servers, it is advisable to disable them first and wait until all auditing events have been retrieved and this queue is empty. Otherwise, the events may be retrieved only after the server has been restarted and the CMS polls for them.
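The heap and garbage-collection figures in the table above (freeJVMMemory, totalJVMMemory, fullGC) correspond to standard values that the JVM itself exposes. As a rough in-process illustration only (the eG agent's own collection mechanism is not described here, so this is an assumption, not its implementation), the equivalent numbers can be read via the standard java.lang.management API:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class JvmHeapStats {

    /** Free heap in bytes: committed heap minus the portion already used
     *  (cf. the freeJVMMemory measure, which eG reports in GB). */
    public static long freeHeapBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getCommitted() - heap.getUsed();
    }

    /** Free heap as a percentage of the committed heap (cf. pctFreeMem). */
    public static double pctFreeHeap() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return 100.0 * (heap.getCommitted() - heap.getUsed()) / heap.getCommitted();
    }

    /** Cumulative collection count across all garbage collectors. Sampling
     *  this periodically and taking the difference over the interval yields
     *  a rate (cf. fullGC, reported as GCs/second over 5 minutes). */
    public static long totalGcCount() {
        long count = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount();
            if (c > 0) {
                count += c; // a value of -1 means the count is unavailable
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.printf("Free heap: %.2f%% (%d bytes)%n", pctFreeHeap(), freeHeapBytes());
        System.out.println("Collections so far: " + totalGcCount());
    }
}
```

A value trending toward 0% free heap here is exactly the low-memory condition the pctFreeMem interpretation warns about.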
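Likewise, the jvmDeadLockedThreads measure maps onto the JVM's built-in deadlock detector. A minimal sketch, again assuming in-process access rather than the eG agent's actual remote collection, using the standard ThreadMXBean:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DeadlockCheck {

    /** Number of threads currently deadlocked waiting on object monitors or
     *  ownable synchronizers; 0 is the healthy value (cf. jvmDeadLockedThreads). */
    public static int deadlockedThreadCount() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null when no deadlock exists
        return ids == null ? 0 : ids.length;
    }

    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads();
        if (ids != null) {
            // Dump the culprit threads, as the interpretation column advises.
            for (ThreadInfo info : threads.getThreadInfo(ids)) {
                System.out.println(info);
            }
        }
        System.out.println("Deadlocked threads: " + deadlockedThreadCount());
    }
}
```

Since deadlocked threads never recover on their own, any non-zero value warrants capturing a full thread dump for investigation.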