eG Monitoring
 

Measures reported by RabMQNodeGCTest

The RabbitMQ server is written in the Erlang programming language. Each Erlang process has its own stack and heap, which are allocated in the same memory block and grow towards each other. When the stack and the heap meet, the garbage collector is triggered and memory is reclaimed. If the garbage collector does not reclaim enough memory, the heap grows to accommodate more data. If heap growth is not kept in check by efficient garbage collection, it can degrade the performance of the RabbitMQ node and, consequently, slow down cluster operations as well.
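The two heap growth conditions listed under the diffGC measure on this page can be expressed as a small check. This is only an illustrative sketch, not the Erlang runtime's actual code: the function name and parameters are hypothetical, all sizes are in words, and reading "greater than 75%" as 75% of the current heap size is an assumption.

```python
def heap_should_grow(heap_size, heap_used, fragment_words, live_after_fullsweep=None):
    """Return True if either documented growth condition holds (sizes in words).

    Hypothetical helper for illustration only.
    """
    # Condition 1: heap contents plus message and heap fragments
    # exceed the current heap size.
    if heap_used + fragment_words > heap_size:
        return True
    # Condition 2: after a fullsweep, live objects take up more than 75%
    # (assumed here to mean 75% of the current heap size).
    if live_after_fullsweep is not None and live_after_fullsweep > 0.75 * heap_size:
        return True
    return False


# A 233-word heap holding 200 words plus 50 words of fragments must grow.
print(heap_should_grow(233, 200, 50))
# A 1000-word heap with 800 live words after a fullsweep must also grow.
print(heap_should_grow(1000, 500, 0, live_after_fullsweep=800))
```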

Using the RabMQNodeGCTest test, you can keep tabs on garbage collection activity on each node of a cluster and identify the node from which the least memory was reclaimed. When a cluster under-performs, you can use this test to figure out if the dip in cluster performance is owing to excessive heap growth on a node caused by inefficient garbage collection.
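As a rough sketch of the cross-node comparison described above, the per-node values of reclaimed memory can be sorted to find the node from which the least memory was reclaimed. The node names and figures below are made-up sample data, and `rank_nodes_by_reclaimed` is a hypothetical helper, not part of the test.

```python
def rank_nodes_by_reclaimed(reclaimed_mb):
    """Return node names ordered from least to most memory reclaimed (MB)."""
    return sorted(reclaimed_mb, key=reclaimed_mb.get)


# Made-up sample: reclaimed memory per node during one measurement period.
sample = {
    "rabbit@node1": 512.0,
    "rabbit@node2": 48.5,
    "rabbit@node3": 301.2,
}

ranking = rank_nodes_by_reclaimed(sample)
least, most = ranking[0], ranking[-1]
print(f"least reclaimed: {least}; most reclaimed: {most}")
```

A node that sits at the bottom of this ranking while also showing a high garbage collection count is a natural first candidate for investigation.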

Outputs of the test: One set of results for each node in the monitored RabbitMQ cluster.

The measures made by this test are as follows:

Measurement: diffGC
Description: Indicates the number of garbage collection operations that occurred on this node during the last measurement period.
Measurement Unit: Number
Interpretation: Compare the value of this measure across nodes to identify the node on which garbage collection ran most often. Such a node could have experienced rapid and abnormal heap growth, thus triggering garbage collection frequently. You may want to investigate the reasons for heap growth on that node. Typically, heaps grow in two stages. First, a variation of the Fibonacci sequence is used, starting at 233 words. Then, at about 1 mega words, the heap grows only in 20% increments. There are two occasions when the heap grows:

  • If the total size of the heap + message and heap fragments exceeds the current heap size;

  • If, after a fullsweep, the total amount of live objects is greater than 75%.

Either way, you may want to resize the heap to avoid frequent garbage collections. This is because, every time garbage collection happens, the garbage collector must suspend the execution of the node to ensure the integrity of the object trees. The more live objects are found, the longer the suspension, which has a direct impact on response time and throughput. This in turn may impact overall cluster performance as well.

Measurement: diffGCRate
Description: Indicates the rate at which garbage collections take place on this node.
Measurement Unit: Operations/Sec
Interpretation: A high value is indicative of frequent garbage collections. This could be owing to rapid and significant heap growth on the node. Frequent garbage collections on a node may degrade its performance. To avoid this, you may want to consider resizing the heap on that node.

Measurement: diffGCBytesReclaimed
Description: Indicates the memory reclaimed by the garbage collector on this node.
Measurement Unit: MB
Interpretation: Compare the value of this measure across nodes to find out on which node the garbage collector reclaimed the most memory and on which node it reclaimed the least.

Measurement: diffGCBytesReclaimedRate
Description: Indicates the rate at which memory was reclaimed by the garbage collector on this node.
Measurement Unit: MB/Sec

Measurement: diffContextSwitchOperati
Description: Indicates the rate at which runtime context switching takes place on this node.
Measurement Unit: Operations/Sec
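The count and rate measures above read like differences between two readings of cumulative counters. The following is a minimal sketch of that arithmetic, assuming the node exposes running totals of garbage collection runs and bytes reclaimed (the RabbitMQ management API's node statistics include fields such as `gc_num` and `gc_bytes_reclaimed`, but whether this test reads them is an assumption). The `GcSample` type, the `gc_deltas` function, and the use of 1 MB = 1024 * 1024 bytes are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class GcSample:
    """One reading of a node's cumulative GC counters (hypothetical type)."""
    timestamp: float       # seconds
    gc_count: int          # total garbage collections so far
    bytes_reclaimed: int   # total bytes reclaimed so far


def gc_deltas(prev, curr):
    """Derive per-period values from two consecutive samples."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    runs = curr.gc_count - prev.gc_count
    # Assumes 1 MB = 1024 * 1024 bytes.
    reclaimed_mb = (curr.bytes_reclaimed - prev.bytes_reclaimed) / (1024 * 1024)
    return {
        "diffGC": runs,
        "diffGCRate": runs / elapsed,
        "diffGCBytesReclaimed": reclaimed_mb,
        "diffGCBytesReclaimedRate": reclaimed_mb / elapsed,
    }


# Made-up readings taken 60 seconds apart.
prev = GcSample(timestamp=0.0, gc_count=1000, bytes_reclaimed=0)
curr = GcSample(timestamp=60.0, gc_count=1600, bytes_reclaimed=120 * 1024 * 1024)
result = gc_deltas(prev, curr)
print(result)
```

The sketch does not handle counters that reset, for example after a node restart; a real collector would need to detect a negative difference and discard that period.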