eG Monitoring
 

Measures reported by StormSupSumTest

The nodes that follow the instructions given by the Nimbus are called Supervisors. A Supervisor runs multiple worker processes and governs them to complete the tasks assigned by the Nimbus. Each worker process executes tasks related to a specific topology.

Using this test, administrators can identify the number of available CPU cores, the total and used memory, and the available worker slots in each Supervisor node of the target Apache Storm. Any aberrant memory usage alerts administrators so that they can take remedial measures before users start complaining.

Administrators may also schedule periodic reboots of the Supervisor nodes. If a specific node has been up for an unusually long time, an administrator can infer that the scheduled reboot task is not working on that node. For this purpose, this test, which is included in the eG agent, also monitors the uptime of critical nodes.

Outputs of the test: One set of results for each Supervisor node in the target Apache Storm.

Descriptor: Supervisor node
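
The relationship between the total, used, and available figures reported for each Supervisor node can be sketched as follows. This is a minimal illustration and not the eG agent's implementation: the input is assumed to be a payload shaped like the supervisor summary returned by the Storm UI REST API, and the function names `percent_available` and `measures_per_supervisor`, the sample host name, and the sample values are hypothetical.

```python
from __future__ import annotations

from typing import Any


def percent_available(total: float, used: float) -> float:
    """Return the unused share of `total` as a percentage."""
    if total <= 0:
        return 0.0  # a node reporting no capacity has nothing available
    return round((total - used) / total * 100.0, 2)


def measures_per_supervisor(summary: dict[str, Any]) -> dict[str, dict[str, float]]:
    """Build one set of measures per Supervisor node, keyed by host (the descriptor).

    Assumption: `summary` has a "supervisors" list whose entries carry
    host, slotsTotal, slotsUsed, totalCpu, usedCpu, totalMem and usedMem.
    """
    results: dict[str, dict[str, float]] = {}
    for node in summary.get("supervisors", []):
        results[node["host"]] = {
            "slotsTotal": node["slotsTotal"],
            "slotsUsed": node["slotsUsed"],
            "slotsFree": node["slotsTotal"] - node["slotsUsed"],
            "totalCpu": node["totalCpu"],
            "usedCpu": node["usedCpu"],
            "AvailCpu": node["totalCpu"] - node["usedCpu"],
            "AvailCpuPercent": percent_available(node["totalCpu"], node["usedCpu"]),
            "totalMem": node["totalMem"],
            "usedMem": node["usedMem"],
            "AvailMem": node["totalMem"] - node["usedMem"],
            "AvailMemPercent": percent_available(node["totalMem"], node["usedMem"]),
        }
    return results


# Hypothetical sample: one Supervisor node with 4 slots, 8 CPU cores, 4096 MB.
sample = {
    "supervisors": [
        {
            "host": "storm-sup-1",
            "slotsTotal": 4, "slotsUsed": 1,
            "totalCpu": 8, "usedCpu": 2,
            "totalMem": 4096, "usedMem": 1024,
        }
    ]
}

for host, measures in measures_per_supervisor(sample).items():
    print(host, measures)
```

Keying the result by host mirrors the way the test reports one set of results per Supervisor node.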

The measures made by this test are as follows:

| Measurement | Description | Measurement Unit | Interpretation |
| --- | --- | --- | --- |
| slotsTotal | Indicates the total number of slots in the Supervisor node. | Number | |
| slotsUsed | Indicates the number of used slots in the Supervisor node. | Number | A very low value is desired for this measure. |
| slotsFree | Indicates the number of free slots in the Supervisor node. | Number | A very high value is desired for this measure. |
| totalCpu | Indicates the total number of CPU cores in the Supervisor node. | Number | |
| usedCpu | Indicates the number of used CPU cores in the Supervisor node. | Number | A very low value is desired for this measure. |
| AvailCpu | Indicates the number of available CPU cores in the Supervisor node. | Number | A very high value is desired for this measure. |
| AvailCpuPercent | Indicates the percentage of available CPU in the Supervisor node. | Percent | A value close to 100% is desired for this measure. |
| totalMem | Indicates the total size of memory in the Supervisor node. | MB | |
| usedMem | Indicates the size of used memory in the Supervisor node. | MB | A very low value is desired for this measure. |
| AvailMem | Indicates the size of available memory in the Supervisor node. | MB | A very high value is desired for this measure. |
| AvailMemPercent | Indicates the percentage of available memory in the Supervisor node. | Percent | A value close to 100% is desired for this measure. |
| uptime | Indicates the time period that the Supervisor node has been up since the last time this test ran. | Hrs/Mins/Secs | If the Supervisor node has not been rebooted during the last measurement period and the agent has been running continuously, this value will be equal to the measurement period. If the Supervisor node was rebooted during the last measurement period, this value will be less than the measurement period of the test. For example, if the measurement period is 300 secs and the Supervisor node was rebooted 120 secs ago, this metric will report a value of 120 seconds. The accuracy of this metric depends on the measurement period: the smaller the measurement period, the greater the accuracy. |
| rebooted | Indicates whether the Supervisor node has been rebooted during the last measurement period or not. | | |

The values reported by the rebooted measure and their numeric equivalents are listed in the table below:

| Measure Value | Numeric Value |
| --- | --- |
| Yes | 1 |
| No | 0 |

Note:

By default, this measure reports the Measure Values listed in the table above to indicate whether the Supervisor node has been rebooted during the last measurement period or not.

If this measure shows 1, it means that the Supervisor node was rebooted during the last measurement period. By checking the time periods when this metric changes from 0 to 1, an administrator can determine the times when this Supervisor node was rebooted.
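
The uptime and rebooted interpretations above can be sketched in a few lines. This is an illustration under stated assumptions, not the eG agent's code: it assumes the node's time since boot is known in seconds, that the agent has been running continuously, and that 1 denotes a reboot as described in the note above. The function names `uptime_measures` and `reboot_times` and the sample timestamps are hypothetical.

```python
from __future__ import annotations


def uptime_measures(secs_since_boot: int, measurement_period_secs: int) -> tuple[int, int]:
    """Return (uptime within the last measurement period, rebooted flag).

    A node that booted less than one measurement period ago is treated
    as rebooted (flag 1); otherwise the flag is 0.
    """
    rebooted = 1 if secs_since_boot < measurement_period_secs else 0
    # Uptime since the last run can never exceed the measurement period.
    uptime = min(secs_since_boot, measurement_period_secs)
    return uptime, rebooted


def reboot_times(samples: list[tuple[str, int]]) -> list[str]:
    """Return the timestamps at which the rebooted flag changes from 0 to 1."""
    times = []
    previous = 0
    for timestamp, flag in samples:
        if flag == 1 and previous == 0:
            times.append(timestamp)
        previous = flag
    return times


# The example from the text: a 300-second period, node rebooted 120 seconds ago.
print(uptime_measures(120, 300))    # (120, 1)
# A node that has been up for a full day reports the whole period and no reboot.
print(uptime_measures(86400, 300))  # (300, 0)

# Hypothetical history of (time, rebooted flag) readings.
history = [("10:00", 0), ("10:05", 1), ("10:10", 0), ("10:15", 0), ("10:20", 1)]
print(reboot_times(history))        # ['10:05', '10:20']
```

Because a reboot is only detected at the next run of the test, a shorter measurement period narrows the window in which the reboot could have occurred.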