| Measurement |
Description |
Measurement Unit |
Interpretation |
| spiState |
Indicates the current state of the Service Provider Interface (SPI). |
Boolean |
If the SPI is not healthy, discovery and communication between the nodes will stop. Administrators need to ensure that the SPI is working properly before starting the cluster.
|
| nodesFailed |
Indicates the number of nodes that are in a failed state. |
Number |
If too many failed nodes are still listed in the cluster configuration, cluster startup may slow down.
|
| nodesJoined |
Indicates the number of nodes that have joined since the cluster was started. |
Number |
If nodes are able to join and integrate seamlessly into the cluster, it is a good sign that the SPI is working fine.
|
| nodesLeft |
Indicates the number of nodes that have left since the cluster was started. |
Number |
If too many nodes have left the cluster recently, they may need to be removed from the cluster configuration; otherwise, they will slow down inter-node communication.
|
| pendingMessageDiscarded |
Indicates the number of messages discarded because the target node could not be discovered. |
Number |
If too many pending messages are being discarded, it means that some nodes have left the cluster but the cluster configuration has not been updated.
|
| pendingMessageRegistered |
Indicates the number of messages which are yet to be delivered to the target node. |
Number |
If too many pending messages have not yet been discarded, it means that some nodes have left the cluster but the cluster configuration has not been updated.
|
| totalProcessedMessage |
Indicates the total number of messages processed per second through the discovery SPI. |
Messages/Sec |
If this rate keeps falling over a range of measurements, investigate the cause.
|
| totalReceivedMessage |
Indicates the total number of messages received per second through the discovery SPI. |
Messages/Sec |
A low value is desired for this measure.
|
| messageWorkerQueueSize |
Indicates the size of the queue of discovery messages that are waiting to be sent to other nodes. |
MB |
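The worker queue size should stay at a stable, optimal value; a queue that keeps growing suggests that discovery messages are being queued faster than they can be sent.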
|
| avgMessageProcessingTime |
Indicates the average time taken to process a message through the system. |
Seconds |
Look at the trend: if the processing time keeps rising over a range of measurements, it is a cause for concern.
|
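
The trend-based interpretations above (a falling totalProcessedMessage rate, a rising avgMessageProcessingTime) can be checked roughly as follows. This is only a minimal sketch: it assumes the samples were collected at regular intervals and are passed in as plain lists, and the function names `trend_slope` and `check_trends` and the `tolerance` parameter are hypothetical, not part of any monitoring product.

```python
from typing import List, Sequence


def trend_slope(samples: Sequence[float]) -> float:
    """Least-squares slope of the samples against their index.

    The result is the average change per measurement interval.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to estimate a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def check_trends(
    processed_rate: Sequence[float],
    avg_processing_time: Sequence[float],
    tolerance: float = 0.0,
) -> List[str]:
    """Return warnings for the two trend-based measurements in the table.

    `tolerance` (hypothetical) is the slope magnitude that is ignored as noise.
    """
    warnings = []
    if trend_slope(processed_rate) < -tolerance:
        warnings.append("totalProcessedMessage is trending down")
    if trend_slope(avg_processing_time) > tolerance:
        warnings.append("avgMessageProcessingTime is trending up")
    return warnings
```

For example, `check_trends([100, 90, 80, 70], [1, 2, 3, 4])` reports both warnings, while flat series report none. A fitted slope is used here instead of comparing only the first and last samples, so that a single outlier is less likely to trigger a warning.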