eG Monitoring
 

Measures reported by CassMsgTest

When a new Cassandra database node is added to an existing Cassandra database cluster, administrators transfer data to the new node by executing the netstats command on the node from which the data should be sent. This command helps transfer data between nodes while a node is joining the cluster. If the transfer fails or completes only partially, the data on the new node may be stale or outdated. Users who access data from that node may then receive incorrect data, or the data may not be available at all. To avoid such data transfer issues and ensure that the data on the joining node is up to date, it is important to monitor the messages being transferred to each new node. The CassMsgTest test helps administrators in this regard.

This test auto-discovers the nodes that are joining the target Cassandra database node in a cluster. For each joining node, the test reports the rate at which large messages, small messages, and gossip messages were transferred from the target database node. It also reports the rate at which large, small, and gossip messages were pending transfer to the joining node or were dropped during transfer, and the rate at which messages timed out during transfer. Using this test, administrators can identify a joining node that is not up to date with the target database node.
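The measures below are all expressed as per-second rates over the last measurement period. As a rough illustration only, the sketch below shows one way such a rate could be derived from two cumulative counter samples. The function name `rate_per_sec`, the use of cumulative counters, and the reset handling are assumptions made for this example; the source does not describe how the eG agent computes these values.

```python
def rate_per_sec(prev_count, curr_count, period_secs):
    """Return the per-second rate between two cumulative counter samples.

    Hypothetical helper: assumes the underlying source exposes
    monotonically increasing message counters.
    """
    if period_secs <= 0:
        raise ValueError("period_secs must be positive")
    delta = curr_count - prev_count
    # A negative delta suggests the counter was reset (for example, after a
    # node restart); in that case, count only what accumulated since the reset.
    if delta < 0:
        delta = curr_count
    return delta / period_secs


# Example: 300 large messages completed during a 60-second measurement period.
large_msg_rate = rate_per_sec(100, 400, 60)
```

Under these assumptions, a counter that moves from 100 to 400 over 60 seconds corresponds to 5 messages/sec.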

Outputs of the test: One set of results for each node clustered with the target Cassandra database node being monitored.

Descriptor of the test: Node

The measures made by this test are as follows:

| Measurement | Description | Measurement Unit | Interpretation |
|---|---|---|---|
| Large_msg_compltd | Indicates the rate at which large messages were completely transferred to this node during the last measurement period. | Messages/sec | A high value is desired for this measure. |
| Large_msg_drop | Indicates the rate at which large messages were dropped while being transferred to this node during the last measurement period. | Messages/sec | Ideally, the value of this measure should be zero. |
| Large_msg_pending | Indicates the rate at which large messages were pending transfer to this node during the last measurement period. | Messages/sec | A sudden or gradual increase in the value of this measure is a cause for concern. It indicates that the data in the joining node is not up to date with the data available in the target database node. |
| Small_msg_compltd | Indicates the rate at which small messages were completely transferred to this node during the last measurement period. | Messages/sec | A high value is desired for this measure. |
| Small_msg_drop | Indicates the rate at which small messages were dropped while being transferred to this node during the last measurement period. | Messages/sec | Ideally, the value of this measure should be zero. |
| Small_msg_pending | Indicates the rate at which small messages were pending transfer to this node during the last measurement period. | Messages/sec | A sudden or gradual increase in the value of this measure is a cause for concern. It indicates that the data in the joining node is not up to date with the data available in the target database node. |
| Gossip_msg_compltd | Indicates the rate at which gossip messages were completely transferred to this node during the last measurement period. | Messages/sec | A high value is desired for this measure. |
| Gossip_msg_drop | Indicates the rate at which gossip messages were dropped while being transferred to this node during the last measurement period. | Messages/sec | Ideally, the value of this measure should be zero. |
| Gossip_msg_pending | Indicates the rate at which gossip messages were pending transfer to this node during the last measurement period. | Messages/sec | A sudden or gradual increase in the value of this measure is a cause for concern. It indicates that the data in the joining node is not up to date with the data available in the target database node. |
| Timeouts | Indicates the rate at which messages timed out while being transferred to this node during the last measurement period. | Timeouts/sec | A low value is desired for this measure. A sudden or steady increase in the value of this measure reveals that messages were not transferred to the joining node successfully and that the joining node does not contain all the data that needs to be transferred from the target database node. |
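
The interpretation guidance above can be expressed as a simple check over two consecutive samples for one node. The following is a minimal sketch, not part of the eG product: the function `review_node`, the dictionary layout, and the "any increase counts as rising" rule are assumptions chosen for illustration. Only the measure names come from this test.

```python
# Measure names as reported by CassMsgTest.
DROP_MEASURES = ("Large_msg_drop", "Small_msg_drop", "Gossip_msg_drop")
PENDING_MEASURES = ("Large_msg_pending", "Small_msg_pending", "Gossip_msg_pending")


def review_node(previous, current):
    """Compare two consecutive samples for one node and list concerns.

    Hypothetical helper: `previous` and `current` map measure names to
    rates (messages/sec or timeouts/sec). Missing measures count as 0.
    """
    concerns = []
    # Drop rates should ideally be zero.
    for name in DROP_MEASURES:
        value = current.get(name, 0)
        if value > 0:
            concerns.append(f"{name} is non-zero ({value}/sec)")
    # A rising pending rate suggests the joining node is falling behind.
    for name in PENDING_MEASURES:
        before, now = previous.get(name, 0), current.get(name, 0)
        if now > before:
            concerns.append(f"{name} rose from {before} to {now}/sec")
    # A rising timeout rate suggests transfers are not completing.
    before, now = previous.get("Timeouts", 0), current.get("Timeouts", 0)
    if now > before:
        concerns.append(f"Timeouts rose from {before} to {now}/sec")
    return concerns


# Example with made-up sample values for a single joining node.
prev_sample = {"Large_msg_pending": 2.0, "Timeouts": 0.0}
curr_sample = {"Large_msg_pending": 5.0, "Small_msg_drop": 0.3, "Timeouts": 0.1}
for concern in review_node(prev_sample, curr_sample):
    print(concern)
```

In practice, a real alerting rule would likely compare against a trend over several measurement periods rather than a single pair of samples; the pairwise comparison here just keeps the example short.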