eG Monitoring
 

Measures reported by CassMsgDropTest

Cassandra uses back pressure, a technique common in staged event-driven architecture (SEDA) designs: if a stage is already full of requests, it stops accepting requests from earlier stages. Because Cassandra is built on a SEDA architecture, a request hops between multiple thread pools during processing, and each hop adds latency. Under back pressure, Cassandra drops requests that have already timed out without processing them, and logs a message about the drops. If a Cassandra database node drops messages frequently, administrators should check for overload conditions, or otherwise identify the real cause of the performance lag, as early as possible. To help administrators with this, eG Enterprise offers the CassMsgDropTest test.
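To make back pressure and load shedding concrete, the following Python sketch models a single stage with a bounded queue. It is a simplified illustration, not Cassandra's implementation: the `Stage` class, its `capacity` and `timeout_s` parameters, and the `rejected`/`dropped` counters are hypothetical names chosen for this example.

```python
import time
from collections import deque


class Stage:
    """Hypothetical SEDA stage: a bounded queue plus a per-request timeout.

    Illustrative model only; this is not Cassandra's implementation.
    """

    def __init__(self, capacity, timeout_s):
        self.capacity = capacity    # most requests the stage will hold at once
        self.timeout_s = timeout_s  # age after which a queued request is stale
        self.queue = deque()
        self.rejected = 0           # refused because the stage was full (back pressure)
        self.dropped = 0            # discarded because they had already timed out

    def offer(self, request, now=None):
        """Accept a request from an earlier stage unless this stage is full."""
        now = time.monotonic() if now is None else now
        if len(self.queue) >= self.capacity:
            self.rejected += 1
            return False
        self.queue.append((now, request))
        return True

    def process_next(self, now=None):
        """Return the next live request, dropping any that have already timed out."""
        now = time.monotonic() if now is None else now
        while self.queue:
            enqueued_at, request = self.queue.popleft()
            if now - enqueued_at > self.timeout_s:
                self.dropped += 1   # load shedding: skip work whose caller has given up
                continue
            return request
        return None


if __name__ == "__main__":
    stage = Stage(capacity=2, timeout_s=1.0)
    stage.offer("read-1", now=0.0)
    stage.offer("read-2", now=0.5)
    stage.offer("read-3", now=0.6)        # rejected: the stage is already full
    print(stage.process_next(now=1.2))    # read-1 is 1.2 s old and is dropped; prints read-2
    print(stage.dropped, stage.rejected)  # prints 1 1
```

The explicit `now` argument exists only so the example is deterministic; in a live system the monotonic clock would be used.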

The test auto-discovers the message types on the target Cassandra database node and, for each message type, reports the rate at which messages were dropped, along with the latency of messages dropped within the node and of messages dropped across nodes. Using this test, administrators can identify which message types are dropped most frequently, and measure the maximum latency observed for dropped messages of each type, both within the node and across nodes.
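One way to turn per-message-type drop counts into a per-second rate is to sample a cumulative counter at the start and end of each measurement period and divide the difference by the period length. The sketch below assumes such cumulative counts are available per message type (Cassandra publishes one over JMX in the `DroppedMessage` metric group, with the message type as the scope and `Dropped` as the metric name). The `drop_rates` function and the sample counts are hypothetical, and this is not necessarily how eG Enterprise computes the measure.

```python
def drop_rates(previous, current, interval_s):
    """Per-message-type drop rate (messages/sec) between two counter samples.

    previous, current: dicts mapping message type -> cumulative dropped count.
    Message types come from the current sample, so newly discovered types are
    included; with no earlier sample to compare against, they report 0.0.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    rates = {}
    for msg_type, count in current.items():
        delta = count - previous.get(msg_type, count)
        # A counter that went backwards was reset (e.g. node restart);
        # clamp to 0 rather than report a negative rate.
        rates[msg_type] = max(delta, 0) / interval_s
    return rates


if __name__ == "__main__":
    before = {"MUTATION": 120, "READ": 40}
    after = {"MUTATION": 180, "READ": 40, "RANGE_SLICE": 3}
    print(drop_rates(before, after, 60.0))
    # {'MUTATION': 1.0, 'READ': 0.0, 'RANGE_SLICE': 0.0}
```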

Outputs of the test: One set of results for each message type on the target Cassandra database node being monitored.

Descriptor of the test: Message Type

The measures made by this test are as follows:

Dropped_messages
Description: Indicates the rate at which messages of this type were dropped during the last measurement period.
Measurement unit: Messages/sec
Interpretation: Ideally, the value of this measure should always be 0. This measure is a good indicator of the load on the database node.

Messages are dropped when internode messages received by a node are not processed within their configured timeout.

Load shedding is part of the Cassandra architecture. If drops persist, it is generally a sign of an overloaded node or cluster.

Latency_across_nodes
Description: Indicates the time that elapsed before messages of this type were dropped, for messages exchanged across nodes.
Measurement unit: Milliseconds
Interpretation: Ideally, the value of this measure should be 0.

Compare the value of this measure across message types to identify which message types are more prone to being dropped across nodes.

Latency_within_node
Description: Indicates the time that elapsed before messages of this type were dropped, for messages processed within this node.
Measurement unit: Milliseconds
Interpretation: Ideally, the value of this measure should be 0.

Compare the value of this measure across message types to identify which message types are more prone to being dropped within the node.
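The interpretation guidance for these measures (drops should be 0; compare latencies across message types) can be applied mechanically. The sketch below assumes the three measures have already been collected into a dict per message type; the `triage` function, the field names and the sample values are hypothetical and for illustration only.

```python
def triage(measures):
    """Apply the interpretation guidance to per-message-type measures.

    Returns the message types with a non-zero drop rate, plus all types ranked
    by cross-node and by within-node dropped-message latency (highest first).
    """
    dropping = sorted(t for t, m in measures.items() if m["dropped_messages"] > 0)
    by_across = sorted(measures, key=lambda t: measures[t]["latency_across_nodes"], reverse=True)
    by_within = sorted(measures, key=lambda t: measures[t]["latency_within_node"], reverse=True)
    return dropping, by_across, by_within


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    sample = {
        "MUTATION": {"dropped_messages": 1.0, "latency_across_nodes": 2500.0, "latency_within_node": 1200.0},
        "READ": {"dropped_messages": 0.0, "latency_across_nodes": 0.0, "latency_within_node": 0.0},
        "RANGE_SLICE": {"dropped_messages": 0.2, "latency_across_nodes": 800.0, "latency_within_node": 3000.0},
    }
    dropping, by_across, by_within = triage(sample)
    print(dropping)    # ['MUTATION', 'RANGE_SLICE']
    print(by_across)   # ['MUTATION', 'RANGE_SLICE', 'READ']
    print(by_within)   # ['RANGE_SLICE', 'MUTATION', 'READ']
```

Ranking the two latency measures separately keeps cross-node and within-node problems apart, since they usually point to different causes.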