eG Monitoring
 

Measures reported by CassCliReqTest

All nodes in Cassandra are peers. A client read or write request can go to any node in the cluster. When a client connects to a node and issues a read or write request, that node serves as the coordinator for that particular client operation. The job of the coordinator is to act as a proxy between the client application and the nodes (or replicas) that own the data being requested. The coordinator determines which nodes in the ring should get the request based on the cluster configured partitioner and replica placement strategy.
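The coordinator's replica lookup can be pictured with a small sketch. This is a simplified illustration and not Cassandra's actual implementation: `token_for`, `replicas_for`, and the `RING` values are hypothetical names and data, the hash stands in for the configured partitioner, and the placement rule assumed here is "walk the ring clockwise and take the next distinct nodes", which resembles a simple, single-datacenter strategy.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Hypothetical stand-in for a partitioner: hash the key to a 64-bit token."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def replicas_for(key: str, ring: dict, replication_factor: int) -> list:
    """Return the nodes that would hold `key` under the assumed placement rule.

    `ring` maps a token to the node that owns the range ending at that token.
    Starting from the first token >= the key's token, walk clockwise and
    collect distinct nodes until `replication_factor` nodes are found.
    """
    tokens = sorted(ring)
    # Wrap to the first token when the key hashes past the last one.
    start = bisect.bisect_left(tokens, token_for(key)) % len(tokens)
    replicas = []
    for offset in range(len(tokens)):
        node = ring[tokens[(start + offset) % len(tokens)]]
        if node not in replicas:
            replicas.append(node)
        if len(replicas) == replication_factor:
            break
    return replicas


# Illustrative four-node ring with evenly spaced tokens (made-up values).
RING = {i * (2**64 // 4): f"node{i + 1}" for i in range(4)}

if __name__ == "__main__":
    # Any node could coordinate this request; it would forward it to these replicas.
    print(replicas_for("user:42", RING, replication_factor=3))
```

Whichever node receives the client request runs this kind of lookup and then forwards the operation to the resulting replicas, which is why every node can act as a coordinator.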

In environments where multiple nodes are deployed, the nodes may receive thousands of read and write requests at the same time. To serve these requests, the nodes must remain available. If multiple nodes are unavailable, requests may take too long to be serviced or, in the worst case, may fail. It is therefore important to track the time the nodes take to service requests and the number of requests that failed or timed out. The CassCliReqTest test helps administrators do this.
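To make the effect of unavailable nodes concrete, the sketch below checks whether a request can be attempted at all. It assumes the usual rule that a coordinator rejects a request as unavailable when fewer replicas are alive than the consistency level requires; the function names are hypothetical, and only a few common consistency levels are modeled.

```python
def required_replicas(consistency_level: str, replication_factor: int) -> int:
    """Replica acknowledgements needed for a few common consistency levels."""
    levels = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": replication_factor // 2 + 1,  # strict majority of replicas
        "ALL": replication_factor,
    }
    return levels[consistency_level.upper()]


def is_unavailable(consistency_level: str, replication_factor: int,
                   live_replicas: int) -> bool:
    """True when too few replicas are alive to satisfy the consistency level."""
    return live_replicas < required_replicas(consistency_level, replication_factor)


if __name__ == "__main__":
    # With a replication factor of 3, QUORUM needs 2 live replicas.
    print(is_unavailable("QUORUM", 3, live_replicas=2))  # one replica down
    print(is_unavailable("QUORUM", 3, live_replicas=1))  # two replicas down
```

Under these assumptions, losing one of three replicas still allows QUORUM requests, while losing two makes them unavailable; stricter levels such as ALL become unavailable as soon as a single replica is down.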

For each type of request received by the target Cassandra database server, this test reports the time taken to service the requests and the rate at which requests were unavailable, timed out, or failed. By closely monitoring these measures, administrators can investigate why requests failed or timed out and take remedial measures so that requests are serviced faster.

Outputs of the test: One set of results for each request type on the target Cassandra database node being monitored.

Descriptor of the test: Request type

The measures made by this test are as follows:

| Measurement | Description | Measurement Unit | Interpretation |
|---|---|---|---|
| Total_latency | Indicates the total time taken to service requests of this type during the last measurement period. | Milliseconds | A low value is desired for this measure. A sudden or gradual increase in the value may indicate that some peer nodes of the target Cassandra database server are not servicing requests, or that most requests are failing, for example because the requested data is unavailable. |
| Unavail_requests | Indicates the rate at which requests of this type were unavailable during the last measurement period. | Requests/sec | An unavailable request is rejected outright and causes the operation to fail, so any occurrence is serious. Cassandra's inability to meet consistency requirements can mean that several nodes are down or otherwise unreachable, or that stringent consistency settings are limiting the availability of the node. |
| Timeout | Indicates the rate at which requests of this type timed out during the last measurement period. | Timeouts/sec | A low value is desired for this measure. |
| Failures | Indicates the rate at which requests of this type failed during the last measurement period. | Failures/sec | Ideally, the value of this measure should be zero. |
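The per-period measures above can be derived from cumulative counters sampled at the start and end of a measurement period. The sketch below shows that arithmetic only; how the eG agent actually collects its values is not described here. `CounterSnapshot` and `measures` are hypothetical names, and the assumption that the underlying latency counter is cumulative and expressed in microseconds is illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSnapshot:
    """Cumulative counters for one request type (hypothetical structure)."""
    timestamp_s: float      # sample time, in seconds
    total_latency_us: int   # assumed: cumulative latency in microseconds
    unavailables: int       # cumulative unavailable requests
    timeouts: int           # cumulative timed-out requests
    failures: int           # cumulative failed requests


def measures(prev: CounterSnapshot, curr: CounterSnapshot) -> dict:
    """Convert two snapshots into the per-period measures listed in the table."""
    interval = curr.timestamp_s - prev.timestamp_s
    if interval <= 0:
        raise ValueError("snapshots must be taken in increasing time order")
    return {
        # Latency accumulated during the period, converted to milliseconds.
        "Total_latency": (curr.total_latency_us - prev.total_latency_us) / 1000.0,
        # Counter deltas divided by the period length give per-second rates.
        "Unavail_requests": (curr.unavailables - prev.unavailables) / interval,
        "Timeout": (curr.timeouts - prev.timeouts) / interval,
        "Failures": (curr.failures - prev.failures) / interval,
    }


if __name__ == "__main__":
    before = CounterSnapshot(0.0, 1_000_000, 10, 5, 2)
    after = CounterSnapshot(60.0, 4_000_000, 16, 5, 8)
    print(measures(before, after))
```

Because the rates are deltas over the period, a non-zero Unavail_requests or Failures value reflects events that happened since the previous measurement rather than a historical total.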