| eG Monitoring |
|---|
Measures reported by CassHintTest

Over time, data in a Cassandra replica can become inconsistent with other replicas due to the distributed nature of the database. Node repair corrects these inconsistencies so that eventually all nodes hold the same, most up-to-date data. It is an important part of regular maintenance for every Cassandra cluster. Cassandra provides several repair processes, one of which is described below.
Occasionally, a node may become unresponsive while data is being written. This may be due to hardware problems, network issues, or overloaded nodes that experience long garbage collection (GC) pauses. If a node is unable to receive a particular write, the write's coordinator node preserves the data to be written as a set of hints. When the node comes back online, the coordinator effects repair by handing off the hints so that the node can catch up with the writes it missed. This repair process is termed Hinted Handoff. Hints are saved only for the period given by the max_hint_window_ms setting in cassandra.yaml. Once this window expires, the coordinator stops saving hints for the unresponsive node.

Hinted Handoff is an optional part of writes whose primary purpose is to provide extreme write availability when consistency is not required. Secondarily, Hinted Handoff can reduce the time required for a temporarily failed node to become consistent again with live ones. This is especially useful when a flaky network causes false-positive failures. If hinted handoff is not enabled, a node may hold outdated data for longer, so users may be served stale data and user experience may suffer. It is therefore necessary to monitor the status of hinted handoff round the clock. The CassHintTest test helps administrators in this regard.
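The hint lifecycle described above can be illustrated with a small simulation. This is only a toy model under stated assumptions, not Cassandra's implementation: the `Coordinator` class, its methods, and the outcome labels are hypothetical names invented for this sketch, and the window value (three hours in milliseconds) is chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Coordinator:
    """Toy coordinator that stores hints for one replica (hypothetical model)."""
    max_hint_window_ms: int                     # mirrors the cassandra.yaml setting name
    hints: list = field(default_factory=list)   # writes waiting to be replayed

    def write(self, key, value, replica_store, replica_up, ms_since_replica_down=0):
        """Apply one write and return 'delivered', 'hinted' or 'dropped'."""
        if replica_up:
            replica_store[key] = value
            return "delivered"
        if ms_since_replica_down <= self.max_hint_window_ms:
            # Replica is down but still inside the hint window: keep a hint.
            self.hints.append((key, value))
            return "hinted"
        # Window expired: no hint is saved for this write.
        return "dropped"

    def hand_off(self, replica_store):
        """Replay stored hints into the recovered replica, then clear them."""
        replayed = len(self.hints)
        for key, value in self.hints:
            replica_store[key] = value
        self.hints.clear()
        return replayed


replica = {}
coordinator = Coordinator(max_hint_window_ms=10_800_000)  # 3 hours, illustrative

outcomes = [
    # Replica has been down for 1 minute: the write is kept as a hint.
    coordinator.write("a", 1, replica, replica_up=False, ms_since_replica_down=60_000),
    # Replica has been down longer than the window: no hint is saved.
    coordinator.write("b", 2, replica, replica_up=False, ms_since_replica_down=11_000_000),
]
replayed = coordinator.hand_off(replica)
print(outcomes, replayed, replica)  # ['hinted', 'dropped'] 1 {'a': 1}
```

In this sketch the second write never reaches the replica through hints, which is why a node that stays down past the window needs another form of repair to become consistent.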
By closely monitoring the Cassandra Database node, this test helps administrators determine whether hinted handoff is enabled. In addition, this test reports the total number of hints that the node needs to be updated with and the number of hints that are active for replay. An abnormal increase in the count of hints may indicate potential degradation in database performance.

Outputs of the test: One set of results for the target Cassandra Database node being monitored.

The measures made by this test are as follows:
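As an illustration of what an "abnormal increase in the count of hints" might mean in practice, the sketch below compares the latest hint count with a baseline built from recent samples. It does not describe how the test itself evaluates its measures: the function name `abnormal_hint_increase`, the three-sigma rule, and the sample values are all assumptions made for this example.

```python
from statistics import mean, pstdev


def abnormal_hint_increase(history, latest, sigmas=3.0):
    """Return True if `latest` exceeds the historical mean by more than
    `sigmas` standard deviations (hypothetical rule for this sketch)."""
    if len(history) < 2:
        return False  # not enough data to form a baseline
    baseline = mean(history)
    spread = pstdev(history)
    # Floor the spread at 1.0 so a perfectly flat history does not
    # flag every tiny rise as abnormal.
    threshold = baseline + sigmas * max(spread, 1.0)
    return latest > threshold


recent_counts = [12, 15, 11, 14, 13]  # hypothetical total-hint samples

print(abnormal_hint_increase(recent_counts, 16))   # False
print(abnormal_hint_increase(recent_counts, 250))  # True
```

A fixed threshold would also work; a baseline-relative rule is used here only because normal hint volume is likely to differ from one cluster to another.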