eG Monitoring
 

Measures reported by LCNodeStatusTest

A high-availability Linux cluster is a group of Linux computers (nodes) and storage devices that work together and are managed as a single system. With Linux clustering, an application runs on one node, and clustering software monitors its operation. If the software detects an issue, it moves the application to a secondary node in a process called failover. Because the secondary node shares storage with the primary, operation can resume quickly, meeting very short (seconds to minutes) recovery time and recovery point objectives.

In a Linux cluster, users can connect to any node and perform any operation. Nodes route operations to the primary node transparently to users. If a node fails, users can reconnect to a different node, recover their topology, and continue operation. Regardless of which node is serving user requests, administrators should be able to tell the operational state of each node in the cluster at any point in time. Administrators should also be aware of frequent failovers between the nodes. The LCNodeStatusTest helps administrators in this regard.

This test auto-discovers the nodes in the target cluster and reports the current status of each node. The test also reports whether each node is the owner node and whether the owner node was changed recently. Using this test, administrators can isolate the nodes that have been offline or under maintenance for a long duration and analyze the reason. The test also helps administrators detect frequent failovers between the nodes, so that they can initiate steps to determine why the nodes have been failing over frequently.
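The two checks described above can be illustrated with a minimal sketch. It is not part of the eG product: the function names, the sample node names, and the `min_periods` threshold are hypothetical, and the sketch assumes that the states and Yes/No values reported by the test have already been collected into plain Python lists, one entry per measurement period.

```python
from typing import Dict, List

# States that mean a node is not serving the cluster (labels taken from this test).
UNAVAILABLE = {"Offline", "Maintenance"}


def long_unavailable_nodes(history: Dict[str, List[str]], min_periods: int) -> List[str]:
    """Return nodes whose most recent `min_periods` states are all Offline or Maintenance.

    `history` maps a node name to its reported states, oldest first.
    `min_periods` is a hypothetical threshold chosen by the administrator.
    """
    if min_periods < 1:
        raise ValueError("min_periods must be at least 1")
    flagged = []
    for node, states in history.items():
        recent = states[-min_periods:]
        # Only flag a node when enough samples exist and every one is unavailable.
        if len(recent) == min_periods and all(s in UNAVAILABLE for s in recent):
            flagged.append(node)
    return sorted(flagged)


def failover_count(owner_changed: List[str]) -> int:
    """Count the measurement periods in which the owner node was changed."""
    return sum(1 for value in owner_changed if value == "Yes")


# Hypothetical sample data: one state per measurement period, oldest first.
history = {
    "node-a": ["Online", "Online", "Online", "Online"],
    "node-b": ["Online", "Offline", "Offline", "Offline"],
    "node-c": ["Standby", "Maintenance", "Offline", "Maintenance"],
}

print(long_unavailable_nodes(history, min_periods=3))
print(failover_count(["No", "Yes", "No", "Yes", "Yes"]))
```

What counts as "a long duration" or "frequent" failover depends on the environment, so both thresholds are left to the caller rather than fixed in the sketch.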

Outputs of the test: One set of results for each node in the Linux cluster being monitored.

The measures made by this test are as follows:

Measurement: Node_status
Description: Indicates the current state of this node.
Measurement Unit: (none)
Interpretation: The values that this measure can report and their numeric equivalents are listed in the table below:

    Measure Value    Numeric Value
    Online           100
    Standby          75
    Maintenance      50
    Offline          0

Note: By default, this measure reports the Measure Values listed above to indicate the current state of each node. The graph of this measure, however, represents the state using the numeric equivalents from the table.

Measurement: Owner_node_changed
Description: Indicates whether the owner node was changed during the last measurement period.
Measurement Unit: (none)
Interpretation: The values that this measure can report and their numeric equivalents are listed in the table below:

    Measure Value    Numeric Value
    Yes              100
    No               0

Note: By default, this measure reports the Measure Values listed above to indicate whether the owner node was changed. The graph of this measure, however, uses only the numeric equivalents, i.e., 0 or 100. The detailed diagnosis of this measure reveals the name of the owner node.

Measurement: Is_owner_node
Description: Indicates whether this node is the owner node.
Measurement Unit: (none)
Interpretation: The values that this measure can report and their numeric equivalents are listed in the table below:

    Measure Value    Numeric Value
    Yes              100
    No               0

Note: By default, this measure reports the Measure Values listed above to indicate whether this node is the owner node. The graph of this measure, however, uses only the numeric equivalents, i.e., 0 or 100.
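Because the graphs show only the numeric equivalents, it can help to translate a graphed value back into its Measure Value. The lookup below is a minimal sketch built from the tables above; the `LABELS` dictionary and the `decode` function are hypothetical names for illustration, not an eG API.

```python
# Numeric equivalents copied from the tables for each measure of this test.
LABELS = {
    "Node_status": {100: "Online", 75: "Standby", 50: "Maintenance", 0: "Offline"},
    "Owner_node_changed": {100: "Yes", 0: "No"},
    "Is_owner_node": {100: "Yes", 0: "No"},
}


def decode(measure: str, value: int) -> str:
    """Translate a graphed numeric value back into its Measure Value."""
    try:
        return LABELS[measure][value]
    except KeyError:
        # Either the measure name or the numeric value is not in the tables.
        raise ValueError(f"unexpected value {value!r} for measure {measure!r}") from None


print(decode("Node_status", 75))
print(decode("Is_owner_node", 0))
```

Raising an error for a value that is not in the tables, rather than guessing the nearest state, keeps an unexpected reading visible.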