eG Monitoring
 

Measures reported by NSHATest

A high availability (HA) deployment of two Citrix® NetScaler® appliances can provide uninterrupted operation in any transaction. With one appliance configured as the primary node and the other as the secondary node, the primary node accepts connections and manages servers while the secondary node monitors the primary. If, for any reason, the primary node is unable to accept connections, the secondary node takes over.

The secondary node monitors the primary by sending periodic messages (often called heartbeat messages or health checks) to determine whether the primary node is accepting connections. If a health check fails, the secondary node retries the connection for a specified period, after which it determines that the primary node is not functioning normally. The secondary node then takes over for the primary (a process called failover).

When the secondary takes over from the primary, the configuration of both nodes should be the same. The configurations can fall out of sync for reasons such as network connectivity problems or authentication failures between the nodes, and when they do, the performance of the devices is affected. To avoid this, administrators should frequently monitor the success or failure of the command propagation feature, which keeps the two configurations synchronized. The NSHATest test helps administrators in this regard.
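The heartbeat, retry, and failover behaviour described above can be illustrated with a minimal Python sketch. It is not NetScaler's implementation: the `SecondaryNode` class, the `dead_interval` parameter, and the timestamps are hypothetical names chosen for illustration. The sketch only shows the idea that a single failed health check does not cause a takeover; the secondary takes over once checks have kept failing for the whole retry period.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SecondaryNode:
    """Illustrative model of a secondary node watching its primary."""

    dead_interval: float  # assumed retry period, in seconds
    first_failure_at: Optional[float] = None
    is_primary: bool = False

    def record_health_check(self, succeeded: bool, now: float) -> bool:
        """Record one health check; return True if it triggers failover."""
        if self.is_primary:
            return False  # this node has already taken over
        if succeeded:
            # The primary answered, so the retry window is reset.
            self.first_failure_at = None
            return False
        if self.first_failure_at is None:
            # First failure: start the retry window.
            self.first_failure_at = now
        if now - self.first_failure_at >= self.dead_interval:
            # Failures lasted the whole retry period: take over (failover).
            self.is_primary = True
            return True
        return False


node = SecondaryNode(dead_interval=3.0)
checks = [(0.0, True), (1.0, False), (2.0, False), (4.0, False)]
results = [node.record_health_check(ok, t) for t, ok in checks]
```

In this run, only the last check triggers failover, because it is the first one made at least three seconds after the failures began.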

By analyzing the syslog file, this test reports the number of times the NetScaler system in an HA setup has stopped and the number of times command propagation failed or succeeded. In addition, this test reports the number of times the NetScaler device has switched between the primary and secondary roles in an HA setup. Using this test, administrators can gauge the effectiveness of the high availability setup of the NetScaler device.
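As a rough illustration of this kind of log analysis, the Python sketch below counts syslog lines per measure. The measure names come from this test, but the regular expressions and the sample lines are placeholders: the actual NetScaler syslog message text is not given here, so the patterns must be replaced with the strings that appear in your own syslog file.

```python
import re
from typing import Dict, Iterable

# Placeholder patterns (assumptions, not real NetScaler message formats).
EVENT_PATTERNS = {
    "ns_system_stopped": re.compile(r"SYSTEM STOPPED", re.IGNORECASE),
    "ha_propagation_failed": re.compile(r"PROPAGATION.*FAIL", re.IGNORECASE),
    "ha_propagation_success": re.compile(r"PROPAGATION.*SUCCESS", re.IGNORECASE),
    "ha_state_change": re.compile(r"HA STATE CHANGE", re.IGNORECASE),
    "cluster_state_change": re.compile(r"CLUSTER STATE CHANGE", re.IGNORECASE),
}


def count_events(lines: Iterable[str]) -> Dict[str, int]:
    """Count how many lines match each measure's pattern."""
    counts = {name: 0 for name in EVENT_PATTERNS}
    for line in lines:
        for name, pattern in EVENT_PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
                break  # each line is counted for at most one measure
    return counts


# Made-up sample lines, used only to exercise the function.
sample_lines = [
    "node1 HA STATE CHANGE primary -> secondary",
    "node1 command PROPAGATION FAILED: timeout",
    "node1 command PROPAGATION SUCCESS",
    "node1 command PROPAGATION SUCCESS",
    "node1 unrelated message",
]
counts = count_events(sample_lines)
```

Keeping the patterns in one dictionary means each one can be corrected to match the real log text without touching the counting logic.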

For this test to run and report metrics, the NetScaler device must be configured to write a Syslog file to a remote Syslog server, where the details of all interactions with the NetScaler appliance are logged. To know how to configure the Syslog server where this Syslog file should be created, click here.

Outputs of the test: One set of results for the NetScaler appliance being monitored

The measures made by this test are as follows:

Measurement: ns_system_stopped
Description: Indicates the number of times the NetScaler system in an HA setup was stopped.
Measurement unit: Number

Measurement: ha_propagation_failed
Description: Indicates the number of times HA command propagation failed.
Measurement unit: Number
Interpretation: Command propagation is a feature of the NetScaler appliance that ensures that the commands run on the primary NetScaler appliance of the high availability setup are automatically run on the secondary NetScaler appliance.

When you run a command on the primary appliance, this feature ensures that the command runs on the secondary appliance before it runs on the primary appliance.

Ideally, the value of this measure should be zero. An HA command propagation failure may occur for the following reasons:

  • Network connectivity issues between the primary and secondary NetScaler appliances;

  • Authentication failure between the primary and secondary appliances;

  • Resources, such as Secure Socket Layer (SSL) certificates and initialization script customizations, are missing on the secondary appliance.

To keep the value of this measure as low as possible, administrators should do the following:

  • Check the network connectivity between the primary and secondary NetScaler appliances;

  • Verify the Remote Procedure Call (RPC) node settings on both appliances;

  • Run the command directly on the secondary appliance and check the error message. The error might have occurred because a resource required for the command exists on the primary appliance but not on the secondary appliance. Ensure that the required resource exists on the secondary appliance as well.

If command execution fails or times out on the secondary, the configurations of the primary and the secondary may fall out of sync.

Measurement: ha_propagation_success
Description: Indicates the number of times HA command propagation was successful.
Measurement unit: Number
Interpretation: A high value is desired for this measure. A high success rate indicates that the configurations of the primary and secondary are in sync.

Measurement: ha_state_change
Description: Indicates the number of times the HA state has changed for the NetScaler device, i.e., the NetScaler device has changed from primary to secondary or vice versa.
Measurement unit: Number
Interpretation: Frequent changes in the high availability state of a NetScaler device indicate serious load balancing and network issues, which may sometimes lead to loss of synchronization between the primary and secondary devices.

Measurement: cluster_state_change
Description: Indicates the number of times the cluster state has changed.
Measurement unit: Number
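The interpretation guidance for these measures can be turned into a simple check, sketched below in Python. The function names and the `max_state_changes` threshold are assumptions made for illustration; this document does not define what counts as a "frequent" state change, so choose a limit that suits your environment.

```python
from typing import List, Optional


def propagation_success_ratio(succeeded: int, failed: int) -> Optional[float]:
    """Return the share of propagations that succeeded, or None if there were none."""
    total = succeeded + failed
    if total == 0:
        return None
    return succeeded / total


def ha_warnings(
    succeeded: int,
    failed: int,
    state_changes: int,
    max_state_changes: int = 2,  # assumed threshold, not taken from the product
) -> List[str]:
    """List the concerns suggested by the measure interpretations above."""
    warnings = []
    if failed > 0:
        # Ideally ha_propagation_failed is zero.
        warnings.append("command propagation failed: configurations may be out of sync")
    if state_changes > max_state_changes:
        # Frequent HA state changes point to load balancing or network issues.
        warnings.append("frequent HA state changes: check load balancing and network")
    return warnings


ratio = propagation_success_ratio(succeeded=9, failed=1)
issues = ha_warnings(succeeded=9, failed=1, state_changes=4)
```

Here `ratio` is the fraction of propagations that succeeded, and `issues` holds one message for each measure that calls for attention.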