eG Monitoring
 

Measures reported by MsSqlAlsRplStTest

An availability replica is an instantiation of an availability group that is hosted by a specific instance of SQL Server and maintains a local copy of each availability database that belongs to the availability group. An availability group contains two types of availability replicas: a single primary replica and one to eight secondary replicas.

An availability group fails over at the level of an availability replica. An availability replica provides redundancy only at the database level, for the set of databases in one availability group. Failovers are not caused by database issues such as a database becoming suspect due to the loss of a data file or the corruption of a transaction log. The primary replica makes the primary databases available for read-write connections from clients. Also, in a process known as data synchronization, which occurs at the database level, the primary replica sends the transaction log records of each primary database to every secondary database. Every secondary replica caches the transaction log records (hardens the log) and then applies them to its corresponding secondary database.

Whenever a failover occurs, administrators may want the secondary replica to take over quickly from the primary replica. If the replicas are not in a position to apply the transaction log records to their databases, the secondary replicas may be badly out of sync with the primary replica at the time of failover. To minimize such synchronization problems and keep the secondary replicas on par with the primary replica, administrators need to continuously monitor the operational state, synchronization status, and synchronization health of the availability replicas. The MsSqlAlsRplStTest test helps administrators in this regard.

This test continuously monitors the operational state, synchronization state, and synchronization health of each availability replica of the SQL AlwaysOn availability groups. In addition, administrators are alerted to the current recovery state of the availability replica after a failover is initiated.

The measures made by this test are as follows:

Measurement Description Measurement Unit Interpretation
Operational_state: Indicates the current operational state of this availability replica. The values reported by this measure and their numeric equivalents are available in the table below:

Measure Value        Numeric Value
FAILED_NO_QUORUM     0
FAILED               1
OFFLINE              2
PENDING_FAILOVER     3
PENDING              4
ONLINE               5

Note:

This measure reports the Measure Values listed in the table above to indicate the states of this availability replica. However, in the graph, this measure is indicated using the Numeric Values listed in the above table.
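
Because the graph shows only numeric values, it can help to translate them back into state names when reading exported data. The following is a minimal Python sketch, not part of the eG product: the `OperationalState` enum and the `decode_operational_state` helper are hypothetical names, and the numbers are simply those listed in the table above.

```python
from enum import IntEnum


class OperationalState(IntEnum):
    """Numeric values as listed in the Operational_state table (hypothetical helper)."""
    FAILED_NO_QUORUM = 0
    FAILED = 1
    OFFLINE = 2
    PENDING_FAILOVER = 3
    PENDING = 4
    ONLINE = 5


def decode_operational_state(numeric_value: int) -> str:
    """Return the Measure Value name for a numeric value taken from the graph."""
    try:
        return OperationalState(numeric_value).name
    except ValueError:
        # Values outside the documented table are reported as unknown.
        return f"UNKNOWN({numeric_value})"


print(decode_operational_state(5))  # ONLINE
```

The same pattern would apply to the other measures of this test, each with its own table of values.
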
Recovery_state: Indicates the current recovery state of this availability replica. The values reported by this measure and their numeric equivalents are available in the table below:

Measure Value        Numeric Value
ONLINE_IN_PROGRESS   0
ONLINE               1

Note:

This measure reports the Measure Values listed in the table above to indicate the recovery states of this availability replica. However, in the graph, this measure is indicated using the Numeric Values listed in the above table.
Synch_health_state: Indicates the health of this availability replica during synchronization. The values reported by this measure and their numeric equivalents are available in the table below:

Measure Value        Numeric Value
NOT_HEALTHY          0
PARTIALLY_HEALTHY    1
HEALTHY              2

Note:

This measure reports the Measure Values listed in the table above to indicate the health status of the availability replica during synchronization. However, in the graph, this measure is indicated using the Numeric Values listed in the above table.

Connected_state: Indicates the connection state of this availability replica with the primary/secondary availability replica. The values reported by this measure and their numeric equivalents are available in the table below:

Measure Value        Numeric Value
DISCONNECTED         0
CONNECTED            1

Note:

This measure reports the Measure Values listed in the table above to indicate the connection state between the primary and secondary replicas. However, in the graph, this measure is indicated using the Numeric Values listed in the above table.

The detailed diagnosis of this measure, if enabled, lists the ReplicaID, IsLocal, Role, Last connect error number, Last connect error description, and Last connect error time.
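
The four measures can also be read together to decide whether a replica needs attention. The sketch below is illustrative only: `replica_problems` and the `CHECKS` list are hypothetical names, the numeric values are copied from the tables in this section, and treating every value other than ONLINE, HEALTHY, or CONNECTED as a problem is an assumption, not the alerting rule that eG applies.

```python
# Value tables copied from this section (numeric value -> Measure Value).
OPERATIONAL = {0: "FAILED_NO_QUORUM", 1: "FAILED", 2: "OFFLINE",
               3: "PENDING_FAILOVER", 4: "PENDING", 5: "ONLINE"}
RECOVERY = {0: "ONLINE_IN_PROGRESS", 1: "ONLINE"}
SYNCH_HEALTH = {0: "NOT_HEALTHY", 1: "PARTIALLY_HEALTHY", 2: "HEALTHY"}
CONNECTED = {0: "DISCONNECTED", 1: "CONNECTED"}

# (measure name, value table, numeric value assumed to mean "no problem")
CHECKS = [
    ("Operational_state", OPERATIONAL, 5),
    ("Recovery_state", RECOVERY, 1),
    ("Synch_health_state", SYNCH_HEALTH, 2),
    ("Connected_state", CONNECTED, 1),
]


def replica_problems(values):
    """Return a list of readable problems for one replica.

    `values` maps each measure name to the numeric value reported for it.
    """
    problems = []
    for measure, table, good in CHECKS:
        value = values.get(measure)
        if value != good:
            state = table.get(value, f"UNKNOWN({value})")
            problems.append(f"{measure} is {state}")
    return problems


sample = {"Operational_state": 5, "Recovery_state": 1,
          "Synch_health_state": 1, "Connected_state": 0}
for problem in replica_problems(sample):
    print(problem)
```

An empty list means that every measure is at the value assumed here to be healthy. Any other result names the measures worth checking first, for example in the detailed diagnosis of the Connected_state measure.
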