eG Monitoring
 

Measures reported by HBAStatsTest

By periodically checking the target port status and measuring the I/O processing capability of the ports, you can identify overloaded ports, and thus proactively detect potential/existing load-balancing irregularities and/or processing bottlenecks on the host bus adapter. The HBA Port Stats test facilitates this port check.

For every port configured on the Host Bus Adapter, this test reports the port state, the I/O processing ability of the port, and the errors encountered by the port. In the process, the test not only points administrators to overloaded ports, but also pinpoints ports that are slow in processing I/O requests and ports that are error-prone.

Outputs of the test: One set of results for the target server being monitored.

The measures made by this test are as follows:

Measurement Description Measurement Unit Interpretation
Port_state Indicates the current state of this port.   The numeric values that correspond to the states are as follows:

Measure Value Numeric Value
Unknown 1
Operational 2
User Offline 3
Bypassed 4
In diagnostics mode 5
Link Down 6
Port Error 7
Loopback 8

Note:

By default, the test reports the states listed in the table above to indicate the current state of the port. In the graph of this measure, however, the state is indicated using the corresponding numeric equivalents only.

Port_type Indicates the current type of this port.   The values that this measure can take and their corresponding numeric values are as follows:

Measure Value Numeric Value
Unknown 1
Other 2
Not present 3
Fabric 5
Public Loop 6
Fabric on a loop 7
Fabric Port 8
Fabric expansion port 9
Generic Fabric Port 10
Private Loop 20
Point to Point 21

Note:

By default, this measure reports one of the Measure Values listed in the table above. In the graph of this measure, however, the port type is indicated by the corresponding numeric equivalents only.
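Because the graphs of Port_state and Port_type show numeric equivalents only, a small lookup can help when reading exported graph data. The following is a minimal sketch that copies the numeric values from the two tables above; the `decode` helper and the dictionary names are hypothetical and are not part of the product.

```python
# Numeric equivalents copied from the Port_state and Port_type tables above.
PORT_STATES = {
    1: "Unknown", 2: "Operational", 3: "User Offline", 4: "Bypassed",
    5: "In diagnostics mode", 6: "Link Down", 7: "Port Error", 8: "Loopback",
}
PORT_TYPES = {
    1: "Unknown", 2: "Other", 3: "Not present", 5: "Fabric",
    6: "Public Loop", 7: "Fabric on a loop", 8: "Fabric Port",
    9: "Fabric expansion port", 10: "Generic Fabric Port",
    20: "Private Loop", 21: "Point to Point",
}


def decode(value, table):
    """Return the label for a graphed numeric value (hypothetical helper)."""
    # Values that the tables do not list (for example 4 for Port_type)
    # are reported as unrecognized instead of raising an error.
    return table.get(int(value), f"Unrecognized ({value})")


print(decode(6, PORT_STATES))   # Link Down
print(decode(21, PORT_TYPES))   # Point to Point
print(decode(4, PORT_TYPES))    # Unrecognized (4)
```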

Port_speed Indicates the current operational speed of this port. Gbps  
Received_frames Indicates the rate at which frames were received by this port during the last measurement period. Frames/sec Compare the value of these measures across the ports to identify the slowest port in terms of receiving / transferring frames.
Transfer_frames Indicates the rate at which frames were transferred from this port during the last measurement period. Frames/sec
Received_words Indicates the rate at which words were received by this port during the last measurement period. Words/sec Compare the value of these measures across the ports to identify the slowest port in terms of receiving / transferring words.
Transfer_words Indicates the rate at which words were transferred from this port during the last measurement period. Words/sec
Error_frames Indicates the rate at which frames were received / transferred with errors on this port during the last measurement period. Frames/sec Compare the value of this measure across the ports to identify the port that is most error-prone.
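The cross-port comparisons suggested for the frame, word, and error measures can be sketched as follows. The port names and sample values are invented for illustration, and the function names are hypothetical; only the measure names come from the table above.

```python
# Hypothetical per-port samples keyed by measure name; the values are invented.
samples = {
    "port0": {"Received_frames": 1200.0, "Transfer_frames": 1100.0, "Error_frames": 0.0},
    "port1": {"Received_frames": 300.0, "Transfer_frames": 250.0, "Error_frames": 2.5},
    "port2": {"Received_frames": 900.0, "Transfer_frames": 950.0, "Error_frames": 0.1},
}


def slowest_port(samples, measure):
    """Return the port with the lowest rate for the given measure."""
    return min(samples, key=lambda port: samples[port][measure])


def most_error_prone_port(samples):
    """Return the port with the highest Error_frames rate."""
    return max(samples, key=lambda port: samples[port]["Error_frames"])


print(slowest_port(samples, "Received_frames"))   # port1
print(slowest_port(samples, "Transfer_frames"))   # port1
print(most_error_prone_port(samples))             # port1
```

A low rate can also simply mean that a port is lightly loaded, so a result like this is a starting point for investigation, not a verdict.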
Dumped_frames Indicates the number of frames dumped by this port per second due to lack of buffer credit during the last measurement period. Frames/sec Buffer credits, also called buffer-to-buffer credits (BBC), are used as a flow control method by Fiber Channel technology and represent the number of frames a port can store.

Each time a port transmits a frame, that port's BB Credit is decremented by one; for each R_RDY received, that port's BB Credit is incremented by one. Transmission of an R_RDY indicates that the port has processed a frame, freed a receive buffer, and is ready for one more. If the BB Credit is zero, the corresponding node cannot transmit until an R_RDY is received back. A high value for this measure therefore indicates that an R_RDY was not received by the FC port for a long time. This is a cause for concern because the FC port will not resume communication until the R_RDY is received.
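The credit accounting described above can be illustrated with a toy counter. This is only a sketch of the decrement / increment rule; the class and method names are hypothetical, and capping the counter at the allocated value is an assumption made to keep the model simple.

```python
class BBCreditCounter:
    """Toy model of one port's buffer-to-buffer credit (hypothetical names)."""

    def __init__(self, allocated_credits):
        self.allocated = allocated_credits
        self.available = allocated_credits

    def can_transmit(self):
        # A port with zero BB Credit must wait for an R_RDY.
        return self.available > 0

    def transmit_frame(self):
        if not self.can_transmit():
            raise RuntimeError("BB Credit is zero: wait for R_RDY")
        self.available -= 1  # one credit consumed per transmitted frame

    def receive_r_rdy(self):
        # Assumption: the counter never exceeds the allocated credits.
        self.available = min(self.available + 1, self.allocated)


port = BBCreditCounter(allocated_credits=2)
port.transmit_frame()
port.transmit_frame()
print(port.can_transmit())   # False: both credits are in use
port.receive_r_rdy()
print(port.can_transmit())   # True: one receive buffer was freed
```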

The solution for this problem is to allocate optimal buffer credits to the FC port. The optimal number of buffer credits is determined by the distance (frame delivery time), the processing time at the receiving port, the link signaling rate, and the size of the frames being transmitted. As the link speed increases, the frame delivery time is reduced and the number of buffer credits must be increased to obtain full link utilization, even in a short-distance environment. Smaller frame sizes need more buffer credits.
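To make the relationship between distance, link speed, and frame size concrete, here is a rough estimate of how many buffer credits keep a link busy. It is a first-principles sketch, not a vendor sizing rule: it assumes about 5 microseconds of propagation delay per kilometre of fibre, takes the usable link throughput in MB/s as an input, and defaults to a full-size frame of 2148 bytes. The function name and all parameters are hypothetical.

```python
import math


def estimate_bb_credits(distance_km, link_mb_per_s, frame_bytes=2148,
                        propagation_us_per_km=5.0, processing_us=0.0):
    """Estimate the credits needed to keep the link busy (rough sketch)."""
    # Time to place one frame on the wire, in microseconds
    # (1 MB/s equals 1 byte per microsecond).
    serialization_us = frame_bytes / link_mb_per_s
    # Time for the frame to arrive and for the R_RDY to travel back,
    # plus any processing time at the receiving port.
    round_trip_us = 2 * distance_km * propagation_us_per_km + processing_us
    # One credit for the frame being sent, plus one for every further
    # frame that can be sent before the first R_RDY returns.
    return math.ceil(round_trip_us / serialization_us) + 1


print(estimate_bb_credits(10, 100))                    # 6
print(estimate_bb_credits(10, 800))                    # 39
print(estimate_bb_credits(10, 100, frame_bytes=512))   # 21
```

The three calls show the trends described above: for the same 10 km link, a faster link or a smaller frame size raises the estimate.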

LIP_count Indicates the number of Loop initialization primitive events on this port during the last measurement period. Number  
NOS_count Indicates the number of Non-operational state (NOS) primitive events on this port during the last measurement period. Number  
Link_failure_count Indicates the number of link failures experienced per second by this target during the last measurement period. Number Ideally, the value of this measure should be zero. A non-zero value indicates that Fiber Channel connectivity with this target was “broken” that many times. This is likely an indicator of a faulty connector or cable. Link failures also occur when the device connected to this target is restarted, replaced, or serviced and the Fiber Channel cable connected to this target is temporarily disconnected.
Loss_of_sync_count Indicates the number of times this port failed to synchronize during the last measurement period. Number Ideally, the value of this measure should be zero. A non-zero value for this measure indicates that the port went into the “loss of synchronization” state, where it encountered continuous Disparity errors.

This is likely an indicator of a faulty connector or cable. Synchronization is also lost when the device connected to the port is restarted, replaced, or serviced and the Fiber Channel cable connected to the port is temporarily disconnected.

If the port remains in the “loss of synchronization” state for longer than a specific period, the port enters the link failure state, which could degrade the performance of the Fiber Channel link.

Loss_of_signal_count Indicates the number of times the signal was lost on this port during the last measurement period. Number Ideally, the value of this measure should be zero. A non-zero value for this measure indicates that the port detected a loss of the electrical or optical signal used to transfer data on the port.

This is likely an indicator of a faulty connector or cable. The signal is also lost when the device connected to the port is restarted, replaced, or serviced and the Fiber Channel cable connected to the port is temporarily disconnected.

If the port remains in the “loss of signal” state for longer than a specific period, the port enters the link failure state, which could degrade the performance of the Fiber Channel link.

Primitive_proto_errors Indicates the number of Primitive sequence protocol errors that occurred on this port during the last measurement period. Number Ideally, the value of this measure should be zero.
Invalid_tx_words Indicates the number of invalid word transmissions detected on this port per second during the last measurement period. Words/sec A low value is desired for this measure.
Invalid_crc_count Indicates the number of invalid Cyclic Redundancy Checksums that occurred on this port during the last measurement period. Number Ideally, the value of this measure should be zero.

A high value for this measure indicates poor health of the target port.

These are usually recoverable errors and will not degrade system performance unless they are sustained to the point where data cannot be relayed even after retransmissions.
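Several of the measures above should ideally be zero. A simple check over one port's sample can be sketched as below; the sample values are invented, the function name is hypothetical, and only the measure names are taken from the table.

```python
# Measures whose interpretation above states that the ideal value is zero.
ZERO_EXPECTED = (
    "Link_failure_count",
    "Loss_of_sync_count",
    "Loss_of_signal_count",
    "Primitive_proto_errors",
    "Invalid_crc_count",
)


def nonzero_counters(sample):
    """Return the zero-expected measures that reported a non-zero value."""
    return {name: sample[name] for name in ZERO_EXPECTED
            if sample.get(name, 0) > 0}


# Hypothetical sample for one port; the values are invented.
sample = {
    "Link_failure_count": 0,
    "Loss_of_sync_count": 3,
    "Invalid_crc_count": 1,
    "LIP_count": 7,  # not in ZERO_EXPECTED, so it is ignored
}
print(nonzero_counters(sample))
# {'Loss_of_sync_count': 3, 'Invalid_crc_count': 1}
```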