Measures reported by IBPmaPortTest
In the InfiniBand Switch, management packets that allow retrieval of the hardware performance counters are transmitted through ports on which the Performance Management Agent (PMA) is deployed. For communication through the PMA ports to work well, packets sent through them should be error-free and delivered in full, without being discarded due to congestion. If one or more ports transmit packets with errors or discard packets due to congestion, the links that pass through the PMA ports to the components of the InfiniBand Switch may fail; in the worst case, the target switch itself may fail. To avoid such failures, it is necessary to monitor the errors detected on the PMA ports and rectify them before end users are affected. The IBPmaPortTest test helps in this regard.
This test auto-discovers the PMA ports of the InfiniBand Switch and reports the number of symbol errors and the total number of packets with errors received on each PMA port. In addition, this test reports how many times the port training state machine completed the link error recovery process, and how many times it failed to recover a link and downed it. The count of inbound and outbound packets discarded at each port is also reported. By regularly analyzing the metrics reported by this test, administrators can identify the error-prone PMA ports and take remedial action to ensure error-free packet transmission.
Outputs of the test: One set of results for each PMA port on the InfiniBand Switch to be monitored.
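To make the shape of this output concrete, the sketch below models one set of results as a record per port. The class name `PmaPortResult`, the port labels, and the sample values are hypothetical illustrations; only the measure names are taken from this test.

```python
from dataclasses import dataclass, asdict


@dataclass
class PmaPortResult:
    """One set of results for a single PMA port (hypothetical container)."""
    port: str  # port label; the naming scheme here is an assumption
    ibPmaSymbolErrCounter: int = 0
    ibPmaLinkErrRecoveryCntr: int = 0
    ibPmaLinkDownedCntr: int = 0
    ibPmaPortRcvErr: int = 0
    ibPmaPortRcvRemPhysErr: int = 0
    ibPmaPortRcvSwiRelayErr: int = 0
    outDiscardPkt: int = 0


# Made-up sample data: one record per discovered PMA port.
results = [
    PmaPortResult(port="port-1"),
    PmaPortResult(port="port-2", ibPmaSymbolErrCounter=12, ibPmaPortRcvErr=3),
]

for r in results:
    print(asdict(r))
```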
The measures made by this test are as follows:
| Measurement |
Description |
Measurement Unit |
Interpretation |
| ibPmaSymbolErrCounter |
Indicates the number of symbol errors detected on one or more physical lanes of this PMA port. |
Number |
A symbol error is reported when the port receives an undefined or invalid symbol during packet transmission. By comparing the value of this measure across ports, administrators can identify the most error-prone port. |
| ibPmaLinkErrRecoveryCntr |
Indicates the number of times the port training state machine has successfully completed the link error recovery process on this PMA port. |
Number |
|
| ibPmaLinkDownedCntr |
Indicates the number of times the port training state machine failed the link error recovery process and downed the link on this PMA port. |
Number |
|
| ibPmaPortRcvErr |
Indicates the number of packets containing an error that were received on this port. |
Number |
The packets may contain one of the following errors:
Local physical errors (ICRC, VCRC, FCCRC, and all physical errors that cause entry into the BAD PACKET and BAD PACKET DISCARD states of the packet receiver state machine)
Malformed data packet errors (LVer, length, VL)
Malformed link packet errors (operand, length, VL)
Packets discarded due to buffer overrun.
Ideally, the value of this measure should be 0. A non-zero value indicates one or more problems with the port. A very high value is indicative of a problem-prone port. |
| ibPmaPortRcvRemPhysErr |
Indicates the number of packets marked with the End-of-Bad-Packets (EBP) delimiter received on this port. |
Number |
|
| ibPmaPortRcvSwiRelayErr |
Indicates the number of inbound packets discarded because this port is down or congested. |
Number |
Comparing the values of this measure and outDiscardPkt across ports helps administrators identify the port at which the maximum number of packets is discarded during inbound or outbound transmission. |
| outDiscardPkt |
Indicates the number of outbound packets discarded because this port is down or congested. |
Number |
Comparing the values of this measure and ibPmaPortRcvSwiRelayErr across ports helps administrators identify the port at which the maximum number of packets is discarded during inbound or outbound transmission. |
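The comparisons suggested in the Interpretation column can be sketched as follows. This is a minimal illustration and not part of the test itself: the `samples` dictionary, its port labels, and its values are made up, and the helper names `ports_with_receive_errors`, `rank_by_discards`, and `worst_symbol_error_port` are hypothetical.

```python
def ports_with_receive_errors(samples):
    """Return ports whose ibPmaPortRcvErr is non-zero (the ideal value is 0)."""
    return sorted(
        port for port, m in samples.items() if m.get("ibPmaPortRcvErr", 0) > 0
    )


def rank_by_discards(samples):
    """Rank ports by inbound plus outbound discarded packets, highest first."""
    def total(m):
        return m.get("ibPmaPortRcvSwiRelayErr", 0) + m.get("outDiscardPkt", 0)
    return sorted(samples, key=lambda port: total(samples[port]), reverse=True)


def worst_symbol_error_port(samples):
    """Return the port reporting the highest ibPmaSymbolErrCounter."""
    return max(samples, key=lambda port: samples[port].get("ibPmaSymbolErrCounter", 0))


# Made-up readings keyed by a hypothetical port label.
samples = {
    "port-1": {"ibPmaSymbolErrCounter": 0, "ibPmaPortRcvErr": 0,
               "ibPmaPortRcvSwiRelayErr": 2, "outDiscardPkt": 1},
    "port-2": {"ibPmaSymbolErrCounter": 40, "ibPmaPortRcvErr": 7,
               "ibPmaPortRcvSwiRelayErr": 0, "outDiscardPkt": 0},
    "port-3": {"ibPmaSymbolErrCounter": 5, "ibPmaPortRcvErr": 0,
               "ibPmaPortRcvSwiRelayErr": 10, "outDiscardPkt": 25},
}

print(ports_with_receive_errors(samples))  # ['port-2']
print(rank_by_discards(samples))           # ['port-3', 'port-1', 'port-2']
print(worst_symbol_error_port(samples))    # port-2
```

With these sample values, port-2 stands out for symbol and receive errors, while port-3 discards the most packets, so the two comparisons can point at different ports.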