Measures reported by UCSCsFaultTest
A fault is a mutable object that is managed by the Cisco UCS Manager. Each fault represents a failure in the Cisco UCS Manager or an alarm threshold that has been raised. The fault can change from one state or severity to another during its lifecycle. Each fault includes information about the operational state of the affected object at the time the fault was raised. If the fault is transitional and the failure is resolved, then the object transitions to a functional state.
The fault remains in the Cisco UCS Manager until the fault is cleared and deleted according to the settings in the fault collection policy. The fault collection policy controls the lifecycle of a fault in a Cisco UCS instance, including when faults are cleared, the flapping interval (the length of time between the fault being raised and the condition being cleared), and the retention interval (the length of time a fault is retained in the system). If not detected early, a fault may cause the following types of problems:
Service unavailability
Power, thermal, and voltage problems
Component configuration failures
Serious management issues
Poor adapter connectivity
Network issues such as a link going down
Log capacity issues or failed server discovery
To prevent these problems, the faults raised in the Cisco UCS Manager should be tracked at regular intervals and cleared before they bring the operation of the Cisco UCS Manager to a halt. The UCSCsFaultTest test helps administrators in this regard.
This test monitors the faults raised in the Cisco UCS Manager and reports the number of faults raised for each severity. Using this test, administrators can determine which severity accounts for the largest number of faults and take corrective action accordingly.
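The per-severity tally described above can be sketched as follows. This is only an illustration, not the test's actual implementation: the fault records are assumed to be dictionaries with a `severity` field, the function name `count_faults_by_severity` is hypothetical, and treating `info` as an alias for the Information measure is an assumption.

```python
from collections import Counter

# Measure names as they appear in the measures table of this test.
MEASURES = ("Critical", "Major", "Minor", "Warning", "Information")

# Assumption: severity strings map to measures case-insensitively,
# and "info" is accepted as a short form of "information".
_ALIASES = {"info": "information"}


def count_faults_by_severity(faults):
    """Return a dict with one count per measure for the given fault records.

    `faults` is an iterable of dicts with a "severity" key (hypothetical layout).
    Records whose severity is not one of the five measures are ignored.
    """
    counts = Counter()
    for fault in faults:
        severity = str(fault.get("severity", "")).strip().lower()
        severity = _ALIASES.get(severity, severity)
        counts[severity] += 1
    # Always report every measure, using zero when no fault of that severity exists.
    return {name: counts.get(name.lower(), 0) for name in MEASURES}


if __name__ == "__main__":
    sample = [
        {"severity": "critical"},
        {"severity": "major"},
        {"severity": "major"},
        {"severity": "info"},
    ]
    print(count_faults_by_severity(sample))
```

Reporting a zero for severities with no faults keeps the result shape constant across measurement periods, which makes it easier to compare one period with the next.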
Outputs of the test: One set of results for the Cisco UCS Manager that is being monitored.
The measures made by this test are as follows:
| Measurement | Description | Measurement Unit | Interpretation |
|---|---|---|---|
| Critical | Indicates the number of Critical fault events that occurred in the Cisco UCS Manager during the last measurement period. | Number | Ideally, the value of this measure should be zero. A critical fault is a service-affecting condition that requires immediate corrective action. For example, the critical severity could indicate that the managed object is out of service and its capability must be restored. |
| Major | Indicates the number of Major fault events that occurred in the Cisco UCS Manager during the last measurement period. | Number | Ideally, the value of this measure should be zero. A major fault is a service-affecting condition that requires urgent corrective action. For example, this severity could indicate a severe degradation in the capability of the managed object and that its full capability must be restored. |
| Minor | Indicates the number of Minor fault events that occurred in the Cisco UCS Manager during the last measurement period. | Number | Ideally, the value of this measure should be zero. A minor fault is a condition that requires corrective action to prevent a more serious fault from occurring. For example, this severity could indicate that the detected alarm condition is not currently degrading the capacity of the managed object. |
| Warning | Indicates the number of Warning fault events that occurred in the Cisco UCS Manager during the last measurement period. | Number | Ideally, the value of this measure should be zero. A warning fault is a potential or impending service-affecting fault that currently has no significant effect on the system. Action should be taken to further diagnose, if necessary, and correct the problem to prevent it from becoming a more serious service-affecting fault. |
| Information | Indicates the number of information events that occurred in the Cisco UCS Manager during the last measurement period. | Number | An information event is a basic notification or informational message that may be insignificant on its own. These messages provide details about state changes and fault transitions. For more details, use the detailed diagnosis of this measure. |
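The interpretation column can be applied mechanically: every measure except Information should ideally be zero. The following minimal sketch flags the measures that deviate from that ideal, in decreasing order of severity. The function name `measures_needing_attention` and the dictionary layout of the input are assumptions made for illustration only.

```python
# Measures whose ideal value is zero, listed from most to least severe.
ZERO_IDEAL_MEASURES = ("Critical", "Major", "Minor", "Warning")


def measures_needing_attention(measures):
    """Return the names of measures whose value is above the ideal of zero.

    `measures` is a dict mapping measure names to fault counts
    (hypothetical layout). Information is deliberately not checked,
    because the table does not give it an ideal value of zero.
    """
    return [
        name
        for name in ZERO_IDEAL_MEASURES
        if measures.get(name, 0) > 0
    ]


if __name__ == "__main__":
    last_period = {"Critical": 0, "Major": 2, "Minor": 0, "Warning": 1, "Information": 7}
    for name in measures_needing_attention(last_period):
        print(f"{name}: {last_period[name]} fault(s) need corrective action")
```

Because the result is ordered by severity, an administrator reading it from the top addresses the most urgent faults first.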