|
Measures reported by CompFailureTest
Hardware errors/failures, if not promptly detected and resolved, can prove to be fatal to the availability and overall health of a storage system.
This test intercepts the traps sent by the storage system, extracts information related to hardware errors/failures from the traps, and reports the count and detailed description of these trap messages to the eG manager. This information enables administrators to detect current and potential hardware failures, understand the nature of these failures, and accordingly decide on the remedial measures.
The measures made by this test are as follows:
| Measurement |
Description |
Measurement Unit |
Interpretation |
| Num_messages |
Indicates the number of failure events of this type that were captured during the last measurement period in this storage system. |
Number |
The failure events may be generated due to the failure of hardware units like fans, chassis power supply etc., or failure of the cluster node, the shell Interface module failure etc. If the failure events are not rectified within a certain pre-defined timeperiod, the storage system will be shutdown automatically.
Ideally, the value of this measure should be zero. A high value is an indication of performance degradation of the storage system.
The detailed diagnosis capability, if enabled provides you with a more detailed information about the failure events that were captured by this measure. |
|