eG Monitoring
 

Measures reported by VrtxCoolDeviceTest

Each 1100-watt PSU in the VRTX has a built-in fan. The server modules are cooled by four blower modules, each containing two fans, and the rest of the chassis is cooled by six internal fans. If any of these fans fails, the temperature of the core hardware components of the VRTX may rise sharply, causing irreparable damage to those components. To avert such failures, administrators must continuously monitor the health, speed, and running condition of every fan, detect abnormalities in fan state before they lead to a failure, and quickly initiate preventive measures. The VrtxCoolDeviceTest test helps administrators do this.

For every fan in the VRTX, this test reports the current health, speed, and running condition of that fan, captures abnormalities as they occur, and brings them to the attention of administrators. This enables administrators to identify fans that are in danger of failing and to quickly repair or replace them, so that VRTX operations continue without interruption.

The measures reported by this test are as follows:

Measurement | Description | Measurement Unit | Interpretation
Health_status | Indicates how healthy this fan currently is. | | The values that this measure can report and their corresponding numeric values are discussed below:

Measure Value | Numeric Value
Other | 1
Unknown | 2
Normal | 3
NonCritical Upper | 4
Critical Upper | 5
NonRecoverable Upper | 6
NonCritical Lower | 7
Critical Lower | 8
NonRecoverable Lower | 9
Failed | 10

Note:

By default, this measure reports one of the Measure Values listed above to indicate the current health of a fan. In the graph of this measure, however, these values are represented by their numeric equivalents only.

Speed | Indicates the current speed of this fan. | RPM | A sudden and significant rise in the value of this measure could be a cause for concern.
Running_status | Indicates the current running condition of this fan. | | The values that this measure can report and their corresponding numeric values are discussed below:

Measure Value | Numeric Value
Good | 1
Bad | 2

Note:

By default, this measure reports one of the Measure Values listed above to indicate the current condition of a fan. In the graph of this measure, however, these values are represented by their numeric equivalents only.
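
Because the graphs show only numeric equivalents, it can help to translate them back into the Measure Values when reviewing exported data. The following Python sketch is one way to do that; it is an illustration only, not part of the eG product. The FanReading class, the attention_reasons function, and the rise_ratio threshold of 1.5 are hypothetical names and values chosen for this example, and treating every value other than Normal or Good as worth a look is an assumption, not a rule stated by the test.

```python
from dataclasses import dataclass
from typing import List, Optional

# Numeric equivalents exactly as listed in the tables above.
HEALTH_STATUS = {
    1: "Other", 2: "Unknown", 3: "Normal",
    4: "NonCritical Upper", 5: "Critical Upper", 6: "NonRecoverable Upper",
    7: "NonCritical Lower", 8: "Critical Lower", 9: "NonRecoverable Lower",
    10: "Failed",
}
RUNNING_STATUS = {1: "Good", 2: "Bad"}


@dataclass
class FanReading:
    """One set of measures for a single fan (hypothetical container)."""
    fan: str               # descriptor of the fan, e.g. "Blower-1 Fan-A"
    health_status: int     # numeric value of Health_status
    speed_rpm: float       # value of Speed, in RPM
    running_status: int    # numeric value of Running_status


def attention_reasons(
    reading: FanReading,
    previous_rpm: Optional[float] = None,
    rise_ratio: float = 1.5,  # assumed threshold for a "significant rise"
) -> List[str]:
    """Return human-readable reasons why a fan may need attention."""
    reasons = []
    health = HEALTH_STATUS.get(reading.health_status, "Unrecognized")
    running = RUNNING_STATUS.get(reading.running_status, "Unrecognized")

    # Assumption: anything other than Normal / Good is worth reviewing.
    if health != "Normal":
        reasons.append(f"Health_status is {health}")
    if running != "Good":
        reasons.append(f"Running_status is {running}")

    # A sudden, significant rise in speed could be a cause for concern.
    if previous_rpm is not None and previous_rpm > 0:
        if reading.speed_rpm > previous_rpm * rise_ratio:
            reasons.append("Speed rose sharply since the previous reading")
    return reasons


if __name__ == "__main__":
    reading = FanReading("Blower-1 Fan-A", 5, 9000.0, 1)
    for reason in attention_reasons(reading, previous_rpm=5000.0):
        print(f"{reading.fan}: {reason}")
```

The lookup tables are kept as plain dictionaries so that an unexpected numeric value is reported as "Unrecognized" rather than raising an error. The rise threshold should be tuned to the normal speed range of the fans being monitored.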