eG Monitoring
 

Measures reported by VrtxTempProbeTest

Temperature probes in the VRTX system are configured with threshold values. When a threshold is violated, the speed of the corresponding fans is automatically increased, so that temperatures never rise beyond a permissible limit. In the absence of these temperature probes, such automated cooling actions will not occur, causing the internal temperature of the VRTX to soar uncontrollably, fatally damaging hardware components in the process. This is why it is important that administrators periodically check that the temperature probes are up and operating without a glitch.

Also, the threshold values defined for each temperature probe may have to be fine-tuned from time to time, so that the fan speed changes only when there is a genuine need and not in response to marginal spikes in temperature. For this, the administrator should track the temperature probe readings over time, understand whether each reading is good or bad as per the current threshold definition, and change the configuration accordingly, if required. The VrtxTempProbeTest test helps achieve both these ends.

This test auto-discovers the temperature probes, reports the current status of each probe, reveals the current temperature reading of that probe, and indicates whether that reading is good or bad. This way, the test alerts administrators to unexpected probe failures and prompts them to initiate corrective action immediately and restore normalcy. The test also helps administrators quickly analyze the current temperature reading of a probe vis-à-vis its threshold setting, and thus helps them figure out whether the thresholds need to be refined.
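To illustrate the kind of threshold analysis described above, the following minimal Python sketch compares a series of temperature readings against a threshold and counts how many violations were only marginal. The function name, the sample readings, the threshold of 45 degrees Celsius, and the 1-degree margin are all hypothetical values chosen for illustration; they are not VRTX defaults and are not part of the eG test itself.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ViolationSummary:
    total: int      # readings strictly above the threshold
    marginal: int   # violations that exceed the threshold by at most `margin`


def summarize_violations(readings: Sequence[float],
                         threshold: float,
                         margin: float = 1.0) -> ViolationSummary:
    """Count threshold violations and how many of them are only marginal.

    `margin` is an assumed tolerance (in degrees Celsius) used purely for
    illustration; choose a value that suits your environment.
    """
    violations = [r for r in readings if r > threshold]
    marginal = [r for r in violations if r - threshold <= margin]
    return ViolationSummary(total=len(violations), marginal=len(marginal))


# Hypothetical readings in degrees Celsius and a hypothetical threshold.
sample = [41.0, 44.5, 45.2, 45.4, 43.9, 49.0]
summary = summarize_violations(sample, threshold=45.0)
print(summary)
```

If most of the violations turn out to be marginal, that may suggest the threshold is set too tightly for the probe in question. This reading of the result is a judgment call rather than a rule stated by the product.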

The measures made by this test are as follows:

Measurement: Health_status
Description: Indicates how healthy this temperature probe currently is.
Interpretation: The values that this measure can report and their corresponding numeric values are listed below:

    Measure Value            Numeric Value
    Other                    1
    Unknown                  2
    Normal                   3
    NonCritical Upper        4
    Critical Upper           5
    NonRecoverable Upper     6
    NonCritical Lower        7
    Critical Lower           8
    NonRecoverable Lower     9
    Failed                   10

Note: By default, this measure reports one of the Measure Values listed above to indicate the current health of a temperature probe. In the graph of this measure, however, these values are represented by their numeric equivalents only.

Measurement: Temperature
Description: Indicates the current temperature reading of this probe.
Measurement Unit: DegreeC
Interpretation: This measure reports values only if the temperature probe is of a type other than ‘GenericDiscrete’. A sudden and significant rise in temperature may require closer scrutiny.

Measurement: Temperature_state
Description: Indicates whether the temperature reading of this probe is good or bad.
Interpretation: The values that this measure can report and their corresponding numeric values are listed below:

    Measure Value            Numeric Value
    Good                     1
    Bad                      2

Note: By default, this measure reports one of the Measure Values listed above to indicate the current temperature status of a probe. In the graph of this measure, however, these values are represented by their numeric equivalents only.

This measure reports values only if the temperature probe is of a type other than ‘GenericDiscrete’.
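
Because the graphs of Health_status and Temperature_state show only numeric equivalents, it can help to translate those numbers back into their Measure Values. The Python sketch below encodes the two mappings listed above. The class names, the needs_attention helper, and its policy of flagging anything other than Normal health or a Good state are assumptions made for illustration; they are not part of the eG product.

```python
from enum import IntEnum
from typing import Optional


class HealthStatus(IntEnum):
    """Numeric equivalents of the Health_status Measure Values."""
    OTHER = 1
    UNKNOWN = 2
    NORMAL = 3
    NONCRITICAL_UPPER = 4
    CRITICAL_UPPER = 5
    NONRECOVERABLE_UPPER = 6
    NONCRITICAL_LOWER = 7
    CRITICAL_LOWER = 8
    NONRECOVERABLE_LOWER = 9
    FAILED = 10


class TemperatureState(IntEnum):
    """Numeric equivalents of the Temperature_state Measure Values."""
    GOOD = 1
    BAD = 2


def needs_attention(health_value: int, state_value: Optional[int] = None) -> bool:
    """Assumed policy: flag anything other than Normal health or Good state."""
    health = HealthStatus(health_value)  # raises ValueError for unlisted numbers
    if health is not HealthStatus.NORMAL:
        return True
    # Temperature_state is not reported for 'GenericDiscrete' probes,
    # so the state value may be absent.
    if state_value is None:
        return False
    return TemperatureState(state_value) is TemperatureState.BAD


print(HealthStatus(5).name)
print(needs_attention(3, 1))
print(needs_attention(3, 2))
print(needs_attention(10))
```

Here a graph value of 5 decodes to CRITICAL_UPPER, and a probe is flagged when its health is anything other than Normal or its temperature state is Bad.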