eG Monitoring
 

Measures reported by CorOSConUptmeTest

In environments where the CRIO container engine is used extensively to launch containers and pods, it is essential to monitor the uptime of critical containers launched by the target CRIO engine. By tracking the uptime of each container, administrators can determine what percentage of time a container has been up. In some environments, administrators may schedule periodic reboots of their containers. If a specific container has been up for an unusually long time, an administrator can easily tell that the scheduled reboot task is not working on that container. The CorOSConUptmeTest helps administrators track such irregularities with ease!

Use this test to promptly detect unscheduled reboots and unexpected breaks in the availability of each container on the target CRIO container engine.

Outputs of the test: One set of results for each container available in the CRIO Engine being monitored.

The measures made by this test are as follows:

Measurement: cont_reboot
Description: Indicates whether or not this container was rebooted.
Interpretation: The values reported by this measure and their numeric equivalents are listed in the table below:

Measure Value    Numeric Value
No               0
Yes              1

Note:

This measure reports the Measure Values listed in the table above to indicate whether or not this container was rebooted. However, in the graph of this measure, the values are represented using only the Numeric Values listed in the table above.

For each container, the detailed diagnosis of this measure lists the time, shutdown date, restart date, duration of shutdown, and whether or not the container is in maintenance.

Measurement: time_diff
Description: Indicates how long this container has been up since the last time this test ran.
Measurement Unit: Seconds
Interpretation: A low value implies that the container was recently rebooted. From the measure value, you can figure out if the reboot was scheduled or unscheduled.

A high value could indicate that a scheduled reboot has failed.

Measurement: total_uptime
Description: Indicates the total time that this container has been up since its last reboot.
Measurement Unit: Seconds
Interpretation: This measure displays the number of years, months, days, hours, minutes and seconds since the last reboot of each container. Administrators may wish to be alerted if a container has been running without a reboot for a very long period. Setting a threshold for this metric allows administrators to detect such conditions.
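
To make the relationship between the three measures concrete, the following Python sketch derives cont_reboot and time_diff from two consecutive uptime readings and checks total_uptime against a threshold. It is an illustration only, not the test's actual implementation: the UptimeSample class, the function names, and the rule that a drop in uptime between runs signals a reboot are all assumptions made for this example. The formatter reports days, hours, minutes and seconds only, because months and years have no fixed length in seconds.

```python
from dataclasses import dataclass


@dataclass
class UptimeSample:
    """Hypothetical record of one test run for one container."""
    total_uptime: int  # seconds the container has been up since its last reboot


def evaluate(previous: UptimeSample, current: UptimeSample) -> dict:
    """Derive the three measures from two consecutive samples.

    Assumption: a reboot is inferred when the uptime reading goes down
    between runs. The real test may detect reboots differently.
    """
    rebooted = current.total_uptime < previous.total_uptime
    if rebooted:
        # The container restarted, so it has only been up since the restart.
        time_diff = current.total_uptime
    else:
        # No restart: the container was up for the whole interval.
        time_diff = current.total_uptime - previous.total_uptime
    return {
        "cont_reboot": 1 if rebooted else 0,  # numeric equivalents of Yes / No
        "time_diff": time_diff,
        "total_uptime": current.total_uptime,
    }


def format_uptime(seconds: int) -> str:
    """Render an uptime in seconds as days, hours, minutes and seconds."""
    days, rem = divmod(seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{days}d {hours}h {minutes}m {secs}s"


def overdue_for_reboot(total_uptime: int, max_uptime: int) -> bool:
    """Return True when uptime exceeds a (hypothetical) maximum threshold."""
    return total_uptime > max_uptime
```

For example, evaluate(UptimeSample(3900), UptimeSample(120)) yields cont_reboot = 1 and time_diff = 120, while overdue_for_reboot(8 * 86400, 7 * 86400) returns True for a container that was expected to be rebooted weekly.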