eG Monitoring
 

Measures reported by SystemTest

SystemTest is an operating system-specific test that relies on the native measurement capabilities of the operating system to collect metrics on the CPU and memory usage of a host system. This test reports the following measures:

Measurement: Cpu_util
Description: Indicates the percentage of the host system's CPU time that is in use.
Measurement Unit: Percent
Interpretation: A high value could signify a CPU bottleneck. CPU utilization may be high because a few processes are consuming a lot of CPU, or because too many processes are contending for a limited resource. Check the currently running processes to find the exact cause of the problem. The detailed diagnosis capability, if enabled, lists the top 10 CPU-consuming processes and the users who are running them.

Measurement: System_cpu_util
Description: Indicates the percentage of CPU time spent on system-level processing.
Measurement Unit: Percent
Interpretation: An unusually high value indicates a problem and may be due to too many system-level tasks executing simultaneously.

Measurement: Run_queue_length
Description: Indicates the instantaneous length of the queue in which threads wait for processor cycles. This length does not include the threads that are currently executing.
Measurement Unit: Number
Interpretation: A value greater than 2 indicates processor congestion.

Measurement: Num_blocked_procs
Description: Indicates the number of processes blocked for I/O, paging, etc.
Measurement Unit: Number
Interpretation: A high value could indicate an I/O problem on the host (e.g., a slow disk).

Measurement: Swap_memory
Description: On Windows systems, this measure denotes the committed amount of virtual memory, which corresponds to the space reserved for virtual memory in the on-disk paging file(s). On Solaris systems, it corresponds to the swap space currently available. On HPUX and AIX systems, it corresponds to the amount of active virtual memory (this computation assumes that one virtual page corresponds to 4 KB of memory).
Measurement Unit: MB
Interpretation: An unusually high value for swap usage can indicate a memory bottleneck. Check the memory utilization of individual processes to identify the processes that consume the most memory, and tune their memory usage and allocations accordingly.

Measurement: Free_memory
Description: Indicates the free memory available.
Measurement Unit: MB
Interpretation: This measure typically indicates the amount of memory available for use by applications running on the target host.

On Unix operating systems (AIX and Linux), the operating system tends to use part of the available memory for caching files, objects, etc. When applications require additional memory, it is released from the operating system cache. Hence, to reflect the true free memory available to applications, the eG agent reports the sum of the free physical memory and the operating system cache memory size as the value of the Free_memory measure when monitoring AIX and Linux operating systems.

The detailed diagnosis of this measure, if enabled, lists the top 10 processes responsible for maximum memory consumption on the host.

Measurement: Scan_rate
Description: Indicates the memory scan rate.
Measurement Unit: Pages/Sec
Interpretation: A high value is indicative of memory thrashing. Excessive thrashing can be detrimental to application performance.

Measurement: Steal_time
Description: Indicates the percentage of time a virtual processor waits for a real CPU while the hypervisor is servicing another virtual processor.
Measurement Unit: Percent
Interpretation: This measure is applicable only to Windows VMs that are provisioned via VMware vSphere ESX.

A low value is desired for this measure.

A high value for this measure indicates that a virtual processor is waiting longer for real CPU resources. Left unattended, this condition can stall the tasks performed by the virtual processor, significantly degrade its overall performance, and hurt the user experience with the target server.

The impact of stolen CPU time always manifests as slowness, but it can have more profound effects on your infrastructure. Here are some examples:

  • Slower page load times

  • Slower database query times

  • Slower processing of reports

  • Increased queue size of asynchronous tasks because of an inability to process them quickly

  • Increased IaaS bill due to launching more servers to handle the same amount of load

To avoid such eventualities, administrators should either terminate the virtual machine immediately and launch a replacement, or upgrade the VM with more CPU.
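
The memory arithmetic described above for Free_memory (free physical memory plus operating system cache on AIX and Linux) and Swap_memory (active virtual pages at 4 KB each on HPUX and AIX) can be illustrated with a minimal Python sketch. This is not the eG agent's implementation. It assumes Linux-style /proc/meminfo text as input and treats the "Cached" field as the operating system cache; the function names and the sample values are hypothetical.

```python
# Minimal sketch; not the eG agent's implementation.
PAGE_SIZE_KB = 4  # the document's stated assumption for HPUX and AIX


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style 'Key:   value kB' lines into {key: kB}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        # Skip lines without a colon or without a numeric first field.
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def free_memory_mb(meminfo: dict) -> float:
    """Free physical memory plus operating system cache, in MB."""
    # Assumption: 'Cached' stands in for the operating system cache size.
    return (meminfo["MemFree"] + meminfo.get("Cached", 0)) / 1024


def active_virtual_memory_mb(active_pages: int) -> float:
    """Convert a count of active virtual pages to MB at 4 KB per page."""
    return active_pages * PAGE_SIZE_KB / 1024


# Hypothetical sample input, for illustration only.
SAMPLE = """\
MemTotal:        8000000 kB
MemFree:         1048576 kB
Cached:          2097152 kB
"""

info = parse_meminfo(SAMPLE)
print(free_memory_mb(info))              # 3072.0
print(active_virtual_memory_mb(262144))  # 1024.0
```

In this sample, 1024 MB of free physical memory and 2048 MB of cache are reported together as 3072 MB, which is why Free_memory on AIX and Linux can be much larger than the raw free figure shown by operating system tools.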

Note:

The Num_blocked_procs measure will not be available when executing this test on a Windows host.

On multi-processor systems, where CPU statistics are reported for each processor, the system-specific statistics (e.g., run queue length, free memory, etc.) are reported only for the "Summary" descriptor of this test. Note, however, that the Scan_rate measure is not available for the Summary descriptor on Linux systems.
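
The availability rules in this note can be summarized in a short Python sketch. It is an illustration only: the helper name, its parameters, and the per-processor descriptor name "Processor_0" are hypothetical, and only the two system-specific statistics that the note names explicitly are modeled. Other constraints, such as the platforms on which Steal_time applies, are out of scope here.

```python
# Hypothetical helper; the names and structure are illustrative only.

# Only the two system-specific statistics the note names explicitly.
# The note's "etc." implies others, which are deliberately not guessed here.
SYSTEM_SPECIFIC = {"Run_queue_length", "Free_memory"}


def is_reported(measure: str, os_name: str, descriptor: str,
                multi_processor: bool = True) -> bool:
    """Apply the three availability rules from the note to one measure."""
    os_name = os_name.lower()
    on_summary = descriptor == "Summary"

    # Rule 1: Num_blocked_procs is not available on Windows hosts.
    if measure == "Num_blocked_procs" and os_name == "windows":
        return False
    # Rule 2: on multi-processor systems, system-specific statistics
    # appear only under the "Summary" descriptor.
    if multi_processor and measure in SYSTEM_SPECIFIC and not on_summary:
        return False
    # Rule 3: Scan_rate is not available for "Summary" on Linux.
    if measure == "Scan_rate" and os_name == "linux" and on_summary:
        return False
    return True


print(is_reported("Num_blocked_procs", "Windows", "Summary"))  # False
print(is_reported("Free_memory", "Linux", "Processor_0"))      # False
print(is_reported("Free_memory", "Linux", "Summary"))          # True
print(is_reported("Scan_rate", "Linux", "Summary"))            # False
```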