eG Monitoring
 

Measures reported by EsxCpuSummaryTest

This test alerts administrators to issues with the overall CPU usage of the ESX host across processors.

The measures reported by this test are as follows:

Measurement Description Measurement Unit Interpretation
Usage Indicates the percentage of physical CPU used by the host Percent A very high value for this measure indicates excessive CPU utilization by the host. The CPU utilization may be high because a few processes are consuming a lot of CPU, or because there are too many processes contending for a limited resource.
Max_limited Indicates the percentage of scheduling limit over the past minute. Percent  
Cpu_usage Indicates the CPU usage in MHz of the VMware ESX host. MHz  
Reserved_capacity Indicates the total CPU capacity that is reserved by the VMs. MHz  
CpuCoStop Indicates the percentage of time the virtual host was ready to execute commands but was waiting for the availability of multiple pCPUs, as the virtual host was configured to use multiple vCPUs. Percent If the virtual host is unresponsive and the value of this measure is high, it may indicate that the vSphere host has too few CPU resources to co-schedule all vCPUs simultaneously. If this value is low, then any performance problems should be attributed to other issues and not to co-scheduling.
tot_capacity Indicates the total CPU capacity reserved by and available for virtual machines. MHz  
demand Indicates the total amount of CPU resources that all the powered-on virtual machines on the host would use if there were no CPU contention or CPU limit. MHz By observing the variations in this measure over time, you will be able to judge how much CPU resources the VMs really require.
latency Indicates the percentage of time for which the powered-on virtual machines on the host are ready to run, but are not running because they have reached their maximum CPU limit setting. Percent A high value for this measure is a cause for concern, as it indicates that the VMs on the host have been non-operational for a long time for want of CPU resources. You may want to consider increasing the CPU limits, reservations, and shares for the VMs, so as to preempt such situations.
swap_wait Indicates the percentage of CPU time spent waiting for swap-in. Percent If the value of this measure is abnormally high, then check if the host has enough memory for running all VMs.
cpu_wait Indicates the percentage of CPU time spent in wait state. Percent CPU wait time includes CPU swap wait time, CPU idle time, and CPU I/O wait time. If the value of this measure is abnormally high, then you may want to check the value of the swap_wait and Cpu_idle measures to determine what the CPU was waiting on the longest: was it waiting for swapping, waiting for an I/O operation to complete, or just idle?
ready_wait Indicates the percentage of time that the virtual machines on the host were ready, but could not get scheduled to run on the physical CPU during the last measurement interval. Percent This metric should typically be low, generally 5% or less. If VMs wait too long to run, it can significantly affect the responsiveness of the VMs.
CoreUtilization Indicates the CPU utilization of the corresponding core as a percentage during the interval. Percent This measure is reported only if hyper-threading is enabled.
Cpu_idle Indicates the percentage of time that the CPU spent in an idle state. Percent If the CPU wait time measure is abnormally high, then compare the value of this measure with that of the swap_wait measure to know where the CPU spent maximum time - waiting for swapping? in the idle state? or waiting for an I/O operation?
Cpu_physical Indicates the ratio of the number of virtual CPUs utilized to the number of available physical CPUs. Percent A pCPU or 'physical' CPU, in its simplest terms, refers to a physical CPU core, i.e., a physical hardware execution context (HEC), if hyper-threading is unavailable or disabled. If hyper-threading has been enabled, then a pCPU would constitute a logical CPU. This is because hyper-threading enables a single processor core to act like two processors, i.e., logical processors. So, for example, if an 8-core ESX server has hyper-threading enabled, it would have 16 threads that appear as 16 logical processors, and that would constitute 16 pCPUs. A virtual CPU (vCPU) refers to a virtual machine's virtual processor and can be thought of in the same vein as the CPU in a traditional physical server. vCPUs run on pCPUs, and by default, virtual machines are allocated one vCPU each. However, VMware has an add-on software module named Virtual SMP (symmetric multi-processing) that allows virtual machines to have access to more than one CPU and hence be allocated more than one vCPU. The number of virtual machine vCPUs allocated compared to the number of physical CPU cores available is the vCPU-to-pCPU ratio. The right ratio depends on the CPU utilization of the workloads. If workloads are CPU-intensive, the vCPU-to-pCPU ratio will need to be smaller; if workloads are not CPU-intensive, the vCPU-to-pCPU ratio can be larger. If the vCPU-to-pCPU ratio is too large, i.e., if the value of this measure is very high, it can result in high CPU Ready times. This may have a negative impact on the virtual machine's performance. Here are some recommendations:
  • 1:1 to 3:1 - i.e., if the value of this measure varies between 100% and 300% - this is not a problem
  • 3:1 to 5:1 - i.e., if the value of this measure is in the 300% - 500% range - it may begin to cause performance degradation
  • 6:1 or greater - i.e., any value that is 600% or over - is bound to cause a problem
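
The vCPU-to-pCPU recommendations above can be sketched as a small check. This is only an illustrative sketch, not part of the eG product: the function names `cpu_physical_percent` and `ratio_band` are hypothetical, and the handling of boundary values is an assumption. In particular, the recommendations do not say how to treat values below 100% or between 500% and 600%, so the sketch reports those as not covered rather than guessing, and it assumes that exactly 300% falls in the first band.

```python
def cpu_physical_percent(allocated_vcpus: int, available_pcpus: int) -> float:
    """Express the vCPU-to-pCPU ratio as a percentage (3:1 becomes 300.0)."""
    if available_pcpus <= 0:
        raise ValueError("available_pcpus must be a positive number")
    return allocated_vcpus / available_pcpus * 100.0


def ratio_band(percent: float) -> str:
    """Map a Cpu_physical value to the recommendation bands listed above.

    Boundary handling is an assumption: 300% is treated as 'no problem',
    and ranges the recommendations do not mention are reported as such.
    """
    if percent >= 600.0:
        return "bound to cause a problem"
    if percent > 500.0:
        return "not covered by the recommendations"  # gap between 5:1 and 6:1
    if percent > 300.0:
        return "may cause performance degradation"
    if percent >= 100.0:
        return "no problem"
    return "not covered by the recommendations"  # below 1:1


# Example: the 8-core host with hyper-threading from the text has 16 pCPUs.
# Allocating 48 vCPUs across its VMs gives a 3:1 ratio.
example_percent = cpu_physical_percent(allocated_vcpus=48, available_pcpus=16)
example_band = ratio_band(example_percent)
```

A workload-aware check would tighten these bands for CPU-intensive VMs, as the interpretation above notes; the fixed thresholds here simply mirror the three listed ranges.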