| Measurement |
Description |
Measurement Unit |
Interpretation |
| iopsRead |
Indicates the number of Frontend read operations performed on this vSAN disk group per second. |
IOPS |
Virtual machines are considered the front end, where the application on the virtual machine reads from and writes to the vSAN disks. Compare the values of these measures across the disk groups to determine whether read operations or write operations are contributing to abnormal I/O activity levels. |
| iopsWrite |
Indicates the number of Frontend write operations performed on this vSAN disk group per second. |
IOPS |
| throughputRead |
Indicates the rate at which this disk group processes the Frontend read requests. |
MB/Sec |
|
| latencyAvgRead |
Indicates the average time taken by this disk group to process the Frontend read requests. |
Seconds |
| latencyAvgWrite |
Indicates the average time taken by this disk group to process the Frontend write requests. |
Seconds |
Ideally, the values of these measures should be very low. By comparing the values of these measures, administrators can determine where the slowness is greatest: when processing Frontend read requests or when processing Frontend write requests. |
| capacity |
Indicates the total capacity of this vSAN disk group. |
GB |
|
| capacityUsed |
Indicates the amount of space utilized from the total capacity of this vSAN disk group. |
GB |
Ideally, the value of this measure should be low. If the value of this measure is close to the value of the capacity measure, it indicates that the disk group is running out of space. Compare the value of this measure across the disk groups to identify the disk group that is being overutilized. |
| capacityReserved |
Indicates the amount of space that has been reserved for future use on this disk group. |
GB |
The space reserved on each disk group will be provisioned to the hosts after host failures or during maintenance. This way administrators can ensure that sufficient free capacity will be available for components to successfully rebuild after the host failures or during maintenance. |
| rcHitRate |
Indicates the percentage of reads that are delivered from the read cache for this disk group. |
Percent |
A high value is desired for this measure. A gradual or significant decrease in the value of this measure indicates that the performance of reads served from the read cache is deteriorating. In such a case, administrators should increase the size of the vSAN caching tier by adding more disk groups. Alternatively, administrators can tune the working set of the benchmark by doing one of the following:
Decrease the number of active VMs on this cluster
Reduce the number of VM disks accessed by the benchmark
Reduce the size of accessed data in the case of the benchmark
|
| rcSize |
Indicates the size of the read cache managed by this disk group. |
GB |
VMware vSAN leverages the SSD devices of each disk group as the "performance tier" for caching purposes. The purpose of leveraging SSD devices for caching is to serve the highest possible ratio of read operations from the data stored in the read cache and to minimize the read operations to be served by the capacity disks. |
| iopsRcRead |
Indicates the number of read operations processed from the read cache. |
IOPS |
|
| iopsRcWrite |
Indicates the number of write operations processed from the read cache. |
IOPS |
|
| latencyRcRead |
Indicates the time taken by the read cache for processing the read requests. |
Seconds |
|
| latencyRcWrite |
Indicates the time taken by the read cache for processing the write requests. |
Seconds |
|
| wbSize |
Indicates the size of the write buffer of this disk group. |
GB |
vSAN uses write buffers to de-stage written data (not individual write operations) in a way that creates a benign, near-sequential (proximal) write workload for the HDDs that form the capacity tier of the vSAN disk group. |
| wbFreePct |
Indicates the percentage of space available for use in the write buffer of this disk group. |
Percent |
|
| iopsWbRead |
Indicates the number of read operations processed from the write buffer. |
IOPS |
|
| iopsWbWrite |
Indicates the number of write operations processed from the write buffer. |
IOPS |
|
| latencyWbRead |
Indicates the time taken while processing read operations from the write buffer. |
Seconds |
|
| latencyWbWrite |
Indicates the time taken while processing write operations from the write buffer. |
Seconds |
|
| ssdBytesDrained |
Indicates the rate at which SSD bytes were destaged from the SSD. |
Seconds |
|
| zeroBytesDrained |
Indicates the rate at which zero bytes were destaged from the SSD. |
Seconds |
|
| memCongestion |
Indicates the number of times the Mem congestion occurred on this disk group. |
Number |
Congestion is a flow control mechanism used by vSAN. Whenever a bottleneck occurs in a lower layer of vSAN (closer to the physical storage devices), vSAN uses this flow control (aka congestion) mechanism to relieve the bottleneck in the lower layer by reducing the rate of incoming I/O at the vSAN ingress, i.e. the vSAN clients (VM consumption). This reduction of the incoming rate is achieved by introducing an I/O delay at the ingress that is equivalent to the delay the I/O would have incurred due to the bottleneck at the lower layer. Thus, it is an effective way to shift latency from the lower layers to the ingress without changing the overall throughput of the system. Mem congestion occurs when the size of the memory heap used by vSAN internal components exceeds the threshold. |
| slabCongestion |
Indicates the number of times the Slab congestion occurred on this disk group. |
Number |
Slab congestion is reported when the number of inflight operations exceeds the capacity of the vSAN internal operation slabs. |
| ssdCongestion |
Indicates the number of times the SSD congestion occurred on this disk group. |
Number |
SSD congestion occurs when the cache tier disk write buffer runs out of space. |
| logCongestion |
Indicates the number of times the Log congestion occurred on this disk group. |
Number |
Log congestion occurs when the vSAN internal log in the cache tier disk runs out of space. |
| compCongestion |
Indicates the number of times the Comp congestion occurred on this disk group. |
Number |
|
| warEvictions |
Indicates the number of cache lines that are invalidated due to excessive write operations on this disk group. |
Number |
Cache invalidations indicate the number of writes to the same address offset as existing data in the read cache. When a write operation to an I/O address follows a read operation, the contents of the read cache must be updated. Such an eviction is referred to as a cache invalidation. |
| quotaEvictions |
Indicates the number of times the read cache contents were evicted due to read cache contention. |
Number |
Typically, contents in the read cache are evicted when the working set size is larger than the size of the read cache. A low value is desired for this measure. A gradual/sudden increase in the value of this measure indicates the deterioration in the read cache performance. |
| oioWrite |
Indicates the number of outstanding write operations performed on this disk group. |
Number |
|
| oioRecWrite |
Indicates the number of outstanding recovery write operations performed on this disk group. |
Number |
|
| oioWriteSize |
Indicates the amount of data written on this disk group during the outstanding write operations. |
GB |
|
| oioRecWriteSize |
Indicates the amount of data written on this disk group during the outstanding recovery write operations. |
GB |
|
| iopsResyncReadPolicy |
Indicates the number of IO read operations used for performing resynchronization on this disk group due to change in policy settings. |
IOPS |
When there is a change in VM storage policy settings, vSAN might initiate object recreation and subsequent resynchronization of the objects. Compare the values of these measures across the vSAN disk groups to identify the disk group on which the maximum number of read and write operations are performed for resynchronization due to change in policy settings. |
| iopsResyncWritePolicy |
Indicates the number of IO write operations used for performing resynchronization on this disk group due to change in policy settings. |
IOPS |
| iopsResyncReadDecom |
Indicates the number of IO read operations used for performing resynchronization on this disk group due to decommission. |
IOPS |
Typically, disk groups are decommissioned from vSAN when upgrading a device, replacing a failed device, or removing a cache device. Compare the values of these measures across the vSAN disk groups to identify the disk group on which the maximum number of read and write operations are performed for resynchronization due to decommission. |
| iopsResyncWriteDecom |
Indicates the number of IO write operations used for performing resynchronization on this disk group due to decommission. |
IOPS |
| iopsResyncReadRebalance |
Indicates the number of IO read operations used for performing resynchronization on this disk group while rebalancing the objects. |
IOPS |
|
| iopsResyncWriteRebalance |
Indicates the number of IO write operations used for performing resynchronization on this disk group while rebalancing the objects. |
IOPS |
|
| iopsResyncReadFixComp |
Indicates the number of IO read operations used for performing resynchronization on this disk group due to the object repair operation. |
IOPS |
|
| iopsResyncWriteFixComp |
Indicates the number of IO write operations used for performing resynchronization on this disk group due to the object repair operation. |
IOPS |
|
| tputResyncReadPolicy |
Indicates the rate at which the data is read for performing resynchronization on this disk group due to change in policy settings. |
MB/Sec |
|
| tputResyncWritePolicy |
Indicates the rate at which the data is written for performing resynchronization on this disk group due to change in policy settings. |
MB/Sec |
|
| tputResyncReadDecom |
Indicates the rate at which the data is read for performing resynchronization on this disk group due to the decommission. |
MB/Sec |
|
| tputResyncWriteDecom |
Indicates the rate at which the data is written for performing resynchronization on this disk group due to the decommission. |
MB/Sec |
|
| tputResyncReadRebalance |
Indicates the rate at which the data is read for performing resynchronization on this disk group while rebalancing the objects. |
MB/Sec |
|
| tputResyncWriteRebalance |
Indicates the rate at which the data is written for performing resynchronization on this disk group while rebalancing the objects. |
MB/Sec |
|
| tputResyncReadFixComp |
Indicates the rate at which the data is read for performing resynchronization on this disk group caused by repairing the objects. |
MB/Sec |
|
| tputResyncWriteFixComp |
Indicates the rate at which the data is written for performing resynchronization on this disk group caused by repairing the objects. |
MB/Sec |
|
| latResyncReadPolicy |
Indicates the average time taken to perform read operations during resynchronization due to change in policy settings. |
Seconds |
This is the average read latency of resync traffic for resyncing objects (covering policy change, repair, maintenance mode/evacuation, and rebalance), as seen from the vSAN backend. |
| latResyncWritePolicy |
Indicates the average time taken to perform write operations during resynchronization due to change in policy settings. |
Seconds |
|
| latResyncReadDecom |
Indicates the average time taken to perform read operations during resynchronization due to decommission. |
Seconds |
|
| latResyncWriteDecom |
Indicates the average time taken to perform write operations during resynchronization due to decommission. |
Seconds |
|
| latResyncReadRebalance |
Indicates the average time taken to perform read operations during resynchronization caused by rebalancing the objects. |
Seconds |
|
| latResyncWriteRebalance |
Indicates the average time taken to perform write operations during resynchronization caused by rebalancing the objects. |
Seconds |
|
| latResyncReadFixComp |
Indicates the average time taken to perform read operations during resynchronization due to object repair. |
Seconds |
|
| latResyncWriteFixComp |
Indicates the average time taken to perform write operations during resynchronization due to object repair. |
Seconds |
|
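
The cross-disk-group comparisons recommended in the Interpretation column can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the product: the `DiskGroupSample` class, its field names, the helper functions, and the 80 percent threshold are all hypothetical, and the capacity values are assumed to be reported in GB.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DiskGroupSample:
    """One reading of a few measures for a disk group (hypothetical container)."""
    name: str
    iops_read: float         # iopsRead
    iops_write: float        # iopsWrite
    capacity_gb: float       # capacity (unit assumed to be GB)
    capacity_used_gb: float  # capacityUsed (unit assumed to be GB)


def used_pct(sample: DiskGroupSample) -> float:
    """Percentage of the disk group's total capacity that is in use."""
    if sample.capacity_gb <= 0:
        return 0.0
    return 100.0 * sample.capacity_used_gb / sample.capacity_gb


def dominant_io(sample: DiskGroupSample) -> str:
    """Report whether reads or writes account for most Frontend I/O."""
    if sample.iops_read == sample.iops_write:
        return "balanced"
    return "read" if sample.iops_read > sample.iops_write else "write"


def most_utilized(samples: List[DiskGroupSample]) -> DiskGroupSample:
    """Disk group whose used capacity is closest to its total capacity."""
    return max(samples, key=used_pct)


def nearly_full(samples: List[DiskGroupSample],
                threshold_pct: float = 80.0) -> List[str]:
    """Names of disk groups at or above an assumed utilization threshold."""
    return [s.name for s in samples if used_pct(s) >= threshold_pct]
```

Given one `DiskGroupSample` per disk group, `most_utilized` points to the group that is running out of space first, and `dominant_io` indicates whether read or write activity is driving the load on a given group. The threshold should be chosen to suit the environment.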