eG Monitoring
 

Measures reported by NutClusStrPoolTest

A storage pool is a group of physical storage devices including PCIe SSD, SSD, and HDD devices for the cluster. The storage pool can span multiple Nutanix nodes and is expanded as the cluster scales. In most configurations, only a single storage pool is leveraged.

Because the VMs and nodes in a cluster rely heavily on the storage pools for their availability and overall performance, it is imperative that the storage pools be sized and tuned correctly. If they are not, the dependent VMs and nodes will experience serious performance setbacks, ranging from slowness to a complete standstill.

To determine whether a storage pool needs to be resized, an administrator must first know how much storage space is available to that pool, how this space has been utilized, what the typical I/O load on the pool is, and how well the pool processes this load. The NutClusStrPoolTest test reports these statistics for each storage pool that is managed by Nutanix Prism. With the help of this information, administrators can proactively detect potential space contention, I/O overload, and processing latencies that may impact storage performance, and can initiate measures to avert them. The test also measures and reports the effectiveness of the storage optimization methodologies that are currently applied.
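As a rough illustration of how these statistics fit together, the sketch below derives the used and free percentages of a pool from its capacity and usage, and reports whether reads or writes are slower. The function name `summarize_pool`, the dictionary layout, and the 90% alert threshold are assumptions made for this example; they are not part of the eG test or of any Nutanix API. The dictionary keys mirror the measure names reported by this test, and the free percentage is assumed to be the complement of the used percentage.

```python
def summarize_pool(measures, used_alert_pct=90.0):
    """Summarize one storage pool from a dict of measure values.

    `measures` is a hypothetical mapping keyed by the measure names
    reported by this test; `used_alert_pct` is an illustrative threshold.
    """
    capacity_gb = measures["STR_CAPACITY"]
    used_gb = measures["STR_USAGE"]
    if capacity_gb <= 0:
        raise ValueError("STR_CAPACITY must be positive")

    # Assumption: free space is whatever part of the capacity is not used.
    used_pct = 100.0 * used_gb / capacity_gb
    free_pct = 100.0 - used_pct

    # Compare read and write latency to see which path is slower.
    read_lat = measures["READ_IO_LATENCY"]
    write_lat = measures["WRITE_IO_LATENCY"]
    if read_lat > write_lat:
        slower_path = "read"
    elif write_lat > read_lat:
        slower_path = "write"
    else:
        slower_path = "equal"

    return {
        "used_pct": used_pct,
        "free_pct": free_pct,
        "space_alert": used_pct >= used_alert_pct,
        "slower_path": slower_path,
    }


# Made-up sample values, for illustration only.
sample = {
    "STR_CAPACITY": 2000.0,
    "STR_USAGE": 1900.0,
    "READ_IO_LATENCY": 0.002,
    "WRITE_IO_LATENCY": 0.008,
}
summary = summarize_pool(sample)
print(summary)
```

With the sample values, the pool is 95% used, so the illustrative space alert is raised, and writes are the slower path.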

The measures made by this test are as follows:

| Measurement | Description | Measurement Unit | Interpretation |
|---|---|---|---|
| NO_OF_DISK | Indicates the number of disks pooled in this storage pool. | Number | The detailed diagnosis of this measure lists the ID, status, and host name of each disk. |
| DISK_USAGE | Indicates the total amount of physical storage space used by the disks in this storage pool. | GB | |
| STR_CAPACITY | Indicates the amount of space in the cluster that is available to this storage pool. | GB | Where there are multiple storage pools, compare the value of this measure across the pools to identify the pool that has been sized with the most storage space. |
| STR_USAGE | Indicates the total amount of physical storage space used in this storage pool. | GB | A consistent increase in the value of this measure indicates that space in the pool is being consumed rapidly, which could lead to contention for storage space. |
| STR_FREE | Indicates the total amount of physical storage space that is unused in this pool. | GB | Ideally, the value of this measure should be high. A very low value could indicate that the pool is running short of storage resources and may require expansion. |
| STR_USAGE_PERC | Indicates the percentage of physical storage space used in this storage pool. | Percent | A value close to 100% is a cause for concern, as it indicates probable contention for storage space in the pool. Consider resizing the pool so that VM operations continue uninterrupted. |
| STR_FREE_PERC | Indicates the percentage of physical storage space that is unused in this storage pool. | Percent | A value less than 50% is a cause for concern, as it indicates probable contention for storage space in the pool. Consider resizing the pool so that VM operations continue uninterrupted. |
| LOGICAL_USAGE | Indicates the total amount of logical storage space used in this storage pool. | GB | |
| IO_LATENCY | Indicates the average I/O latency for physical disk requests in this storage pool. | Seconds | Ideally, the value of this measure should be very low. A high value or a steady increase in this value could indicate an I/O processing bottleneck on the pool. In such a case, compare the values of the READ_IO_LATENCY and WRITE_IO_LATENCY measures to determine whether the slowness is worse for read requests or for write requests. |
| READ_IO_LATENCY | Indicates the average time taken by this storage pool to process read I/O requests. | Seconds | If the IO_LATENCY measure reports an abnormally high value, compare the values of the READ_IO_LATENCY and WRITE_IO_LATENCY measures to determine whether the slowness is worse for read requests or for write requests. |
| WRITE_IO_LATENCY | Indicates the average time taken by this storage pool to process write I/O requests. | Seconds | See READ_IO_LATENCY. |
| IO_BANDWIDTH | Indicates the bandwidth used by this storage pool when processing I/O requests. | KB/Sec | A high value for this measure denotes that the storage pool is processing bandwidth-intensive I/O. In such situations, compare the values of the READIO_BWIDTH and WRITEIO_BWIDTH measures to determine whether read requests or write requests are contributing to the excessive bandwidth consumption. |
| READIO_BWIDTH | Indicates the bandwidth used by this storage pool when processing read I/O requests. | KB/Sec | If the value of the IO_BANDWIDTH measure is high, compare the values of the READIO_BWIDTH and WRITEIO_BWIDTH measures to determine whether read requests or write requests are contributing to the excessive bandwidth consumption. |
| WRITEIO_BWIDTH | Indicates the bandwidth used by this storage pool when processing write I/O requests. | KB/Sec | See READIO_BWIDTH. |
| IOPS | Indicates the number of I/O operations currently performed on this storage pool. | Number | This measure is a good indicator of the level of I/O activity on the storage pool. A steady and significant increase in the value of this measure could indicate a potential I/O overload. In such situations, compare the values of the READ_IOPS and WRITE_IOPS measures of the storage pool to determine which type of I/O operation is contributing to the overload. |
| READ_IOPS | Indicates the number of read I/O operations currently performed on this storage pool. | Number | If the value of the IOPS measure is unusually high, compare the values of the READ_IOPS and WRITE_IOPS measures for that storage pool to determine whether read requests or write requests are contributing to the unusual level of I/O activity. |
| WRITE_IOPS | Indicates the number of write I/O operations currently performed on this storage pool. | Number | See READ_IOPS. |
| TRANS_USAGE | Indicates the actual storage usage (i.e., usage after compression and deduplication) in this storage pool. | GB | The Nutanix platform incorporates a wide range of storage optimization technologies that work in concert to make efficient use of available capacity for any workload. Compression and deduplication are two such technologies. |

Compression can be inline or offline. Inline compression compresses sequential streams of data or large I/Os (greater than 64K) in memory before they are written to the Extent Store. Offline compression initially writes the data as normal (in an uncompressed state) and then leverages the Curator framework to compress the data cluster-wide.
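The difference between the two modes can be pictured with a small toy model. This is only a sketch under stated assumptions: the in-memory dictionary `store`, the helper names `inline_write`, `offline_pass`, and `read_back`, and the use of `zlib` as the compressor are all stand-ins chosen for illustration. The text above does not say which compression algorithm Nutanix uses or how the Curator framework schedules its work.

```python
import zlib

INLINE_MIN_IO_BYTES = 64 * 1024  # the "greater than 64K" threshold quoted above


def inline_write(store, key, data, sequential=False):
    """Compress in memory before storing when the write qualifies for inline compression."""
    if sequential or len(data) > INLINE_MIN_IO_BYTES:
        store[key] = (True, zlib.compress(data))
    else:
        # Small, non-sequential writes are stored as-is.
        store[key] = (False, data)


def offline_pass(store):
    """Hypothetical background pass: compress whatever is still stored uncompressed."""
    for key, (is_compressed, payload) in list(store.items()):
        if not is_compressed:
            store[key] = (True, zlib.compress(payload))


def read_back(store, key):
    """Return the original bytes regardless of how they are stored."""
    is_compressed, payload = store[key]
    return zlib.decompress(payload) if is_compressed else payload


store = {}
big = b"a" * (128 * 1024)   # large I/O, compressed inline
small = b"b" * 4096         # small I/O, written uncompressed at first
inline_write(store, "big", big)
inline_write(store, "small", small)
offline_pass(store)         # the small write is compressed later
```

In this model the large write is compressed before it is stored, while the small write is stored uncompressed and is only compressed when the background pass runs.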

The Elastic Dedupe Engine in Nutanix allows for data deduplication in the capacity (Extent Store) and performance (Unified Cache) tiers. Streams of data are fingerprinted during ingest using a SHA-1 hash at a 16K granularity. Fingerprinting is done only on data ingest, and the fingerprint is then stored persistently as part of the written block's metadata. For duplicate data that can be deduplicated in the capacity tier, the data does not need to be scanned or re-read; the duplicate copies can simply be removed.
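The fingerprint-at-ingest idea can be sketched in a few lines. The code below is a simplified model, not the Elastic Dedupe Engine itself: the list-of-dicts block layout and the function names `ingest`, `dedupe`, and `rebuild` are assumptions made for this example. Only the SHA-1 hash and the 16K granularity come from the description above.

```python
import hashlib

CHUNK_BYTES = 16 * 1024  # 16K fingerprint granularity from the text


def ingest(data, blocks):
    """Split data into 16K chunks and store each with its SHA-1 fingerprint as metadata."""
    for offset in range(0, len(data), CHUNK_BYTES):
        chunk = data[offset:offset + CHUNK_BYTES]
        blocks.append({
            "fingerprint": hashlib.sha1(chunk).hexdigest(),
            "data": chunk,
        })


def dedupe(blocks):
    """Collapse duplicates using the stored fingerprints; no chunk is hashed again."""
    unique = {}  # fingerprint -> single retained copy of the chunk
    refs = []    # ordered fingerprints needed to rebuild the stream
    for block in blocks:
        fingerprint = block["fingerprint"]
        if fingerprint not in unique:
            unique[fingerprint] = block["data"]
        refs.append(fingerprint)
    return unique, refs


def rebuild(unique, refs):
    """Reassemble the original stream from the retained chunks."""
    return b"".join(unique[fingerprint] for fingerprint in refs)


# Three identical 16K chunks followed by one different chunk.
stream = (b"A" * CHUNK_BYTES) * 3 + b"B" * CHUNK_BYTES
blocks = []
ingest(stream, blocks)
unique, refs = dedupe(blocks)
print(len(blocks), "chunks ingested,", len(unique), "unique chunks retained")
```

Because the fingerprint is computed once at ingest and kept with the block, the later `dedupe` step only compares stored fingerprints, which mirrors the point made above that duplicate data does not have to be scanned or re-read.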

The true effectiveness of these optimization methodologies can be measured by determining how much storage space in the pool they helped save. By comparing the value of this measure with the value of the Storage usage measure of the pool, you can assess how effective these methodologies are.
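One way to put a number on that comparison is sketched below. It assumes that the Storage usage value represents usage before optimization and that TRANS_USAGE represents usage after compression and deduplication, so that the difference is the space saved. That reading is an assumption of this example, as are the function name `optimization_savings` and the sample figures.

```python
def optimization_savings(storage_usage_gb, trans_usage_gb):
    """Return (saved_gb, saved_pct) from the two usage figures.

    Assumption: `storage_usage_gb` is the pre-optimization usage and
    `trans_usage_gb` is the usage after compression and deduplication.
    """
    if storage_usage_gb <= 0:
        raise ValueError("storage usage must be positive")
    saved_gb = storage_usage_gb - trans_usage_gb
    saved_pct = 100.0 * saved_gb / storage_usage_gb
    return saved_gb, saved_pct


# Made-up sample values, for illustration only.
saved_gb, saved_pct = optimization_savings(1200.0, 900.0)
print(f"saved {saved_gb:.1f} GB ({saved_pct:.1f}%)")
```

A saved percentage near zero would suggest that compression and deduplication are contributing little for the workloads on that pool.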