eG Monitoring
 

Measures reported by XenDiskIOTest

XenServer supports a broad range of storage hardware. The term Storage Repository (SR) describes a particular storage target on which Virtual Disk Images (VDIs) are stored. A VDI is a disk abstraction that holds the contents of a disk as presented to a virtual machine. XenServer allows VDIs to reside on many SR types, including local disks, NFS filers, Fibre Channel disks, and shared iSCSI LUNs. The SR abstraction allows advanced storage features such as thin provisioning, VDI snapshots, and fast cloning to be exposed on storage targets that support them.

If a XenServer host cannot read from or write to an SR, or takes too long to do so, the provisioning and maintenance of virtual disk images (creating, deleting, cloning, connecting, resizing, and so on) can be unduly delayed. This, in turn, can significantly slow down VM access. To keep the user experience with VMs top-notch, administrators should continuously monitor the I/O throughput of each SR supported by a XenServer host and quickly isolate the slow ones. This is where the XenDiskIOTest test helps. By continuously measuring and reporting how well each SR handles read and write requests, this test pinpoints slow SRs, prompting administrators to investigate the cause of the slowness and fix it.
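The idea of isolating slow SRs by comparing them against one another can be sketched in a few lines. This is a minimal illustration, assuming per-SR throughput readings (in MB/sec) are already available as a dictionary; the SR names and the "below half the median" threshold are hypothetical choices for the example, not values used by the eG agent.

```python
from statistics import median


def find_slow_srs(throughput_mb_s: dict[str, float], ratio: float = 0.5) -> list[str]:
    """Return SRs whose throughput is below `ratio` x the median of all SRs.

    The median-based threshold is an illustrative heuristic (assumption),
    not the method used by XenDiskIOTest.
    """
    if not throughput_mb_s:
        return []
    baseline = median(throughput_mb_s.values())
    slow = [sr for sr, mbps in throughput_mb_s.items() if mbps < ratio * baseline]
    # Sort slowest first so the worst offender is investigated first.
    return sorted(slow, key=lambda sr: throughput_mb_s[sr])


if __name__ == "__main__":
    # Hypothetical readings for four SRs, in MB/sec.
    readings = {"local-lvm": 120.0, "nfs-filer": 95.0, "iscsi-lun1": 30.0, "fc-array": 140.0}
    print(find_slow_srs(readings))  # prints ['iscsi-lun1']
```

Comparing against the median rather than a fixed MB/sec figure keeps the check meaningful when SRs of very different classes (local disk versus Fibre Channel, for instance) are attached to the same host.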

Note:

The performance metrics reported by this test are enabled by default in the XenServer 6.1.0 Performance and Monitoring Supplemental Pack. In XenServer 6.2.0, however, these metrics are part of the core product but are disabled by default for performance reasons related to XenCenter. As a result, when monitoring XenServer 6.2.0, this test reports no metrics by default. To make the test report metrics, do the following:

  • Log in to the XenServer host as the root user.

  • Enable the metrics by issuing the following command from the CLI:

    xe-enable-all-plugin-metrics true
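When several XenServer 6.2.0 hosts need this change, the two steps above can be scripted from an administrator workstation. The sketch below is a minimal illustration, assuming Python 3.8+ on the workstation, an OpenSSH client on the PATH, and key-based SSH access as root to each host; the function names and the host name used in the usage note are hypothetical.

```python
import shlex
import subprocess


def build_enable_command(host: str, user: str = "root") -> list[str]:
    """Build an ssh invocation that logs in to `host` as `user` and enables the metrics."""
    # The remote command documented above, quoted safely for the remote shell.
    remote = shlex.join(["xe-enable-all-plugin-metrics", "true"])
    return ["ssh", f"{user}@{host}", remote]


def enable_plugin_metrics(host: str, dry_run: bool = False) -> list[str]:
    """Enable the plugin metrics on `host`; with dry_run=True, only return the command."""
    cmd = build_enable_command(host)
    if not dry_run:
        # check=True raises CalledProcessError if ssh or the remote command fails.
        subprocess.run(cmd, check=True)
    return cmd
```

For example, `enable_plugin_metrics("xs62.example.com", dry_run=True)` returns `["ssh", "root@xs62.example.com", "xe-enable-all-plugin-metrics true"]` without contacting the host, which makes it easy to review what will be executed before running it for real.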

The measures reported by this test are as follows:

tot_throughputs
    Description: Indicates the total throughput of this SR.
    Measurement unit: MB/sec
    Interpretation: A high value indicates high throughput and rapid I/O processing by the SR. Compare the value of this measure across SRs to identify the SR with the lowest throughput.

read_throughputs
    Description: Indicates the rate at which the host reads data from this SR.
    Measurement unit: MB/sec
    Interpretation: Ideally, the value of this measure should be high. A consistent drop in this value indicates a read bottleneck in the SR. Compare the value of this measure across SRs to identify the SR that is slowest in processing read requests.

write_throughputs
    Description: Indicates the rate at which the host writes data to this SR.
    Measurement unit: MB/sec
    Interpretation: Ideally, the value of this measure should be high. A consistent drop in this value indicates a write bottleneck in the SR. Compare the value of this measure across SRs to identify the SR that is slowest in processing write requests.

tot_io_requests
    Description: Indicates the rate at which this SR performs I/O operations.
    Measurement unit: Requests/sec
    Interpretation: This measure is a good indicator of the I/O processing capacity of the SR, so a high value is desirable. A consistent drop in this value could indicate a processing bottleneck. In that case, compare the read_io_requests and write_io_requests measures of the same SR to determine whether the bottleneck lies in reading data from the SR or in writing data to it.

read_io_requests
    Description: Indicates the rate at which this SR services read requests.
    Measurement unit: Requests/sec
    Interpretation: Ideally, the value of this measure should be high. A steady drop in this value indicates a slowdown in processing read requests. Compare the value of this measure across SRs to find the SR that is slowest in responding to read requests.

write_io_requests
    Description: Indicates the rate at which this SR services write requests.
    Measurement unit: Requests/sec
    Interpretation: Ideally, the value of this measure should be high. A steady drop in this value indicates a slowdown in processing write requests. Compare the value of this measure across SRs to find the SR that is slowest in responding to write requests.

iowait
    Description: Indicates the percentage of time the host's CPU spent waiting for this SR to complete I/O processing.
    Measurement unit: Percent
    Interpretation: A high value indicates that the SR is taking too long to complete I/O processing, which hints at a probable processing bottleneck in the SR.

latency
    Description: Indicates the average time this SR takes to process I/O requests.
    Measurement unit: Milliseconds
    Interpretation: A high value is a cause for concern, as it indicates that the SR is highly latent and takes too long to process I/O. Compare the value of this measure across SRs to identify the most latent SR.

queue_size
    Description: Indicates the average number of I/O requests to this SR that are queued for processing.
    Measurement unit: Number
    Interpretation: If the value of this measure grows consistently, the SR is unable to process requests quickly enough to clear the queue. The SR with the largest number of queued requests could be experiencing a serious I/O processing bottleneck; to identify it, compare the value of this measure across SRs.

in_flight
    Description: Indicates the number of I/O requests to this SR that are currently being processed.
    Measurement unit: Number
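The interpretation guidance for these measures (comparing totals against their read and write components, watching for a consistent drop, relating queue length to latency) can be expressed as small helper functions. This is a minimal sketch under stated assumptions: field names mirror the measures, "total" is assumed to be read plus write, "consistent drop" is taken to mean strictly decreasing over the last few polls, and the queue cross-check uses Little's law, a general queueing result swapped in for illustration rather than anything XenDiskIOTest is documented to compute. All function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SRSample:
    """One polling interval's readings for an SR (names mirror the measures)."""
    read_throughputs: float   # MB/sec
    write_throughputs: float  # MB/sec
    read_io_requests: float   # requests/sec
    write_io_requests: float  # requests/sec

    @property
    def tot_throughputs(self) -> float:
        # Assumption: total throughput = read throughput + write throughput.
        return self.read_throughputs + self.write_throughputs

    @property
    def tot_io_requests(self) -> float:
        # Assumption: total request rate = read rate + write rate.
        return self.read_io_requests + self.write_io_requests


def consistent_drop(series: list[float], window: int = 3) -> bool:
    """True if each of the last `window` values is lower than the one before it."""
    if len(series) < window + 1:
        return False
    tail = series[-(window + 1):]
    return all(later < earlier for earlier, later in zip(tail, tail[1:]))


def bottleneck_side(before: SRSample, after: SRSample) -> str:
    """Attribute a tot_io_requests drop to reads or writes, whichever fell more."""
    read_loss = before.read_io_requests - after.read_io_requests
    write_loss = before.write_io_requests - after.write_io_requests
    if read_loss <= 0 and write_loss <= 0:
        return "none"
    return "read" if read_loss >= write_loss else "write"


def expected_outstanding(iops: float, latency_ms: float) -> float:
    """Little's law: average requests outstanding = request rate x average latency (s)."""
    return iops * latency_ms / 1000.0
```

For instance, if an SR moves from 900 read and 600 write requests/sec to 880 and 150, `bottleneck_side` reports "write". An SR serving 2000 requests/sec at 5 ms average latency should have about 10 requests outstanding (queued plus in flight) on average; a reported queue_size far above such an estimate may be worth a closer look.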