eG Monitoring
 

Measures reported by VnxDiskTest

This test monitors the current state, overall health, and I/O activity levels of each disk in the EMC VNX Unified storage system. With the help of this test, administrators can identify not only failed disks, but also error-prone disks that may fail at any time, and can act to avert the potential failure. The test also points administrators to disks that are busy processing I/O requests almost all the time. It thereby sheds light on irregularities in the distribution of I/O load across disks, and prompts administrators to fine-tune the load-balancing algorithm to prevent potential delays in data access. Finally, the test proactively alerts administrators to probable space contention on disks and excessive bandwidth consumption by disks, enabling them to initiate pre-emptive action.

The measures reported by this test are as follows:

Measurement Description Measurement Unit Interpretation
State Indicates the current state of the disk.   The possible states of the disk and their corresponding numeric values are listed in the table below:

Numeric Value Measure Value
0 Failed
1 Off
2 Removed
3 Binding
4 Empty
5 Enabled
6 Expanding
7 Unbound
8 Powering up
9 Ready
10 Reduced power, Transitioning
11 Hot spare ready
12 Unknown
13 Formatting
14 Equalizing
15 Rebuilding
16 Full power
17 Low power
18 Unformatted
19 Unsupported

Note:

By default, this measure reports one of the Measure Values listed above to indicate the status of the disk. In the graph of this measure, however, the states are represented by their numeric equivalents only, i.e., 0 to 19.
Lun_count Indicates the number of LUNs that are sharing this disk. Number Use the detailed diagnosis of this measure to know which LUNs are sharing this disk.
Busy Indicates the percentage of time for which this disk was busy. Percent A value close to 100% is a cause for concern, as it indicates a potential I/O overload on the disk. If the problem persists, it is a sign that serious load-balancing irregularities exist and need to be looked into.
Hard_read_errors Indicates the number of hard read errors on this disk. Number An increase in the value of any of the four error measures (Hard_read_errors, Hard_write_errors, Soft_read_errors, Soft_write_errors) indicates that the disk is nearing the end of its life or is likely to fail. By comparing the values of these measures across disks, you can identify the disks that are most likely to fail.
Hard_write_errors Indicates the number of hard write errors on this disk. Number
Soft_read_errors Indicates the number of soft read errors on this disk. Number
Soft_write_errors Indicates the number of soft write errors on this disk. Number
Read_requests Indicates the rate at which read requests were made to this disk. Reqs/sec Compare the values of Read_requests and Write_requests across disks to isolate overloaded disks. This will also reveal irregularities in load balancing across disks.
Write_requests Indicates the rate at which write requests were made to this disk. Reqs/sec
Data_reads Indicates the rate at which data is read from this disk. MB/Sec Compare the values of Data_reads and Data_writes across disks to identify the disks that are slowest at servicing read and write requests, respectively.
Data_writes Indicates the rate at which data is written to this disk. MB/Sec
Total_bandwidth Indicates the combined rate of data reads and data writes on this disk. MB/Sec Compare the value of this measure across disks to identify the disk that is consuming the maximum bandwidth.
Capacity Indicates the total size of this disk. GB  
User_capacity Indicates the amount of space on this disk that is assigned to bound LUNs. GB  
Used_percent Indicates the percentage of space in this disk that is currently in use. Percent Ideally, the value of this measure should be low. A consistent increase in this value could indicate a gradual, but steady erosion of space in the disk. A value close to 100% indicates that the disk is rapidly running out of space.
Read_retry Indicates the number of times read requests to this disk were retried. Number A low value is desired for this measure.
Write_retry Indicates the number of times write requests to this disk were retried. Number A low value is desired for this measure.
Remap_sector Indicates the number of sectors on this disk that were remapped to new locations on the disk due to read/write errors. Number A low value is desired for this measure.
Request_service_time Indicates the time taken by this disk to service requests. Secs A high value typically indicates a request processing bottleneck in the disk. Compare the value of this measure across disks to find out which disks are experiencing significant latencies.
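
The interpretations above can be illustrated with a small post-processing step. The following is a minimal sketch only, assuming the measures of each disk have already been collected into plain Python dictionaries keyed by the measure names in the table. The function names, the sample disk names and values, and the 90% busy threshold are hypothetical and are not part of the eG product or its API.

```python
# Numeric values of the State measure and their Measure Values,
# as listed in the table above.
DISK_STATES = {
    0: "Failed", 1: "Off", 2: "Removed", 3: "Binding", 4: "Empty",
    5: "Enabled", 6: "Expanding", 7: "Unbound", 8: "Powering up",
    9: "Ready", 10: "Reduced power, Transitioning", 11: "Hot spare ready",
    12: "Unknown", 13: "Formatting", 14: "Equalizing", 15: "Rebuilding",
    16: "Full power", 17: "Low power", 18: "Unformatted", 19: "Unsupported",
}

# The four error measures that share one interpretation in the table.
ERROR_MEASURES = (
    "Hard_read_errors", "Hard_write_errors",
    "Soft_read_errors", "Soft_write_errors",
)


def state_name(value):
    """Translate a numeric State value (as shown in graphs) to its label."""
    return DISK_STATES.get(value, "Unknown")


def total_errors(measures):
    """Sum the hard and soft read/write error counts of one disk."""
    return sum(measures.get(name, 0) for name in ERROR_MEASURES)


def review_disks(disks, busy_threshold=90.0):
    """Return (failed disks, busy disks, disk with the most errors).

    busy_threshold is an assumed cut-off for "close to 100%";
    it is not a value prescribed by the test.
    """
    failed = sorted(n for n, m in disks.items() if m["State"] == 0)
    busy = sorted(n for n, m in disks.items() if m["Busy"] >= busy_threshold)
    worst = max(disks, key=lambda n: total_errors(disks[n])) if disks else None
    return failed, busy, worst


# Hypothetical sample data for three disks.
SAMPLE = {
    "disk_0": {"State": 5, "Busy": 35.0, "Hard_read_errors": 0,
               "Hard_write_errors": 0, "Soft_read_errors": 1,
               "Soft_write_errors": 0},
    "disk_1": {"State": 0, "Busy": 0.0, "Hard_read_errors": 12,
               "Hard_write_errors": 4, "Soft_read_errors": 30,
               "Soft_write_errors": 9},
    "disk_2": {"State": 9, "Busy": 97.5, "Hard_read_errors": 0,
               "Hard_write_errors": 0, "Soft_read_errors": 0,
               "Soft_write_errors": 0},
}

if __name__ == "__main__":
    failed, busy, worst = review_disks(SAMPLE)
    print("Failed disks:", failed)
    print("Busy disks:", busy)
    print("Most error-prone disk:", worst, "-", state_name(SAMPLE[worst]["State"]))
```

Comparing error counts and busy percentages across all disks at once, rather than inspecting one disk at a time, mirrors the cross-disk comparisons recommended in the Interpretation column.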