eG Monitoring
 
Measures reported by VnxLunTest

A logical unit number (LUN) is a unique identifier used to designate an individual hard disk device, or a collection of such devices, for addressing by a protocol associated with a SCSI, iSCSI, Fibre Channel (FC), or similar interface. LUNs are central to the management of block storage arrays shared over a storage area network (SAN). LUN errors, poor LUN cache usage, and abnormal I/O activity on the LUNs, if not promptly detected and resolved, can significantly degrade the performance of the block storage array. It is therefore important to monitor LUN performance continuously. This can be achieved using the VnxLunTest test.

This test auto-discovers the LUNs in the VNX for Block system and reports the current state of each LUN, captures LUN errors, and measures the level of I/O activity on every LUN, so that administrators are notified of LUN-related problems well before they impact storage system performance.

Measurement Description Measurement Unit Interpretation
State Indicates the current state of this LUN.   If the state reported by this measure is Bound, it indicates that the LUN is currently in a bound state. A bind creates LUNs on a RAID GROUP. Binding a LUN involves the preparation of allocated storage space. This preparation is particularly important when storage capacity is being reallocated for reuse.

LUNs are bound after RAID GROUPS are created. LUNs are available for use immediately after they are created, but the bind is not strictly complete until after all the bound storage has been prepared and verified.

During the preparation step, the storage allocated to the LUN is overwritten with binary zeroes. These zeroes erase any previous data from the storage and set up for the parity calculation. When zeroing is complete, parity and metadata are calculated for the LUN sectors.

If the state reported by this measure is Not bound, it indicates that the LUN is currently in an unbound state.

The numeric values that correspond to each of the states discussed above are as follows:

Numeric Value State
1 Bound
0 Not bound

Note:

By default, this measure reports the above-mentioned States to indicate the state of the LUN. The graph of this measure, however, represents the LUN state using the numeric equivalents - 0 or 1.

Use the detailed diagnosis of this measure to view additional details of a LUN.
Hard_errors Indicates the number of hard errors on this LUN. Number An increase in the value of this measure indicates that a disk is nearing the end of its life or is about to fail.
Soft_errors Indicates the total number of uncorrected read and write errors on this LUN. Number An increase in the value of this measure indicates that a disk is nearing the end of its life or is about to fail.
Queue_length Indicates the average number of requests to this LUN that are in queue. Number A very high value could indicate a processing bottleneck on the LUN. By comparing the value of this measure across LUNs, you can quickly identify which LUN has too many pending requests - this LUN is likely the one with the processing bottleneck.
Read_cache_hits Indicates the number of times read requests to this LUN were fulfilled by the read cache. Number A high value is desired for this measure.
Write_cache_hits Indicates the number of times write requests to this LUN were fulfilled by the write cache. Number A high value is desired for this measure.
Read_cache_misses Indicates the number of times read requests to this LUN were not serviced by the read cache. Number A low value is desired for this measure.
Read_hit_ratio Indicates the percentage of read requests to this LUN that were serviced by the cache. Percent Ideally, the value of this measure should be high. A low value indicates that many read requests are serviced by direct disk accesses, which is a more expensive operation in terms of processing overheads.
Write_hit_ratio Indicates the percentage of write requests to this LUN that were serviced by the cache. Percent Ideally, the value of this measure should be high. A low value indicates that data is often directly written to the disk, which is a more expensive operation in terms of processing overheads.
Read_requests Indicates the number of read requests made per second to this LUN. Reqs/Sec Comparing the value of these measures across LUNs will clearly indicate which LUN is the busiest in terms of the number of read and write requests handled - it could also shed light on irregularities in load balancing across the LUNs.
Write_requests Indicates the number of write requests made per second to this LUN. Reqs/Sec
Data_reads Indicates the rate at which data was read from this LUN. Blocks/Sec Comparing the value of these measures across LUNs will clearly indicate which LUN is the busiest in terms of the rate at which data is read and written - it could also shed light on irregularities in load balancing across the LUNs.
Data_writes Indicates the rate at which data was written to this LUN. Blocks/Sec
Total_io Indicates the rate of the I/O activity on this LUN. Reqs/Sec A consistent increase in the value of this measure for a LUN could hint at a potential overload condition.
Rebuilt Indicates the percentage of this LUN that has been rebuilt. Percent A rebuild replaces a failed hard disk within a RAID group with an operational disk. If one or more LUNs are bound to the RAID group with the failed disk, then all the LUNs affected by the failure are rebuilt. If a drive in one of the RAID groups fails, a rebuild uses an available hot spare to restore a LUN to its fully assigned number of hard drives. LUNs are rebuilt one by one. Each LUN is rebuilt by its owning Storage Processor (SP).

Using the value of this measure, you can track the progress of the rebuild and gauge how much longer it will take to complete.

Bound Indicates the percentage of the binding process that is complete for this LUN. Percent A bind is an information organization, data security, and data integrity feature of a storage system. Binding a LUN involves the preparation of allocated storage space. This preparation is particularly important when storage capacity is being reallocated for reuse. This reuse of storage includes erasing any previous data found on the hard drives, and the setting of parity and metadata for the storage.

LUNs are typically available for use immediately after they are bound. However, the bind is not strictly complete until after all the bound storage has been prepared and verified. Depending on the LUN size and verify priority, these two steps may take several hours. Using the value of this measure, you can track the progress of the bind and gauge how much longer it will take to complete.

Lun_capacity_mb Indicates the total capacity of this LUN. GB  
Lun_capacity_blocks Indicates the size of this LUN, in blocks. Blocks
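
The interpretations above can be illustrated with a small Python sketch. It is not part of the test itself: the LunSample class, its field names, and the helper functions are hypothetical names introduced only for this example. The hit ratio formula (hits divided by hits plus misses) and the linear time estimate are assumptions, not a documented description of how VnxLunTest computes its values.

```python
from dataclasses import dataclass
from typing import List, Optional

# Numeric values reported by the State measure, as listed in the table above.
STATE_NAMES = {1: "Bound", 0: "Not bound"}


@dataclass
class LunSample:
    """Hypothetical container for one reading of a LUN's measures."""
    name: str
    state: int
    read_cache_hits: int
    read_cache_misses: int
    queue_length: float


def read_hit_ratio(hits: int, misses: int) -> float:
    """Percentage of reads served from cache (assumed formula: hits / (hits + misses))."""
    total = hits + misses
    return 100.0 * hits / total if total else 0.0


def busiest_lun(samples: List[LunSample]) -> LunSample:
    """Return the LUN with the longest request queue, a candidate bottleneck."""
    return max(samples, key=lambda s: s.queue_length)


def eta_minutes(prev_pct: float, curr_pct: float, interval_minutes: float) -> Optional[float]:
    """Linear estimate of the minutes left for a bind or rebuild to reach 100 percent.

    Uses two consecutive readings of the Bound or Rebuilt measure.
    Returns None when no progress was made between the readings.
    """
    rate = (curr_pct - prev_pct) / interval_minutes  # percent per minute
    if rate <= 0:
        return None
    return (100.0 - curr_pct) / rate


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    samples = [
        LunSample("LUN 0", 1, 900, 100, 2.0),
        LunSample("LUN 1", 1, 400, 600, 9.5),
    ]
    for s in samples:
        ratio = read_hit_ratio(s.read_cache_hits, s.read_cache_misses)
        print(f"{s.name}: {STATE_NAMES[s.state]}, read hit ratio {ratio:.1f}%")
    print("Longest queue:", busiest_lun(samples).name)
    print("Estimated minutes to finish bind:", eta_minutes(40.0, 50.0, 5.0))
```

The time estimate assumes the bind or rebuild proceeds at a constant rate, which may not hold when the verify priority or the I/O load changes, so treat it as a rough guide only.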