eG Monitoring
 

Measures reported by NAUSDDiskTest

Disks are the basic storage devices in NetApp storage systems. Depending on the storage system model, ATA, Fibre Channel, SCSI, SAS, or SATA disks are used.

Data ONTAP assigns and uses four disk categories to support data storage, parity protection, and disk replacement. The disk category can be one of the following:
Data disk - Holds data stored on behalf of clients within RAID groups (and any system management data).
Global hot spare disk - Does not hold usable data, but is available to be added to a RAID group in an aggregate. Any functioning disk that is not assigned to an aggregate acts as a hot spare disk.
Parity disk - Stores information required for data reconstruction within RAID groups.
Double-parity disk - Stores double-parity information within RAID groups, if RAID-DP is used.

Administrators should closely monitor the space usage and I/O activity of each of these disks, so that they can proactively detect a space crunch or I/O latency and receive early warning of uneven load-balancing across disks. The NetApp Disks test aids administrators in this endeavor. This test auto-discovers the disks used by the storage system and reports how well each disk uses the available space and processes I/O requests. This way, potential space contention and I/O latency can be isolated, and slow disks and disks that are running short of space can be identified. The test also reports the current state of each disk and how busy each disk is, pointing administrators to broken and over-used disks and, in the process, exposing irregularities in load-balancing.
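As a rough illustration of the kind of cross-disk comparison described above, the following Python sketch flags disks that are short of space or slow to serve I/O. The `DiskSample` record, the sample values, and the 90% and 20 ms thresholds are assumptions made for illustration only; they are not part of the eG test or of any NetApp API. The field names simply mirror the measures listed on this page.

```python
from dataclasses import dataclass


@dataclass
class DiskSample:
    """One hypothetical per-disk reading; fields mirror the measure names."""
    name: str
    used_space_mb: float
    physical_space_mb: float
    user_read_latency_ms: float
    user_write_latency_ms: float

    @property
    def used_space_prcnt(self) -> float:
        # Assumes Physical_space is the total capacity of the disk.
        if self.physical_space_mb <= 0:
            return 0.0
        return 100.0 * self.used_space_mb / self.physical_space_mb


def flag_disks(samples, max_used_prcnt=90.0, max_latency_ms=20.0):
    """Return {disk name: [reasons]} for disks breaching the assumed thresholds."""
    flagged = {}
    for s in samples:
        reasons = []
        if s.used_space_prcnt > max_used_prcnt:
            reasons.append("low free space")
        if max(s.user_read_latency_ms, s.user_write_latency_ms) > max_latency_ms:
            reasons.append("high I/O latency")
        if reasons:
            flagged[s.name] = reasons
    return flagged


# Hypothetical readings for three disks.
samples = [
    DiskSample("disk_a", 950.0, 1000.0, 5.0, 6.0),
    DiskSample("disk_b", 400.0, 1000.0, 35.0, 8.0),
    DiskSample("disk_c", 300.0, 1000.0, 4.0, 5.0),
]
print(flag_disks(samples))
```

With these sample values, `disk_a` is flagged for space and `disk_b` for latency. Suitable thresholds depend on the workload and the storage system model, so the defaults here should be treated as placeholders.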

The measures made by this test are as follows:

Measurement Description Measurement Unit Interpretation
Disk_count Indicates the total number of disks in this disk group. Number This measure applies only to disk groups, not to individual disks.
State Indicates the current RAID status of this disk in this storage system.   The values that this measure reports and their corresponding numeric values are listed in the table below:

Measure Value Numeric Value
partner 1
Present 2
Zeroing 3
Spare 4
Copy 5
Pending 6
Reconstructing 7
Broken 8

Note:

By default, this measure reports the Measure Values listed above to indicate the current RAID status of this disk in this storage system. In the graph of this measure, however, the status is represented by the corresponding numeric equivalents, i.e., 1 to 8.

Free_space Indicates the amount of free space that is currently available for use in this disk of this Storage system. MB A high value is desired for this measure.
Physical_space Indicates the total amount of space available in this disk of this Storage system. MB  
Used_space Indicates the amount of space that is already in use on this disk of this storage system. MB A consistent increase in the value of this measure could indicate that free disk space is being slowly but steadily eroded.

Compare the value of this measure across all disks to identify the disks that are using disk space excessively.

Used_space_prcnt Indicates the percentage of space that has been already utilized in this disk. Percent
Total_transfers Indicates the rate at which data transfer is being initiated from this disk. Ops/sec  
User_reads Indicates the rate at which data or metadata associated with user requests is being retrieved from this disk. Ops/sec A consistent decrease in the value of this measure is indicative of a gradual slowdown in a user's ability to read from the disk. Compare the value of this measure across disks to know which disks service read requests slowly.
User_writes Indicates the rate at which data or metadata associated with user requests is being stored in this disk. Ops/sec
User_read_latency Indicates the time taken for retrieving data or metadata associated with user requests from this disk during the last measurement period. Msecs Very high values for the User_read_latency and user_write_latency measures indicate obstacles to rapid reading/writing by the storage system. By observing the variations in these measures over time, you can tell whether the latencies are sporadic or consistent. Consistent delays in reading/writing could indicate persistent bottlenecks to speedy I/O processing in the storage device.
user_write_latency Indicates the time taken for a write operation on this disk during the last measurement period. Msecs
Disk_busy Indicates the percentage of time during which there is at least one outstanding request (i.e., read or write) to this disk. Percent By comparing the percentage of time that the different disks are busy, an administrator can determine whether the application load is properly balanced across the disks.
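The State mapping and the Disk_busy comparison can be sketched in Python as follows. The numeric codes are taken from the State table on this page; the `DiskState` enum, the `busy_spread` helper, the sample readings, and the 25-percentage-point tolerance are hypothetical choices made only for illustration and do not reflect how the eG agent itself evaluates load balance.

```python
from enum import IntEnum


class DiskState(IntEnum):
    """Numeric equivalents of the State measure, as listed in the State table."""
    PARTNER = 1
    PRESENT = 2
    ZEROING = 3
    SPARE = 4
    COPY = 5
    PENDING = 6
    RECONSTRUCTING = 7
    BROKEN = 8


def busy_spread(disk_busy):
    """Return (busiest disk, idlest disk, spread in percentage points)."""
    busiest = max(disk_busy, key=disk_busy.get)
    idlest = min(disk_busy, key=disk_busy.get)
    return busiest, idlest, disk_busy[busiest] - disk_busy[idlest]


# Hypothetical readings: State as graphed (numeric) and Disk_busy in percent.
states = {"disk_a": 2, "disk_b": 8, "disk_c": 4}
disk_busy = {"disk_a": 82.0, "disk_c": 12.0, "disk_d": 40.0}

# Decode the graphed numeric value back to a state and pick out broken disks.
broken = [name for name, code in states.items()
          if DiskState(code) is DiskState.BROKEN]
print("Broken disks:", broken)

busiest, idlest, spread = busy_spread(disk_busy)
if spread > 25.0:  # assumed tolerance, not an eG default
    print(f"Possible imbalance: {busiest} vs {idlest} ({spread:.0f} points apart)")
```

A single spread figure is a deliberately crude indicator; comparing Disk_busy over several measurement periods gives a better picture of whether an imbalance is persistent.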