eG Monitoring
 
Measures reported by NAUSDVolTest

Volumes contain file systems that hold user data that is accessible using one or more of the access protocols supported by Data ONTAP, including NFS, CIFS, HTTP, FTP, FC, and iSCSI.

For users to read data from and write data to volumes quickly, the volumes must have adequate free space and must process I/O requests rapidly. Slowdowns in data storage/retrieval can therefore be attributed either to storage space contention on one or more volumes or to I/O processing bottlenecks. When such slowdowns occur, administrators need to swiftly determine the following:

  • Which volumes are over-utilized?
  • Which volumes are overloaded?
  • Which volumes are experiencing serious latencies?
  • When were these latencies observed most frequently: while reading or while writing?
  • What type of operations registered the maximum latency: CIFS, NFS, or SAN (block protocols such as FC and iSCSI)?

This test helps answer these questions, so that you can quickly diagnose the root cause of slowdowns when reading from or writing to a volume.

The measures made by this test are as follows:

Measurement Description Measurement Unit Interpretation
Volume_count Indicates the number of volumes that are currently highly utilized/slow/busy. Number This measure appears only for the Highly utilized, Slow and Busy volumes.

In the case of Highly utilized volumes, the detailed diagnosis of this measure, if enabled, lists the names of the highly utilized volumes and the percentage of space used in each volume.

In the case of Slow volumes, the detailed diagnosis of this measure, if enabled, lists the names of the slow volumes and the average latency, i.e., the time taken to perform read/write operations, of each volume.

In the case of Busy volumes, the detailed diagnosis of this measure, if enabled, lists the names of the busy volumes and the rate at which operations were performed on each volume.

The detailed diagnosis therefore helps you quickly identify the highly utilized, slow, and busy volumes. This measure is deprecated from eG v5.6.5 onwards.
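The following Python sketch illustrates how the Volume_count measure relates to the entries listed in its detailed diagnosis for each category. The thresholds (USED_PERCENT_LIMIT, LATENCY_LIMIT_US, OPS_LIMIT), the VolumeSample type, and the function names are hypothetical; this documentation does not state the cut-offs used to classify a volume as highly utilized, slow, or busy.

```python
from dataclasses import dataclass

# Hypothetical thresholds: the actual cut-offs are not documented here,
# so these values are for illustration only.
USED_PERCENT_LIMIT = 90.0    # "Highly utilized" if Used_percent exceeds this
LATENCY_LIMIT_US = 20_000.0  # "Slow" if Avg_latency (microseconds) exceeds this
OPS_LIMIT = 1_000.0          # "Busy" if Total_ops (ops/sec) exceeds this


@dataclass
class VolumeSample:
    name: str
    used_percent: float
    avg_latency_us: float
    total_ops: float


def classify(volumes):
    """Group volumes by category, keeping the value a detailed diagnosis would show."""
    categories = {"Highly utilized": [], "Slow": [], "Busy": []}
    for v in volumes:
        # A single volume may fall into more than one category.
        if v.used_percent > USED_PERCENT_LIMIT:
            categories["Highly utilized"].append((v.name, v.used_percent))
        if v.avg_latency_us > LATENCY_LIMIT_US:
            categories["Slow"].append((v.name, v.avg_latency_us))
        if v.total_ops > OPS_LIMIT:
            categories["Busy"].append((v.name, v.total_ops))
    return categories


def volume_counts(categories):
    """Volume_count for each category is the number of volumes listed in it."""
    return {category: len(entries) for category, entries in categories.items()}


vols = [
    VolumeSample("vol0", 95.0, 1_500.0, 200.0),
    VolumeSample("vol1", 40.0, 35_000.0, 2_500.0),
    VolumeSample("vol2", 10.0, 800.0, 50.0),
]
print(volume_counts(classify(vols)))  # {'Highly utilized': 1, 'Slow': 1, 'Busy': 1}
```

Here vol1 is counted as both Slow and Busy, since each category is evaluated independently in this sketch.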

State Indicates the current state of this volume.   The values that this measure can report and their corresponding numeric equivalents are shown in the table below:

Numeric Value Measure Value
0 Online
1 Creating
2 Restricted
3 Offline
4 Partial
5 Unknown
6 Failed

Note:

By default, this measure reports one of the Measure Values listed above to indicate the current state of this volume. In the graph of this measure, however, the state is represented using its numeric equivalent from the table above.
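As a minimal sketch, the mapping between the State measure values and the numbers plotted in its graph can be expressed as a Python IntEnum. The names VolumeState, state_to_graph_value, and graph_value_to_state are illustrative only and are not part of any eG or Data ONTAP API.

```python
from enum import IntEnum


class VolumeState(IntEnum):
    """Numeric equivalents of the State measure values (see the table above)."""
    ONLINE = 0
    CREATING = 1
    RESTRICTED = 2
    OFFLINE = 3
    PARTIAL = 4
    UNKNOWN = 5
    FAILED = 6


def state_to_graph_value(measure_value: str) -> int:
    """Return the number plotted in the graph for a reported value such as "Offline"."""
    return int(VolumeState[measure_value.strip().upper()])


def graph_value_to_state(value: int) -> str:
    """Return the reported value for a plotted number, e.g. 3 -> "Offline"."""
    return VolumeState(value).name.capitalize()


print(state_to_graph_value("Restricted"))  # 2
print(graph_value_to_state(6))             # Failed
```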
Error_volume Indicates whether or not this volume is error-prone.   Errors generally occur when the volume is inconsistent, unrecoverable, or invalid. A volume is considered inconsistent if there are known inconsistencies in its file system. An increase in these inconsistencies will render the volume unrecoverable, and an unrecoverable volume cannot be accessed. If mirroring has been enabled, Data ONTAP automatically accesses the mirrored data of the unrecoverable volume. A volume is considered invalid if a vol-copy or SnapMirror initial transfer has been aborted; such invalid volumes are generally only partially created and cannot be fully recovered. If this volume is a Single Instance Storage (SIS) volume, SIS operation errors are also taken into account.

This measure reports the value Yes if this volume is error-prone and the value No if it is error-free.

The numeric values that correspond to the above-mentioned values are represented in the table below:

Numeric Value Measure Value
1 Yes
0 No

Note:

By default, this measure reports one of the Measure Values listed above to indicate whether or not this volume is error-prone. In the graph of this measure, however, these values are represented using their numeric equivalents from the table above.

The detailed diagnosis capability of this measure, if enabled, lists the type of the error. In the case of an SIS operation error, the actual SIS error message will also be displayed as part of the detailed diagnosis.

This measure is applicable only for individual volumes.
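The sketch below shows, under stated assumptions, how the Yes/No value, its numeric equivalent, and the error types listed in the detailed diagnosis could be derived from a volume's status. The VolumeStatus fields and the error_volume function are hypothetical stand-ins for the conditions described above (inconsistent, unrecoverable, invalid, and SIS operation errors); they are not actual Data ONTAP attribute names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VolumeStatus:
    # Hypothetical flags; real values would come from Data ONTAP volume
    # information, and these field names are illustrative only.
    inconsistent: bool = False
    unrecoverable: bool = False
    invalid: bool = False
    is_sis: bool = False
    sis_error: Optional[str] = None


def error_volume(status: VolumeStatus):
    """Return (measure value, numeric equivalent, error types for the detailed diagnosis)."""
    errors = []
    if status.inconsistent:
        errors.append("inconsistent")
    if status.unrecoverable:
        errors.append("unrecoverable")
    if status.invalid:
        errors.append("invalid")
    # SIS operation errors count only for Single Instance Storage volumes.
    if status.is_sis and status.sis_error:
        errors.append(f"SIS error: {status.sis_error}")
    if errors:
        return "Yes", 1, errors
    return "No", 0, errors


print(error_volume(VolumeStatus(invalid=True)))  # ('Yes', 1, ['invalid'])
```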

Used_percent Indicates the percentage of space that is utilized in this volume. Percent Ideally, the value of this measure should be low. A high value or a consistent increase in the value of this measure is indicative of excessive space usage in a volume.

This measure will be 0 for restricted and offline volumes.

For the Busy volumes, Slow volumes, and Highly utilized volumes, this measure reports the average space usage across all volumes in that category.
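Assuming Used_percent is computed as used space divided by the volume's total size (Total_size, which excludes the WAFL and snapshot reserves), a per-volume value and the per-category average could be sketched as follows. The formula and the function names are assumptions for illustration, not a documented implementation.

```python
def used_percent(used_mb: float, total_mb: float, online: bool = True) -> float:
    """Assumed per-volume Used_percent: used space as a share of Total_size.

    Returns 0 for restricted/offline volumes (online=False), matching the
    behavior described above, and also guards against a zero total size.
    """
    if not online or total_mb <= 0:
        return 0.0
    return 100.0 * used_mb / total_mb


def category_average(values) -> float:
    """Average of a measure across the volumes in one category (Busy, Slow, ...)."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Two hypothetical volumes in the same category (for example, Busy volumes).
print(category_average([used_percent(90, 100), used_percent(70, 100)]))  # 80.0
```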

Total_size Indicates the total size of this volume. MB The value of this measure does not include the WAFL reserve or the volume snapshot reserve.

This measure will be 0 for restricted and offline volumes.

For the Busy volumes, Slow volumes, and Highly utilized volumes, this measure reports the average size across all volumes in that category.

Reserve Indicates the space reserved in this volume for overwriting snapshotted data. MB This space can be used only by space-reserved LUNs and files, and only when the volume is full.

This measure will be 0 for restricted and offline volumes.

For the Busy volumes, Slow volumes, and Highly utilized volumes, this measure reports the average reserved space across all volumes in that category.

Reserve_actual_used Indicates the percentage of reserved space that is actually used by this volume. Percent A low value is desired for this measure.

This measure will be 0 for restricted and offline volumes.

For the Busy volumes, Slow volumes, and Highly utilized volumes, this measure reports the average reserved space usage across all volumes in that category.

File_used Indicates the percentage of inodes (i.e., files) currently used in this volume. Percent A high value indicates that the inodes in the volume may soon be exhausted.

This measure will be 0 for restricted and offline volumes.
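A high File_used value is most actionable when combined with its trend. The hypothetical helper below linearly extrapolates two File_used samples to estimate how many hours remain before all inodes are used. Linear growth is an assumption of this sketch; actual file creation rates can vary considerably.

```python
from typing import Optional


def hours_until_inodes_exhausted(prev_pct: float, curr_pct: float,
                                 interval_hours: float) -> Optional[float]:
    """Linearly extrapolate File_used (percent) to estimate hours until it reaches 100.

    Returns None if File_used did not grow between the two samples.
    """
    if interval_hours <= 0:
        raise ValueError("interval_hours must be positive")
    growth_per_hour = (curr_pct - prev_pct) / interval_hours
    if growth_per_hour <= 0:
        return None
    return (100.0 - curr_pct) / growth_per_hour


print(hours_until_inodes_exhausted(80.0, 82.0, 1.0))  # 9.0
```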
Total_ops Indicates the rate at which the operations (including read and write) were performed on this volume. Ops/sec This measure is a good indicator of how busy the volume is.

Comparing the value of this measure across volumes will enable you to quickly detect load-balancing irregularities (if any).
Write_ops Indicates the rate at which write operations were performed on this volume. Ops/sec  
Read_ops Indicates the rate at which read operations were performed from this volume. Ops/sec  
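The sketch below assumes that the per-second rates are derived from two readings of a cumulative operation counter, and flags volumes whose Total_ops is well above the average across volumes, which is one simple way to spot the load-balancing irregularities mentioned above. The factor of 2 and the function names are hypothetical choices.

```python
def ops_rate(prev_count: int, curr_count: int, elapsed_s: float) -> float:
    """Ops/sec from two readings of an (assumed) cumulative operation counter.

    Counter resets or wrap-around are not handled in this sketch.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return (curr_count - prev_count) / elapsed_s


def load_outliers(total_ops: dict, factor: float = 2.0) -> list:
    """Names of volumes whose Total_ops exceeds `factor` times the mean across volumes."""
    if not total_ops:
        return []
    mean = sum(total_ops.values()) / len(total_ops)
    return sorted(name for name, ops in total_ops.items() if ops > factor * mean)


print(ops_rate(1_000, 4_000, 300))                                # 10.0
print(load_outliers({"vol0": 900.0, "vol1": 50.0, "vol2": 50.0}))  # ['vol0']
```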
Avg_latency Indicates the average time taken by the WAFL filesystem to process all the operations performed on this volume. Microseconds The value of this measure excludes the request processing time and the network communication time of the volume.

A high value of this measure is a cause for concern, as it indicates a processing bottleneck.
Read_latency Indicates the average time taken by the WAFL filesystem to process the read requests made to this volume. Microseconds The values of these measures exclude the request processing time and the network communication time of the volume.

If the Avg_latency of a volume is high, you can compare the values of these measures for that volume to determine whether the latency occurred while reading or while writing.
write_latency Indicates the average time taken by the WAFL filesystem to process the write requests made to this volume. Microseconds
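Read_latency and write_latency are per-operation averages, so a higher latency on a few operations may matter less than a moderate latency on many. The following sketch compares both views; it assumes that latency multiplied by ops/sec approximates the processing time spent per second in each direction, and the function names are illustrative only.

```python
def slower_direction(read_latency_us: float, write_latency_us: float) -> str:
    """Which direction has the higher average per-operation latency."""
    if read_latency_us > write_latency_us:
        return "read"
    if write_latency_us > read_latency_us:
        return "write"
    return "equal"


def latency_time_share(read_ops: float, read_latency_us: float,
                       write_ops: float, write_latency_us: float) -> dict:
    """Percentage of read+write processing time spent on reads and on writes.

    Assumes latency (microseconds/op) x rate (ops/sec) approximates the
    processing time spent per second in each direction.
    """
    read_time = read_ops * read_latency_us
    write_time = write_ops * write_latency_us
    total = read_time + write_time
    if total == 0:
        return {"read": 0.0, "write": 0.0}
    return {"read": 100.0 * read_time / total, "write": 100.0 * write_time / total}


print(slower_direction(800.0, 400.0))                  # read
print(latency_time_share(50.0, 800.0, 400.0, 400.0))   # {'read': 20.0, 'write': 80.0}
```

In this example reads are slower per operation, yet writes account for most of the processing time because they are far more frequent.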
Read_data Indicates the rate at which data bytes were read from this volume. Bytes/sec  
write_data Indicates the rate at which data bytes were written to this volume. Bytes/sec  
Cifs_ops Indicates the rate at which the CIFS operations were performed on this volume. Ops/sec This measure is inclusive of all the CIFS operations i.e., read, write and other miscellaneous CIFS operations.

By comparing the value of this measure with that of the nfs_ops and San_ops measures for a volume, you can figure out which type of operation imposed the maximum load on that volume.
nfs_ops Indicates the rate at which the NFS operations were performed on this volume. Ops/sec This measure is inclusive of all the NFS operations, i.e., read, write and other miscellaneous NFS operations.

By comparing the value of this measure with that of the Cifs_ops and San_ops measures for a volume, you can figure out which type of operation imposed the maximum load on that volume.
San_ops Indicates the rate at which the SAN operations were performed on this volume. Ops/sec This measure is inclusive of all the block protocol operations i.e., read, write and other miscellaneous SAN operations.

By comparing the value of this measure with that of the Cifs_ops and nfs_ops measures for a volume, you can figure out which type of operation imposed the maximum load on that volume.
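Comparing Cifs_ops, nfs_ops, and San_ops for a volume amounts to finding the largest of the three rates. A minimal Python sketch, with a hypothetical function name, that also reports that protocol's share of the combined CIFS, NFS, and SAN load:

```python
def busiest_protocol(cifs_ops: float, nfs_ops: float, san_ops: float):
    """Return (protocol, share of combined CIFS+NFS+SAN ops in %) for the highest rate."""
    rates = {"CIFS": cifs_ops, "NFS": nfs_ops, "SAN": san_ops}
    total = sum(rates.values())
    protocol = max(rates, key=rates.get)
    share = 100.0 * rates[protocol] / total if total else 0.0
    return protocol, share


print(busiest_protocol(120.0, 30.0, 50.0))  # ('CIFS', 60.0)
```

The share is computed against the sum of the three protocol rates only, which need not equal Total_ops.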
Cifs_latency Indicates the average time taken for performing the CIFS operations (including read, write and other miscellaneous CIFS operations) on this volume. Microseconds The values of these measures exclude the request processing time and the network communication time of the volume.

Ideally, the values of these measures should be low. If the Avg_latency of a volume is very high, you can compare the values of these measures for that volume to determine the reason for the latency: is it caused by processing bottlenecks in CIFS operations, NFS operations, or SAN operations?
nfs_latency Indicates the average time taken for performing the NFS operations (including read, write and other miscellaneous NFS operations) on this volume. Microseconds
San_latency Indicates the average time taken for performing the block protocol operations (including read, write and other miscellaneous SAN operations) on this volume. Microseconds
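When comparing Cifs_latency, nfs_latency, and San_latency, a protocol that carried no operations during the measurement period offers little insight. The hypothetical helper below ranks protocols by latency while ignoring those whose corresponding ops measure (Cifs_ops, nfs_ops, or San_ops) is 0; skipping idle protocols is an assumption of this sketch, not documented behavior.

```python
def rank_protocol_latency(latency_us: dict, ops: dict) -> list:
    """Rank protocols by average latency, highest first, skipping idle protocols.

    latency_us and ops map a protocol name ("CIFS", "NFS", "SAN") to that
    protocol's latency (microseconds) and ops (ops/sec) measures for one volume.
    """
    active = {p: lat for p, lat in latency_us.items() if ops.get(p, 0) > 0}
    return sorted(active.items(), key=lambda item: item[1], reverse=True)


print(rank_protocol_latency(
    {"CIFS": 4000.0, "NFS": 9000.0, "SAN": 20000.0},
    {"CIFS": 50.0, "NFS": 10.0, "SAN": 0.0},
))  # [('NFS', 9000.0), ('CIFS', 4000.0)]
```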