eG Monitoring
 

Measures reported by NAUSDLunTest

This test auto-discovers the LUNs configured on the NetApp Unified Storage system, monitors the availability, state, and the processing ability of each LUN, and reports the following:

  • The number of busy and slow LUNs in the NetApp Unified Storage system
  • Which LUNs are currently offline?
  • Is any LUN experiencing a contention for storage space?
  • Is I/O load uniformly balanced across all LUNs, or is any LUN overloaded? Is it causing the LUN to receive an increased number of Queue Full responses?
  • Are the LUNs able to process the I/O requests quickly? Is any LUN experiencing processing bottlenecks?

The measures made by this test are as follows:

Measurement Description Measurement Unit Interpretation
Lun_count Indicates the number of LUNs in this category. Number This measure is applicable only to the slow and busy LUNs. Use it to determine how many SLA violations have occurred and to understand the overall impact of these violations on the LUNs.

In the case of the slow LUNs, the detailed diagnosis of this measure will list out the name of the LUN and the average latency of the LUN.

In the case of the busy LUNs, the detailed diagnosis of this measure will list out the name of the LUN and the operation rate of the LUN.

This measure is deprecated from eG v5.6.5.

Is_Online Indicates whether or not this LUN is online.   This measure is applicable only to the individual LUNs. It reports the value Yes if this LUN is currently online and the value No if it is not.

The numeric equivalents corresponding to the above-mentioned values are listed in the table below:

Numeric Value Measure Value
0 No
1 Yes

Note:

This measure reports the Measure Values listed in the table above to indicate the current state of a LUN. In the graph of this measure, however, the state is represented using only the Numeric Values listed in the table.

Size Indicates the size of this LUN in the active file system. MB This measure will show the exact size of the LUN in case of the individual LUNs whereas the average size will be displayed in case of the slow/busy LUNs.
Size_used Indicates the currently used size of this LUN. MB A low value is desired for this measure. A high value indicates that the LUN is running out of space.

This measure will show the exact used size of the LUN in case of the individual LUNs whereas the average size that is used will be displayed in case of the slow/busy LUNs.

Read_ops Indicates the rate at which the read operations were performed on this LUN. Ops/sec A high value is desired for this measure. A consistent decrease in this value could indicate a processing bottleneck.

This measure will show the exact rate of read operations performed on each LUN in the case of the individual LUNs, whereas the average rate of read operations will be displayed in the case of the slow/busy LUNs.

Write_ops Indicates the rate at which the write operations were performed on this LUN. Ops/sec A high value is desired for this measure. A consistent decrease in this value could indicate a processing bottleneck.

This measure will show the exact rate of write operations performed on each LUN in the case of the individual LUNs, whereas the average rate of write operations will be displayed in the case of the slow/busy LUNs.

Total_ops Indicates the rate at which the operations (including the read and write) were performed on this LUN. Ops/sec A high value is desired for this measure. A consistent decrease in this value could indicate a processing bottleneck.

This measure will show the exact rate of operations performed on each LUN in the case of the individual LUNs, whereas the average rate of operations will be displayed in the case of the slow/busy LUNs.

Avg_latency Indicates the average time taken for executing an operation on this LUN. Millisecs A high value indicates that the LUN is taking too long to process the I/O requests issued to it.

Compare the value of this measure across LUNs to isolate the slow LUNs.

This measure will show the exact execution time per operation on each LUN in the case of the individual LUNs, whereas the average execution time per operation will be displayed in the case of the slow/busy LUNs.

Queue_full Indicates the rate at which the queue full responses were received on this LUN. Responses/sec This measure is a good indicator for detecting sudden/coordinated bursts of I/O from the initiators.

A Queue full condition signals that the target/storage port is unable to process more I/O requests, and that the initiator therefore needs to throttle I/O to the storage port. Some operating systems, such as AIX, may not handle repeated Queue full responses gracefully; that is, they may not throttle the I/O requests appropriately, which leads to I/O errors. These conditions can also be alleviated by reducing the LUN queue depth setting appropriately.

This measure will display the Queue full responses of each LUN in the case of the individual LUNs, and the average Queue full responses in the case of the slow/busy LUNs. This will help detect sudden and coordinated bursts of I/O from the initiators.

Read_data Indicates the rate at which data is read from this LUN. Bytes/sec A high value is desired for this measure.
Write_data Indicates the rate at which data is written to this LUN. Bytes/sec A high value is desired for this measure.
Queue_depth Indicates the queue depth of this LUN. Number Queue depth is the number of outstanding I/O requests a LUN will issue or hold before the LUN can trigger a Queue Full response, i.e., the number of I/O operations that can run in parallel on the LUN. This value is useful when compared with the number of Queue Full responses triggered by the LUN. Queue depth is usually set too high, and an improperly set queue depth could contribute significantly to latency. For the Busy or Slow LUNs, this is the average across all Busy/Slow LUNs.
Avg_read_latency Indicates the average time taken to execute a read request on this LUN. Millisecs A low value is desired for this measure. A high value indicates that the requests take too long to execute, which directly affects the performance of the LUNs.
Avg_write_latency Indicates the average time taken to execute a write request on this LUN. Millisecs
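
The interpretations above can be combined into a simple per-LUN review. The following is a minimal sketch in Python, not part of the eG product: the LunSample record, the review and slowest_first helpers, and both threshold defaults are hypothetical names and values chosen for illustration. Only the measure names and the 0/1 mapping for Is_Online come from the table above.

```python
from dataclasses import dataclass

# Numeric-to-display mapping for Is_Online, as listed in the table above.
IS_ONLINE_LABELS = {0: "No", 1: "Yes"}


@dataclass
class LunSample:
    """One hypothetical per-LUN reading; fields mirror the measures above."""
    name: str
    is_online: int             # Is_Online (0 or 1)
    size_mb: float             # Size
    size_used_mb: float        # Size_used
    avg_latency_ms: float      # Avg_latency
    queue_full_per_sec: float  # Queue_full


def review(samples, latency_limit_ms=20.0, used_pct_limit=90.0):
    """Return (lun_name, finding) pairs.

    Both limits are illustrative assumptions, not eG defaults.
    """
    findings = []
    for s in samples:
        if IS_ONLINE_LABELS.get(s.is_online) == "No":
            findings.append((s.name, "offline"))
            continue  # the remaining checks are not meaningful for an offline LUN
        used_pct = 100.0 * s.size_used_mb / s.size_mb if s.size_mb else 0.0
        if used_pct >= used_pct_limit:
            findings.append((s.name, f"space {used_pct:.0f}% used"))
        if s.avg_latency_ms > latency_limit_ms:
            findings.append((s.name, "slow"))
        if s.queue_full_per_sec > 0:
            findings.append((s.name, "queue full responses"))
    return findings


def slowest_first(samples):
    """Rank online LUNs by Avg_latency, highest first, to isolate slow LUNs."""
    online = [s for s in samples if s.is_online == 1]
    return sorted(online, key=lambda s: s.avg_latency_ms, reverse=True)
```

Calling review on a list of samples flags offline LUNs first, then space contention, high latency, and Queue Full responses, while slowest_first gives the cross-LUN latency comparison suggested for Avg_latency. Suitable limits depend on the workload and on any SLA in force.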