eG Monitoring
 

Measures reported by TPDDiskTest

A disk that is offline or has failed cannot service user requests, causing prolonged delays in data access. Administrators therefore need to track the status and health of each disk continuously, so that abnormal conditions can be detected and treated proactively. The TPDDiskTest test helps administrators do this.

This test monitors the health, status, and capacity of each disk available on the HP 3PAR Storage system, so that any abnormalities can be detected before users start complaining of slowdowns.

The measures made by this test are as follows:

Measurement Description Measurement Unit Interpretation
healthState Indicates how healthy this disk currently is.   The values that this measure can report and their corresponding numeric values are discussed in the table below:

Numeric Value Measure Value
0 OK
1 Unknown
2 Degraded/Warning
3 Minor failure
4 Major failure
5 Critical failure
6 Non-recoverable error

Note:

By default, this measure reports the Measure Values listed above to indicate the state of a disk. In the graph of this measure, however, states are represented using the numeric equivalents only.
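Because graphs show only the numeric equivalents, a small lookup can help when reading graph values or exported data. The following is a minimal sketch, not part of the eG product: the mapping is copied from the healthState table above, while the function names `health_label` and `needs_attention`, and the rule that any non-OK value deserves attention, are assumptions made for illustration.

```python
# Numeric-to-label mapping copied from the healthState table above.
HEALTH_STATES = {
    0: "OK",
    1: "Unknown",
    2: "Degraded/Warning",
    3: "Minor failure",
    4: "Major failure",
    5: "Critical failure",
    6: "Non-recoverable error",
}


def health_label(value: int) -> str:
    """Return the Measure Value for a numeric healthState reading."""
    # Values outside the documented table are reported as unrecognized
    # rather than guessed at.
    return HEALTH_STATES.get(value, f"Unrecognized ({value})")


def needs_attention(value: int) -> bool:
    """Assumed rule (hypothetical): anything other than OK (0) is worth a look."""
    return value != 0
```

For example, a graph point of 2 corresponds to `health_label(2)`, which returns "Degraded/Warning".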

operationalStatus Indicates the current operational state of this disk.   The values that this measure can report and their corresponding numeric values are discussed in the table below:

Numeric Value Measure Value
0 OK
1 In Service
2 Power Mode
3 Completed
4 Starting
5 Dormant
6 Other
7 Unknown
8 Stopping
9 Stressed
10 Stopped
11 Supporting Entity in Error
12 Degraded or Predicted Failure
13 Predictive Failure
14 Lost Communication
15 No Contact
16 Aborted
17 Error
18 Non-Recoverable Error

Note:

By default, this measure reports the Measure Values listed above to indicate the operational state of a disk. In the graph of this measure, however, operational states are represented using the numeric equivalents only.

detailedStatus Describes the current operational state of this disk.   This measure will be reported only if the API provides a detailed operational state.

Typically, the detailed state will describe why the disk is in a particular operational state. For instance, if the operationalStatus measure reports the value Stopping for a disk, then this measure will explain why that disk is being stopped.

The values that this measure can report and their corresponding numeric values are discussed in the table below:

Numeric Value Measure Value
0 Online
1 Success
2 Power Saving Mode
3 Write Protected
4 Write Disabled
5 Not Ready
6 Removed
7 Rebooting
8 Offline
9 Failure

Note:

By default, this measure reports the Measure Values listed above to indicate the detailed operational state of a disk. In the graph of this measure, however, detailed operational states are represented using the numeric equivalents only.

dataTransmitted Indicates the rate at which data was transmitted by this disk. MB/Sec  
iops Indicates the rate at which I/O operations were performed on this disk. IOPS Compare the value of this measure across disks to know which disk handled the maximum number of I/O requests and which handled the least. If the gap between the two is very high, then it indicates serious irregularities in load-balancing across disks.

You may then want to look at the reads and writes measures to determine whether the load-balancing algorithm needs fine-tuning for read requests or for write requests.
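The cross-disk comparison described for the iops measure can be sketched as follows. This is an illustrative sketch only: the test does not define what counts as a "very high" gap, so the `max_gap_fraction` threshold, the function names, and the sample disk names are all hypothetical.

```python
from typing import Dict, Tuple


def iops_gap(iops_by_disk: Dict[str, float]) -> Tuple[str, str, float]:
    """Return (busiest disk, least-loaded disk, gap in IOPS between them)."""
    if not iops_by_disk:
        raise ValueError("no disks supplied")
    busiest = max(iops_by_disk, key=iops_by_disk.get)
    least_loaded = min(iops_by_disk, key=iops_by_disk.get)
    return busiest, least_loaded, iops_by_disk[busiest] - iops_by_disk[least_loaded]


def is_imbalanced(iops_by_disk: Dict[str, float], max_gap_fraction: float = 0.5) -> bool:
    """Flag an imbalance when the gap exceeds a fraction of the busiest disk's IOPS.

    The 0.5 default is an assumed threshold, not a documented one.
    """
    busiest, _, gap = iops_gap(iops_by_disk)
    top = iops_by_disk[busiest]
    return top > 0 and gap / top > max_gap_fraction


# Hypothetical readings of the iops measure for three disks.
sample = {"disk0": 1200.0, "disk1": 1100.0, "disk2": 300.0}
```

With the sample readings, `iops_gap(sample)` identifies disk0 and disk2 as the extremes, and `is_imbalanced(sample)` flags the spread. The same comparison could be run on the reads and writes measures to see which request type drives the gap.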

reads Indicates the rate at which read operations were performed on this disk. Reads/Sec Compare the value of this measure across disks to know which disk handled the maximum number of read requests and which handled the least. If the gap between the two is very high, then it indicates serious irregularities in load-balancing across disks.
writes Indicates the rate at which write operations were performed on this disk. Writes/Sec Compare the value of this measure across disks to know which disk handled the maximum number of write requests and which handled the least. If the gap between the two is very high, then it indicates serious irregularities in load-balancing across disks.
dataReads Indicates the rate at which data is read from this disk. MB/Sec Compare the values of the dataReads and dataWritten measures across disks to identify the slowest disk in terms of servicing read and write requests, respectively.
dataWritten Indicates the rate at which data is written to this disk. MB/Sec
busy Indicates the percentage of time this disk was busy processing requests. Percent Compare the value of this measure across disks to know which disk was the busiest and which was the least busy. If the gap between the two is very high, then it indicates serious irregularities in load-balancing across disks.
avgReadSize Indicates the amount of data read from this disk per I/O operation. MB/Op Compare the values of the avgReadSize and avgWriteSize measures across disks to identify the slowest disk in terms of servicing read and write requests, respectively.
avgWriteSize Indicates the amount of data written to this disk per I/O operation. MB/Op
readHits Indicates the percentage of read requests that were serviced by the cache of this disk. Percent A high value is desired for this measure. A very low value is a cause for concern, as it indicates that cache usage is very poor; this in turn implies that direct disk accesses, which are expensive operations, are high.
writeHits Indicates the percentage of write requests that were serviced by the cache of this disk. Percent
averageResponseTime Indicates the time taken by this disk to respond to I/O requests. Microsecs Ideally, this value should be low. A high value implies that the disk is slow.
queueDepth Indicates the number of requests that are in queue for this disk. Number A consistent increase in this value indicates a potential processing bottleneck with the disk.
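One way to check for the "consistent increase" in queueDepth mentioned above is to test whether the most recent samples rise without interruption. This is a minimal sketch under stated assumptions: the function name, the window of five samples, and the strict-increase rule are hypothetical choices, not something the test itself prescribes.

```python
from typing import Sequence


def consistently_increasing(samples: Sequence[float], window: int = 5) -> bool:
    """Return True if each of the last `window` samples exceeds the one before it."""
    # Too few samples (or a window too small to compare) gives no verdict.
    if window < 2 or len(samples) < window:
        return False
    recent = samples[-window:]
    # Pair each sample with its successor and require a strict rise every time.
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))
```

A strict rise over a short window is deliberately conservative; a noisy queue could instead be checked with a moving average or a fitted slope.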