eG Monitoring
 

Measures reported by SolFireDrivTest

When a node is added to a SolidFire cluster, or new drives are installed in an existing node, the drives become available to be added to the cluster. If even a single drive in the cluster is over-utilized or cannot process I/O requests quickly, the user experience with the entire storage system can suffer. The storage administrator is therefore responsible for watching each drive for space contention and processing bottlenecks, detecting such anomalies early, and resolving them before users complain. The SolidFire Drives test helps the storage administrator do so efficiently.

This test auto-discovers the drives of the SolidFire storage system and reports the status, space utilization and processing ability of each of the drives. This enables administrators to proactively detect a potential slowdown in processing or a probable drive contention, identify which drive is contributing to these abnormal phenomena, and intervene to ensure that the problem is resolved before it spirals out of control.
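As a rough illustration of the kind of proactive check described above, the following Python sketch flags a drive whose throughput readings fall across several consecutive measurement periods. The function name, the descriptor strings, the sample readings, and the window of three consecutive drops are all assumptions made for illustration; they are not part of the eG product or of the SolidFire API.

```python
from typing import Sequence


def is_consistently_decreasing(samples: Sequence[float], min_drops: int = 3) -> bool:
    """Return True if the most recent `min_drops` changes are all decreases.

    `samples` holds one reading per measurement period, oldest first.
    The default of 3 is an illustrative assumption, not a product default.
    """
    if min_drops < 1 or len(samples) < min_drops + 1:
        return False  # not enough history to judge a trend
    recent = samples[-(min_drops + 1):]
    return all(later < earlier for earlier, later in zip(recent, recent[1:]))


# Hypothetical read-rate readings (MB/sec) for two drives, oldest first.
readings = {
    "block:drive-1": [120.0, 118.5, 121.0, 119.2],
    "block:drive-2": [120.0, 96.0, 71.5, 40.2],
}
flagged = [name for name, values in readings.items()
           if is_consistently_decreasing(values)]
print(flagged)
```

Requiring several consecutive drops, rather than reacting to a single low reading, is one simple way to separate a sustained slowdown from normal fluctuation in load.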

Outputs of the test: One set of results for each Drive Type:Drive pair on the target SolidFire storage system.

First-level descriptor: Drive Type

Second-level descriptor: Drive

The measures made by this test are as follows:

| Measurement | Description | Measurement Unit | Interpretation |
|---|---|---|---|
| status | Indicates the current state of this drive. | | The values that this measure can report and their corresponding numeric values are listed in the status values table that follows this table. |
| activeSessions | Indicates the number of active iSCSI sessions on this drive. | Number | This measure is available only for drives of the metadata type. |
| failedDieCount | Indicates the number of failed hardware elements in this drive. | Number | |
| lifeRemainingPercent | Indicates the percentage of lifetime remaining for this drive. | Percent | |
| powerOnHours | Indicates the number of hours since this drive was powered on. | Hours | |
| reallocatedSectors | Indicates the number of bad sectors that were replaced in this drive. | Number | |
| reserveCapacityPercent | Indicates the percentage of reserve capacity available on this drive. | Percent | |
| avgLTReadData | Indicates the average amount of data read from this drive per second over the drive's lifetime, as reported in the last measurement period. | MB/sec | A consistent decrease in the value of this measure for a drive indicates an I/O processing bottleneck. |
| avgLTWriteData | Indicates the average amount of data written to this drive per second over the drive's lifetime, as reported in the last measurement period. | MB/sec | A consistent decrease in the value of this measure for a drive indicates an I/O processing bottleneck. |
| avgReadData | Indicates the average rate at which data was read from this drive during the last measurement period. | MB/sec | A consistent decrease in the value of this measure for a drive indicates an I/O processing bottleneck. |
| avgWriteData | Indicates the average rate at which data was written to this drive during the last measurement period. | MB/sec | A consistent decrease in the value of this measure for a drive indicates an I/O processing bottleneck. |
| totalBandwidth | Indicates the average rate at which data was read from and written to this drive during the last measurement period. | MB/sec | |
| avgReadIOPS | Indicates the average rate at which read operations were performed on this drive during the last measurement period. | IOPS | A consistent decrease in the value of this measure for a drive indicates an I/O processing bottleneck. |
| avgWriteIOPS | Indicates the average rate at which write operations were performed on this drive during the last measurement period. | IOPS | A consistent decrease in the value of this measure for a drive indicates an I/O processing bottleneck. |
| totalIOPS | Indicates the average rate at which read and write operations were performed on this drive during the last measurement period. | IOPS | |
| totalCapacity | Indicates the total capacity of this drive. | GB | |
| usedCapacity | Indicates the amount of space utilized in this drive. | GB | A low value is desired for this measure. |
| freeCapacity | Indicates the amount of space available for use in this drive. | GB | A high value is desired for this measure. |
| usedCapacityPCT | Indicates the percentage of space utilized in this drive. | Percent | A value close to 100 indicates that the drive is about to run out of space. In this case, consider adding more drives to make more space. |
| freeCapacityPCT | Indicates the percentage of space available for use in this drive. | Percent | A value close to 0 indicates that the drive is about to run out of space. In this case, consider adding more drives to make more space. |
| usedMemory | Indicates the amount of memory currently used by the node hosting this drive. | GB | |

Values reported by the status measure:

| Measure Value | Numeric Value |
|---|---|
| Active | 0 |
| Available | 1 |
| Erasing | 2 |
| Failed | 3 |
| Removing | 4 |

Note:

By default, the status measure reports the states listed in the table above to indicate the current state of a drive. The graph of this measure, however, represents the same using the numeric equivalents only.
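
The numeric status values and the capacity measures lend themselves to a simple scripted check. The sketch below is a minimal illustration, assuming the measure values have already been exported into a Python dictionary keyed by the measure names used in this section. The function names and the 90 percent threshold are hypothetical, and deriving the percentage as used capacity divided by total capacity is an assumption about how usedCapacityPCT relates to the other capacity measures.

```python
# Numeric-to-state mapping, copied from the status values listed in this section.
STATUS_BY_CODE = {0: "Active", 1: "Available", 2: "Erasing", 3: "Failed", 4: "Removing"}


def describe_status(code: int) -> str:
    """Translate the numeric value shown in graphs back to its state name."""
    return STATUS_BY_CODE.get(code, f"Unknown ({code})")


def used_capacity_pct(used_gb: float, total_gb: float) -> float:
    """Percentage of space used (assumed relationship: used / total * 100)."""
    if total_gb <= 0:
        raise ValueError("total capacity must be positive")
    return 100.0 * used_gb / total_gb


def drive_warnings(sample: dict, pct_threshold: float = 90.0) -> list:
    """Collect warnings for one drive; the threshold is an illustrative choice."""
    warnings = []
    if describe_status(sample["status"]) == "Failed":
        warnings.append("drive has failed")
    pct = used_capacity_pct(sample["usedCapacity"], sample["totalCapacity"])
    if pct >= pct_threshold:
        warnings.append(f"space nearly exhausted ({pct:.1f}% used)")
    return warnings


# Hypothetical sample for a single drive.
sample = {"status": 0, "usedCapacity": 465.0, "totalCapacity": 500.0}
print(describe_status(sample["status"]), drive_warnings(sample))
```

Keeping the state mapping in one dictionary means that any script reading the graphed numeric values can report the same state names that the measure itself displays.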