eG Monitoring
 

Measures reported by XIOClusterTest

An XtremIO Storage Array can include a single X-Brick or a cluster of multiple X-Bricks. A cluster of multiple X-Bricks consists of:

  • Two or four X-Bricks

  • Two InfiniBand Switches

This test auto-discovers the clusters of the target storage array and reports the current health, connection state and uptime of each cluster. In addition, this test monitors the SSD space utilization of each cluster and helps administrators identify a potential space crunch, if any. This test also helps administrators identify the cluster that is busiest processing I/O requests and detect irregularities in the distribution of I/O load across clusters, so that they can initiate pre-emptive measures.

Outputs of the test: One set of results for each cluster on the EMC XtremIO Storage Array being monitored

The measures made by this test are as follows:

Measurement Description Measurement Unit Interpretation
upTime Indicates the total time duration for which this cluster has been up since the last restart. Hours  
healthState Indicates the current health of this cluster.   The values reported by this measure and their numeric equivalents are available in the table below:

Measure Value Numeric Value
Healthy 0
Unknown 1

Note:

This measure reports the Measure Values listed in the table above to indicate the health of this cluster. However, in the graph, this measure is indicated using the Numeric Values listed in the above table.

connState Indicates the current connection state between the XtremIO Management Server (XMS) and this cluster.   The values reported by this measure and their numeric equivalents are available in the table below:

Measure Value Numeric Value
Connected 0
Unknown 1

Note:

This measure reports the Measure Values listed in the table above to indicate the connection state of this cluster. However, in the graph, this measure is indicated using the Numeric Values listed in the above table.

consistState Indicates whether a data consistency error has been detected in this cluster.   This measure reports the value Healthy if no data consistency error is detected, and Unknown otherwise.

The values reported by this measure and their numeric equivalents are available in the table below:

Measure Value Numeric Value
Healthy 0
Unknown 1

Note:

This measure reports the Measure Values listed in the table above to indicate whether a data consistency error has been detected in this cluster. However, in the graph, this measure is indicated using the Numeric Values listed in the above table.

usedPCT Indicates the percentage of SSD space utilized by this cluster. Percent A value close to 100 indicates that the SSDs in the cluster are running out of space.
freePCT Indicates the percentage of SSD space that is currently available for use in this cluster. Percent A high value is desired for this measure. A sudden or gradual decrease in the value of this measure is an indication for administrators to either free up space in the SSDs or add resources to the cluster.
reads Indicates the number of reads made on this cluster per second during the last measurement period. Reads/sec Comparing the values of these measures across clusters indicates which cluster is overloaded; it can also reveal irregularities in load balancing across the clusters.
writes Indicates the number of writes made to this cluster per second during the last measurement period. Writes/Sec
dataReads Indicates the rate at which data is read from this cluster during the last measurement period. MB/Sec Compare the values of these measures across the clusters to identify the slowest cluster in terms of servicing read and write requests (respectively).
dataWritten Indicates the rate at which data is written to this cluster during the last measurement period. MB/Sec
avgReadSize Indicates the average amount of data read from this cluster per I/O operation during the last measurement period. MB/Op Compare the values of these measures across the clusters to identify the slowest cluster in terms of servicing read and write requests (respectively).
avgWriteSize Indicates the average amount of data written to this cluster per I/O operation during the last measurement period. MB/Op
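
The comparisons suggested in the Interpretation column can be sketched in a few lines of Python. This is only an illustration, not part of the eG product: the cluster names, the sample values, the 90 percent space limit and the 1.5x load factor are all assumptions chosen for the example. The average I/O size is derived here as data rate divided by operation rate, which is a plain arithmetic relationship and not necessarily how the test computes avgReadSize.

```python
from statistics import mean

# Numeric equivalents taken from the state tables above.
HEALTH_STATE = {0: "Healthy", 1: "Unknown"}
CONN_STATE = {0: "Connected", 1: "Unknown"}


def avg_io_size_mb(data_rate_mb_s, ops_per_s):
    """Average MB per operation, derived as data rate / operation rate."""
    return data_rate_mb_s / ops_per_s if ops_per_s else 0.0


def low_on_space(clusters, used_limit=90.0):
    """Names of clusters whose usedPCT is at or above used_limit (assumed limit)."""
    return [name for name, m in clusters.items() if m["usedPCT"] >= used_limit]


def overloaded(clusters, measure="reads", factor=1.5):
    """Names of clusters whose measure exceeds factor x the mean across clusters."""
    avg = mean(m[measure] for m in clusters.values())
    return [name for name, m in clusters.items() if m[measure] > factor * avg]


# Hypothetical sample values, keyed by made-up cluster names.
clusters = {
    "cluster-a": {"healthState": 0, "connState": 0, "usedPCT": 93.5,
                  "reads": 9000, "dataReads": 70.0},
    "cluster-b": {"healthState": 0, "connState": 0, "usedPCT": 41.0,
                  "reads": 2000, "dataReads": 16.0},
    "cluster-c": {"healthState": 1, "connState": 1, "usedPCT": 55.0,
                  "reads": 1000, "dataReads": 8.0},
}

for name, m in clusters.items():
    # Translate the numeric graph values back to the reported Measure Values.
    print(name, HEALTH_STATE[m["healthState"]], CONN_STATE[m["connState"]],
          "avg read size (MB/Op):", avg_io_size_mb(m["dataReads"], m["reads"]))

print("Low on space:", low_on_space(clusters))
print("Overloaded (reads):", overloaded(clusters))
```

With the sample values above, only cluster-a crosses both the assumed space limit and the assumed load factor. In practice, the limits would be tuned to the environment being monitored.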