eG Monitoring
 

Measures reported by IbmStorwizeVdiskStsTest

A volume, or VDisk, is a logical disk that the clustered system presents to a host connected over a Fibre Channel or Ethernet network. VDisks enable administrators to manage resources more efficiently. If a VDisk is offline or degraded, modified write data can become pinned in the SAN Volume Controller cache. This prevents volume failover and causes a loss of input/output (I/O) access. I/O loss can also occur if the cache of a VDisk is corrupt. To prevent or at least minimize such losses, administrators need to detect an abnormal state of a VDisk and/or its cache quickly and act at once to correct it. This is where the IbmStorwizeVdiskStsTest test helps.

This test reports the current status and the cache state of each VDisk of the IBM Storwize V7000 storage system, so that an abnormal state of a VDisk and/or its cache can be promptly detected and resolved.

The measures reported by this test are as follows:

Measurement    Description    Measurement Unit    Interpretation
status Indicates the current status of this VDisk.   This measure reports the status of this VDisk as follows:

  • offline
  • online
  • degraded
    A VDisk is offline and unavailable if one of the following takes place:

    • Both nodes in the I/O group are missing.
    • None of the nodes in the I/O group that are present can access the VDisk.
    • All synchronized copies for this VDisk are in storage pools that are offline.
    • The VDisk is formatting.

    A VDisk is reported as degraded if any of the following occurs:

    • One of the nodes in the I/O group is missing.
    • One of the nodes in the I/O group cannot access all the MDisks in the storage pool that the VDisk spans. In this case, the MDisks are shown as degraded, and the fix procedures for MDisks should be followed to resolve the problem.
    • The fast write cache pins data for one or more VDisks in the I/O group and is unable to perform a failback until the situation is resolved. An error log indicating that the cache has pinned data is displayed. Follow the fix procedures for this error log to resolve the problem. The most common causes of pinned data are the following:

      • One or more VDisks in an I/O group are offline due to an asymmetric failure and have pinned data in the cache. Asymmetric failures can occur because of Storwize® V7000 fabric faults or misconfiguration, back-end controller faults or misconfiguration, or because repeated errors have led to the system excluding access to an MDisk through one or more nodes.
      • One or more VDisks in an I/O group are offline due to a problem with a FlashCopy® mapping.

    The numeric values that correspond to the above-mentioned Measure Values are as follows:

    State       Numeric Value
    offline     0
    online      1
    degraded    2

    Note:

    By default, this measure reports one of the Measure Values listed above to indicate the status of this VDisk. In the graph of this measure, however, the status is represented by the corresponding numeric equivalent from the table above.
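
As an illustration of the mapping described in the note, the following minimal Python sketch converts a reported status into its numeric equivalent. The function and constant names are hypothetical and are not part of the eG product or the Storwize CLI; only the three states and their numeric values come from the table above.

```python
# Mapping copied from the State / Numeric Value table above.
VDISK_STATUS_VALUES = {"offline": 0, "online": 1, "degraded": 2}


def status_to_numeric(status: str) -> int:
    """Return the numeric equivalent used when graphing the status measure."""
    key = status.strip().lower()
    if key not in VDISK_STATUS_VALUES:
        raise ValueError(f"unknown VDisk status: {status!r}")
    return VDISK_STATUS_VALUES[key]


def needs_attention(status: str) -> bool:
    """Return True for any status other than online (hypothetical helper)."""
    return status_to_numeric(status) != VDISK_STATUS_VALUES["online"]
```

A threshold or alert rule built on the graphed values could, under these assumptions, treat any value other than 1 as abnormal.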

    The detailed diagnosis of this measure reveals the VDisk ID, the VDisk IO GROUP ID, the VDisk IO GROUP NAME, MDISK ID, MDISK NAME, the VDisk TYPE and the FAST WRITE STATUS of the VDisk. From the detailed diagnosis, you can determine the name of the I/O group to which the VDisk belongs and the MDisks (i.e., the managed disks) in the storage pool that the VDisk spans. If the VDisk is offline or degraded, you can use the I/O group and MDisk ID to investigate why: is it because the I/O group has a missing node, or because the MDisk is degraded?
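
The investigation suggested above can be sketched as a small triage helper. This is only an illustration: the function name, its parameters, and the returned messages are hypothetical, and the node count and MDisk status are assumed to have been looked up separately using the I/O group and MDisk ID from the detailed diagnosis. The sketch also assumes two nodes per I/O group, in line with the phrase "both nodes in the I/O group" used above.

```python
from typing import List

# Assumption: an I/O group consists of two nodes.
NODES_PER_IO_GROUP = 2


def suggest_checks(status: str, nodes_online: int, mdisk_status: str) -> List[str]:
    """Suggest what to investigate first for an offline or degraded VDisk."""
    if status.strip().lower() == "online":
        return []  # nothing to investigate

    checks: List[str] = []
    if nodes_online < NODES_PER_IO_GROUP:
        missing = NODES_PER_IO_GROUP - nodes_online
        checks.append(f"I/O group is missing {missing} node(s)")
    if mdisk_status.strip().lower() == "degraded":
        checks.append("MDisk is degraded: follow the MDisk fix procedures")
    if not checks:
        # Neither obvious cause applies; pinned cache data is another listed cause.
        checks.append("check the error log for pinned cache data")
    return checks
```

For example, `suggest_checks("degraded", 1, "online")` points at the missing node, while a degraded VDisk with both nodes present and healthy MDisks falls through to the error log check.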

capacity Indicates the total capacity of this VDisk. TB  
fast_write Indicates the cache status of this VDisk.   This measure reports any of the values listed below:

  • corrupt
  • repairing
  • empty
  • not empty
    A cache state of corrupt indicates that the VDisk requires recovery by using one of the recovervdisk commands. A cache state of repairing indicates that repairs initiated by a recovervdisk command are in progress.

    The numeric values that correspond to each of the measure values listed above are mentioned in the table below:

    Measure Value    Numeric Value
    corrupt          1
    repairing        2
    empty            3
    not empty        4
    Note:

    By default, this measure reports one of the Measure Values listed above to indicate the cache status of this VDisk. In the graph of this measure, however, the cache status is represented by the corresponding numeric equivalent from the table above.
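
The cache-state mapping and its interpretation can be sketched in the same way. The names below are hypothetical; the numeric values and the meaning of corrupt and repairing come from the description above. Treating an underscore as a space (so that "not_empty" matches "not empty") is an assumption made for convenience.

```python
# Mapping copied from the Measure Value / Numeric Value table above.
FAST_WRITE_VALUES = {"corrupt": 1, "repairing": 2, "empty": 3, "not empty": 4}


def _normalize(state: str) -> str:
    # Assumption: accept "not_empty" as an alternative spelling of "not empty".
    return state.strip().lower().replace("_", " ")


def fast_write_to_numeric(state: str) -> int:
    """Return the numeric equivalent used when graphing the fast_write measure."""
    key = _normalize(state)
    if key not in FAST_WRITE_VALUES:
        raise ValueError(f"unknown cache state: {state!r}")
    return FAST_WRITE_VALUES[key]


def cache_action(state: str) -> str:
    """Summarize what a cache state calls for (hypothetical helper)."""
    value = fast_write_to_numeric(state)
    if value == FAST_WRITE_VALUES["corrupt"]:
        return "recover the VDisk with one of the recovervdisk commands"
    if value == FAST_WRITE_VALUES["repairing"]:
        return "repair in progress: wait for it to complete"
    return "no action needed"
```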