eG Monitor
 

Measures reported by FCCluDiskTest

A cluster resource is any physical or logical component that has the following characteristics:

  • Can be brought online and taken offline.

  • Can be managed in a server cluster.

  • Can be hosted (owned) by only one node at a time.

One of the standard cluster resource types is the Physical Disk resource type. You use the Physical Disk resource type to manage disks that are on a cluster storage device. At any point in time, each cluster disk is owned by only a single node in the cluster. Moreover, when configuring a service or application for a cluster, you can select the cluster disk the service/application should use.

If a cluster disk fails or remains offline for a long time, the services/applications that rely on that disk may stop functioning properly. Likewise, if a cluster disk suddenly runs short of space, the associated services/applications will be affected. To protect these critical services/applications from failure and to define robust fail-over policies for cluster disk resources, administrators have to continuously monitor the state and usage of each cluster disk resource. This can be achieved using the FCCluDiskTest test. This test auto-discovers the cluster disks and tracks the state and usage of each disk, so that administrators are proactively alerted to abnormal states and excessive space usage on any disk.

The measures reported by this test are as follows:

Measurement | Description | Measurement Unit | Interpretation
State | Indicates the current state of this cluster disk. | | The values that this measure can report and the states they indicate are listed in the table below:

State | Measure Value
Online | 100
Online pending | 90
Inherited | 80
Initializing | 70
Pending | 60
Offline pending | 50
Unknown | 40
Offline | 20
Failed | 0
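
When post-processing the State measure outside the monitoring console, the numeric values listed above can be translated back into state names with a simple lookup. The following is a minimal Python sketch; the names `STATE_BY_VALUE` and `describe_state` are hypothetical and are not part of the test itself.

```python
# Numeric values of the State measure and the states they indicate,
# copied from the table above.
STATE_BY_VALUE = {
    100: "Online",
    90: "Online pending",
    80: "Inherited",
    70: "Initializing",
    60: "Pending",
    50: "Offline pending",
    40: "Unknown",
    20: "Offline",
    0: "Failed",
}


def describe_state(value):
    """Return the state name for a reported measure value.

    Values that do not appear in the table are labeled as unrecognized
    instead of being guessed at.
    """
    return STATE_BY_VALUE.get(value, f"Unrecognized value ({value})")


if __name__ == "__main__":
    for reported in (100, 50, 0):
        print(reported, "->", describe_state(reported))
```

Keeping the mapping in a single dictionary makes it easy to update if the set of reported values ever changes.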


If the cluster service detects that a cluster disk is not operational, it attempts to restart that cluster disk. You can specify the number of times the cluster service can attempt to restart a resource in a given time interval. If the cluster service exceeds the maximum number of restart attempts within the specified time period, and the disk is still not operational, the cluster service considers the disk to have failed. Typically, a failed disk will adversely impact the availability and performance of the services/applications to which that disk has been assigned.
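
The restart rule described above can be illustrated with a short Python sketch. This is not the cluster service's actual implementation: the function name, its parameters, and the choice to treat the limit as exhausted once the number of attempts in the period reaches the maximum are all assumptions made for illustration.

```python
def is_considered_failed(restart_times, now, max_restarts, period_seconds, operational):
    """Illustrate the restart rule for a single cluster disk.

    restart_times  -- timestamps (in seconds) of past restart attempts
    now            -- current timestamp (in seconds)
    max_restarts   -- maximum restart attempts allowed within the period
    period_seconds -- length of the time period, in seconds
    operational    -- True if the disk is currently operational
    """
    if operational:
        # A disk that came back is not treated as failed.
        return False

    # Count only the attempts that fall inside the current period.
    window_start = now - period_seconds
    recent = [t for t in restart_times if window_start < t <= now]

    # Assumption: the limit counts as exhausted once the number of
    # attempts in the period reaches max_restarts.
    return len(recent) >= max_restarts


if __name__ == "__main__":
    attempts = [10, 20, 30]
    print(is_considered_failed(attempts, now=40, max_restarts=3,
                               period_seconds=60, operational=False))
```

In this sketch, restart attempts that are older than the period no longer count against the limit, so a disk that fails only occasionally is not marked as failed.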

To ensure high availability of services/applications, you can add the cluster disk and the services/applications that depend on that disk to a single cluster group and configure a fail-over policy for that group. Then, you can configure the failure of the cluster disk to trigger a group fail-over, so that the entire group is failed over to another node in the cluster.

The detailed diagnosis of this measure, if enabled, will indicate the path of the cluster disk, the node that currently owns the cluster disk, the shared volume, and the owner group.

Total_space | Indicates the total capacity of this cluster disk. | GB |
Used_space | Indicates the space in this cluster disk that is currently in use. | GB | Ideally, the value of this measure should be low. A high value is indicative of excessive space usage on the cluster disk.
Free_space | Indicates the amount of space in this cluster disk that is currently unused. | GB | A high value is desired for this measure.
Percent_usage | Indicates the percentage of the total capacity of this cluster disk that is utilized. | Percent | A value close to 100% is indicative of abnormal space usage. Compare the value of this measure across cluster disks to know which disk is using space excessively. Before assigning storage to a cluster service/application, you may want to check this comparison to figure out which cluster disks have enough space to accommodate more services/applications.
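
The space measures above are related by simple arithmetic, and the cross-disk comparison can be sketched in a few lines of Python. The sketch assumes that Percent_usage is Used_space divided by Total_space, expressed as a percentage, and that Free_space is Total_space minus Used_space; the function names and disk names are hypothetical.

```python
def percent_usage(used_gb, total_gb):
    """Percentage of total capacity in use (assumed: used / total * 100)."""
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    return used_gb / total_gb * 100.0


def free_space(used_gb, total_gb):
    """Unused space in GB (assumed: total - used)."""
    return total_gb - used_gb


def rank_by_usage(disks):
    """Return disk names ordered from highest to lowest percentage usage.

    disks -- dict mapping a disk name to a (used_gb, total_gb) tuple
    """
    return sorted(disks, key=lambda name: percent_usage(*disks[name]), reverse=True)


if __name__ == "__main__":
    # Hypothetical disks and sizes, for illustration only.
    sample = {"Cluster Disk 1": (50, 200), "Cluster Disk 2": (180, 200)}
    for name in rank_by_usage(sample):
        used, total = sample[name]
        print(name, percent_usage(used, total), free_space(used, total))
```

Disks at the end of the ranked list have the most headroom and are the natural candidates when assigning storage to a new service/application.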