eG Monitoring
 

Measures reported by NAUSDSysCompsTest

This test periodically monitors the processors, spare disks, vFiler units, and DMA channels used by the storage system, and proactively alerts you to abnormalities such as the following:

  • Excessive CPU usage by the storage system;
  • Over-utilization of the storage system's processors;
  • Write latencies experienced by NVRAM DMA transactions;
  • Unavailability of spare disks.

The measures reported by this test are listed below. For each measure, the description, measurement unit, and interpretation are given.

Cpu_busy
Description: Indicates the percentage of time for which the CPU was busy performing system-level processing.
Measurement Unit: Percent
Interpretation: A high value indicates that the storage system is using CPU resources excessively. A consistent increase in this value could indicate potential CPU contention on the storage system.

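To make "a consistent increase" concrete, the following Python sketch shows one possible check over a series of Cpu_busy readings. It is an illustration under stated assumptions, not eG's detection logic: the helper name consistently_rising and the 80 percent threshold are hypothetical, and "consistent" is taken to mean that no reading is lower than the one before it.

```python
from typing import Sequence


def consistently_rising(samples: Sequence[float], threshold: float = 80.0) -> bool:
    """Return True if no reading is lower than the previous one and the latest exceeds threshold.

    Hypothetical helper: the threshold and the "never decreases" rule are
    illustrative assumptions, not the monitoring product's algorithm.
    """
    if len(samples) < 2:
        return False  # a trend needs at least two readings
    never_decreases = all(later >= earlier for earlier, later in zip(samples, samples[1:]))
    return never_decreases and samples[-1] > threshold


# The last five Cpu_busy readings, in percent.
print(consistently_rising([62.0, 70.5, 74.0, 81.2, 88.9]))  # True
print(consistently_rising([62.0, 90.0, 74.0, 81.2, 88.9]))  # False
```

A stricter variant could tolerate small dips or require a minimum number of readings; the simple rule above is chosen only for readability.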
Avg_processor_busy
Description: Indicates the average percentage of time for which a processor was busy processing requests.
Measurement Unit: Percent
Interpretation: A high value indicates that processors have been over-utilized in more than one instance. This is a cause for concern, as it can indicate load-balancing irregularities and a need for additional processors to handle the load.

Total_processor_busy
Description: Indicates the total percentage of time for which all the processors were actively serving requests.
Measurement Unit: Percent
Interpretation: A high value indicates that processors have been over-utilized in more than one instance. This is a cause for concern, as it can indicate load-balancing irregularities and a need for additional processors to handle the load.

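The difference between the two processor measures can be sketched as follows. This minimal Python example assumes, without confirmation from this document, that Avg_processor_busy is the mean of the per-processor busy percentages and that Total_processor_busy is their sum (so it can exceed 100 on a multi-processor system). The spread value, the busiest processor minus the average, is a hypothetical indicator of the load-balancing irregularities mentioned above.

```python
from statistics import mean


def processor_summary(busy_pct: list[float]) -> dict[str, float]:
    """Summarize per-processor busy percentages for one polling interval.

    Assumptions (not confirmed by the product documentation):
    - average busy is the arithmetic mean across processors;
    - total busy is the sum, so it can exceed 100 on multi-processor systems;
    - "spread" (busiest minus average) is an illustrative imbalance indicator.
    """
    if not busy_pct:
        raise ValueError("need at least one processor sample")
    avg = mean(busy_pct)
    return {
        "avg_processor_busy": avg,
        "total_processor_busy": sum(busy_pct),
        "spread": max(busy_pct) - avg,
    }


# Four processors, one of them running much hotter than the rest.
print(processor_summary([95.0, 40.0, 35.0, 30.0]))
# {'avg_processor_busy': 50.0, 'total_processor_busy': 200.0, 'spread': 45.0}
```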
Wait_latency
Description: Indicates the NVRAM DMA wait time per transaction in this storage system.
Measurement Unit: Milliseconds
Interpretation: When a CP (consistency point) is triggered, Data ONTAP reads the journal of write requests from NVRAM and uses DMA (Direct Memory Access) to update the disk with the data. DMA allows hardware subsystems to access system memory independently of the central processing unit (CPU).

Any latency experienced by the DMA channel can slow down writes to the disk and consequently degrade the storage system's write performance. A low value is therefore desirable for this measure.

Transaction_ops
Description: Indicates the rate at which NVRAM DMA transactions are performed in this storage system.
Measurement Unit: Ops/sec
Interpretation: A consistent decrease in the value of this measure could indicate latencies in the DMA channel. Such latencies can slow down writes to the disk and consequently degrade the storage system's write performance.

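Per-transaction and per-second values such as Wait_latency and Transaction_ops are commonly derived from two successive polls of cumulative counters. The Python sketch below assumes such counters exist; the DmaSample fields and the dma_measures helper are hypothetical names, and counter resets or wraparound are not handled.

```python
from dataclasses import dataclass


@dataclass
class DmaSample:
    """One poll of hypothetical cumulative NVRAM DMA counters."""
    timestamp_s: float   # time of the poll, in seconds
    transactions: int    # cumulative number of DMA transactions
    wait_time_ms: float  # cumulative DMA wait time, in milliseconds


def dma_measures(prev: DmaSample, curr: DmaSample) -> tuple[float, float]:
    """Return (wait latency in ms per transaction, transactions per second)."""
    interval = curr.timestamp_s - prev.timestamp_s
    if interval <= 0:
        raise ValueError("samples must be in chronological order")
    ops = curr.transactions - prev.transactions
    wait = curr.wait_time_ms - prev.wait_time_ms
    latency = wait / ops if ops > 0 else 0.0  # no transactions: report zero latency
    return latency, ops / interval


prev = DmaSample(timestamp_s=0.0, transactions=10_000, wait_time_ms=2_500.0)
curr = DmaSample(timestamp_s=300.0, transactions=16_000, wait_time_ms=4_300.0)
print(dma_measures(prev, curr))  # (0.3, 20.0)
```

Dividing the wait-time delta by the transaction delta gives the average wait per transaction for the interval, while dividing the transaction delta by the elapsed seconds gives the rate.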
Spare_status
Description: Indicates whether sufficient spare disks are available.
Measurement Unit: None
Interpretation: A hot spare disk is a disk that is assigned to a storage system but is not in use by a RAID group. It does not yet hold data but is ready for use. If a disk fails within a RAID group, Data ONTAP automatically assigns a hot spare disk to that RAID group to replace the failed disk.

At a minimum, you should have one matching or appropriate hot spare available for each kind of disk installed in your storage system. Having two hot spares available for each kind of disk, however, provides the best protection against disk failure.

This measure reports the value Yes if sufficient spare disks are available in this storage system, and the value No if no spare disk is available for any file system.

The numeric values that correspond to the above-mentioned measure values are as follows:

Measure Value    Numeric Value
Yes              1
No               0
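The mapping in the table above is a plain lookup. As a minimal sketch, with a hypothetical helper name, the Yes/No values of Spare_status could be converted into the 1/0 series that the graph plots like this:

```python
# Measure value -> numeric value, as listed in the table above.
SPARE_STATUS_NUMERIC = {"Yes": 1, "No": 0}


def spare_status_to_numeric(value: str) -> int:
    """Map a Spare_status measure value to the number plotted in its graph."""
    try:
        return SPARE_STATUS_NUMERIC[value]
    except KeyError:
        raise ValueError(f"unexpected Spare_status value: {value!r}") from None


print([spare_status_to_numeric(v) for v in ["Yes", "No", "Yes"]])  # [1, 0, 1]
```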

By default, Data ONTAP issues warnings to the console and logs if you have fewer than one hot spare disk that matches the attributes of each disk in your storage system. You can change the threshold value for these warning messages by using the raid.min_spare_count option.

To make sure that you always have two hot spares for every disk (a best practice), you can set the raid.min_spare_count option to 2.

Setting the raid.min_spare_count option to 0 disables low spare warnings. You might want to do this if you do not have enough disks to provide hot spares (for example, if your storage system does not support external disk shelves). You can disable the warnings only if the following requirements are met:

  • Your system has 16 or fewer disks.
  • You have no RAID groups that use RAID4.
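The raid.min_spare_count behavior described above can be modeled as follows. This Python sketch is a simplification, not Data ONTAP's implementation: each disk's matching attributes are collapsed into a single hypothetical "kind" label (such as "SAS-600GB"), and RAID types are hypothetical lowercase strings. The two helpers report which disk kinds would trigger low spare warnings and whether the two documented conditions for setting the option to 0 are met.

```python
from collections import Counter


def low_spare_warnings(spare_kinds: list[str], disk_kinds: set[str],
                       min_spare_count: int = 1) -> list[str]:
    """Return the disk kinds that have fewer matching hot spares than min_spare_count.

    A min_spare_count of 0 yields no warnings, mirroring how setting
    raid.min_spare_count to 0 disables low spare warnings.
    """
    available = Counter(spare_kinds)  # number of hot spares per (hypothetical) disk kind
    return sorted(kind for kind in disk_kinds if available[kind] < min_spare_count)


def can_disable_spare_warnings(total_disks: int, raid_types: list[str]) -> bool:
    """Check the two documented conditions for setting raid.min_spare_count to 0."""
    return total_disks <= 16 and all(t.lower() != "raid4" for t in raid_types)


disk_kinds = {"SAS-600GB", "SATA-2TB"}  # hypothetical kinds of installed disks
spares = ["SAS-600GB"]                  # hypothetical hot spares
print(low_spare_warnings(spares, disk_kinds))                     # ['SATA-2TB']
print(low_spare_warnings(spares, disk_kinds, min_spare_count=2))  # ['SAS-600GB', 'SATA-2TB']
print(can_disable_spare_warnings(12, ["raid_dp"]))                # True
print(can_disable_spare_warnings(12, ["raid_dp", "raid4"]))       # False
```

Raising min_spare_count from the default of 1 to 2 turns a single spare per disk kind into a warning condition, which matches the two-spare best practice.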

Note:

By default, this measure reports the Measure Values listed above to indicate whether sufficient spare disks are available in this storage system. In the graph of this measure, however, spare disk availability is represented by the corresponding numeric equivalents, 0 or 1.

VFiler_count
Description: Indicates the number of offline or inconsistent storage resources across all vFiler units in this storage system.
Measurement Unit: Number
Interpretation: MultiStore (also known as vFiler) allows a Unified Storage System's storage space to be divided into vFiler units. Each vFiler unit is run by a separate administrator and is available on a separate network interface. One vFiler unit cannot view the storage space owned by other vFiler units, except for the special vFiler unit "vFiler zero", which is the actual physical machine.

Ideally, this measure should report 0; a non-zero value indicates that one or more storage resources on the vFiler units are offline or inconsistent.
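A count like VFiler_count can be sketched as a sum over the resources of every vFiler unit. The data model below is hypothetical (a per-unit mapping of resource names to state strings) and assumes that each storage resource belongs to exactly one vFiler unit; it is not the product's collection method.

```python
from dataclasses import dataclass, field

# States counted by this sketch; the state strings are hypothetical.
UNHEALTHY_STATES = {"offline", "inconsistent"}


@dataclass
class VFilerUnit:
    name: str
    # Hypothetical mapping of storage resource name -> state, e.g. "online", "offline".
    resource_states: dict[str, str] = field(default_factory=dict)


def vfiler_count(units: list[VFilerUnit]) -> int:
    """Count offline or inconsistent storage resources across all vFiler units."""
    return sum(
        1
        for unit in units
        for state in unit.resource_states.values()
        if state.lower() in UNHEALTHY_STATES
    )


units = [
    VFilerUnit("vfiler0", {"vol0": "online", "vol1": "offline"}),
    VFilerUnit("vf_finance", {"vol_fin": "inconsistent", "vol_arch": "online"}),
    VFilerUnit("vf_hr", {"vol_hr": "online"}),
]
print(vfiler_count(units))  # 2
```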