eG Monitoring
 

Measures reported by ZFSDiskusageTest

Dataset is the generic name that is used to refer to the following ZFS components: clones, file systems, snapshots, and volumes. Each dataset is identified by a unique name in the ZFS namespace. Datasets are identified using the following format:

pool/path[@snapshot]

pool

Identifies the name of the storage pool that contains the dataset

path

Is a slash-delimited path name for the dataset component

snapshot

Is an optional component that identifies a snapshot of a dataset
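This naming format is easy to work with programmatically. Purely as an illustration (the dataset name below is made up), a small Python sketch that splits a name into its components:

```python
def parse_dataset(name):
    """Split a ZFS dataset name of the form pool/path[@snapshot]
    into (pool, path, snapshot); path and snapshot may be None."""
    snapshot = None
    if "@" in name:                      # optional snapshot component
        name, snapshot = name.split("@", 1)
    pool, _, path = name.partition("/")  # pool, then slash-delimited path
    return pool, path or None, snapshot

print(parse_dataset("tank/home/docs@backup1"))
# ('tank', 'home/docs', 'backup1')
```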

A snapshot is a read-only copy of a file system or volume. A clone, on the other hand, is a writable file system or volume whose initial contents are identical to those of the snapshot from which it was created. Neither snapshots nor clones consume any disk space initially, but as changes are made to the underlying dataset, they begin to use disk space. This means that too many snapshots/clones, or the presence of a few very large snapshots and clones, can add significantly to the disk space consumption of a dataset, causing serious contention for disk space resources on the host. To conserve disk space, administrators therefore often resort to configuring a quota limit for each dataset or enabling compression on a ZFS folder. But how will an administrator ascertain the effectiveness of these configurations? This is where the ZFSDiskusageTest test helps!

For every dataset on ZFS, this test reports the total space usage of the dataset, thus pointing you to those datasets that are rapidly eroding storage space. The test also enables administrators to keep track of the quota limit set for a dataset and the compression ratio it achieves, so that the impact of these configurations on the dataset's total disk space usage can be effectively assessed and the configurations fine-tuned. In addition, the test monitors the count of snapshots and clones created from each dataset and reports their space usage, thus revealing why a particular dataset is consuming too much space - is it because too many snapshots were created from that dataset? Is it because of the large size of the snapshots? Is it owing to incessant cloning of the snapshots? Or is it due to the large size of the snapshot clones?

The measures reported by this test are as follows:

Measurement | Description | Measurement Unit | Interpretation
available_space | Indicates the amount of disk space currently available to this dataset and all its children, assuming no other activity in the pool. | GB | A high value is desired for this measure. You can compare the value of this measure across datasets to know which dataset has very little space available.
used_space | Indicates the amount of space currently consumed by this dataset and all its descendents. | GB | Ideally, the value of this measure should be low.

You can even compare the value of this measure across datasets to identify the dataset that is over-utilizing the disk space.

referred_space | Indicates the amount of data currently accessible by this dataset. | GB | This space may be shared with snapshots and clones of the dataset, so it does not necessarily match used_space.
used_space_per | Indicates the percentage of space used by this dataset. | Percent | A low value is desired for this measure. A consistent rise in the value of this measure is a cause for concern, as it indicates gradual erosion of disk space by a dataset.

Compare space usage across datasets to know which dataset is consuming disk space excessively. To know why this dataset is hogging disk space, check out the value reported by the snapshot_used_space and clone_used_space measures for that dataset. This will indicate what is causing the space crunch - snapshots of the dataset? Or clones of the snapshots of the dataset? Based on this analysis, you may want to consider identifying and destroying some snapshots and/or clones - say, the ones that are no longer used actively - so as to free disk space.

You may also want to take a look at the value of the quota_space and the compression_ratio measures for that dataset to understand whether or not altering the quota and/or compression algorithm will help in reducing the disk space usage of the dataset.
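The eG agent reports this percentage directly; purely as an illustration, assuming the percentage is taken against the sum of used and available space (the figures below are sample values, not real pool data):

```python
def used_space_per(used_gb, available_gb):
    """Percentage of a dataset's space in use, assuming the
    percentage is taken against used + available space."""
    return 100.0 * used_gb / (used_gb + available_gb)

print(used_space_per(42.0, 58.0))  # 42.0
```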

snapshot_num | Indicates the number of snapshots currently available for this dataset. | Number | By correlating snapshot_num with snapshot_used_space, you can understand whether many small snapshots were created for the dataset, or a few very large ones.

In the event of a space crunch, you can also compare the value of the snapshot_used_space with that of the clone_used_space measure to know what is occupying too much space - snapshots? Or clones? Based on this analysis, you may want to consider identifying and destroying some snapshots and/or clones - say, the ones that are no longer used actively - so as to free disk space.

snapshot_used_space | Indicates the total amount of disk space currently used by the snapshots of this dataset. | GB
clone_num | Indicates the number of clones currently associated with this dataset. | Number | By correlating clone_num with clone_used_space, you can understand whether many small clones were created for the dataset, or a few very large ones.

In the event of a space crunch, you can also compare the value of the snapshot_used_space measure with that of the clone_used_space measure to know what is occupying too much space - snapshots? Or clones? Based on this analysis, you may want to consider identifying and destroying some snapshots and/or clones - say, the ones that are no longer used actively - so as to free disk space.
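When hunting for snapshots or clones to destroy, ranking them by size is a natural first step. A minimal sketch with made-up snapshot names and sizes:

```python
# Hypothetical per-snapshot space usage in GB (names are made up)
snapshots = {
    "tank/data@daily1": 0.2,
    "tank/data@migration": 35.0,
    "tank/data@daily2": 0.3,
}

# Rank by space used, largest first, to spot destroy candidates
largest_first = sorted(snapshots.items(), key=lambda kv: kv[1], reverse=True)
print(largest_first[0][0])  # tank/data@migration
```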

clone_used_space | Indicates the total amount of disk space currently used by the clones associated with this dataset. | GB
compression_status | Indicates the current compression status of this dataset. | | ‘Compression’ is a feature of ZFS which, when turned on, saves disk space and can improve system performance. Internally, ZFS allocates data using multiples of the device's sector size, typically either 512 bytes or 4 KB. When compression is enabled, a smaller number of sectors can be allocated for each block.

If compression is enabled for the dataset, this measure will report the value On. If compression is disabled, this measure will report the value Off.

The numeric values that correspond to these measure values are listed below:

Measure Value | Numeric Value
On | 1
Off | 0

Note:

By default, this measure reports one of the Measure Values listed in the table above. The graph of this measure, however, will represent the compression status using the numeric equivalents only.

compression_ratio | Indicates the current compression ratio of this dataset. | Ratio | A consistent drop in this value is disconcerting, as it indicates that data blocks are not being compressed efficiently, thereby increasing disk space consumption. Under such circumstances, you may want to change the compression algorithm in use. LZJB is the default compression algorithm for ZFS. It provides fair compression, has high compression and decompression speeds, and detects incompressible data quickly. The other options available are:

  • LZ4
  • GZIP
  • ZLE
A good alternative to LZJB would be LZ4. Tests have revealed that LZ4 averages a 2.1:1 compression ratio, while GZIP is much slower.
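ZFS expresses the compression ratio as the logical (uncompressed) size divided by the physical (on-disk) size. With sample figures (not taken from any real pool):

```python
def compression_ratio(logical_gb, physical_gb):
    """Compression ratio: logical (uncompressed) size divided
    by physical (on-disk) size; 2.1 means 2.1:1 compression."""
    return logical_gb / physical_gb

print(compression_ratio(21.0, 10.0))  # 2.1
```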
quota_space | Indicates the current quota limit set for this dataset. | GB | A quota limits the amount of disk space a dataset and its descendents can consume. This property enforces a hard limit on the amount of disk space used, including all space consumed by descendents, such as file systems and snapshots.

If the load on the dataset is consistently high, you may want to increase the quota limit to ensure that writes to the dataset do not start failing. Likewise, if the dataset is consuming space excessively owing to too many unused snapshots/clones, you may want to reduce the quota limit, so that administrators are discouraged from needlessly creating snapshots and clones.
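One simple way to judge how close a dataset is to its hard limit is the remaining headroom as a fraction of the quota (sample figures only):

```python
def quota_headroom(used_gb, quota_gb):
    """Fraction of the quota still free; 0.0 means the hard
    limit is reached and further writes will fail."""
    return (quota_gb - used_gb) / quota_gb

print(quota_headroom(8.0, 10.0))  # 0.2
```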
total_space | Indicates the total space allocated for this dataset. | GB | A high value is desired for this measure.