eG Monitoring
 

Measures reported by AWSSGatewayTest

AWS Storage Gateway connects an on-premises software appliance with cloud-based storage to provide seamless integration with data security features between your on-premises IT environment and the Amazon Web Services (AWS) storage infrastructure.

AWS Storage Gateway offers file-based, volume-based and tape-based storage solutions:

  • File Gateway - File gateway is a type of AWS Storage Gateway that supports a file interface into Amazon S3, complementing the existing block-based volume and VTL storage. File gateway combines a service and a virtual software appliance, enabling you to store and retrieve objects in Amazon S3 using industry-standard file protocols such as Network File System (NFS). The software appliance, or gateway, is deployed into your on-premises environment as a virtual machine (VM) running on VMware ESXi. The gateway provides access to objects in S3 as files on an NFS mount point.

    File gateway also provides low-latency access to data through transparent local caching. File gateway manages data transfer to and from AWS, buffers applications from network congestion, optimizes and streams data in parallel, and manages bandwidth consumption.

  • Volume Gateway - Volume gateway provides cloud-backed storage volumes that you can mount as Internet Small Computer System Interface (iSCSI) devices from your on-premises application servers. The gateway supports the following volume configurations:

    • Cached Volumes - You store your data in Amazon Simple Storage Service (Amazon S3) and retain a copy of frequently accessed data subsets locally.

    • Stored volumes - If you need low-latency access to your entire data set (and not just the frequently accessed data set), you can configure your on-premises gateway to store all your data locally and then asynchronously back up point-in-time snapshots of this data to Amazon S3.

  • Tape Gateway - Tape Gateway provides a virtual tape infrastructure that scales seamlessly with your business needs and eliminates the operational burden of provisioning, scaling, and maintaining a physical tape infrastructure.

To ensure peak performance of their mission-critical applications, administrators must make sure that the storage gateways used by on-premises applications, and the volumes provisioned for them, process I/O requests quickly and are sized to match the current and anticipated load.

The AWSSGatewayTest test helps administrators with this analysis. This test auto-discovers the storage gateways configured on AWS and reports the I/O throughput, cache usage, and I/O latency of each storage gateway. In the process, the test pinpoints overloaded gateways and those that are slow to process I/O requests. With the help of the test, you can also judge how effectively the cache is being used, and determine how the cache can be tuned to improve performance.

Outputs of the test: One set of results for each storage gateway or volume, as the case may be.
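
As a rough illustration of the two reporting modes, the sketch below groups hypothetical per-volume read samples either by gateway or by volume, depending on the value of the GATEWAY FILTER NAME parameter referred to in the measure descriptions below. The sample layout, the function name `group_read_kb`, and the identifiers are assumptions made for this example; they do not describe how the test is implemented internally.

```python
from collections import defaultdict

# Hypothetical samples: (gateway_id, volume_id, read_kb). Illustrative values only.
SAMPLES = [
    ("sgw-A", "vol-1", 120.0),
    ("sgw-A", "vol-2", 80.0),
    ("sgw-B", "vol-3", 40.0),
]


def group_read_kb(samples, filter_name):
    """Sum read KB per descriptor; the descriptor depends on filter_name."""
    if filter_name not in ("GatewayID", "VolumeID"):
        raise ValueError(f"unsupported filter: {filter_name}")
    totals = defaultdict(float)
    for gateway_id, volume_id, read_kb in samples:
        # GatewayID: one result per gateway, covering all of its volumes.
        # VolumeID: one result per volume.
        key = gateway_id if filter_name == "GatewayID" else volume_id
        totals[key] += read_kb
    return dict(totals)


if __name__ == "__main__":
    print(group_read_kb(SAMPLES, "GatewayID"))
    print(group_read_kb(SAMPLES, "VolumeID"))
```

With GatewayID, the two volumes of the first hypothetical gateway are rolled up into a single result; with VolumeID, each volume is reported on its own.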

The measures made by this test are as follows:

Measurement Description Measurement Unit Interpretation
Read_bytes If the GATEWAY FILTER NAME parameter is set to GatewayID, then this measure reports the amount of data that on-premises applications read from this storage gateway, across all volumes in the gateway.

If the GATEWAY FILTER NAME parameter is set to VolumeID, then this measure reports the amount of data that on-premises applications read from this volume.
KB If the value of the Read_bytes or Write_bytes measure is consistently low for any gateway/volume, it indicates low throughput.

Here are some recommended best practices for optimizing gateway performance:

  • Add high-performance disks such as solid-state drives (SSDs) and an NVMe controller.

  • Attach virtual disks to your VM directly from a storage area network (SAN) instead of the Microsoft Hyper-V NTFS.

  • Confirm that the virtual processors that are assigned to the gateway VM are backed by an equal number of cores and that you are not oversubscribing the CPUs of the host server.

  • You can add additional CPUs to the gateway host server.

  • When you provision disks in a gateway setup, we strongly recommend that you do not provision local disks for the upload buffer and cache storage that use the same underlying physical storage disk.

  • For volume gateways, if you find that adding more volumes to a gateway reduces the throughput to the gateway, consider adding the volumes to a separate gateway. In particular, if a volume is used for a high-throughput application, consider creating a separate gateway for the high-throughput application. However, as a general rule, you should not use one gateway for all of your high-throughput applications and another gateway for all of your low-throughput applications.

Write_bytes If the GATEWAY FILTER NAME parameter is set to GatewayID, then this measure reports the amount of data that on-premises applications wrote into this storage gateway, across all volumes in the gateway.

If the GATEWAY FILTER NAME parameter is set to VolumeID, then this measure reports the amount of data that on-premises applications wrote into this volume.
KB
Read_time If the GATEWAY FILTER NAME parameter is set to GatewayID, then this measure reports the time taken by on-premises applications to read from all storage volumes in this gateway.

If the GATEWAY FILTER NAME parameter is set to VolumeID, then this measure reports the time taken by on-premises applications to read from this volume.
Secs An abnormally high value for the Read_time or Write_time measure indicates an I/O processing bottleneck. You may want to investigate the slowdown further and isolate its root cause. The best practices discussed in the Interpretation of the Read_bytes and Write_bytes measures can be employed to optimize gateway performance and avert such anomalies.
Write_time If the GATEWAY FILTER NAME parameter is set to GatewayID, then this measure reports the time taken by on-premises applications to write into all storage volumes in this gateway.

If the GATEWAY FILTER NAME parameter is set to VolumeID, then this measure reports the time taken by on-premises applications to write into this volume.
Secs
Queue_writes If the GATEWAY FILTER NAME parameter is set to GatewayID, then this measure reports the amount of data waiting to be written to all volumes of this gateway.

If the GATEWAY FILTER NAME is set to VolumeID, then this measure will report the amount of data waiting to be written to this volume.
KB A high value of this measure or a steady increase in the value of this measure for a storage gateway/volume could indicate an I/O processing bottleneck.
Cloud_byte_download Indicates the amount of compressed data that all volumes of this gateway downloaded from AWS. KB This measure is reported only for each storage gateway, and not for each volume.
Cloud_byte_upload Indicates the amount of compressed data that all volumes of this gateway uploaded to AWS. KB This measure is reported only for each storage gateway, and not for each volume.
Work_store_percent Indicates the percent usage of this gateway's upload buffer. Percent This measure is reported only for cached volume gateways and tape gateways.

To prepare for upload to Amazon S3, a cached volume gateway and/or a tape gateway stores incoming data in a staging area, referred to as an upload buffer. Your gateway uploads this buffer data over an encrypted Secure Sockets Layer (SSL) connection to AWS, where it is stored encrypted in Amazon S3.

A value close to 100% for this measure indicates that the disk used by the storage gateway as the upload buffer is running out of space. This can happen if the gateway is unable to write data to Amazon S3 at the same pace at which it writes to the buffer. This in turn implies a bottleneck when uploading.

This can also happen if the disk is not sized correctly. The minimum recommended disk space for the upload buffer (working storage) is 150 GiB and the maximum is 2 TiB.
Work_store_used Indicates the amount of space being used in this gateway's upload buffer. KB This measure is reported only for cached volume gateways and tape gateways.
Work_store_free Indicates the total amount of unused space in this gateway's working storage. KB This measure is reported only for cached volume gateways and tape gateways.
Upload_buff_free Indicates the total amount of unused space in this stored volume gateway's upload buffer. KB This measure is reported only for stored volume gateways.

To prepare for upload to Amazon S3, a stored volume gateway stores incoming data in a staging area, referred to as an upload buffer/working storage. Your gateway uploads this buffer data over an encrypted Secure Sockets Layer (SSL) connection to AWS, where it is stored encrypted in Amazon S3.

Adequate free space should be available in the working storage to enable the gateway to store all the incoming data before upload. A high value is hence desired for this measure. The minimum disk space recommendation for the working storage is 150 GiB and the maximum is 2 TiB.
Upload_buff_perc Indicates the data usage of this gateway's upload buffer.
Upload_buff_used Indicates the data used in this gateway's upload buffer.
Cache_hit_perc If the GATEWAY FILTER NAME parameter is set to GatewayID, then this measure reports the percentage of application reads served from this gateway's cache.

If the GATEWAY FILTER NAME is set to VolumeID, then this measure will report the percentage of read operations from this volume that are served from the cache.
Percent Ideally, the value of this measure should be above 80%. If not, then it means that many read requests are being serviced by directly accessing the data in AWS. This can increase I/O overheads and adversely impact application performance.
Cache_perc_used If the GATEWAY FILTER NAME parameter is set to GatewayID, then this measure reports the percent usage of this gateway's cache.

If the GATEWAY FILTER NAME parameter is set to VolumeID, then this measure reports what percentage of the gateway's cache storage is used by this volume.
Percent If the value of this measure grows steadily close to 100%, it denotes the excessive usage of that gateway's cache storage.

If the value of this measure is close to 100% for a volume, it implies that a particular volume is taking up too much cache space.

If the gateway's cache storage runs out of space, then the cache will no longer be able to hold frequently accessed objects; this in turn will increase cache misses and related overheads. This is why the cache storage has to be sized correctly. The recommended minimum cache size is 150 GiB and the maximum is 16 TiB.
Cache_perc_dirty If the GATEWAY FILTER NAME parameter is set to GatewayID, then this measure reports the percentage of this gateway's cache that has not been persisted to AWS.

If the GATEWAY FILTER NAME parameter is set to VolumeID, then this measure reports this volume's contribution to the percentage of the gateway's cache storage that has not been persisted to AWS.
Percent As your applications write data to the storage volumes in AWS, the gateway initially stores the data on the cache storage before uploading the data to Amazon S3.

The value of this measure represents the share of cached data that is yet to be uploaded to Amazon S3. If this value is very high, it could indicate that the gateway is having trouble uploading data to AWS, and you may want to investigate why. In the process, you may also want to set the GATEWAY FILTER NAME parameter to VolumeID so that the test reports metrics per volume, and identify the exact volume with the most data awaiting upload.
Tot_cache_size Indicates the amount of data stored in this gateway's cache. KB This measure is reported only for each storage gateway, and not for each volume.
Time_recover Indicates the time since the last available recovery point of this gateway's cache storage. Secs This measure is reported only for each storage gateway, and not for each volume.

A volume recovery point is a point in time at which all data of the volume is consistent. You can clone a volume or create a snapshot of it from its recovery point.
Cache_used Indicates the amount of space being used in this gateway's cache storage. KB This measure is reported only for each storage gateway, and not for each volume.
Cache_free Indicates the total amount of unused space in this gateway's cache storage. KB This measure is reported only for each storage gateway, and not for each volume.

Ideally, the value of this measure should be high.
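
The interpretation guidance in the table above can be condensed into a simple check over one set of measure values. The sketch below is illustrative only: the function name `review_measures`, the dictionary layout, and the 90% figure used to stand in for "close to 100%" are assumptions made for this example. Only the 80% cache-hit guideline comes from the table.

```python
NEAR_FULL_PERCENT = 90.0      # assumption: stand-in for "close to 100%"
MIN_CACHE_HIT_PERCENT = 80.0  # guideline from the Cache_hit_perc interpretation

# Measures where a value close to 100% is a warning sign, with a hint for each.
NEAR_FULL_HINTS = {
    "Work_store_percent": "upload buffer is running out of space",
    "Cache_perc_used": "cache storage is running out of space",
}


def review_measures(measures):
    """Return a list of findings for one gateway's measure values.

    `measures` maps measure names to values. Missing measures are skipped,
    because not every measure is reported for every gateway type.
    """
    findings = []
    hit = measures.get("Cache_hit_perc")
    if hit is not None and hit < MIN_CACHE_HIT_PERCENT:
        findings.append("Cache_hit_perc: many reads are being served from AWS")
    for name, hint in NEAR_FULL_HINTS.items():
        value = measures.get(name)
        if value is not None and value >= NEAR_FULL_PERCENT:
            findings.append(f"{name}: {hint}")
    return findings


if __name__ == "__main__":
    # Hypothetical values for a single gateway.
    sample = {"Cache_hit_perc": 60.0, "Work_store_percent": 97.0, "Cache_perc_used": 50.0}
    for finding in review_measures(sample):
        print(finding)
```

A single snapshot cannot reveal trends such as a steadily growing Queue_writes value; detecting those would require comparing successive measurements.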