Measures reported by AWSAmazonEC2Test
Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable compute capacity in the cloud. An EC2 instance is a virtual server in Amazon’s Elastic Compute Cloud (EC2) for running applications on the Amazon Web Services (AWS) infrastructure. Since users may run mission-critical applications on these EC2 instances, high uptime of the instances is imperative to the uninterrupted functioning of these applications and to user satisfaction with this cloud-based service. AWS administrators should therefore frequently perform health checks on every instance, measure its load and resource usage, and capture potential failures and resource contention well before end users notice and complain. This is where the AWSAmazonEC2Test test helps.
This test monitors the powered-on state of each EC2 instance and promptly alerts administrators if any instance has been powered off inadvertently. The test also reveals how each instance uses the CPU, disk, and network resources it is configured with, providing early pointers to irregularities in instance sizing and prompting administrators to make the necessary adjustments. This way, the test helps ensure that critical applications remain accessible to end users and perform at peak capacity.
Output of the test : One set of results for each instance / auto scaling group / instance type / image ID in each region of the AWS cloud being monitored, depending upon the option chosen from the EC2 Filter Name drop-down
First-level descriptor : AWS EC2 region name
Second-level descriptor : EC2 instance ID / auto scaling group name / instance type / image ID
The measures made by this test are as follows:
| Measurement |
Description |
Measurement Unit |
Interpretation |
| instance_state |
Indicates the current powered-on state of this instance. |
|
This measure is reported only if the InstanceId option is chosen from the EC2 Filter Name drop-down of this test.
The values that this measure can report and their corresponding numeric values are detailed in the table below:
| Measure Value |
Description |
Numeric Value |
| Running |
When the instance is ready for you, it enters the running state. |
1 |
| Pending |
When you launch an instance, it enters the pending state. |
2 |
| Terminated |
When you no longer need an instance, you can terminate it; it then enters the terminated state. |
3 |
| Shutting down |
When you terminate an instance, it enters the shutting-down state before it reaches the terminated state. |
4 |
| Stopping |
When you stop your instance, it enters the stopping state. |
5 |
| Stopped |
After exiting the stopping state, the instance enters the stopped state. |
0 |
Note:
By default, this measure will report the Measure Values listed in the table above to indicate the current powered-on state of an instance. In the graph of this measure however, the same will be represented using the numeric equivalents only.
|
| total_volume |
Indicates the number of EBS volumes attached to this instance. |
Number |
This measure is reported only if the InstanceId option is chosen from the EC2 Filter Name drop-down of this test.
You can attach an EBS volume to any of your instances that is in the same Availability Zone as the volume.
You can attach multiple volumes to the same instance within the limits specified by your AWS account. Your account has a limit on the number of EBS volumes that you can use, and the total storage available to you.
Using the detailed diagnosis of this measure, you can identify the volumes that are attached to this EC2 instance. |
| CPU_Credit_Usage |
Indicates the number of CPU credits consumed by this T2 instance / all T2 instances / all T2 instances created from this image ID during the last measurement period. |
Number |
This measure is reported only for individual T2 instances, the T2 instance type, and the image ID using which T2 instances (if any) were created.
A CPU Credit provides the performance of a full CPU core for one minute. Traditional Amazon EC2 instance types provide fixed performance, while T2 instances provide a baseline level of CPU performance with the ability to burst above that baseline level. The baseline performance and ability to burst are governed by CPU credits.
One CPU credit is equal to one vCPU running at 100% utilization for one minute. Other combinations of vCPUs, utilization, and time are also equal to one CPU credit; for example, one vCPU running at 50% utilization for two minutes or two vCPUs running at 25% utilization for two minutes.
Each T2 instance starts with a healthy initial CPU credit balance and then continuously (at a millisecond-level resolution) receives a set rate of CPU credits per hour, depending on instance size.
When a T2 instance uses fewer CPU resources than its base performance level allows (such as when it is idle), the unused CPU credits (or the difference between what was earned and what was spent) are stored in the credit balance for up to 24 hours, building CPU credits for bursting. When your T2 instance requires more CPU resources than its base performance level allows, it uses credits from the CPU credit balance to burst up to 100% utilization. The more credits your T2 instance has for CPU resources, the more time it can burst beyond its base performance level when more performance is needed. This implies that ideally, the value of the CPU credit usage measure should be low for an instance and the value of the CPU credit balance for that instance should be high, as that way, an instance is assured of more CPU resources when performance demands increase. By comparing the value of this measure across instances, you can precisely identify the instance that has used up a sizeable portion of its CPU credits. |
| CPU_Credit_bal |
Indicates the number of CPU credits that have been earned by this T2 instance / all T2 instances / all T2 instances created from this image ID. |
Number |
| CPU_Utilization |
Indicates the percentage of allocated EC2 compute units that are currently in use on this instance. |
Percent |
A value close to 100% indicates excessive usage of CPU by an instance. If the value of this measure is consistently high for an instance, it could indicate that the application running on that instance requires more processing power. In such a case, you may want to allocate more CPU resources to that instance. |
| Disk_ReadBytes |
Indicates the rate at which data was read from all disks available to this instance. |
KB/Sec |
Compare the value of this measure across instances to identify the instance that is the slowest in responding to read requests. |
| Disk_ReadOps |
Indicates the rate at which read operations were performed on all disks available to this instance. |
Operations/Sec |
Compare the value of this measure across instances to know which instance is too slow in processing read requests. |
| Disk_WriteBytes |
Indicates the rate at which data was written to all disks available to this instance. |
KB/Sec |
Compare the value of this measure across instances to identify the instance that is the slowest in responding to write requests. |
| Disk_WriteOps |
Indicates the rate at which write operations were performed on all disks available to this instance. |
Operations/Sec |
Compare the value of this measure across instances to know which instance is too slow in processing write requests. |
| Network_In |
Indicates the rate at which data was received by all network interfaces of this instance. |
KB/Sec |
Compare the value of these measures across instances to know which instance is consuming too much bandwidth. Then, compare the value of the Incoming network traffic and Outgoing network traffic measures of that instance to determine whether more bandwidth was consumed when receiving data over the network or when sending it. |
| Network_Out |
Indicates the rate at which data was sent by all the network interfaces of this instance. |
KB/Sec |
| Status_Check_Failed |
Indicates whether a status check (system status check or instance status check) failed for this instance. |
|
Amazon EC2 performs automated checks on every running EC2 instance to identify hardware and software issues. These status checks are of two types: system and instance status checks.
If either of these status checks fails, this measure reports the value Failed. If neither status check fails, this measure reports the value Passed.
The values that this measure can report and their corresponding numeric values are listed in the table below:
| Measure Value |
Numeric Value |
| Failed |
1 |
| Passed |
0 |
Note:
By default, this measure reports the Measure Values above to indicate whether a check passed or failed. In the graph of this measure however, the same is indicated using the numeric equivalents only.
|
| Status_Failed_Instance |
Indicates whether or not this instance passed the EC2 instance status check in the last minute. |
  |
Instance status checks monitor the software and network configuration of your individual instance. These checks detect problems that require your involvement to repair. When an instance status check fails, typically you will need to address the problem yourself (for example, by rebooting the instance or by making instance configuration changes).
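The following are examples of problems that can cause instance status checks to fail:
Failed system status checks
Incorrect networking or startup configuration
Exhausted memory
Corrupted file system
Incompatible kernel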
The values that this measure can report and their corresponding numeric values are listed in the table below:
| Measure Value |
Numeric Value |
| Failed |
1 |
| Passed |
0 |
Note:
By default, this measure reports the Measure Values above to indicate whether a check passed or failed. In the graph of this measure however, the same is indicated using the numeric equivalents only.
|
| Status_Failed_System |
Indicates whether or not this instance passed the EC2 system status check in the last minute. |
|
System status checks monitor the AWS systems required to use your instance to ensure they are working properly. These checks detect problems with your instance that require AWS involvement to repair. When a system status check fails, you can choose to wait for AWS to fix the issue, or you can resolve it yourself (for example, by stopping and starting an instance, or by terminating and replacing an instance).
The following are examples of problems that can cause system status checks to fail:
Loss of network connectivity
Loss of system power
Software issues on the physical host
Hardware issues on the physical host
The values that this measure can report and their corresponding numeric values are listed in the table below:
| Measure Value |
Numeric Value |
| Failed |
1 |
| Passed |
0 |
Note:
By default, this measure reports the Measure Values above to indicate whether a check passed or failed. In the graph of this measure however, the same is indicated using the numeric equivalents only.
|
|
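The CPU credit arithmetic described for the CPU_Credit_Usage measure can be illustrated with a short sketch. The function names below are hypothetical and are not part of this test or of any AWS API. The burst estimate also assumes, for simplicity, that no new credits are earned while the instance is bursting, so it may understate the real burst time.

```python
def cpu_credits_used(vcpus: int, utilization_pct: float, minutes: float) -> float:
    """Credits consumed = vCPUs x (utilization / 100) x minutes.

    One credit equals one vCPU running at 100% utilization for one minute.
    """
    return vcpus * (utilization_pct / 100.0) * minutes


def full_burst_minutes(credit_balance: float, vcpus: int) -> float:
    """Rough estimate of how long an instance could run all vCPUs at 100%.

    Assumption (illustrative only): credits earned during the burst are
    ignored, so each minute at 100% costs exactly `vcpus` credits.
    """
    if vcpus <= 0:
        raise ValueError("vcpus must be a positive integer")
    return credit_balance / vcpus


if __name__ == "__main__":
    # The three combinations mentioned in the text each cost one credit.
    print(cpu_credits_used(1, 100, 1))  # 1.0
    print(cpu_credits_used(1, 50, 2))   # 1.0
    print(cpu_credits_used(2, 25, 2))   # 1.0
    # A balance of 30 credits on a 2-vCPU instance, under the assumption above.
    print(full_burst_minutes(30, 2))    # 15.0
```

Read this way, a low CPU credit usage value paired with a high CPU credit balance means the instance has more headroom to burst when demand rises.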