AWS Monitoring helps you gain observability into your AWS environment
Amazon Elastic Compute Cloud (EC2) on the Amazon Web Services (AWS) platform provides resizable computing capacity to help you run and scale business applications in the cloud. EC2 lets users provision resources in the form of virtual servers called instances.
There are different types of instances that provide different capacities for CPU, memory, storage, and networking. With EC2 instances, users can modify resource capacity in an agile way and launch instances in specific locations to match the regional demand.
Basic infrastructure-level metrics are collected by querying the CloudWatch API at the polling interval you configure. IT automations for EC2 can also be integrated with other AWS services.
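As a rough illustration of this kind of polling, the sketch below builds the parameters for a CloudWatch GetMetricStatistics request covering the most recent polling interval. The helper names (build_metric_query, fetch_datapoints), the five-minute default, the statistics chosen, and the default region are assumptions made for this example, not how any particular monitoring tool does it. The boto3 call runs only if you invoke fetch_datapoints with AWS credentials configured.

```python
from datetime import datetime, timedelta, timezone


def build_metric_query(instance_id, metric_name, polling_seconds=300, now=None):
    """Build GetMetricStatistics parameters for one polling interval."""
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(seconds=polling_seconds)
    return {
        "Namespace": "AWS/EC2",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": start,
        "EndTime": end,
        "Period": polling_seconds,  # one datapoint per polling interval
        "Statistics": ["Average", "Maximum"],
    }


def fetch_datapoints(query, region_name="us-east-1"):
    """Send the query to CloudWatch (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    client = boto3.client("cloudwatch", region_name=region_name)
    response = client.get_metric_statistics(**query)
    # CloudWatch does not guarantee ordering, so sort by timestamp.
    return sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
```

A scheduler would call build_metric_query once per interval for each metric of interest and pass the result to fetch_datapoints.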
Even when you run many individual instances, you should keep track of the basic system-level metrics of your infrastructure. EC2 metrics fall into the following categories:
EC2 instances can have multiple virtual CPUs (vCPUs), and tracking CPU usage helps you match resources precisely to your workload. Although CloudWatch monitors the utilization and processing capacity of an instance, it does not monitor the CPU usage of the hardware layer on which the instance is hosted. T2/T3 burstable instances provide a baseline level of CPU performance and can burst above that baseline by spending CPU credits.
The CPUCreditUsage metric measures the number of CPU credits spent by the instance. One CPU credit is equivalent to one vCPU running at 100% utilization for one minute.
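The credit arithmetic can be sketched in a few lines. The function name cpu_credits_used is hypothetical, and the calculation assumes utilization is averaged across the stated number of vCPUs for the whole interval.

```python
def cpu_credits_used(vcpus, utilization_percent, minutes):
    """Credits spent, where 1 credit = 1 vCPU at 100% for 1 minute."""
    if vcpus < 0 or minutes < 0 or not 0 <= utilization_percent <= 100:
        raise ValueError("inputs out of range")
    return vcpus * (utilization_percent / 100.0) * minutes
```

For example, two vCPUs averaging 25% utilization for an hour spend the same number of credits as one vCPU at 50% for an hour.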
The CPUCreditBalance metric measures the number of earned CPU credits that the instance has accrued. The balance grows whenever the instance runs below its baseline CPU performance level.
The CPUSurplusCreditBalance metric measures the number of surplus credits spent by a T2/T3 instance running in unlimited mode. Once the CPU credit balance is exhausted, the instance spends surplus credits to sustain CPU usage above the baseline.
The CPUSurplusCreditsCharged metric measures the number of spent surplus credits that are not paid down by earned CPU credits and therefore incur an additional charge.
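To see how the earned balance and the surplus balance interact on an unlimited instance, here is a deliberately simplified model. The function name, the fixed earn rate per step, and the step granularity are all assumptions for illustration. The sketch ignores the maximum credit balance, the cap on surplus credits, and the billing rules that decide when surplus credits are actually charged.

```python
def simulate_unlimited_credits(earned_per_step, spent_per_step, start_balance=0.0):
    """Track (earned balance, surplus balance) after each step (simplified)."""
    balance = start_balance
    surplus = 0.0
    history = []
    for spent in spent_per_step:
        net = earned_per_step - spent
        if net >= 0:
            # Earned credits pay down surplus first; the remainder accrues.
            paid = min(net, surplus)
            surplus -= paid
            balance += net - paid
        else:
            # Cover the shortfall from the balance, then with surplus credits.
            shortfall = -net
            drawn = min(shortfall, balance)
            balance -= drawn
            surplus += shortfall - drawn
        history.append((balance, surplus))
    return history
```

With a hypothetical earn rate of 12 credits per step and spending of 6, 30, 30, and 0 credits, the balance is drained in the second step, surplus credits build up during the burst, and the idle final step pays part of the surplus back down.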
Resource usage metrics are some of the most prominent host-level metrics for monitoring applications that have consistently high utilization levels.
The CPUUtilization metric measures the percentage of allocated CPU units that are being used by the instance.
The DiskReadOps and DiskWriteOps metrics count the completed read and write operations on all instance store volumes available to the instance. They can also help you determine whether performance degradation is the result of high IOPS causing a bottleneck.
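Operation counts become IOPS once you divide the total for a period by the length of that period in seconds. The sketch below does that and compares the result with a limit. The function names, the iops_limit value you pass in, and the 90% threshold are hypothetical choices for this example.

```python
def ops_per_second(total_ops, period_seconds):
    """Average IOPS for a period, from the summed operation count."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    return total_ops / period_seconds


def is_iops_bottleneck(read_ops, write_ops, period_seconds, iops_limit, threshold=0.9):
    """True when combined IOPS reaches the given fraction of the limit."""
    total_iops = ops_per_second(read_ops + write_ops, period_seconds)
    return total_iops >= threshold * iops_limit
```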
The NetworkIn and NetworkOut metrics measure the number of bytes received and sent by the instance on all network interfaces.
The MetadataNoToken metric counts the number of times the instance metadata service was successfully accessed using a method that does not involve a token, that is, Instance Metadata Service Version 1 (IMDSv1).
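One practical use of this count is to check whether anything on an instance still makes tokenless metadata calls. The sketch below assumes datapoints shaped like CloudWatch results that include a "Sum" statistic. The function names are hypothetical.

```python
def tokenless_call_count(datapoints):
    """Total tokenless metadata calls across CloudWatch-style datapoints."""
    return sum(point.get("Sum", 0) for point in datapoints)


def makes_tokenless_calls(datapoints):
    """True if any tokenless metadata access was recorded."""
    return tokenless_call_count(datapoints) > 0
```

A result of True suggests that some software on the instance has not yet moved to token-based metadata requests.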
Amazon Elastic Block Store (EBS) is a scalable, high-performance block storage service for use with EC2. An EBS volume provides persistent storage, unlike an instance store volume, which loses its data when the instance stops or terminates.
The EBSReadOps and EBSWriteOps metrics count the completed read and write operations across all EBS volumes attached to the instance within a specified period of time.
The EBSReadBytes and EBSWriteBytes metrics measure the bytes read and written across all EBS volumes attached to the instance within a specified period of time.
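Combining the byte counts with the operation counts yields two derived figures that are often easier to reason about: throughput and average I/O size. The following is a minimal sketch with hypothetical function names. It assumes both totals cover the same period.

```python
def throughput_mib_per_second(total_bytes, period_seconds):
    """Average throughput in MiB/s over the period."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    return total_bytes / period_seconds / (1024 * 1024)


def average_io_size_kib(total_bytes, total_ops):
    """Average size of one I/O operation in KiB (0.0 when there were none)."""
    if total_ops == 0:
        return 0.0
    return total_bytes / total_ops / 1024
```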
EBS balance percent
The EBSIOBalance% and EBSByteBalance% metrics show the percentage of I/O credits and throughput credits, respectively, remaining in the burst bucket.
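A falling burst balance is usually something to alert on before it reaches zero. The sketch below classifies a balance percentage against two thresholds. The function name and the 50% and 20% defaults are hypothetical and should be tuned to your workload.

```python
def burst_balance_status(balance_percent, warn_below=50.0, critical_below=20.0):
    """Classify a burst balance percentage as ok, warning, or critical."""
    if not 0 <= balance_percent <= 100:
        raise ValueError("balance_percent must be between 0 and 100")
    if balance_percent < critical_below:
        return "critical"
    if balance_percent < warn_below:
        return "warning"
    return "ok"
```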
Amazon Elastic Inference is a resource you can attach to your EC2 instances to accelerate your deep learning inference workloads. Through Elastic Inference metrics, you can monitor the connectivity and performance of your Elastic Inference accelerator connected to your EC2 instance.
The AcceleratorHealthCheckFailed metric reports whether the Elastic Inference accelerator has passed a status health check in the previous minute. Because the metric counts failures, a value of zero (0) indicates the status check has passed, and a value of one (1) indicates the status check has failed.
The ConnectivityCheckFailed metric reports whether connectivity to the Elastic Inference accelerator is active or has failed. A value of zero (0) indicates an active connection, and a value of one (1) indicates a failed connection.
The AcceleratorMemoryUsage metric measures the memory used by the Elastic Inference accelerator.
Amazon Elastic Graphics provides flexible, low-cost, high-performance graphics acceleration for your Windows instances. With Elastic Graphics metrics, you can monitor the connectivity and performance of the Elastic Graphics accelerator connected to your EC2 instance.
GPU connectivity is the backbone of graphics acceleration, and the GPUConnectivityCheckFailed metric lets you check whether connectivity to the Elastic Graphics accelerator is active or has failed. A value of zero (0) indicates an active connection, and a value of one (1) indicates a failed connection.
The GPUHealthCheckFailed metric reports whether the Elastic Graphics accelerator has passed a status health check in the previous minute. A value of zero (0) indicates the status check has passed, and a value of one (1) indicates the status check has failed.
Similar to CPU utilization, the GPUMemoryUtilization metric lets you monitor the GPU memory in use, reported in MiB.
EC2 instance status checks help you check on the status of an individual instance and the AWS systems hosting it. They are available at one-minute intervals, giving you an accurate indication of an instance’s health. This lets you determine whether the problem is with the AWS infrastructure, the software, or the network configuration of the instance.
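The StatusCheckFailed metric reports whether the instance has failed either the instance reachability check or the system reachability check in the previous minute. A value of zero (0) means both checks passed, and a value of one (1) means at least one of them failed.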
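The StatusCheckFailed_Instance metric reports whether the instance has failed the instance reachability check in the previous minute. These failures usually stem from problems you need to fix yourself, such as misconfigured networking or startup settings, exhausted memory, or a corrupted file system.
The StatusCheckFailed_System metric reports whether the instance has failed the system reachability check in the previous minute. These failures are usually due to problems outside of your control in the AWS systems hosting the instance, such as power loss. They can often be resolved by stopping and starting the instance so that it moves to a new host.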
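The relationship between the combined status check and its two components can be sketched as follows. This assumes the usual CloudWatch convention that each status check metric reports 0 for passed and 1 for failed. The function name and the shape of the returned dictionary are hypothetical.

```python
def diagnose_status_checks(instance_failed, system_failed):
    """Combine the 0/1 values of the instance and system status checks."""
    failed_checks = []
    if instance_failed:
        failed_checks.append("instance")
    if system_failed:
        failed_checks.append("system")
    return {
        # The combined check fails when either component check fails.
        "status_check_failed": 1 if failed_checks else 0,
        "failed_checks": failed_checks,
    }
```

Knowing which component failed tells you where to start looking: at the instance's own configuration, or at the AWS systems hosting it.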
We have looked at several metrics that are vital for EC2 monitoring as well as tracking the health of your applications. EC2’s varied range of instances lets you create customized infrastructure suitable for any of the above use cases that allows you to scale, change and downsize your instances.