Monitoring and Troubleshooting AWS Cloud

In this post, we will discuss a few ways to monitor resources in Amazon Web Services (AWS). Each AWS service has its own monitoring methods, and most of them rely on Amazon CloudWatch.

Amazon CloudWatch monitors resources and applications deployed on AWS in real time and allows the collection and tracking of various metrics. A metric is a time-ordered set of data points that other AWS services make available to CloudWatch.

Alarms can be triggered when a metric crosses a defined threshold so that corrective action can be taken. In addition, dashboards are available with metrics for almost every AWS service in use.
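As a rough illustration of the alarm idea, the sketch below models an alarm as "the last N data points all exceed a threshold". It is a simplified local model and not CloudWatch's actual evaluation logic; the function name `alarm_state` and the sample CPU values are invented for the example. The three state names are the ones CloudWatch alarms report.

```python
from typing import Sequence


def alarm_state(datapoints: Sequence[float], threshold: float,
                evaluation_periods: int) -> str:
    """Simplified alarm model: ALARM when the most recent
    `evaluation_periods` data points are all above `threshold`."""
    if evaluation_periods < 1:
        raise ValueError("evaluation_periods must be at least 1")
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    return "ALARM" if all(value > threshold for value in recent) else "OK"


# Hypothetical CPU utilization samples (percent), oldest first.
cpu_percent = [35.0, 62.0, 91.0, 94.0, 97.0]
print(alarm_state(cpu_percent, threshold=90.0, evaluation_periods=3))
```

Requiring several consecutive breaching data points is a common way to avoid reacting to a single short spike.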

Amazon CloudWatch Logs lets you monitor and store logs from various sources. For example, it can monitor the logs from EC2 instances and watch for errors in applications.

Some important concepts about CloudWatch Logs are:

  • Log stream – a sequence of log events that have the same source
  • Log group – a set of log streams that share the same retention, monitoring, and access control policies

Amazon CloudWatch Logs Insights lets you interactively search and analyze the log data in CloudWatch Logs using a purpose-built query language. It can automatically discover fields in the logs from various AWS services such as Route 53 or VPC.

As mentioned, Amazon CloudWatch can have multiple sources of information and this diagram shows how other AWS services (just to name a few) are interacting with Amazon CloudWatch:

So, let’s see how CloudWatch can monitor the traffic going through your VPC using the VPC Flow Logs feature.

VPC Flow Logs is a feature that lets the operator capture information about the traffic entering and leaving a VPC. The log data can be sent to Amazon CloudWatch Logs or to Amazon S3.

The feature can help the operator find out what kind of traffic is reaching the VPC and assist with troubleshooting when specific traffic does not reach it.

A flow log consists of records that specify the source and destination IP addresses, the source and destination ports, the protocol, the number of bytes and packets, and the action (ACCEPT or REJECT) associated with the traffic.
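To make the record layout concrete, here is a minimal parsing sketch. It assumes the default (version 2) flow log format, in which each record is a space-separated line of 14 fields; custom formats would need a different field list. The `FlowRecord` class and `parse_flow_record` function are names made up for this example, and the sample line uses placeholder account and interface IDs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlowRecord:
    version: int
    account_id: str
    interface_id: str
    srcaddr: Optional[str]
    dstaddr: Optional[str]
    srcport: Optional[int]
    dstport: Optional[int]
    protocol: Optional[int]    # IANA protocol number, e.g. 6 = TCP, 1 = ICMP
    packets: Optional[int]
    byte_count: Optional[int]
    start: int                 # start of the capture window, Unix seconds
    end: int                   # end of the capture window, Unix seconds
    action: Optional[str]      # "ACCEPT" or "REJECT"
    log_status: str            # "OK", "NODATA" or "SKIPDATA"


def _text(value: str) -> Optional[str]:
    # Records without traffic data carry "-" in place of a value.
    return None if value == "-" else value


def _number(value: str) -> Optional[int]:
    return None if value == "-" else int(value)


def parse_flow_record(line: str) -> FlowRecord:
    """Parse one record in the default space-separated format."""
    parts = line.split()
    if len(parts) != 14:
        raise ValueError(f"expected 14 fields, got {len(parts)}")
    return FlowRecord(
        version=int(parts[0]),
        account_id=parts[1],
        interface_id=parts[2],
        srcaddr=_text(parts[3]),
        dstaddr=_text(parts[4]),
        srcport=_number(parts[5]),
        dstport=_number(parts[6]),
        protocol=_number(parts[7]),
        packets=_number(parts[8]),
        byte_count=_number(parts[9]),
        start=int(parts[10]),
        end=int(parts[11]),
        action=_text(parts[12]),
        log_status=parts[13],
    )


# Placeholder record: accepted SSH traffic (TCP, destination port 22).
sample = ("2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
          "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK")
record = parse_flow_record(sample)
print(record.srcaddr, record.dstport, record.action)
```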

This is the diagram used for this exercise:

Currently, there is one VPC with two subnets and there is one EC2 instance in each of these two subnets. This is the VPC:

These are the two instances; the one in the public subnet has a public IPv4 address:

Before enabling the VPC flow logs, let’s check the network interfaces assigned to these two EC2 instances (we will need the network interface IDs later):

To enable flow logs, you just need to go through a few specific steps using this menu:

In the next menu, you need to specify whether you want to log accepted connections, rejected connections, or both. You also need to specify whether to send the logs to an S3 bucket or to the CloudWatch Logs service. The IAM role can be an existing one, or you can create a new one:

If the IAM role does not exist yet, you can create one directly from the above menu by selecting “Set Up Permissions”, which will lead you to the below menu:
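The same choices (traffic filter, destination, IAM role) can also be expressed programmatically. The sketch below only builds the request parameters for the EC2 `CreateFlowLogs` API as a plain dictionary, so it runs without AWS access. The helper name `build_flow_log_request` and every ID, log group name, and ARN in it are hypothetical placeholders, not values from this exercise.

```python
from typing import Any, Dict


def build_flow_log_request(vpc_id: str, log_group_name: str,
                           role_arn: str,
                           traffic_type: str = "ALL") -> Dict[str, Any]:
    """Build CreateFlowLogs parameters for a VPC that logs to CloudWatch Logs."""
    if traffic_type not in {"ACCEPT", "REJECT", "ALL"}:
        raise ValueError("traffic_type must be ACCEPT, REJECT or ALL")
    return {
        "ResourceIds": [vpc_id],
        "ResourceType": "VPC",
        "TrafficType": traffic_type,
        "LogDestinationType": "cloud-watch-logs",
        "LogGroupName": log_group_name,
        "DeliverLogsPermissionArn": role_arn,
    }


# All values below are placeholders.
params = build_flow_log_request(
    vpc_id="vpc-0123456789abcdef0",
    log_group_name="vpc-flow-logs-demo",
    role_arn="arn:aws:iam::123456789012:role/flow-logs-demo-role",
)
print(params)

# With boto3 installed and credentials configured, the call would be
# along these lines (not executed here):
#   import boto3
#   boto3.client("ec2").create_flow_logs(**params)
```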

After the VPC flow log is enabled, you should see the corresponding log group in the CloudWatch service:

If that is expanded, two log streams should be available, one for each network interface in the VPC. As mentioned above, we have two EC2 instances, each with only one network interface. As you can see, the log stream ID is formed from the network interface ID and the filter type (in this specific case, all, but it can be accept or reject as well):
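The naming scheme described above can be sketched as a pair of small helpers. They assume the stream name is simply the interface ID, a hyphen, and the lowercase filter type; the function names and the interface ID are placeholders for illustration.

```python
from typing import Tuple

VALID_FILTERS = ("all", "accept", "reject")


def stream_name(eni_id: str, traffic_type: str) -> str:
    """Compose a log stream name from an interface ID and a filter type."""
    suffix = traffic_type.lower()
    if suffix not in VALID_FILTERS:
        raise ValueError(f"unknown filter type: {traffic_type}")
    return f"{eni_id}-{suffix}"


def split_stream_name(name: str) -> Tuple[str, str]:
    """Split a log stream name back into (interface ID, filter type)."""
    eni_id, _, suffix = name.rpartition("-")
    if not eni_id or suffix not in VALID_FILTERS:
        raise ValueError(f"unexpected log stream name: {name}")
    return eni_id, suffix


# Placeholder interface ID.
name = stream_name("eni-0a1b2c3d4e5f67890", "ALL")
print(name)
print(split_stream_name(name))
```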

The EC2 instance in the public subnet has a web server running, but the VPC has a network access control list (ACL) that denies HTTP access to port 80 from any source IP address:

However, ICMP and SSH to any IP address assigned to this VPC are allowed.
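To see why HTTP is rejected while ICMP and SSH get through, it helps to remember that network ACL rules are evaluated in ascending rule-number order, the first matching rule decides, and anything that matches no rule is denied. The sketch below models only that behavior for inbound traffic. The rule numbers, the `Rule` class, and the client address are assumptions made for the example (the real rules are in the screenshot), and real ACLs match port ranges rather than a single port.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    number: int
    protocol: Optional[int]   # IANA protocol number; None matches any protocol
    port: Optional[int]       # destination port; None matches any port
    cidr: str                 # source range
    allow: bool


def evaluate(rules: List[Rule], src_ip: str, protocol: int,
             port: Optional[int]) -> str:
    """Return ACCEPT or REJECT for one inbound packet."""
    source = ipaddress.ip_address(src_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.protocol is not None and rule.protocol != protocol:
            continue
        if rule.port is not None and rule.port != port:
            continue
        if source not in ipaddress.ip_network(rule.cidr):
            continue
        return "ACCEPT" if rule.allow else "REJECT"
    return "REJECT"  # implicit deny when no rule matches


TCP, ICMP = 6, 1
# Hypothetical inbound rules mirroring the setup described above.
inbound = [
    Rule(90, TCP, 80, "0.0.0.0/0", allow=False),    # deny HTTP
    Rule(100, TCP, 22, "0.0.0.0/0", allow=True),    # allow SSH
    Rule(110, ICMP, None, "0.0.0.0/0", allow=True)  # allow ICMP
]

client = "203.0.113.10"  # documentation address, not the author's IP
print("HTTP:", evaluate(inbound, client, TCP, 80))
print("SSH: ", evaluate(inbound, client, TCP, 22))
print("ICMP:", evaluate(inbound, client, ICMP, None))
```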

The IP address from which I will generate the ICMP/SSH/HTTP traffic is the following one:

After a few pings, SSH, and HTTP connection attempts, here is the content of the log stream generated for the network interface attached to the EC2 instance in the public subnet:

As you can see, the flow logs show the private IPv4 address assigned to this EC2 instance, not the public one. This is how traffic is recorded in VPC flow logs (the instance is tracked via its private IPv4 address because an EC2 instance might or might not have a public IPv4 address, but it will always have a private one). This type of logging also captures the traffic between EC2 instances and subnets of the same VPC.

The above logs cover just one minute (18:25 – 18:26).

Using CloudWatch Logs Insights, you can get the above information in a more readable format.

For instance, you can find how many rejections there were from each IP. In the above log snapshot, it’s easy because there are only a few lines, but imagine that you need an overview of a long time interval during which thousands of hosts could be trying to access your VPC.

There are a few predefined queries for various types of logs:

Here is a better way to display how many rejections there were for each IP for the same 1-minute interval. For my IP, there were three:
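The same aggregation can be reproduced locally on exported records, which is a handy way to check what a query is doing. The sketch below counts REJECT records per source address, assuming the default 14-field record format. The query string in `INSIGHTS_QUERY` is modeled on the sample query the Logs Insights console offers for VPC flow logs and should be treated as a starting point; the sample records and addresses are invented and are not the logs from this exercise.

```python
from collections import Counter
from typing import Iterable

# Logs Insights query with the same intent (verify it in the console).
INSIGHTS_QUERY = """
filter action="REJECT"
| stats count(*) as numRejections by srcAddr
| sort numRejections desc
| limit 20
"""


def rejections_by_source(lines: Iterable[str]) -> Counter:
    """Count REJECT records per source address (default record format)."""
    counts: Counter = Counter()
    for line in lines:
        parts = line.split()
        # Field 4 is the source address, field 13 is the action.
        if len(parts) == 14 and parts[12] == "REJECT":
            counts[parts[3]] += 1
    return counts


def sample_line(src: str, sport: int, dport: int, action: str) -> str:
    # Builds an invented TCP record aimed at a placeholder private address.
    return (f"2 123456789012 eni-0a1b2c3d4e5f67890 {src} 10.0.1.25 "
            f"{sport} {dport} 6 1 60 1600000000 1600000060 {action} OK")


records = [
    sample_line("203.0.113.10", 51000, 80, "REJECT"),
    sample_line("203.0.113.10", 51001, 80, "REJECT"),
    sample_line("203.0.113.10", 51002, 80, "REJECT"),
    sample_line("198.51.100.7", 40000, 80, "REJECT"),
    sample_line("203.0.113.10", 52000, 22, "ACCEPT"),
]
for address, count in rejections_by_source(records).most_common():
    print(address, count)
```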

Another useful thing is that I can see how much traffic was exchanged between my VPC and various IP addresses. This is for a 25-minute interval during which I was intermittently pinging the EC2 instance from my laptop while also pinging the Google DNS server from the EC2 instance:
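A local equivalent of this "bytes per address pair" view is sketched below. It sums the bytes field for each (source, destination) pair, again assuming the default 14-field record format and skipping records that carry no byte count. The records, byte counts, and the laptop and instance addresses are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def bytes_by_pair(lines: Iterable[str]) -> Dict[Tuple[str, str], int]:
    """Sum transferred bytes per (source, destination) address pair."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for line in lines:
        parts = line.split()
        # Field 10 is the byte count; NODATA records carry "-" there.
        if len(parts) != 14 or not parts[9].isdigit():
            continue
        totals[(parts[3], parts[4])] += int(parts[9])
    return dict(totals)


def icmp_line(src: str, dst: str, packets: int, nbytes: int) -> str:
    # Builds an invented ICMP record (protocol 1, ports reported as 0).
    return (f"2 123456789012 eni-0a1b2c3d4e5f67890 {src} {dst} 0 0 1 "
            f"{packets} {nbytes} 1600000000 1600000060 ACCEPT OK")


LAPTOP, INSTANCE, DNS = "203.0.113.10", "10.0.1.25", "8.8.8.8"
records = [
    icmp_line(LAPTOP, INSTANCE, 5, 420),
    icmp_line(INSTANCE, LAPTOP, 5, 420),
    icmp_line(INSTANCE, DNS, 10, 840),
    icmp_line(LAPTOP, INSTANCE, 3, 252),
]
totals = bytes_by_pair(records)
for pair, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(pair, total)
```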

If you want this information in graphical form, you can create CloudWatch dashboards that display the information you are interested in.

This is how a dashboard is created:

Then, from the metrics section, choose Logs:

And one of the predefined metrics, like IncomingBytes:

Then the above metric can be displayed on the dashboard. This is what the dashboard looks like after I added the metric related to the volume of logs received for the VPC flow log:
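A dashboard like this can also be described as a JSON document. The sketch below builds a dashboard body with a single metric widget for the `IncomingBytes` metric of one log group. It only produces the JSON, so it runs without AWS access; the helper name, log group name, region, and widget size are assumptions for the example.

```python
import json


def build_dashboard_body(log_group_name: str, region: str) -> str:
    """Return a dashboard body with one IncomingBytes widget for a log group."""
    widget = {
        "type": "metric",
        "x": 0,
        "y": 0,
        "width": 12,
        "height": 6,
        "properties": {
            # [namespace, metric name, dimension name, dimension value]
            "metrics": [
                ["AWS/Logs", "IncomingBytes", "LogGroupName", log_group_name]
            ],
            "period": 300,
            "stat": "Sum",
            "region": region,
            "title": f"IncomingBytes for {log_group_name}",
        },
    }
    return json.dumps({"widgets": [widget]})


# Placeholder log group name and region.
body = build_dashboard_body("vpc-flow-logs-demo", "us-east-1")
print(body)

# With boto3 installed and credentials configured, the body could be
# uploaded along these lines (not executed here):
#   import boto3
#   boto3.client("cloudwatch").put_dashboard(
#       DashboardName="vpc-flow-logs-demo", DashboardBody=body)
```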

Other types of metrics can be used to create dashboards. This is another dashboard where specific EC2 metrics are displayed (in this particular case, these metrics are from the EC2 instance in the public subnet of the VPC):

This concludes the article on Amazon CloudWatch, which can help you monitor AWS resources and perform troubleshooting.

Amazon CloudWatch can receive logs from different sources and then present those logs in a useful and easily readable format.
