How to Integrate Firewalls with the AWS Transit Gateway (TGW)

I’ve been getting a fair number of questions from my customers about the AWS routing construct called the Transit Gateway, and lately in-line filtering and deep packet inspection have come up in discussions about modeling new cloud implementation strategies that fit the new transit model. Since the capacity for intercloud routing has grown so quickly, conversations naturally turn to egress control, where two camps have emerged: distributed and centralized. To get a clear understanding of which architectures are relevant for a given customer, I have to ask:

Why do you want to put your firewalls in the cloud?

I usually get answers in the neighborhood of the following:

  • My VPCs need independent internet connectivity, not tied to on-prem firewalls, which includes ingress and egress to the internet.
  • My mission critical applications are moving to the cloud so they require localized security controls.
  • My cloud has become the entry point for 3rd party vendors and customer sites connecting to the cloud.
  • My entire internet-working philosophy is becoming cloud-centric and I need a plan to integrate in-line inspection into the cloud stack. As of now, there is no logical point of governance or strategy behind the deployment and enablement of my cloud operations.
  • I don’t know how much stuff I have in the cloud and I am scared.

Nothing gets the morning started better than subscribing to the latest PAN-OS AMI in the AWS marketplace, but before we get ahead of ourselves, we need to look at all of the scenarios that a given organization may have executed on to get a firewall into the cloud datapath. They are described in ascending order of innovation…

1. The Empire Strikes Backhaul

In the long-held, traditional approach of inspecting all traffic to and from everything, a die-hard practice persists to this day where everything is brought from remote sites back into the corporate data center before being extended out into the internet. This is done primarily because the firewall has long been thought of as a hardware appliance that takes up four U’s of space in the rack and is the solution to all of your DPI needs.

So you bring all traffic back down to the data center and use a behemoth of a firewall to examine all your cloud traffic in-line. Effective, but maybe not the most efficient way to solve these security issues: you will be introducing a significant amount of latency to your connections, there are only so many tunnels you can build from the data center to the cloud, and a direct line to your cloud provider will get expensive if you opt for multiple 10Gb lines. It is possible to aggregate several of them at the handoff point, but the added complexity can cost you extra troubleshooting time.

2. Putting firewalls in every VPC

Security in every spoke is an admirable philosophy. And sometimes there can be simplicity in replication, especially if you are working from a position of complete security coverage on every packet of traffic that traverses your cloud. If you have ever had a policy that requires you to have a firewall in every outbound datapath that leaves from a VPC, headed to the internet, you have seen the costs of your firewall licensing scale right along with your business.

The shortlist of tasks would be: deploy the firewalls, license them, configure the interfaces, and create policies that limit application and data traffic flows as appropriate for each segment, user, server and application. This seems manageable in concept when you have only one or two VPCs, but what happens when you have 10? 20? More? If you’re not using Panorama or Cisco Firepower to tackle these management tasks, your network engineers will be stretched very thin. Not to mention the licensing costs of a new virtual appliance every time your network expands.

3. Multiple VPCs connecting to a Shared Virtual Firewall via IPSec Tunnels

Attempting to centralize security services in a shared VPC can be a way to go, and it is a strategy that we have documented previously; we referred to it as the Shared Security VPC. This is a great option for those whose cloud infrastructure is small and manageable by just one cloud engineer who has mastered the art of IPSec tunnel management. With only one virtual firewall in the cloud (two if you go HA), this is a strategy that can save money on licensing and lower management overhead. But beware the solutions that you build in your garage: they can suddenly be promoted to production!

If you are in any position that might scale beyond this type of implementation, this is something that you will need to steer clear of. Remember, each of these IPSec tunnels is limited to 1.25 Gbps, and that is just to the termination point at the firewall. If you start to connect more than a few of those up, the performance of your firewall is going to tank like a blue chip on Black Tuesday. Should this become an important connection supporting critical infrastructure, the fact that the connections depend on your IPSec tunnels will make having one on standby a little tricky.
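To put rough numbers on that ceiling, here is a back-of-the-envelope sketch in Python. The per-tunnel figure is the 1.25 Gbps maximum that AWS documents for a Site-to-Site VPN tunnel; the firewall throughput of 5 Gbps and both function names are hypothetical, chosen only for illustration, so plug in the rated figure for your own appliance.

```python
def offered_load_gbps(spoke_vpcs: int, tunnel_gbps: float) -> float:
    """Worst-case load on the shared firewall if every spoke fills one tunnel."""
    return spoke_vpcs * tunnel_gbps


def spokes_until_saturation(firewall_gbps: float, tunnel_gbps: float) -> int:
    """How many fully loaded tunnels the firewall can absorb before it saturates."""
    return int(firewall_gbps // tunnel_gbps)


TUNNEL_GBPS = 1.25    # documented AWS Site-to-Site VPN per-tunnel maximum
FIREWALL_GBPS = 5.0   # hypothetical appliance throughput, for illustration only

if __name__ == "__main__":
    for spokes in (2, 4, 8, 16):
        load = offered_load_gbps(spokes, TUNNEL_GBPS)
        status = "ok" if load <= FIREWALL_GBPS else "oversubscribed"
        print(f"{spokes:>2} spokes -> up to {load:.2f} Gbps offered ({status})")
    print("Saturation point:",
          spokes_until_saturation(FIREWALL_GBPS, TUNNEL_GBPS), "busy spokes")
```

Under these assumptions, a handful of busy spokes is enough to oversubscribe the shared firewall, and no single spoke can ever exceed the per-tunnel limit no matter how large the appliance is.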

4. Running an HA pair of Firewalls for Centralized Egress through the TGW

Now with the release of the TGW, the modern hybrid networking paradigm is shifting into an era of positively exciting scalability. Because of this new interconnectivity made possible from advancement in AWS native architecture, new demands are being placed on cloud solution and network engineers to scale out the availability of their firewalls as well.

What we are seeing from the top advisors within the public clouds is a move back to a centralized model that makes the best use of the new transit service layer. This essentially recreates the Shared Security VPC and allows you to capitalize upon how the TGW easily manages VPC and VPN connections in one layer allowing you to direct traffic to on-prem, or to the internet, or both. You will still run into the BGP routing limitation and you will have to manually fill out your VPC spoke routing tables without the TGW orchestrator, but this is still a good start. The issues do not arise in this scenario until you try to move this into an HA implementation. Take a look at the graphic below:

At a glance, this looks like a great way to run a pair of firewalls across AZs to build yourself a centralized egress security VPC in HA, but if you take a closer look at the mechanics of the networking components used to build this configuration, you will understand why this is not an optimized solution.

The Transit Gateway that sends spoke traffic from VPC1 destined for the internet does not include any stateful monitoring or load balancing functions, and it will route all traffic to the first entry of the first attachment that matches the routing rule for the traffic. It does not know when a firewall appliance has gone down, so it cannot update its own routing tables in response. You can read a detailed analysis of the limitations of this design pattern here.
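The failure mode is easier to see in a toy model. The sketch below is not the AWS API; it is a simplified Python model of a route table that holds only static routes and has no health checking. The class, the function, and the names fw-a and fw-b are all hypothetical.

```python
import ipaddress


class StaticRouteTable:
    """Toy route table: static routes only, no awareness of next-hop health."""

    def __init__(self):
        self.routes = {}  # ip_network -> next-hop name

    def add_route(self, cidr, next_hop):
        self.routes[ipaddress.ip_network(cidr)] = next_hop

    def lookup(self, dst):
        """Longest-prefix match; returns the next-hop name or None."""
        addr = ipaddress.ip_address(dst)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


def forward(table, healthy, dst):
    """Forward a packet; the table is never consulted about firewall health."""
    hop = table.lookup(dst)
    if hop is None:
        return "no route"
    if not healthy.get(hop, False):
        return f"dropped at {hop}"
    return f"forwarded via {hop}"


if __name__ == "__main__":
    table = StaticRouteTable()
    table.add_route("0.0.0.0/0", "fw-a")   # default route pinned to one firewall
    healthy = {"fw-a": True, "fw-b": True}

    print(forward(table, healthy, "203.0.113.50"))
    healthy["fw-a"] = False                # fw-a fails; the route table is unchanged
    print(forward(table, healthy, "203.0.113.50"))
```

In this model the standby firewall fw-b stays healthy and idle while traffic is dropped, because nothing rewrites the static route when fw-a fails. Something outside the route table has to detect the failure and repoint the route.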

5. Trying to Balance the Egress Load from the TGW using ECMP…

There is also another technique for creating a load balanced and highly available configuration, one of the latest recommendations directly from AWS: using ECMP from your TGW. ECMP (equal-cost multi-path) is a long-established routing technique that can aggregate different IPSec tunnels together, and in this scenario the power of multipath routing seems like it could be a viable answer for correcting the limitations of all the other designs we have seen so far…right?

Well… you can try it…

While in theory this will get you a little more bang for your TGW, you are still saddled with the bandwidth limitation of the IPSec tunnels that you are stretching across your TGW attachment. There is, however, one major setback associated with this topology that may not be apparent. Please take a look at the diagram below:

A request packet that comes from an interior VPC and exits the cloud through an inline firewall appliance after traversing one of the ECMP connections will reach its designated web server, located in a distant CDN. When the response packets attempt to return to the originating source, they can come in on the wrong firewall, which has no session state for the flow, and they are dropped. There is nothing within the request payload that will ensure the response comes back through the firewall the request went out on. Performance can degrade severely when connections like this are attempted but time out.
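The asymmetry can be sketched in a few lines of Python. This is a simplified model, not how any particular device hashes traffic: it assumes each direction picks its path independently by hashing that packet's own 5-tuple (CRC32 here, purely for illustration), and that each firewall only accepts replies for sessions it saw leave. All class and function names are hypothetical.

```python
import zlib


class StatefulFirewall:
    """Toy stateful firewall: accepts a reply only if it saw the request leave."""

    def __init__(self, name):
        self.name = name
        self.sessions = set()

    def outbound(self, flow):
        self.sessions.add(flow)  # record the session on the way out

    def inbound(self, reply):
        return reverse(reply) in self.sessions  # match reply to its request


def reverse(flow):
    """Swap source and destination of a (src, dst, sport, dport, proto) tuple."""
    src, dst, sport, dport, proto = flow
    return (dst, src, dport, sport, proto)


def ecmp_pick(flow, n_paths):
    """Pick a path by hashing the 5-tuple (illustrative hash, not a vendor's)."""
    key = "|".join(map(str, flow)).encode()
    return zlib.crc32(key) % n_paths


def round_trip(flow, firewalls, out_path, return_path):
    """Send a request out one firewall and bring the reply back through another."""
    firewalls[out_path].outbound(flow)
    ok = firewalls[return_path].inbound(reverse(flow))
    return "delivered" if ok else "dropped"


if __name__ == "__main__":
    firewalls = [StatefulFirewall("fw-a"), StatefulFirewall("fw-b")]
    dropped = 0
    total = 100
    for sport in range(40000, 40000 + total):
        flow = ("10.1.0.10", "203.0.113.50", sport, 443, "tcp")
        out_path = ecmp_pick(flow, len(firewalls))
        return_path = ecmp_pick(reverse(flow), len(firewalls))
        if round_trip(flow, firewalls, out_path, return_path) == "dropped":
            dropped += 1
    print(f"{dropped} of {total} flows came back on the wrong firewall")
```

Because the reply's 5-tuple has source and destination swapped, its hash is unrelated to the request's hash, so with two paths a sizable share of flows should land on a firewall that never saw the request. Every one of those is a connection that hangs until it times out.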

6. Using the Transit DMZ

With the advent of the TGW and the inherent limitations that come with it, we decided to hit the ground running and design a solution that would work with the TGW design model and also solve the problems that the industry has historically identified with integrating your firewall with the cloud. We have done this by decoupling the networking functions from the security functions and giving each the ability to scale independently.

Using our Transit DMZ model, we have created a solution that provides 10Gbps of throughput, rescues firewall performance, eliminates the need for IPSec terminations and offers you the ability to deploy in HA pairs that can scale out as needed. We have simplified the network, in network security.

Once you have this deployed, you will have integrated your Firewall with the Transit Gateway and localized the residency of your in-line network security functions to the cloud. You will have a cloud network that is easy to manage and easier to scale. For a more detailed description of the Transit DMZ, please visit our docs page here.
