Choosing the Right Transit Architecture for Public Cloud
When choosing a transit network architecture for your cloud environment, there are a few things we have learned that customers care about. The transit’s own ability to provide robust network connectivity is important; however, it is just one dimension of the end-state architectural requirement. Customers are often so focused on basic connectivity that they lose sight of the other dimensions. An analogy is building a house. You can’t focus solely on interior design; you also need to consider exterior design, garages, location, highway access, and the availability of basic resources like electricity, water, and sewer. If you build the perfect house but it sits under high-voltage power lines, it is not a good house. Similarly, your transit architecture needs a balanced approach in which all dimensions combine to produce a well-rounded architecture.
Now let’s look at some of the aspects you need to consider:
- Robust connectivity
- Scale-out transit with pod-like characteristics
- End-to-End network state awareness
- Ability to easily service chain (for example, insert a layer 7 firewall for inter-VPC traffic inspection)
- Operational visibility and troubleshooting
Robust connectivity
The transit should provide robust connectivity and be aware not only of the transit routing but also of VPC/VNET routing. It should be able to adapt to failures and updates in the architecture without manual intervention. For example, when new VPCs/VNETs or CIDRs are added, or a VPC/VNET is removed, the transit network should detect the change automatically and ensure that the transit routes reflect it.
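As a rough illustration of what "auto-aware" can mean in practice, the sketch below compares the CIDRs of the currently attached VPCs/VNETs with the routes the transit already holds and works out what to add and what to withdraw. It is a minimal sketch, not any vendor's implementation: the function name `reconcile_routes` and the dictionary shapes are assumptions made for this example.

```python
from ipaddress import ip_network


def reconcile_routes(attached_vpcs, transit_routes):
    """Compute the route changes needed to match the attached networks.

    attached_vpcs:  dict of VPC/VNET name -> list of CIDR strings (hypothetical shape)
    transit_routes: dict of CIDR string -> VPC/VNET name used as next hop
    Returns (to_add, to_remove): routes to install and CIDRs to withdraw.
    """
    # Desired state: every CIDR of every attached VPC/VNET points at that VPC/VNET.
    desired = {}
    for vpc, cidrs in attached_vpcs.items():
        for cidr in cidrs:
            desired[str(ip_network(cidr))] = vpc

    # Normalize the current table the same way so string comparison is reliable.
    current = {str(ip_network(cidr)): vpc for cidr, vpc in transit_routes.items()}

    # Install routes that are missing or point at the wrong next hop.
    to_add = {cidr: vpc for cidr, vpc in desired.items() if current.get(cidr) != vpc}
    # Withdraw routes whose CIDR no longer belongs to any attached VPC/VNET.
    to_remove = sorted(cidr for cidr in current if cidr not in desired)
    return to_add, to_remove


if __name__ == "__main__":
    attached = {"vpc-a": ["10.0.0.0/16", "10.1.0.0/16"], "vpc-c": ["10.3.0.0/16"]}
    current = {"10.0.0.0/16": "vpc-a", "10.2.0.0/16": "vpc-b"}
    print(reconcile_routes(attached, current))
```

Running a comparison like this on every attach, detach, or CIDR change is one way to keep the transit routes in step with the environment without a manual update.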
Transit network architecture should not assume that every VPC and VNET has the same security posture (i.e., that their routing tables are identical). A common example of this assumption is configuring 0.0.0.0/0 in every VPC to point to the transit. Experience shows that application environments (e.g., VPCs, VNETs) always have varying requirements: some will have public-IP-based subnets, some will need local egress to the internet, and some may have direct peering in addition to the transit connection. These are just some of the ways VPC/VNET postures may differ across your cloud environment. Painting all routes with the same brush will not result in a well-designed transit network.
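To make the point concrete, here is a small sketch of building a spoke route table from a per-VPC posture instead of one blanket default route. Everything in it is hypothetical and chosen for illustration: the `spoke_routes` function, the `posture` dictionary with its `egress` and `direct_peers` keys, and the next-hop labels.

```python
def spoke_routes(posture, transit_prefixes):
    """Build a route table for one VPC/VNET from its declared posture.

    posture: dict with optional keys (hypothetical for this sketch)
        "egress":       "local", "transit", or "none" (default "none")
        "direct_peers": dict of CIDR -> peering next hop that must be kept
    transit_prefixes: iterable of CIDR strings reachable through the transit
    """
    direct_peers = posture.get("direct_peers", {})
    routes = {}

    # Point only the known prefixes at the transit; a direct peering wins.
    for prefix in transit_prefixes:
        routes[prefix] = direct_peers.get(prefix, "transit")

    # The default route depends on the environment, not on a global rule.
    egress = posture.get("egress", "none")
    if egress == "local":
        routes["0.0.0.0/0"] = "internet-gateway"
    elif egress == "transit":
        routes["0.0.0.0/0"] = "transit"
    elif egress != "none":
        raise ValueError(f"unknown egress mode: {egress!r}")
    return routes


if __name__ == "__main__":
    web = {"egress": "local", "direct_peers": {"10.9.0.0/16": "peering-to-shared"}}
    print(spoke_routes(web, ["10.1.0.0/16", "10.9.0.0/16"]))
```

The design choice worth noting is that the default route is derived from each environment's posture, so a VPC with local egress or direct peering keeps those paths when it is attached to the transit.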
Scale-out transit with pod-like characteristics
One transit network is going to connect several pieces of the whole architecture, but it is not going to be the entire infrastructure. Your public cloud architecture will need multiple transit networks, whether you are in a single cloud or multiple clouds. For example, even in a single cloud and a single region, you will still need to provide isolation, control, security, and operational visibility between applications, departments, lines of business, etc. The transit network architecture should allow you to scale out and run multiple transits even within a single region, so that each transit network can provide the isolation and control that different applications and business units require. The same applies to multiple regions and multiple clouds. You need a transit network design with scale-out characteristics, where you can create multiple pods and provide connectivity between them as needed.
End-to-end network state awareness
The reality is that you will end up with multiple transit networks, whether in a single region, multiple regions, or multiple regions across multiple clouds. The right architecture will allow transit networks to be aware of each other and to adjust their connected networks based on what is happening in another part of the network, whether in the same or a different region or cloud.
For example, if you have two transits in the same region, one for Production and one for Development, workloads attached to these transits should be able to talk to each other natively. Another example is when you have multiple transit networks in multiple regions and you add a new VPC/VNET to one of the transits. The right architecture will ensure that the entire network has reachability to the new VPC/VNET without any manual intervention.
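The second example can be sketched as a simple propagation over peered transits: each transit learns the prefixes attached to every other reachable transit, along with the neighbor to forward through. This is a toy model under stated assumptions, not a real routing protocol. The `propagate` function, the transit names, and the data shapes are all invented for illustration, and every transit named in a peering is assumed to appear in `attachments`.

```python
from collections import deque


def propagate(attachments, peerings):
    """Compute, for every transit, the prefixes it can reach and the next hop.

    attachments: dict of transit name -> set of CIDRs attached locally
    peerings:    list of (transit_a, transit_b) pairs, treated as bidirectional
    Returns dict of transit -> {cidr: "local" or first-hop neighbor transit}.
    """
    neighbors = {transit: set() for transit in attachments}
    for a, b in peerings:
        neighbors[a].add(b)
        neighbors[b].add(a)

    tables = {}
    for origin in attachments:
        table = {cidr: "local" for cidr in attachments[origin]}
        seen = {origin}
        # Breadth-first search that remembers the first hop taken from origin.
        queue = deque((n, n) for n in sorted(neighbors[origin]))
        while queue:
            node, first_hop = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            for cidr in attachments[node]:
                table.setdefault(cidr, first_hop)
            for n in sorted(neighbors[node]):
                if n not in seen:
                    queue.append((n, first_hop))
        tables[origin] = table
    return tables


if __name__ == "__main__":
    attachments = {
        "prod-us": {"10.0.0.0/16"},
        "dev-us": {"10.1.0.0/16"},
        "prod-eu": {"10.2.0.0/16"},
    }
    peerings = [("prod-us", "dev-us"), ("prod-us", "prod-eu")]
    attachments["prod-eu"].add("10.3.0.0/16")  # a new VPC/VNET is attached
    print(propagate(attachments, peerings)["dev-us"])
```

Recomputing the tables after the new attachment gives `dev-us` a path to the new prefix through `prod-us`, which is the behavior the paragraph above asks of the architecture.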
Ability to easily service chain
Transit should support a robust and modular services architecture where additional services like Next-Gen Firewall, IPS, IDS, DPI, etc. can be inserted without re-architecting any aspect of your deployment. Most transit options available in public cloud today serve one main purpose, and that is connectivity. In most cases, you will have workloads that belong to different security domains/zones connected to the transit. Here connectivity is needed, but only after a next-gen firewall (NGFW) has done deep packet inspection on the traffic. The default transit options, such as AWS TGW or those in Azure, do not allow easy insertion of an NGFW. In most cases, insertion is either not possible, or the deployment considerably reduces performance and introduces complexities such as SNAT, which is impractical because it hides the source IP of the packet. In addition, scaling out to multiple parallel active/active firewalls will be challenging. When choosing a transit, the architecture should allow you to:
- Easily insert an NGFW without manual tinkering or manipulation of routes
- Remove the need for SNAT
- Maximize throughput performance
- Scale out to an active/active (A/A) firewall architecture
- Provide full visibility and control
- Steer inter-VPC/VNET, egress, ingress, and other traffic patterns to the NGFW
- Provide easy troubleshooting tools and visibility into packet flow
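One widely used way to reconcile "no SNAT" with "active/active scale-out" is symmetric flow hashing: both directions of a flow hash to the same firewall, so a stateful NGFW sees the whole conversation while the original source IP is preserved. The sketch below shows the idea only. It is not how any particular transit product steers traffic, and `pick_firewall` and the firewall names are hypothetical.

```python
import hashlib


def pick_firewall(src_ip, dst_ip, src_port, dst_port, protocol, firewalls):
    """Choose a firewall for a flow so that both directions pick the same one.

    firewalls: non-empty list of firewall identifiers (hypothetical names).
    """
    # Order the two endpoints so A->B and B->A produce an identical key.
    first, second = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{first[0]}:{first[1]}|{second[0]}:{second[1]}|{protocol}".encode()

    # A stable hash (unlike Python's built-in hash()) gives the same result
    # on every node and across restarts.
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(firewalls)
    return firewalls[index]


if __name__ == "__main__":
    pool = ["fw-1", "fw-2", "fw-3"]
    forward = pick_firewall("10.0.1.5", "10.1.2.9", 44321, 443, "tcp", pool)
    reverse = pick_firewall("10.1.2.9", "10.0.1.5", 443, 44321, "tcp", pool)
    print(forward, reverse, forward == reverse)
```

A plain modulo over the pool size remaps many flows when a firewall is added or removed; a production design would more likely use consistent hashing or per-flow state to limit that disruption.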
Operational visibility and troubleshooting
Last, but not least, operational visibility and troubleshooting capabilities are among the most important aspects when choosing a transit architecture. The ability to leverage familiar tools like ping, traceroute, and packet capture will save you time and money every day. Not having visibility into routing, state, packets, connectivity, dynamic network maps, etc. will increase mean time to resolution for every issue you encounter. The cost of maintaining the network will keep growing, and the ROI will fall short of expectations. Most importantly, your top-tier engineers and architects will end up troubleshooting most level 1 issues, because the first level of support does not have the skill set to troubleshoot native cloud constructs. Hence, you must give level 1 support teams an abstracted view of cloud constructs that makes sense to them and hides the complexities and differences of each cloud. You must provide the familiar troubleshooting toolset they have been relying on for years. Lastly, you need to give them an easy way to pull complicated configurations in a simplified way.
Choosing the right transit network architecture is arguably the most important part of your cloud network architecture. The objective of this article was to give you ways to analyze the options available to you in the marketplace and to provide the basis for a transit network design that will future-proof your cloud environment.
In my opinion, the two key attributes you need in any transit architecture are:
- A repeatable architecture, be it single-cloud or multi-cloud.
- Operational visibility, control, and troubleshooting capabilities that don’t require deep cloud knowledge.