How To Address Limitations of Cloud-Native Security Groups When Creating a Granular Security Policy

Security Groups and Network Security Groups (NSGs) are often the primary network security construct used to secure workloads and applications in the cloud. In concept, these controls can create a very effective security posture, applying Zero Trust principles to place granular restrictions on how different workloads communicate via ports and protocols. In practice, scale limitations make least-privilege access very difficult to deploy. “Overly permissive security groups” is one of the most common CSPM findings.

A primary reason this breaks down is the way policies are defined, together with the associated quota limits. Security Groups, NSGs, and ASGs often work well within a VPC/VNET, but break down as soon as policy must cover traffic that leaves the VPC/VNET.

In AWS, for example, the default quota is 60 inbound and 60 outbound rules per security group. Reference.

There are also other restrictions that we need to be aware of:

  • The source or destination of a security group rule can be another security group, a single contiguous CIDR, or a “prefix list”, which is a pre-defined list of CIDRs.
  • Security groups used as a source or destination must be in the same VPC.
  • The port must be a single port or a contiguous range of ports (e.g., 1000-2000).
  • If either the ports or the CIDRs are not contiguous, they must be represented as multiple rules.
  • A workload can be a member of multiple security groups, but the cumulative number of rules counts against a single quota for the instance.
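To make the effect of these restrictions concrete, here is a minimal Python sketch of how non-contiguous sources and ports multiply into separate rules. The function name, the addresses, and the port ranges are hypothetical and chosen only for illustration; the quota value of 60 is the default figure cited above, and your account's actual limit may differ.

```python
from itertools import product

# Default rule quota cited in this article; verify the real limit for your account.
DEFAULT_RULE_QUOTA = 60


def expand_rules(sources, port_ranges, protocol="tcp"):
    """Return one rule per (source CIDR, contiguous port range) pair."""
    return [
        {"source": src, "protocol": protocol, "from_port": lo, "to_port": hi}
        for src, (lo, hi) in product(sources, port_ranges)
    ]


# Hypothetical inputs: 25 non-contiguous client addresses and 3 non-contiguous port ranges.
sources = [f"10.1.0.{host}/32" for host in range(10, 60, 2)]
port_ranges = [(443, 443), (3306, 3306), (8000, 8100)]

rules = expand_rules(sources, port_ranges)
print(f"{len(rules)} rules needed against a quota of {DEFAULT_RULE_QUOTA}")
```

Because every source must be paired with every port range, the rule count grows multiplicatively, so even a modest number of clients can exceed the default quota.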

Let’s use an example to see whether the default quota is sufficient. Assume that you have a 3-tier application where the front-end is a load-balancer, the app tier is a set of EC2 instances, and the database is PaaS. Let’s just look at the security group for the database tier as it houses the data and is one of the most sensitive workloads. With all of these services in the same VPC, we can create basic policies such as:

Name: app-to-db, SRC: app-sg, PROTO: TCP, PORT: 3306

The policy is efficient because we’re not using any IP addresses. It seems simple, and we’re nowhere near our quota. But now imagine that the database is a shared service that multiple applications access, and that it lives in a different VPC from the app tiers. It also needs to be accessed by administrators from a set of jump hosts. Because many of these systems live in different VPCs, we can no longer write the rules with a security group as the source. In addition, workloads are assigned arbitrary IP addresses from the subnet when they are provisioned, so their addresses are unlikely to be contiguous. The resulting policy will look like this:

Name: app1-vm1-to-db, SRC: 10.1.0.9, PROTO: TCP, PORT: 3306

Name: app1-vm2-to-db, SRC: 10.1.0.11, PROTO: TCP, PORT: 3306

Name: app1-vm3-to-db, SRC: 10.1.0.14, PROTO: TCP, PORT: 3306

Name: app2-vm1-to-db, SRC: 10.2.0.29, PROTO: TCP, PORT: 3306

Name: app2-vm2-to-db, SRC: 10.2.0.44, PROTO: TCP, PORT: 3306

Name: adminjump-to-db, SRC: 10.4.0.15, PROTO: TCP, PORT: 3306
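As a quick check that these sources really cannot be summarized, the sketch below uses Python's standard ipaddress module to collapse the six addresses from the rules above into the fewest possible CIDRs. The contiguous counter-example (10.1.0.8 through 10.1.0.11) is a hypothetical addition for comparison only.

```python
import ipaddress

# Source addresses taken from the example rules above.
sources = ["10.1.0.9", "10.1.0.11", "10.1.0.14", "10.2.0.29", "10.2.0.44", "10.4.0.15"]
networks = [ipaddress.ip_network(f"{ip}/32") for ip in sources]

# collapse_addresses merges adjacent networks into the smallest covering set of CIDRs.
collapsed = list(ipaddress.collapse_addresses(networks))
print(len(collapsed), "CIDRs needed for the scattered addresses")

# Hypothetical counter-example: four contiguous, aligned addresses merge into one CIDR.
contiguous = [ipaddress.ip_network(f"10.1.0.{host}/32") for host in range(8, 12)]
merged = list(ipaddress.collapse_addresses(contiguous))
print(merged)
```

None of the six addresses are adjacent, so each one still needs its own rule, whereas the contiguous block would fit in a single rule.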

Now imagine a scenario where we’re trying to use Security Groups to secure outbound traffic to the Internet: third-party APIs, Linux repos (updates.ubuntu.com), code repos (github.com), etc. This is one reason why Security Groups allow all outbound traffic by default. It’s easy to see how these real-world changes to the application architecture require significant manual maintenance and can easily hit the quota limits. Security Groups struggle any time traffic leaves the VPC.

While the examples above are specific to AWS, Azure and GCP have similar challenges. In Azure, Application Security Groups help solve the dynamic source/destination problem, but can only be used within a VNET. NSGs have higher rule scale quotas, but have more restrictive quotas in terms of the number of NSGs allowed per subscription. Changes to Azure rules can take a significant amount of time to propagate and apply to the workloads. In GCP, tags can be used with the VPC firewall, but the architecture of the global VPC aligns closely to single-VLAN designs on-prem and policy can quickly become complex without network-level isolation.

The Peering to Transit Migration Problem

With AWS, Security Groups could reference other Security Groups if they were in a peered VPC. With the move to transit architectures, however, customers are put in the challenging position of evolving their network architecture while degrading their security posture.

How to Overcome Cloud-Native Security Group Limits

Aviatrix solves this problem by providing dynamic security built into the network architecture with the Aviatrix Distributed Cloud Firewall and Aviatrix Secure Egress solutions.

Let’s start with Egress. For Internet-bound traffic, Aviatrix Secure Egress provides comprehensive control with minimal change to the network architecture. Aviatrix Secure Egress is a 1:1 replacement for cloud-native NAT gateways, but with added visibility and security capabilities. Unlike legacy NAT instances, Aviatrix Gateways are managed in a PaaS-like model with fleet-wide visibility, vertical and horizontal autoscaling, one-click non-disruptive upgrades, and centrally managed policies. As soon as it is deployed, the solution provides comprehensive visibility into outbound Internet traffic and standard Threat Prevention capabilities that block communications to known malicious IP addresses such as Command and Control servers. Aviatrix Secure Egress then baselines which domains a workload or VPC/VNET is communicating with, and can recommend and then enforce a Zero Trust egress policy. Built-in Deep Packet Inspection with signature-based threat detection, leveraging the industry-standard Suricata IDS engine, is currently in early-access preview.

For inter-VPC traffic, Aviatrix Distributed Cloud Firewall can secure communications between VPCs. The source and destination of Distributed Cloud Firewall policies are dynamic “SmartGroups” that leverage cloud-native tags and attributes to keep policy continuously up to date as workloads change their IPs and scale horizontally. Aviatrix Distributed Cloud Firewall scales significantly higher than cloud-native controls and excels at enforcing policies between VPCs/VNETs. It maintains the simplicity of security groups in that the source and destination can be dynamic, with no need to update them as workloads change, as long as they keep the same tags and attributes.
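The general idea of tag-based dynamic membership can be sketched in a few lines of Python. This is a generic illustration only, not the Aviatrix implementation or API: the Workload class, the resolve_group function, the tag keys, and the inventory are all hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    ip: str
    tags: dict


def resolve_group(selector, inventory):
    """Return the IPs of workloads whose tags match every selector key/value."""
    matches = (
        w.ip for w in inventory
        if all(w.tags.get(key) == value for key, value in selector.items())
    )
    return sorted(matches, key=ipaddress.ip_address)


# Hypothetical inventory and selector, for illustration only.
inventory = [
    Workload("app1-vm1", "10.1.0.9", {"app": "app1", "tier": "app"}),
    Workload("app1-vm2", "10.1.0.11", {"app": "app1", "tier": "app"}),
    Workload("adminjump", "10.4.0.15", {"role": "jump"}),
]
app1_selector = {"app": "app1", "tier": "app"}

before = resolve_group(app1_selector, inventory)

# The app tier scales out; the policy definition (the selector) is left untouched.
inventory.append(Workload("app1-vm3", "10.1.0.14", {"app": "app1", "tier": "app"}))
after = resolve_group(app1_selector, inventory)

print(before)
print(after)
```

The policy refers only to the selector, so the new instance is covered as soon as it carries the matching tags, without anyone adding a per-IP rule.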