I recently reviewed one of our sales engagements with a Fortune 500 digital services company based in the San Francisco Bay Area. Of course, networking and security in cloud emerged as one of their biggest pain points. No surprise there.
The more I studied the engagement, the more I realized there was nothing extraordinary in the customer testimony at all. It played out entirely as expected. I’ll summarize their approach for you.
When Does the Time Clock Start?
Their first networking security pattern was to “lift and shift” their firewall into a hub VNet (or VPC for AWS and GCP), peer their VNet spokes to this secure hub, then direct inter-VNet and outbound traffic into the firewall with static routes. Security groups were then used inside each VNet to provide microsegmentation between subnets and virtual instances.
Sound familiar? This “secure hub-and-spoke” or “virtual DMZ” architecture has become the de facto standard for cloud network and security design. I should know, as I was one of the first cloud architects to help proliferate it during my career at Microsoft Azure. More on that later. My point, for now, is that this customer did everything right because they were following all the documented best practices of the CSPs themselves.
Yet, despite the virtual DMZ being considered a “best practice” design among enterprises and CSPs alike, it is not perfect. In fact, this design will eventually become a ticking time bomb in your cloud network, causing a host of costly problems when it explodes. Unfortunately, this customer had experienced all the big issues: rising costs, scale challenges, growing complexity, and a painful lack of agility.
Warning Signs for High Stakes
Why does this pattern suddenly detonate for so many customers? What are the warning signs and the issues, and how do you fix them? Or better yet, how do you avoid sitting on a time bomb altogether?
To answer these questions, let’s review the virtual DMZ pattern across four categories: cost, scale, complexity, and agility. These categories are ultimately all connected, but to fully answer the questions above, we need to carefully dissect each category. This will take time, my friends, because we have a lot of yarn to unravel.
Thus, I’d like to stretch out, dig deep, and address this topic across a set of four articles that can be consumed over time. The focus of this first article will be on the category of cost. This is probably the most important piece of the puzzle, because ultimately, in cloud, everything ends up being about cost – doing more with less, rapid innovation, freedom of experimentation, removing tech-debt, and quickly hitting economies of scale.
Show Me the Money
The most expensive infra component in the virtual DMZ model is its beating heart – the virtual firewall. In this design, it performs a host of critical functions, such as routing, stateful security, decryption, IDS/IPS, logging, and so forth. Notice that this firewall is tasked with doing many things, some of which on-prem firewalls don’t typically do. Also note that it is centralized and that it has been transplanted wholesale from the data center network into the cloud network.
Knowing that the virtual firewall is an expensive piece of the puzzle, traditional firewall vendors often discount their virtual SKUs during the total contract renewal phase. This makes sense to most customers, right? Why not keep your security stack consistent, and get a great discount while you’re at it?
Unfortunately, this sales strategy comes with hidden costs that don’t surface until further along in the cloud journey. First, it is common to bundle discounts into 3- or 5-year renewal contracts. Second, discounts often apply to perpetual licenses, and not to utility billing SKUs. Third, and most importantly, security teams are ultimately held responsible for the success of this purchase.
Public Cloud Customers: Read On
You might be asking yourself why any of these three points are concerning or noteworthy. In the traditional world of on-prem, these sales tactics usually break in the customer’s favor. In public cloud, however, they are decisively not in the customer’s best interest, and here’s why. Let’s take it one point at a time.
Locking into a long-term contract in public cloud creates a host of problems. It is very difficult to predict performance outcomes here, because of the fundamental differences between public cloud networks and bare metal or private cloud networks. One key difference is that public cloud networks are based on layer 3 flows (routed SDN), not layer 2 frames (switched SDN). The flow-based limits of each virtual instance are strictly enforced by the CSPs to keep their massive fabrics stable, and these limits tend to be absolute regardless of the type or stripe of virtual instance you deploy.
This means that as your virtual DMZ receives more traffic, your virtual firewall can be sitting pretty at 50% CPU and memory use while its network stack (which the CSP controls) is completely overwhelmed, forcing you to deploy yet more firewalls. However, your pricing may be locked in based on your existing set of licenses, and now those extra licenses you need will be dear indeed. Couple this with the fact that application growth in public cloud can be fast, unruly, and unpredictable, and the stage is set.
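To make that mismatch concrete, here is a minimal sizing sketch in Python. It is not any vendor's or CSP's sizing method: the function name `firewalls_needed`, the per-instance flow limit, and every number below are hypothetical, chosen only to show how a CSP-enforced flow ceiling, rather than CPU, can dictate how many firewall instances you have to license.

```python
import math


def firewalls_needed(active_flows, flow_limit_per_instance,
                     cores_needed, cores_per_instance):
    """Return (required, by_flows, by_cpu) firewall instance counts.

    by_flows: instances needed to stay under the per-instance flow limit.
    by_cpu:   instances needed to supply the CPU cores the workload uses.
    required: the larger of the two, which is what must actually be deployed.
    """
    by_flows = math.ceil(active_flows / flow_limit_per_instance)
    by_cpu = math.ceil(cores_needed / cores_per_instance)
    return max(by_flows, by_cpu), by_flows, by_cpu


# Hypothetical workload; all figures are made up for illustration only.
required, by_flows, by_cpu = firewalls_needed(
    active_flows=1_200_000,           # concurrent flows through the hub
    flow_limit_per_instance=500_000,  # assumed CSP-enforced ceiling per instance
    cores_needed=8,                   # CPU the inspection workload consumes
    cores_per_instance=16,            # CPU available on each firewall instance
)
print(f"CPU alone needs {by_cpu}, flow limits need {by_flows}, deploy {required}")
```

In this made-up case, a single instance would run at half its CPU, yet the flow ceiling still forces three instances, and three licenses.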
Cost or Control?
Which brings us to the second point, the pain of perpetual licenses in public cloud. One of the attractions of public cloud is the consumption-based model, which is often found in cloud-native PaaS and SaaS solutions. These solutions offer superior scale and agility but can cause headaches for enterprises due to skill gaps or feature gaps.
Often, enterprises side with the familiar and go for the “lift and shift” approach, which is still built around perpetual licensing. This causes a lot of cost overruns if the virtual DMZ is either under- or over-used. For this reason, traditional security vendors are now offering PaaS-based or managed solutions that run the same platform in cloud as their on-prem cousins. But here, control and visibility are sacrificed. Many find themselves stuck between the proverbial rock and a hard place.
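The under- and over-use problem is simple arithmetic, sketched below in Python. The helper names and the license price are hypothetical placeholders, not real vendor pricing; the point is only that a fixed license count loses money in both directions.

```python
def idle_license_spend(licensed, in_use, price_per_license):
    """Money tied up in licenses that sit unused (under-used DMZ)."""
    return max(licensed - in_use, 0) * price_per_license


def license_shortfall(licensed, in_use):
    """Extra licenses that must be bought once demand outgrows the contract."""
    return max(in_use - licensed, 0)


PRICE = 10_000  # hypothetical price per perpetual license

# Under-used: four licenses bought, one firewall actually needed.
print(idle_license_spend(licensed=4, in_use=1, price_per_license=PRICE))

# Over-used: four licenses bought, six firewalls needed.
print(license_shortfall(licensed=4, in_use=6))
```

With a consumption-based SKU, both quantities would track actual usage instead of the number fixed at contract time.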
Designing for Scale
Finally, while it initially makes sense for security teams to manage the virtual firewall, many enterprises end up regretting this choice. This is because they overlook the routing complexity of the virtual DMZ model. Cloud fabrics do not natively speak dynamic routing protocols, and even if they did, many traditional firewalls do not have advanced routing stacks. Thus, static routes built on CSP logic must be used. This is simple enough for small scale designs, but as the cloud footprint grows, the overhead and complexity of this arrangement can become untenable for most security teams.
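A rough way to see how that overhead grows is to count the static routes. The Python sketch below assumes a worst case in which spoke prefixes cannot be summarized, so every spoke route table carries a default route plus one route per other spoke, all pointing at the firewall. The spoke names, prefixes, and firewall address are hypothetical, and real CSP route tables have their own limits and behaviors that this does not model.

```python
def build_spoke_route_tables(spoke_prefixes, firewall_ip):
    """Build one static route table per spoke, steering traffic to the firewall.

    Each table gets a default route plus one route for every *other* spoke
    (worst case: no summarization of spoke prefixes).
    """
    tables = {}
    for name in spoke_prefixes:
        routes = [("0.0.0.0/0", firewall_ip)]  # outbound traffic
        routes += [
            (prefix, firewall_ip)              # inter-spoke traffic
            for other, prefix in spoke_prefixes.items()
            if other != name
        ]
        tables[name] = routes
    return tables


def total_routes(tables):
    """Total number of static route entries someone has to maintain."""
    return sum(len(routes) for routes in tables.values())


def make_spokes(count):
    """Generate hypothetical spokes named spoke-0.. with 10.N.0.0/16 prefixes."""
    return {f"spoke-{i}": f"10.{i}.0.0/16" for i in range(count)}


FIREWALL_IP = "10.255.0.4"  # hypothetical firewall address in the hub

for count in (3, 10, 50):
    tables = build_spoke_route_tables(make_spokes(count), FIREWALL_IP)
    print(count, "spokes ->", total_routes(tables), "static routes")
```

Under these assumptions the entry count grows with the square of the number of spokes, and every new spoke means touching every existing route table.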
IT organizations realize, some too late, that it takes multiple teams working in concert to run a virtual DMZ architecture at scale: security teams, cloud networking teams, and WAN teams must all collaborate, often in real time, to pull off a successful change control window or a new deployment. All this collaboration comes at the very real cost of time and energy, something public cloud was supposed to alleviate.
Embracing Failure to Launch Future Success
This exact scenario played out with this particular F500 customer. Their cloud deployment grew far more quickly than they initially anticipated, causing their virtual DMZ time bomb to detonate, which cost them thousands of dollars.
First, they found that the 3-year contract model was expensive and inflexible against the scale requirements of public cloud. Second, while they realized they could benefit from a consumption-based model, they could not find a managed service that fit the existing requirements set by their customers. Third, they wasted countless hours managing the complexity of the virtual DMZ, as their security team took dependencies on other IT teams for deployment and change control.
Now, many IT teams have the experience and skills to perceive these three challenges, but they move forward anyway. Why? Because the mantra of moving to cloud, from the top down, is that it is the application that matters. The network should just be “good enough” for now, and then will be fixed later. But the problem is, “good enough” only works when you don’t have any business-critical apps to support. When customers are ready to bet the farm on cloud, “good enough” in the network or security space will eventually fail them, in one way or another.
Where’s the Emergency Exit?
Obviously, the next set of questions is: How do I avoid this situation? What really is “best practice” now? Well, some enterprises have embraced cloud-native solutions for networking and security, an important step forward in terms of both mindset and skill set.
However, the experience here can be “out of the frying pan, into the fire.” Managed services help address the cost issues above, but they do come with their own hidden costs, which usually fall into four buckets:
- They give end-users very little control and visibility
- They do not make charge-back simple or easy for multiple BUs or tenants
- Some of their features are still not enterprise-grade
- They are bespoke and do not work in other CSPs
So back to the rock and the hard place again. There has got to be a better way, right? Luckily, there is. Here, I am a strong advocate for embracing multicloud networking software (MCNS). A mature MCNS platform should offer the following advantages and capabilities:
1. It is deployed and managed like a PaaS, using automation and orchestration, but the customer owns the total solution.
Cost implications: Agility and scale are superior due to cloud-native fit and finish, yet the customer has full control and visibility of the network and all relevant policies within. Deployments can (and should) be automated and fit into existing CI/CD pipelines. Management overhead is reduced during all phases of the lifecycle.
2. It provides a consistent, uniform network that is cloud-agnostic and distributed.
Cost implications: Uniformity and consistency are key for cost savings at economies of scale, especially across multiple clouds. Distributed systems are a welcome departure from “lift and shift” point solutions in public cloud, helping to alleviate expensive bottlenecks while creating a far simpler scale motion.
3. Network security is intrinsic to the platform. It is agentless, dynamic, and programmable.
Cost implications: The “S” in MCNS means the platform is all software. If security is an inherent function of the network, then there is no need to deploy point solutions that are responsible for multiple functions. Security policy is decoupled from network policy; each can be scaled and managed programmatically and independently. Cross-team collaboration is enhanced via fine-grained RBAC and IAM controls.
4. The platform is modular and can be introduced in a controlled, methodical approach.
Cost implications: Scalable, modular platforms help decrease risk and increase cost savings. The platform can be used as needed for specific functions, like edge, intra-cloud transit, inter-cloud transit, or security, without having to deploy it all at once.
In summary, what customers need is a hybrid software platform that can leverage the best of what cloud has to offer (via cloud-native orchestration) along with enterprise-class control and visibility (via an intelligent, agentless data plane). If you have yearned for a solution that can go above and beyond the aging virtual DMZ model, I highly recommend looking at the quickly growing world of MCNS.
Next up, we will look at the virtual DMZ pattern in terms of scale implications. Stay tuned!