Availability Zones (AZs) – Utilizing the Capabilities of the AWS Global Network at the Near Edge

To understand the full benefits of using the AWS global backbone, we must start with the foundational units that make up the AWS cloud: AZs.

Clos (leaf/spine) topology

The physical network fabric in AZs is a fully Layer 3 Clos architecture, also known as a leaf/spine design. To simulate Layer 2 adjacency between EC2 instances in different parts of an AZ, AWS uses its own in-house Software-Defined Networking (SDN) platform.
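As a rough illustration of why a leaf/spine design suits a routed fabric, the following sketch models a small two-tier topology and counts the equal-cost paths between every pair of leaf switches. The switch names and counts are assumptions chosen for readability; the text does not describe the size or number of tiers in the actual AWS fabric.

```python
from itertools import combinations

# Assumed sizes for illustration only; real fabrics are far larger.
SPINES = [f"spine{i}" for i in range(4)]
LEAVES = [f"leaf{i}" for i in range(6)]

# In a leaf/spine fabric every leaf connects to every spine, and
# leaves never connect directly to one another.
LINKS = {(leaf, spine) for leaf in LEAVES for spine in SPINES}


def paths(src_leaf: str, dst_leaf: str) -> list:
    """Return every shortest path between two leaves (leaf -> spine -> leaf)."""
    return [
        [src_leaf, spine, dst_leaf]
        for spine in SPINES
        if (src_leaf, spine) in LINKS and (dst_leaf, spine) in LINKS
    ]


# Collect the distinct path counts across all leaf pairs.
path_counts = {len(paths(a, b)) for a, b in combinations(LEAVES, 2)}
print(path_counts)
```

In this model every pair of leaves is the same distance apart and has one path per spine, which is what allows a Layer 3 fabric to spread traffic evenly across equal-cost routes.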

This platform’s control and management planes work in tight coordination with ASICs on the network cards inside the physical servers hosting EC2 instances. These ASICs perform encapsulation and routing functions for the data plane at tremendous speed – something only custom-built hardware can do.

This means that within an AZ, the logical architecture of a customer’s VPC and subnets is completely decoupled from the physical fabric. At the same time, thanks to the innovations AWS has made with custom silicon, both performance and security improve on what non-virtualized networks offer.
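The decoupling of the logical VPC from the physical fabric can be pictured as an overlay lookup followed by encapsulation. The sketch below is a simplified model, not the AWS implementation: the mapping table, the host addresses, the VPC identifiers, and the packet classes are all hypothetical, and the real encapsulation format is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OverlayPacket:
    """A packet as the guest OS sees it, addressed with VPC (overlay) IPs."""
    src_ip: str
    dst_ip: str
    payload: bytes


@dataclass(frozen=True)
class UnderlayPacket:
    """The overlay packet wrapped for transport across the Layer 3 fabric."""
    src_host: str  # physical address of the sending server
    dst_host: str  # physical address of the receiving server
    vpc_id: str    # keeps tenants with overlapping address space apart
    inner: OverlayPacket


# Hypothetical mapping table: (VPC, overlay IP) -> physical host address.
MAPPINGS = {
    ("vpc-a", "192.168.10.20"): "10.1.4.17",
    ("vpc-a", "192.168.10.35"): "10.9.2.88",
    ("vpc-b", "192.168.10.20"): "10.3.7.5",  # same overlay IP, different tenant
}


def encapsulate(vpc_id: str, packet: OverlayPacket) -> UnderlayPacket:
    """Wrap an overlay packet in an outer header the fabric can route."""
    src_host = MAPPINGS[(vpc_id, packet.src_ip)]
    dst_host = MAPPINGS[(vpc_id, packet.dst_ip)]
    return UnderlayPacket(src_host, dst_host, vpc_id, packet)


def decapsulate(packet: UnderlayPacket) -> OverlayPacket:
    """Strip the outer header on the receiving server."""
    return packet.inner


inner = OverlayPacket("192.168.10.20", "192.168.10.35", b"hello")
outer = encapsulate("vpc-a", inner)
```

The fabric only ever routes on the outer host addresses, so two tenants can use the same overlay address without conflict, and the guest never sees the physical addressing at all.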

Nitro

In 2017, AWS launched a new generation of EC2 instance types powered by a platform called Nitro, but its roots go back to 2013 and the introduction of enhanced networking, the forerunner of today's Elastic Network Adapter (ENA). The system contains multiple components:

A hypervisor that is a heavily customized version of KVM

A Trusted Platform Module (TPM) on the motherboard

A separate security chip that does things such as secure the BIOS code

Special I/O cards for storage and networking functions

It is these Nitro networking cards that we will focus on here. They are the foundation of network performance and security in AWS:

Figure 8.1 – Nitro NICs performing routing, encapsulation, and security functions in hardware

The preceding figure illustrates some of the essential functions the Nitro NICs perform for two EC2 instances in different parts of an AZ that want to communicate. In this example, both instances are part of the same VPC and on the same subnet (192.168.10.0/24).

From the perspective of the operating system inside each instance, both are on the same Layer 2 broadcast domain. When 192.168.10.20 opens a connection to 192.168.10.35, it issues an ARP broadcast, as it normally would, to discover the MAC address of 192.168.10.35. However, unlike on a “real” Ethernet network, that broadcast never reaches the wire. The Nitro NIC intercepts it, performs an authenticated database lookup, and responds directly. All Layer 2 operations are handled this way, which is why Layer 2 attacks such as ARP poisoning are non-starters in AWS.
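The ARP handling described above can be sketched as a local table lookup that replaces the broadcast. This is a minimal model under stated assumptions: the table contents, MAC addresses, VPC identifier, and function name are hypothetical, and the authentication of the lookup is not modeled.

```python
from typing import Optional

# Hypothetical mapping: (VPC, IP address) -> MAC address of the owning interface.
# Guests have no way to write to this table; it is populated by the control plane.
ARP_TABLE = {
    ("vpc-a", "192.168.10.20"): "02:00:00:00:00:14",
    ("vpc-a", "192.168.10.35"): "02:00:00:00:00:23",
}


def handle_arp_request(vpc_id: str, target_ip: str) -> Optional[str]:
    """Answer an ARP request locally instead of broadcasting it.

    Returns the MAC address registered for target_ip, or None when the
    address is unknown in this VPC, in which case no reply is generated.
    """
    return ARP_TABLE.get((vpc_id, target_ip))


mac = handle_arp_request("vpc-a", "192.168.10.35")
unknown = handle_arp_request("vpc-a", "192.168.10.99")
print(mac, unknown)
```

Because every answer comes from the table rather than from whichever host replies first, a guest that forges ARP replies has nothing to poison in this model.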

The next thing to notice is that security groups are enforced on the Nitro NICs. Security groups are stateful firewalls, and they impose no throughput limit beyond that of the EC2 instance type itself. When traffic leaves the physical server, the Nitro NIC further secures it using Authenticated Encryption with Associated Data (AEAD) algorithms, with 256-bit encryption.
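To make "stateful" concrete, the following sketch tracks flows so that replies to an allowed connection pass without needing a rule of their own. It is a simplified software model, not how the Nitro hardware is built: the class names, the rule format, and the allow-all outbound policy are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import NamedTuple, Set, Tuple


class Flow(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str

    def reply(self) -> "Flow":
        """Return the flow as seen in the opposite direction."""
        return Flow(self.dst_ip, self.dst_port, self.src_ip, self.src_port, self.protocol)


@dataclass
class StatefulFirewall:
    # Inbound rules as (protocol, destination port); everything else is denied.
    inbound_rules: Set[Tuple[str, int]]
    tracked: Set[Flow] = field(default_factory=set)

    def outbound(self, flow: Flow) -> bool:
        """Record an outbound flow (all outbound traffic is allowed in this sketch)."""
        self.tracked.add(flow)
        return True

    def inbound(self, flow: Flow) -> bool:
        """Allow replies to tracked flows, then fall back to the inbound rules."""
        if flow.reply() in self.tracked:
            return True
        if (flow.protocol, flow.dst_port) in self.inbound_rules:
            self.tracked.add(flow)
            return True
        return False


# Firewall protecting 192.168.10.20, with only HTTPS allowed inbound.
fw = StatefulFirewall(inbound_rules={("tcp", 443)})
request = Flow("192.168.10.20", 50000, "192.168.10.35", 8080, "tcp")
fw.outbound(request)

reply_allowed = fw.inbound(request.reply())
unsolicited_allowed = fw.inbound(Flow("192.168.10.35", 8080, "192.168.10.20", 22, "tcp"))
```

The reply is admitted because it matches a tracked flow, while the unsolicited SSH attempt is dropped because it matches neither a tracked flow nor an inbound rule.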

Because all of this work is offloaded to specialized ASICs on the Nitro NICs, none of it imposes a latency or throughput penalty on the network flows.