AWS NAT Gateway and Private/Public Subnets

When working with AWS networking, you will often hear the terms 'public subnet' and 'private subnet'. However, if you go into the AWS console to create a subnet, you won't find any option to explicitly make it one or the other. So, what exactly makes a subnet public or private?

In this blog post, we will look at the differences between public and private subnets, see how they are defined by their routing, and understand how the AWS NAT Gateway fits into this architecture.

If you are completely new to AWS networking and want to learn the basics of setting up a VPC, feel free to check out my previous post linked below.

As always, if you find this post helpful, press the ‘clap’ button. It means a lot to me and helps me know you enjoy this type of content. If I get enough claps for this series, I’ll make sure to write more on this specific topic.

Public vs Private Subnets

The key difference between a 'public' and a 'private' subnet is simply its route to the Internet. It is not an inherent setting of the subnet itself, but a behaviour defined by the route table associated with it.

A Public Subnet is a subnet whose associated route table has a route directly to an Internet Gateway (IGW). This means any resource within this subnet, if it has a public IP address, can be directly reached from the Internet. This is what we built in the previous post; our lab-subnet-1 is a public subnet because we added a 0.0.0.0/0 route pointing to our IGW.

A Private Subnet, by contrast, is a subnet whose associated route table does not have a direct route to the Internet Gateway. Resources in a private subnet cannot be reached directly from the Internet, which makes them ideal for hosting backend services, databases, or any application that should be insulated from public traffic. This creates a problem - how do these private resources get security patches or download software from the Internet if they can't send traffic out? This is where the NAT Gateway fits into the architecture.
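To make the definition concrete, here is a minimal sketch of how you could classify a subnet purely from its route table. It is a simplification that only looks at the default route, and the `Route` class, the `igw-example` and `nat-example` IDs, and the 10.200.0.0/16 VPC range are assumptions made up for illustration, not values from AWS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    destination: str  # CIDR block, e.g. "0.0.0.0/0"
    target: str       # target ID, e.g. "igw-...", "nat-..." or "local"


def classify_subnet(routes: list[Route]) -> str:
    """Return 'public' if the default route targets an Internet Gateway."""
    for route in routes:
        if route.destination == "0.0.0.0/0" and route.target.startswith("igw-"):
            return "public"
    return "private"


# Hypothetical route tables; the VPC range and target IDs are assumptions.
public_rt = [Route("10.200.0.0/16", "local"), Route("0.0.0.0/0", "igw-example")]
private_rt = [Route("10.200.0.0/16", "local"), Route("0.0.0.0/0", "nat-example")]

print(classify_subnet(public_rt))   # public
print(classify_subnet(private_rt))  # private
```

Notice that nothing in the function inspects the subnet itself. The answer comes entirely from the routes, which is the point of this section.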

What is a NAT Gateway?

A NAT (Network Address Translation) Gateway is an AWS-managed service, highly available within its Availability Zone, that allows resources in a private subnet to connect to the Internet but prevents the Internet from initiating a connection with those resources.

You place the NAT Gateway in a public subnet, where it gets a private IP address from that subnet's range (e.g. an address like 10.200.1.x). You then associate a static public IP address (an Elastic IP, we will cover this shortly) with the NAT Gateway. After that, you create a route in the private subnet's route table that points all Internet-bound traffic (0.0.0.0/0) to the NAT Gateway.

When an instance in the private subnet tries to access the Internet, the traffic is first routed to the NAT Gateway, which translates the instance's private source IP address into the NAT Gateway's own private IP address. The traffic is then forwarded to the Internet Gateway, which translates the NAT Gateway's private IP address into the public Elastic IP address associated with the NAT Gateway before sending the traffic out to the Internet. This allows your private resources to initiate outbound connections while staying unreachable for connections initiated from the Internet. For return traffic, the process happens in reverse.
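The 'outbound only' behaviour can be pictured as a connection table. The sketch below is a toy model and not how AWS implements it: the `NatConnectionTable` class is hypothetical, it tracks flows by remote endpoint only, and the port numbers and the 203.0.113.9 address are made-up examples.

```python
from typing import Optional

Endpoint = tuple[str, int]  # (IP address, port)


class NatConnectionTable:
    """Toy stateful NAT: only flows opened from inside may receive replies."""

    def __init__(self) -> None:
        # Maps a remote endpoint to the private endpoint that contacted it.
        self._flows: dict[Endpoint, Endpoint] = {}

    def outbound(self, src: Endpoint, dst: Endpoint) -> None:
        """Record a connection initiated by a private instance."""
        self._flows[dst] = src

    def inbound(self, remote: Endpoint) -> Optional[Endpoint]:
        """Return the private endpoint a reply belongs to, or None to drop it."""
        return self._flows.get(remote)


nat = NatConnectionTable()
nat.outbound(("10.200.10.26", 40000), ("8.8.8.8", 443))

print(nat.inbound(("8.8.8.8", 443)))      # ('10.200.10.26', 40000)
print(nat.inbound(("203.0.113.9", 443)))  # None
```

The reply from 8.8.8.8 is matched to the instance that opened the connection, while the unsolicited packet from 203.0.113.9 has no entry and is dropped.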

💡

It's worth noting that AWS offers two types of NAT Gateways - 'public' and 'private'. The 'public' NAT Gateway, which is what we are describing here, is the most commonly used type. We will cover the specific use cases for a 'private' NAT Gateway at the end of this post.

Unless you are hosting a publicly available service where users from the Internet need to initiate traffic to your server, you will almost always place your instances in a private subnet. In this setup, the subnet's default route points to a NAT Gateway, not an Internet Gateway.

Elastic IP Address

We mentioned that we need an Elastic IP address to create a public NAT Gateway, but what exactly is an Elastic IP address? If you remember from our previous post, when we launched an instance and assigned it a public IP, we learned that this IP is not permanently yours. If you stop that instance, the public IP address is released back to AWS, and the instance will normally get a different one when it starts again.

On the other hand, an Elastic IP (EIP) address is a static, public IPv4 address that you allocate to your AWS account. Instead of being temporarily assigned to an instance, it belongs to you until you choose to release it 😄. Because the EIP is allocated to your account, you have full control over it. You can associate it with an EC2 instance, and if you stop or terminate that instance, the EIP remains in your account. You can then re-associate that same EIP with a different instance at any time. This provides a fixed, reliable public IP address that is necessary for services like a NAT Gateway.

Allocating an EIP is as simple as navigating to Elastic IPs and selecting Allocate Elastic IP Address.

Please note that there is a charge for all Elastic IP addresses ($0.005 per IP per hour), whether they are in use (associated with a resource, like an EC2 instance) or idle (allocated to your account but not associated with anything). So, make sure to release them after you are done with your experiments.
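To get a feel for what that hourly rate adds up to, here is a quick back-of-the-envelope calculation. It is only a sketch using the $0.005 figure quoted above; the `eip_cost` helper is made up for this post, and actual pricing may differ by Region or change over time.

```python
EIP_HOURLY_USD = 0.005  # per-address hourly rate quoted in this post


def eip_cost(addresses: int, hours: float, rate: float = EIP_HOURLY_USD) -> float:
    """Estimated charge in USD for holding Elastic IPs for a number of hours."""
    return addresses * hours * rate


print(f"One day:  ${eip_cost(1, 24):.2f}")       # One day:  $0.12
print(f"30 days:  ${eip_cost(1, 24 * 30):.2f}")  # 30 days:  $3.60
```

It is a small amount for a single address, but a forgotten Elastic IP keeps costing money every hour until it is released.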

💡

Disclaimer - Please be mindful when working in your AWS account. The Free Tier is generous, but it is possible to go over the limits and incur costs. Always double-check that you have deleted all resources when you are finished with your labs. I take no responsibility for any charges you may accrue.

Please note that when you want to remove an Elastic IP address from your account, you will not find a 'delete' option. The correct action in the AWS console is to select the IP and choose to 'Release' it.

NAT Gateway Example

Now, let's look at an example to see how all these components work together. We will build an architecture that includes:

Two subnets, one 'public' and one 'private'.
Two route tables, one for each subnet.
An Internet Gateway (IGW).
An Elastic IP address.
A NAT Gateway.

Our goal is to demonstrate the two different paths to the Internet. The instance in the public subnet will have a public IP address, and its route table will have a default route (0.0.0.0/0) pointing directly to the Internet Gateway. In contrast, the instance in the private subnet will have no public IP, and its route table will have its default route pointing to the NAT Gateway we create.

💡

It's important to be aware of the costs for a NAT Gateway. The pricing has two parts - you pay an hourly fee of about $0.05 for each hour the gateway is provisioned, and you also pay a data processing charge of about $0.05 for every gigabyte of data that passes through it.
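As a rough illustration of how the two parts combine, the sketch below estimates a bill from the approximate $0.05 figures mentioned above. The `nat_gateway_cost` helper and the 100 GB example are assumptions for illustration, and the estimate ignores any other charges on your account.

```python
def nat_gateway_cost(
    hours: float,
    gb_processed: float,
    hourly_rate: float = 0.05,  # approximate hourly fee quoted in this post
    per_gb_rate: float = 0.05,  # approximate data processing fee per GB
) -> float:
    """Estimated NAT Gateway charge in USD: hourly fee plus data processing."""
    return hours * hourly_rate + gb_processed * per_gb_rate


# A gateway left running for 30 days that processes 100 GB of traffic.
print(f"${nat_gateway_cost(24 * 30, 100):.2f}")  # $41.00
```

In this example most of the cost comes from the hourly fee, so an idle NAT Gateway left running after a lab is still worth deleting.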

Subnets and Route Tables

For this example, I have created two new subnets within our lab-vpc:

A public subnet named lab-public-a with the CIDR block 10.200.1.0/24.
A private subnet named lab-private-a with the CIDR block 10.200.10.0/24.

I then created two new route tables, lab-public-rt-a and lab-private-rt-a, and associated them with their corresponding subnets.

If you look at the configuration for the public route table, lab-public-rt-a, you can see it has a default route (0.0.0.0/0) that points to our existing Internet Gateway (IGW). This is what makes the subnet 'public'. Please note that we covered creating an Internet Gateway and adding routes in detail in the previous post, so we will not be covering those steps again here.

💡

You may have noticed the -a at the end of a resource name like lab-public-rt-a. This is a common and helpful naming convention used in cloud environments to indicate which Availability Zone a resource is located in. Since we are deploying these particular resources in Availability Zone 'a', I am naming them accordingly to make them easier to identify.

Elastic IP and NAT Gateway

First, navigate to the 'Elastic IPs' section in the VPC console and allocate a new Elastic IP address with the default settings.

Once the Elastic IP is available, go to the 'NAT Gateways' section to create the NAT Gateway. When you configure the gateway, give it a name (I called mine lab-nat-gw) and ensure you place it in the public subnet (lab-public-a). You must not place it in the private subnet.

From the dropdown menu, select the Elastic IP you just allocated. It will take a few minutes for the gateway to be provisioned, so you must wait for its status to become 'Available' before you continue.

Now that the NAT Gateway is available, we must tell our private subnet how to use it. To do this, we need to add a route to the private route table.

Navigate to the route table named lab-private-rt-a, which is associated with our private subnet. Edit its routes and add a new default route. As shown in the image, set the destination to 0.0.0.0/0 and for the target, select 'NAT Gateway' and then choose the lab-nat-gw we just created.

Once you save this change, any traffic from an instance in the private subnet that is destined for the Internet will be directed to the NAT Gateway.
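Route tables pick the most specific matching route (longest prefix match), so traffic inside the VPC keeps using the local route while everything else follows the new default route. Here is a small sketch of that lookup. The `lookup` function is hypothetical, and the 10.200.0.0/16 local route is an assumed VPC range that is not stated in this post.

```python
import ipaddress


def lookup(route_table: dict[str, str], destination_ip: str) -> str:
    """Return the target of the most specific route that matches the address."""
    dest = ipaddress.ip_address(destination_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in route_table.items()
        if dest in ipaddress.ip_network(cidr)
    ]
    if not matches:
        raise LookupError(f"no route to {destination_ip}")
    # The longest prefix wins, so /16 beats /0 for addresses inside the VPC.
    _, target = max(matches, key=lambda match: match[0].prefixlen)
    return target


# lab-private-rt-a, with an assumed VPC range for the local route.
private_rt = {"10.200.0.0/16": "local", "0.0.0.0/0": "lab-nat-gw"}

print(lookup(private_rt, "10.200.1.18"))  # local
print(lookup(private_rt, "8.8.8.8"))      # lab-nat-gw
```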

Example and Testing

To see our configuration in action, I have launched two EC2 instances - one in our public subnet and one in our private subnet.

For the public instance, I launched it into the lab-public-a subnet and ensured 'Auto-assign public IP' was enabled. For its security group, I used the one from our previous post, which allows SSH access from my home IP address (security group ID sg-007a9bcc0e8b09c2d).

For the private instance, the setup is slightly different. I launched it into the lab-private-a subnet and, importantly, set 'Auto-assign public IP' to 'Disable'. For its security group, I created a new one that allows inbound SSH traffic, but for the 'Source', instead of an IP address, I selected the security group ID of the public instance.

💡

When you use a security group ID as the source in a rule, you are allowing network traffic based on group membership, not on IP addresses. This simply means any instance using the 'source' security group is allowed to send traffic to any instance using the 'destination' security group on the specified port. This is useful because the rule keeps working even if the private IP addresses of your instances change.
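Here is a minimal sketch of that idea. It is a simplified model, not the real evaluation logic: the `Instance` and `IngressRule` classes, the `sg-other-example` ID, and the 10.200.1.99 address are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    private_ip: str
    security_groups: set[str]


@dataclass(frozen=True)
class IngressRule:
    port: int
    source_group: str  # a security group ID instead of a CIDR range


def is_allowed(sender: Instance, port: int, rules: list[IngressRule]) -> bool:
    """Allow traffic if the sender belongs to a group referenced by a rule."""
    return any(
        rule.port == port and rule.source_group in sender.security_groups
        for rule in rules
    )


# Inbound rules on the private instance's group: SSH from the public group.
private_sg_rules = [IngressRule(port=22, source_group="sg-007a9bcc0e8b09c2d")]

jump_host = Instance("10.200.1.18", {"sg-007a9bcc0e8b09c2d"})
other_host = Instance("10.200.1.99", {"sg-other-example"})  # hypothetical

print(is_allowed(jump_host, 22, private_sg_rules))   # True
print(is_allowed(other_host, 22, private_sg_rules))  # False

jump_host.private_ip = "10.200.1.77"  # the address changes, membership does not
print(is_allowed(jump_host, 22, private_sg_rules))   # True
```

The check never looks at `private_ip`, which is why the rule keeps working when addresses change.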

First, let's connect to the public instance directly from my computer using its public IP address (35.176.221.255). Once logged in, running curl https://ipinfo.io/ip shows the instance's own public IP, confirming its traffic is going directly through the Internet Gateway.

[ec2-user@ip-10-200-1-18 ~]$ curl https://ipinfo.io/ip
35.176.221.255

[ec2-user@ip-10-200-1-18 ~]$ ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=115 time=0.993 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=115 time=1.05 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=115 time=1.02 ms

Next, let's test the private instance. I cannot SSH to it directly from the Internet because it has no public IP. Instead, I must first connect to the public instance and use it as a 'jump server'. To do this, I copied my .pem key file to the public instance, and from there, I can SSH to the private instance using its private IP address (10.200.10.26). Now, inside the private instance, I run the same command.

[ec2-user@ip-10-200-1-18 ~]$ ssh -i lab-key-1.pem ec2-user@10.200.10.26

[ec2-user@ip-10-200-10-26 ~]$ curl https://ipinfo.io/ip
18.168.59.54

[ec2-user@ip-10-200-10-26 ~]$ ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=115 time=1.56 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=115 time=1.12 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=115 time=1.20 ms

This time, the command returns the Elastic IP address (18.168.59.54) that is attached to our NAT Gateway. This proves that the traffic from our private instance is being correctly routed through the NAT Gateway to reach the Internet. Simple ping tests from both instances also confirm that they can both reach the Internet through their respective routes.

NAT Translation

It is important to understand the translation steps that occur with a public NAT Gateway. When traffic from an instance in a private subnet passes through the NAT Gateway, the gateway first translates the source IP address of the instance to its own private IP, not its public one.

This traffic then continues to the Internet Gateway. It is the Internet Gateway that performs the final translation, swapping the NAT Gateway's private IP for the public Elastic IP before sending the packet out to the Internet. So, the final address translation to the public IP happens at the Internet Gateway.
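The two translation steps can be traced with a small sketch. The instance and Elastic IP addresses come from the test above, but the NAT Gateway's private IP (10.200.1.50) is a hypothetical value, since we only know it lives somewhere in 10.200.1.x. The `Packet` class and both functions are made up for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


NAT_PRIVATE_IP = "10.200.1.50"   # hypothetical address in lab-public-a
NAT_ELASTIC_IP = "18.168.59.54"  # Elastic IP seen in the test above


def nat_gateway_egress(packet: Packet) -> Packet:
    """Step 1: the NAT Gateway swaps the instance IP for its own private IP."""
    return replace(packet, src=NAT_PRIVATE_IP)


def internet_gateway_egress(packet: Packet) -> Packet:
    """Step 2: the Internet Gateway swaps that private IP for the Elastic IP."""
    if packet.src == NAT_PRIVATE_IP:
        return replace(packet, src=NAT_ELASTIC_IP)
    return packet


original = Packet(src="10.200.10.26", dst="8.8.8.8")
after_nat = nat_gateway_egress(original)
after_igw = internet_gateway_egress(after_nat)

print(original.src)   # 10.200.10.26
print(after_nat.src)  # 10.200.1.50
print(after_igw.src)  # 18.168.59.54
```

The destination never changes on the way out. Only the source address is rewritten, once at each hop.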

NAT Gateway High Availability

A critical point to understand is that a NAT Gateway is a resource that resides in a single, specific Availability Zone. This is different from an Internet Gateway, which is automatically resilient across an entire Region. This design has important implications for high availability.

Consider the first diagram, where we have only one NAT Gateway deployed in Availability Zone A. The private subnets in all three AZs are configured to route their Internet-bound traffic to that single gateway. If the Availability Zone containing that NAT Gateway were to fail, the gateway would become unreachable. As a result, instances in all of your private subnets, even those in the healthy AZs, would lose their path to the Internet. This creates a single point of failure.

To build a fault-tolerant and highly available architecture, you should deploy a NAT Gateway in each Availability Zone that contains resources needing Internet access, as shown in the second diagram. You would then configure the route table for each private subnet to use the NAT Gateway in its own Availability Zone. For example, the private subnet in AZ A routes to the NAT Gateway in AZ A, and the private subnet in AZ B routes to the NAT Gateway in AZ B. With this setup, an outage in one AZ will not impact the ability of instances in the other AZs to reach the Internet.
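The difference between the two designs can be sketched in a few lines. This is a toy model: the `azs_with_internet` function and the 'a', 'b', 'c' labels are assumptions standing in for real Availability Zones.

```python
def azs_with_internet(nat_az_by_subnet_az: dict[str, str], failed_az: str) -> list[str]:
    """List the AZs whose private subnets can still reach the Internet.

    The mapping goes from each private subnet's AZ to the AZ hosting the
    NAT Gateway that its route table points to.
    """
    return sorted(
        subnet_az
        for subnet_az, nat_az in nat_az_by_subnet_az.items()
        if subnet_az != failed_az and nat_az != failed_az
    )


single_nat = {"a": "a", "b": "a", "c": "a"}  # every subnet routes to AZ a
nat_per_az = {"a": "a", "b": "b", "c": "c"}  # each subnet uses its own AZ

print(azs_with_internet(single_nat, failed_az="a"))  # []
print(azs_with_internet(nat_per_az, failed_az="a"))  # ['b', 'c']
```

With a single gateway, losing AZ a takes every private subnet offline. With one gateway per AZ, only the failed zone is affected.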

Private NAT Gateway

A 'private' NAT Gateway works entirely within your private networks. Unlike the public type, it does not use an Elastic IP. Its function is to perform Network Address Translation for traffic that is going to other VPCs or to an on-premises data centre, translating an instance's private IP into its own private IP.

A useful scenario for a private NAT Gateway is when you have overlapping IP addresses between your VPC and an on-premises data centre or another VPC. For example, suppose a subnet in your VPC uses the same IP range as servers in your physical data centre; connectivity between them would fail because the routing is ambiguous.

To solve this, you can create a private NAT Gateway in a different, non-overlapping subnet. You would then route traffic from your instance in the overlapping subnet to the private NAT Gateway. The gateway translates the instance's conflicting source IP address into its own unique, non-overlapping IP address before sending the traffic to your on-premises network. This resolves the routing conflict and allows communication to flow correctly.
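Here is a small sketch of the overlap problem and the fix, using Python's ipaddress module. Everything apart from the lab-private-a range is hypothetical: the on-premises range, the NAT subnet, and the private NAT Gateway address are invented for illustration.

```python
import ipaddress

VPC_SUBNET = "10.200.10.0/24"     # lab-private-a from this post
ON_PREM_RANGE = "10.200.10.0/24"  # hypothetical on-premises range that collides
NAT_SUBNET = "10.200.99.0/24"     # hypothetical non-overlapping subnet
PRIVATE_NAT_IP = "10.200.99.10"   # hypothetical private NAT Gateway address


def overlaps(cidr_a: str, cidr_b: str) -> bool:
    """True if the two CIDR blocks share any addresses."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


def translate_source(src_ip: str) -> str:
    """Rewrite a conflicting source address to the private NAT Gateway's IP."""
    if ipaddress.ip_address(src_ip) in ipaddress.ip_network(VPC_SUBNET):
        return PRIVATE_NAT_IP
    return src_ip


print(overlaps(VPC_SUBNET, ON_PREM_RANGE))  # True
print(overlaps(NAT_SUBNET, ON_PREM_RANGE))  # False
print(translate_source("10.200.10.26"))     # 10.200.99.10
```

After translation, the on-premises side sees a source address from a range it does not use itself, so it can route the replies back.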

💡

Please note that you can't associate an Elastic IP address with a private NAT Gateway. You can attach an Internet Gateway to a VPC with a private NAT Gateway, but if you route traffic from the private NAT Gateway to the Internet Gateway, the Internet Gateway drops the traffic.