# AWS SAA-C03: How to Design High-Performing and Scalable Network Architectures
A practical guide to VPC design, load balancing, global routing, private access, and hybrid connectivity for AWS Certified Solutions Architect – Associate (SAA-C03).
## Why this domain matters in SAA-C03
SAA-C03 networking questions are rarely about packet-level trivia. They’re really checking whether you can pick an architecture that scales cleanly, holds up under failure, stays secure, and doesn’t turn into a maintenance headache. Usually, the best answer is the simplest managed design that avoids single points of failure, keeps private traffic private, and lines up the protocol and routing requirement with the right AWS service.
For exam purposes, keep four ideas in mind: Multi-AZ is the production baseline, public exposure should be minimized, private AWS service access usually beats internet egress when possible, and wording matters. “Static IPs,” “path-based routing,” “many VPCs,” “predictable hybrid performance,” and “private access to S3” each point toward very different services.
## VPC foundations: CIDR, subnets, and routing
Amazon VPC is your isolated network boundary. It defines IP space, subnets, route tables, gateways, and security controls. Good designs start with CIDR planning, because overlapping ranges and undersized subnets become painful later. VPC peering does not support overlapping CIDRs, and Transit Gateway designs are also much easier when address space is planned cleanly across accounts and Regions.
A few practical rules matter a lot:
- AWS reserves 5 IP addresses in every subnet. Tiny subnets run out faster than many candidates expect.
- Plan for growth. Auto Scaling, interface endpoints, containers in awsvpc mode, and failover capacity all consume IPs.
- You can add secondary IPv4 CIDR blocks to a VPC later, but that does not erase poor original planning.
- IPv6 subnets use /64 blocks, and dual-stack design is increasingly common.
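The sizing rules above are easy to sanity-check with the standard `ipaddress` module. A minimal sketch (the five-address reservation per subnet is AWS behavior; the subnet sizes are arbitrary examples):

```python
import ipaddress

# AWS reserves the network address, VPC router, DNS, one "future use"
# address, and the broadcast address in every subnet.
AWS_RESERVED_PER_SUBNET = 5

def usable_hosts(cidr: str) -> int:
    """Return the number of IPv4 addresses actually assignable in an AWS subnet."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET

# A /28 looks like 16 addresses, but only 11 are assignable.
print(usable_hosts("10.0.0.0/28"))  # 11
# A /24 leaves 251 usable addresses, not the 254 you might expect.
print(usable_hosts("10.0.1.0/24"))  # 251
```

Running this against your planned subnet sizes before you create them is a cheap way to catch the "tiny subnet runs out of IPs" problem early.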
A subnet is public when its route table has a route to an Internet Gateway. But that does not mean every resource in it is internet-reachable. For IPv4 internet access, an instance also needs a public IPv4 address or Elastic IP, plus security group and NACL rules that allow the traffic. That distinction is a classic exam trap.
Route tables decide where traffic goes. AWS uses longest-prefix match, so a more specific route wins over a default route. And that matters a lot when you’re mixing local VPC routes, NAT, peering, Transit Gateway, and gateway endpoints in the same design.
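Longest-prefix match can be sketched in a few lines of Python with the standard `ipaddress` module. The route targets here are hypothetical identifiers that mirror the examples in this guide, not real AWS resource IDs:

```python
import ipaddress

def pick_route(dest: str, routes: dict[str, str]) -> str:
    """Choose the next hop by longest-prefix match, as a VPC route table does."""
    ip = ipaddress.ip_address(dest)
    matches = [
        (net, target)
        for cidr, target in routes.items()
        if ip in (net := ipaddress.ip_network(cidr))
    ]
    # The most specific (longest) matching prefix wins over the default route.
    return max(matches, key=lambda m: m[0].prefixlen)[1]

private_app_routes = {
    "10.0.0.0/16": "local",       # intra-VPC traffic stays local
    "172.16.0.0/12": "tgw-1234",  # other private networks via Transit Gateway
    "0.0.0.0/0": "nat-az-a",      # everything else out through NAT
}

print(pick_route("10.0.5.9", private_app_routes))    # local
print(pick_route("172.16.4.2", private_app_routes))  # tgw-1234
print(pick_route("52.95.110.1", private_app_routes)) # nat-az-a
```

Note how `172.16.4.2` matches both `172.16.0.0/12` and `0.0.0.0/0`, but the /12 route wins because it is more specific.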
```
Public subnet route table
  10.0.0.0/16     local
  0.0.0.0/0       igw-1234

Private app subnet route table
  10.0.0.0/16     local
  pl-s3prefix     vpce-gw-s3
  0.0.0.0/0       nat-az-a

TGW-attached subnet route table
  10.0.0.0/16     local
  172.16.0.0/12   tgw-1234
```
That route example shows the logic pretty clearly: local traffic stays local, S3 can stay private through a gateway endpoint, outbound IPv4 internet traffic can go through NAT, and other private networks can be reached through Transit Gateway.
## Designing for Multi-AZ and getting the IPv6 basics right
For production workloads, I’d strongly recommend spreading subnets and targets across at least two Availability Zones. A very common pattern is to put load balancers and NAT Gateways in public subnets, application servers or containers in private app subnets, and databases in private data subnets. In most real-world designs, each AZ should have its own NAT Gateway so you don’t create a sneaky single-AZ dependency or rack up cross-AZ data charges.
With IPv6, there are two things you really want to keep straight. First, NAT Gateway is IPv4-only. Second, outbound-only IPv6 internet access from private subnets uses an egress-only Internet Gateway, not NAT. So in a dual-stack design, IPv4 and IPv6 may follow different outbound paths, and that’s completely normal.
```
A VPC with 10.0.0.0/16 and an IPv6 CIDR block
|
+-- AZ-a
|   +-- Public subnet  -> IGW
|   +-- Private app    -> NAT GW-a for IPv4, egress-only IGW for IPv6
|   +-- Private DB     -> no direct internet route
|
+-- AZ-b
    +-- Public subnet  -> IGW
    +-- Private app    -> NAT GW-b for IPv4, egress-only IGW for IPv6
    +-- Private DB     -> no direct internet route
```
This layout represents a resilient dual-stack network design. Public subnets handle internet-facing entry points, private application subnets use controlled outbound paths for IPv4 and IPv6, and database subnets stay isolated from direct internet exposure.
Stateless application tiers usually scale best when they sit behind load balancers and Auto Scaling groups. Stateful data should live in managed services where possible. That is the default SAA-C03 pattern because it improves resilience and reduces operational pain.
## Private access, NAT, and VPC endpoints
If private instances need general outbound access to public endpoints, NAT Gateway is usually the right choice. It sits in a public subnet, uses an Elastic IP, and sends outbound IPv4 traffic out through the VPC’s Internet Gateway. It doesn’t allow random inbound connections back to those private instances.
But if the workload only needs supported AWS services, VPC endpoints are usually the better answer. They reduce exposure, often reduce cost, and keep traffic on AWS networking paths.
| Option | Best use | Key details |
|---|---|---|
| NAT Gateway | Outbound IPv4 access to public endpoints | Managed, scalable, per-AZ; hourly and per-GB cost; cross-AZ routing adds cost and risk |
| Gateway Endpoint | Private access to S3 or DynamoDB | No hourly charge; route-table based using AWS-managed prefix lists |
| Interface Endpoint | Private access to many supported AWS or partner services | Uses ENIs in subnets, consumes IPs, needs security groups, supports private DNS, has hourly and data charges |
Gateway endpoints are only for S3 and DynamoDB. Many other services use interface endpoints through AWS PrivateLink, but not every AWS service supports them. Interface endpoints also matter for subnet sizing because each endpoint creates ENIs in selected subnets.
Endpoint policies can further restrict access for supported services. That is useful in exam scenarios where the requirement says “private access” and “least privilege.”
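An endpoint policy uses the standard IAM JSON policy format. A least-privilege sketch for an S3 gateway endpoint, shown here as a Python dict (the bucket name is a hypothetical example):

```python
import json

# Restrict the S3 gateway endpoint to read-only access on one bucket,
# instead of the default policy that allows full access to all of S3.
endpoint_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-app-bucket",       # hypothetical bucket
                "arn:aws:s3:::example-app-bucket/*",
            ],
        }
    ],
}

print(json.dumps(endpoint_policy, indent=2))
```

With a policy like this attached, even a compromised instance in the VPC cannot use the endpoint to write data out to an attacker-controlled bucket.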
## Load balancer selection: ALB, NLB, and GWLB
Choose the load balancer by protocol and traffic behavior, not by habit.
| Load Balancer | Best for | Important exam clues |
|---|---|---|
| ALB | HTTP/HTTPS/gRPC applications | Host/path/header routing, redirects, WebSockets, WAF integration, internal or internet-facing |
| NLB | TCP/UDP/TLS at very high scale | Static IPs per AZ, optional TLS termination, low latency, commonly used when fixed addresses matter |
| GWLB | Transparent appliance insertion | For firewalls and inspection fleets, uses GENEVE, not a normal user-facing application balancer |
ALB is the right answer when the question mentions HTTP semantics like /api and /images, redirects, or host-based routing. NLB is better when the requirement says TCP, UDP, TLS, or static IP addresses. If the question asks for fixed global IPs rather than fixed regional IPs, Global Accelerator is usually stronger than NLB alone.
Both ALB and NLB can be internal or internet-facing. Internal load balancers are common for service-to-service traffic in private subnets. ALB also supports sticky sessions and Lambda targets in some use cases. NLB commonly preserves source IP in many deployment patterns, but do not treat that as an absolute in every target mode.
Health checks catch people out more often than they really should. If an ALB is returning 503s, I’d start by checking the health check path, whether the app is listening on the right port, whether the target security group allows traffic from the load balancer, and whether the targets are in the correct subnets.
## How Route 53, CloudFront, and Global Accelerator fit into global traffic patterns
These services overlap in conversation more than in function.
| Service | Primary role | Best clue words |
|---|---|---|
| Route 53 | DNS routing | Weighted, failover, latency, geolocation, alias records |
| CloudFront | CDN and edge acceleration for HTTP/HTTPS | Caching, origin offload, static content, dynamic web acceleration |
| Global Accelerator | Static anycast IPs and optimized global pathing | Fast failover, TCP/UDP, global entry point, fixed global IPs |
Route 53 answers DNS queries; it is not a proxy. Failover is affected by TTL and client resolver caching, so DNS failover is not instantaneous. CloudFront is not just for static files; it also improves dynamic HTTP/HTTPS delivery, adds edge presence, and commonly sits in front of ALB or S3. Global Accelerator improves entry onto the AWS global network and is excellent when you need static anycast IPs or faster failover characteristics than DNS-only approaches.
If a question says global TCP/UDP application with static IPs, think Global Accelerator in front of regional NLBs. If it says global website performance and caching, think CloudFront. If it says weighted or latency-based DNS steering, think Route 53.
## Connecting VPCs and hybrid networks with peering, Transit Gateway, and hybrid connectivity
VPC peering is simple and useful for a small number of one-to-one connections. But it is non-transitive, does not allow overlapping CIDRs, and does not let you transit through a peer’s IGW, NAT Gateway, or VPN. That makes it poor for large meshes.
Transit Gateway is the scalable hub-and-spoke option. It provides centralized routing between attachments according to TGW route tables, associations, and propagations. In other words, it enables controlled transitive connectivity; it does not automatically connect everything to everything.
| Need | Better fit |
|---|---|
| Two or three VPCs, simple direct connectivity | VPC peering |
| Many VPCs, multiple accounts, on-prem integration, segmentation | Transit Gateway |
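The non-transitivity point is worth internalizing, because it is what makes peering meshes fall apart at scale. A toy reachability model (VPC names are hypothetical):

```python
# Peering joins exactly two VPCs and is non-transitive: A<->B plus B<->C
# does NOT give A<->C. A Transit Gateway acts as a hub, so any attached
# VPC can reach any other, subject to TGW route tables (modeled here as
# fully permissive for simplicity).

def reachable_via_peering(peerings: set, src: str, dst: str) -> bool:
    # A peering connection only helps if it directly joins src and dst.
    return frozenset((src, dst)) in peerings

def reachable_via_tgw(attachments: set, src: str, dst: str) -> bool:
    # With a permissive TGW route table, attachment alone is enough.
    return src in attachments and dst in attachments

peerings = {frozenset(("vpc-a", "vpc-b")), frozenset(("vpc-b", "vpc-c"))}

print(reachable_via_peering(peerings, "vpc-a", "vpc-b"))  # True: direct peering
print(reachable_via_peering(peerings, "vpc-a", "vpc-c"))  # False: non-transitive
print(reachable_via_tgw({"vpc-a", "vpc-b", "vpc-c"}, "vpc-a", "vpc-c"))  # True
```

This is also why full-mesh peering needs n*(n-1)/2 connections, while a Transit Gateway needs only one attachment per VPC.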
For hybrid connectivity, Site-to-Site VPN is the fast, encrypted, internet-based option. Direct Connect is a private dedicated connection with more predictable performance, but it is not encrypted by default. If encryption is required, use VPN over Direct Connect or MACsec where supported. BGP is used with both Direct Connect and dynamic VPN designs, even though deep protocol tuning is outside associate-level scope.
A common enterprise pattern is Transit Gateway plus Direct Connect as primary connectivity, with Site-to-Site VPN as backup. In larger environments, Direct Connect Gateway is often used to connect Direct Connect to multiple VPCs or a Transit Gateway design.
## Security, segmentation, and inspection
Security groups are stateful and allow-only. They are the main least-privilege control on ENI-backed resources. NACLs are stateless, processed in numbered order, and support both allow and deny rules. Because they are stateless, return traffic must also be allowed explicitly. That makes NACLs useful for broad subnet-level guardrails, but security groups usually do the real work.
A clean three-tier design usually ends up looking something like this:
- ALB security group: allow inbound 443 from the internet
- App security group: allow 443 or 80 only from the ALB security group
- DB security group: allow the database port only from the app security group
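The security-group chain above can be sketched as a tiny rule model. The SG names and ports are hypothetical; the point is that each tier's rules reference the previous tier's security group rather than a CIDR:

```python
# Each rule allows a port either from a CIDR ("0.0.0.0/0") or from
# another security group, which is how SG references work on AWS.
SG_RULES = {
    "sg-alb": [("0.0.0.0/0", 443)],  # internet -> ALB
    "sg-app": [("sg-alb", 443)],     # only the ALB tier -> app
    "sg-db":  [("sg-app", 5432)],    # only the app tier -> database
}

def allowed(source: str, dest_sg: str, port: int) -> bool:
    """Is inbound traffic permitted? Security groups are stateful, so the
    return path needs no extra rule (not modeled here)."""
    return any(
        src in (source, "0.0.0.0/0") and rule_port == port
        for src, rule_port in SG_RULES[dest_sg]
    )

print(allowed("internet", "sg-alb", 443))  # True: open to 0.0.0.0/0
print(allowed("sg-alb", "sg-app", 443))    # True: chained from the ALB SG
print(allowed("internet", "sg-db", 5432))  # False: DB only trusts the app tier
```

Chaining by SG reference means the rules keep working as instances scale in and out, because membership in the group, not an IP address, is what grants access.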
For centralized inspection, Gateway Load Balancer can insert firewall appliances transparently, often alongside Transit Gateway in an inspection VPC. That is the scalable answer when a question asks for appliance-based traffic inspection without brittle manual routing everywhere.
## Troubleshooting patterns and exam elimination strategy
When a network design looks fine on paper but still doesn’t work, I usually check routing first, then addressing, then security, then DNS, and finally health checks.
- EC2 in a public subnet has no internet: verify IGW route, public IPv4 or Elastic IP, and outbound security rules.
- Private EC2 cannot reach S3: check whether a gateway endpoint exists, whether the route table has the S3 prefix-list route, and whether an endpoint or bucket policy blocks access.
- ALB unhealthy targets: verify target port, health check path, app listener, and security group rules from ALB to targets.
- VPN is up but traffic fails: check route advertisement or propagation, attachment route tables, and asymmetric routing.
AWS-native tools worth remembering: VPC Flow Logs, Reachability Analyzer, Route 53 Resolver query logs, CloudWatch metrics, and ELB access logs.
For exam elimination, I’d use this order: identify the protocol, decide whether the traffic has to stay private, figure out the scope (AZ, Region, global, or hybrid), rule out answers with a hidden single point of failure, and then pick the managed option with the least exposure and operational overhead.
## SAA-C03 rapid review: keyword-to-service map
| Requirement clue | Think first about |
|---|---|
| Path-based HTTP routing | ALB |
| TCP/UDP or static regional IPs | NLB |
| Static global anycast IPs | Global Accelerator |
| Private S3 or DynamoDB access | Gateway endpoint |
| Private access to supported AWS APIs | Interface endpoint |
| Outbound internet from private IPv4 subnets | NAT Gateway |
| Outbound-only IPv6 internet | Egress-only Internet Gateway |
| Many VPCs and centralized routing | Transit Gateway |
| Quick encrypted hybrid link | Site-to-Site VPN |
| Predictable private hybrid connectivity | Direct Connect |
| Global web acceleration and caching | CloudFront |
| DNS failover or weighted routing | Route 53 |
## Final takeaways
The exam is really testing judgment. Public subnet does not mean public reachability. NAT Gateway is not for private AWS service access and does not handle IPv6. Gateway endpoints are only for S3 and DynamoDB. VPC peering is non-transitive. Direct Connect is private, not automatically encrypted. Route 53 failover isn’t instant, because DNS caching is always part of the story.
If you remember just one framework, make it this: identify the traffic type, identify the protocol, identify the scope, keep private traffic private, and eliminate single points of failure. That mindset turns most SAA-C03 networking questions from confusing to predictable.