AWS SAA-C03: How to Design Cost-Optimized Network Architectures
A practical guide to minimizing AWS network costs while preserving security, availability, and performance
For SAA-C03, cost-optimized networking is not about picking the cheapest-looking service in isolation. It is about designing the lowest-cost traffic path that still meets the requirements for security, availability, and operational simplicity. That is the real skill the exam is testing.
In AWS, the packet path is usually the cost path. If traffic crosses Availability Zones unnecessarily, hairpins through centralized egress, hits NAT when an endpoint would do, or leaves AWS when it could stay on the AWS network, the bill reflects that design choice every month.
A quick caution before we go further: AWS pricing varies by Region and changes over time. For implementation, always verify current pricing with AWS pricing resources and cost estimation tools. For the exam, focus on relative cost patterns and architectural fit rather than memorizing exact numbers.
How AWS network billing actually works
Most AWS networking spend comes from five buckets:
- Data transfer — internet egress, inter-AZ, inter-Region, and some inter-VPC paths.
- Hourly service charges — NAT Gateway, interface endpoints, Transit Gateway attachments, load balancers, Route 53 Resolver endpoints, AWS Network Firewall endpoints, VPN connections, and more.
- Per-GB processing charges — NAT Gateway, Transit Gateway, interface endpoints, Network Firewall, and similar managed data-plane services.
- Request-based charges — CloudFront requests, Route 53 queries, WAF requests, S3 requests, and related service charges.
- Observability costs — VPC Flow Logs delivery and storage, CloudWatch Logs ingestion, S3 storage, Athena queries, and cost and usage analysis.
Same-AZ traffic is often cheaper than cross-AZ traffic, but do not turn that into a universal rule. Some same-AZ flows still incur service processing charges. Likewise, cross-AZ pricing depends on the service pair and path. The safe exam habit is this: if traffic travels farther or through more managed hops, it usually costs more.
A few high-yield billing facts matter a lot:
- NAT Gateway: hourly charge plus per-GB processing. It is IPv4-only and does not accept unsolicited inbound connections.
- Gateway endpoints: available only for S3 and DynamoDB. They do not have hourly or per-GB endpoint charges.
- Interface endpoints: billed per endpoint, per AZ, plus data processing.
- Transit Gateway: per-attachment hourly charges plus per-GB data processing; inter-Region TGW designs add more transfer considerations.
- ALB/NLB: hourly charges plus capacity units (LCUs for ALB, NLCUs for NLB).
- Site-to-Site VPN: hourly connection charges plus transfer charges.
- Direct Connect: port-hours and data transfer out, plus possible partner charges.
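These relative costs are easier to internalize with a quick back-of-the-envelope model. The sketch below is illustrative only: the rates are placeholders, not current AWS prices, and real NAT Gateway pricing varies by Region and over time.

```python
# Back-of-the-envelope comparison of NAT Gateway vs. an S3 gateway endpoint
# for S3-bound traffic. All rates here are PLACEHOLDERS for illustration;
# verify current pricing with AWS pricing resources before relying on them.

HOURS_PER_MONTH = 730

def nat_monthly_cost(gb_processed, hourly_rate=0.045, per_gb_rate=0.045):
    """NAT Gateway billing shape: hourly charge plus per-GB processing."""
    return HOURS_PER_MONTH * hourly_rate + gb_processed * per_gb_rate

def gateway_endpoint_monthly_cost(gb_processed):
    """Gateway endpoints (S3/DynamoDB) have no hourly or per-GB endpoint fee."""
    return 0.0

if __name__ == "__main__":
    for gb in (100, 1_000, 10_000):
        print(f"{gb:>6} GB via NAT: ~${nat_monthly_cost(gb):,.2f}/month; "
              f"via S3 gateway endpoint: ${gateway_endpoint_monthly_cost(gb):,.2f}")
```

Even with placeholder numbers, the shape of the bill is the point: the NAT path has a fixed floor plus a per-GB slope, while the gateway endpoint path removes both for S3 and DynamoDB traffic.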
Locality, routing, and the hidden cost of “clean” designs
A subnet is generally considered public when its route table includes a route to an internet gateway and its resources have the public IP addresses needed to be internet reachable. A subnet without that path is generally private, but private does not mean “cheap.” Private subnets often rely on NAT, endpoints, TGW, VPN, Direct Connect, or egress-only internet gateways, and each path has different cost behavior.
Route tables are where cost decisions become architecture. A private subnet that sends all traffic to a NAT Gateway is more expensive than one that sends only general internet traffic to NAT while routing S3 privately through a gateway endpoint.
Example private route design
- IPv4 default route: 0.0.0.0/0 -> NAT Gateway in the same AZ
- S3 route: added automatically (targeting the S3 prefix list) when you associate the gateway endpoint with the route table
That “same AZ” detail matters. In multi-AZ designs, each private subnet should usually route to a NAT Gateway in its own Availability Zone. If a subnet in AZ-a uses a NAT Gateway in AZ-b, you create cross-AZ transfer and a larger failure domain.
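That locality rule is easy to audit programmatically. Here is a minimal sketch using hypothetical data shapes (this is not a real AWS API; in practice you would pull subnets, route tables, and NAT Gateways from the EC2 API):

```python
# Sketch: verify that each private subnet's default route targets a NAT
# Gateway in the same AZ. The dict shapes below are hypothetical inputs,
# not real AWS API responses.

def check_nat_locality(subnets, nat_gateways):
    """subnets: {subnet_id: {"az": ..., "default_route_nat": nat_id}}
       nat_gateways: {nat_id: az}
       Returns subnet IDs whose NAT lives in a different AZ
       (cross-AZ transfer cost and a larger failure domain)."""
    offenders = []
    for subnet_id, info in subnets.items():
        nat_az = nat_gateways.get(info["default_route_nat"])
        if nat_az != info["az"]:
            offenders.append(subnet_id)
    return offenders

subnets = {
    "subnet-a": {"az": "us-east-1a", "default_route_nat": "nat-1"},
    "subnet-b": {"az": "us-east-1b", "default_route_nat": "nat-1"},  # cross-AZ
}
nat_gateways = {"nat-1": "us-east-1a"}
print(check_nat_locality(subnets, nat_gateways))  # ['subnet-b']
```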
The same locality principle applies elsewhere:
- Keep chatty app tiers close to the data they use most.
- Be careful with centralized inspection that forces every flow through a hub VPC.
- Review load balancer target placement and cross-zone behavior by load balancer type.
- Use multi-Region only when the business requirement justifies the transfer and replication cost.
For IPv6, remember the exam distinction: egress-only internet gateway provides outbound-only IPv6 internet access and does not perform NAT. NAT Gateway is for IPv4.
NAT, endpoints, and the “do I really need internet egress?” decision
The best NAT bill is often the one you never create. Start with the traffic question:
- Does the workload need general internet access?
- Or does it mostly need AWS service access?
- Could dual-stack or IPv6 reduce IPv4 NAT dependence?
NAT Gateway is the usual managed answer for outbound IPv4 access from private subnets. It scales well and has low operational overhead, but you pay hourly plus per-GB processing. In production, it is often the right answer when workloads need generic outbound internet access and you want managed availability. In multi-AZ architectures, deploy one per AZ and keep routes local.
NAT instances can be cheaper for very low, predictable throughput, but they are not managed. You must disable source/destination checks, enable IP forwarding, configure iptables or nftables, patch the OS, harden the instance, monitor throughput, and build failover yourself. AWS no longer provides a maintained managed NAT AMI. For the exam, NAT instance is usually a distractor unless the scenario strongly emphasizes lowest possible cost, low traffic, and acceptance of operational burden.
Gateway endpoints are the obvious savings move for S3 and DynamoDB. They are VPC constructs, route-table based, and AWS adds the managed route when you associate the endpoint with route tables. No hourly endpoint fee. No per-GB endpoint processing fee. Standard S3 or DynamoDB service charges still apply, but you avoid NAT processing for those paths. One caveat: S3 gateway endpoint access is Region-scoped; cross-Region S3 patterns need separate review.
Interface endpoints use AWS PrivateLink and create ENIs in your subnets. They support many AWS services and some partner or custom services, but they are billed per endpoint per AZ plus data processing. That per-AZ duplication is why endpoint sprawl gets expensive fast. They also bring design details: security groups on the endpoint ENIs, private DNS behavior, and sometimes the choice between distributed endpoints in every VPC versus centralized shared-access patterns.
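A quick model shows why that per-AZ duplication adds up. The rates and traffic figures here are placeholder assumptions, not current prices:

```python
# Rough model of interface endpoint "sprawl": endpoints are billed per
# endpoint, per AZ, plus data processing, so hourly cost scales with
# VPCs x AZs x services. Rates are PLACEHOLDERS; verify current pricing.

HOURS_PER_MONTH = 730

def interface_endpoint_monthly_cost(vpcs, azs_per_vpc, services,
                                    hourly_rate=0.01, per_gb_rate=0.01,
                                    gb_per_service=0):
    """Returns (endpoint-AZ count, estimated monthly cost)."""
    endpoint_count = vpcs * azs_per_vpc * services
    hourly = endpoint_count * HOURS_PER_MONTH * hourly_rate
    processing = vpcs * services * gb_per_service * per_gb_rate
    return endpoint_count, hourly + processing

# 10 VPCs x 3 AZs x 5 services = 150 endpoint-AZ hourly charges
count, cost = interface_endpoint_monthly_cost(10, 3, 5)
print(count, round(cost, 2))
```

This is why centralized shared-endpoint patterns exist: multiplying every endpoint across every VPC and AZ can quietly exceed the NAT cost you were trying to avoid.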
Mini comparison
- Need S3 or DynamoDB privately from private subnets? Usually use a gateway endpoint.
- Need private access to supported AWS APIs like Systems Manager, ECR, or CloudWatch? Consider interface endpoints, but count the number of VPCs and AZs first.
- Need generic outbound internet for patching, third-party APIs, or package repositories? NAT Gateway is often the practical answer.
- Need outbound-only IPv6 internet? Egress-only internet gateway.
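The comparison above can be sketched as a tiny decision helper. The branches are deliberately simplified to the exam-style cases in this list:

```python
# Exam-style egress decision helper mirroring the mini comparison above.
# Simplified on purpose; real designs need traffic-volume and AZ counting.

def egress_choice(needs_internet, ipv6_only=False, aws_service=None):
    if aws_service in ("s3", "dynamodb"):
        return "gateway endpoint"
    if aws_service is not None:
        return "interface endpoint (count VPCs and AZs first)"
    if needs_internet and ipv6_only:
        return "egress-only internet gateway"
    if needs_internet:
        return "NAT Gateway (one per AZ)"
    return "no egress path needed"

print(egress_choice(False, aws_service="s3"))   # gateway endpoint
print(egress_choice(True))                      # NAT Gateway (one per AZ)
print(egress_choice(True, ipv6_only=True))      # egress-only internet gateway
```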
Practical lab: replace NAT-based S3 access
1. Confirm private instances reach S3 successfully.
2. Check the route table and note that default traffic goes to NAT.
3. Create an S3 gateway endpoint and associate the private route tables.
4. Validate route-table updates and confirm S3 access still works.
5. Watch NAT Gateway CloudWatch metrics and Cost Explorer over time; S3-related NAT bytes should drop.
Endpoint policy example idea: combine the endpoint with an S3 bucket policy using aws:SourceVpce or aws:SourceVpc so the workload can reach the bucket privately while reducing public exposure.
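A minimal sketch of that bucket-policy idea, assuming a hypothetical bucket name and a placeholder endpoint ID: the policy denies S3 access unless the request arrives through the named VPC endpoint.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowAccessOnlyFromVpce",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::example-artifact-bucket",
        "arn:aws:s3:::example-artifact-bucket/*"
      ],
      "Condition": {
        "StringNotEquals": {
          "aws:SourceVpce": "vpce-0123456789abcdef0"
        }
      }
    }
  ]
}
```

Be careful with broad Deny statements like this: they also block requests that do not traverse the endpoint, including console and administrative access, so test against a non-production bucket first.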
Peering, Transit Gateway, VPN, and Direct Connect
VPC peering is often the lower-cost answer for a small number of VPCs. It is direct and simple, but it is non-transitive. It also does not let you use a peer VPC as transit to that VPC’s IGW, NAT Gateway, or VPN. Bidirectional route updates are required, overlapping CIDRs are not allowed, and inter-Region peering has different transfer economics than same-Region peering.
Transit Gateway is the scale answer. It is Region-scoped, supports transitive routing, and simplifies multi-account connectivity, shared services, and segmented route domains. You pay for attachments and data processing, so it is easy to overbuild with TGW in a small environment. But once you have many VPCs, multiple accounts, or centralized inspection, TGW often becomes the better total-cost design because it reduces routing sprawl and operational complexity.
A good mental model is:
- 3 VPC startup: peering is often simpler and cheaper.
- 40 VPC enterprise: TGW is usually operationally and economically justified.
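The scaling argument behind that mental model is just mesh arithmetic: a full peering mesh grows quadratically, while TGW attachments grow linearly.

```python
# Full-mesh peering needs n*(n-1)/2 connections; a Transit Gateway hub
# needs n attachments. This is the connectivity count only, not pricing.

def full_mesh_peerings(n):
    return n * (n - 1) // 2

for n in (3, 10, 40):
    print(f"{n} VPCs: {full_mesh_peerings(n)} peering connections "
          f"vs {n} TGW attachments")
```

At 3 VPCs the mesh is 3 connections and peering stays simple; at 40 VPCs it is 780 connections, which is where TGW's attachment and processing charges start buying real operational value.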
Site-to-Site VPN is usually the faster, lower-initial-cost hybrid option. It has hourly charges and uses internet paths, so performance is less predictable. Direct Connect is the steady-state choice for high-volume or predictable hybrid traffic, with port-hour and transfer charges. It is private connectivity, but not inherently end-to-end encrypted; if encryption is required, use MACsec where supported or run VPN over DX.
Staged migration pattern
Phase 1: VPN for quick connectivity.
Phase 2: Add DX for bulk transfer or steady-state traffic.
Phase 3: Run DX as primary and keep VPN as backup.
For exam purposes, that lifecycle thinking matters more than memorizing BGP details, though you should know that DX and many VPN designs use BGP for route exchange and failover.
Edge delivery, centralized inspection, and service-specific caveats
CloudFront can reduce origin load and origin transfer by caching, but it does not automatically lower total cost. Whether it saves money depends on cache hit ratio, request volume, object size, geographic distribution, and origin type. For global static content with repeat access, CloudFront is often a strong performance and cost answer. For low-cacheability content, it may improve performance without reducing total spend. If you add AWS WAF at CloudFront or ALB, remember that WAF adds separate request-based charges.
Load balancers need more precision than “they cost money.” ALB is billed hourly plus LCUs and is usually the application-aware choice for HTTP/HTTPS. NLB is billed hourly plus NLCUs and is common for TCP/UDP or static IP requirements. Cross-zone behavior and pricing differ by load balancer type and can change over time, so avoid blanket statements. The exam lesson is simpler: choose the load balancer for the protocol and requirement first, then evaluate locality and current pricing.
Centralized inspection is where many architectures become expensive. Patterns using Transit Gateway with AWS Network Firewall or Gateway Load Balancer can be correct for compliance, IDS/IPS, or centralized egress control, but they add hourly charges, traffic processing charges, and often extra cross-AZ transfer if routing is not symmetric and local. Centralized inspection is not automatically wasteful; in regulated environments it may reduce duplicated appliance cost and satisfy mandatory policy. But you must justify the extra hops.
Distributed egress with per-AZ NAT and local security controls is often cheaper for simple outbound patterns. Centralized egress is usually justified when governance and auditability are explicit requirements.
Observability and troubleshooting playbook
Network cost optimization is easier when you can prove the path. Use multiple tools together:
- Cost Explorer / CUR — identify which transfer or managed network categories increased.
- VPC Flow Logs — metadata only, not packet payloads; useful for source, destination, port, protocol, bytes, packets, and accept/reject patterns. They also have logging and storage cost.
- CloudWatch metrics — NAT Gateway bytes, VPN tunnel state, load balancer metrics, and endpoint activity.
- Route table and endpoint review — confirm intended paths actually exist.
- Reachability Analyzer — validate path logic and catch routing or security misconfiguration.
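Flow Logs record only metadata, but the bytes field is enough to find the top talkers behind a transfer or NAT spike. A sketch that aggregates default-format Flow Log records by destination; the sample records are fabricated for illustration:

```python
# Sketch: aggregate VPC Flow Log records (default format) by destination
# address to find the largest byte consumers. Sample records below are
# fabricated for illustration only.

from collections import defaultdict

# Default-format fields: version account-id interface-id srcaddr dstaddr
# srcport dstport protocol packets bytes start end action log-status
def top_destinations(records, n=3):
    totals = defaultdict(int)
    for line in records:
        fields = line.split()
        if len(fields) < 14 or fields[12] != "ACCEPT":
            continue  # skip malformed or rejected flows
        totals[fields[4]] += int(fields[9])  # dstaddr -> bytes
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]

sample = [
    "2 123456789012 eni-abc 10.0.1.5 52.216.1.10 44321 443 6 120 987654 1600000000 1600000060 ACCEPT OK",
    "2 123456789012 eni-abc 10.0.1.5 93.184.216.34 44400 443 6 10 2048 1600000000 1600000060 ACCEPT OK",
]
print(top_destinations(sample))
```

In practice you would run this kind of aggregation in Athena or CloudWatch Logs Insights rather than by hand, but the logic is the same: group by destination, sum bytes, sort descending.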
Case study: unexpected NAT bill in a private app
A private EC2 fleet uploads artifacts to S3. NAT Gateway charges spike. The workflow is:
1. In Cost Explorer, confirm the increase is NAT Gateway processing rather than internet egress.
2. In CloudWatch, review NAT Gateway bytes out.
3. Check private subnet route tables. If there is no S3 gateway endpoint association, S3 traffic is likely using NAT.
4. Add an S3 gateway endpoint, associate the route tables, and keep the default route to NAT for true internet traffic.
5. Recheck NAT metrics after deployment. If S3 was the dominant flow, NAT processing should drop materially.
Common failure patterns
- Single NAT Gateway serving private subnets in multiple AZs.
- Using interface endpoints everywhere without checking traffic volume.
- Assuming “private” automatically means “cheaper.”
- Forgetting peering is non-transitive.
- Choosing egress-only IGW for an IPv4 requirement.
- Forcing every flow through centralized inspection without a stated control requirement.
What the exam is really testing
SAA-C03 is usually testing whether you can recognize the traffic pattern and choose the least expensive valid architecture without breaking security or availability.
If you see X, think Y
- Private access to S3 or DynamoDB → usually gateway endpoint.
- Few VPCs, simple connectivity → often VPC peering.
- Many VPCs across accounts → usually Transit Gateway.
- Low initial hybrid cost / quick setup → Site-to-Site VPN.
- High-volume steady hybrid traffic → Direct Connect.
- IPv6 outbound-only internet → egress-only internet gateway.
- Global static content → often CloudFront.
Wrong-answer elimination
- Eliminate answers that send traffic to the internet when a private endpoint path exists.
- Eliminate answers that assume peering is transitive.
- Eliminate answers that overbuild with TGW for tiny environments unless transitive routing or centralized control is required.
- Eliminate answers that centralize egress without a governance reason.
Mini scenarios
1. Private EC2 needs lowest-cost S3 access.
Best answer: S3 gateway endpoint.
Why not NAT Gateway? More expensive path for this requirement.
2. Three VPCs need simple communication.
Best answer: VPC peering.
Why not TGW? Usually overbuilt unless shared routing control is required.
3. Enterprise with dozens of VPCs and centralized inspection.
Best answer: Transit Gateway, potentially with Network Firewall or GWLB-based inspection.
Why not peering mesh? Operationally messy and hard to govern.
4. Company starts hybrid migration with limited budget.
Best answer: Site-to-Site VPN first, DX later if traffic grows.
Why not DX immediately? Higher commitment and slower setup.
Conclusion
Cost-optimized AWS networking is really traffic-path design. Keep flows local when possible, avoid NAT when endpoints fit, deploy per-AZ NAT when NAT is required, use peering for small environments and Transit Gateway for scale, start hybrid with VPN when cost and speed matter, and use CloudFront when cache behavior justifies it.
If you remember one line for both the exam and real architecture reviews, make it this: the packet path is the cost path. Follow the traffic first, and the right AWS service choice becomes much easier.