AWS SAA-C03: How to Design High-Performing and Elastic Compute Solutions
Introduction: What “high-performing and elastic compute” means in SAA-C03
In SAA-C03, compute questions are really service-selection questions. AWS is checking whether you can look at a workload, identify what is actually slowing it down or constraining it, and then choose the compute option that best balances performance, elasticity, availability, cost, and day-to-day operational burden.
Use these terms precisely:
- High performance: low latency, high throughput, efficient startup, and enough CPU, memory, network, and storage performance for the workload.
- Elasticity: the ability to scale out and in automatically as demand changes.
- Scalability: the ability to handle growth over time by adding resources, whether or not that happens automatically.
- High availability: the system remains available despite failures, usually with redundancy across multiple Availability Zones; brief interruption may still be possible.
- Fault tolerance: the system continues operating through failure with no single point of failure and little to no interruption.
That distinction matters. Multi-Availability Zone web tiers usually improve high availability. Full fault tolerance is a stronger claim and is less common than candidates assume.
Compute Selection Flowchart for SAA-C03
Start with this decision path:
- Event-driven, short-lived, minimal ops? Choose Lambda.
- Traditional app, OS access, custom agents, legacy software, licensing, specialized networking, GPU, or deep tuning? Choose EC2.
- Containers, but no Kubernetes requirement? Choose ECS. If you want less node management, use the Fargate launch type.
- Kubernetes explicitly required? Choose EKS.
- Standard web app and you want managed deployment scaffolding? Choose Elastic Beanstalk, AWS’s managed application platform that takes some of the deployment and environment management off your plate.
- Queued batch jobs needing managed scheduling and elastic compute fleets? Choose AWS Batch.
- Simple HTTP app or container with very low ops and less platform control? App Runner may appear, but ECS/Fargate is still more central for SAA-C03.
Quick cues: “no servers” points to Lambda; “minimal container ops” points to ECS on Fargate; “custom OS” points to EC2; “Kubernetes” points to EKS.
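The decision path above can be sketched as a small helper function. This is only a study aid that mirrors the flowchart; the trait names (dictionary keys) are ours, not anything from an AWS API.

```python
def choose_compute(workload: dict) -> str:
    """First-pass service choice from workload traits (study-aid sketch).

    Keys are hypothetical trait flags, checked in roughly the same order
    as the decision path above.
    """
    if workload.get("kubernetes_required"):
        return "EKS"
    if workload.get("needs_os_access") or workload.get("legacy_or_licensing"):
        return "EC2"
    if workload.get("containers"):
        # Fargate removes node management; ECS on EC2 keeps more control.
        return "ECS on Fargate" if workload.get("minimal_ops") else "ECS on EC2"
    if workload.get("event_driven") and workload.get("short_lived"):
        return "Lambda"
    if workload.get("queued_batch"):
        return "AWS Batch"
    return "EC2"  # safe default when nothing else matches


print(choose_compute({"event_driven": True, "short_lived": True}))  # Lambda
```

Encoding the elimination order like this is also a decent way to internalize the exam cues: Kubernetes wins over generic “containers,” and OS access wins over almost everything else.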
Core Service Comparison
| Service | Best fit | Why choose it | Common trap |
|---|---|---|---|
| EC2 | Custom, legacy, performance-tuned workloads | Full control, broad instance choice, specialized hardware | Higher ops burden |
| Lambda | Event-driven, bursty, short-lived tasks | Automatic scaling, pay per use, minimal server management | 15-minute max execution, concurrency limits, no OS control |
| ECS on Fargate | Containers with minimal node management | Good elasticity, lower ops than EC2-backed clusters | Can cost more for steady heavy workloads |
| ECS on EC2 | Containers with more control or cost optimization | Control over instances and capacity strategy | You manage nodes |
| EKS | Kubernetes workloads | Managed Kubernetes control plane, ecosystem compatibility | More complexity than ECS |
| Elastic Beanstalk | Opinionated deployment for common app stacks | Managed environment built on EC2, load balancing, and Auto Scaling | Not serverless; still uses underlying infrastructure |
| AWS Batch | Batch or high-performance computing style queued jobs | Managed job queues, retries, and compute environments | Overkill for simple always-on apps |
EC2: When control and performance matter most
EC2 is usually the right answer when you need operating system access, custom software, specialized accelerators, or tight performance tuning. For exam purposes, know the major families: M for general purpose, C for compute optimized, R/X for memory optimized, I/D/H for storage-oriented patterns, and P/G/Inf/Trn for accelerators.
Also know the common traps:
- T-family burstable instances are good for variable CPU workloads, but poor for sustained CPU-heavy demand because of CPU credit behavior.
- Newer generations usually offer better price/performance than older ones.
- Graviton can be excellent for price/performance if your binaries, dependencies, and container images support ARM64.
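The T-family trap is easiest to see with a toy credit model. This is illustrative arithmetic only, not actual AWS credit accounting, and the numbers are made up: the instance earns credits at a fixed hourly rate and spends them faster when CPU runs above baseline.

```python
def credit_balance_over_time(earn_per_hour: float, spend_per_hour: float,
                             start_balance: float, hours: int) -> list:
    """Toy model of T-family CPU credits (illustrative numbers only).

    The balance floors at zero; once it hits zero a real burstable
    instance is throttled down to its baseline CPU.
    """
    balances = []
    balance = start_balance
    for _ in range(hours):
        balance = max(0.0, balance + earn_per_hour - spend_per_hour)
        balances.append(balance)
    return balances


# Sustained heavy CPU spends far more than the hourly earn rate, so the
# balance drains and the instance ends up pinned at baseline performance.
print(credit_balance_over_time(earn_per_hour=12.0, spend_per_hour=60.0,
                               start_balance=144.0, hours=5))
```

The exam cue: if the scenario says sustained high CPU, a T-family answer is usually a distractor.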
Performance on EC2 is not just instance size. It is often limited by EBS volume type and settings, instance EBS bandwidth limits, or network throughput. For storage, know the basics:
- gp3: strong default SSD choice with configurable IOPS and throughput.
- io1/io2: for sustained high IOPS and mission-critical latency-sensitive storage.
- Instance store: very fast ephemeral storage; data is lost if the instance stops, terminates, or underlying hardware fails.
- EFS: shared file storage across instances, useful for shared access but not a substitute for high-performance local block storage.
Advanced EC2 performance features matter too. ENA improves network performance for many modern instances. EFA is specialized for tightly coupled high-performance computing workloads. Placement groups help with placement strategy: cluster for low latency, partition for large distributed systems like Kafka or Cassandra, and spread for a small number of critical isolated instances. They are powerful, but capacity constraints and availability tradeoffs can apply.
Use launch templates for repeatable scale-out. I’d think about it like this:
Launch template example: use AMI ami-123, instance type c7g.large, IAM instance profile WebAppProfile, security group web-sg, require IMDSv2, and point user data to bootstrap.sh.
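As a sketch, the example above might look like the following boto3 `create_launch_template` request body. The AMI ID, profile name, and user data are placeholders from the example, and a real call would need an actual security group ID rather than a name.

```python
import base64

# Stand-in for the bootstrap.sh contents referenced in the example above.
user_data = b"#!/bin/bash\n/usr/local/bin/bootstrap.sh\n"

launch_template = {
    "LaunchTemplateName": "web-app",
    "LaunchTemplateData": {
        "ImageId": "ami-123",                       # placeholder AMI
        "InstanceType": "c7g.large",
        "IamInstanceProfile": {"Name": "WebAppProfile"},
        "SecurityGroupIds": ["web-sg"],             # real calls need an sg-... ID
        # Require IMDSv2: metadata requests must use session tokens.
        "MetadataOptions": {"HttpTokens": "required"},
        # User data must be base64-encoded in the API.
        "UserData": base64.b64encode(user_data).decode(),
    },
}
# Would be sent with: boto3.client("ec2").create_launch_template(**launch_template)
```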
When EC2 is implemented well, it usually means Auto Scaling is set up properly and the instances can scale in and out without making anyone’s life miserable.
For web and app tiers, the standard pattern is Application Load Balancer plus Multi-Availability Zone Auto Scaling group. Auto Scaling works best when the instances are stateless and anything that needs to stick around, like session data, is stored somewhere else.
Key Auto Scaling group details candidates often miss:
- Min / desired / max capacity define floor, target, and ceiling.
- Health check type can use EC2 checks, load balancer checks, or both. Load balancer health checks are useful when the instance is running but the application is unhealthy.
- Health check grace period prevents premature replacement during startup.
- Default instance warmup helps avoid scaling decisions before new instances are actually contributing.
- Lifecycle hooks let you pause launch or termination for bootstrapping or connection draining.
- Instance refresh supports rolling replacement for AMI or launch template updates.
- Warm pools reduce scale-out delay for slow-starting instances.
- Mixed instances policies and capacity rebalance improve cost and resilience when using Spot.
Choose the metric that matches demand. CPUUtilization is common, but for load-balanced services, RequestCountPerTarget is often better. For queue workers, scale from CloudWatch alarms on queue depth or ApproximateAgeOfOldestMessage, usually with step scaling or custom metrics rather than predefined target tracking alone.
Good pseudo-config:
Auto Scaling example: min 2, desired 4, max 12, subnets private-a and private-b, ELB health checks, 180-second grace period, 120-second warmup, and a target tracking policy based on RequestCountPerTarget.
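That pseudo-config could be expressed as boto3 request bodies roughly like this. The subnet IDs, launch template name, and ALB resource label are placeholders; treat this as a sketch of the shape, not a copy-paste deployment.

```python
# Body for boto3.client("autoscaling").create_auto_scaling_group(**asg)
asg = {
    "AutoScalingGroupName": "web-asg",
    "LaunchTemplate": {"LaunchTemplateName": "web-app", "Version": "$Latest"},
    "MinSize": 2,
    "DesiredCapacity": 4,
    "MaxSize": 12,
    "VPCZoneIdentifier": "subnet-private-a,subnet-private-b",  # placeholder IDs
    "HealthCheckType": "ELB",        # replace instances the load balancer marks unhealthy
    "HealthCheckGracePeriod": 180,   # seconds before health checks count
    "DefaultInstanceWarmup": 120,    # seconds before new instances drive scaling math
}

# Body for .put_scaling_policy(**policy) on the same client
policy = {
    "AutoScalingGroupName": "web-asg",
    "PolicyName": "req-per-target",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingConfiguration": {
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ALBRequestCountPerTarget",
            # Ties the metric to a specific ALB + target group (placeholder).
            "ResourceLabel": "app/my-alb/1234/targetgroup/web-tg/5678",
        },
        "TargetValue": 500.0,  # desired average requests per target
    },
}
```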
For load balancing, the main players are ALB, NLB, and GWLB.
Remember this: ALB for HTTP/HTTPS/gRPC, NLB for TCP/UDP/TLS and very high throughput, GWLB for transparent appliance insertion.
| Load balancer | Use it when | Notable exam clues |
|---|---|---|
| ALB | Path-based routing, host-based routing, APIs, microservices, WebSocket/gRPC | HTTP routing, containers, multiple services behind one endpoint |
| NLB | Low latency, static IPs, source IP preservation, TCP/UDP/TLS | Non-HTTP, millions of connections, static IP requirement |
| GWLB | Scaling firewalls or inspection appliances | Traffic inspection, security appliance insertion |
Implementation details worth knowing:
- ALB target types can include instances, IPs, and Lambda.
- ECS/Fargate commonly uses IP targets with awsvpc networking.
- ACM is commonly used for TLS certificates.
- Deregistration delay helps drain in-flight requests.
- Internal load balancers live in private subnets; internet-facing ones are used for public entry points.
- Cross-zone behavior isn’t the same across ALB, NLB, and GWLB, so I wouldn’t assume they all work exactly alike just because they’re all load balancers.
Lambda is usually the best fit for event-driven workloads that show up in bursts and don’t need servers sitting around all day.
Lambda is a great fit when the work is short-lived, triggered by events, and you really don’t want to manage servers at all. Lambda scales by running more executions at the same time, not by launching servers the way EC2 does. You’ll commonly see Lambda triggered by S3, EventBridge, SNS, SQS, API Gateway, DynamoDB Streams, and Kinesis.
A few exam details are definitely worth keeping top of mind:
- Maximum execution time: 15 minutes.
- Reserved concurrency: reserves and limits concurrency for a function.
- Provisioned concurrency: keeps execution environments pre-initialized to reduce cold starts.
- Memory setting affects CPU allocation; increasing memory can improve speed as well as RAM.
- Ephemeral storage in /tmp is configurable and useful for temporary processing.
- Lambda is weaker for persistent connection-oriented server models or workloads requiring durable in-memory state.
Failure handling matters. With asynchronous invocations, retries and destinations are part of the picture. And if SQS is the trigger, you’ve really got to pay attention to batch size, visibility timeout, and idempotency. Standard SQS queues use at-least-once delivery, so duplicates can absolutely happen. So your function logic needs to be idempotent.
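A minimal idempotency sketch: dedupe on the SQS message ID before doing the work. A real function would record processed IDs in a durable store, for example a DynamoDB conditional put, rather than this in-memory set, which only survives for the life of one execution environment and is here purely to show the pattern.

```python
# In-memory stand-in for a durable dedupe store (e.g. DynamoDB) -- sketch only.
processed_ids = set()

def handler(event, context=None):
    """SQS-triggered Lambda handler sketch with per-message idempotency."""
    results = []
    for record in event["Records"]:          # SQS delivers records in batches
        msg_id = record["messageId"]
        if msg_id in processed_ids:
            continue                          # at-least-once delivery: duplicates happen
        processed_ids.add(msg_id)
        results.append(record["body"].upper())  # stand-in for the real work

    return results


event = {"Records": [{"messageId": "m1", "body": "order"},
                     {"messageId": "m1", "body": "order"}]}  # duplicate delivery
print(handler(event))  # the duplicate record is processed only once
```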
If cold starts are a problem, I’d look at smaller deployment packages, efficient runtimes, lazy initialization, provisioned concurrency, and skipping VPC attachment unless the function really needs private resource access.
When it comes to containers on AWS, the big three are ECS, Fargate, and EKS.
Fargate is a launch type for ECS and EKS, not a separate orchestrator. That distinction matters.
ECS is the default answer when the question says containers but does not require Kubernetes. Use task definitions, services, and Service Auto Scaling. Store images in ECR. Separate the task execution role from the task role: execution role pulls images and writes logs; task role is what the application uses for AWS API access.
With awsvpc networking, each task gets its own network interface and can use security groups directly. That is common for Fargate and useful for isolation. ECS services often sit behind ALB or NLB and scale on CPU, memory, ALB request count, or custom metrics.
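ECS Service Auto Scaling goes through the Application Auto Scaling API. A sketch of the two request bodies, with placeholder cluster and service names:

```python
# Body for boto3.client("application-autoscaling").register_scalable_target(...)
scalable_target = {
    "ServiceNamespace": "ecs",
    "ResourceId": "service/prod-cluster/web-service",   # placeholder names
    "ScalableDimension": "ecs:service:DesiredCount",
    "MinCapacity": 2,
    "MaxCapacity": 10,
}

# Body for .put_scaling_policy(...) on the same client
scaling_policy = {
    "PolicyName": "cpu-target",
    "ServiceNamespace": "ecs",
    "ResourceId": "service/prod-cluster/web-service",
    "ScalableDimension": "ecs:service:DesiredCount",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
        "TargetValue": 60.0,  # keep average service CPU near 60%
    },
}
```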
EKS is right when Kubernetes APIs, tooling, or portability are explicit requirements. EKS does manage the control plane for you, but you still have to think about managed node groups or Fargate profiles, ingress, pod scaling, and cluster autoscaling. It’s more complex than ECS, so I definitely wouldn’t pick it just because the word “containers” shows up in the question.
Elastic Beanstalk, AWS’s managed application platform, fits standard application stacks, including some Docker-based deployments, but it is less favored for modern microservice orchestration. It reduces platform management, not infrastructure awareness.
Decoupling and elastic workers: SQS, SNS, EventBridge, and Batch
If you need burst absorption, decoupling, and independent worker scaling, think SQS. Standard queues give you at-least-once delivery and only best-effort ordering. FIFO queues are the choice when you need ordered processing and deduplication, as long as you stay within FIFO’s limits.
Design details that matter:
- Visibility timeout should exceed normal processing time.
- Long polling reduces empty receives and cost.
- DLQs handle poison messages after repeated failures.
- Idempotency is required because duplicates can happen.
- Age of oldest message often tells you more than raw queue depth.
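Worker scaling from backlog is simple arithmetic: how many workers does it take to drain the visible backlog within a target window? A sketch, where the input would come from the SQS ApproximateNumberOfMessagesVisible metric and the function name and parameters are ours:

```python
import math

def desired_workers(visible_messages: int, msgs_per_worker_per_min: int,
                    target_drain_minutes: int, min_w: int, max_w: int) -> int:
    """Backlog-based worker sizing sketch, clamped to a min/max fleet size."""
    needed = math.ceil(visible_messages /
                       (msgs_per_worker_per_min * target_drain_minutes))
    return max(min_w, min(max_w, needed))


# 1,200 visible messages, each worker clears 20/min, drain within 10 minutes:
print(desired_workers(1200, 20, 10, min_w=1, max_w=20))  # -> 6
```

This is the logic behind scaling queue workers on a custom “backlog per instance” metric rather than CPU: the queue depth, not worker CPU, is what actually tracks demand.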
Use SNS for fan-out and EventBridge for event routing based on rules. For managed queued batch processing, AWS Batch is often better than hand-building EC2 worker fleets. It gives you job queues, compute environments, retries, and Spot integration.
For optimization, security, monitoring, and troubleshooting, I tend to think in terms of what’s actually causing pain in production.
In practice, performance tuning is mostly about finding the real bottleneck instead of guessing. The first question I always ask is whether the problem is CPU, memory, network, storage, startup time, or some downstream dependency that’s dragging everything else down.
Useful metrics:
- EC2: CPUUtilization, NetworkIn/Out, disk metrics, status checks.
- ALB: RequestCount, RequestCountPerTarget, TargetResponseTime, HTTPCode_Target_5XX_Count, HealthyHostCount.
- Lambda: Duration, Errors, Throttles, ConcurrentExecutions.
- ECS: service/task CPU and memory utilization, running task count.
- SQS: ApproximateNumberOfMessagesVisible, ApproximateAgeOfOldestMessage.
Security basics by compute model:
- Use instance profiles for EC2, execution roles for Lambda, and task roles for ECS.
- Store secrets in Secrets Manager or Parameter Store, not in AMIs or user data.
- When you can, I’d keep application compute in private subnets.
- Use Systems Manager Session Manager instead of broad SSH access.
- Require IMDSv2 on EC2 and encrypt EBS volumes.
A quick troubleshooting map:
- ASG not scaling: wrong metric, warmup too long, max capacity reached, or health checks failing.
- ALB target unhealthy: wrong path/port, security group mismatch, app not listening, startup too slow.
- Lambda throttles: concurrency limit hit; use reserved concurrency, backoff, or redesign trigger flow.
- Queue backlog rising: workers too few, visibility timeout too short, poison messages, or downstream dependency slow.
- High latency with low CPU: suspect storage, network, database, or external service bottlenecks.
Exam strategy, distractors, and mini scenarios
Use elimination aggressively:
- Containers + no Kubernetes + minimal ops → ECS on Fargate, not EKS.
- HTTP routing by path or host → ALB, not NLB.
- Low-latency TCP with static IP → NLB, not ALB.
- Long-running job over 15 minutes → not Lambda.
- Critical service that cannot be interrupted → not Spot-only.
- Legacy app needing OS agents → EC2, not Lambda.
Mini scenarios:
Flash-sale web app: ALB + Multi-Availability Zone EC2 Auto Scaling group or ECS service. Scale on RequestCountPerTarget, keep app stateless, store sessions externally.
S3 image upload pipeline: S3 triggers Lambda for short transforms. If processing becomes heavy or long-running, place work on SQS and use ECS or Batch workers.
Nightly batch analytics: AWS Batch or EC2/ECS workers using mixed On-Demand and Spot for fault-tolerant cost optimization.
Cost logic to remember: use Compute Savings Plans for flexible commitments across EC2, Fargate, and Lambda; EC2 Instance Savings Plans for more specific EC2 usage; Reserved Instances still matter for EC2-focused commitment models; and use mixed On-Demand/Spot rather than Spot-only for resilient elasticity.
Final memory aids:
- HTTP intelligence = ALB
- Raw speed and static IP = NLB
- Containers without Kubernetes = ECS first
- Burst + events + short runtime = Lambda
- Queued batch at scale = Batch or SQS workers
- Custom OS or legacy constraints = EC2
AWS features, defaults, quotas, and pricing evolve over time, so current official product guidance should be reviewed before production use. But for SAA-C03, if you reason from workload behavior first, the right compute choice usually becomes obvious.