Design High-Performing and Elastic Compute Solutions for AWS SAA-C03
1. Designing high-performing and elastic compute on AWS
For SAA-C03, the real question is rarely “which service runs code?” It is “which design delivers the required performance, scales correctly, stays resilient across failures, and does not create unnecessary operational overhead?” That difference really matters, because on AWS exams you’ll often see one answer that works on paper and another that’s actually the better architecture.
Performance means latency, throughput, efficiency, and the ability to meet workload goals under load. Scalability means handling growth. Elasticity means scaling up and down automatically as demand changes. Horizontal scaling adds more instances, tasks, or function executions. Vertical scaling makes one node larger. AWS usually favors stateless, horizontally scalable, Multi-AZ designs because they improve both resilience and operational simplicity.
A useful exam model is: requirement → compute choice, workload pattern → scaling model, constraint → tradeoff. When traffic comes in bursts or waves instead of trickling in at the same pace all day, Lambda is usually the cleanest fit. If you need actual control over the operating system, custom drivers, or specialized hardware, EC2 is usually the better choice. If you need containers but you don’t really need Kubernetes, ECS or Fargate is usually the simpler answer. If Kubernetes is explicitly required, use EKS. If the work is queued and asynchronous, think SQS plus workers, or AWS Batch for jobs that can wait in a queue until compute capacity is available.
2. Compute service selection cheat sheet
| Service | Best fit | Avoid when | Exam clues |
|---|---|---|---|
| Amazon EC2 | Full OS control, custom runtimes, GPUs, HPC, legacy apps, deep tuning | You just need event-driven execution with low ops | Custom OS, specialized hardware, placement groups, licensing, low-level control |
| AWS Lambda | Event-driven, bursty, short-to-medium tasks, APIs, automation | Execution exceeds 15 minutes, needs persistent host control, or long-lived process models | No server management, pay only while code runs, sporadic traffic, integrations with API Gateway, SQS, SNS, EventBridge |
| Containers on AWS with Amazon ECS and AWS Fargate | Containerized services with lower ops than Kubernetes | Kubernetes is mandatory or deep host tuning is required | Containers, microservices, low operational overhead |
| Amazon EKS | Kubernetes standardization, existing K8s tooling, portability requirements | ECS would meet requirements more simply | Kubernetes, pods, cluster standardization |
| AWS Batch | Queued batch jobs on EC2, Spot, Fargate, or Fargate Spot, depending on the control, startup speed, and cost flexibility needed | Latency-sensitive, customer-facing workloads such as APIs and interactive apps | Batch queues, compute environments, jobs that tolerate interruptions |
| AWS Elastic Beanstalk | Fast deployment of supported web app platforms while AWS manages the EC2 instances, load balancing, and Auto Scaling | You need a more explicit architecture choice or an unsupported platform pattern | Easiest path to deploy a supported web app, AWS-managed underlying resources |
Rapid clues: event-driven and no servers → Lambda. Containers and no Kubernetes requirement → ECS or Fargate. Kubernetes requirement → EKS. HPC or MPI → EC2 with a cluster placement group and EFA. Queued async processing → SQS with workers or Batch.
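Those rapid clues can be sketched as a tiny lookup for drilling the pattern. The clue keywords and priority order below are my own illustrative assumptions, not an official rubric:

```python
def pick_compute(requirements: set[str]) -> str:
    """Map exam-style requirement clues to a first-guess compute service.

    Illustrative only: clue names and priority order are assumptions.
    """
    if "kubernetes" in requirements:
        return "EKS"                      # explicit Kubernetes requirement wins
    if {"hpc", "custom_os", "specialized_hardware"} & requirements:
        return "EC2"                      # host-level control or HPC
    if "event_driven" in requirements and "long_running" not in requirements:
        return "Lambda"                   # bursty, serverless, short tasks
    if "queued_batch" in requirements:
        return "AWS Batch"                # async jobs waiting on capacity
    if "containers" in requirements:
        return "ECS/Fargate"              # containers without Kubernetes
    return "EC2"                          # safe default for full control

print(pick_compute({"event_driven"}))               # Lambda
print(pick_compute({"kubernetes", "containers"}))   # EKS
```

Real exam questions layer constraints on top of these clues, but the priority order (Kubernetes first, host control next, serverless for bursty events) matches how the distractors are usually built.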
3. EC2 for performance-focused architectures
EC2 matters because it gives you the most control over CPU, memory, networking, storage, and even the operating system itself, which is exactly why it becomes the default answer when performance depends on tuning the host.
The real trick is just matching the workload to the right instance family. General purpose instances are a solid fit for balanced workloads, compute optimized is better when the job is CPU-heavy, memory optimized works well for in-memory analytics and caching, storage optimized is the one to reach for when you’ve got heavy local I/O, and accelerated computing is what you want when GPUs or another accelerator are part of the requirement. Also evaluate generation, network bandwidth, and EBS bandwidth; the instance type can become the bottleneck even when the storage volume itself is configured correctly.
Graviton instances are worth considering whenever the software stack supports ARM. In my experience, and honestly in a lot of exam scenarios too, Graviton often gives you better price/performance than a comparable x86 option. So if the question is hinting at better cost efficiency without giving up performance, and the workload can run on it, Graviton is usually a very strong choice.
Burstable instances can be fine for dev/test or lightly used services, but they are usually not the best answer for sustained production performance. If you need steady, high throughput, fixed-performance instance families are usually the safer and more predictable option.
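To see why burstable instances struggle under sustained load, here is a rough CPU-credit balance simulation. The earn rate, vCPU count, and 576-credit cap mirror a t3.medium, but treat them as illustrative numbers and verify against the published credit table:

```python
def credit_balance_after(hours, earn_per_hour, vcpus, utilization,
                         start=0.0, max_balance=576):
    """Simulate a burstable instance's CPU credit balance hour by hour.

    One credit = one vCPU at 100% for one minute. Earn rate and cap
    are assumed t3.medium-like values; check the real instance specs.
    """
    balance = start
    for _ in range(hours):
        balance += earn_per_hour                # credits earned this hour
        balance -= utilization * vcpus * 60     # credits spent this hour
        balance = min(max(balance, 0.0), max_balance)
    return balance

# At 20% utilization a t3.medium-like instance spends exactly what it
# earns (24 credits/hour), so the balance stays flat.
print(round(credit_balance_after(10, 24, 2, 0.20, start=100), 1))
# Sustained 60% burns ~48 credits/hour beyond earnings and drains fast.
print(round(credit_balance_after(2, 24, 2, 0.60, start=100), 1))
```

Once the balance hits zero, a standard-mode T instance is throttled to baseline, which is exactly the failure mode the exam is hinting at when a "suddenly slow" production service runs on burstable instances.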
Bare metal instances fit cases where the workload needs direct access to hardware features, specialized licensing, or minimal virtualization overhead.
4. Storage and network performance for compute
Many “compute” problems are really storage or network problems. For EBS, know the basics: gp3 is the general SSD choice and lets you tune IOPS and throughput independently; io1/io2 are for provisioned IOPS and latency-sensitive workloads; st1 and sc1 are HDD options for throughput-oriented or cold data use cases, not typical low-latency application volumes. Also remember that instance-level EBS bandwidth limits still apply.
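A quick sanity check for gp3 tuning can be encoded directly. The limits below reflect the published gp3 numbers at the time of writing (3,000–16,000 IOPS, 125–1,000 MiB/s, max 500 IOPS per GiB, max 0.25 MiB/s per provisioned IOPS); verify them against the current EBS documentation:

```python
def validate_gp3(size_gib: int, iops: int, throughput_mibps: int) -> list[str]:
    """Check a gp3 volume config against published limits (verify currency)."""
    problems = []
    if not 1 <= size_gib <= 16384:
        problems.append("size must be 1 GiB to 16 TiB")
    if not 3000 <= iops <= 16000:
        problems.append("IOPS must be 3,000 to 16,000")
    elif iops > size_gib * 500:
        problems.append("IOPS cannot exceed 500 per GiB")
    if not 125 <= throughput_mibps <= 1000:
        problems.append("throughput must be 125 to 1,000 MiB/s")
    elif throughput_mibps > iops * 0.25:
        problems.append("throughput cannot exceed 0.25 MiB/s per IOPS")
    return problems

print(validate_gp3(500, 12000, 500))   # [] — valid configuration
print(validate_gp3(20, 16000, 1000))   # too many IOPS for a 20 GiB volume
```

The per-GiB ratio is the one candidates forget: maxing out gp3 IOPS requires at least a 32 GiB volume, and even a perfectly tuned volume is still capped by the instance's own EBS bandwidth.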
Instance store provides very fast ephemeral local storage. Use it for scratch space, temporary processing, caches, or data that can be recreated. Do not choose it for durable state.
EFS is regional shared file storage with POSIX semantics, useful when many instances or containers need the same files. It is not usually the lowest-latency choice compared with local block storage. Performance and throughput modes matter, so it should be chosen for shared access requirements, not as a generic “fast storage” answer.
FSx is more specialized. FSx for Lustre is highly relevant for HPC and high-performance analytics. FSx for Windows File Server fits Windows SMB workloads. FSx for NetApp ONTAP and FSx for OpenZFS serve enterprise and advanced file-system use cases.
For networking, modern instances use the Nitro System and enhanced networking for better performance. If the workload needs tightly coupled node-to-node traffic, think cluster placement groups and Elastic Fabric Adapter (EFA). Cluster placement groups are usually used within a single AZ when you need the fastest possible traffic patterns. Partition placement groups improve failure isolation for large distributed systems. Spread placement groups are for a small number of critical instances that should be kept on distinct hardware.
5. How I usually think about elasticity in EC2 Auto Scaling
EC2 Auto Scaling pulls elasticity, self-healing, and Multi-AZ resilience into one design. The core building blocks are the Auto Scaling group, a Launch Template, subnets across multiple AZs, and often an ALB target group.
A good Launch Template includes the AMI, instance type, IAM instance profile, security groups, block device mappings, user data bootstrap, metadata options such as IMDSv2, and versioning. Mixed instances policies let one Auto Scaling group use multiple instance types and purchase options, which is useful for both resilience and Spot diversification.
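As a sketch of how a mixed instances policy hangs together, here is the rough request shape used with boto3's `create_auto_scaling_group`. Key names follow the EC2 Auto Scaling API, but verify them against the current reference; the template name, instance types, and numbers are placeholders:

```python
# Sketch of a MixedInstancesPolicy (boto3 create_auto_scaling_group kwarg).
# All values below are illustrative placeholders, not a recommendation.
mixed_instances_policy = {
    "LaunchTemplate": {
        "LaunchTemplateSpecification": {
            "LaunchTemplateName": "web-tier",   # placeholder template name
            "Version": "$Latest",
        },
        # Multiple instance types improve both resilience and the odds
        # of getting Spot capacity in any given pool.
        "Overrides": [
            {"InstanceType": "m6i.large"},
            {"InstanceType": "m5.large"},
            {"InstanceType": "m6g.large"},
        ],
    },
    "InstancesDistribution": {
        "OnDemandBaseCapacity": 2,                   # always-On-Demand floor
        "OnDemandPercentageAboveBaseCapacity": 50,   # split above the floor
        "SpotAllocationStrategy": "capacity-optimized",
    },
}
print(sorted(mixed_instances_policy))
```

The `InstancesDistribution` block is where the exam-relevant levers live: a guaranteed On-Demand base, a percentage split above it, and a Spot allocation strategy that favors pools least likely to be interrupted.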
Target tracking is usually the easiest scaling policy. Step scaling is useful when you need aggressive response at known thresholds. Scheduled and predictive scaling fit predictable demand. Health-based replacement is not a scaling policy; it is an Auto Scaling capability driven by EC2 or load balancer health checks. If instances are “running” but failing application traffic, load balancer health checks are especially important.
Use metrics that scale proportionally with demand. For ALB-backed web tiers, ALBRequestCountPerTarget is a standard target-tracking metric. For CPU-bound fleets, CPUUtilization may be fine. For queue workers, queue depth or age of oldest message is often better than CPU. Bad metric choice is one of the most common exam traps.
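Target tracking's steady-state behavior is roughly proportional, which is easy to sanity-check by hand. This is a simplified model: real target tracking also involves CloudWatch alarms, instance warmup, and cooldown:

```python
import math

def target_tracking_capacity(current, metric_value, target,
                             min_cap=1, max_cap=100):
    """Approximate the capacity target tracking converges toward.

    Simplified proportional model; real scaling is alarm-driven and
    respects warmup/cooldown before adjusting again.
    """
    desired = math.ceil(current * (metric_value / target))
    return max(min_cap, min(desired, max_cap))

# 10 instances seeing 900 requests/target against a 500 target → 18.
print(target_tracking_capacity(10, 900, 500))
# Demand falls to 200 requests/target → scale in toward 4.
print(target_tracking_capacity(10, 200, 500))
```

This proportionality is also why metric choice matters: if the metric does not move linearly with demand (CPU on an I/O-bound fleet, for example), the math converges on the wrong capacity.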
Warm pools are an EC2 Auto Scaling feature that keeps pre-initialized instances ready for faster scale-out. Default instance warmup, cooldown settings, health check grace periods, and lifecycle hooks all affect how quickly scaling becomes effective. If scaling feels slow, I’d check those first before blaming the scaling policy itself.
Common failure modes: bad AMI, broken user data, subnet IP exhaustion, missing instance profile permissions, capacity shortages, wrong health check path, or CloudWatch alarms tied to the wrong metric.
6. Load balancing choices that affect performance
| Load balancer | Best for | Key features |
|---|---|---|
| ALB | HTTP/HTTPS applications | Layer 7 routing, host/path/header rules, WebSockets, HTTP/2, gRPC, TLS termination |
| NLB | TCP, UDP, TLS, very high throughput | Layer 4, static IPs, TLS termination, low latency, common source IP preservation scenarios |
| GWLB | Appliance insertion | Traffic steering through firewalls and inspection appliances |
Choose ALB when the app needs content-aware routing. Choose NLB when the requirement is protocol-level performance, static IPs, or non-HTTP traffic. Be precise: NLB is not “the millisecond load balancer”; the real differentiators are Layer 4 behavior, high throughput, static IPs, and protocol support. NLB preserves client source IP in many common designs, while ALB passes client IP information in headers such as X-Forwarded-For for HTTP/HTTPS workloads.
It’s worth knowing the moving parts too: listeners, listener rules, target groups, health checks, deregistration delay, idle timeout, and TLS policies. Internal load balancers are for private east-west traffic, while internet-facing load balancers are for the public entry point into the application. Cross-zone load balancing support and defaults can vary by load balancer type and can change over time, so I’d always double-check how it behaves for the specific load balancer you’re using.
7. Lambda for elastic event-driven compute
Lambda is a really strong fit for event-driven architectures and for traffic that arrives in bursts. The critical limits matter: maximum execution time is 15 minutes, memory ranges from 128 MB to 10,240 MB, CPU scales with memory, and ephemeral /tmp storage is configurable up to 10 GB. So if something runs for several minutes, that doesn’t automatically rule Lambda out. The real question is whether the task actually fits the timeout and execution model.
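One way to internalize the memory/CPU relationship is the billed GB-seconds arithmetic. Region pricing is omitted; only the compute-unit math is shown:

```python
def lambda_gb_seconds(memory_mb: int, duration_ms: int, invocations: int) -> float:
    """Billed GB-seconds for a batch of Lambda invocations.

    Pricing per GB-second varies by region, so only the unit math is here.
    """
    gb = memory_mb / 1024
    seconds = duration_ms / 1000
    return gb * seconds * invocations

# Doubling memory doubles the GB-second rate, but CPU scales with memory,
# so CPU-bound work that finishes in half the time costs about the same:
print(lambda_gb_seconds(1024, 2000, 1_000_000))   # 1 GB x 2 s x 1M
print(lambda_gb_seconds(2048, 1000, 1_000_000))   # 2 GB x 1 s x 1M
```

This is why "just lower the memory" is not automatically a cost optimization on the exam: for CPU-bound functions, more memory can be cost-neutral or cheaper while cutting latency.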
Reserved concurrency both guarantees concurrency for a function and caps that function at the specified amount by reserving capacity from the account concurrency pool. Provisioned concurrency keeps environments pre-initialized to reduce cold starts for latency-sensitive functions.
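A small sketch of how reserving concurrency partitions the account pool. The 1,000 account limit is the common default, AWS requires at least 100 executions to remain unreserved, and the function names are hypothetical:

```python
def allocate_concurrency(account_limit: int, reserved: dict) -> dict:
    """Show how reserved concurrency carves up the account pool.

    reserved maps function name -> reserved concurrency. Lambda requires
    at least 100 unreserved executions to remain for everything else.
    """
    total_reserved = sum(reserved.values())
    if total_reserved > account_limit - 100:
        raise ValueError("must leave at least 100 unreserved in the account")
    caps = dict(reserved)                      # each function is capped here
    caps["<unreserved pool>"] = account_limit - total_reserved
    return caps

# Reserving 200 for "payments" guarantees it 200 and caps it at 200,
# while every unreserved function shares the remaining 750.
print(allocate_concurrency(1000, {"payments": 200, "reports": 50}))
```

The double-edged nature of reserved concurrency (guarantee and cap) is exactly what makes it a clean throttle in front of a fragile downstream database.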
Invocation models matter. API Gateway usually invokes Lambda synchronously. Services like SNS, EventBridge, and S3 usually trigger Lambda asynchronously. Event source mappings are used for poll-based sources such as SQS, Kinesis, DynamoDB Streams, and Kafka integrations. Each model changes how retries, batching, and failure handling work, so identifying it first is what unlocks these questions.
For Lambda triggered by SQS, make sure the queue visibility timeout matches the expected processing time, use DLQs or on-failure destinations, and consider partial batch responses so one bad message doesn’t make you reprocess a whole batch that mostly succeeded. If a downstream database cannot absorb a sudden concurrency spike, reserved concurrency is a clean protection mechanism.
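Here is a minimal sketch of an SQS-triggered handler using partial batch responses. It assumes `ReportBatchItemFailures` is enabled on the event source mapping, and `process_message` is a hypothetical business function:

```python
import json

def handler(event, context):
    """SQS-triggered Lambda returning a partial batch response.

    Only the failed message IDs are reported; the rest of the batch is
    deleted from the queue instead of being reprocessed wholesale.
    """
    failures = []
    for record in event["Records"]:
        try:
            process_message(json.loads(record["body"]))
        except Exception:
            # This message alone returns to the queue for retry/DLQ.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}

def process_message(payload):
    # Hypothetical stand-in for real business logic.
    if payload.get("poison"):
        raise ValueError("cannot process")

event = {"Records": [
    {"messageId": "m-1", "body": json.dumps({"ok": True})},
    {"messageId": "m-2", "body": json.dumps({"poison": True})},
]}
print(handler(event, None))  # only m-2 is reported as failed
```

Without partial batch responses, one poison message fails the whole batch, everything becomes visible again after the visibility timeout, and successful messages get reprocessed, which is exactly the trap the exam scenario describes.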
Lambda is commonly used for stream processing, so “streaming” alone is not a reason to reject it. The real question is whether Lambda fits the workload’s needs for throughput, ordering, batching, retries, and duration, because that’s where the design either works beautifully or falls apart.
8. Containers with ECS, Fargate, and EKS
ECS is the simpler AWS-native orchestration answer for containers. Fargate is the serverless compute option for ECS or EKS when you do not want to manage nodes. EKS is for teams that truly need Kubernetes APIs and ecosystem tooling.
With ECS, the main building blocks you’ll work with are task definitions and services. A task definition specifies CPU, memory, container image, port mappings, logging, environment configuration, secrets, execution role, and task IAM role. Services maintain desired task count and can integrate with ALB target groups for rolling deployments and autoscaling. Capacity providers let you blend On-Demand and Spot capacity cleanly.
Fargate reduces operational overhead but gives less host-level control. It is excellent for many microservices and worker patterns, though startup behavior, task sizing, and networking should still be considered. At larger scale, ECS on EC2 can give you better cost control if the team’s comfortable managing the underlying capacity beneath it.
With EKS, keep the pieces straight: AWS manages the control plane, while the data plane runs on managed node groups, self-managed nodes, or Fargate profiles. Horizontal Pod Autoscaler scales pod count. Cluster Autoscaler or Karpenter handles node scaling. For IAM, EKS commonly uses IAM Roles for Service Accounts (IRSA), and newer environments may use EKS Pod Identity. If the scenario does not require Kubernetes, EKS is often an overcomplicated answer.
9. Event-driven elasticity and queue-based load leveling
Some scaling problems should be solved with decoupling, not more front-end compute. SQS buffers work, SNS fans out notifications, and EventBridge routes events based on rules across services and accounts.
For SQS, know the practical settings: Standard queues maximize throughput and support at-least-once delivery; FIFO queues preserve ordering and support deduplication, but with lower throughput characteristics. Visibility timeout, long polling, the retention period, and the DLQ redrive policy all shape how the system behaves in the real world. Scale workers from ApproximateNumberOfMessagesVisible or, when latency matters, ApproximateAgeOfOldestMessage.
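A common backlog-per-worker sizing calculation looks like this; the throughput and latency numbers are illustrative:

```python
import math

def workers_needed(visible_messages, msgs_per_worker_per_sec,
                   target_drain_seconds, min_workers=1, max_workers=50):
    """Size a worker fleet from SQS backlog instead of CPU.

    Acceptable backlog per worker = processing rate x target latency;
    the rates and bounds here are illustrative assumptions.
    """
    per_worker = msgs_per_worker_per_sec * target_drain_seconds
    desired = math.ceil(visible_messages / per_worker)
    return max(min_workers, min(desired, max_workers))

# 12,000 visible messages, 10 msg/s per worker, drain within 2 minutes:
print(workers_needed(12_000, 10, 120))   # 10 workers
```

Notice that CPU never appears in the math; that is the whole point of scaling queue workers on backlog or message age rather than CPUUtilization.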
A classic pattern is: user request accepted quickly → job written to SQS → worker fleet on EC2, ECS, or Lambda processes asynchronously → poison messages go to a DLQ. That pattern absorbs spikes, protects the front end, and supports Spot-friendly worker fleets.
10. Security, observability, and troubleshooting
Elastic compute still needs strong security. Use IAM roles instead of embedded credentials: instance profiles for EC2, execution and task roles for ECS, function roles for Lambda, and IRSA or Pod Identity for EKS. Enforce least privilege, keep instances private where possible, prefer Systems Manager Session Manager over direct SSH, store secrets in Secrets Manager or Parameter Store, patch AMIs regularly, and use security groups to tightly control traffic. For containers, use trusted images and image scanning. For public apps, terminate TLS at ALB or NLB as appropriate.
For observability, tie the metrics back to the service: for EC2 and Auto Scaling, look at CPUUtilization, NetworkIn and NetworkOut, StatusCheckFailed, desired capacity, and in-service counts. ALB → RequestCountPerTarget, TargetResponseTime, HTTPCode_Target_5XX_Count. Lambda → Duration, Errors, Throttles, ConcurrentExecutions, IteratorAge where relevant. SQS → visible messages and age of oldest message. EC2 memory is not a default CloudWatch metric; you need the CloudWatch agent or custom metrics.
Fast troubleshooting guide: Auto Scaling not scaling → inspect alarms, warmup, cooldown, launch failures, and subnet IPs. ALB unhealthy targets → check health check path, port, security groups, and app startup time. Lambda throttling → inspect concurrency limits and downstream protection settings. ECS tasks pending → check CPU and memory placement, image pull access, subnet or NAT connectivity, and IAM. EKS pods pending → inspect node capacity, taints, and autoscaler behavior. SQS backlog growing → increase worker concurrency, inspect processing time, and isolate poison messages.
11. Cost-performance tradeoffs and exam traps
Use On-Demand for variable demand, Savings Plans or Reserved Instances for steady baseline usage, and Spot for interruption-tolerant workloads. Spot instances can be interrupted with a two-minute warning, so design for checkpointing, multiple instance types, multiple AZs, and capacity-optimized allocation strategies. A common strong pattern is baseline On-Demand plus burst or worker capacity on Spot.
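Handling the two-minute Spot warning usually means polling the instance metadata `spot/instance-action` path (a 404 means no notice) and checkpointing when a notice appears. The parsing and deadline math can be sketched offline; the notice format shown matches the documented shape, but verify before relying on it:

```python
import json
from datetime import datetime, timezone

def seconds_until_interruption(notice_json: str, now: datetime) -> float:
    """Parse a Spot instance-action notice and return seconds remaining.

    In a real worker you would poll the metadata endpoint for
    /latest/meta-data/spot/instance-action and drain/checkpoint
    when a notice appears; this only shows the deadline math.
    """
    notice = json.loads(notice_json)
    deadline = datetime.strptime(
        notice["time"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return (deadline - now).total_seconds()

notice = '{"action": "terminate", "time": "2024-05-04T17:11:44Z"}'
now = datetime(2024, 5, 4, 17, 9, 44, tzinfo=timezone.utc)
print(seconds_until_interruption(notice, now))  # 120.0 — the two-minute warning
```

Two minutes is enough to checkpoint a batch job or drain a worker, but not to finish a long stateful task, which is why interruption-sensitive services are the wrong home for Spot.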
Common traps: choosing EKS when ECS is enough; choosing EC2 when Lambda clearly fits; choosing ALB for non-HTTP traffic; scaling queue workers on CPU instead of backlog; using sticky sessions instead of externalizing state; assuming EFS is always the right shared storage answer; or picking Spot for interruption-sensitive services.
Final exam checklist: Is the workload event-driven, always-on, batch, or HPC? Does it need full OS control? Is Kubernetes explicitly required? What metric truly reflects demand: CPU, request count, queue depth, or latency? Does the architecture externalize state, span multiple AZs, and self-heal? If you answer those correctly, the compute choice usually becomes obvious.