Determine High-Performing and Scalable Storage Solutions for AWS Certified Solutions Architect Associate (SAA-C03)

1. Introduction: How SAA-C03 Tests Storage Decisions

For SAA-C03, storage questions are rarely about definitions. They are about fit. What AWS is really checking is whether you can spot the access pattern first and then map it to the right storage type, whether that ends up being object, block, file, or ephemeral storage. Once you’ve got that piece sorted out, then you can start tuning for the rest of the story, like latency, IOPS, throughput, scope, resilience, and cost.

And honestly, that part matters a lot, because when storage is the wrong fit, it tends to fail in pretty predictable ways. Databases suffer on the wrong latency profile. Shared content breaks when it sits on single-instance storage. Costs rise when teams provision premium performance they never use. Recovery plans fail when architects ignore AZ scope or one-zone durability tradeoffs.

The fastest exam method is this: pattern first, service second, tuning third. If the requirement says shared Linux file access, think EFS first. If it says low-latency EC2 database storage, think EBS. If it says SMB and Active Directory, think FSx for Windows File Server. If it says temporary scratch, think instance store. If it says petabyte-scale content or archive, think S3.

2. Storage Selection Framework

Use the same checklist every time:

  • Access model: object, block, file, or ephemeral
  • Sharing model: single host or many clients
  • Performance: latency, IOPS, throughput
  • Scope: AZ-scoped or Regional
  • Persistence: must survive stop, terminate, or host loss?
  • Protocol: API, NFS, SMB, iSCSI, SFTP
  • Cost model: provisioned performance, elastic usage, or archival pricing

Scope reminder: EBS volumes are AZ-scoped. S3 is Regional. EFS is Regional or One Zone and is accessed through mount targets in your VPC. FSx resilience depends on the FSx variant and deployment type. Instance store is tied to the EC2 host.

Persistence reminder: EBS is persistent block storage, but whether a root volume survives instance termination depends on the DeleteOnTermination setting. Instance store data will usually survive a simple reboot, but once you stop the instance, terminate it, or lose the underlying host, that data’s gone for good.

3. Core Storage Services That Dominate This Exam Domain

| Service | Type | Best fit | Scope | Exam clue |
| --- | --- | --- | --- | --- |
| Amazon S3 | Object | Massive scale, very strong durability, static content, archives, data lakes | Regional | Objects, archive, website content, logs, durable repository |
| Amazon EBS | Block | Low-latency EC2 storage, boot volumes, databases | AZ-scoped | Volume, attached disk, transactional database, boot disk |
| Amazon EFS | Shared file | Shared Linux NFS storage across many clients | Regional or One Zone | Shared Linux, NFS, containers, home directories |
| Amazon FSx | Specialized file | SMB, HPC, enterprise NAS, OpenZFS/ONTAP features | Varies by type | SMB, AD, Lustre, multiprotocol, enterprise NAS |
| EC2 Instance Store | Ephemeral block | Very fast temporary local data, cache, scratch | Host-bound | Scratch, cache, temporary buffers, disposable intermediates |

Keep transfer and hybrid tools separate in your mind. DataSync, Storage Gateway, Snow Family, Transfer Family, and Direct Connect are there to help you move data or get it into the right place, but they’re not usually the main answer when the question is really asking where the workload should live.

4. High-Performance Storage: EBS, Instance Store, and FSx

This is the bucket to reach for when a workload needs quick responses or has to move data at a steady, predictable pace.

Amazon EBS is the default answer for persistent, low-latency EC2 block storage. Most of the time, an EBS volume is tied to a single EC2 instance, so it acts a lot like that instance’s private disk rather than something the whole fleet can share. And yes, Multi-Attach is a real feature, but it’s pretty specialized. I definitely wouldn’t start there unless the workload actually needs shared block access. It only works with io1 and io2, it needs supported Nitro-based instances in the same AZ, and the application or file system still has to be designed for clustered access. It is not a substitute for EFS or FSx.

The exam-relevant EBS volume distinctions are:

  • gp3: general-purpose SSD and the usual starting point. It decouples size from performance and includes baseline performance of 3,000 IOPS and 125 MiB/s, with higher performance provisioned separately.
  • gp2: older general-purpose SSD. Performance scales with volume size, which is why gp3 often replaces it in modern designs.
  • io2 / io2 Block Express: provisioned IOPS SSD for critical databases and sustained high IOPS with low latency.
  • st1: throughput-optimized HDD for large sequential workloads.
  • sc1: cold HDD for infrequently accessed data.
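The gp2-versus-gp3 difference is easy to verify with a little arithmetic. The sketch below uses the published baselines: gp2 earns 3 IOPS per GiB (floored at 100, capped at 16,000), while gp3 includes 3,000 baseline IOPS regardless of size:

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """gp2 baseline scales at 3 IOPS per GiB, floored at 100, capped at 16,000."""
    return min(max(3 * size_gib, 100), 16_000)

# gp3 includes this baseline with every volume, independent of size;
# higher IOPS (up to 16,000) is provisioned and billed separately.
GP3_BASELINE_IOPS = 3_000
```

So a 100 GiB gp2 volume has a baseline of only 300 IOPS, while the same size gp3 volume starts at 3,000, which is exactly why gp3 is the usual modern default.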

Important constraint: st1 and sc1 cannot be used as boot volumes. If the question mentions an operating system disk, eliminate them.

EBS performance is not just about the volume. The EC2 instance type can cap achievable EBS bandwidth and IOPS. A lot of modern instance families are EBS-optimized by default, but the exam still expects you to remember that the instance itself can become the bottleneck if its throughput limit is too low.

EBS snapshots are incremental point-in-time backups, and by default they’re crash-consistent, which is usually fine for a lot of exam scenarios. If you need application-consistent backups, you’ve got to pause or quiesce the application or file system first. AWS handles the snapshot mechanics behind the scenes, and yes, S3 is part of how it works, but you still wouldn’t think of EBS snapshots as ordinary S3 objects. They’re incredibly useful for backup, restore, cloning, and disaster recovery, and the fact that you can copy them across Regions makes them even more useful in real projects.

Operationally, EBS supports Elastic Volumes, so you can often modify size, type, IOPS, or throughput online. That’s a very common optimization move when a workload starts outgrowing the settings you picked at the beginning. For very high aggregate performance, architects may stripe multiple EBS volumes in RAID 0, but that improves speed, not durability.
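The RAID 0 point is worth quantifying: striping sums per-volume performance, but the instance's own EBS limit still wins. A rough sketch, where the instance cap is an invented example value (check your instance family's real limit):

```python
def raid0_effective_iops(per_volume_iops: int, volumes: int, instance_iops_cap: int) -> int:
    """RAID 0 aggregates volume IOPS, but the EC2 instance's EBS limit
    caps what the OS can actually achieve."""
    return min(per_volume_iops * volumes, instance_iops_cap)

# Four 16,000-IOPS gp3 volumes behind a hypothetical 40,000-IOPS instance
# limit still deliver only 40,000 IOPS: the instance is the bottleneck.
```

This is the same lesson as the EBS-optimized discussion above: tune the volume and the instance together.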

EC2 instance store is local ephemeral block storage physically attached to the host. On many modern instance families it is NVMe SSD, but not universally, so do not memorize “instance store = always NVMe.” Use it for cache, scratch, temporary ETL staging, rendering intermediates, or other disposable data. If the data must survive stop events or host failure, instance store is the wrong answer.

Amazon FSx matters when the file protocol or workload is specialized:

  • FSx for Windows File Server: SMB shares, Windows ACLs, and Active Directory integration.
  • FSx for Lustre: HPC, ML training, rendering, and parallel file workloads; commonly integrated with S3 for import and export workflows.
  • FSx for NetApp ONTAP: enterprise NAS features, snapshots, cloning, and multiprotocol access such as NFS, SMB, and iSCSI.
  • FSx for OpenZFS: low-latency Linux and UNIX file workloads needing OpenZFS-style semantics.

For exam purposes, read the protocol words carefully. SMB + AD means FSx for Windows. Parallel HPC means FSx for Lustre. Advanced NAS or multiprotocol points toward ONTAP. Linux shared NFS without specialized NAS requirements usually stays with EFS.

5. Scalable Storage: S3 and EFS

Amazon S3 is object storage with Regional scale and 11 nines of durability. It isn’t a native block device, and it isn’t a POSIX file system either, even if some tools can make it look file-like. For the exam, think of S3 as API-driven object storage for things like content, backups, logs, analytics data, and archives.

S3 now provides strong read-after-write consistency for PUT, DELETE, GET, and LIST operations, plus metadata, tags, and ACL changes. That matters because a lot of the older eventual-consistency assumptions just don’t apply the way they used to.

Key storage classes to know:

  • S3 Standard: frequent access, low latency, multi-AZ resilience.
  • S3 Intelligent-Tiering: good when access patterns are unknown, but monitor small-object and monitoring-cost tradeoffs.
  • S3 Standard-IA: infrequent access, retrieval fees, minimum storage duration charges.
  • S3 One Zone-IA: lower cost but stored in one AZ, so resilience is lower than multi-AZ classes.
  • S3 Glacier Instant Retrieval: archive-like economics with millisecond access for rare retrieval.
  • S3 Glacier Flexible Retrieval: archival storage with slower retrieval options.
  • S3 Glacier Deep Archive: lowest-cost long-term archive with the slowest retrieval.
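Minimum storage duration is the detail that trips people up, so here is a small sketch using the commonly documented minimums (verify current values before relying on them). The key point: deleting early still bills for the minimum.

```python
# Commonly documented minimum storage durations, in days, for the
# IA and archive classes. Standard and Intelligent-Tiering have none.
MIN_STORAGE_DAYS = {
    "STANDARD_IA": 30,
    "ONEZONE_IA": 30,
    "GLACIER_IR": 90,      # Glacier Instant Retrieval
    "GLACIER": 90,         # Glacier Flexible Retrieval
    "DEEP_ARCHIVE": 180,
}

def billable_days(storage_class: str, days_stored: int) -> int:
    """Days actually charged: early deletion still bills the minimum."""
    return max(days_stored, MIN_STORAGE_DAYS.get(storage_class, 0))
```

An object pushed to Deep Archive and deleted after a month is still billed for 180 days, which is why short-lived data in archive classes is a classic wrong answer.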

When a question is clearly pointing toward archive storage, I’d immediately look for clues about retrieval delays, retrieval fees, and minimum storage duration charges, because that’s usually what decides the answer. Those details often eliminate tempting but wrong answers.

S3 data protection features include versioning, replication, Object Lock, lifecycle policies, and MFA Delete in tightly controlled scenarios. For static websites, S3 can host content, but if the requirement includes HTTPS on a custom domain, a content delivery layer is typically the cleaner exam answer.
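Lifecycle policies come up constantly, so it helps to have seen the shape of one. This is a sketch in the structure boto3's `put_bucket_lifecycle_configuration` accepts; the rule ID, prefix, and day counts are invented for illustration:

```python
# Hypothetical lifecycle configuration: age log objects into Standard-IA
# at 30 days, Glacier Flexible Retrieval at 90, and expire them at 365.
lifecycle = {
    "Rules": [
        {
            "ID": "archive-logs",           # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},  # hypothetical key prefix
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}
```

Note how the transition days line up with the minimum storage durations above; transitioning faster than the minimums is another pattern the exam likes to penalize.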

Amazon EFS is managed NFS file storage for Linux workloads. You access it through mount targets in subnets inside your VPC, and clients in each AZ should use the local mount target path so you get better resilience and lower latency. EFS is the default answer for shared Linux content across EC2, ECS, EKS, and Lambda functions that are configured inside a VPC.

Current EFS design points that matter:

  • Regional vs One Zone: Regional is the resilient default; One Zone lowers cost but reduces resilience.
  • Throughput modes: Bursting, Provisioned, and Elastic Throughput. Elastic is attractive for variable workloads; Provisioned fits predictable sustained demand.
  • Storage classes: EFS Standard and EFS Infrequent Access, with lifecycle management to move colder files automatically.
  • Access controls: POSIX permissions, EFS Access Points, file system policies, security groups on mount targets, and KMS encryption at rest.

When I mount EFS, I usually use the EFS mount helper and enable TLS:

sudo mount -t efs -o tls fs-12345678:/ /mnt/efs

That requires the EFS utilities package. If a mount fails, check DNS resolution, mount target availability in the AZ, security groups allowing NFS port 2049, and VPC routing.
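When triaging a mount failure, a quick TCP probe of port 2049 separates network-path problems (security groups, routing) from client-side ones. A minimal sketch, an illustrative helper rather than an AWS tool:

```python
import socket

def nfs_port_reachable(host: str, port: int = 2049, timeout: float = 2.0) -> bool:
    """Attempt a TCP connection to the NFS port on a mount target.
    False usually means a security group, routing, or DNS problem."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If the port is reachable but the mount still fails, look at the client side instead: the EFS utilities package, the mount options, and file system policies.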

6. Most Testable Comparisons and Keyword Decoder

| Keyword or requirement | Likely answer | Why the distractor is wrong |
| --- | --- | --- |
| Low-latency EC2 database | EBS gp3 or io2 | S3 is object storage; instance store is not durable enough |
| Shared Linux / NFS / many EC2 instances | EFS Standard | EBS is not shared file storage |
| SMB / Windows shares / Active Directory | FSx for Windows File Server | EFS is NFS for Linux, not SMB |
| Parallel HPC / rendering / ML training | FSx for Lustre | EFS is shared Linux storage but not optimized for parallel HPC throughput |
| Scratch / cache / temporary buffer | Instance store | EBS costs more if persistence is unnecessary |
| Archive / retention / rarely accessed objects | S3 Glacier classes | EBS and EFS are the wrong economic model |
| Move data online to AWS repeatedly | DataSync | It moves data; it is not the final storage target |
| Offline bulk transfer from low-bandwidth site | Snow Family | DataSync over a weak link may miss the deadline |
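The DataSync-versus-Snow call usually comes down to arithmetic: can the link move the data before the deadline? A rough estimator (the 80% utilization factor is an assumption, and real links vary):

```python
def transfer_days(dataset_tb: float, link_mbps: float, utilization: float = 0.8) -> float:
    """Rough days needed to push a dataset over a network link.
    If this blows past the deadline, think Snow Family instead."""
    bits = dataset_tb * 1e12 * 8                      # decimal TB -> bits
    seconds = bits / (link_mbps * 1e6 * utilization)  # Mbps -> bits/sec
    return seconds / 86_400
```

For example, 100 TB over a 100 Mbps link works out to well over three months of continuous transfer, which is exactly the kind of scenario where the Snow Family answer wins.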

Eliminate wrong answers fast: If the app needs a mounted filesystem, eliminate S3. If many instances need concurrent file access, eliminate standard EBS. If data must survive stop or host failure, eliminate instance store. If the question says SMB, eliminate EFS. If the requirement is really about moving data instead of storing it, keep the transfer service separate from the storage target in your head.

7. Monitoring, Security, Backup, and Troubleshooting

Good storage answers include operations. For EBS, watch metrics such as VolumeQueueLength, read and write operations, throughput, and latency. If database latency starts climbing, I’d check both the volume settings and the EC2 instance’s EBS bandwidth limit, because either one could be the real bottleneck. For EFS, inspect PercentIOLimit and throughput-related metrics; sustained saturation may indicate the wrong throughput mode or even the wrong service choice. For S3, monitor request errors, replication status, transfer behavior, and use AWS auditing and storage analytics features for visibility.

aws cloudwatch get-metric-statistics \
  --namespace AWS/EBS \
  --metric-name VolumeQueueLength \
  --dimensions Name=VolumeId,Value=vol-0123456789abcdef0 \
  --start-time 2026-04-03T00:00:00Z \
  --end-time 2026-04-03T01:00:00Z \
  --period 300 \
  --statistics Average
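Once the datapoints come back, the decision is usually a threshold check. A sketch of that post-processing in Python; the threshold of 8 is an illustrative value, not an AWS recommendation:

```python
def queue_depth_alert(datapoints: list[dict], threshold: float = 8.0) -> bool:
    """Flag sustained VolumeQueueLength, which often signals an
    IOPS-starved volume or an instance-level bandwidth cap."""
    if not datapoints:
        return False
    avg = sum(dp["Average"] for dp in datapoints) / len(datapoints)
    return avg > threshold

# Shape mirrors the Datapoints list in the get-metric-statistics response.
sample = [{"Average": 12.0}, {"Average": 9.5}, {"Average": 11.0}]
```

A consistently deep queue points back at the fixes in the table below: more provisioned IOPS, a different volume type, or a bigger instance.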

Security is service-specific:

  • S3: Block Public Access, bucket policies, IAM, Access Points, default encryption with SSE-S3 or SSE-KMS, and VPC endpoints for private access.
  • EBS: KMS encryption at rest, snapshot protection, and careful control of snapshot sharing and copy permissions.
  • EFS: KMS at rest, TLS in transit, security groups on mount targets, EFS Access Points, and file system policies.
  • FSx: KMS encryption plus directory-aware or protocol-aware access controls depending on the FSx type.

For backup and disaster recovery, remember AWS Backup can protect EBS, EFS, FSx, and other services with centralized policies. EBS snapshots can also be copied across Regions, and that’s a really important advantage when you’re designing for disaster recovery. With S3, versioning, replication, and Object Lock are the features I’d lean on when stronger data protection is needed. On the exam, your answer should line up with the target RPO and RTO, not just the raw durability number.

| Symptom | Likely issue | Likely fix |
| --- | --- | --- |
| EC2 database is slow | EBS IOPS or throughput bottleneck, or the EC2 instance's own storage bandwidth limit | Move to gp3 or io2, tune IOPS and throughput, or use a larger instance with more EBS bandwidth |
| Auto Scaling web fleet cannot share uploads | Uploads stored on one instance or EBS volume | Use EFS for shared files or S3 for object-based uploads |
| Linux shared file app has mount failures | EFS mount target, DNS, or security group problem | Check mount targets, NFS 2049, routing, and mount helper |
| Storage bill spikes | Idle EBS volumes, stale snapshots, no S3 lifecycle, overused io2 | Right-size, delete unused resources, add lifecycle policies |

8. Hybrid and Migration Tools: Short but Important

These are exam-adjacent and easy to confuse with storage services:

  • DataSync: managed online transfer and synchronization to S3, EFS, and FSx; supports scheduling and verification.
  • Storage Gateway: hybrid access. File Gateway presents NFS or SMB backed by S3 object storage; other modes support hybrid block or backup patterns.
  • Snow Family: offline bulk transfer for huge datasets or constrained links.
  • Transfer Family: managed SFTP, FTPS, and FTP ingestion into S3 or EFS.
  • Direct Connect: private connectivity to AWS, not storage itself.

That distinction is simple but critical: movement tool versus storage target.

9. Scenario Walkthroughs for SAA-C03

Scenario 1: High-transaction relational database on EC2. Choose EBS. For a lot of workloads, I’d start with gp3. If the scenario keeps stressing sustained high IOPS and mission-critical latency, then io2 or io2 Block Express starts to make a lot more sense. Reject S3 because it is object storage. Reject instance store because the data must persist.

Scenario 2: Shared Linux content for an EC2 or container fleet. Choose EFS. The clue is many Linux clients mounting the same files. Reject EBS because normal EBS is not shared file storage. Consider FSx for OpenZFS or ONTAP only if the question adds specialized NAS features.

Scenario 3: Departmental Windows share integrated with AD. Choose FSx for Windows File Server. The protocol clue is SMB and Windows ACL behavior. Reject EFS immediately.

Scenario 4: A petabyte-scale media archive where retrieval is rare. In that case, I’d choose S3 and then pick the right archival class, usually Glacier Flexible Retrieval or Deep Archive depending on how fast the retrieval needs to be. Add lifecycle policies. Reject EBS and EFS because the cost model is wrong.

Scenario 5: Temporary analytics scratch space with maximum local performance. Choose instance store. The clue is disposable intermediate data. Reject durable services unless the question explicitly requires persistence.

10. Final Exam Cram Sheet

  • S3 = scalable Regional object storage; strong consistency; best for content, backups, logs, archives
  • EBS = AZ-scoped persistent EC2 block storage; best for low-latency databases and boot volumes
  • EFS = shared Linux NFS file storage; accessed through mount targets in a VPC
  • FSx = specialized managed file systems; choose by protocol and workload
  • Instance store = fastest temporary local storage; not for durable business data

If you remember one sentence, make it this: identify the access pattern first, then choose the storage type, then tune for performance, resilience, and cost.