AZ-900 Azure Cost Management and SLAs: Pricing, Budgets, Availability, and Exam Essentials

1. Introduction

Azure cost management and service level agreements matter because cloud decisions always have business consequences. In Azure, lower cost can mean less redundancy, while higher availability usually requires additional resources, replication, or broader architecture choices that increase spend. This tradeoff is one of the big ideas in AZ-900. You’re not just memorizing Azure services—you’re learning how each technical choice can change cost, risk, and uptime.

For the exam, I’d really anchor on this pattern: estimate before you deploy, monitor after you deploy, keep optimizing as you go, and make availability a deliberate design choice. Azure absolutely gives you tools for all of this, but none of them replace good architecture or solid governance.

2. Azure Cost Basics

Traditional IT often relies on CapEx, or capital expenditure, where organizations buy hardware and infrastructure upfront. Cloud computing shifts much of that to OpEx, or operational expenditure, where services are consumed over time and billed according to usage, selected service tiers, licensing, and pricing commitments. Azure offers both consumption-based and commitment-based pricing. Depending on the service, you might pay as you go, or you might lock in savings with a reservation or a savings plan.

Key definitions: A subscription is a unit of management, access control, and billing for Azure resources. A resource group is a management container used to organize related resources; it is not itself a billing boundary, and it works best when the resources inside it share a lifecycle. A region is a geographic area containing one or more datacenters connected through a low-latency network where Azure services are deployed.

One of the biggest beginner shifts is that cloud cost changes with usage. A workload that runs for a few hours will usually cost less than one that’s left running day and night. That’s one of those cloud basics that sounds obvious, but it catches people out all the time. Costs can also creep up from storage growth, higher-performance tiers, extra redundancy, logging, backups, and outbound network traffic. Honestly, those are some of the most common reasons a bill surprises people. Azure’s flexibility is a huge benefit, but it also means you can’t just set it and forget it—you’ve got to manage cost actively.
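To make the usage point concrete, here is a minimal sketch of how runtime drives cost for an hourly-billed resource. The hourly rate is a made-up figure for illustration, not a real Azure price, and the function name is hypothetical.

```python
# Hypothetical hourly rate, for illustration only. Real rates vary by
# service, SKU, and region; check the official pricing pages.
HOURLY_RATE_USD = 0.10


def monthly_compute_cost(hours_per_day: float, days: int = 30,
                         rate: float = HOURLY_RATE_USD) -> float:
    """Return the cost of a resource billed per hour of runtime."""
    return round(hours_per_day * days * rate, 2)


always_on = monthly_compute_cost(24)      # left running day and night
business_hours = monthly_compute_cost(8)  # runs 8 hours a day

print(f"Always on:   ${always_on:.2f}")
print(f"8 hours/day: ${business_hours:.2f}")
```

Under these assumed numbers, the always-on workload costs three times as much as the one that runs eight hours a day, even though the resource itself is identical.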

3. What Affects Azure Costs?

Azure pricing depends on the service you choose, how you configure it, how heavily you use it, and how the whole architecture is designed. In Azure, a VM isn’t just a VM, and storage isn’t just storage. The exact SKU, region, redundancy choice, and data movement pattern all change the cost, often more than people expect.

Cost Factor | What It Means | Why It Matters
Resource type | Compute, storage, databases, networking, analytics, and the rest of the stack | Different services have different pricing models
Service tier / SKU | Performance and feature level selected | Higher tiers usually cost more
Region | Where the resource is deployed | Pricing varies by region, and not all services exist in all regions
Usage / consumption | Runtime hours, transactions, storage used, requests, throughput, and more | More usage usually means more cost
Data transfer | Traffic entering or leaving Azure, and in some designs traffic between regions or zones | Inbound data is often free, but outbound data is commonly billed
Licensing | Included licenses, bring-your-own-license rights, or other licensing benefits | Licensing can make a pretty big difference to the total cost
Redundancy | Extra instances, replication, backup, or multi-zone/multi-region design | Improves resilience but often increases spend
Billing scope and tagging | How costs are grouped, filtered, and allocated | Affects visibility, governance, and chargeback rather than raw service price
Marketplace purchases | Third-party software or images deployed through Azure Marketplace | Can add separate software publisher charges
Preview services | Features not yet generally available | May still be billable, but can have limited support and no formal SLA

Some practical examples make this easier. A B-series VM is often cheaper than a comparably sized D-series VM because B-series is burstable: it suits workloads with low average CPU use and occasional spikes, while D-series targets sustained general-purpose performance. You’re not just paying for size; you’re paying for how the machine is meant to behave. Azure Storage also varies by redundancy option: LRS keeps copies within a single datacenter, ZRS spreads copies across zones in supported regions, and options like GRS or GZRS add cross-region replication. More redundancy usually means higher cost. Databases follow the same pattern: a basic or low-tier option costs less than a premium or business-critical tier, but gives you fewer performance and availability features.

Networking is another common surprise. A public-facing application might have low compute cost but noticeable outbound data charges. A design using VPN Gateway, ExpressRoute, premium load balancing, or cross-region replication can also add non-obvious networking cost.

Important exam note: Preview services are not the same as generally available services. They may be useful for testing, but they often do not include the same support commitments or SLA terms.

4. Common Azure Pricing Options

Azure has several common pricing options for compute and other eligible services. The best pricing model really comes down to how predictable the workload is.

Pay-as-you-go means you pay for what you consume with no long-term commitment. That’s a great fit for labs, pilots, seasonal workloads, or anything where demand is still a bit uncertain.

Azure Reservations provide discounted pricing when you commit to eligible resource usage for a term, commonly one or three years. Reservations apply to eligible services and scopes, such as particular VM families or other reserved capacity offerings, rather than simply “one exact machine.” They work best for stable, predictable workloads.

Azure Savings Plan for Compute is a commitment model for eligible compute services. Instead of reserving a specific resource family in the same way, you commit to a consistent hourly compute spend for a term, and Azure applies savings across eligible usage. It is generally more flexible than reservations.

Azure Spot Virtual Machines offer very low-cost compute for interruptible workloads. Azure can evict Spot VMs because of capacity pressure, and depending on configuration, price conditions may also matter. They are useful for batch jobs, rendering, testing, or large-scale fault-tolerant processing, but not for critical workloads that require stable availability.

Option | Best Fit | Key Tradeoff
Pay-as-you-go | Unknown or changing workloads | Highest flexibility, usually lowest discount
Reservation | Steady long-running eligible workloads | Better savings, less flexibility if usage changes
Savings Plan for Compute | Variable eligible compute usage | More flexible than reservations, but still a commitment
Spot VM | Interruptible, noncritical processing | Very low cost, but can be evicted

If a reservation or savings plan is underused, realized savings are lower because you committed to more than you actually consumed. That is why utilization matters. For licensing, also remember Azure Hybrid Benefit: it can reduce cost for eligible Windows Server and SQL Server workloads when you bring qualifying existing licenses.
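The utilization point can be sketched with a little arithmetic. The rates, the 40% discount, and the function name below are all assumptions chosen for illustration; real discounts depend on the service, term, and region. The simplified model also assumes the commitment is paid in full whether or not it is used, and that usage beyond the commitment is billed at the pay-as-you-go rate.

```python
def compare_commitment(used_hours: float, committed_hours: float,
                       payg_rate: float, reserved_rate: float):
    """Return (pay_as_you_go_cost, commitment_cost) for one billing period."""
    payg_cost = used_hours * payg_rate
    # Hours beyond the commitment fall back to the pay-as-you-go rate.
    overage_hours = max(0.0, used_hours - committed_hours)
    commitment_cost = committed_hours * reserved_rate + overage_hours * payg_rate
    return round(payg_cost, 2), round(commitment_cost, 2)


# Hypothetical rates: 0.10 per hour pay-as-you-go, 0.06 per hour reserved.
full_use = compare_commitment(720, 720, 0.10, 0.06)
half_use = compare_commitment(360, 720, 0.10, 0.06)

print("Fully used commitment:", full_use)
print("Half used commitment: ", half_use)
```

With these assumed rates, the commitment wins when it is fully used, but at 50% utilization pay-as-you-go would have been cheaper. In this model the break-even utilization is simply the reserved rate divided by the pay-as-you-go rate.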

5. Estimating Costs Before Deployment

Before deployment, Azure gives you two major estimation tools with different purposes.

Azure Pricing Calculator estimates the expected cost of an Azure solution. It is used when you already know, or are planning, the Azure services you want to deploy.

Azure Total Cost of Ownership (TCO) Calculator compares estimated Azure costs with on-premises costs such as servers, storage, networking, power, cooling, facilities, and maintenance. It is used for migration planning rather than simple Azure pricing.

Tool | Main Use | Typical Question
Pricing Calculator | Estimate Azure deployment cost | What might this solution cost in Azure?
TCO Calculator | Compare on-premises and Azure cost | How does Azure compare with our current datacenter spend?

A simple Pricing Calculator workflow looks like this: identify workload components, select a region, choose SKUs or tiers, estimate runtime hours, storage size, transactions, and outbound data, then add any support plan or Marketplace cost if relevant. The result is an estimate, not a guaranteed bill. If you change the architecture later, the estimate should be revised.
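The workflow above boils down to summing quantity times unit price across the components of a solution. The sketch below mimics that idea only; it is not how the Pricing Calculator is implemented, and every component name, quantity, and unit price is a made-up placeholder.

```python
from dataclasses import dataclass


@dataclass
class LineItem:
    name: str
    quantity: float    # hours, GB stored, GB transferred, and so on
    unit_price: float  # hypothetical price per unit, not a real Azure rate


def estimate_monthly_cost(items) -> float:
    """Sum quantity * unit price over all line items."""
    return round(sum(item.quantity * item.unit_price for item in items), 2)


# Assumed workload: one month is approximated as 730 hours.
estimate = [
    LineItem("App hosting (hours)", 730, 0.10),
    LineItem("Database (hours)", 730, 0.20),
    LineItem("Storage (GB)", 100, 0.02),
    LineItem("Outbound data (GB)", 50, 0.08),
]

total = estimate_monthly_cost(estimate)
print(f"Estimated monthly cost: ${total:.2f}")
```

Changing any input, such as a higher tier or more outbound data, changes the total, which is why the estimate should be revisited whenever the architecture changes.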

Example: for a small two-tier application, you might choose App Service, Azure SQL Database, a storage account, and estimated outbound bandwidth. If you move from a basic tier to a premium tier with zone redundancy, the estimate rises. That is exactly the kind of business tradeoff AZ-900 wants you to recognize.

A TCO workflow is different. You enter current on-prem server counts, storage, virtualization, networking, power, cooling, and operational assumptions. This helps leadership compare current ownership cost with a cloud operating model. If the assumptions are unrealistic, the comparison will also be unrealistic, so inputs matter.

Authoritative source note: Azure pricing pages and official service pricing documentation provide the most reliable way to confirm current rates and terms.

6. Monitoring and Controlling Costs After Deployment

After resources are deployed, the main financial visibility tool is Cost Management + Billing. This is where you review actual spend, analyze trends, create budgets, view forecasts, and understand where costs are coming from. Unlike the Pricing Calculator, this is based on real usage data.

In Cost Management, you typically analyze spend by time period, subscription, resource group, service, location, or tag. Depending on billing account type, agreement type, RBAC permissions, and scope, the exact views and budget options can vary. For AZ-900, the key idea is simple: use it to understand actual and forecasted cloud spend after deployment.

A common budget workflow is: choose the scope, define the amount, set a reset period such as monthly, and configure alert thresholds like 50%, 80%, and 100%. Budgets generate notifications, but they do not automatically stop resources or charges. If you want automatic action, that requires separate automation, for example a budget alert wired to an Azure Monitor action group that calls Logic Apps, Azure Functions, or operational runbooks.
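As a rough illustration of the threshold logic, the sketch below reports which alert thresholds a given spend has reached. It is a simplified model with a hypothetical function name, not the Cost Management API, and like a real budget it only reports; it stops nothing.

```python
def crossed_thresholds(spend: float, budget: float,
                       thresholds=(50, 80, 100)) -> list:
    """Return the alert thresholds (in percent of budget) that spend has reached."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    # Compare spend * 100 against threshold * budget to avoid a division
    # and the rounding noise that comes with it.
    return [t for t in thresholds if spend * 100 >= t * budget]


print(crossed_thresholds(850, 1000))   # spend is at 85% of budget
print(crossed_thresholds(1200, 1000))  # spend is over budget
```

Note that the second call still returns a list of thresholds and nothing else: exceeding 100% produces a notification, and spending continues unless separate automation acts on it.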

Tags help with cost allocation, especially for department, project, owner, or environment reporting. A practical standard might require tags such as Environment, Owner, and CostCenter. However, tags are only useful if applied consistently. They are not applied retroactively to usage recorded before a resource was tagged, and billing data can take time to reflect them.
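A tagging standard like the one above is easy to check mechanically. The following sketch is a plain-Python illustration with a hypothetical function name and sample data; it does not call any Azure API. It compares tag names case-insensitively and treats empty values as missing, which is an assumption about how strict the standard should be.

```python
REQUIRED_TAGS = ("Environment", "Owner", "CostCenter")


def missing_tags(resource_tags: dict) -> list:
    """Return the required tag names that are absent or empty on a resource."""
    present = {name.lower() for name, value in resource_tags.items() if value}
    return sorted(tag for tag in REQUIRED_TAGS if tag.lower() not in present)


# Hypothetical resource tags for illustration.
vm_tags = {"Environment": "dev", "owner": "alice"}
print(missing_tags(vm_tags))
```

A report like this shows where cost allocation will have gaps. Enforcing the standard at deployment time is a job for Azure Policy rather than an after-the-fact script.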

Cost data can also be exported for reporting, FinOps analysis, or chargeback processes. That becomes especially useful in larger environments where teams need recurring reports rather than manual portal review.

7. Governance for Cost Control

Good cost management depends on good governance. The basic hierarchy to remember is management group → subscription → resource group → resource. Use management groups when you need broad governance across several subscriptions, subscriptions when you want a management and billing scope, resource groups to keep related resources together, and tags to add business context like owner, environment, or cost center.

Azure Policy helps audit or enforce rules. For example, you can require tags, restrict allowed locations, or limit allowed SKUs. Policy does not directly “reduce cost” by itself, but it supports cost control by preventing wasteful or noncompliant deployments. Common policy effects include audit and deny.

RBAC, or role-based access control, is different. RBAC controls who’s allowed to do what. Policy controls what is allowed. Tags describe resources. Cost Management reports spend. Those distinctions are very testable.

8. Cost Optimization and Common Billing Pitfalls

Cost optimization means matching architecture and service level to real business need. The biggest cost-saving wins usually come from rightsizing, autoscaling, using commitment discounts where they actually fit, picking the right storage tier, and cleaning up idle resources that nobody’s using anymore.

For VMs, one critical technical detail is this: shutting down the operating system from inside the guest stops the OS but leaves the VM allocated, so compute billing generally continues. To stop compute charges, you need to stop (deallocate) the VM from Azure. Even then, attached disks and other retained resources continue to incur storage charges. This is a classic billing trap in dev/test environments.

Other frequent cost surprises include unattached managed disks, old snapshots, unused public IP addresses, backup vault retention, log ingestion growth, premium SKUs selected by default, Marketplace software charges, and outbound data transfer. A sudden bill increase often comes from one of those.

A practical troubleshooting flow is pretty straightforward. Start in Cost Management to find the biggest cost driver by service or resource, narrow it down by time period to see when the spike started, and then check whether a new deployment, scaling event, logging change, or forgotten resource caused it. After that, look at the tags and ownership details so the right team can actually do something about it. Azure Advisor can be really helpful for spotting underused resources, especially VMs that are oversized or sitting idle.
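The first step of that flow, finding the biggest cost driver, amounts to grouping cost records and sorting the totals. The sketch below assumes a simple list of dictionaries with made-up field names and amounts; it is not the actual schema of a Cost Management export.

```python
from collections import defaultdict


def top_cost_drivers(records, key="service", limit=3):
    """Total the cost per group and return the most expensive groups first."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["cost"]
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)[:limit]


# Hypothetical cost records for illustration only.
records = [
    {"service": "Virtual Machines", "owner": "team-a", "cost": 120.0},
    {"service": "Storage", "owner": "team-b", "cost": 30.0},
    {"service": "Virtual Machines", "owner": "team-b", "cost": 80.0},
    {"service": "Log Analytics", "owner": "team-a", "cost": 95.0},
]

print(top_cost_drivers(records))               # grouped by service
print(top_cost_drivers(records, key="owner"))  # grouped by owner tag
```

Grouping the same data by an owner tag instead of by service is what turns "what is expensive" into "who should act on it", which only works if tagging is consistent.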

Storage optimization matters too. If data is infrequently accessed, a cooler or archive-oriented tier may be more cost-effective where supported. Redundancy choice also matters: LRS is usually cheaper than ZRS or GRS, but less resilient. Again, the exam theme is tradeoff, not “always cheapest.”

9. Understanding Azure SLAs

An SLA, or Service Level Agreement, is Microsoft’s financially backed commitment for expected service availability, usually measured monthly. If Microsoft does not meet the documented SLA conditions, eligible customers may receive service credits. That is important: an SLA is not a promise that outages never happen. It is a documented availability target with terms, exclusions, and credit conditions.

SLA applicability depends on how the service is deployed and configured. Some services require specific architecture to qualify for a published SLA. That means reading the official SLA documentation matters. Also, a service SLA is not the same as your application uptime. Azure does give you service commitments, but you’re still responsible for the workload architecture, dependency design, failover behavior, and day-to-day operations. That part doesn’t go away just because a service has an SLA.

SLA | Approximate monthly downtime for a 30-day month | Meaning
99% | About 7 hours and 12 minutes | That’s still noticeable downtime, so an SLA isn’t the same thing as always-on availability
99.9% | About 43 minutes 12 seconds | Common service-level target, but not always-on
99.95% | About 21 minutes 36 seconds | Higher availability target
99.99% | About 4 minutes 19 seconds | Very high availability, still not zero downtime

Exam trap: “Guaranteed no downtime” is wrong. SLA does not mean zero downtime, and support plans do not increase SLA.
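Downtime figures like the ones in the table come from simple arithmetic: take the total minutes in the period and multiply by the fraction the SLA does not cover. The sketch below assumes a 30-day month (43,200 minutes); figures based on an average calendar month of about 30.44 days come out slightly higher. The function names are hypothetical.

```python
def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Return the downtime, in minutes, that an SLA percentage permits."""
    total_minutes = days * 24 * 60
    return total_minutes * (100 - sla_percent) / 100


def format_downtime(minutes: float) -> str:
    """Format minutes as hours, minutes, and seconds, rounded to the second."""
    total_seconds = round(minutes * 60)
    hours, remainder = divmod(total_seconds, 3600)
    mins, secs = divmod(remainder, 60)
    return f"{hours}h {mins}m {secs}s"


for sla in (99, 99.9, 99.95, 99.99):
    print(f"{sla}% -> {format_downtime(allowed_downtime_minutes(sla))}")
```

Each extra nine cuts the permitted downtime by roughly a factor of ten, but the result never reaches zero.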

10. Composite SLAs and Availability Design

A composite SLA is the effective availability of a solution made up of multiple dependent components. For simple AZ-900 questions, if services are in a serial dependency chain and failures are treated as independent with no redundancy, you multiply the SLA values.

Example: if a web application depends on Service A at 99.9% and Service B at 99.9%, the simple composite calculation is 0.999 × 0.999 = 0.998001, or about 99.8001%. The lesson is that adding dependencies can reduce overall availability.

However, that multiplication rule is not universal. If you add redundancy in parallel, the math changes. That is why architecture matters so much.
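Both cases can be sketched in a few lines. The serial function multiplies availabilities, as in the example above. The parallel function uses the standard textbook formula for redundant components, which assumes failures are independent and failover is perfect; real designs rarely meet those assumptions fully, and anything in front of the redundant pair, such as a load balancer, sits in series with it. Function names are hypothetical.

```python
from math import prod


def serial_availability(*availabilities: float) -> float:
    """All components must be up: multiply their availabilities."""
    return prod(availabilities)


def parallel_availability(*availabilities: float) -> float:
    """Any one component is enough: 1 minus the chance that all fail."""
    return 1 - prod(1 - a for a in availabilities)


serial = serial_availability(0.999, 0.999)
parallel = parallel_availability(0.999, 0.999)

print(f"Two 99.9% services in series:   {serial:.4%}")
print(f"Two 99.9% services in parallel: {parallel:.4%}")
```

The same two components give a lower result than either one alone when chained, and a higher result when arranged redundantly.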

A single VM hosting a business-critical application is a weak design because it creates a single point of failure. A more resilient design usually requires at least two VM instances, a load balancer, health probes, and application behavior that can tolerate instance failure. Simply placing a single VM in Azure does not create meaningful workload resilience.

Design | Resilience | Cost Pattern
Single VM | Low; one failure can stop the app | Lowest cost
Two VMs in Availability Set + Load Balancer | Better protection against host and maintenance issues in one datacenter | Higher cost because of extra instances and components
Two VMs across Availability Zones + Load Balancer | Stronger fault isolation against datacenter-level failure within a region | Usually higher cost and complexity

An Availability Set distributes VMs across fault domains and update domains on distinct hardware within a datacenter environment. An Availability Zone is a unique physical location within a region with independent power, cooling, and networking. Zones generally provide stronger fault isolation than availability sets, but actual workload availability still depends on service support, region support, replication design, and application architecture. Not all regions support zones, and not all services are zone-aware or zone-redundant.

11. High Availability vs Disaster Recovery

High availability focuses on minimizing interruption during local failures, such as host problems, maintenance events, or a datacenter issue within a region. Disaster recovery focuses on recovering from larger failures, such as regional outages. For AZ-900, think of zones as a common high-availability pattern and cross-region replication or failover as a disaster recovery pattern.

These are not the same thing. A zonal design may help if one datacenter fails, but it does not automatically protect against a full regional outage. Cross-region resilience often involves additional replication, backup, failover planning, and higher cost. It also introduces complexity around data consistency, recovery time, and testing.

12. Monitoring, Service Health, Advisor, and Support

These tools are easy to confuse, so separate them clearly. Azure Monitor collects telemetry such as metrics, logs, and alerts. Service Health informs you about Azure platform incidents, planned maintenance, and advisories that affect your environment. Azure Advisor gives recommendations across cost, reliability, security, performance, and operational excellence. Cost Management + Billing shows financial usage and spend. A support plan gives access to support response options.

In an outage scenario, a practical workflow is: an Azure Monitor alert fires, you check resource health and logs, then review Service Health to determine whether Azure has a platform issue. If the issue is application-side, you troubleshoot your workload. If you need help, your support plan determines support access, not service availability.

Key distinction: SLA is the availability commitment. Support is help when something goes wrong. They are related operationally, but they are not the same thing.

13. Exam Scenarios and Pitfalls

Which tool before deployment? Pricing Calculator.

Which tool compares on-premises with Azure? TCO Calculator.

Which tool analyzes actual spend? Cost Management + Billing.

Which tool gives recommendations? Azure Advisor.

Which tool shows telemetry and alerts? Azure Monitor.

Which tool shows Azure platform incidents? Service Health.

Which service enforces required tags or allowed regions? Azure Policy.

Which model fits steady 24/7 eligible workloads? Reservation.

Which model fits changing eligible compute usage? Savings Plan for Compute.

Which model fits interruptible batch jobs? Spot VM.

Common traps: budgets do not automatically stop spending, preview services may still be billable without a formal SLA, shutting down a VM inside the OS may not stop compute charges, support plans do not increase SLA, and a service SLA does not equal end-to-end application uptime.

14. Key Takeaways

For AZ-900, the big ideas are straightforward. Azure cost depends on service type, tier, region, usage, licensing, data transfer, and redundancy. Estimate with the Pricing Calculator, compare migration economics with the TCO Calculator, and monitor actual spend with Cost Management + Billing. Use tags, policy, and governance to improve visibility and control.

On the availability side, understand that SLAs are documented service commitments with conditions and possible service credits, not guarantees of zero downtime. The composite SLA of a serial dependency chain drops as dependencies are added. Higher availability usually requires redundant design, such as multiple instances, load balancing, zone-aware deployment, or even cross-region recovery, and that usually costs more.

If you remember the flow of estimate → deploy → monitor → optimize, and you can explain the difference between cost tools, governance tools, and availability design choices, you will be in strong shape for this AZ-900 topic.