Azure Cost Management and Service Level Agreements for AZ-900: Practical Foundations for Pricing, Governance, and Availability
1. Introduction: Why Cost and Availability Matter in Azure
When people first get started with Azure, they tend to focus on the obvious pieces: virtual machines, storage, networking, web apps, and databases. In practice, though, two themes shape almost every real deployment decision: cost and availability. Teams can move fast and get something up and running, but still run into trouble later if they don’t understand how Azure charges for services or what Microsoft’s uptime commitments actually mean.
That is exactly why Azure Cost Management and Service Level Agreements, or SLAs, matter in AZ-900. Cost management helps you estimate, monitor, allocate, and optimize spending. SLAs help you understand the availability commitment for a service when it is configured according to Microsoft’s documented terms. These two topics are tied together, really. Once you start adding more resilience, you’re usually adding more redundancy too, and that nearly always pushes the cost up.
For the exam, keep one core idea in mind: cost management is about controlling spend, while SLA is about understanding uptime commitments. Good Azure decisions balance both.
2. Azure Pricing Fundamentals
Azure commonly shifts IT spending toward an operational expense model, where you pay for services as you consume them instead of buying all infrastructure up front. That is one of the major cloud advantages. Azure pricing is not a single flat model, though. What you end up paying depends on the service you deploy, the region you choose, how long it runs, and whether you go for higher performance or extra redundancy.
The most common model is pay-as-you-go. It’s flexible, and that makes it a great fit for labs, development work, short-term projects, and anything where demand jumps around a bit. But flexibility doesn’t always mean cheapest, and that’s where a lot of beginners get caught out. If a workload runs steadily and predictably, commitment-based pricing can often bring the cost down quite a bit.
At a high level, Azure pricing usually falls into a few broad patterns:
- Free services or free allowances for certain products and learning scenarios
- Consumption-based pricing where you pay for compute time, storage used, transactions, requests, or bandwidth
- Commitment-based pricing such as Reservations and Azure Savings Plan for Compute
- Special pricing models such as Spot pricing for interruptible workloads and Dev/Test offers where applicable
Azure pricing also varies by region, SKU, service tier, operating system, redundancy option, and licensing model. A VM does not simply “cost X.” Its price depends on the VM family, size, region, hours used, disk type, software licensing, and related services such as backup or public IP addresses.
Factors That Affect Azure Cost
| Factor | Why it changes price | Example |
|---|---|---|
| Resource type | Different Azure services use different pricing models | A VM, storage account, and SQL database are billed differently |
| Usage / consumption | More runtime, capacity, requests, or transactions increases cost | A VM running 24/7 costs more than one used only in office hours |
| Region | Prices vary by geography and service availability | UK South may be priced differently from West Europe |
| Pricing tier / SKU | Higher tiers usually add performance, features, or resilience | Premium storage costs more than standard storage |
| Performance level | More CPU, memory, IOPS, throughput, or database capacity costs more | A larger VM size has a higher hourly rate |
| Storage amount and access pattern | Capacity, access tier, redundancy, and transactions all matter | Hot storage with frequent reads costs differently from archive storage |
| Outbound data transfer | Data leaving Azure is commonly billed | A public website serving many downloads increases egress cost |
| Inter-region traffic | Traffic between regions can add bandwidth charges | Replication between two regions increases network cost |
| Licensing model | Included licenses or bring-your-own-license options change price | Windows Server VMs cost differently from comparable Linux VMs |
| Marketplace software | Third-party products can add software charges on top of Azure infrastructure | A security appliance image may include vendor licensing fees |
Some billing mechanics are especially useful to know:
- Compute: billed based on instance size and runtime
- Managed disks: billed for provisioned disk size and type, not just actual data written
- Storage accounts: billed for capacity, transactions, redundancy, and sometimes retrieval operations
- Databases: billed by provisioned compute/storage or consumption model depending on the database service
- Networking: inbound data transfer to Azure is generally free; outbound data transfer and some inter-region transfers are billed
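The billing mechanics above can be sketched as a rough monthly estimate. This is a minimal illustration, not a pricing tool: every rate below is a made-up placeholder, the `VmEstimate` class is hypothetical, and real prices should come from the Azure Pricing Calculator.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # common approximation (8760 hours / 12 months)


@dataclass
class VmEstimate:
    hourly_rate: float         # placeholder compute rate per hour
    hours_per_month: float     # expected runtime
    disk_monthly_price: float  # placeholder price for the provisioned disk size/type
    egress_gb: float           # outbound data transfer in GB
    egress_rate_per_gb: float  # placeholder rate per GB leaving Azure

    def monthly_cost(self) -> float:
        compute = self.hourly_rate * self.hours_per_month
        egress = self.egress_gb * self.egress_rate_per_gb
        # The disk is billed for what is provisioned, regardless of runtime.
        return round(compute + self.disk_monthly_price + egress, 2)


always_on = VmEstimate(0.10, HOURS_PER_MONTH, 6.40, 200, 0.08)
office_hours = VmEstimate(0.10, 8 * 22, 6.40, 200, 0.08)  # 8 h/day, 22 working days

print(always_on.monthly_cost())
print(office_hours.monthly_cost())
```

Only the compute part changes with runtime in this sketch; the disk and egress parts stay the same, which is why shortening VM hours does not reduce the whole bill proportionally.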
One of the most common beginner mistakes is assuming the smallest or cheapest SKU is automatically the right one. If the workload really needs more memory, throughput, or IOPS, then going too small can cause performance pain — and that often ends up costing more to fix later.
Comparing Pricing Models
| Model | Commitment | Best fit | Flexibility | Exam cue |
|---|---|---|---|---|
| Pay-as-you-go | None | Variable or short-term workloads | High | Use what you consume |
| Reservation | Term commitment for eligible resources | Stable, predictable usage | Lower | Commit for discount |
| Azure Savings Plan for Compute | Hourly spend commitment for eligible compute | Predictable compute usage with some flexibility | Higher than reservation | Flexible compute discount |
| Spot | No long-term commitment, interruptible capacity | Batch jobs, testing, fault-tolerant workloads | Variable availability | Cheap but can be evicted |
3. Billing Scope and Governance Basics
For AZ-900, the subscription is a key scope for resource deployment, access control, policy application, and cost tracking. But it is not the only billing scope. Depending on the agreement type, costs can also be viewed and managed at broader account or enrollment scopes. That matters because large organisations often want visibility above a single subscription.
The basic Azure hierarchy looks like this:
- Management groups – organize multiple subscriptions for governance at scale
- Subscriptions – key scope for billing, policy, and access control
- Resource groups – organize related resources for lifecycle management
- Resources – actual services such as VMs, storage accounts, and databases
A resource group is not the same as a billing account. It is mainly an organizational container. Cost analysis can be grouped by resource group, but billing constructs can exist above the subscription depending on how the organisation buys Azure.
Cost governance relies on several Azure features working together:
- RBAC controls who can create, modify, or view resources and cost data
- Azure Policy can enforce rules such as allowed regions, required tags, or approved SKUs
- Resource locks help prevent accidental deletion or modification
- Tags help organize and allocate costs by department, environment, owner, or project
- Budgets track spending thresholds and trigger alerts
Tags are brilliant for showback and chargeback, but on their own they don’t actually enforce anything. If you want tagging to stay consistent, you’ll usually lean on Azure Policy or some kind of automation to make tags mandatory during deployment. Likewise, budgets generate alerts and notifications; they do not automatically stop resources by default.
A practical governance setup might look something like this:
- The finance team gets read-only access to cost data
- Platform administrators can create shared infrastructure and policies
- Application teams can deploy only into approved resource groups and regions
- Production resources require tags such as Environment, Owner, and CostCenter
- Premium SKUs are restricted unless there is approval
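A tagging rule like the one in this setup can be pictured as a simple compliance check. The sketch below is plain Python over hypothetical resource records, not an Azure Policy definition or an Azure SDK call; in Azure itself the enforcement would come from Azure Policy or automation.

```python
REQUIRED_TAGS = {"Environment", "Owner", "CostCenter"}


def missing_tags(resource):
    """Return the required tag names that a resource record lacks."""
    tags = resource.get("tags") or {}
    return REQUIRED_TAGS - set(tags)


# Hypothetical inventory records for illustration only.
resources = [
    {"name": "vm-web-01", "tags": {"Environment": "Prod", "Owner": "web-team", "CostCenter": "1001"}},
    {"name": "st-logs-01", "tags": {"Environment": "Prod"}},
    {"name": "vm-test-07", "tags": None},
]

for r in resources:
    gaps = missing_tags(r)
    status = "compliant" if not gaps else "missing " + ", ".join(sorted(gaps))
    print(r["name"], "->", status)
```

A report like this only detects gaps. Blocking a non-compliant deployment is what a policy with a deny effect is for.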
This is also part of the shared responsibility model. Microsoft is responsible for the underlying cloud platform, but customers remain responsible for sizing, shutdown schedules, tagging discipline, policy configuration, access control, and budget setup.
4. Azure Cost Estimation and Analysis Tools
Azure provides different tools for different stages of the cost lifecycle. For exam purposes, remember this mnemonic: Plan, Compare, Monitor, Recommend.
- Plan = Azure Pricing Calculator
- Compare = Azure TCO Calculator
- Monitor = Cost Management + Billing in Azure
- Recommend = Azure Advisor
Tool Comparison
| Tool | Primary use | When to use it | Example output |
|---|---|---|---|
| Azure Pricing Calculator | Estimate Azure costs before deployment | Planning a new solution | Estimated monthly cost for selected services |
| Azure TCO Calculator | Compare on-premises costs with Azure | Building a migration business case | Estimated cost comparison over time |
| Cost Management + Billing in Azure | Track actual spend, budgets, forecasts, exports | Operating live Azure environments | Actual cost by scope, service, tag, or resource |
| Azure Advisor | Provide recommendations | Optimizing deployed resources | Resize, shutdown, reliability, security, and performance suggestions |
Azure Pricing Calculator is for pre-deployment estimation. A typical workflow usually goes something like this:
- First, you choose the services you think you’ll need — things like App Service, Azure SQL Database, Storage, and bandwidth.
- Choose the region.
- Select the SKU or pricing tier.
- Enter expected instance count, hours, storage capacity, and outbound data.
- Then you check the monthly estimate and compare the different tiers.
For example, a small web app might use one App Service plan, one Azure SQL Database, a storage account, and a bit of outbound bandwidth. The most common estimating mistakes I see are forgetting bandwidth, backups, monitoring ingestion, or just getting the runtime assumptions wrong.
Azure TCO Calculator is different. It compares the cost of an on-premises environment with Azure. Typical inputs include number of servers, storage, databases, virtualization, networking, electricity, facilities, and IT operations assumptions. The output is directional rather than exact. It helps support business-case conversations, not live billing analysis.
Cost Management + Billing in Azure is the operational tool. You use it to look at actual spend, set budgets, view forecasts, filter by service or tag, and export data for reporting. Cost analysis can often be scoped beyond a single subscription depending on the billing setup.
Azure Advisor provides recommendations across cost, reliability, security, operational excellence, and performance. It might suggest rightsizing, flag idle resources, or point you toward commitment discounts where they actually make sense. Advisor recommends; it does not enforce.
Budgets, Alerts, and Forecasting in Practice
Budgets are one of the most commonly tested cost-control ideas in AZ-900. You set a spending threshold for a scope such as a subscription or resource group, and then configure alert points like 80%, 90%, or 100% of that budget. When the threshold gets hit, Azure sends out notifications.
Important exam point: budgets alert on spend; they do not automatically shut down resources by default. If an organisation wants enforcement, it must connect alerts to separate automation or operational processes.
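The alert behaviour can be illustrated with a few lines of Python. This is a sketch of the idea only: the function name and the default thresholds are assumptions for the example, and nothing here calls Azure.

```python
def crossed_thresholds(budget, actual_spend, thresholds=(80, 90, 100)):
    """Return the alert thresholds (in percent of budget) that spend has reached."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    percent_used = actual_spend / budget * 100
    return [t for t in thresholds if percent_used >= t]


# Spend of 920 against a budget of 1000 has passed the 80% and 90% alert points.
print(crossed_thresholds(1000, 920))
```

Note what the function does not do: it returns a list of alerts and leaves every resource untouched, which mirrors how budgets behave by default.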
Forecasting is also useful. Cost Management can estimate expected end-of-period spend based on current trends. That helps teams react before the invoice arrives.
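To make the idea of a forecast concrete, here is a deliberately naive run-rate projection. It is an assumption-laden sketch: it simply extends the average daily spend to the end of the month and is not the model Cost Management uses.

```python
import calendar
from datetime import date


def forecast_month_end(spend_to_date, today):
    """Project end-of-month spend from the average daily spend so far."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    daily_rate = spend_to_date / today.day
    return round(daily_rate * days_in_month, 2)


# 600 spent by 15 June (a 30-day month) projects to 1200 at the same daily rate.
print(forecast_month_end(600.0, date(2024, 6, 15)))
```

Comparing a projection like this against the budget is what lets a team act mid-month instead of after the invoice.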
Investigating Spend with Cost Analysis
When a bill rises unexpectedly, a practical workflow is:
- Open Cost Analysis for the correct scope.
- First, check whether the increase happened suddenly or crept up over time.
- Then group the spend by service, resource group, or location.
- Filter by tags such as environment or department.
- Identify the top cost contributor.
- Review Activity Log and deployment history to see what changed.
- After that, check Azure Advisor for optimization recommendations.
- And don’t forget to look for egress, logging, backups, snapshots, or scale-out events.
Cost data can also be exported on a schedule for reporting or FinOps analysis. That is useful for finance teams or central cloud governance teams.
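The grouping step in that workflow can be sketched against exported data. The record layout below is hypothetical (real export columns differ and should be checked against the actual file); the point is only to show grouping by a dimension and reading off the top contributor.

```python
from collections import defaultdict


def cost_by(records, key):
    """Total cost per value of `key`, sorted from highest to lowest."""
    totals = defaultdict(float)
    for record in records:
        totals[record.get(key, "(untagged)")] += record["cost"]
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical rows standing in for an exported cost report.
records = [
    {"service": "Virtual Machines", "resource_group": "rg-web", "environment": "prod", "cost": 120.0},
    {"service": "Storage", "resource_group": "rg-web", "environment": "prod", "cost": 45.5},
    {"service": "SQL Database", "resource_group": "rg-data", "environment": "prod", "cost": 30.0},
    {"service": "Virtual Machines", "resource_group": "rg-test", "cost": 80.0},
]

by_service = cost_by(records, "service")
print(by_service)
print("Top contributor:", next(iter(by_service)))
print(cost_by(records, "environment"))
```

Grouping by a tag such as `environment` also surfaces untagged spend, which is often the first thing a governance team wants to chase down.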
5. Cost Optimization Strategies in Azure
Cost optimization isn’t about blindly picking the cheapest option. At the end of the day, it’s about matching the service and pricing model to the workload you really have.
- Rightsize resources based on actual CPU, memory, throughput, and usage trends
- Deallocate unused VMs when they are not needed
- Use autoscaling for variable demand
- Choose the right pricing model for predictable versus unpredictable usage
- Review Advisor recommendations regularly
- Set budgets and alerts to detect overspend early
VM Billing: Running vs Stopped vs Deallocated
This is a really important technical distinction. If you shut down a VM from inside the guest operating system, it might look stopped, but it can still be allocated in Azure. In that state, compute charges may still continue. To stop VM compute charges, the VM must be stopped (deallocated) from Azure.
Even when a VM is deallocated, some costs can remain, including:
- Managed disks
- Snapshots
- Backup storage
- Static public IP addresses in some scenarios
- Monitoring and log retention
That is why a deallocated VM can still appear on the bill, even though compute charges have stopped.
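The three states can be summarised in a small sketch. The amounts are placeholders and the `VmState` names are chosen for this example; the only logic taken from the text is that compute billing stops on deallocation while storage-related charges remain.

```python
from enum import Enum


class VmState(Enum):
    RUNNING = "running"
    STOPPED = "stopped"          # shut down inside the guest OS, still allocated
    DEALLOCATED = "deallocated"  # stopped from Azure, compute released


def monthly_charge(state, compute_cost, storage_and_extras):
    """Compute is billed while the VM is allocated; disks and extras are billed regardless."""
    compute_billed = state in (VmState.RUNNING, VmState.STOPPED)
    return (compute_cost if compute_billed else 0.0) + storage_and_extras


for state in VmState:
    print(state.value, monthly_charge(state, compute_cost=73.0, storage_and_extras=12.0))
```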
Autoscaling and Scheduling
Autoscaling helps align cost with demand. A simple example is a web application that runs two instances during business hours and scales to four when CPU stays above a threshold for several minutes. When demand falls, the service scales back down.
Common autoscale patterns include:
- Metric-based scaling such as CPU, memory, or request count
- Schedule-based scaling such as more capacity during office hours
- Cooldown periods to avoid rapid scaling up and down
Autoscale is useful for fluctuating workloads, but less useful for constant 24/7 demand where commitment discounts may provide better savings.
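A metric-based rule with a cooldown can be sketched as a single decision function. The thresholds, instance limits, and cooldown length below are assumptions picked for the example, not Azure defaults, and the function is a simplified stand-in for what an autoscale setting does.

```python
def desired_instances(current, cpu_percent, minutes_since_last_scale, *,
                      minimum=2, maximum=4,
                      scale_out_above=70, scale_in_below=30,
                      cooldown_minutes=5):
    """Return the instance count after applying one scale-out/scale-in rule."""
    # During the cooldown, hold steady to avoid rapid scaling up and down.
    if minutes_since_last_scale < cooldown_minutes:
        return current
    if cpu_percent > scale_out_above:
        return min(current + 1, maximum)
    if cpu_percent < scale_in_below:
        return max(current - 1, minimum)
    return current


print(desired_instances(2, cpu_percent=85, minutes_since_last_scale=10))  # scales out
print(desired_instances(3, cpu_percent=85, minutes_since_last_scale=2))   # held by cooldown
print(desired_instances(3, cpu_percent=10, minutes_since_last_scale=10))  # scales in
```

The minimum and maximum bounds matter for cost as much as the thresholds do: the maximum caps spend during a spike, and the minimum sets the baseline you pay for even when idle.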
Reservations vs Savings Plan vs Pay-As-You-Go
People mix these up all the time, so it’s worth separating them clearly:
- Pay-as-you-go: no commitment, maximum flexibility
- Reservations: commit to eligible resource usage for a term to get discounted pricing
- Azure Savings Plan for Compute: commit to an hourly spend amount for eligible compute services, with more flexibility across instance types and regions than some reservation scenarios
Reservations are usually a strong fit for highly stable workloads. Savings Plan is useful when compute usage is predictable in total spend but may vary across services or sizes. Actual savings depend on how well real usage matches the commitment and scope.
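One way to see why usage has to match the commitment is a break-even calculation. The sketch assumes a simplified model in which a reservation is billed for every hour of the term at a discounted rate, while pay-as-you-go is billed only for hours used; the 40% discount is a hypothetical figure, not a published Azure rate.

```python
def payg_cost(hourly_rate, hours_used):
    """Pay-as-you-go: pay only for the hours actually used."""
    return hourly_rate * hours_used


def reserved_cost(hourly_rate, discount, hours_in_term):
    """Reservation (simplified): pay the discounted rate for every hour of the term."""
    return hourly_rate * (1 - discount) * hours_in_term


def break_even_utilization(discount):
    """Fraction of the term the resource must run for the reservation to pay off."""
    return 1 - discount


rate, discount, term_hours = 0.10, 0.40, 8760  # hypothetical values, one-year term
print(break_even_utilization(discount))
print(payg_cost(rate, term_hours * 0.5), reserved_cost(rate, discount, term_hours))
```

Under these assumptions, a workload that runs only half the year is cheaper on pay-as-you-go, while one that runs more than 60% of the time favours the commitment.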
Common Hidden Azure Cost Drivers
- Outbound bandwidth and inter-region traffic
- Idle managed disks left after VM deletion
- Snapshots and backup retention
- Diagnostic log ingestion and long retention settings
- NAT Gateway and gateway-related networking charges
- Overprovisioned premium tiers
- Unused public IP addresses or other orphaned resources
6. Storage and Networking Cost Choices
Storage and networking often create surprise charges because teams focus on compute first.
For storage, cost is influenced by three separate ideas that beginners often mix up:
- Access tier – hot, cool, archive; affects cost based on how often data is accessed
- Performance tier – standard versus premium; affects latency and throughput
- Redundancy option – such as LRS, ZRS, GRS, or GZRS; affects durability and sometimes resilience characteristics
These are not the same thing. A higher redundancy setting usually relates more to durability and resilience of stored data, while performance tier affects speed, and access tier affects storage economics based on retrieval pattern.
A practical example:
- Active application files that are read frequently may belong in hot storage
- Backup data accessed occasionally may fit cool storage
- Long-term archive data may use archive tier if retrieval delays are acceptable
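The access-tier trade-off can be sketched as a small comparison. All prices below are invented placeholders, and the model leaves out transaction charges and minimum retention periods, so treat it as an illustration of the reasoning rather than a cost calculation.

```python
# Hypothetical prices per GB per month; real prices vary by region and redundancy.
TIERS = {
    "hot":     {"storage": 0.020, "read": 0.000, "delayed_retrieval": False},
    "cool":    {"storage": 0.010, "read": 0.010, "delayed_retrieval": False},
    "archive": {"storage": 0.002, "read": 0.020, "delayed_retrieval": True},
}


def monthly_cost(tier, stored_gb, read_gb):
    price = TIERS[tier]
    return stored_gb * price["storage"] + read_gb * price["read"]


def cheapest_tier(stored_gb, read_gb, tolerates_delay):
    """Pick the lowest-cost tier, skipping delayed-retrieval tiers if delays are unacceptable."""
    candidates = [name for name, price in TIERS.items()
                  if tolerates_delay or not price["delayed_retrieval"]]
    return min(candidates, key=lambda name: monthly_cost(name, stored_gb, read_gb))


print(cheapest_tier(1000, read_gb=5000, tolerates_delay=False))  # frequently read files
print(cheapest_tier(1000, read_gb=100, tolerates_delay=False))   # occasional backups
print(cheapest_tier(1000, read_gb=0, tolerates_delay=True))      # long-term archive
```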
For networking, the most common cost driver is egress, meaning data leaving Azure. Other networking charges can come from VPN gateways, ExpressRoute, load balancer SKUs, NAT Gateway, public IP usage, and inter-region replication traffic. A design that moves large amounts of data between regions can cost much more than one kept local to a single region.
7. Introduction to Azure Service Level Agreements
A Service Level Agreement, or SLA, is a financially backed commitment from Microsoft about service availability over a defined period. If the service does not meet the documented SLA terms, the remedy is typically a service credit, not compensation for business losses.
An SLA is not the same as backup, disaster recovery, or fault tolerance. It is also not a promise of zero downtime. It applies only when the service is used according to Microsoft’s SLA terms and required configuration.
Also remember that not all services have the same SLA model. Some free services, preview services, or specific feature tiers may have no SLA.
Availability vs Durability vs Backup vs Disaster Recovery
- Availability = whether the service is accessible and running
- Durability = likelihood that data remains intact over time
- Backup = point-in-time recovery of data
- Disaster recovery = restoring service after a major outage, such as regional failure
A storage service can be highly durable for data without automatically giving your full application high availability. That distinction matters.
SLA Percentage to Downtime
| SLA | Approximate downtime per month | Approximate downtime per year |
|---|---|---|
| 99% | About 7 hours 18 minutes | About 3 days 15 hours |
| 99.9% | About 43 minutes 48 seconds | About 8 hours 46 minutes |
| 99.95% | About 21 minutes 54 seconds | About 4 hours 23 minutes |
| 99.99% | About 4 minutes 23 seconds | About 52 minutes |
Higher percentages usually require more redundant design and therefore more cost.
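The conversion behind the table is simple arithmetic: the allowed downtime is the unavailable fraction multiplied by the length of the period. The sketch below assumes a 730-hour month (8,760 hours divided by 12) and a 365-day year; other month lengths give slightly different figures.

```python
HOURS_PER_MONTH = 730   # assumed average month
HOURS_PER_YEAR = 8760   # 365 days


def allowed_downtime_minutes(sla_percent, period_hours):
    """Maximum downtime, in minutes, that still meets the SLA over the period."""
    return (1 - sla_percent / 100) * period_hours * 60


for sla in (99.0, 99.9, 99.95, 99.99):
    monthly = allowed_downtime_minutes(sla, HOURS_PER_MONTH)
    yearly = allowed_downtime_minutes(sla, HOURS_PER_YEAR)
    print(f"{sla}%: {monthly:.1f} min/month, {yearly / 60:.2f} h/year")
```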
8. Composite SLA and Multi-Service Applications
Composite SLA is the end-to-end availability of a solution that depends on multiple services. For AZ-900, it is typically calculated by multiplying the SLAs of all required services. This treats the services as serial dependencies (every one must be up for the solution to work) and assumes their failures are independent.
Example with two required services:
- Service A = 99.9% = 0.999
- Service B = 99.95% = 0.9995
Composite SLA:
0.999 × 0.9995 = 0.9985005 = about 99.85%
Example with a simple three-tier application:
- Web tier = 99.95% = 0.9995
- App tier = 99.95% = 0.9995
- Database = 99.99% = 0.9999
Composite SLA:
0.9995 × 0.9995 × 0.9999 ≈ 0.9989 = about 99.89%
The key lesson is that when every component is required, the overall application availability is lower than the SLA of any individual component.
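The two worked examples can be reproduced with a short helper. This is a sketch of the exam-style calculation only, assuming serial dependencies and independent failures; the function name is made up for the example.

```python
from math import prod


def composite_sla(*slas_percent):
    """Multiply the availabilities of services that are all required (serial)."""
    return prod(s / 100 for s in slas_percent) * 100


two_services = composite_sla(99.9, 99.95)
three_tier = composite_sla(99.95, 99.95, 99.99)

print(f"{two_services:.2f}%")
print(f"{three_tier:.2f}%")
```

Each extra required dependency multiplies in another factor below 1, so the composite figure can only go down as the chain gets longer.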
In real architectures, redundancy can improve effective availability. For example, if a front-end tier has multiple healthy instances behind a load balancer, the design may tolerate one instance failing. That is why real-world availability is about architecture, not just multiplying published numbers.
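The effect of redundancy can be illustrated with the standard formula for parallel components, where the tier is down only if every instance is down at once. This is a theoretical sketch under strong assumptions (independent failures, perfect failover, a flawless load balancer). It is not how published SLAs are determined, and real systems fall short of it.

```python
def redundant_availability(instance_percent, instances):
    """Availability of a tier that works as long as at least one instance is up."""
    unavailable = 1 - instance_percent / 100
    return (1 - unavailable ** instances) * 100


single = redundant_availability(99.9, 1)
pair = redundant_availability(99.9, 2)

print(f"{single:.4f}%")
print(f"{pair:.4f}%")
```

The jump from one instance to two is the reason redundancy is the usual answer when a workload needs better uptime than a single resource can offer, and also the reason it roughly doubles that tier's cost.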
9. Azure Features That Influence Availability
Azure offers several design options to improve availability, but they protect against different failure scopes.
| Feature | What it helps with | Cost impact | Typical use |
|---|---|---|---|
| Availability Set | Spreads VMs across fault domains and update domains within a datacenter | Moderate | Redundant VM deployment in one datacenter scope |
| Availability Zone | Protects against datacenter-level failure within a region | Higher | Business-critical regional resilience |
| Zone-redundant service | Service-managed redundancy across zones | Varies | Managed services with built-in resilience |
| Multi-region DR | Protects against regional outage | Highest | Disaster recovery and global resilience |
Availability Sets help protect multiple VMs from planned maintenance or localized hardware failure within a datacenter. Availability Zones are physically separate locations within a region and provide stronger resilience.
A single VM can have an SLA under certain conditions, depending on the service-specific SLA terms and configuration. However, higher availability targets generally require redundant design such as multiple VMs in an Availability Set or Availability Zones.
Load balancing is also important. Multiple instances without a load balancer may still leave traffic handling or failover poorly designed. Health probes and traffic distribution help the application continue serving users when one instance fails.
Region pairs are a Microsoft resiliency construct used for platform recovery priorities and continuity planning. But region pairs do not automatically make your application multi-region. Customers must still design replication, failover, testing, and recovery procedures.
10. Cost vs Availability Trade-Offs
This is where Azure design becomes practical. Better availability usually means duplicate resources, more networking, more data replication, and more operational complexity. That increases cost. The right design depends on business criticality.
| Scenario | Cost profile | Availability profile | Best fit |
|---|---|---|---|
| Single VM, pay-as-you-go | Low | Basic | Labs, dev/test, temporary workloads |
| Two VMs in Availability Set | Moderate | Improved within one datacenter scope | Basic production apps |
| Zonal deployment with load balancing | Higher | High regional resilience | Business-critical production |
| Multi-region active-passive or active-active | Highest | Very high with DR capability | Mission-critical services |
A useful decision framework is:
- How critical is the workload to the business?
- How much downtime can users tolerate?
- What recovery time and recovery point expectations exist?
- What budget constraints apply?
- Are there compliance or regional requirements?
For AZ-900, you do not need deep DR engineering, but you should understand that SLA is not the same as DR, and higher availability usually costs more.
11. Troubleshooting Unexpected Azure Spend and Availability Issues
When cost spikes happen, use a repeatable process instead of guessing.
- Confirm the scope in Cost Management.
- Check forecast versus actual spend.
- Group costs by service and resource.
- Look for recent deployments or configuration changes in Activity Log.
- Check whether autoscaling increased instance count.
- Review network egress and inter-region transfer.
- Look for snapshots, backups, or logging retention growth.
- Use Advisor to identify idle or oversized resources.
Example: if the monthly bill jumps by 35%, the root cause might be a sudden increase in outbound traffic, a premium SKU selected during a deployment, diagnostic logs retained too long, or forgotten test resources left running.
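A month-over-month comparison is one simple way to narrow down a jump like this. The figures below are invented so that the total rises by 35%; the service names and the helper functions are illustrative only.

```python
def percent_change(previous_total, current_total):
    return (current_total - previous_total) * 100 / previous_total


def biggest_increases(previous, current, top=3):
    """Services ranked by how much their cost grew between two periods."""
    services = set(previous) | set(current)
    deltas = {s: current.get(s, 0.0) - previous.get(s, 0.0) for s in services}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Hypothetical monthly totals per service.
last_month = {"Virtual Machines": 400.0, "Bandwidth": 50.0, "Log Analytics": 50.0}
this_month = {"Virtual Machines": 410.0, "Bandwidth": 200.0, "Log Analytics": 65.0}

print(percent_change(sum(last_month.values()), sum(this_month.values())))
print(biggest_increases(last_month, this_month))
```

In this made-up data the virtual machines barely moved; outbound bandwidth accounts for most of the increase, which points the investigation at egress rather than compute.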
For availability issues, check whether the architecture included redundancy at all. Many outages are not caused by Azure breaking its SLA, but by a workload being deployed as a single point of failure.
12. Support Plans and Service Lifecycle Concepts
Azure support plans affect how quickly you can get help and what support channels are available. They do not increase the uptime SLA of Azure services. Support plans are about response and guidance, not service availability guarantees.
Support plan names, response targets, and features can change over time, so current details should always be verified in Microsoft's official documentation. For AZ-900, the exam-safe point is simple: production environments often justify stronger support coverage than a lab or dev/test subscription.
Service lifecycle also matters:
- Generally Available (GA) services are intended for production use and typically have normal support and SLA expectations
- Preview services or features may have limited support and may not have production SLAs
Preview does not automatically mean “unsafe,” but it does mean you should verify service-specific terms before relying on it for critical workloads.
13. AZ-900 Exam Tips, Scenarios, and Common Pitfalls
AZ-900 usually tests whether you can distinguish similar concepts, not whether you can design a full enterprise platform.
Exam Trap Summary
- Pricing Calculator estimates future cost; Cost Management + Billing shows actual spend
- TCO Calculator compares on-premises with Azure; it does not show your Azure invoice
- Tags organize and report cost; Policy enforces rules; budgets alert on thresholds
- Budgets do not automatically stop resources by default
- SLA is a promise; high availability is a design; DR is recovery; backup is data protection
- Support plan does not increase service SLA
- Preview does not offer the same production assurances as GA
- Single resource does not automatically mean high availability
- Stopping a VM inside the OS is not the same as deallocating it in Azure
Mini Exam Scenarios
Scenario 1: A company wants to estimate the monthly cost of a new Azure deployment before creating resources. The correct tool is the Azure Pricing Calculator.
Scenario 2: A finance team wants to compare three years of on-premises server costs against moving to Azure. The correct tool is the Azure TCO Calculator.
Scenario 3: A subscription exceeded 90% of its monthly spending threshold and sent an email alert. That is a budget alert, not an automatic shutdown.
Scenario 4: A workload requires better uptime than a single VM can provide. The likely improvement is redundant instances with load balancing, possibly using Availability Sets or Availability Zones depending on the requirement.
What Microsoft Expects You to Know for AZ-900
- Know the purpose of pricing, TCO, cost management, and advisor tools
- Know the difference between tags, policy, RBAC, and budgets
- Know what an SLA means and what it does not mean
- Know that architecture affects availability and cost
- Do not over-focus on memorizing every SKU or exact support-plan detail
14. Conclusion
Azure cost management is both planning and operations. You estimate before deployment, monitor after deployment, investigate changes, and optimize continuously. SLAs tell you what availability Microsoft commits to under documented conditions, but they do not remove the need for sound architecture, backup, or disaster recovery planning.
The best Azure decisions balance cost, governance, performance, and business criticality. A low-cost design may be perfect for dev/test. A production system may justify zones, load balancing, commitment discounts, and stronger governance. For AZ-900, the most valuable skill is understanding the differences between the tools, pricing models, and availability concepts so you can choose the right answer deliberately rather than by guesswork.