Azure Cost Management and Service Level Agreements for AZ-900: Practical Foundations for Pricing, Governance, and Availability

1. Introduction: Why Cost and Availability Matter in Azure

When people first get started with Azure, they tend to focus on the obvious services first: virtual machines, storage, networking, web apps, and databases. In practice, though, two themes shape almost every real deployment decision: cost and availability. Teams can get a workload running quickly and still run into trouble later if they don’t understand how Azure charges for services or what Microsoft’s uptime commitments actually mean.

That is exactly why Azure Cost Management and Service Level Agreements, or SLAs, matter in AZ-900. Cost management helps you estimate, monitor, allocate, and optimize spending. SLAs help you understand the availability commitment for a service when it is configured according to Microsoft’s documented terms. The two topics are closely tied together: adding resilience usually means adding redundancy, and that nearly always pushes the cost up.

For the exam, keep one core idea in mind: cost management is about controlling spend, while an SLA is about understanding uptime commitments. Good Azure decisions balance both.

2. Azure Pricing Fundamentals

Azure commonly shifts IT spending toward an operational expense model, where you pay for services as you consume them instead of buying all infrastructure up front. That is one of the major cloud advantages. Azure pricing isn’t one simple flat model, though. What you end up paying depends on the service you deploy, the region you choose, how long it runs, and whether you go for higher performance or extra redundancy.

The most common model is pay-as-you-go. It’s flexible, and that makes it a great fit for labs, development work, short-term projects, and anything where demand jumps around a bit. But flexibility doesn’t always mean cheapest, and that’s where a lot of beginners get caught out. If a workload runs steadily and predictably, commitment-based pricing can often bring the cost down quite a bit.

At a high level, Azure pricing usually falls into a few broad patterns:

Free services or free allowances for certain products and learning scenarios
Consumption-based pricing where you pay for compute time, storage used, transactions, requests, or bandwidth
Commitment-based pricing such as Reservations and Azure Savings Plan for Compute
Special pricing models such as Spot pricing for interruptible workloads and Dev/Test offers where applicable

Azure pricing also varies by region, SKU, service tier, operating system, redundancy option, and licensing model. A VM does not simply “cost X.” Its price depends on the VM family, size, region, hours used, disk type, software licensing, and related services such as backup or public IP addresses.

Factors That Affect Azure Cost

Factor	Why it changes price	Example
Resource type	Different Azure services use different pricing models	A VM, storage account, and SQL database are billed differently
Usage / consumption	More runtime, capacity, requests, or transactions increases cost	A VM running 24/7 costs more than one used only in office hours
Region	Prices vary by geography and service availability	UK South may be priced differently from West Europe
Pricing tier / SKU	Higher tiers usually add performance, features, or resilience	Premium storage costs more than standard storage
Performance level	More CPU, memory, IOPS, throughput, or database capacity costs more	A larger VM size has a higher hourly rate
Storage amount and access pattern	Capacity, access tier, redundancy, and transactions all matter	Hot storage with frequent reads costs differently from archive storage
Outbound data transfer	Data leaving Azure is commonly billed	A public website serving many downloads increases egress cost
Inter-region traffic	Traffic between regions can add bandwidth charges	Replication between two regions increases network cost
Licensing model	Included licenses or bring-your-own-license options change price	Windows Server VMs cost differently from comparable Linux VMs
Marketplace software	Third-party products can add software charges on top of Azure infrastructure	A security appliance image may include vendor licensing fees

Some billing mechanics are especially useful to know:

Compute: billed based on instance size and runtime
Managed disks: billed for provisioned disk size and type, not just actual data written
Storage accounts: billed for capacity, transactions, redundancy, and sometimes retrieval operations
Databases: billed by provisioned compute/storage or consumption model depending on the database service
Networking: inbound data transfer to Azure is generally free; outbound data transfer and some inter-region transfers are billed
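
As a rough illustration of how these billing mechanics add up, the sketch below estimates one month of charges for a single VM workload. The function name and every rate in it are hypothetical placeholders chosen for easy arithmetic, not real Azure prices, and the disk is simplified to a flat monthly price.

```python
HOURS_PER_MONTH = 730  # common planning figure: 8,760 hours / 12 months


def estimate_monthly_cost(vm_hourly_rate, hours_run, disk_monthly_price,
                          egress_gib, egress_rate_per_gib):
    """Estimate one month of charges for a single VM workload.

    Every rate passed in is an illustrative placeholder, not a real Azure price.
    """
    compute = vm_hourly_rate * hours_run        # billed by instance size and runtime
    disk = disk_monthly_price                   # billed for the provisioned disk, used or not
    egress = egress_gib * egress_rate_per_gib   # outbound data only; inbound is generally free
    charges = {"compute": compute, "disk": disk, "egress": egress,
               "total": compute + disk + egress}
    return {name: round(value, 2) for name, value in charges.items()}


# Same VM and disk, different runtime: 24/7 versus 10 hours on 22 working days.
always_on = estimate_monthly_cost(0.10, HOURS_PER_MONTH, 10.00, 50, 0.08)
office_hours = estimate_monthly_cost(0.10, 10 * 22, 10.00, 50, 0.08)
print(always_on)
print(office_hours)
```

Only the compute line changes between the two runs. The disk and egress charges stay the same, which is why reducing runtime alone never brings the bill to zero.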

One of the most common beginner mistakes is assuming the smallest or cheapest SKU is automatically the right one. If the workload really needs more memory, throughput, or IOPS, then going too small can cause performance pain, and that often ends up costing more to fix later.

Comparing Pricing Models

Model	Commitment	Best fit	Flexibility	Exam cue
Pay-as-you-go	None	Variable or short-term workloads	High	Use what you consume
Reservation	Term commitment for eligible resources	Stable, predictable usage	Lower	Commit for discount
Azure Savings Plan for Compute	Hourly spend commitment for eligible compute	Predictable compute usage with some flexibility	Higher than reservation	Flexible compute discount
Spot	No long-term commitment, interruptible capacity	Batch jobs, testing, fault-tolerant workloads	Variable availability	Cheap but can be evicted
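
To make the "commit for discount" trade-off concrete, here is a minimal sketch that compares pay-as-you-go with a reservation for one resource over one month. It assumes a reservation is paid for every hour of the period whether the resource runs or not, and it uses a made-up 40% discount. Real discounts vary by service, term, and region.

```python
def break_even_utilization(discount):
    """Fraction of the period a resource must run before a reservation pays off."""
    return 1.0 - discount


def cheaper_option(payg_hourly, discount, hours_used, hours_in_period=730):
    """Compare pay-as-you-go with a reservation for a single resource.

    discount is a hypothetical reservation discount between 0 and 1.
    """
    payg_cost = payg_hourly * hours_used
    # The reservation is charged for the whole period, used or not.
    reserved_cost = payg_hourly * (1.0 - discount) * hours_in_period
    return "reservation" if reserved_cost < payg_cost else "pay-as-you-go"


print(cheaper_option(0.10, 0.40, hours_used=730))  # steady 24/7 workload
print(cheaper_option(0.10, 0.40, hours_used=200))  # occasional workload
```

Under these assumptions the break-even point is simply one minus the discount: with a 40% discount, the resource has to run about 60% of the time before the commitment is worthwhile.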

3. Billing Scope and Governance Basics

For AZ-900, the subscription is a key scope for resource deployment, access control, policy application, and cost tracking. But it is not the only billing scope. Depending on the agreement type, costs can also be viewed and managed at broader account or enrollment scopes. That matters because large organizations often want visibility above a single subscription.

The basic Azure hierarchy looks like this:

Management groups – organize multiple subscriptions for governance at scale
Subscriptions – key scope for billing, policy, and access control
Resource groups – organize related resources for lifecycle management
Resources – actual services such as VMs, storage accounts, and databases

A resource group is not the same as a billing account. It is mainly an organizational container. Cost analysis can be grouped by resource group, but billing constructs can exist above the subscription depending on how the organization buys Azure.

Cost governance relies on several Azure features working together:

RBAC controls who can create, modify, or view resources and cost data
Azure Policy can enforce rules such as allowed regions, required tags, or approved SKUs
Resource locks help prevent accidental deletion or modification
Tags help organize and allocate costs by department, environment, owner, or project
Budgets track spending thresholds and trigger alerts

Tags are brilliant for showback and chargeback, but on their own they don’t actually enforce anything. If you want tagging to stay consistent, you’ll usually lean on Azure Policy or some kind of automation to make tags mandatory during deployment. Likewise, budgets generate alerts and notifications; they do not automatically stop resources by default.

A practical governance setup might look something like this:

The finance team gets read-only access to cost data
Platform administrators can create shared infrastructure and policies
Application teams can deploy only into approved resource groups and regions
Production resources require tags such as Environment, Owner, and CostCenter
Premium SKUs are restricted unless there is approval
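
The tagging and region rules in a setup like this can be pictured as a simple compliance check. The sketch below is a local illustration only: it is not Azure Policy and does not call any Azure API. The resource dictionaries, the `check_resource` helper, and the allowed-region list are all hypothetical.

```python
REQUIRED_TAGS = {"Environment", "Owner", "CostCenter"}
ALLOWED_REGIONS = {"uksouth", "westeurope"}  # hypothetical approved regions


def check_resource(resource):
    """Return a list of human-readable violations for one resource description."""
    violations = []
    missing = sorted(REQUIRED_TAGS - set(resource.get("tags", {})))
    if missing:
        violations.append("missing tags: " + ", ".join(missing))
    if resource.get("location") not in ALLOWED_REGIONS:
        violations.append(f"region not allowed: {resource.get('location')}")
    return violations


compliant = {
    "name": "vm-prod-01",
    "location": "uksouth",
    "tags": {"Environment": "Prod", "Owner": "app-team", "CostCenter": "1234"},
}
non_compliant = {
    "name": "vm-test-07",
    "location": "eastus",
    "tags": {"Environment": "Test"},
}
print(check_resource(compliant))
print(check_resource(non_compliant))
```

In Azure itself, this kind of rule would normally be expressed as a policy assignment so that non-compliant deployments are denied or flagged at deployment time rather than discovered afterwards.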

This is also part of the shared responsibility model. Microsoft is responsible for the underlying cloud platform, but customers remain responsible for sizing, shutdown schedules, tagging discipline, policy configuration, access control, and budget setup.

4. Azure Cost Estimation and Analysis Tools

Azure provides different tools for different stages of the cost lifecycle. For exam purposes, remember this mnemonic: Plan, Compare, Monitor, Recommend.

Plan = Azure Pricing Calculator
Compare = Azure TCO Calculator
Monitor = Cost Management + Billing in Azure
Recommend = Azure Advisor

Tool Comparison

Tool	Primary use	When to use it	Example output
Azure Pricing Calculator	Estimate Azure costs before deployment	Planning a new solution	Estimated monthly cost for selected services
Azure TCO Calculator	Compare on-premises costs with Azure	Building a migration business case	Estimated cost comparison over time
Cost Management + Billing in Azure	Track actual spend, budgets, forecasts, exports	Operating live Azure environments	Actual cost by scope, service, tag, or resource
Azure Advisor	Provide recommendations	Optimizing deployed resources	Resize, shutdown, reliability, security, and performance suggestions

Azure Pricing Calculator is for pre-deployment estimation. A typical workflow usually goes something like this:

Choose the services you expect to need, such as App Service, Azure SQL Database, Storage, and bandwidth.
Choose the region.
Select the SKU or pricing tier.
Enter expected instance count, hours, storage capacity, and outbound data.
Review the monthly estimate and compare the different tiers.

For example, a small web app might use one App Service plan, one Azure SQL Database, a storage account, and a bit of outbound bandwidth. The most common estimating mistakes I see are forgetting bandwidth, backups, monitoring ingestion, or just getting the runtime assumptions wrong.

Azure TCO Calculator is different. It compares the cost of an on-premises environment with Azure. Typical inputs include number of servers, storage, databases, virtualization, networking, electricity, facilities, and IT operations assumptions. The output is directional rather than exact. It helps support business-case conversations, not live billing analysis.

Cost Management + Billing in Azure is the operational tool. You use it to look at actual spend, set budgets, view forecasts, filter by service or tag, and export data for reporting. Cost analysis can often be scoped beyond a single subscription depending on the billing setup.

Azure Advisor provides recommendations across cost, reliability, security, operational excellence, and performance. It might suggest rightsizing, flag idle resources, or point you toward commitment discounts where they actually make sense. Advisor recommends; it does not enforce.

Budgets, Alerts, and Forecasting in Practice

Budgets are one of the most commonly tested cost-control ideas in AZ-900. You set a spending threshold for a scope such as a subscription or resource group, and then configure alert points like 80%, 90%, or 100% of that budget. When the threshold gets hit, Azure sends out notifications.

Important exam point: budgets alert on spend; they do not automatically shut down resources by default. If an organization wants enforcement, it must connect alerts to separate automation or operational processes.
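
The threshold logic behind a budget alert is simple enough to sketch. The function below is an illustration of the idea, not how Cost Management is implemented: it only reports which alert percentages have been reached and, like a real budget, takes no action on resources.

```python
def crossed_thresholds(budget, actual_spend, thresholds=(80, 90, 100)):
    """Return the alert percentages that current spend has reached."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    percent_used = actual_spend / budget * 100
    return [t for t in thresholds if percent_used >= t]


print(crossed_thresholds(budget=1000, actual_spend=500))
print(crossed_thresholds(budget=1000, actual_spend=850))
print(crossed_thresholds(budget=1000, actual_spend=1000))
```

Anything beyond notification, such as stopping or scaling down resources, would have to be added as separate automation triggered by the alert.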

Forecasting is also useful. Cost Management can estimate expected end-of-period spend based on current trends. That helps teams react before the invoice arrives.
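
A deliberately naive way to picture forecasting is a straight-line projection of month-to-date spend. This is an assumption made for illustration; the source does not describe the model Cost Management actually uses, and a real forecast is likely to be more sophisticated than this.

```python
def naive_month_end_forecast(spend_to_date, day_of_month, days_in_month):
    """Project end-of-month spend by extending the average daily rate so far."""
    if not 1 <= day_of_month <= days_in_month:
        raise ValueError("day_of_month must fall within the month")
    daily_rate = spend_to_date / day_of_month
    return daily_rate * days_in_month


# 300 spent after 10 days of a 30-day month.
print(naive_month_end_forecast(300, 10, 30))
```

Comparing a projection like this against the budget is what lets a team react mid-month instead of after the invoice arrives.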

Investigating Spend with Cost Analysis

When a bill rises unexpectedly, a practical workflow is:

Open Cost Analysis for the correct scope.
Check whether the increase happened suddenly or crept up over time.
Group the spend by service, resource group, or location.
Filter by tags such as environment or department.
Identify the top cost contributor.
Review Activity Log and deployment history to see what changed.
Check Azure Advisor for optimization recommendations.
Look for egress, logging, backups, snapshots, or scale-out events.
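
The grouping and "top contributor" steps in this workflow are the same operations you would run on exported cost data. The sketch below assumes a hypothetical list of cost records with `service`, `resource_group`, and `cost` fields; real export columns may be named differently.

```python
from collections import defaultdict


def group_costs(records, key):
    """Sum the cost of all records that share the same value for key."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["cost"]
    return dict(totals)


def top_contributor(records, key):
    """Return the (group, total) pair with the highest total cost."""
    totals = group_costs(records, key)
    return max(totals.items(), key=lambda item: item[1])


# Hypothetical cost records for one billing period.
records = [
    {"service": "Virtual Machines", "resource_group": "rg-web", "cost": 120.0},
    {"service": "Storage", "resource_group": "rg-web", "cost": 30.0},
    {"service": "Bandwidth", "resource_group": "rg-web", "cost": 45.0},
    {"service": "Virtual Machines", "resource_group": "rg-test", "cost": 80.0},
]
print(group_costs(records, "service"))
print(top_contributor(records, "resource_group"))
```

Switching the `key` argument mirrors changing the "group by" option in Cost Analysis: the same spend looks very different by service than by resource group.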

Cost data can also be exported on a schedule for reporting or FinOps analysis. That is useful for finance teams or central cloud governance teams.

5. Cost Optimization Strategies in Azure

Cost optimization isn’t about blindly picking the cheapest option. It’s about matching the service and pricing model to the workload you really have.

Rightsize resources based on actual CPU, memory, throughput, and usage trends
Deallocate unused VMs when they are not needed
Use autoscaling for variable demand
Choose the right pricing model for predictable versus unpredictable usage
Review Advisor recommendations regularly
Set budgets and alerts to detect overspend early

VM Billing: Running vs Stopped vs Deallocated

This is a really important technical distinction. If you shut down a VM from inside the guest operating system, it might look stopped, but it is still allocated in Azure. In that state, compute charges continue because the capacity is still held for the VM. To stop VM compute charges, the VM must be stopped (deallocated) from Azure.

Even when a VM is deallocated, some costs can remain, including:

Managed disks
Snapshots
Backup storage
Static public IP addresses in some scenarios
Monitoring and log retention

That is why a deallocated VM can still appear on the bill, even though compute charges have stopped.
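
The difference between the three states can be sketched as a small billing model. The state names, rate, and disk price below are illustrative assumptions; the point is only that compute hours count in every state except deallocated, while the disk is charged throughout.

```python
from enum import Enum


class VmState(Enum):
    RUNNING = "running"
    STOPPED = "stopped"          # shut down inside the guest OS, still allocated
    DEALLOCATED = "deallocated"  # stopped from Azure, compute released


def billable_compute_hours(hours_by_state):
    """Count hours that incur compute charges: everything except deallocated time."""
    return sum(hours for state, hours in hours_by_state.items()
               if state is not VmState.DEALLOCATED)


def monthly_vm_charge(hours_by_state, hourly_rate, disk_monthly_price):
    """Compute plus disk for the month; the disk is billed in every state."""
    compute = billable_compute_hours(hours_by_state) * hourly_rate
    return round(compute + disk_monthly_price, 2)


usage = {VmState.RUNNING: 200, VmState.STOPPED: 100, VmState.DEALLOCATED: 430}
print(billable_compute_hours(usage))
print(monthly_vm_charge(usage, hourly_rate=0.10, disk_monthly_price=10.00))
```

In this example the 100 hours spent shut down inside the guest OS are billed exactly like running hours, which is the trap the section describes.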

Autoscaling and Scheduling

Autoscaling helps align cost with demand. A simple example is a web application that runs two instances during business hours and scales to four when CPU stays above a threshold for several minutes. When demand falls, the service scales back down.

Common autoscale patterns include:

Metric-based scaling such as CPU, memory, or request count
Schedule-based scaling such as more capacity during office hours
Cooldown periods to avoid rapid scaling up and down

Autoscale is useful for fluctuating workloads, but less useful for constant 24/7 demand where commitment discounts may provide better savings.
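
The metric-based and cooldown patterns above can be combined into one small decision function. This is a simplified sketch, not the Azure autoscale engine: the thresholds, instance limits, and cooldown length are hypothetical defaults, and it scales by one instance at a time.

```python
def desired_instances(current, cpu_percent, minutes_since_last_scale,
                      scale_out_above=70, scale_in_below=30,
                      min_instances=2, max_instances=4, cooldown_minutes=10):
    """Decide the next instance count from average CPU, respecting a cooldown."""
    if minutes_since_last_scale < cooldown_minutes:
        return current                            # too soon after the last change
    if cpu_percent > scale_out_above:
        return min(current + 1, max_instances)    # scale out, capped at the maximum
    if cpu_percent < scale_in_below:
        return max(current - 1, min_instances)    # scale in, never below the minimum
    return current


print(desired_instances(current=2, cpu_percent=85, minutes_since_last_scale=15))
print(desired_instances(current=3, cpu_percent=20, minutes_since_last_scale=15))
print(desired_instances(current=2, cpu_percent=85, minutes_since_last_scale=5))
```

The gap between the scale-out and scale-in thresholds, together with the cooldown, is what prevents the instance count from flapping when load hovers around a single value.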

Reservations vs Savings Plan vs Pay-As-You-Go

People mix these up all the time, so it’s worth separating them clearly:

Pay-as-you-go: no commitment, maximum flexibility
Reservations: commit to eligible resource usage for a term to get discounted pricing
Azure Savings Plan for Compute: commit to an hourly spend amount for eligible compute services, with more flexibility across instance types and regions than some reservation scenarios

Reservations are usually a strong fit for highly stable workloads. Savings Plan is useful when compute usage is predictable in total spend but may vary across services or sizes. Actual savings depend on how well real usage matches the commitment and scope.
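
The sentence about savings depending on how well usage matches the commitment is easier to see with numbers. The sketch below models one hour under an hourly spend commitment, with two simplifying assumptions: a single uniform discount on covered usage (real discounts differ by service and size), and usage beyond the commitment falling back to pay-as-you-go rates. All figures are invented.

```python
def hourly_charge(payg_usage, hourly_commitment, plan_discount):
    """Charge for one hour under a simplified hourly-commitment model.

    payg_usage: eligible compute usage for the hour, valued at pay-as-you-go rates.
    hourly_commitment: committed spend per hour, paid whether it is used or not.
    plan_discount: assumed uniform discount (0 to 1) applied to covered usage.
    """
    if not 0 <= plan_discount < 1:
        raise ValueError("plan_discount must be in [0, 1)")
    # Pay-as-you-go value of usage that the commitment absorbs at discounted rates.
    covered_capacity = hourly_commitment / (1 - plan_discount)
    overage = max(0.0, payg_usage - covered_capacity)  # billed at pay-as-you-go rates
    return hourly_commitment + overage


for usage in (4.0, 8.0, 10.0):
    print(usage, hourly_charge(usage, hourly_commitment=6.0, plan_discount=0.25))
```

With these numbers the commitment is fully used at 8.0 of pay-as-you-go usage. Below that, part of the commitment is wasted; above it, the extra usage gets no discount.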

Common Hidden Azure Cost Drivers

Outbound bandwidth and inter-region traffic
Idle managed disks left after VM deletion
Snapshots and backup retention
Diagnostic log ingestion and long retention settings
NAT Gateway and gateway-related networking charges
Overprovisioned premium tiers
Unused public IP addresses or other orphaned resources

6. Storage and Networking Cost Choices

Storage and networking often create surprise charges because teams focus on compute first.

For storage, cost is influenced by three separate ideas that beginners often mix up:

Access tier – hot, cool, archive; affects cost based on how often data is accessed
Performance tier – standard versus premium; affects latency and throughput
Redundancy option – such as LRS, ZRS, GRS, or GZRS; affects durability and sometimes resilience characteristics

These are not the same thing. A higher redundancy setting usually relates more to durability and resilience of stored data, while performance tier affects speed, and access tier affects storage economics based on retrieval pattern.

A practical example:

Active application files that are read frequently may belong in hot storage
Backup data accessed occasionally may fit cool storage
Long-term archive data may use archive tier if retrieval delays are acceptable
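
The access-tier trade-off comes down to cheaper storage versus more expensive reads. The sketch below compares two tiers using invented per-GiB prices; the `TIERS` table and its numbers are placeholders for illustration and do not reflect real Azure pricing or every charge that applies (for example, transactions or early-deletion fees).

```python
# Hypothetical prices per GiB per month (storage) and per GiB read (retrieval).
TIERS = {
    "hot": {"storage": 0.020, "retrieval": 0.000},
    "cool": {"storage": 0.010, "retrieval": 0.010},
}


def monthly_storage_cost(tier, stored_gib, read_gib):
    """Storage plus retrieval cost for one month in the given tier."""
    prices = TIERS[tier]
    return stored_gib * prices["storage"] + read_gib * prices["retrieval"]


def cheaper_tier(stored_gib, read_gib):
    """Return the tier name with the lowest monthly cost for this access pattern."""
    return min(TIERS, key=lambda tier: monthly_storage_cost(tier, stored_gib, read_gib))


print(cheaper_tier(stored_gib=1000, read_gib=100))   # rarely read backup data
print(cheaper_tier(stored_gib=1000, read_gib=5000))  # frequently read application files
```

The same amount of stored data lands in different tiers depending only on how often it is read, which is why access pattern matters as much as capacity.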

For networking, the most common cost driver is egress, meaning data leaving Azure. Other networking charges can come from VPN gateways, ExpressRoute, load balancer SKUs, NAT Gateway, public IP usage, and inter-region replication traffic. A design that moves large amounts of data between regions can cost much more than one kept local to a single region.

7. Introduction to Azure Service Level Agreements

A Service Level Agreement, or SLA, is a financially backed commitment from Microsoft about service availability over a defined period. If the service does not meet the documented SLA terms, the remedy is typically a service credit, not compensation for business losses.

An SLA is not the same as backup, disaster recovery, or fault tolerance. It is also not a promise of zero downtime. It applies only when the service is used according to Microsoft’s SLA terms and required configuration.

Also remember that not all services have the same SLA model. Some free services, preview services, or specific feature tiers may have no SLA.

Availability vs Durability vs Backup vs Disaster Recovery

Availability = whether the service is accessible and running
Durability = likelihood that data remains intact over time
Backup = point-in-time recovery of data
Disaster recovery = restoring service after a major outage, such as regional failure

A storage service can be highly durable for data without automatically giving your full application high availability. That distinction matters.

SLA Percentage to Downtime

SLA	Approximate downtime per month	Approximate downtime per year
99%	About 7 hours 18 minutes	About 3 days 15 hours
99.9%	About 43 minutes	About 8 hours 45 minutes
99.95%	About 21 minutes	About 4 hours 23 minutes
99.99%	About 4 minutes 23 seconds	About 52 minutes

Higher percentages usually require more redundant design and therefore more cost.
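
The figures in the table follow directly from the percentages. A minimal sketch of the conversion, assuming an average month of 730 hours and a year of 8,760 hours:

```python
HOURS_PER_MONTH = 730    # 8,760 hours / 12 months
HOURS_PER_YEAR = 8760    # 365 days


def allowed_downtime_minutes(sla_percent, period_hours):
    """Maximum downtime, in minutes, that still meets the SLA over the period."""
    return (1 - sla_percent / 100) * period_hours * 60


for sla in (99.0, 99.9, 99.95, 99.99):
    monthly = allowed_downtime_minutes(sla, HOURS_PER_MONTH)
    yearly = allowed_downtime_minutes(sla, HOURS_PER_YEAR)
    print(f"{sla}%: {monthly:.1f} min/month, {yearly / 60:.2f} h/year")
```

Each extra "nine" cuts the allowed downtime by a factor of ten, which is why moving from 99.9% to 99.99% typically requires a noticeably more redundant design.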

8. Composite SLA and Multi-Service Applications

Composite SLA is the end-to-end availability of a solution that depends on multiple services. For AZ-900, it is typically calculated by multiplying the SLAs of all required services. That calculation treats the services as serial dependencies, where every one must be up, and assumes their failures are independent.

Example with two required services:

Service A = 99.9% = 0.999
Service B = 99.95% = 0.9995

Composite SLA:

0.999 × 0.9995 = 0.9985005 = about 99.85%

Example with a simple three-tier application:

Web tier = 99.95% = 0.9995
App tier = 99.95% = 0.9995
Database = 99.99% = 0.9999

Composite SLA:

0.9995 × 0.9995 × 0.9999 ≈ 0.9989 = about 99.89%

The key lesson is that when every service is required, the overall application availability is lower than the SLA of any individual component.
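
The two worked examples above can be reproduced with a few lines of code. This sketch assumes serial dependencies and independent failures, exactly as in the exam-style calculation.

```python
from math import prod


def composite_sla(sla_percents):
    """Composite availability (in percent) of services that are all required."""
    return prod(p / 100 for p in sla_percents) * 100


two_services = composite_sla([99.9, 99.95])
three_tier = composite_sla([99.95, 99.95, 99.99])
print(f"{two_services:.2f}%")
print(f"{three_tier:.2f}%")
```

Adding another required service to the list can only lower the result, since every factor is below 1.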

In real architectures, redundancy can improve effective availability. For example, if a front-end tier has multiple healthy instances behind a load balancer, the design may tolerate one instance failing. That is why real-world availability is about architecture, not just multiplying published numbers.
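
The effect of redundancy can be sketched with the standard parallel-availability formula, 1 - (1 - a)^n. This is a theoretical illustration that assumes instance failures are independent and that failover is perfect. It is not a published Azure SLA, and real systems share dependencies that make the result optimistic.

```python
def redundant_availability(instance_percent, instances):
    """Availability (in percent) when any one of N identical instances is enough."""
    if instances < 1:
        raise ValueError("at least one instance is required")
    failure_probability = 1 - instance_percent / 100
    # The tier is down only if every instance is down at the same time.
    return (1 - failure_probability ** instances) * 100


print(f"{redundant_availability(99.9, 1):.4f}%")
print(f"{redundant_availability(99.9, 2):.4f}%")
```

Serial dependencies multiply availabilities and pull the total down, while redundant instances multiply failure probabilities and push it up. Real designs usually combine both.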

9. Azure Features That Influence Availability

Azure offers several design options to improve availability, but they protect against different failure scopes.

Feature	What it helps with	Cost impact	Typical use
Availability Set	Spreads VMs across fault domains and update domains within a datacenter	Moderate	Redundant VM deployment in one datacenter scope
Availability Zone	Protects against datacenter-level failure within a region	Higher	Business-critical regional resilience
Zone-redundant service	Service-managed redundancy across zones	Varies	Managed services with built-in resilience
Multi-region DR	Protects against regional outage	Highest	Disaster recovery and global resilience

Availability Sets help protect multiple VMs from planned maintenance or localized hardware failure within a datacenter. Availability Zones are physically separate locations within a region, each with independent power, cooling, and networking, so they provide stronger resilience against the loss of an entire datacenter.

A single VM can have an SLA under certain conditions, depending on the service-specific SLA terms and configuration. However, higher availability targets generally require redundant design such as multiple VMs in an Availability Set or Availability Zones.

Load balancing is also important. Multiple instances without a load balancer still leave a gap, because nothing distributes traffic or routes it away from a failed instance. Health probes and traffic distribution help the application continue serving users when one instance fails.

Region pairs are a Microsoft resiliency construct used for platform recovery priorities and continuity planning. But region pairs do not automatically make your application multi-region. Customers must still design replication, failover, testing, and recovery procedures.

10. Cost vs Availability Trade-Offs

This is where Azure design becomes practical. Better availability usually means duplicate resources, more networking, more data replication, and more operational complexity. That increases cost. The right design depends on business criticality.

Scenario	Cost profile	Availability profile	Best fit
Single VM, pay-as-you-go	Low	Basic	Labs, dev/test, temporary workloads
Two VMs in Availability Set	Moderate	Improved within one datacenter scope	Basic production apps
Zonal deployment with load balancing	Higher	High regional resilience	Business-critical production
Multi-region active-passive or active-active	Highest	Very high with DR capability	Mission-critical services

A useful decision framework is:

How critical is the workload to the business?
How much downtime can users tolerate?
What recovery time and recovery point expectations exist?
What budget constraints apply?
Are there compliance or regional requirements?

For AZ-900, you do not need deep DR engineering, but you should understand that an SLA is not the same as DR, and that higher availability usually costs more.

11. Troubleshooting Unexpected Azure Spend and Availability Issues

When cost spikes happen, use a repeatable process instead of guessing.

Confirm the scope in Cost Management.
Check forecast versus actual spend.
Group costs by service and resource.
Look for recent deployments or configuration changes in Activity Log.
Check whether autoscaling increased instance count.
Review network egress and inter-region transfer.
Look for snapshots, backups, or logging retention growth.
Use Advisor to identify idle or oversized resources.

Example: if the monthly bill jumps by 35%, the root cause might be a sudden increase in outbound traffic, a premium SKU selected during a deployment, diagnostic logs retained too long, or forgotten test resources left running.
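
A 35% jump like the one in this example is easier to explain once it is broken down by service. The sketch below compares two months of per-service totals; the service names and amounts are invented to add up to that 35% increase.

```python
def cost_changes(previous, current):
    """Return (service, delta) pairs sorted from largest increase to largest decrease."""
    services = set(previous) | set(current)
    deltas = {s: current.get(s, 0) - previous.get(s, 0) for s in services}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


def percent_increase(previous, current):
    """Overall month-over-month change in total spend, as a percentage."""
    before, after = sum(previous.values()), sum(current.values())
    return (after - before) / before * 100


# Hypothetical per-service totals for two consecutive months.
last_month = {"Virtual Machines": 600, "Storage": 250, "Bandwidth": 150}
this_month = {"Virtual Machines": 620, "Storage": 260, "Bandwidth": 420,
              "Log Analytics": 50}
print(percent_increase(last_month, this_month))
print(cost_changes(last_month, this_month))
```

Here most of the increase comes from a single line item, which points the investigation toward outbound traffic rather than compute.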

For availability issues, check whether the architecture included redundancy at all. Many outages are not caused by Azure breaking its SLA, but by a workload being deployed as a single point of failure.

12. Support Plans and Service Lifecycle Concepts

Azure support plans affect how quickly you can get help and what support channels are available. They do not increase the uptime SLA of Azure services. Support plans are about response and guidance, not service availability guarantees.

Support plan names, response targets, and features can change over time, so current details should always be verified in Microsoft's official documentation. For AZ-900, the exam-safe point is simple: production environments often justify stronger support coverage than a lab or dev/test subscription.

Service lifecycle also matters:

Generally Available (GA) services are intended for production use and typically have normal support and SLA expectations
Preview services or features may have limited support and may not have production SLAs

Preview does not automatically mean “unsafe,” but it does mean you should verify service-specific terms before relying on it for critical workloads.

13. AZ-900 Exam Tips, Scenarios, and Common Pitfalls

AZ-900 usually tests whether you can distinguish similar concepts, not whether you can design a full enterprise platform.

Exam Trap Summary

Pricing Calculator estimates future cost; Cost Management + Billing shows actual spend
TCO Calculator compares on-premises with Azure; it does not show your Azure invoice
Tags organize and report cost; Policy enforces rules; budgets alert on thresholds
Budgets do not automatically stop resources by default
SLA is a promise; high availability is a design; DR is recovery; backup is data protection
Support plan does not increase service SLA
Preview does not offer the same production assurances as GA
Single resource does not automatically mean high availability
Stopping a VM inside the OS is not the same as deallocating it in Azure

Mini Exam Scenarios

Scenario 1: A company wants to estimate the monthly cost of a new Azure deployment before creating resources. The correct tool is the Azure Pricing Calculator.

Scenario 2: A finance team wants to compare three years of on-premises server costs against moving to Azure. The correct tool is the Azure TCO Calculator.

Scenario 3: A subscription exceeded 90% of its monthly spending threshold and sent an email alert. That is a budget alert, not an automatic shutdown.

Scenario 4: A workload requires better uptime than a single VM can provide. The likely improvement is redundant instances with load balancing, possibly using Availability Sets or Availability Zones depending on the requirement.

What Microsoft Expects You to Know for AZ-900

Know the purpose of pricing, TCO, cost management, and advisor tools
Know the difference between tags, policy, RBAC, and budgets
Know what an SLA means and what it does not mean
Know that architecture affects availability and cost
Do not over-focus on memorizing every SKU or exact support-plan detail

14. Conclusion

Azure cost management is both planning and operations. You estimate before deployment, monitor after deployment, investigate changes, and optimize continuously. SLAs tell you what availability Microsoft commits to under documented conditions, but they do not remove the need for sound architecture, backup, or disaster recovery planning.

The best Azure decisions balance cost, governance, performance, and business criticality. A low-cost design may be perfect for dev/test. A production system may justify zones, load balancing, commitment discounts, and stronger governance. For AZ-900, the most valuable skill is understanding the differences between the tools, pricing models, and availability concepts so you can choose the right answer deliberately rather than by guesswork.