Azure Virtual Machines (VMs): Core IaaS in Microsoft Azure — The Complete, Practical Guide
A definitive, black-and-white handbook to plan, deploy, secure, monitor, and troubleshoot Azure Virtual Machines (VMs) at scale.
1) Introduction to Azure Virtual Machines
Azure Virtual Machines (VMs) are the core Infrastructure-as-a-Service (IaaS) compute offering in Microsoft Azure. They let you run Windows or Linux workloads on demand, without buying or maintaining physical servers. You provision CPU, memory, and storage capacity in minutes, attach network interfaces, assign public or private IPs, and deploy applications just like on-premises—except you inherit cloud elasticity, global reach, and integrated security/operations tooling.
VMs are ideal when you need full OS-level control, custom images, low-level tuning, or software that isn’t yet cloud-native. They integrate tightly with Entra ID, Azure Backup, Azure Site Recovery, Azure Policy, Microsoft Defender for Cloud, Azure Arc, Load Balancer, and Application Gateway.
When to choose VMs over PaaS or Serverless
- Applications require specific OS versions, kernel modules, or agent-based tools.
- Lift-and-shift of legacy apps with minimal refactoring.
- Consistent performance with reserved capacity and fine-grained tuning.
2) Key Features of Azure VMs
Elastic & On-Demand
Scale up/down or in/out. Pay only for what you use with the pay-as-you-go model.
Availability & Resilience
Availability Zones, Availability Sets, and cross-region DR with ASR.
Images & Acceleration
Custom images, Accelerated Networking, Disk Encryption, and Azure Monitor integration.
- Native integration with Defender for Cloud for endpoint hardening.
- Encryption at rest with platform-managed or Key Vault customer-managed keys.
- Premium/Ultra disks, ephemeral OS disk for fast, stateless fleets.
- Autoscale using Virtual Machine Scale Sets (VMSS).
3) Architecture of Azure Virtual Machines
Azure VMs run on the Azure fabric, orchestrated across compute clusters. Each VM attaches to a Virtual Network (VNet) and subnet, with one or more NICs. Storage resides on Managed Disks (Standard HDD/SSD, Premium SSD, Ultra Disk). Network traffic is shaped by Network Security Groups and optionally by load balancers or gateways.
| Layer | Component | Description |
|---|---|---|
| Compute | VM instance | vCPU, memory, and GPU/HPC options from various VM families. |
| Network | VNet/Subnet/NIC | Private isolation, routing, DNS, public/private IP assignments. |
| Storage | OS/Data disks | Managed Disks (S, P, Ultra), snapshots, encryption, backup. |
| Control | ARM | Declarative resource manager for idempotent deployments. |
| Ops | Monitor/Logs | Metrics, logs, alerts, and insights via Azure Monitor & Log Analytics. |
4) Azure VM Types and Sizes
General Purpose
B, D series — balanced CPU/memory, ideal for app servers, small DBs, and dev/test.
Compute Optimized
F series — higher CPU-to-memory, good for batch processing, web servers, and analytics engines.
Memory Optimized
E, M series — in-memory databases, SAP, big caches, and high-concurrency workloads.
Storage Optimized
L series — high-throughput storage, NoSQL, data warehousing, and large data lakes.
GPU / HPC
NC, ND, HB — AI/ML training, rendering, CFD, EDA, and scientific computing.
Selecting the right size
- Profile your workload (CPU %, RAM, disk IOPS, network pps).
- Choose Premium or Ultra disks for latency-sensitive apps.
- Benchmark representative test loads before committing to reservations.
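The sizing steps above can be sketched as a simple filter over a size catalog. This is a minimal illustration, not an official sizing tool: the `pick_size` helper, the 25% headroom default, and the four-entry catalog are assumptions made for the example, and the vCPU/memory figures should be confirmed against current Azure documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmSize:
    name: str
    vcpus: int
    memory_gib: int


# Small illustrative catalog (assumption): verify specs before relying on them.
CATALOG = [
    VmSize("Standard_D2s_v5", 2, 8),
    VmSize("Standard_F4s_v2", 4, 8),
    VmSize("Standard_D4s_v5", 4, 16),
    VmSize("Standard_E4s_v5", 4, 32),
]


def pick_size(peak_vcpus, peak_memory_gib, headroom=0.25):
    """Return the smallest catalog entry covering peak demand plus headroom, or None."""
    need_cpu = peak_vcpus * (1 + headroom)
    need_mem = peak_memory_gib * (1 + headroom)
    fits = [s for s in CATALOG if s.vcpus >= need_cpu and s.memory_gib >= need_mem]
    # "Smallest" here means fewest vCPUs, then least memory.
    return min(fits, key=lambda s: (s.vcpus, s.memory_gib), default=None)
```

Feeding it profiled peaks (for example, 3 busy vCPUs and 10 GiB of RAM) yields a candidate to benchmark; it does not account for disk IOPS or network limits, which still need checking per size.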
5) Azure VM Deployment Models
- Azure Portal: Guided wizard for one-off or exploratory builds.
- Azure PowerShell & Azure CLI: Repeatable scripts for teams and CI.
- ARM/Bicep: Declarative templates for idempotent environments.
- Terraform: Multi-cloud IaC with state mgmt and policy integration.
- Azure DevOps/GitHub: Pipelines for versioned, gated releases.
# Requires Az PowerShell (Install-Module Az -Scope CurrentUser)
$rg = "rg-prod-web-01"
$loc = "eastus"
New-AzResourceGroup -Name $rg -Location $loc
$cred = Get-Credential -Message "Enter local admin for the VM"
New-AzVm `
-ResourceGroupName $rg `
-Name "vm-web-01" `
-Location $loc `
-VirtualNetworkName "vnet-web-01" `
-SubnetName "sn-web" `
-SecurityGroupName "nsg-web" `
-PublicIpAddressName "pip-web-01" `
-OpenPorts 80,443 `
-Image "Win2022Datacenter" `
-Size "Standard_D4s_v5" `
-Credential $cred
# Requires: az login
rg=rg-app-01
loc=eastus
az group create -n $rg -l $loc
az vm create \
-g $rg -n vm-api-01 \
--image Ubuntu2204 \
--size Standard_D4s_v5 \
--vnet-name vnet-app-01 --subnet sn-app \
--public-ip-address "" \
--nsg "" \
--generate-ssh-keys
param location string = 'eastus'
param rgName string = 'rg-bicep-demo'
param vmName string = 'vm-bicep-01'
// Assumption: the SSH public key is supplied at deployment time; password logins are disabled below.
param sshPublicKey string
resource nic 'Microsoft.Network/networkInterfaces@2023-09-01' = {
  name: '${vmName}-nic'
  location: location
  properties: {
    ipConfigurations: [
      {
        name: 'ipconfig1'
        properties: {
          subnet: { id: '/subscriptions//resourceGroups/${rgName}/providers/Microsoft.Network/virtualNetworks/vnet-app-01/subnets/sn-app' }
          privateIPAllocationMethod: 'Dynamic'
        }
      }
    ]
  }
}
resource vm 'Microsoft.Compute/virtualMachines@2023-09-01' = {
  name: vmName
  location: location
  identity: { type: 'SystemAssigned' }
  properties: {
    hardwareProfile: { vmSize: 'Standard_D4s_v5' }
    storageProfile: {
      imageReference: {
        publisher: 'Canonical'
        offer: '0001-com-ubuntu-server-jammy'
        sku: '22_04-lts'
        version: 'latest'
      }
      osDisk: { createOption: 'FromImage' }
    }
    osProfile: {
      computerName: vmName
      adminUsername: 'azureuser'
      linuxConfiguration: {
        disablePasswordAuthentication: true
        // With passwords disabled, the VM needs at least one SSH public key.
        ssh: {
          publicKeys: [
            {
              path: '/home/azureuser/.ssh/authorized_keys'
              keyData: sshPublicKey
            }
          ]
        }
      }
    }
    networkProfile: { networkInterfaces: [ { id: nic.id } ] }
  }
}
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
  }
}
# azurerm 4.x needs a subscription ID: set ARM_SUBSCRIPTION_ID or add subscription_id here.
provider "azurerm" {
  features {}
}
resource "azurerm_network_interface" "nic" {
  name                = "vm-tf-01-nic"
  location            = "eastus"
  resource_group_name = "rg-tf-demo"
  ip_configuration {
    name                          = "ipconfig1"
    subnet_id                     = "/subscriptions//resourceGroups/rg-tf-demo/providers/Microsoft.Network/virtualNetworks/vnet-app-01/subnets/sn-app"
    private_ip_address_allocation = "Dynamic"
  }
}
resource "azurerm_linux_virtual_machine" "vm" {
  name                  = "vm-tf-01"
  resource_group_name   = "rg-tf-demo"
  location              = "eastus"
  size                  = "Standard_D4s_v5"
  admin_username        = "azureuser"
  network_interface_ids = [azurerm_network_interface.nic.id]
  admin_ssh_key {
    username   = "azureuser"
    # file() does not expand "~", so wrap the path in pathexpand().
    public_key = file(pathexpand("~/.ssh/id_rsa.pub"))
  }
  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "Premium_LRS"
  }
  source_image_reference {
    publisher = "Canonical"
    offer     = "0001-com-ubuntu-server-jammy"
    sku       = "22_04-lts"
    version   = "latest"
  }
}
6) Networking in Azure Virtual Machines
Master VNets, NSGs, and IPs before scaling. Use Private DNS for internal names, Standard Public IP for internet endpoints, and Load Balancer or Application Gateway for traffic distribution.
- VNet Peering: Low-latency mesh between VNets.
- VPN Gateway: IPsec tunnels across sites.
- ExpressRoute: Private circuit to Azure for predictable throughput.
az network nsg rule create \
-g rg-net-01 --nsg-name nsg-web \
-n allow-https --priority 100 \
--direction Inbound --access Allow --protocol Tcp \
--source-address-prefixes Internet \
--destination-port-ranges 443
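NSG rules are evaluated in priority order, lowest number first, and processing stops at the first match. The sketch below models only that ordering; the `Rule` and `evaluate` names are hypothetical, and source/destination address matching is deliberately left out to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int        # lower number = evaluated first
    access: str          # "Allow" or "Deny"
    protocol: str        # "Tcp", "Udp", or "*"
    ports: frozenset     # destination ports; an empty set means any port


def evaluate(rules, protocol, port):
    """Return (access, rule name) for the first rule that matches the packet."""
    for rule in sorted(rules, key=lambda r: r.priority):
        proto_ok = rule.protocol in ("*", protocol)
        port_ok = not rule.ports or port in rule.ports
        if proto_ok and port_ok:
            return rule.access, rule.name
    return "Deny", "no-match"


# The custom rule from the CLI example plus the built-in catch-all inbound deny.
INBOUND = [
    Rule("allow-https", 100, "Allow", "Tcp", frozenset({443})),
    Rule("DenyAllInBound", 65500, "Deny", "*", frozenset()),
]
```

With this rule set, `evaluate(INBOUND, "Tcp", 443)` is allowed by `allow-https`, while a request to port 3389 falls through to the default deny, which is why a missing allow rule shows up as a silent connection timeout.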
7) Azure VM Storage Architecture
- Standard HDD/SSD: Cost-effective dev/test, lower IOPS.
- Premium SSD: Production-grade, predictable IOPS/latency.
- Ultra Disk: Highest IOPS/throughput with configurable performance.
- Ephemeral OS: Local SSD backed; faster scale but non-persistent.
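# Azure Disk Encryption needs a Key Vault enabled for disk encryption; "kv-prod-ade" is a hypothetical name
$vm = Get-AzVM -Name "vm-web-01" -ResourceGroupName "rg-prod-web-01"
$kv = Get-AzKeyVault -VaultName "kv-prod-ade" -ResourceGroupName "rg-kv"
Set-AzVMDiskEncryptionExtension -ResourceGroupName $vm.ResourceGroupName -VMName $vm.Name `
-DiskEncryptionKeyVaultUrl $kv.VaultUri -DiskEncryptionKeyVaultId $kv.ResourceId `
-VolumeType All -Force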
az vm disk attach -g rg-app-01 --vm-name vm-api-01 --name data01 --new --size-gb 256 --sku Premium_LRS
8) VM Availability and Scalability
Achieve resiliency with Availability Sets (fault/update domains) and Availability Zones (physically separate datacenters). Use VM Scale Sets for autoscaling fleets based on CPU, custom metrics, or schedules. Review Azure’s SLA commitments per deployment pattern.
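# Windows images take a password rather than SSH keys; ADMIN_PASSWORD is an assumed environment variable
az vmss create -g rg-web-ss --name vmss-web --image Win2022Datacenter --upgrade-policy-mode automatic \
--instance-count 2 --vm-sku Standard_D2s_v5 --admin-username adminuser --admin-password "$ADMIN_PASSWORD"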
# Autoscale rule: scale out if average CPU > 70% for 10 mins, scale in if < 30%
az monitor autoscale create -g rg-web-ss --resource vmss-web --resource-type Microsoft.Compute/virtualMachineScaleSets \
--name vmss-web-autoscale --min-count 2 --max-count 10 --count 2
az monitor autoscale rule create -g rg-web-ss --autoscale-name vmss-web-autoscale \
--condition "Percentage CPU > 70 avg 10m" --scale out 1
az monitor autoscale rule create -g rg-web-ss --autoscale-name vmss-web-autoscale \
--condition "Percentage CPU < 30 avg 10m" --scale in 1
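The two rules above amount to a threshold check followed by clamping to the profile's minimum and maximum. The following is a simplified model of that decision, assuming one evaluation per 10-minute window; the function name is hypothetical, and real autoscale also applies cooldowns and flapping protection that are not modeled here.

```python
def next_instance_count(current, avg_cpu, min_count=2, max_count=10,
                        out_threshold=70.0, in_threshold=30.0, step=1):
    """Return the instance count after one evaluation of the scale-out/scale-in rules."""
    if avg_cpu > out_threshold:
        desired = current + step      # scale out by one instance
    elif avg_cpu < in_threshold:
        desired = current - step      # scale in by one instance
    else:
        desired = current             # inside the 30-70% band: no change
    # Never leave the min/max bounds set on the autoscale profile.
    return max(min_count, min(max_count, desired))
```

Stepping through a few load levels by hand this way is a cheap check that the thresholds leave a wide enough band between scale-out and scale-in to avoid oscillation.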
9) Azure VM Security Best Practices
Protect Data
- Azure Disk Encryption / BitLocker for Windows, DM-Crypt for Linux.
- Keys in Key Vault with RBAC and purge protection.
Harden Access
- RBAC — least privilege for owners/operators.
- JIT VM Access via Defender for Cloud.
- Disable password logins; use SSH keys and Serial Console when needed.
# Example: Enable JIT on a single VM (requires Defender for Cloud and the Az.Security module)
$vm = Get-AzVM -Name "vm-web-01" -ResourceGroupName "rg-prod-web-01"
$rule = @{
id = $vm.Id
ports = @(@{ number = 22; protocol = "*"; allowedSourceAddressPrefix = @("1.2.3.4/32"); maxRequestAccessDuration = "PT3H" })
}
Set-AzJitNetworkAccessPolicy -Kind "Basic" -Location $vm.Location -Name "default" `
-ResourceGroupName $vm.ResourceGroupName -VirtualMachine @($rule)
# Alternatively, call the Defender for Cloud REST API (jitNetworkAccessPolicies).
az disk update -g rg-prod-web-01 -n vm-web-01_OsDisk_1 --encryption-type EncryptionAtRestWithCustomerKey \
--disk-encryption-set "/subscriptions//resourceGroups/rg-kv/providers/Microsoft.Compute/diskEncryptionSets/des-vm"
10) Monitoring and Management Tools
Consolidate metrics and logs in Azure Monitor and Log Analytics. Use Azure Update Manager (the successor to Azure Automation Update Management) for patching, Azure Automation for runbooks, and Azure Policy to prevent drift.
// Log Analytics: InsightsMetrics (VMInsights) or AzureMetrics
AzureMetrics
// Resource holds the resource name, so filter on the resource ID to select VMs
| where ResourceProvider == "MICROSOFT.COMPUTE" and ResourceId contains "/virtualmachines/"
| where MetricName == "Percentage CPU"
| summarize AvgCPU=avg(Average), P95CPU=percentile(Average,95) by ResourceId
| where P95CPU > 80
# Requires Update Management / Azure Automation Update resources
Get-AzAutomationSoftwareUpdateMachineRun `
-ResourceGroupName "rg-ops" -AutomationAccountName "aa-ops" `
| Where-Object {$_.MissingCriticalUpdates -gt 0} `
| Select-Object ComputerName, MissingCriticalUpdates, LastScanTime
11) Integration with Azure Services
- Azure Backup for point-in-time restore.
- Azure Site Recovery (ASR) for DR and region failover.
- Azure Arc to bring on-prem/other clouds under Azure control.
- Entra ID for identity, Managed Identity for secretless auth.
- Application Gateway, WAF, Front Door for web traffic & security.
12) Use Cases of Azure Virtual Machines
- Hosting web apps and databases with secure perimeter and stable performance.
- Running legacy apps and line-of-business services not yet containerized.
- Dev/Test environments with ephemeral OS disks for fast cycles.
- SAP/Oracle vertical workloads requiring tuned memory and storage.
- HPC with HB/HC and GPU families for AI, ML, and simulations.
13) Azure VM Backup and Disaster Recovery
Configure Azure Backup for daily/weekly retention; layer snapshots for rapid rollbacks; and protect critical tiers with ASR. Use GZRS storage for higher durability and Cross-Zonal restore where supported.
az backup protection enable-for-vm \
--resource-group rg-prod-web-01 \
--vault-name vault-prod-backup \
--vm vm-web-01 \
--policy-name "Daily-30D-Weekly-12W"
$disk = Get-AzDisk -ResourceGroupName "rg-prod-web-01" -DiskName "vm-web-01_OsDisk_1"
$snapConfig = New-AzSnapshotConfig -SourceUri $disk.Id -Location $disk.Location -CreateOption Copy
New-AzSnapshot -ResourceGroupName "rg-prod-web-01" -SnapshotName "vm-web-01-os-snap-$(Get-Date -f yyyyMMddHHmm)" -Snapshot $snapConfig
14) Cost Optimization and Pricing Models
- Azure Reservations (1/3 years) for steady-state savings.
- Spot VMs for interruptible workloads at deep discounts.
- Azure Hybrid Benefit to reuse Windows Server/SQL licenses.
- Azure Cost Management + Billing to analyze cost drivers, budgets, alerts.
- Use the Azure Pricing Calculator to model options before rollout.
# Example: stop a VM outside business hours
az vm deallocate -g rg-app-01 -n vm-api-01
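To gauge what an off-hours schedule is worth, compare running hours against the full week. This is a back-of-the-envelope sketch: the function names and the example rate are assumptions, and it covers compute charges only, since managed disks and static public IPs continue to bill while a VM is deallocated.

```python
HOURS_PER_WEEK = 24 * 7


def weekly_compute_cost(hourly_rate, hours_per_day, days_per_week):
    """Compute-only cost for a VM that runs on a fixed weekly schedule."""
    return hourly_rate * hours_per_day * days_per_week


def savings_fraction(hours_per_day, days_per_week):
    """Fraction of always-on compute cost avoided by deallocating off-schedule."""
    running_hours = hours_per_day * days_per_week
    return 1 - running_hours / HOURS_PER_WEEK


# Hypothetical pay-as-you-go rate; look up the real one in the Azure Pricing Calculator.
EXAMPLE_RATE = 0.20
```

For a VM that runs 10 hours a day on weekdays, `savings_fraction(10, 5)` works out to 118/168, so roughly 70% of the always-on compute charge is avoided.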
15) Common Troubleshooting Scenarios (with Scripts)
15.1 VM won’t start / boot issues
- Check recent Activity Logs & Boot Diagnostics.
- Use Serial Console for kernel/boot logs.
- Detach recently added extensions or data disks if suspected.
Get-AzVM -Name "vm-web-01" -ResourceGroupName "rg-prod-web-01" -Status |
Select-Object Name, ResourceGroupName, @{n='PowerState';e={$_.Statuses[-1].DisplayStatus}}
az vm redeploy -g rg-prod-web-01 -n vm-web-01
SNAP_ID="/subscriptions//resourceGroups/rg-prod-web-01/providers/Microsoft.Compute/snapshots/vm-web-01-os-snap-20251030"
az disk create -g rg-prod-web-01 -n vm-web-01-os-from-snap --source "$SNAP_ID"
az vm update -g rg-prod-web-01 -n vm-web-01 --os-disk "/subscriptions/ /.../disks/vm-web-01-os-from-snap"
15.2 RDP/SSH connection failures
- Confirm NSG rules (inbound 3389/22) and effective security rules.
- Validate local firewall on the VM and route tables/UDRs.
- Use Just-in-Time access or Azure Bastion.
NICID=$(az vm show -g rg-prod-web-01 -n vm-web-01 --query "networkProfile.networkInterfaces[0].id" -o tsv)
az network nic list-effective-nsg --ids $NICID
# Conceptual example: use the Defender for Cloud JIT API for production
# Invoke-RestMethod returns the response body directly, so no .Content property is needed
$myIp = (Invoke-RestMethod -Uri "https://api.ipify.org") + "/32"
# Request 3-hour RDP window
# POST to the jitNetworkAccessPolicies "initiate" action with the allowed source address prefix set to $myIp
15.3 Disk performance issues
- Check disk caching modes (ReadOnly for data disks hosting database data files, None for disks hosting DB logs, ReadWrite for OS).
- Move from Standard to Premium or Ultra; stripe data disks if needed.
$vm = Get-AzVM -Name "vm-db-01" -ResourceGroupName "rg-db-01"
$vm.StorageProfile.DataDisks[0].Caching = "ReadOnly"
Update-AzVM -VM $vm -ResourceGroupName $vm.ResourceGroupName
15.4 Network latency / IP conflicts
- Verify route tables (UDR) and effective routes.
- Ensure no duplicate static IPs inside the subnet range.
az network nic show-effective-route-table --ids $NICID
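A planned set of static addresses can be checked for conflicts before it is applied. The sketch below is illustrative only: `check_static_ips` and the input dictionary are hypothetical, and the data would in practice come from your own inventory or from addresses configured inside guest operating systems. It relies on the fact that Azure reserves the first four addresses and the last address of every subnet.

```python
import ipaddress
from collections import Counter


def check_static_ips(subnet_cidr, assignments):
    """Map NIC name -> problem for each static IP that is unusable or duplicated.

    assignments: dict of NIC name -> static private IP (string).
    """
    net = ipaddress.ip_network(subnet_cidr)
    # Azure reserves the first four addresses and the last address in each subnet.
    reserved = {net.network_address + i for i in range(4)} | {net.broadcast_address}
    counts = Counter(assignments.values())
    problems = {}
    for nic, ip_text in assignments.items():
        ip = ipaddress.ip_address(ip_text)
        if ip not in net:
            problems[nic] = "outside subnet"
        elif ip in reserved:
            problems[nic] = "reserved by Azure"
        elif counts[ip_text] > 1:
            problems[nic] = "duplicate"
    return problems
```

An empty result means the plan has no obvious conflicts; it does not prove that addresses are free, since other NICs already in the subnet are not consulted.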
15.5 VM extension failures
- Review /var/log/waagent.log (Linux) or C:\WindowsAzure\Logs\WaAppAgent.log (Windows).
- Remove/redeploy the failing extension.
az vm extension delete -g rg-prod-web-01 --vm-name vm-web-01 --name CustomScriptExtension
Azure Resource Graph (ARG) — Fleet Troubleshooting
Resources
| where type =~ 'microsoft.compute/virtualmachines'
| where properties.provisioningState =~ 'Failed'
| project name, resourceGroup, location, provisioningState = tostring(properties.provisioningState)
Resources
| where type =~ 'microsoft.compute/virtualmachines'
| extend powerState = tostring(properties.extended.instanceView.powerState.displayStatus)
| summarize count() by powerState
Microsoft Graph (identity-adjacent checks)
While VM control lives in Azure Resource Manager, you can use Microsoft Graph to validate managed identities or app role assignments used by services running on VMs.
GET https://graph.microsoft.com/v1.0/servicePrincipals?$filter=appDisplayName eq 'MyVMApp'
Authorization: Bearer
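Connect-MgGraph -Scopes "Application.Read.All","AppRoleAssignment.Read.All"
# Pass the service principal's object ID explicitly; the cmdlet expects -ServicePrincipalId
Get-MgServicePrincipal -Filter "appDisplayName eq 'MyVMApp'" |
ForEach-Object { Get-MgServicePrincipalAppRoleAssignedTo -ServicePrincipalId $_.Id } |
Select-Object PrincipalDisplayName, ResourceDisplayName, AppRoleId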
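Use cases: confirming that a VM’s system-assigned identity holds the app roles it needs on Entra-protected APIs. Access to Key Vault or Storage is granted through Azure RBAC role assignments, which you verify in Azure Resource Manager rather than in Graph.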
16) Performance Tuning and Optimization
- Right-size VMs; avoid sustained CPU saturation or constant throttling.
- Enable Accelerated Networking for high packet rates.
- Choose Premium/Ultra disks and align caching modes.
- Use Azure Advisor recommendations regularly.
az network nic update --ids $NICID --accelerated-networking true
# Example: Pull CPU metric over the last 30 minutes for a VM
$subId=""; $rg="rg-prod-web-01"; $vm="vm-web-01"
# URL-encode the metric name because it contains a space
$metric=[uri]::EscapeDataString("Percentage CPU")
$uri = "https://management.azure.com/subscriptions/$subId/resourceGroups/$rg/providers/Microsoft.Compute/virtualMachines/$vm/providers/microsoft.insights/metrics?metricnames=$metric&timespan=PT30M&api-version=2018-01-01"
# Acquire a Microsoft Entra (AAD) token, then Invoke-RestMethod -Uri $uri -Headers @{Authorization="Bearer $token"}
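Building the metrics request URL by hand is error-prone because the metric name contains a space and the query has several parameters. The sketch below assembles the same endpoint with proper encoding, assuming the query parameters are `metricnames`, `timespan`, and `api-version`; the `metrics_url` helper is hypothetical, and token acquisition and the HTTP call are intentionally left out.

```python
from urllib.parse import quote, urlencode


def metrics_url(sub_id, rg, vm, metric="Percentage CPU", timespan="PT30M",
                api_version="2018-01-01"):
    """Build the Azure Monitor metrics URL for one VM with an encoded query string."""
    resource = (f"/subscriptions/{sub_id}/resourceGroups/{rg}"
                f"/providers/Microsoft.Compute/virtualMachines/{vm}")
    # quote_via=quote encodes the space in the metric name as %20 rather than "+".
    query = urlencode(
        {"metricnames": metric, "timespan": timespan, "api-version": api_version},
        quote_via=quote,
    )
    return (f"https://management.azure.com{resource}"
            f"/providers/microsoft.insights/metrics?{query}")
```

Calling `metrics_url("<subscription-id>", "rg-prod-web-01", "vm-web-01")` returns a string to pass to any HTTP client along with an `Authorization: Bearer` header.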
17) Security Compliance and Governance
Map workloads to compliance baselines (ISO, SOC, HIPAA, GDPR). Use Azure Policy to enforce standards (e.g., require tags, restrict regions, enforce disk encryption), and Blueprints/Landing Zones for consistent foundations.
{
"properties": {
"displayName": "Require costCenter tag on VMs",
"policyRule": {
"if": {
"allOf": [
{ "field": "type", "equals": "Microsoft.Compute/virtualMachines" },
{ "field": "tags['costCenter']", "exists": "false" }
]
},
"then": { "effect": "deny" }
}
}
}
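The `allOf` condition in the policy above denies a request only when every clause holds: the resource is a VM and the `costCenter` tag is absent. A rough local model of that logic follows, useful for reasoning about which resources a rule would catch; the `evaluate_policy` function and the dictionary shape are assumptions for illustration, not how the Azure Policy engine is implemented.

```python
def evaluate_policy(resource):
    """Return "deny" if the costCenter rule would block this resource, else None."""
    # Clause 1: resource type equals Microsoft.Compute/virtualMachines (compared case-insensitively).
    is_vm = resource.get("type", "").lower() == "microsoft.compute/virtualmachines"
    # Clause 2: the costCenter tag does not exist (tag names treated as case-insensitive here).
    tags = resource.get("tags") or {}
    tag_missing = not any(name.lower() == "costcenter" for name in tags)
    # allOf: both clauses must be true for the deny effect to apply.
    return "deny" if is_vm and tag_missing else None
```

Resources of other types pass through untouched, so a storage account without the tag is not affected by this particular rule.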
18) Automation and Infrastructure as Code
Treat infrastructure like code with Bicep, ARM, and Terraform. Use Azure DevOps or GitHub Actions for pipelines, policy checks, and drift remediation.
name: deploy-bicep
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: azure/login@v2
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}
      - run: az account show
      - run: az deployment group validate -g rg-infra -f main.bicep
      - run: az deployment group create -g rg-infra -f main.bicep
19) Comparison with Other Cloud Providers
| Provider | Comparable Service | Notable Strengths | Considerations |
|---|---|---|---|
| Azure | Virtual Machines | Hybrid capabilities (Arc, Stack), Windows/SQL integration, enterprise identity. | Region features vary by SKU. |
| AWS | EC2 | Granular instance catalog, spot ecosystem. | Cross-service identity model differs from Entra ID. |
| Google Cloud | Compute Engine | Live migration maturity, per-second billing. | Windows/SQL licensing paths differ. |
| Oracle | OCI Compute | High bandwidth shapes, DB adjacency. | Ecosystem breadth compared to Azure. |
20) Future Prospects and Innovations
- Confidential Computing (TEE/SEV) for in-use encryption of data.
- AI-assisted operations with anomaly detection and self-healing runbooks.
- Azure Automanage for hands-off baselining and drift control.
- Storage innovations like Project Silica research for long-term archival patterns.
FAQs
What is the fastest way to start a production-ready VM?
Use a Bicep/Terraform module with your organization’s baseline (NSGs, diagnostics, extensions) and parameterize size/zone.
Can I run databases on Azure VMs?
Yes—tune disk layout, caching, and choose Premium/Ultra disks. Consider PaaS DBs when possible for simplified ops.
How do I control costs?
Right-size, shut down dev/test off-hours, use Reservations/Spot, and monitor with budgets/alerts in Cost Management.
How do I secure RDP/SSH?
Use Azure Bastion or JIT, allowlisted IPs, and MFA for jump hosts via Conditional Access.
How do I migrate from on-prem?
Discover with Azure Migrate, plan right-sizing, test cutovers, then use ASR or migration services to move workloads.
Operational Toolkit: PowerShell / CLI / ARG
List unhealthy VMs
Get-AzVM | ForEach-Object {
$s = (Get-AzVM -ResourceGroupName $_.ResourceGroupName -Name $_.Name -Status).Statuses[-1].DisplayStatus
if($s -notmatch "running|succeeded"){ "{0} ({1})" -f $_.Name,$s }
}
Find missing backup
Resources
| where type =~ 'microsoft.compute/virtualmachines'
| extend vmKey = tolower(id)
| join kind=leftouter (
// Backup items are exposed in the RecoveryServicesResources table, not Resources
RecoveryServicesResources
| where type =~ 'microsoft.recoveryservices/vaults/backupfabrics/protectioncontainers/protecteditems'
| project vmId = tolower(tostring(properties.sourceResourceId))
) on $left.vmKey == $right.vmId
| where isempty(vmId)
| project name, resourceGroup, location
Check accelerated networking
az network nic show --ids $NICID --query "enableAcceleratedNetworking"
Identify NSG rules blocking RDP/SSH
(Get-AzEffectiveNetworkSecurityGroup -NetworkInterfaceName "vm-web-01-nic" -ResourceGroupName "rg-prod-web-01").EffectiveSecurityRules |
Where-Object {$_.DestinationPortRange -match "3389|22"} |
Select-Object Name, Access, Direction, Priority, DestinationPortRange
Spot oversized/undersized VMs with Advisor
az advisor recommendation list --category Cost --query "[?contains(shortDescription.problem,'Right-size')]"