Summary: Azure VMs explained: sizing, security, networking, backup/DR, cost optimization, and deep troubleshooting using PowerShell, Azure CLI, Azure Resource Graph, KQL, Bicep, and Terraform.
Azure Virtual Machines (VMs): A Complete, Practical Guide
Azure Virtual Machines (VMs) are Microsoft Azure’s core Infrastructure-as-a-Service (IaaS) capability for running Windows and Linux servers on demand—without managing physical hardware. This guide covers architecture, deployment patterns, security baselines, networking, storage, monitoring, backup/DR, cost controls, hybrid/migration options, and real-world troubleshooting runbooks with ready-to-run scripts.
1) What Are Azure VMs & Why They Matter
With Azure VMs, you provision CPU, memory, disks, and networking in minutes and pay only for what you use. Key advantages include on-demand scalability, global reach, tight integration with services such as Azure Backup, Azure Monitor, and Microsoft Defender for Cloud, and policy-driven governance through Azure Policy.
- On-demand scalability and pay-as-you-go
- Global availability across regions and zones
- Broad OS support: Windows Server & many Linux distributions
- Automation-friendly: Portal, CLI, PowerShell, ARM/Bicep, Terraform
2) Supported Operating Systems
Common choices include Windows Server 2022/2019/2016 and Linux distributions such as Ubuntu, RHEL, SUSE, Debian, and Rocky Linux or AlmaLinux (CentOS Linux has reached end of life). You can also import custom VHDs to reuse on-prem images. For consistent governance, register base images in an Azure Compute Gallery (formerly Shared Image Gallery).
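Marketplace images are identified by a URN of the form `Publisher:Offer:Sku:Version`. `az vm image list --output table` shows it in the Urn column and `az vm create --image` accepts it, while Bicep's `imageReference` and Terraform's `source_image_reference` take the same four values as separate fields. The sketch below splits a URN into those fields so one pinned image can feed every tool; `urn_fields` is a hypothetical helper written for this guide, not an Azure CLI command.

```sh
#!/bin/sh
# urn_fields URN: split an image URN "Publisher:Offer:Sku:Version" into the four
# imageReference fields (hypothetical helper, not part of the Azure CLI).
urn_fields() {
  case $1 in
    *:*:*:*) ;;                                   # needs at least four colon-separated parts
    *) echo "not a URN: $1" >&2; return 1 ;;
  esac
  printf '%s\n' "$1" | {
    IFS=: read -r publisher offer sku version
    printf 'publisher=%s\noffer=%s\nsku=%s\nversion=%s\n' "$publisher" "$offer" "$sku" "$version"
  }
}
```

For the Windows image used in the Bicep and Terraform examples, `urn_fields MicrosoftWindowsServer:WindowsServer:2022-datacenter:latest` prints `publisher=MicrosoftWindowsServer`, `offer=WindowsServer`, `sku=2022-datacenter`, and `version=latest`, one per line. An alias such as `Ubuntu2204` is rejected because it is not a URN.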
3) VM Sizes & Families
- General Purpose (e.g., D-series): balanced CPU/memory; app servers, web, small DBs
- Compute Optimized (F-series): high CPU; batch, build agents, analytics
- Memory Optimized (E/M-series): in-memory DBs, caching tiers
- Storage Optimized (L-series): high IOPS/throughput; data warehouses, log processing
- GPU (N-series) / HPC (H-series): visualization, AI/ML training/inference, simulation
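Size names encode the family and its capabilities: in `Standard_D2s_v5`, `D` is the family, `2` the vCPU count, `s` marks Premium Storage support, and `v5` the generation. A minimal sketch that extracts those parts with `sed -E`, assuming only the simple `Standard_<family><vCPUs><feature letters>[_v<N>]` pattern; constrained-vCPU names such as `Standard_E8-4s_v5` and other variants are deliberately rejected rather than guessed at. `size_parts` is a hypothetical helper.

```sh
#!/bin/sh
# size_parts SIZE: print family, vCPU count, feature letters and generation of a VM size name.
# Only the simple Standard_<Family><vCPUs><features>[_v<N>] pattern is recognised.
size_parts() {
  local out
  out=$(printf '%s\n' "$1" |
    sed -nE 's/^Standard_([A-Z]+)([0-9]+)([a-z]*)(_v([0-9]+))?$/family=\1 vcpus=\2 features=\3 version=\5/p')
  if [ -z "$out" ]; then
    echo "unrecognised size name: $1" >&2
    return 1
  fi
  echo "$out"
}
```

`size_parts Standard_D2s_v5` prints `family=D vcpus=2 features=s version=5`, and `size_parts Standard_M128ms` prints `family=M vcpus=128 features=ms version=` because that name carries no generation suffix.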
4) Azure VM Building Blocks
- Compute: the VM size (vCPU/RAM)
- Storage: OS disk + data disks on Managed Disks
- Networking: NICs, private IPs, optional public IP
- Security: NSGs, JIT, disk encryption via Key Vault
5) Security Essentials
- Network Security Groups (NSGs) to control inbound/outbound flows
- Just-in-Time (JIT) VM access to time-box RDP/SSH exposure
- Defender for Cloud recommendations and threat detection
- Disk Encryption with Key Vault
- Managed Identity to eliminate local secrets
6) Storage Choices
- Standard HDD / Standard SSD: dev/test, low-cost
- Premium SSD: prod workloads needing predictable IOPS
- Premium SSD v2 / Ultra Disk: high IOPS/low latency, tunable performance
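Premium SSD (v1) performance is tied to the provisioned size: a disk is billed at, and performs as, the tier its size rounds up to. Premium SSD v2 and Ultra Disk differ in that IOPS and throughput are set independently of capacity. As a sketch, the lookup below maps a requested size to a Premium SSD tier using published baseline figures for P1 to P50; treat the table as a point-in-time copy to verify against current Azure documentation, since it ignores bursting and omits P60 to P80. `premium_tier` is a hypothetical helper.

```sh
#!/bin/sh
# premium_tier SIZE_GIB: print the Premium SSD (v1) tier that SIZE_GIB rounds up to, with the
# tier's baseline per-disk IOPS and MB/s. Point-in-time copy of published figures: verify before use.
premium_tier() {
  local want=$1 row
  for row in "P1 4 120 25" "P2 8 120 25" "P3 16 120 25" "P4 32 120 25" "P6 64 240 50" \
      "P10 128 500 100" "P15 256 1100 125" "P20 512 2300 150" "P30 1024 5000 200" \
      "P40 2048 7500 250" "P50 4096 7500 250"; do
    set -- $row                          # $1=tier $2=max GiB $3=IOPS $4=MB/s
    if [ "$want" -le "$2" ]; then
      echo "$1 ($2 GiB): $3 IOPS, $4 MB/s"
      return 0
    fi
  done
  echo "larger than P50 (4096 GiB); see the docs for P60-P80" >&2
  return 1
}
```

With `--storage-sku Premium_LRS` and `--os-disk-size-gb 64` as in the CLI quick start, `premium_tier 64` reports `P6 (64 GiB): 240 IOPS, 50 MB/s`; a 100 GiB disk lands in P10.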
7) Availability & Scale
- Availability Sets: isolate across fault/update domains within a datacenter
- Availability Zones: spread VMs across physically separate datacenters (zones) within a region for zone-level resilience
- Scale Sets: automated scale-out/scale-in across identical instances
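Scale set autoscale works by evaluating rules: a metric such as average CPU is aggregated over a time window and compared with a threshold, and the matching rule changes the instance count without leaving the profile's minimum and maximum. The sketch below models one evaluation with a step of one instance. The function name and the example thresholds are hypothetical, and real autoscale also applies cooldown periods and can combine several rules.

```sh
#!/bin/sh
# next_capacity CURRENT AVG_CPU OUT_AT IN_AT MIN MAX: instance count after one rule evaluation.
# Simplified model: step of one instance, no cooldown, a single CPU-based rule pair.
next_capacity() {
  local cur=$1 cpu=$2 out_at=$3 in_at=$4 min=$5 max=$6 next=$1
  if [ "$cpu" -gt "$out_at" ]; then
    next=$((cur + 1))                   # scale out by one instance
  elif [ "$cpu" -lt "$in_at" ]; then
    next=$((cur - 1))                   # scale in by one instance
  fi
  [ "$next" -lt "$min" ] && next=$min   # never below the profile minimum
  [ "$next" -gt "$max" ] && next=$max   # never above the profile maximum
  echo "$next"
}
```

For example, `next_capacity 3 80 70 30 2 10` prints `4`: 80% average CPU is above the 70% scale-out threshold, and 4 stays within the 2 to 10 range. Leaving a wide gap between the scale-out and scale-in thresholds helps avoid flapping, where each action immediately triggers the opposite one.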
8) Deployment Options
- Azure Portal for guided UI provisioning
- Azure CLI / PowerShell for repeatable automation
- Infrastructure as Code: ARM, Bicep, Terraform
9) Quick-Start: Create a VM (CLI, PowerShell, Bicep, Terraform)
9.1 Azure CLI
# Login & set subscription
az login
az account set --subscription "<SUBSCRIPTION_NAME_OR_ID>"
# Variables
RG=rg-azvm-demo
LOC=eastus
VM=myvm01
IMG=Ubuntu2204   # the older UbuntuLTS alias is no longer available in current Azure CLI versions
SIZE=Standard_D2s_v5
ADMIN=azureuser
az group create -n $RG -l $LOC
# Create VM with Premium SSD OS disk and SSH key
az vm create \
-g $RG -n $VM \
--image $IMG \
--size $SIZE \
--admin-username $ADMIN \
--ssh-key-values ~/.ssh/id_rsa.pub \
--os-disk-size-gb 64 \
--os-disk-delete-option Delete \
--storage-sku Premium_LRS \
--public-ip-sku Standard
# Prefer Just-in-Time access (Defender for Cloud) or Azure Bastion over an open SSH port.
# Fallback example only: this opens port 22 to any source (az vm create may already have added an SSH rule for Linux)
az vm open-port -g $RG -n $VM --port 22
9.2 PowerShell (Az Module)
# Install-Module Az -Scope CurrentUser
Connect-AzAccount
Select-AzSubscription -SubscriptionId "<SUB_ID>"
$rg = "rg-azvm-demo"
$loc = "EastUS"
$vm = "myvm02"
New-AzResourceGroup -Name $rg -Location $loc | Out-Null
$cred = Get-Credential -Message "Provide local admin credentials"
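# -OpenPorts 3389 exposes RDP to the internet; keep it for short-lived demos and prefer Bastion or JIT otherwise
New-AzVM -ResourceGroupName $rg -Name $vm -Location $loc `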
-Image "Win2022Datacenter" -Size "Standard_D2s_v5" `
-PublicIpSku Standard -OpenPorts 3389 -Credential $cred `
-Tag @{ workload="demo"; env="dev" }
9.3 Bicep (Idempotent IaC)
// main.bicep
param location string = resourceGroup().location
param vmName string = 'bicepvm01'
param adminUsername string
@secure()
param adminPassword string
resource nic 'Microsoft.Network/networkInterfaces@2023-11-01' = {
name: '${vmName}-nic'
location: location
properties: {
ipConfigurations: [
{
name: 'ipconfig1'
properties: {
subnet: {
id: '/subscriptions/<subId>/resourceGroups/<rg>/providers/Microsoft.Network/virtualNetworks/<vnet>/subnets/<subnet>'
}
privateIPAllocationMethod: 'Dynamic'
}
}
]
}
}
resource vm 'Microsoft.Compute/virtualMachines@2023-09-01' = {
name: vmName
location: location
properties: {
hardwareProfile: { vmSize: 'Standard_D2s_v5' }
storageProfile: {
osDisk: {
createOption: 'FromImage'
managedDisk: { storageAccountType: 'Premium_LRS' }
diskSizeGB: 128 // Windows Server marketplace images use a 127 GiB OS disk, so smaller sizes are rejected
}
imageReference: {
publisher: 'MicrosoftWindowsServer'
offer: 'WindowsServer'
sku: '2022-datacenter'
version: 'latest'
}
}
osProfile: {
computerName: vmName
adminUsername: adminUsername
adminPassword: adminPassword
windowsConfiguration: {
enableAutomaticUpdates: true
}
}
networkProfile: {
networkInterfaces: [
{ id: nic.id }
]
}
}
}
9.4 Terraform (HCL)
provider "azurerm" {
features {}
}
resource "azurerm_resource_group" "rg" {
name = "rg-azvm-tf"
location = "East US"
}
resource "azurerm_virtual_network" "vnet" {
name = "vnet01"
location = azurerm_resource_group.rg.location
resource_group_name = azurerm_resource_group.rg.name
address_space = ["10.10.0.0/16"]
}
resource "azurerm_subnet" "snet" {
name = "snet01"
resource_group_name = azurerm_resource_group.rg.name
virtual_network_name = azurerm_virtual_network.vnet.name
address_prefixes = ["10.10.1.0/24"]
}
resource "azurerm_network_interface" "nic" {
name = "vmnic01"
location = azurerm_resource_group.rg.location
resource_group_name = azurerm_resource_group.rg.name
ip_configuration {
name = "ipconfig1"
subnet_id = azurerm_subnet.snet.id
private_ip_address_allocation = "Dynamic"
}
}
resource "azurerm_windows_virtual_machine" "vm" {
name = "tfvm01"
resource_group_name = azurerm_resource_group.rg.name
location = azurerm_resource_group.rg.location
size = "Standard_D2s_v5"
admin_username = "azureuser"
admin_password = "ChangeM3Now!" # demo only: supply a sensitive variable or Key Vault secret in real deployments
network_interface_ids = [azurerm_network_interface.nic.id]
os_disk {
caching = "ReadWrite"
storage_account_type = "Premium_LRS"
disk_size_gb = 128 # Windows Server marketplace images use a 127 GiB OS disk; smaller sizes are rejected
}
source_image_reference {
publisher = "MicrosoftWindowsServer"
offer = "WindowsServer"
sku = "2022-datacenter"
version = "latest"
}
tags = { env = "dev" }
}
10) Networking Patterns
- Place VMs in a VNet/Subnet with appropriately scoped NSGs
- Use Private Endpoints to reach PaaS securely
- Use Azure Bastion or JIT instead of exposing RDP/SSH
- Choose Standard public IPs and Load Balancers for prod
- Consider Application Gateway for L7, WAF protection
Network Diagnostics (CLI)
# Flow logs: NSG flow logs are being retired in favor of VNet flow logs; recent CLI versions accept --vnet
az network watcher flow-log create -l <LOC> -n <FLOWLOG_NAME> -g <RG> --vnet <VNET_NAME> --storage-account <STORAGE> --retention 7
# IP flow verify: is this packet allowed or denied, and by which rule? (--local/--remote take IP:PORT)
az network watcher test-ip-flow -g <RG> --vm <VM_NAME> --direction Inbound --protocol TCP --local <VM_PRIV_IP>:443 --remote 203.0.113.20:55000
# Effective security rules at NIC
az network nic list-effective-nsg --ids <NIC_ID>
# Connectivity test from the VM to a target (the source VM needs the Network Watcher agent extension)
az network watcher test-connectivity --source-resource <VM_ID> --dest-address <FQDN_OR_IP> --dest-port 443
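`test-ip-flow` and the effective-rules view both come down to how NSGs evaluate traffic: rules are processed in priority order, lowest number first, and processing stops at the first rule that matches. The sketch below models only that first-match logic for destination ports. It ignores addresses, protocol, and service tags, and it collapses the built-in default rules into a final deny (as for traffic from the internet), so it illustrates the evaluation order rather than replacing Network Watcher. The `priority:port:action` rule format and both function names are invented for this example.

```sh
#!/bin/sh
# Illustrative model of NSG first-match evaluation on destination port only.
# Rule format "priority:portSpec:Allow|Deny" is invented for this sketch.

# port_matches SPEC PORT: SPEC is "*", a single port like "22", or a range like "1000-2000"
port_matches() {
  case $1 in
    '*') return 0 ;;
    *-*) [ "$2" -ge "${1%-*}" ] && [ "$2" -le "${1#*-}" ] ;;
    *) [ "$2" -eq "$1" ] ;;
  esac
}

# nsg_decision PORT RULE...: lowest priority number is evaluated first; the first match wins
nsg_decision() {
  local port=$1
  shift
  printf '%s\n' "$@" | sort -t: -k1,1n | {
    decision="Deny (default DenyAllInBound)"   # stand-in for the built-in default rules
    while IFS=: read -r prio spec action; do
      [ -n "$prio" ] || continue
      if port_matches "$spec" "$port"; then
        decision="$action (priority $prio)"
        break
      fi
    done
    echo "$decision"
  }
}
```

`nsg_decision 22 '300:22:Deny' '100:22:Allow'` prints `Allow (priority 100)`: the Allow rule wins because priority 100 is evaluated before 300, regardless of the order in which the rules are listed.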
11) Monitoring, Logs & Insights
- Azure Monitor metrics: CPU, disk, network
- Log Analytics for system/agent logs and VM insights
- Alerts on thresholds, dynamic baselines, and activity logs
- Diagnostics: boot diagnostics, serial console, guest logs
KQL: Top CPU Consumers (VM insights, Windows and Linux)
// Log Analytics - VM insights performance data: top 10 computers by average CPU over 1h
InsightsMetrics
| where TimeGenerated > ago(1h)
| where Namespace == "Processor" and Name == "UtilizationPercentage"
| summarize AvgCPU = avg(Val) by Computer
| top 10 by AvgCPU desc
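KQL: Disk Queue Alerts (Windows guests)
// Requires the LogicalDisk "Avg. Disk Queue Length" counter in your data collection rule
Perf
| where TimeGenerated > ago(30m)
| where ObjectName == "LogicalDisk" and CounterName == "Avg. Disk Queue Length" and InstanceName != "_Total"
| summarize AvgQueue = avg(CounterValue) by Computer, InstanceName
| where AvgQueue > 2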
12) Backup & Disaster Recovery
Protect VMs with Azure Backup policies (daily/weekly/monthly/yearly) to a Recovery Services vault. For region-level incidents, use Azure Site Recovery (ASR) to replicate VMs cross-region.
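# CLI: enable backup quickly (--resource-group is the vault's resource group;
# pass the VM's resource ID to --vm if the VM lives in a different resource group)
az backup protection enable-for-vm \
--resource-group <RG_OF_VAULT> --vault-name <RSV> \
--vm <VM_NAME_OR_ID> --policy-name "DefaultPolicy"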
13) Cost Optimization Essentials
- Right-size based on performance data; resize off-peak, since a resize restarts the VM
- Azure Reservations (reserved VM instances) or an Azure savings plan for compute on 1- or 3-year terms
- Spot VMs for batch/fault-tolerant jobs
- Auto-shutdown dev/test; deallocate outside business hours
- Disk tiering: match storage SKU with IOPS needs
PowerShell: Off-Hours Auto-Shutdown (Dev/Test)
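# Tags: mark every VM tagged env=dev in the RG with an off-hours window for a deallocation runbook
$rg = "rg-dev"
$vms = Get-AzVM -ResourceGroupName $rg | Where-Object { $_.Tags -and $_.Tags.ContainsKey("env") -and $_.Tags["env"] -eq "dev" }
foreach($v in $vms){
# Update-AzTag -Operation Merge adds the tag without removing existing tags
Update-AzTag -ResourceId $v.Id -Tag @{ autoShutdown="1900-0700 IST" } -Operation Merge | Out-Null
# Implement your runbook/Automation to deallocate by tag parsing
}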
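The runbook needs one piece of logic: deciding whether "now" falls inside the tagged window, including windows that wrap past midnight such as `1900-0700`. A minimal sketch of that check; `in_window` is hypothetical, and it assumes the tag value looks like the one above (`HHMM-HHMM` plus an optional label) and that the caller passes the current time already converted to the window's time zone.

```sh
#!/bin/sh
# in_window WINDOW NOW: succeed if NOW (HHMM) lies inside WINDOW ("HHMM-HHMM", optionally
# followed by a label such as " IST"). Windows that cross midnight, like 1900-0700, are handled.
# [ -ge ] compares decimal integers, so leading zeros such as "0700" are safe here
# (unlike $(( )), which would read them as octal).
in_window() {
  local range now start end
  range=${1%% *}                 # drop a trailing label, e.g. " IST"
  now=$2
  start=${range%-*}
  end=${range#*-}
  if [ "$start" -le "$end" ]; then
    [ "$now" -ge "$start" ] && [ "$now" -lt "$end" ]
  else
    [ "$now" -ge "$start" ] || [ "$now" -lt "$end" ]   # window wraps past midnight
  fi
}
```

For example, `in_window "1900-0700 IST" 2230` succeeds and `in_window "1900-0700 IST" 1200` fails. A scheduled job could then run `in_window "$WINDOW" "$(TZ=Asia/Kolkata date +%H%M)" && az vm deallocate -g "$RG" -n "$VM" --no-wait`, assuming IST means India Standard Time and `$WINDOW` holds the tag value.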
14) Hybrid & Migration
- Azure Migrate: discovery, assessment, agentless replication
- Azure Arc: govern on-prem/other-cloud VMs with Azure services
- ExpressRoute/VPN: private connectivity to VNets
15) Compliance & Governance
- Use Azure Policy to deny non-compliant VM SKUs, public IPs, or untagged resources
- Apply RBAC least privilege and Privileged Identity Management (PIM) for elevation
- Maintain Bicep baselines (for example as Template Specs) for consistent environments; Azure Blueprints is deprecated
16) Real-World Troubleshooting—Deep Dive Runbooks
16.1 VM Won’t Boot (Windows/Linux)
- Check Boot Diagnostics (screenshot/serial)
- Serial Console: review GRUB/boot logs (Linux) or blue screen (Windows)
- Repair Disk: mount OS disk on a helper VM, fix fstab/registry/driver
- Redeploy VM to new host; if needed, recreate from snapshot
PowerShell: Automated “Repair Disk” Flow
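# High-level: stop the VM, copy its OS disk, attach the copy to a repair VM, fix it, then swap it in as the OS disk.
# A managed disk can be attached to only one VM at a time, so the repair works on a copy, not the original OS disk.
$rg="rg-prod"; $vmName="appvm01"; $repairVm="repairvm01"
Stop-AzVM -ResourceGroupName $rg -Name $vmName -Force
$vm = Get-AzVM -Name $vmName -ResourceGroupName $rg
$osDisk = Get-AzDisk -ResourceGroupName $rg -DiskName $vm.StorageProfile.OsDisk.Name
# Snapshot the OS disk and create a repairable copy (same SKU) from the snapshot
$snapCfg = New-AzSnapshotConfig -SourceUri $osDisk.Id -Location $osDisk.Location -CreateOption Copy
$snap = New-AzSnapshot -ResourceGroupName $rg -SnapshotName "$($vmName)-os-snap" -Snapshot $snapCfg
$diskCfg = New-AzDiskConfig -Location $osDisk.Location -SourceResourceId $snap.Id -CreateOption Copy -SkuName $osDisk.Sku.Name
$copy = New-AzDisk -ResourceGroupName $rg -DiskName "$($vmName)-os-fixed" -Disk $diskCfg
# Attach the copy to the repair VM as a data disk (pick a LUN that is free on the repair VM)
$repair = Get-AzVM -Name $repairVm -ResourceGroupName $rg
Add-AzVMDataDisk -VM $repair -Name $copy.Name -ManagedDiskId $copy.Id -Lun 1 -Caching None -CreateOption Attach | Out-Null
Update-AzVM -ResourceGroupName $rg -VM $repair | Out-Null
# RDP/SSH to the repair VM and fix fstab/registry/drivers on the attached copy.
# After the fix, detach the copy from the repair VM:
$repair = Get-AzVM -Name $repairVm -ResourceGroupName $rg
Remove-AzVMDataDisk -VM $repair -DataDiskNames $copy.Name | Out-Null
Update-AzVM -ResourceGroupName $rg -VM $repair | Out-Null
# Swap the repaired copy in as the original VM's OS disk (the VM is still stopped), then start it
Set-AzVMOSDisk -VM $vm -ManagedDiskId $copy.Id -Name $copy.Name | Out-Null
Update-AzVM -ResourceGroupName $rg -VM $vm | Out-Null
Start-AzVM -ResourceGroupName $rg -Name $vmName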
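The flow above assumes the VM, its OS disk, and the repair VM all live in `$rg`. When they don't, read the group and name from the resource ID, which follows `/subscriptions/<sub>/resourceGroups/<rg>/providers/<namespace>/<type>/<name>`. A small sketch for shell users; `rid_segment` and `rid_name` are hypothetical helpers, and keys are matched case-insensitively because ARM resource IDs are case-insensitive.

```sh
#!/bin/sh
# rid_segment ID KEY: print the segment that follows KEY (e.g. resourceGroups) in an ARM resource ID.
# Returns 1 if KEY does not occur. Matching is case-insensitive.
rid_segment() {
  printf '%s\n' "$1" | awk -F/ -v key="$2" '
    { for (i = 1; i < NF; i++) if (tolower($i) == tolower(key)) { print $(i + 1); found = 1; exit } }
    END { exit !found }'
}

# rid_name ID: the last segment of a resource ID is the resource name
rid_name() {
  printf '%s\n' "${1##*/}"
}
```

For example, `rid_segment "$(az vm show -g "$RG" -n "$VM" --query storageProfile.osDisk.managedDisk.id -o tsv)" resourceGroups` prints the OS disk's resource group, which may differ from the VM's.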
Linux: Quick fstab/GRUB Checks on the Attached Disk
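# On the repair VM (Linux):
sudo lsblk -f    # find the attached copy's root partition (and /boot if it is separate)
sudo mkdir -p /mnt/os
# For XFS, add -o nouuid if the repair VM was built from the same image (duplicate filesystem UUID)
sudo mount /dev/<DATA_DISK_PARTITION> /mnt/os
# Bind-mount pseudo-filesystems so GRUB tools work inside the chroot
for fs in dev proc sys; do sudo mount --bind /$fs /mnt/os/$fs; done
sudo chroot /mnt/os
# Fix /etc/fstab entries (prefer UUID=, add nofail to data disks), check UUIDs with blkid, validate GRUB config
# Then exit the chroot and unmount: for fs in dev proc sys; do sudo umount /mnt/os/$fs; done; sudo umount /mnt/os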
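Two fstab problems cause many Linux boot failures on Azure: entries that mount by `/dev/sdX` device name, which is not guaranteed to stay the same across reboots, and data-disk entries without `nofail`, which can drop the boot into emergency mode when the disk is missing. A small sketch that flags both in the mounted copy; `lint_fstab` is a hypothetical helper that checks only these two heuristics, not full fstab syntax.

```sh
#!/bin/sh
# lint_fstab FILE: flag fstab entries that commonly stop Azure Linux VMs from booting.
# Prints one WARN line per finding and returns 1 if anything was flagged.
lint_fstab() {
  awk '
    $1 == "" || $1 ~ /^#/ { next }                     # skip blank lines and comments
    $1 ~ /^\/dev\/sd/ {
      print "WARN " $2 ": mounts " $1 " by device name; prefer UUID="
      bad++
    }
    $2 != "/" && $2 !~ /^\/boot/ && $3 != "swap" && $4 !~ /nofail/ {
      print "WARN " $2 ": no nofail option; a missing disk can block boot"
      bad++
    }
    END { exit (bad > 0) }
  ' "$1"
}
```

Run it as `lint_fstab /mnt/os/etc/fstab` on the repair VM while the copy is mounted; a non-zero exit status means at least one entry deserves a look.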
16.2 RDP/SSH Connectivity Failing
- Verify NSG rules and effective security rules at NIC/subnet
- Confirm the VM’s local firewall (Windows Defender FW / iptables/nftables)
- Test IP flow and connectivity via Network Watcher
- Prefer Bastion or JIT over public RDP/SSH
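# Effective NSG rules on the target NIC, then a TCP test from another VM (the source needs the Network Watcher agent extension)
az network nic list-effective-nsg --ids <NIC_ID>
az network watcher test-connectivity --source-resource <SOURCE_VM_ID> --dest-resource <TARGET_VM_ID> --dest-port 22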
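Network Watcher reports what the platform allows; it cannot tell you whether sshd or the RDP service is actually listening inside the guest. A quick TCP handshake test from another VM in the same VNet (for example, a jump host reached through Bastion) separates the two. A minimal sketch; `tcp_open` is hypothetical and relies on bash's `/dev/tcp` redirection and the coreutils `timeout` command being installed.

```sh
#!/bin/sh
# tcp_open HOST PORT [SECONDS]: succeed if a TCP handshake to HOST:PORT completes in time.
# Needs bash (for /dev/tcp) and coreutils timeout; errors such as "connection refused" are silenced.
tcp_open() {
  timeout "${3:-3}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}
```

For example, `tcp_open 10.10.1.4 22 && echo open || echo "closed or filtered"` (the address is illustrative). If Network Watcher allows the flow but the handshake fails, look inside the guest: is the service running, and does the local firewall permit the port?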
16.3 High Disk Latency / Low IOPS
- Check VM size’s max throughput vs disk SKU caps
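- Enable Write Accelerator where supported (M-series VMs with Premium SSD, typically for database log disks)
- Stripe multiple data disks with RAID0 for throughput (Storage Spaces on Windows, mdadm or LVM on Linux; app dependent)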
- Move to Premium SSD v2 or Ultra Disk if needed
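# Linux: quick fio sample (adjust runtime/iodepth); it writes a 4 GiB test file, so point it at a scratch path, not live data
sudo apt-get update && sudo apt-get install -y fio
sudo fio --name=randrw --filename=/data/testfile --size=4G --rw=randrw --bs=4k --iodepth=16 --ioengine=libaio --runtime=60 --time_based --numjobs=4 --group_reporting --direct=1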
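The first check in the list above is often the answer: each VM size has its own disk IOPS and throughput ceilings, and adding or striping disks only helps until the combined disk limits exceed the VM's. A minimal sketch of that arithmetic; `effective_cap` is hypothetical, real limits must come from the size and disk documentation, and host caching and bursting are ignored.

```sh
#!/bin/sh
# effective_cap VM_LIMIT DISK_LIMIT...: the lower of the VM's limit and the sum of the disks' limits.
# Works for IOPS or MB/s as long as every argument uses the same unit.
effective_cap() {
  local vm_limit=$1 total=0 d
  shift
  for d in "$@"; do
    total=$((total + d))
  done
  if [ "$total" -lt "$vm_limit" ]; then echo "$total"; else echo "$vm_limit"; fi
}
```

With a hypothetical VM limit of 6400 uncached IOPS, two striped P30 disks (5000 IOPS each) give `effective_cap 6400 5000 5000`, which prints `6400`: the VM, not the disks, is the ceiling, so a larger size helps more than a third disk.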
16.4 VM Agent / Extension Failures
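If Run Command or extensions fail, update the agent and clear failed extensions. Run Command itself depends on a healthy guest agent, so when the agent is down, fall back to Serial Console or the repair-disk flow.
# Windows VM: repair the agent service with your own script (fix-agent.ps1 is a placeholder for a local script you supply)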
Invoke-AzVMRunCommand -ResourceGroupName <RG> -Name <VM> `
-CommandId 'RunPowerShellScript' `
-ScriptPath '.\fix-agent.ps1'
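# Linux VM: check the agent service (walinuxagent on Ubuntu/Debian, waagent on RHEL/SUSE) and its log
sudo systemctl status walinuxagent
sudo tail -n 200 /var/log/waagent.log
# Remove a stuck extension (PowerShell): list extension states, then remove the failed one
(Get-AzVM -ResourceGroupName <RG> -Name <VM>).Extensions | Select-Object Name, ProvisioningState
Remove-AzVMExtension -ResourceGroupName <RG> -VMName <VM> -Name <EXTENSION_NAME> -Force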
16.5 NIC / IP Misconfiguration
# Make a NIC the VM's primary NIC via PowerShell (the VM must be deallocated to change NICs)
Stop-AzVM -ResourceGroupName <RG> -Name <VM> -Force
$vm = Get-AzVM -ResourceGroupName <RG> -Name <VM>
$nic = Get-AzNetworkInterface -Name "<NEW_PRIMARY_NIC>" -ResourceGroupName <RG>
# Mark only the chosen NIC as primary; attach it first if it is not on the VM yet
$vm.NetworkProfile.NetworkInterfaces | ForEach-Object { $_.Primary = ($_.Id -eq $nic.Id) }
if ($vm.NetworkProfile.NetworkInterfaces.Id -notcontains $nic.Id) { Add-AzVMNetworkInterface -VM $vm -Id $nic.Id -Primary | Out-Null }
Update-AzVM -ResourceGroupName <RG> -VM $vm
Start-AzVM -ResourceGroupName <RG> -Name <VM>
16.6 Resize, Redeploy & Reimage
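# List sizes available to this VM, then resize (the VM restarts; if the target size is not offered on the current hardware cluster, deallocate first)
az vm list-vm-resize-options -g <RG> -n <VM> -o table
az vm resize -g <RG> -n <VM> --size Standard_D4s_v5
# Redeploy to a new host (fixes host-level issues; data on the temporary disk is lost)
az vm redeploy -g <RG> -n <VM>
# Reimage: scale set instances can be reimaged (az vmss reimage); a standalone VM is usually rebuilt from an image or has its OS disk swapped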
16.7 Patch & Update
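# Windows: prefer Azure Update Manager or Automanage; for ad-hoc runs, Install-WindowsUpdate comes from the
# community PSWindowsUpdate module, which must already be installed on the VM
Invoke-AzVMRunCommand -ResourceGroupName <RG> -Name <VM> -CommandId 'RunPowerShellScript' -ScriptString 'Install-WindowsUpdate -AcceptAll -AutoReboot'
# Linux (Ubuntu/Debian): patch via RunCommand; scripts already run as root, and noninteractive mode avoids prompts
az vm run-command invoke -g <RG> -n <VM> --command-id RunShellScript --scripts "apt-get update && DEBIAN_FRONTEND=noninteractive apt-get -y upgrade"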
17) Azure Resource Graph (ARG) for Fleet Troubleshooting
Azure Resource Graph (not to be confused with Microsoft Graph) answers inventory, drift, and compliance questions in seconds across subscriptions and management groups. Use it to pinpoint VMs that are missing tags, lack backup, or have public IPs.
ARG: VMs Lacking Required Tags
Resources
| where type == "microsoft.compute/virtualmachines"
| where isempty(tostring(tags["env"]))
| project name, resourceGroup, location, tags
ARG: VMs with Public IPs
Resources
| where type == "microsoft.network/networkinterfaces"
| where isnotempty(tostring(properties.virtualMachine.id))
| mv-expand ipconf = properties.ipConfigurations
| extend pipId = tostring(ipconf.properties.publicIPAddress.id)
| where isnotempty(pipId)
| project vm = tostring(split(tostring(properties.virtualMachine.id), "/")[8]), nic = name, pipId
ARG: VMs Without Backup Enabled
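Resources
| where type == "microsoft.compute/virtualmachines"
| project vmId = tolower(id), name, resourceGroup, subscriptionId
| join kind=leftouter (
RecoveryServicesResources
| where type == "microsoft.recoveryservices/vaults/backupfabrics/protectioncontainers/protecteditems"
| where tostring(properties.backupManagementType) == "AzureIaasVM"
| project vmId = tolower(tostring(properties.sourceResourceId))
) on vmId
| where isempty(vmId1)
| project name, resourceGroup, subscriptionId
// Assumes VM protected items expose properties.sourceResourceId; verify against your own vault data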
18) Guest OS Hardening & Access
- Leverage Azure Bastion or JIT via Defender for time-boxed SSH/RDP
- Use Managed Identity instead of local secrets
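- Encrypt disks: server-side encryption is on by default; add encryption at host and/or customer-managed keys as required (Azure Disk Encryption/ADE is scheduled for retirement)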
- Place break-glass accounts under PIM with approval & access reviews
PowerShell: Enable JIT on a VM
# Requires Defender for Servers (Defender for Cloud) and the Az.Security module; ports open only for approved requests
$rg="rg-prod"; $vmName="appvm01"
$vm = Get-AzVM -ResourceGroupName $rg -Name $vmName
$jitVm = @{
id = $vm.Id
# Restrict the source range to your admin network (203.0.113.0/24 is a documentation placeholder)
ports = @(@{ number = 22; protocol = "*"; allowedSourceAddressPrefix = @("203.0.113.0/24"); maxRequestAccessDuration = "PT3H" })
}
Set-AzJitNetworkAccessPolicy -Kind "Basic" -Location $vm.Location -Name "default" -ResourceGroupName $rg -VirtualMachine @($jitVm)
19) Disk, Filesystem & Backup Recipes
Linux: Create & Mount Data Disk
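sudo lsblk -o NAME,SIZE,TYPE,MOUNTPOINT   # identify the new, empty disk (for example sdc)
sudo parted /dev/<DISK> --script mklabel gpt mkpart primary ext4 0% 100%
sudo partprobe /dev/<DISK>
sudo mkfs.ext4 /dev/<DISK_PART>
sudo mkdir -p /data
# Use the filesystem UUID in fstab; /dev/sdX names can change across reboots on Azure
UUID=$(sudo blkid -s UUID -o value /dev/<DISK_PART>)
echo "UUID=$UUID /data ext4 defaults,nofail 0 2" | sudo tee -a /etc/fstab
sudo mount -a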
Windows: Initialize & Format Data Disk (PowerShell)
# Initialize only RAW (new, empty) disks so the OS disk and existing data disks are never touched
Get-Disk | Where-Object PartitionStyle -Eq "RAW" |
Initialize-Disk -PartitionStyle GPT -PassThru |
New-Partition -UseMaximumSize -AssignDriveLetter |
Format-Volume -FileSystem NTFS -NewFileSystemLabel "Data" -Confirm:$false
20) Performance & Scale—Best Practices
- Match VM size and disk SKUs to app profile; avoid throttling ceilings
- Separate OS and data disks; dedicate disks to logs/tempdb
- Enable accelerated networking where supported
- Use proximity placement groups (PPG) for low-latency multi-tier apps
- Warm up instances before production cutover
21) End-to-End Troubleshooting Playbook (Copy/Paste)
- Gather: subscription, RG, VM name, NIC, VNet, subnet, IPs, NSGs, UDRs
- Check activity logs for recent changes (resize/redeploy/extensions)
- Confirm VM power state and host status
- Inspect boot diagnostics; open serial console
- Test IP flow & effective NSGs; review local firewall
- Query VM metrics (CPU/disk/network) and logs (syslog/eventlog)
- Validate disk health and storage throttle metrics
- Consider redeploy, resize, or repair-disk flow
- Reinforce guardrails (policy, JIT, backup) to prevent recurrence
CLI Bundle: Snapshot, Redeploy, Reattach
# Safe snapshot of OS disk
VM=<VM_NAME> RG=<RG>
OSDISK=$(az vm show -g $RG -n $VM --query "storageProfile.osDisk.name" -o tsv)
az snapshot create -g $RG -n ${OSDISK}-snap --source $OSDISK
# Redeploy VM
az vm redeploy -g $RG -n $VM
# If needed: replace a faulty NIC (requires downtime planning)
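Rerunning the bundle targets the same `${OSDISK}-snap` name every time, so earlier snapshots are not kept side by side. A sketch of a timestamped name instead; `snapshot_name` is hypothetical, and the 80-character cap assumes the managed disk/snapshot naming limit, so confirm it against current Azure naming rules.

```sh
#!/bin/sh
# snapshot_name DISK STAMP: "<disk>-snap-<stamp>", trimming DISK so the result stays within
# an assumed 80-character limit for snapshot names.
snapshot_name() {
  local suffix keep
  suffix="-snap-$2"
  keep=$((80 - ${#suffix}))
  printf '%s%s\n' "$(printf '%s' "$1" | cut -c1-"$keep")" "$suffix"
}
```

Usage: `az snapshot create -g "$RG" -n "$(snapshot_name "$OSDISK" "$(date -u +%Y%m%d%H%M)")" --source "$OSDISK"`.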
22) Governance Guardrails (Policy as Prevention)
- Deny public IP on NICs except in “dmz” subnet
- Require tags (env, owner, costCenter) on VMs and disks
- Audit unencrypted disks; deny non-compliant creates
- Enforce region/size allowlists for supportability
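The required-tags guardrail is easy to check before a deployment ever reaches Azure Policy, for example in a CI step. A minimal sketch, assuming tags are passed as one space-separated string of `key=value` pairs (a format chosen for this example, so values must not contain spaces); `missing_tags` is hypothetical. Keys are compared case-insensitively because Azure tag names are case-insensitive.

```sh
#!/bin/sh
# missing_tags "key=value ..." REQUIRED_KEY...: print required keys absent from the tag list
# and return 1; return 0 when every required key is present. Keys compare case-insensitively.
missing_tags() {
  local tags missing key lkey
  tags=" $(printf '%s' "$1" | tr '[:upper:]' '[:lower:]') "
  missing=
  shift
  for key in "$@"; do
    lkey=$(printf '%s' "$key" | tr '[:upper:]' '[:lower:]')
    case $tags in
      *" $lkey="*) ;;                          # tag present
      *) missing="$missing $key" ;;
    esac
  done
  [ -z "$missing" ] && return 0
  echo "${missing# }"
  return 1
}
```

`missing_tags "env=dev owner=alice" env owner costCenter` prints `costCenter` and returns a non-zero status, which is enough to fail a pipeline step before the deny policy would.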
Policy Idea: Deny VM Without Required Tags
{
  "mode": "Indexed",
  "policyRule": {
    "if": {
      "allOf": [
        { "field": "type", "equals": "Microsoft.Compute/virtualMachines" },
        { "anyOf": [
          { "field": "tags.env", "exists": "false" },
          { "field": "tags.owner", "exists": "false" }
        ] }
      ]
    },
    "then": { "effect": "deny" }
  }
}
23) Frequently Asked: Design Choices
Public Access?
Prefer Bastion/JIT. If you must, restrict by IP, use Standard SKU PIP, enable NSG logging, and add alerts.
Disks: Premium vs Ultra?
Use Premium for predictable IOPS; pick Ultra for low-latency, tunable IOPS/throughput with strict SLAs.
Scale Up or Out?
Vertical scaling is simpler but capped; horizontal via scale sets gives resilience and rolling upgrades.
Backups & DR?
Daily vault backups + weekly test restores; ASR for cross-region replication and failover drills.
24) Copy-Ready Scripts for Rapid Troubleshooting
24.1 PowerShell: “Get My VM Story”
# Summary: state, size, disks, NICs, IPs, effective NSG rules, recent activity
param([string]$ResourceGroup,[string]$VmName)
# The model (no -Status) holds size, disks and NICs; the instance view (-Status) holds the power state
$vm = Get-AzVM -ResourceGroupName $ResourceGroup -Name $VmName
$status = Get-AzVM -ResourceGroupName $ResourceGroup -Name $VmName -Status
Write-Host "PowerState:" ($status.Statuses | Where-Object Code -like "PowerState*").DisplayStatus
Write-Host "Size:" $vm.HardwareProfile.VmSize
Write-Host "OSDisk:" $vm.StorageProfile.OsDisk.Name
$vm.StorageProfile.DataDisks | ForEach-Object { Write-Host "DataDisk:" $_.Name " LUN:" $_.Lun " SizeGB:" $_.DiskSizeGB }
foreach($id in $vm.NetworkProfile.NetworkInterfaces.Id){
$nic = Get-AzNetworkInterface -ResourceId $id
Write-Host "NIC:" $nic.Name " PrivateIPs:" ($nic.IpConfigurations.PrivateIpAddress -join ",")
# Effective rules are only returned while the VM is running; use the NIC's own resource group
$eff = Get-AzEffectiveNetworkSecurityGroup -NetworkInterfaceName $nic.Name -ResourceGroupName $nic.ResourceGroupName
Write-Host "Effective NSG Rules:" (@($eff.EffectiveSecurityRules).Count)
}
# Activity log entries for this VM over the last 7 days
Get-AzActivityLog -ResourceId $vm.Id -StartTime (Get-Date).AddDays(-7) |
Select-Object EventTimestamp, OperationName, Status | Format-Table -AutoSize
24.2 PowerShell: Trigger VM RunCommand
# Windows script inline (list services)
Invoke-AzVMRunCommand -ResourceGroupName <RG> -Name <VM> -CommandId 'RunPowerShellScript' -ScriptString 'Get-Service | Sort-Object Status,DisplayName | Select-Object Status,Name,DisplayName | Format-Table -Auto'
24.3 CLI: Find VMs Without Recent Backups
# Example approach: list vault items then compare; varies by environment
az backup item list --resource-group <RG_OF_VAULT> --vault-name <RSV> -o table
# Join with az vm list --query to detect gaps
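One way to do that comparison, as a sketch: export every VM ID with `az vm list --query "[].id" -o tsv > all-vms.txt`, export the protected VM IDs from each vault into `protected.txt` (for example with `az backup item list ... --backup-management-type AzureIaasVM --query "[].properties.virtualMachineId" -o tsv`; the property name is an assumption, so confirm it in your CLI's JSON output), then diff the two lists. `vms_without_backup` is a hypothetical helper; IDs are lower-cased because ARM resource IDs are case-insensitive.

```sh
#!/bin/sh
# vms_without_backup ALL_IDS_FILE PROTECTED_IDS_FILE: print VM resource IDs (one per line)
# that are missing from the protected list. Output IDs are lower-cased.
vms_without_backup() {
  local all prot
  all=$(mktemp) && prot=$(mktemp) || return 1
  tr '[:upper:]' '[:lower:]' < "$1" | sort -u > "$all"
  tr '[:upper:]' '[:lower:]' < "$2" | sort -u > "$prot"
  comm -23 "$all" "$prot"             # lines only in the first (sorted) file
  rm -f "$all" "$prot"
}
```

Usage: `vms_without_backup all-vms.txt protected.txt` lists the unprotected VMs, one ID per line.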
24.4 Guest Diagnostics Toggle
# Enable boot diagnostics to a managed storage account
az vm boot-diagnostics enable -g <RG> -n <VM>
# Serial Console (in the Azure Portal, or via the serial-console CLI extension) requires boot diagnostics to be enabled
25) Putting It All Together
Azure Virtual Machines give you controlled, flexible compute for almost any workload—apps, DBs, batch, dev/test, DR, and GPU-accelerated AI/ML. Use the deployment snippets to get started, the monitoring and ARG queries to keep visibility sharp, policy and JIT to lock down security, and the troubleshooting runbooks when things go sideways. Pair VMs with services like Azure Files, Load Balancer, Application Gateway, and Azure Site Recovery to build robust, production-grade platforms.
Internal: This article is owned by cloudknowledge.in.