Compute Services - Amazon EC2 (Elastic Compute Cloud)

What you'll learn

Core EC2 concepts: instances, AMIs, storage, networking, pricing
Security & access control best practices
Scaling, high availability, and fault tolerance
Cost optimization strategies & tools
Troubleshooting checklist with PowerShell and AWS CLI/SDK snippets

1. Introduction to Amazon EC2

Amazon Elastic Compute Cloud (EC2) is a foundational compute service from AWS that provides scalable virtual servers in the cloud. Companies of all sizes use EC2 to run application servers, web workloads, batch processing jobs, container hosts, and GPU-based ML training.

EC2 offers a flexible choice of operating systems, CPU architectures, networking and storage configurations, and instance lifecycle controls. It acts as a building block for many higher-level AWS services and integrates tightly with AWS networking, storage, and monitoring services.

Why EC2 matters

Instant provisioning of compute resources
Predictable performance when sized correctly
Choice of pricing models for cost control

Who uses EC2?

Developers building backend services
Data scientists training ML models
Enterprises running legacy workloads
DevOps teams managing CI/CD runners

2. EC2 Instance Types — Choosing the right instance

EC2 offers multiple families tuned to different workload characteristics. Choosing the right instance family and size is one of the most important cost and performance decisions you will make.

General purpose

Balanced compute, memory and networking. Examples: t4g, m7i. Great for web servers and small-to-medium databases.

Compute optimized

High CPU-to-memory ratio. Examples: c6i, c7g. Use for batch processing, high-performance computing (HPC), or CPU-bound microservices.

Memory optimized

Large memory footprint for in-memory caches and databases. Examples: r7g, x2idn.

Storage optimized

High IOPS and throughput from local instance storage. Examples: i4i, d3. Suitable for NoSQL databases, data warehousing, and time-series stores.

Accelerated computing

GPU-enabled instances for ML training and graphics workloads: p5, g6.

Tip: When testing instance families, benchmark using realistic workloads. AWS Compute Optimizer and third-party tools help find underutilized resources and recommend resizing.

Pricing models

You can pay for instances under several pricing models: On-Demand, Reserved Instances, Spot Instances, and Savings Plans. Spot Instances offer large discounts but can be interrupted when EC2 needs the capacity back, with a two-minute warning.

3. AMIs (Amazon Machine Images) — Build once, reuse everywhere

An AMI is a template for launching instances: it captures an operating system, an optional application stack, and configuration, and is typically backed by one or more EBS snapshots. AMIs can be:

Published by AWS (official)
Marketplace or community images
Custom AMIs created from a configured instance

Use AMIs to standardize images across environments, remove manual configuration steps, and recover quickly after failures. When creating AMIs, bake as much as possible into the image (OS packages, agent software), and leave environment-specific configuration to startup scripts or user data supplied at launch.

Best practices for AMIs

Harden images: disable unused services, configure secure SSH, and remove default accounts.
Automate AMI creation: use Packer or EC2 Image Builder for reproducible images.
Tag AMIs with metadata for lifecycle management (owner, date, expiration).
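Tag-driven lifecycle management can be automated with a small filter. The sketch below assumes a hypothetical `Expiration` tag holding an ISO date (YYYY-MM-DD); that tag key and format are an illustrative convention, not an AWS standard. The input mirrors the `Images` list returned by boto3's `describe_images` (each image has an `ImageId` and an optional `Tags` list of `Key`/`Value` pairs).

```python
from datetime import date


def tag_value(resource: dict, key: str):
    """Return the value of tag `key` on an EC2-style resource dict, or None."""
    for tag in resource.get("Tags", []):
        if tag["Key"] == key:
            return tag["Value"]
    return None


def expired_amis(images: list[dict], today: date, tag_key: str = "Expiration") -> list[str]:
    """Return IDs of images whose (hypothetical) expiration tag is before `today`.

    Images without the tag, or with an unparseable date, are skipped rather than
    treated as expired, so a tagging mistake never marks an image for removal.
    """
    expired = []
    for image in images:
        raw = tag_value(image, tag_key)
        if raw is None:
            continue
        try:
            expires = date.fromisoformat(raw)
        except ValueError:
            continue
        if expires < today:
            expired.append(image["ImageId"])
    return expired


images = [
    {"ImageId": "ami-aaa", "Tags": [{"Key": "Expiration", "Value": "2024-01-31"}]},
    {"ImageId": "ami-bbb", "Tags": [{"Key": "Expiration", "Value": "2099-12-31"}]},
    {"ImageId": "ami-ccc"},  # untagged: ignored
]
print(expired_amis(images, today=date(2025, 6, 1)))  # ['ami-aaa']
```

In practice the list could come from `describe_images(Owners=['self'])`, and the returned IDs would be reviewed before calling `deregister_image`.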

4. Storage & Networking Options

EC2 integrates with multiple storage and networking services. Decisions here influence performance, resilience, and cost.

EBS — Elastic Block Store

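EBS provides durable, network-attached block volumes. Common volume types include gp3 (general purpose SSD, with IOPS and throughput provisioned independently of volume size), io2 (Provisioned IOPS SSD for latency-sensitive, I/O-intensive workloads), and st1 (throughput-optimized HDD).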

Instance store

Ephemeral, high-performance local disks useful for caches or temporary data. Data survives a reboot but is lost when the instance is stopped, hibernated, or terminated.

EFS & FSx

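Managed file systems that multiple instances can mount at the same time. EFS is a managed NFS file system for Linux workloads; FSx offers managed Windows File Server, Lustre, NetApp ONTAP, and OpenZFS file systems. Great for shared storage across web servers and containers.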

Networking

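EC2 instances run in a VPC. Use subnets, route tables, network ACLs (stateless, applied at the subnet level), and security groups (stateful, applied at the instance level) to control traffic. An Elastic IP is a static public IPv4 address that you can associate with an instance. Elastic Network Interfaces (ENIs) are virtual network cards that give an instance additional network interfaces and private IP addresses.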
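To make the VPC and subnet relationship concrete, here is a small sketch that carves a VPC CIDR into equally sized subnets, one per Availability Zone, using Python's standard `ipaddress` module. The 10.0.0.0/16 CIDR, the /20 subnet size, and the AZ names are example values. AWS reserves five addresses in every subnet (the first four and the last), so the usable count is the subnet size minus five.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # first four addresses and the last one in each subnet


def plan_subnets(vpc_cidr: str, new_prefix: int, azs: list[str]) -> dict[str, tuple[str, int]]:
    """Assign one /new_prefix subnet to each AZ, lowest addresses first.

    Returns {az: (subnet_cidr, usable_address_count)}.
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = vpc.subnets(new_prefix=new_prefix)  # generator of candidate subnets
    plan = {}
    for az in azs:
        try:
            subnet = next(subnets)
        except StopIteration:
            raise ValueError(
                f"{vpc_cidr} has too few /{new_prefix} subnets for {len(azs)} AZs"
            ) from None
        plan[az] = (str(subnet), subnet.num_addresses - AWS_RESERVED_PER_SUBNET)
    return plan


for az, (cidr, usable) in plan_subnets("10.0.0.0/16", 20, ["ap-south-1a", "ap-south-1b"]).items():
    print(az, cidr, usable)
# ap-south-1a 10.0.0.0/20 4091
# ap-south-1b 10.0.16.0/20 4091
```

Leaving most of the /16 unallocated, as here, keeps room for additional tiers (for example, private subnets) later.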

Load balancing

Elastic Load Balancing (Application, Network, and Gateway Load Balancers) distributes traffic across healthy instances using health checks; Application and Network Load Balancers also support sticky sessions and TLS termination.

Storage	Use case	Persistence
EBS	OS and persistent data	Durable
Instance store	Temp caches, local disk I/O	Ephemeral
EFS	Shared file storage	Durable

5. Pricing Models & Cost Optimization

Understanding pricing is essential. EC2 costs include instance hours, attached EBS volumes, snapshots, data transfer, and additional resources such as public IPv4 addresses (including Elastic IPs) and load balancers.

Pricing options

On-demand: Flexibility with no commitment.
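Reserved Instances: Lower hourly rate in exchange for a 1-year or 3-year commitment to a specific instance configuration.
Savings Plans: Lower rates in exchange for committing to a consistent amount of compute spend ($/hour) for 1 or 3 years; Compute Savings Plans apply across instance families, sizes, and Regions.
Spot: Up to 90% off On-Demand prices, but EC2 can reclaim the capacity with a two-minute warning.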
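The trade-off between these options is easiest to see with a little arithmetic. The sketch below compares one instance's monthly cost under each model. All hourly rates are hypothetical placeholders, not AWS prices, and the model is simplified by treating a commitment as covering exactly one instance; check real rates for your Region and instance type in the EC2 pricing pages or the AWS Pricing Calculator.

```python
HOURS_PER_MONTH = 730  # 8,760 hours per year / 12


def monthly_cost(hourly_rate: float, hours_used: float, committed: bool) -> float:
    """Committed options (Reserved Instances, Savings Plans) bill every hour of the term
    whether or not the instance runs; On-Demand and Spot bill only the hours used."""
    billed_hours = HOURS_PER_MONTH if committed else hours_used
    return round(hourly_rate * billed_hours, 2)


def break_even_utilization(committed_rate: float, on_demand_rate: float) -> float:
    """Fraction of the month an instance must run before the commitment beats On-Demand."""
    return committed_rate / on_demand_rate


# Hypothetical USD/hour rates, for illustration only.
ON_DEMAND, COMMITTED, SPOT = 0.10, 0.06, 0.03

for hours in (200, 730):  # a part-time dev box vs. an always-on server
    print(f"{hours:>3} h: on-demand={monthly_cost(ON_DEMAND, hours, False)}"
          f" committed={monthly_cost(COMMITTED, hours, True)}"
          f" spot={monthly_cost(SPOT, hours, False)}")
print(f"break-even utilization: {break_even_utilization(COMMITTED, ON_DEMAND):.0%}")
```

With these placeholder rates the commitment costs 43.8 per month no matter what, so it only wins above 60% utilization (438 of 730 hours): the part-time box is cheaper On-Demand, the always-on server is cheaper with a commitment, and Spot is cheapest in both cases if the workload tolerates interruption.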

Cost optimization strategies

Right-size instances: use CloudWatch and Compute Optimizer recommendations.
Use spot for stateless batch jobs and containers.
Turn off non-production instances (use schedules).
Migrate to Graviton (Arm) instances where your software is compatible, for better price-performance.
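Scheduling non-production instances usually comes down to a simple rule evaluated on a timer (for example, an EventBridge Scheduler schedule invoking a Lambda function, or the Instance Scheduler on AWS solution). The decision logic might look like the sketch below; the weekday business-hours window and the `Environment` tag values are assumptions for illustration.

```python
from datetime import datetime, time


def should_run(environment: str, now: datetime,
               start: time = time(8, 0), stop: time = time(20, 0)) -> bool:
    """Production always runs; other environments run only on weekdays within [start, stop).

    `now` is assumed to already be in the team's local time zone.
    """
    if environment.lower() == "production":
        return True
    is_weekday = now.weekday() < 5  # Monday=0 ... Friday=4
    return is_weekday and start <= now.time() < stop


print(should_run("dev", datetime(2025, 6, 2, 9, 30)))         # Monday morning: True
print(should_run("dev", datetime(2025, 6, 7, 9, 30)))         # Saturday: False
print(should_run("production", datetime(2025, 6, 7, 3, 0)))   # always: True
```

A scheduler would call this for each tagged instance and issue a start or stop only when the desired state differs from the current one.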

Tools: AWS Cost Explorer, Trusted Advisor, Compute Optimizer, and third-party cloud cost management platforms are helpful. Tag resources for cost allocation and tracking.

6. Common Use Cases

EC2 is used across a wide range of scenarios:

Web hosting and application servers
High-performance computing (HPC) and batch processing
Machine learning training and inference
Disaster recovery and backups
Development and test environments

Example: ML training pipeline

Use GPU instances for training, S3 for large datasets, and spot instances for distributed workers. Use EC2 Auto Scaling with mixed instances policies to tolerate interruptions.
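Interruption handling on a Spot worker typically starts with polling the instance metadata service. The `spot/instance-action` metadata path returns HTTP 404 until an interruption is scheduled, then a small JSON document with `action` and `time`. The sketch below uses IMDSv2 (a session token from `PUT /latest/api/token`) through the standard library; the polling interval and what the worker does on notice (checkpointing, draining) are left as assumptions for you to fill in. Only the parsing helper can run outside EC2.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime
from typing import Optional

IMDS = "http://169.254.169.254/latest"


def parse_instance_action(body: str) -> tuple[str, datetime]:
    """Parse e.g. {"action": "terminate", "time": "2025-06-02T08:22:00Z"} into (action, UTC time)."""
    doc = json.loads(body)
    when = datetime.strptime(doc["time"], "%Y-%m-%dT%H:%M:%SZ")
    return doc["action"], when


def check_spot_interruption(timeout: float = 2.0) -> Optional[tuple[str, datetime]]:
    """Return (action, time) if an interruption is scheduled, else None. Works only on EC2."""
    token_req = urllib.request.Request(
        f"{IMDS}/api/token", method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"})
    token = urllib.request.urlopen(token_req, timeout=timeout).read().decode()
    req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token})
    try:
        body = urllib.request.urlopen(req, timeout=timeout).read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no interruption scheduled yet
            return None
        raise
    return parse_instance_action(body)
```

A worker loop might call `check_spot_interruption()` every few seconds and, on a non-None result, stop taking new work and checkpoint within the two-minute notice window. If you prefer push-based handling, EventBridge also receives an "EC2 Spot Instance Interruption Warning" event.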

7. Scaling and High Availability

Combine Auto Scaling Groups (ASGs) with Elastic Load Balancers (ELB) to scale horizontally. Distribute instances across multiple Availability Zones (AZs) to reduce single-AZ failure impact.
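Target tracking scaling policies add or remove capacity to keep a metric near its target value. The exact algorithm is internal to EC2 Auto Scaling, so treat the following as a back-of-the-envelope approximation only: it estimates the capacity needed to bring average CPU down to a target, then spreads instances evenly across AZs the way a balanced group would. The CPU figures, group limits, and AZ names are example values.

```python
import math


def estimate_capacity(current: int, metric: float, target: float,
                      min_size: int, max_size: int) -> int:
    """Proportional estimate: instances needed so the per-instance metric lands near target,
    clamped to the group's min/max size."""
    needed = math.ceil(current * metric / target)
    return max(min_size, min(max_size, needed))


def spread_across_azs(capacity: int, azs: list[str]) -> dict[str, int]:
    """Distribute instances as evenly as possible; earlier AZs absorb any remainder."""
    base, extra = divmod(capacity, len(azs))
    return {az: base + (1 if i < extra else 0) for i, az in enumerate(azs)}


desired = estimate_capacity(current=4, metric=90.0, target=50.0, min_size=2, max_size=10)
print(desired)  # 8
print(spread_across_azs(desired, ["ap-south-1a", "ap-south-1b", "ap-south-1c"]))
# {'ap-south-1a': 3, 'ap-south-1b': 3, 'ap-south-1c': 2}
```

The estimate shows why a max size matters: a sudden spike can otherwise imply far more instances than your budget or account quotas allow.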

Placement groups

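Placement groups control how instances are placed on the underlying hardware, using one of three strategies: cluster (packs instances close together in one AZ for low-latency, high-throughput networking), spread (places each instance on distinct hardware to reduce correlated failures), and partition (divides instances into partitions that do not share racks, suited to large distributed workloads).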

Multi-region architectures

For global resiliency, replicate across Regions. Use Route 53 failover routing with health checks to fail over between Regions, and latency-based routing to steer users to the Region that gives them the lowest latency.

8. Security and Access Control

Security is a shared responsibility: AWS secures the underlying infrastructure, and you are responsible for hardening the guest operating system, network configuration, and applications.

Key elements

IAM roles: Grant EC2 instances only the permissions they need via instance profiles.
Security Groups: Instance-level firewall for traffic control.
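Key pairs & Session Manager: Use SSH key pairs for Linux (on Windows, the key pair decrypts the initial administrator password), and use Systems Manager Session Manager for access without bastion hosts or open inbound ports.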
Encryption: Use EBS encryption at rest and TLS in transit.
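Security group hygiene can be checked programmatically. The sketch below flags ingress rules that expose a sensitive port (SSH 22 and RDP 3389 by default, an assumption you would tune) to the whole internet. The input mirrors the `SecurityGroups` list from boto3's `describe_security_groups`, where each `IpPermissions` entry has `IpProtocol`, optional `FromPort`/`ToPort`, and `IpRanges`/`Ipv6Ranges`.

```python
OPEN_V4, OPEN_V6 = "0.0.0.0/0", "::/0"


def rule_covers_port(perm: dict, port: int) -> bool:
    """True if an IpPermissions entry allows `port` over TCP ("-1" means all traffic)."""
    if perm.get("IpProtocol") == "-1":
        return True
    if perm.get("IpProtocol") not in ("tcp", "6"):
        return False
    return perm.get("FromPort", 0) <= port <= perm.get("ToPort", 65535)


def world_open_findings(groups: list[dict], ports=(22, 3389)) -> list[tuple[str, int, str]]:
    """Return (GroupId, port, cidr) for each sensitive port open to the internet."""
    findings = []
    for group in groups:
        for perm in group.get("IpPermissions", []):
            cidrs = [r["CidrIp"] for r in perm.get("IpRanges", [])]
            cidrs += [r["CidrIpv6"] for r in perm.get("Ipv6Ranges", [])]
            for cidr in cidrs:
                if cidr not in (OPEN_V4, OPEN_V6):
                    continue
                for port in ports:
                    if rule_covers_port(perm, port):
                        findings.append((group["GroupId"], port, cidr))
    return findings


groups = [
    {"GroupId": "sg-web", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443, "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22, "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    ]},
    {"GroupId": "sg-app", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22, "IpRanges": [{"CidrIp": "10.0.0.0/16"}]},
    ]},
]
print(world_open_findings(groups))  # [('sg-web', 22, '0.0.0.0/0')]
```

Port 443 open to the world is expected for a web tier, which is why the check targets administrative ports rather than every open rule.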

Patch management

Use AWS Systems Manager (SSM) Patch Manager to automate OS patching and ensure consistent baseline across instances.

9. Troubleshooting & Best Practices

The table below lists common EC2 issues, likely causes, and recommended resolutions.

Issue	Possible Cause	Resolution
Instance not starting	Insufficient capacity or AMI issue	Try different AZ or instance type; verify AMI integrity
SSH connection failure	Security group or key pair misconfiguration	Ensure port 22 is allowed; verify correct key; use Session Manager
High latency	Unbalanced load or noisy neighbor	Use ELB, move to placement group, or change instance family
Billing spike	Unattached EBS or unexpected instances	Enable cost alerts, use tags, terminate idle resources
EBS volume not mounting	Device name mismatch (on Nitro instances, EBS volumes appear as /dev/nvme*n1) or fstab error	Run lsblk to find the device, check /etc/fstab, then reattach and mount manually

Troubleshooting checklist

Check the instance state, status checks, and system log (console output) in the EC2 console.
Validate security group and network ACL rules.
Confirm IAM role and SSM agent status for agent-based access.
Inspect CloudWatch metrics for CPU, network, and disk usage.
Use AWS Reachability Analyzer for network path issues.
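Status checks are a quick first triage step. This sketch summarizes data shaped like boto3's `describe_instance_status` output (each entry has `InstanceId`, `InstanceState`, `SystemStatus`, and `InstanceStatus`, the latter two with a `Status` such as `ok`, `impaired`, or `initializing`) and suggests a next step. The hints paraphrase common guidance (a failed system check points at the underlying host, and stopping and starting an EBS-backed instance moves it to new hardware; a failed instance check points at the guest OS); they are not an AWS-defined mapping.

```python
def triage(statuses: list[dict]) -> dict[str, str]:
    """Map each instance ID to a suggested next step based on its state and status checks."""
    advice = {}
    for s in statuses:
        state = s["InstanceState"]["Name"]
        system = s["SystemStatus"]["Status"]
        instance = s["InstanceStatus"]["Status"]
        if state != "running":
            hint = f"instance is {state}: status checks apply only to running instances"
        elif system == "impaired":
            hint = "host problem: stop and start (EBS-backed) to move to new hardware"
        elif instance == "impaired":
            hint = "guest OS problem: check console output, /etc/fstab, and network config"
        elif "initializing" in (system, instance):
            hint = "still initializing: wait and re-check"
        else:
            hint = "checks pass: look at security groups, NACLs, routes, and the application"
        advice[s["InstanceId"]] = hint
    return advice


statuses = [
    {"InstanceId": "i-aaa", "InstanceState": {"Name": "running"},
     "SystemStatus": {"Status": "impaired"}, "InstanceStatus": {"Status": "ok"}},
    {"InstanceId": "i-bbb", "InstanceState": {"Name": "running"},
     "SystemStatus": {"Status": "ok"}, "InstanceStatus": {"Status": "impaired"}},
    {"InstanceId": "i-ccc", "InstanceState": {"Name": "stopped"},
     "SystemStatus": {"Status": "not-applicable"}, "InstanceStatus": {"Status": "not-applicable"}},
]
for instance_id, hint in triage(statuses).items():
    print(instance_id, hint)
```

Live data could come from `ec2.describe_instance_status(IncludeAllInstances=True)['InstanceStatuses']`; without `IncludeAllInstances`, stopped instances are omitted.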

10. PowerShell, AWS CLI and SDK Examples for Troubleshooting

Below are practical scripts and commands for operations and troubleshooting. Review each one before running it and adapt the Region, IDs, and paths: some commands (such as stopping an instance or formatting a volume) change state or destroy data.

AWS Tools for PowerShell — List instances

Import-Module AWSPowerShell.NetCore
# Store credentials once in a profile (prefer an IAM role or SSO over long-lived keys)
Set-AWSCredential -AccessKey '<YOUR_KEY>' -SecretKey '<YOUR_SECRET>' -StoreAs default
# Get-EC2Instance returns reservations; expand .Instances to get one row per instance
(Get-EC2Instance -Region ap-south-1).Instances | Select-Object InstanceId, @{Name='State';Expression={$_.State.Name}}, InstanceType | Format-Table -AutoSize

PowerShell — Check instance reachability and retrieve system logs

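# Get console output for an instance (useful for boot issues); the Output property is Base64-encoded
$instanceId = 'i-0123456789abcdef0'
$console = Get-EC2ConsoleOutput -InstanceId $instanceId -Region ap-south-1
[System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String($console.Output)) | Out-File -FilePath .\console-output.txt -Encoding utf8

# Check system and instance status checks (the cmdlet returns InstanceStatus objects directly)
Get-EC2InstanceStatus -InstanceId $instanceId -Region ap-south-1 | Select-Object InstanceId, @{Name='SystemCheck';Expression={$_.SystemStatus.Status}}, @{Name='InstanceCheck';Expression={$_.Status.Status}} | Format-List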

AWS CLI — Common quick checks

# Describe instances
aws ec2 describe-instances --region ap-south-1 --filters Name=instance-state-name,Values=running

# Describe volumes
aws ec2 describe-volumes --region ap-south-1 --filters Name=attachment.instance-id,Values=i-0123456789abcdef0

# Get system logs (console output)
aws ec2 get-console-output --instance-id i-0123456789abcdef0 --region ap-south-1 --output text

Mounting an EBS volume (Linux) — common steps

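# Identify the device first; on Nitro instances EBS volumes appear as /dev/nvme1n1, etc.
lsblk
# Assume the volume is attached as /dev/xvdf (example)
sudo file -s /dev/xvdf          # output "data" means the volume has no filesystem
sudo mkfs -t ext4 /dev/xvdf     # ONLY if the volume has no filesystem: this erases all data
sudo mkdir -p /mnt/data
sudo mount /dev/xvdf /mnt/data
# Persist across reboots with an /etc/fstab entry keyed by UUID, e.g.
#   UUID=<uuid-from-blkid>  /mnt/data  ext4  defaults,nofail  0  2
sudo blkid /dev/xvdf
sudo mount -a                   # surfaces fstab errors now rather than at the next boot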
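Because a malformed /etc/fstab entry can stop an instance from booting, it helps to build the line mechanically from `blkid` output. A minimal sketch, assuming blkid's default `DEVICE: KEY="value" ...` output format; the `defaults,nofail` options follow the common recommendation for EBS data volumes so the instance still boots if the volume is detached. The UUID below is an example value.

```python
import shlex


def fstab_entry(blkid_line: str, mount_point: str, options: str = "defaults,nofail") -> str:
    """Turn one line of `blkid` output into an fstab entry keyed by UUID."""
    device, _, attrs = blkid_line.partition(":")
    # shlex strips the quotes: 'UUID="abc"' -> 'UUID=abc'
    fields = dict(token.split("=", 1) for token in shlex.split(attrs))
    if "UUID" not in fields or "TYPE" not in fields:
        raise ValueError(f"{device.strip()} has no filesystem UUID/TYPE; format it first")
    # dump=0 (no dump backups); pass=2 (fsck after the root filesystem)
    return f"UUID={fields['UUID']}  {mount_point}  {fields['TYPE']}  {options}  0  2"


line = '/dev/xvdf: UUID="aebf131c-6957-451e-8d34-ec978d9581ae" BLOCK_SIZE="4096" TYPE="ext4"'
print(fstab_entry(line, "/mnt/data"))
# UUID=aebf131c-6957-451e-8d34-ec978d9581ae  /mnt/data  ext4  defaults,nofail  0  2
```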

Using Systems Manager Session Manager for safe access (no SSH key required)

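# Start an interactive shell session (requires the Session Manager plugin for the AWS CLI)
aws ssm start-session --target i-0123456789abcdef0 --region ap-south-1
# Or run a single command interactively via the AWS-StartInteractiveCommand document
aws ssm start-session --target i-0123456789abcdef0 --document-name AWS-StartInteractiveCommand --parameters command="uptime" --region ap-south-1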

Programmatic example — Boto3 snippet to stop an instance (Python)

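import boto3

ec2 = boto3.client('ec2', region_name='ap-south-1')
instance_id = 'i-0123456789abcdef0'  # example ID
resp = ec2.stop_instances(InstanceIds=[instance_id])
print(resp['StoppingInstances'][0]['CurrentState']['Name'])  # typically "stopping"
# Optionally block until the instance is fully stopped
ec2.get_waiter('instance_stopped').wait(InstanceIds=[instance_id])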

Graph API (Azure) note — when managing hybrid identities

While AWS has its own APIs and SDKs, many enterprises integrate EC2 hosts with Microsoft Entra ID (formerly Azure AD) or other Microsoft tooling for identity and monitoring. The following Microsoft Graph request retrieves device objects, which is useful if your EC2 Windows instances are joined or hybrid-joined to Entra ID; the caller needs the Device.Read.All permission:

GET https://graph.microsoft.com/v1.0/devices?$filter=operatingSystem eq 'Windows'&$select=id,displayName,operatingSystem
Authorization: Bearer <ACCESS_TOKEN>

Use Microsoft Graph SDKs to automate inventory or compliance reports when EC2 instances are hybrid-joined.
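Microsoft Graph returns large collections in pages, with an `@odata.nextLink` URL pointing at the next page. The SDKs handle this for you; if you call the REST endpoint directly, a loop like the sketch below collects every page. The HTTP fetch is injected as a function so the paging logic can be tested without a token; obtaining the access token (for example with MSAL) is assumed and out of scope here.

```python
import json
import urllib.request
from typing import Callable, Iterator


def graph_get(url: str, token: str) -> dict:
    """GET a Microsoft Graph URL with a bearer token and decode the JSON body."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def iter_graph_items(first_url: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield items from every page, following @odata.nextLink until it is absent."""
    url = first_url
    while url:
        page = fetch(url)
        yield from page.get("value", [])
        url = page.get("@odata.nextLink")


# Usage (requires a real access token in `token`):
# url = ("https://graph.microsoft.com/v1.0/devices"
#        "?$filter=operatingSystem%20eq%20'Windows'&$select=id,displayName,operatingSystem")
# for device in iter_graph_items(url, lambda u: graph_get(u, token)):
#     print(device["displayName"])
```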

11. Operations — Patching, Backup, and CI/CD

Operational hygiene keeps instances secure and recoverable.

Patching

Use AWS Systems Manager Patch Manager to define patch baselines and schedule automated maintenance windows. Keep track of critical CVEs and remediate with automation documents.

Backups and snapshots

Regularly snapshot EBS volumes and copy snapshots across Regions for disaster recovery. Use Amazon Data Lifecycle Manager policies (or AWS Backup) to automate snapshot creation and prune old snapshots to reduce cost.
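Managed lifecycle policies enforce retention for you, but the underlying rule is simple enough to sketch. Assuming input shaped like boto3's `describe_snapshots` output (`SnapshotId`, `VolumeId`, and a timezone-aware `StartTime`), the function below selects snapshots older than a retention window while always keeping the newest N per volume. The 30-day window and keep-3 floor are example policy values.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def snapshots_to_prune(snapshots: list[dict], now: datetime,
                       retain_days: int = 30, keep_latest: int = 3) -> list[str]:
    """Return snapshot IDs older than `retain_days`, never touching the newest
    `keep_latest` snapshots of any volume."""
    by_volume = defaultdict(list)
    for snap in snapshots:
        by_volume[snap["VolumeId"]].append(snap)
    cutoff = now - timedelta(days=retain_days)
    prune = []
    for snaps in by_volume.values():
        snaps.sort(key=lambda s: s["StartTime"], reverse=True)  # newest first
        for snap in snaps[keep_latest:]:
            if snap["StartTime"] < cutoff:
                prune.append(snap["SnapshotId"])
    return prune


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
snaps = [{"SnapshotId": f"snap-{d}", "VolumeId": "vol-1", "StartTime": now - timedelta(days=d)}
         for d in (1, 10, 40, 50, 60)]
print(snapshots_to_prune(snaps, now))  # ['snap-50', 'snap-60']
```

Note that snap-40 is past the window but survives because of the keep-latest floor. Before deleting, remember that a snapshot backing a registered AMI cannot be deleted until that AMI is deregistered.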

CI/CD integration

Use CodeDeploy, CodePipeline, or open-source tools (Jenkins, GitLab CI) to deploy applications onto EC2 using immutable AMI-based deploys or blue/green strategies behind an ALB.

12. Monitoring, Logging & Observability

Visibility is key. Use CloudWatch to collect metrics (CPU, memory via agent, disk, network), logs, and alarms. Centralize logs in CloudWatch Logs or push to third-party systems (Elastic, Splunk).

Recommended metrics

CPUUtilization
NetworkIn / NetworkOut
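DiskReadOps / DiskWriteOps (instance store volumes only; for EBS volumes on Nitro-based instances, use EBSReadOps / EBSWriteOps)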
StatusCheckFailed

Creating an alarm (AWS CLI example)

aws cloudwatch put-metric-alarm --alarm-name HighCPU --metric-name CPUUtilization --namespace AWS/EC2 --statistic Average --period 300 --threshold 80 --comparison-operator GreaterThanThreshold --dimensions Name=InstanceId,Value=i-0123456789abcdef0 --evaluation-periods 2 --alarm-actions arn:aws:sns:ap-south-1:123456789012:NotifyMe
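It helps to know how that alarm evaluates. With `--period 300 --evaluation-periods 2` and no `--datapoints-to-alarm`, all of the last two 5-minute averages must exceed 80% before the alarm enters ALARM, because the number of breaching datapoints required defaults to the number of evaluation periods. The simplified sketch below reproduces that M-of-N rule for a list of datapoints; it deliberately ignores missing-data handling, which CloudWatch configures separately with `--treat-missing-data`.

```python
from typing import Optional


def alarm_state(datapoints: list[float], threshold: float,
                evaluation_periods: int, datapoints_to_alarm: Optional[int] = None) -> str:
    """Evaluate the most recent `evaluation_periods` datapoints with GreaterThanThreshold;
    ALARM if at least `datapoints_to_alarm` of them breach (default: all of them)."""
    m = datapoints_to_alarm or evaluation_periods
    window = datapoints[-evaluation_periods:]
    if len(window) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    breaching = sum(1 for value in window if value > threshold)
    return "ALARM" if breaching >= m else "OK"


# Five-minute CPU averages, oldest first (example data).
cpu = [45.0, 85.0, 70.0, 88.0, 91.0]
print(alarm_state(cpu[:3], threshold=80, evaluation_periods=2))  # OK: 85 then 70
print(alarm_state(cpu, threshold=80, evaluation_periods=2))      # ALARM: 88 and 91
```

Requiring consecutive breaches like this filters out single short spikes, at the cost of alerting up to one extra period later.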

13. Governance and Compliance

Apply tagging conventions, enforce service control policies (SCPs) if you use AWS Organizations, and use IAM permissions boundaries to limit privileged operations. For regulated workloads, enable encryption, centralize logs, and use AWS Config to detect compliance drift.

Tagging strategy

Use tags such as Owner, Environment, CostCenter, and Project to enable cost allocation and automated lifecycle policies.
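A tagging convention is only useful if it is enforced. AWS Organizations tag policies and the AWS Config `required-tags` managed rule can do this centrally; the sketch below shows the same check in plain Python against the `Tags` list format that EC2 APIs return. The required keys come from the convention above; the allowed `Environment` values are a hypothetical example.

```python
REQUIRED_TAGS = ("Owner", "Environment", "CostCenter", "Project")
ALLOWED_ENVIRONMENTS = {"production", "staging", "dev"}  # hypothetical convention


def tag_violations(resource: dict) -> list[str]:
    """List problems with a resource's tags: missing or empty required keys,
    or an Environment value outside the allowed set."""
    tags = {t["Key"]: t["Value"] for t in resource.get("Tags", [])}
    problems = [f"missing {key}" for key in REQUIRED_TAGS if not tags.get(key)]
    env = tags.get("Environment")
    if env and env not in ALLOWED_ENVIRONMENTS:
        problems.append(f"unexpected Environment '{env}'")
    return problems


instance = {"InstanceId": "i-0123456789abcdef0",
            "Tags": [{"Key": "Owner", "Value": "web-team"},
                     {"Key": "Environment", "Value": "prod"}]}
print(tag_violations(instance))
# ['missing CostCenter', 'missing Project', "unexpected Environment 'prod'"]
```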

14. Real-world Example — Scaling a Web Application

Scenario: A web application experiences periodic traffic spikes. Use the following architecture:

Place web servers in an Auto Scaling Group across two AZs behind an Application Load Balancer.
Serve static assets from S3 + CloudFront to reduce compute load.
Use RDS for relational storage with read replicas for scaling reads.
Use spot instances for background workers (stateless) and configure interruption handling.

This architecture delivers high availability, lower latency, and cost efficiency while retaining operational simplicity.

15. Frequently Asked Questions (FAQ)

How do I choose between EBS and instance store?

Choose EBS for persistence and snapshots. Use instance store for ephemeral workloads needing very high IOPS and low latency.

When should I use spot instances?

Spot is ideal for fault-tolerant, stateless workloads like batch jobs, CI runners, and scalable ML worker fleets.

What is the recommended way to manage SSH access?

Use IAM roles and Systems Manager Session Manager for centralized, auditable, bastionless access; avoid managing SSH keys manually at scale.

16. EC2 Deployment Checklist

Choose the right instance family and size
Create hardened AMIs and automate builds
Use Security Groups and IAM roles with least privilege
Enable CloudWatch monitoring and alarms
Implement backups and snapshot lifecycle policies
Tag resources for cost and governance
Use Auto Scaling for availability and performance

17. Further Reading & Resources

Recommended docs and tools to deepen your knowledge: AWS official docs, Compute Optimizer, Cost Explorer, and community guides. For more articles and hands-on tutorials, visit CloudKnowledge.