Compute Services: Amazon EC2 (Elastic Compute Cloud) — Complete Guide
What you'll learn
- Core EC2 concepts: instances, AMIs, storage, networking, pricing
- Security & access control best practices
- Scaling, high availability, and fault tolerance
- Cost optimization strategies & tools
- Troubleshooting checklist with PowerShell and AWS CLI/SDK snippets
1. Introduction to Amazon EC2
Amazon Elastic Compute Cloud (EC2) is a foundational compute service from AWS that provides scalable virtual servers in the cloud. Companies of all sizes use EC2 to run application servers, web workloads, batch processing jobs, container hosts, and GPU-based ML training.
EC2 provides flexible selection of operating systems, CPU architectures, networking and storage configurations, and instance lifecycle controls. It acts as a building block for many higher-level AWS services and integrates tightly with AWS networking, storage, and monitoring services.
Why EC2 matters
- Instant provisioning of compute resources
- Predictable performance when sized correctly
- Choice of pricing models for cost control
Who uses EC2?
- Developers building backend services
- Data scientists training ML models
- Enterprises running legacy workloads
- DevOps teams managing CI/CD runners
2. EC2 Instance Types — Choosing the right instance
EC2 offers multiple families tuned to different workload characteristics. Choosing the right instance family and size is one of the most important cost and performance decisions you will make.
General purpose
Balanced compute, memory and networking. Examples: t4g, m7i. Great for web servers and small-to-medium databases.
Compute optimized
High CPU-to-memory ratio. Examples: c6i, c7g. Use for batch processing, high-performance computing (HPC), or CPU-bound microservices.
Memory optimized
Large memory footprint for in-memory caches and databases. Examples: r7g, x2idn.
Storage optimized
High local-storage IOPS and throughput. Examples: i4i (NVMe SSD), d3 (dense HDD). Suitable for NoSQL databases, data warehousing, and time-series stores.
Accelerated computing
GPU-enabled instances for ML training and graphics workloads: p5, g6.
Pricing models
Each instance can be acquired using different pricing models: on-demand, reserved, spot, and savings plans. Spot instances provide large discounts but can be interrupted.
3. AMIs (Amazon Machine Images) — Build once, reuse everywhere
An AMI is a template that contains an operating system, an optional application stack, and the configuration needed to launch instances; EBS-backed AMIs store their volumes as snapshots. AMIs can be:
- Published by AWS (official)
- Marketplace or community images
- Custom AMIs created from a configured instance
Use AMIs to standardize images across environments, remove manual configuration steps, and recover quickly after failures. When creating AMIs, bake as much as possible into the image (OS packages, agent software), and leave environment-specific configuration to instance startup scripts or user data.
Best practices for AMIs
- Harden images: disable unused services, configure secure SSH, and remove default accounts.
- Automate AMI creation: use Packer or EC2 Image Builder for reproducible images.
- Tag AMIs with metadata for lifecycle management (owner, date, expiration).
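The expiration tag mentioned above can drive automated cleanup. The following is a minimal sketch, assuming a hypothetical tag named `ExpiresOn` holding a `YYYY-MM-DD` date; the input is shaped like the `Images` list returned by the EC2 DescribeImages API, and the sample data is invented for illustration.

```python
from datetime import date


def expired_amis(images, today, tag_key="ExpiresOn"):
    """Return the ImageIds whose expiration tag is a date earlier than `today`.

    The tag name "ExpiresOn" and its YYYY-MM-DD format are assumptions
    made for this example, not an AWS convention.
    """
    expired = []
    for image in images:
        tags = {t["Key"]: t["Value"] for t in image.get("Tags", [])}
        raw = tags.get(tag_key)
        if raw is None:
            continue  # untagged images are skipped, not treated as expired
        if date.fromisoformat(raw) < today:
            expired.append(image["ImageId"])
    return expired


# Invented sample data in the shape of a DescribeImages response
sample_images = [
    {"ImageId": "ami-aaa", "Tags": [{"Key": "Owner", "Value": "web"},
                                    {"Key": "ExpiresOn", "Value": "2024-01-31"}]},
    {"ImageId": "ami-bbb", "Tags": [{"Key": "ExpiresOn", "Value": "2030-01-01"}]},
    {"ImageId": "ami-ccc"},
]
print(expired_amis(sample_images, today=date(2024, 6, 1)))  # ['ami-aaa']
```

In practice the list could come from `boto3.client('ec2').describe_images(Owners=['self'])['Images']`; review the result before deregistering anything.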
4. Storage & Networking Options
EC2 integrates with multiple storage and networking services. Decisions here influence performance, resilience, and cost.
EBS — Elastic Block Store
EBS provides durable, network-attached block volumes. Common volume types include gp3 (general purpose SSD), io2 (provisioned IOPS SSD for high-performance workloads), and st1 (throughput-optimized HDD).
Instance store
Ephemeral, high-performance local disks useful for caches or temporary data. Data is lost when the instance stops, hibernates, or terminates, but survives a reboot.
EFS & FSx
Managed file systems that can be mounted by multiple instances. Great for shared storage across web servers and containers.
Networking
EC2 instances run in a VPC. Use subnets, route tables, Network ACLs, and Security Groups to secure traffic. Elastic IPs map a static public IP to an instance. Elastic Network Interfaces (ENIs) add network cards and IPs to instances.
Load balancing
Elastic Load Balancer (Application, Network, and Gateway) distributes traffic across instances and supports health checks, sticky sessions, and TLS termination.
| Storage | Use case | Persistence |
|---|---|---|
| EBS | OS and persistent data | Durable |
| Instance store | Temp caches, local disk I/O | Ephemeral |
| EFS | Shared file storage | Durable |
5. Pricing Models & Cost Optimization
Understanding pricing is essential. EC2 costs include instance hours, attached EBS volumes, snapshots, data transfer, and additional services like Elastic IPs and Load Balancers.
Pricing options
- On-demand: Flexibility with no commitment.
- Reserved Instances: Lower hourly rate in exchange for a 1-year or 3-year commitment.
- Savings Plans: Flexibility in instance families for a usage commitment.
- Spot: Up to 90% off on-demand prices, but AWS can reclaim instances with a two-minute warning.
Cost optimization strategies
- Right-size instances: use CloudWatch and Compute Optimizer recommendations.
- Use spot for stateless batch jobs and containers.
- Turn off non-production instances (use schedules).
- Migrate to Graviton (ARM) instances when compatible for better price/perf.
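To make the scheduling strategy concrete, here is a small arithmetic sketch of what turning off a non-production instance outside business hours saves. The hourly rate of 0.10 is a hypothetical figure, not a real EC2 price, and the 12-hours-a-day weekday schedule is an assumption.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours in a week


def weekly_cost(hourly_rate, hours_running):
    """Instance-hour cost for one week, ignoring storage and data transfer."""
    return hourly_rate * hours_running


def schedule_savings(hourly_rate, hours_on_per_week):
    """Fraction of the always-on cost saved by running only part of the week."""
    always_on = weekly_cost(hourly_rate, HOURS_PER_WEEK)
    scheduled = weekly_cost(hourly_rate, hours_on_per_week)
    return (always_on - scheduled) / always_on


rate = 0.10               # hypothetical on-demand rate per hour
business_hours = 5 * 12   # assumed schedule: 12 hours a day, Monday to Friday

print(round(weekly_cost(rate, HOURS_PER_WEEK), 2))       # 16.8
print(round(weekly_cost(rate, business_hours), 2))       # 6.0
print(round(schedule_savings(rate, business_hours), 3))  # 0.643
```

The savings fraction depends only on the schedule, not on the rate: running 60 of 168 hours saves roughly 64% of instance-hour charges. Attached EBS volumes continue to bill while the instance is stopped.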
6. Common Use Cases
EC2 is used across a wide range of scenarios:
- Web hosting and application servers
- High-performance computing (HPC) and batch processing
- Machine learning training and inference
- Disaster recovery and backups
- Development and test environments
Example: ML training pipeline
Use GPU instances for training, S3 for large datasets, and spot instances for distributed workers. Use EC2 Auto Scaling with mixed instances policies to tolerate interruptions.
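As a rough sketch of the mixed instances policy mentioned above, the function below builds the `MixedInstancesPolicy` argument accepted by the Auto Scaling `create_auto_scaling_group` API. The launch template name `ml-worker-template`, the instance sizes, and the on-demand split are hypothetical values chosen for illustration.

```python
def mixed_instances_policy(launch_template_name, instance_types,
                           on_demand_base=1, on_demand_pct_above_base=0):
    """Build a MixedInstancesPolicy dict for create_auto_scaling_group.

    on_demand_base instances always run on-demand; the remainder is split
    between on-demand and Spot according to on_demand_pct_above_base.
    """
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": launch_template_name,
                "Version": "$Latest",
            },
            # Several instance types give Spot more capacity pools to draw from
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
        "InstancesDistribution": {
            "OnDemandBaseCapacity": on_demand_base,
            "OnDemandPercentageAboveBaseCapacity": on_demand_pct_above_base,
            "SpotAllocationStrategy": "price-capacity-optimized",
        },
    }


# Hypothetical template name and instance sizes
policy = mixed_instances_policy("ml-worker-template", ["g6.xlarge", "g6.2xlarge"])
print(policy["InstancesDistribution"])
```

The resulting dict would be passed as `MixedInstancesPolicy=policy` to `boto3.client('autoscaling').create_auto_scaling_group(...)` together with the group name, size limits, and subnets.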
7. Scaling and High Availability
Combine Auto Scaling Groups (ASGs) with Elastic Load Balancers (ELB) to scale horizontally. Distribute instances across multiple Availability Zones (AZs) to reduce single-AZ failure impact.
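One common way to drive horizontal scaling in an ASG is a target tracking policy. The sketch below only builds the keyword arguments for the Auto Scaling `put_scaling_policy` API; the group name `web-asg`, the policy naming scheme, and the 50% CPU target are assumptions for illustration.

```python
def cpu_target_tracking_policy(asg_name, target_cpu_percent=50.0):
    """Build keyword arguments for put_scaling_policy.

    The policy asks Auto Scaling to add or remove instances so that the
    group's average CPU utilization stays near target_cpu_percent.
    """
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-cpu-target",  # hypothetical naming scheme
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            "TargetValue": float(target_cpu_percent),
        },
    }


kwargs = cpu_target_tracking_policy("web-asg", target_cpu_percent=50)
print(kwargs["PolicyName"], kwargs["TargetTrackingConfiguration"]["TargetValue"])
# web-asg-cpu-target 50.0
```

To apply it, the dict would be unpacked into `boto3.client('autoscaling').put_scaling_policy(**kwargs)`.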
Placement groups
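Placement groups control how instances are placed on the underlying hardware. Three strategies are available: cluster (instances packed close together for low-latency networking), spread (each instance on distinct hardware), and partition (groups of instances isolated on separate racks).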
Multi-region architectures
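For global resiliency, replicate across regions. Use Route 53 latency-based routing to steer users to the closest region, and failover routing with health checks to shift traffic away from a region that becomes unavailable.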
8. Security and Access Control
Security is a shared responsibility. AWS protects the infrastructure; you manage OS, network, and application hardening.
Key elements
- IAM roles: Grant EC2 instances only the permissions they need via instance profiles.
- Security Groups: Instance-level firewall for traffic control.
- Key pairs & session manager: Use SSH keys for Linux and session manager for bastionless access.
- Encryption: Use EBS encryption at rest and TLS in transit.
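Security group rules that expose SSH to the whole internet are a frequent finding. Here is a minimal audit sketch that scans data shaped like the `SecurityGroups` list from the EC2 DescribeSecurityGroups API; the group IDs and rules in the sample are invented.

```python
def open_to_world(security_groups, port=22):
    """Return (GroupId, cidr) pairs where `port` is reachable from any address."""
    findings = []
    for group in security_groups:
        for perm in group.get("IpPermissions", []):
            protocol = perm.get("IpProtocol")
            if protocol == "-1":
                covers_port = True  # "-1" means all protocols and all ports
            elif protocol == "tcp":
                covers_port = perm.get("FromPort", 0) <= port <= perm.get("ToPort", 65535)
            else:
                covers_port = False
            if not covers_port:
                continue
            cidrs = [r["CidrIp"] for r in perm.get("IpRanges", [])]
            cidrs += [r["CidrIpv6"] for r in perm.get("Ipv6Ranges", [])]
            for cidr in cidrs:
                if cidr in ("0.0.0.0/0", "::/0"):
                    findings.append((group["GroupId"], cidr))
    return findings


# Invented sample data in the shape of a DescribeSecurityGroups response
sample_groups = [
    {"GroupId": "sg-111", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]},
    {"GroupId": "sg-222", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]},
    {"GroupId": "sg-333", "IpPermissions": [
        {"IpProtocol": "-1", "Ipv6Ranges": [{"CidrIpv6": "::/0"}]}]},
]
print(open_to_world(sample_groups))
# [('sg-111', '0.0.0.0/0'), ('sg-333', '::/0')]
```

The real input could come from `boto3.client('ec2').describe_security_groups()['SecurityGroups']`. Groups flagged here are candidates for tighter CIDR ranges or for replacing SSH with Session Manager.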
Patch management
Use AWS Systems Manager (SSM) Patch Manager to automate OS patching and ensure consistent baseline across instances.
9. Troubleshooting & Best Practices
The table below lists common EC2 issues, likely causes, and recommended resolutions.
| Issue | Possible Cause | Resolution |
|---|---|---|
| Instance not starting | Insufficient capacity or AMI issue | Try different AZ or instance type; verify AMI integrity |
| SSH connection failure | Security group or key pair misconfiguration | Ensure port 22 is allowed; verify correct key; use Session Manager |
| High latency | Unbalanced load or noisy neighbor | Use ELB, move to placement group, or change instance family |
| Billing spike | Unattached EBS or unexpected instances | Enable cost alerts, use tags, terminate idle resources |
| EBS volume not mounting | Device name mismatch or fstab error | Check /dev/ mapping and /etc/fstab; reattach and mount manually |
Troubleshooting checklist
- Check EC2 instance state in the console and system logs.
- Validate security group and network ACL rules.
- Confirm IAM role and SSM agent status for agent-based access.
- Inspect CloudWatch metrics for CPU, network, and disk usage.
- Use AWS Reachability Analyzer for network path issues.
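The first checklist item can be partly automated. The following sketch summarizes which status checks are impaired, given data shaped like an EC2 DescribeInstanceStatus response; the instance IDs are invented.

```python
def impaired_checks(response):
    """Map InstanceId -> names of status checks reported as 'impaired'."""
    report = {}
    for status in response.get("InstanceStatuses", []):
        impaired = [
            check for check in ("SystemStatus", "InstanceStatus")
            if status.get(check, {}).get("Status") == "impaired"
        ]
        if impaired:
            report[status["InstanceId"]] = impaired
    return report


# Invented sample data in the shape of a DescribeInstanceStatus response
sample_response = {
    "InstanceStatuses": [
        {"InstanceId": "i-0aaa", "SystemStatus": {"Status": "ok"},
         "InstanceStatus": {"Status": "ok"}},
        {"InstanceId": "i-0bbb", "SystemStatus": {"Status": "ok"},
         "InstanceStatus": {"Status": "impaired"}},
        {"InstanceId": "i-0ccc", "SystemStatus": {"Status": "impaired"},
         "InstanceStatus": {"Status": "impaired"}},
    ]
}
print(impaired_checks(sample_response))
# {'i-0bbb': ['InstanceStatus'], 'i-0ccc': ['SystemStatus', 'InstanceStatus']}
```

As a rule of thumb, an impaired `SystemStatus` points at the underlying host (a stop and start usually moves the instance to new hardware), while an impaired `InstanceStatus` points at the operating system or its configuration, where the console output is the next place to look.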
10. PowerShell, AWS CLI and SDK Examples for Troubleshooting
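Below are practical scripts and commands you can use during operations and troubleshooting. Replace the example instance ID (i-0123456789abcdef0), region (ap-south-1), and credential placeholders with your own values, and review each command before running it, especially those that stop instances or format volumes.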
AWS Tools for PowerShell — List instances
Import-Module AWSPowerShell.NetCore
# Store credentials in the 'default' profile (replace the quoted placeholders with your own values)
Set-AWSCredential -AccessKey '<YOUR_KEY>' -SecretKey '<YOUR_SECRET>' -StoreAs default
# Get-EC2Instance returns reservations; expand .Instances so each row is one instance
(Get-EC2Instance -Region ap-south-1).Instances | Select-Object InstanceId, @{Name='State';Expression={$_.State.Name}}, @{Name='Type';Expression={$_.InstanceType}} | Format-Table -AutoSize
PowerShell — Check instance reachability and retrieve system logs
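# Get console output for an instance (useful for boot issues)
$instanceId = 'i-0123456789abcdef0'
# The Output property is Base64-encoded, so decode it before writing it to a file
$encoded = (Get-EC2ConsoleOutput -InstanceId $instanceId -Region ap-south-1).Output
[System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String($encoded)) | Out-File -FilePath .\console-output.txt -Encoding utf8
# Check if instance is passing status checks (the cmdlet returns the status objects directly)
Get-EC2InstanceStatus -InstanceId $instanceId -Region ap-south-1 | Format-List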
AWS CLI — Common quick checks
# Describe instances
aws ec2 describe-instances --region ap-south-1 --filters Name=instance-state-name,Values=running
# Describe volumes
aws ec2 describe-volumes --region ap-south-1 --filters Name=attachment.instance-id,Values=i-0123456789abcdef0
# Get system logs (console output)
aws ec2 get-console-output --instance-id i-0123456789abcdef0 --region ap-south-1 --output text
Mounting an EBS volume (Linux) — common steps
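# Assume the volume is attached as /dev/xvdf (example; on Nitro-based instances it may appear as /dev/nvme1n1, so check with lsblk)
lsblk
sudo file -s /dev/xvdf # "data" in the output means the volume has no filesystem
sudo mkfs -t ext4 /dev/xvdf # only if the volume has no filesystem; this erases existing data
sudo mkdir -p /mnt/data
sudo mount /dev/xvdf /mnt/data
# Add to /etc/fstab using the volume's UUID to persist across reboots
sudo blkid /dev/xvdf
# Example /etc/fstab entry (replace <UUID> with the value printed by blkid):
# UUID=<UUID>  /mnt/data  ext4  defaults,nofail  0  2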
Using Systems Manager Session Manager for safe access (no SSH key required)
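# Start an interactive shell session from the AWS CLI (requires the Session Manager plugin)
aws ssm start-session --target i-0123456789abcdef0 --region ap-south-1
# Run one-off commands without an interactive session using Run Command
aws ssm send-command --instance-ids i-0123456789abcdef0 --document-name AWS-RunShellScript --parameters 'commands=["whoami","uptime"]' --region ap-south-1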
Programmatic example — Boto3 snippet to stop an instance (Python)
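import boto3
# Create an EC2 client for the region that hosts the instance
ec2 = boto3.client('ec2', region_name='ap-south-1')
resp = ec2.stop_instances(InstanceIds=['i-0123456789abcdef0'])
# Print the state transition reported for each instance, e.g. running -> stopping
for change in resp['StoppingInstances']:
    print(change['InstanceId'], change['PreviousState']['Name'], '->', change['CurrentState']['Name'])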
Graph API (Azure) note — when managing hybrid identities
While AWS has its own APIs and SDKs, many enterprises integrate EC2 hosts with Azure AD (now Microsoft Entra ID) or other Microsoft tooling for monitoring and identity. Below is an example Graph API request for retrieving device objects (useful if you join EC2 Windows instances to Azure AD):
GET https://graph.microsoft.com/v1.0/devices?$filter=operatingSystem eq 'Windows'&$select=id,displayName,operatingSystem
Authorization: Bearer <ACCESS_TOKEN>
Use Microsoft Graph SDKs to automate inventory or compliance reports when EC2 instances are hybrid-joined.
11. Operations — Patching, Backup, and CI/CD
Operational hygiene keeps instances secure and recoverable.
Patching
Use AWS Systems Manager Patch Manager to define patch baselines and schedule automated maintenance windows. Keep track of critical CVEs and remediate with automation documents.
Backups and snapshots
Regularly snapshot EBS volumes and copy snapshots across regions for disaster recovery. Use lifecycle policies to prune old snapshots and reduce cost.
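As an illustration of a pruning rule, the sketch below selects snapshots older than a retention window while always keeping the newest snapshot of each volume. The 30-day window, the keep-newest rule, and the sample data are assumptions; the input is shaped like the `Snapshots` list from the EC2 DescribeSnapshots API.

```python
from datetime import datetime, timedelta, timezone


def snapshots_to_prune(snapshots, now, keep_days=30):
    """Return SnapshotIds older than keep_days, sparing the newest per volume."""
    cutoff = now - timedelta(days=keep_days)

    # Find the most recent snapshot of every volume so it is never pruned
    newest = {}
    for snap in snapshots:
        vol = snap["VolumeId"]
        if vol not in newest or snap["StartTime"] > newest[vol]["StartTime"]:
            newest[vol] = snap
    protected = {s["SnapshotId"] for s in newest.values()}

    return sorted(
        s["SnapshotId"] for s in snapshots
        if s["StartTime"] < cutoff and s["SnapshotId"] not in protected
    )


def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


# Invented sample data in the shape of a DescribeSnapshots response
sample_snapshots = [
    {"SnapshotId": "snap-a", "VolumeId": "vol-1", "StartTime": utc(2024, 1, 1)},
    {"SnapshotId": "snap-b", "VolumeId": "vol-1", "StartTime": utc(2024, 6, 25)},
    {"SnapshotId": "snap-c", "VolumeId": "vol-2", "StartTime": utc(2024, 2, 1)},
]
print(snapshots_to_prune(sample_snapshots, now=utc(2024, 6, 30)))  # ['snap-a']
```

`snap-c` is old but survives because it is the only copy of `vol-2`. For a managed alternative to custom scripts, Amazon Data Lifecycle Manager can apply retention rules automatically.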
CI/CD integration
Use CodeDeploy, CodePipeline, or open-source tools (Jenkins, GitLab CI) to deploy applications onto EC2 using immutable AMI-based deploys or blue/green strategies behind an ALB.
12. Monitoring, Logging & Observability
Visibility is key. Use CloudWatch to collect metrics (CPU, memory via agent, disk, network), logs, and alarms. Centralize logs in CloudWatch Logs or push to third-party systems (Elastic, Splunk).
Recommended metrics
- CPUUtilization
- NetworkIn / NetworkOut
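- DiskReadOps / DiskWriteOps (instance store volumes only; for EBS volumes use the AWS/EBS metrics such as VolumeReadOps and VolumeWriteOps)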
- StatusCheckFailed
Creating an alarm (AWS CLI example)
aws cloudwatch put-metric-alarm \
  --alarm-name HighCPU \
  --metric-name CPUUtilization \
  --namespace AWS/EC2 \
  --statistic Average \
  --period 300 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --evaluation-periods 2 \
  --alarm-actions arn:aws:sns:ap-south-1:123456789012:NotifyMe
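To show what the alarm above evaluates, here is a deliberately simplified model: with a 300-second period and 2 evaluation periods, the alarm fires only when the two most recent 5-minute averages both exceed 80. Real CloudWatch evaluation also handles missing data and "M out of N" settings, which this sketch ignores; the datapoints are invented.

```python
def alarm_state(datapoints, threshold=80.0, evaluation_periods=2):
    """Simplified alarm evaluation over a list of per-period averages.

    Returns 'ALARM' when the most recent `evaluation_periods` datapoints
    are all strictly greater than `threshold` (GreaterThanThreshold).
    """
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    return "ALARM" if all(value > threshold for value in recent) else "OK"


# Invented 5-minute CPU averages, oldest first
print(alarm_state([40.0, 85.0, 90.0]))  # ALARM
print(alarm_state([85.0, 90.0, 70.0]))  # OK
print(alarm_state([95.0]))              # INSUFFICIENT_DATA
```

Requiring two consecutive breaches means a single short CPU spike does not trigger a notification, at the cost of roughly ten minutes of detection delay.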
13. Governance and Compliance
Apply tagging conventions, enforce service control policies (SCPs) if you use AWS Organizations, and use IAM permissions boundaries to limit what privileged roles can grant. For regulated workloads, enable encryption, centralize logs, and use AWS Config for compliance drift detection.
Tagging strategy
Use tags such as Owner, Environment, CostCenter, and Project to enable cost allocation and automated lifecycle policies.
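A tagging convention is only useful if it is checked. The sketch below reports which of the four tags named above are missing from each instance, given data shaped like an EC2 DescribeInstances response; the instance IDs and tag values are invented.

```python
REQUIRED_TAGS = ("Owner", "Environment", "CostCenter", "Project")


def untagged_instances(response, required=REQUIRED_TAGS):
    """Map InstanceId -> list of required tag keys that are missing."""
    report = {}
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            present = {tag["Key"] for tag in instance.get("Tags", [])}
            missing = [key for key in required if key not in present]
            if missing:
                report[instance["InstanceId"]] = missing
    return report


# Invented sample data in the shape of a DescribeInstances response
sample_instances = {
    "Reservations": [
        {"Instances": [
            {"InstanceId": "i-0aaa", "Tags": [
                {"Key": "Owner", "Value": "team-web"},
                {"Key": "Environment", "Value": "prod"},
                {"Key": "CostCenter", "Value": "1234"},
                {"Key": "Project", "Value": "storefront"}]},
            {"InstanceId": "i-0bbb", "Tags": [{"Key": "Owner", "Value": "team-data"}]},
        ]},
        {"Instances": [{"InstanceId": "i-0ccc"}]},
    ]
}
print(untagged_instances(sample_instances))
# {'i-0bbb': ['Environment', 'CostCenter', 'Project'], 'i-0ccc': ['Owner', 'Environment', 'CostCenter', 'Project']}
```

Tag keys are case-sensitive, so `owner` would not satisfy a requirement for `Owner`.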
14. Real-world Example — Scaling a Web Application
Scenario: A web application experiences periodic traffic spikes. Use the following architecture:
- Place web servers in an Auto Scaling Group across two AZs behind an Application Load Balancer.
- Serve static assets from S3 + CloudFront to reduce compute load.
- Use RDS for relational storage with read replicas for scaling reads.
- Use spot instances for background workers (stateless) and configure interruption handling.
This architecture delivers high availability, lower latency, and cost efficiency while retaining operational simplicity.
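Interruption handling for the Spot workers can be sketched as a poll of the instance metadata service, which publishes a `spot/instance-action` document shortly before an interruption. This is a minimal sketch using IMDSv2 and only the standard library; the function names and the 2-second timeout are choices made for this example, and the fetch function works only when run on an EC2 instance.

```python
import json
import urllib.error
import urllib.request

IMDS = "http://169.254.169.254/latest"


def parse_instance_action(body):
    """Parse the JSON body of spot/instance-action into (action, time)."""
    doc = json.loads(body)
    return doc["action"], doc["time"]


def fetch_interruption_notice(timeout=2):
    """Return (action, time) if an interruption is scheduled, else None."""
    # IMDSv2: request a short-lived session token first
    token_req = urllib.request.Request(
        f"{IMDS}/api/token", method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()

    action_req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(action_req, timeout=timeout) as resp:
            return parse_instance_action(resp.read().decode())
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # no interruption is currently scheduled
        raise


# Parsing an example notice body (invented timestamp)
print(parse_instance_action('{"action": "terminate", "time": "2024-06-01T12:00:00Z"}'))
# ('terminate', '2024-06-01T12:00:00Z')
```

A worker would call `fetch_interruption_notice()` every few seconds and, once it returns a value, stop taking new jobs and checkpoint or requeue the current one before the instance is reclaimed.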
15. Frequently Asked Questions (FAQ)
How do I choose between EBS and instance store?
Choose EBS for persistence and snapshots. Use instance store for ephemeral workloads needing very high IOPS and low latency.
When should I use spot instances?
Spot is ideal for fault-tolerant, stateless workloads like batch jobs, CI runners, and scalable ML worker fleets.
What is the recommended way to manage SSH access?
Use IAM roles and Systems Manager Session Manager for centralized, auditable, bastionless access; avoid managing SSH keys manually at scale.
16. EC2 Deployment Checklist
- Choose right instance family and size
- Create hardened AMIs and automate builds
- Use Security Groups and IAM roles with least privilege
- Enable CloudWatch monitoring and alarms
- Implement backups and snapshot lifecycle policies
- Tag resources for cost and governance
- Use Auto Scaling for availability and performance
17. Further Reading & Resources
Recommended docs and tools to deepen your knowledge: AWS official docs, Compute Optimizer, Cost Explorer, and community guides. For more articles and hands-on tutorials, visit CloudKnowledge.












