itsmostafa / ecs

AWS ECS container orchestration for running Docker containers. Use when deploying containerized applications, configuring task definitions, setting up services, managing clusters, or troubleshooting container issues.

0 views
0 installs

Skill Content

---
name: ecs
description: AWS ECS container orchestration for running Docker containers. Use when deploying containerized applications, configuring task definitions, setting up services, managing clusters, or troubleshooting container issues.
last_updated: "2026-01-07"
doc_source: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/
---

# AWS ECS

Amazon Elastic Container Service (ECS) is a fully managed container orchestration service. Run containers on AWS Fargate (serverless) or EC2 instances.

## Table of Contents

- [Core Concepts](#core-concepts)
- [Common Patterns](#common-patterns)
- [CLI Reference](#cli-reference)
- [Best Practices](#best-practices)
- [Troubleshooting](#troubleshooting)
- [References](#references)

## Core Concepts

### Cluster

Logical grouping of tasks or services. Can contain Fargate tasks, EC2 instances, or both.

### Task Definition

Blueprint for your application. Defines containers, resources, networking, and IAM roles.

### Task

Running instance of a task definition. Can run standalone or as part of a service.

### Service

Maintains desired count of tasks. Handles deployments, load balancing, and auto scaling.

### Launch Types

| Type | Description | Use Case |
|------|-------------|----------|
| **Fargate** | Serverless, pay per task | Most workloads |
| **EC2** | Self-managed instances | GPU, Windows, specific requirements |

## Common Patterns

### Create a Fargate Cluster

**AWS CLI:**

```bash
# Create cluster
aws ecs create-cluster --cluster-name my-cluster

# With capacity providers
aws ecs create-cluster \
  --cluster-name my-cluster \
  --capacity-providers FARGATE FARGATE_SPOT \
  --default-capacity-provider-strategy \
    capacityProvider=FARGATE,weight=1 \
    capacityProvider=FARGATE_SPOT,weight=1
```

### Register Task Definition

```bash
cat > task-definition.json << 'EOF'
{
  "family": "web-app",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::123456789012:role/ecsTaskRole",
  "containerDefinitions": [
    {
      "name": "web",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest",
      "portMappings": [
        {
          "containerPort": 8080,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {"name": "NODE_ENV", "value": "production"}
      ],
      "secrets": [
        {
          "name": "DB_PASSWORD",
          "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:db-password"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/web-app",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3,
        "startPeriod": 60
      }
    }
  ]
}
EOF

aws ecs register-task-definition --cli-input-json file://task-definition.json
```

### Create Service with Load Balancer

```bash
aws ecs create-service \
  --cluster my-cluster \
  --service-name web-service \
  --task-definition web-app:1 \
  --desired-count 2 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={
    subnets=[subnet-12345678,subnet-87654321],
    securityGroups=[sg-12345678],
    assignPublicIp=DISABLED
  }" \
  --load-balancers "targetGroupArn=arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/web-tg/1234567890123456,containerName=web,containerPort=8080" \
  --health-check-grace-period-seconds 60
```

### Run Standalone Task

```bash
aws ecs run-task \
  --cluster my-cluster \
  --task-definition my-batch-job:1 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={
    subnets=[subnet-12345678],
    securityGroups=[sg-12345678],
    assignPublicIp=ENABLED
  }"
```

### Update Service (Deploy New Image)

```bash
# Register new task definition with updated image
aws ecs register-task-definition --cli-input-json file://task-definition.json

# Update service to use new version
aws ecs update-service \
  --cluster my-cluster \
  --service web-service \
  --task-definition web-app:2 \
  --force-new-deployment
```

### Auto Scaling

```bash
# Register scalable target
aws application-autoscaling register-scalable-target \
  --service-namespace ecs \
  --resource-id service/my-cluster/web-service \
  --scalable-dimension ecs:service:DesiredCount \
  --min-capacity 2 \
  --max-capacity 10

# Target tracking policy
aws application-autoscaling put-scaling-policy \
  --service-namespace ecs \
  --resource-id service/my-cluster/web-service \
  --scalable-dimension ecs:service:DesiredCount \
  --policy-name cpu-target-tracking \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{
    "TargetValue": 70.0,
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
    },
    "ScaleOutCooldown": 60,
    "ScaleInCooldown": 120
  }'
```

## CLI Reference

### Cluster Management

| Command | Description |
|---------|-------------|
| `aws ecs create-cluster` | Create cluster |
| `aws ecs describe-clusters` | Get cluster details |
| `aws ecs list-clusters` | List clusters |
| `aws ecs delete-cluster` | Delete cluster |

### Task Definitions

| Command | Description |
|---------|-------------|
| `aws ecs register-task-definition` | Create task definition |
| `aws ecs describe-task-definition` | Get task definition |
| `aws ecs list-task-definitions` | List task definitions |
| `aws ecs deregister-task-definition` | Deregister version |

### Services

| Command | Description |
|---------|-------------|
| `aws ecs create-service` | Create service |
| `aws ecs update-service` | Update service |
| `aws ecs describe-services` | Get service details |
| `aws ecs delete-service` | Delete service |

### Tasks

| Command | Description |
|---------|-------------|
| `aws ecs run-task` | Run standalone task |
| `aws ecs stop-task` | Stop running task |
| `aws ecs describe-tasks` | Get task details |
| `aws ecs list-tasks` | List tasks |

## Best Practices

### Security

- **Use task roles** for AWS API access (not access keys)
- **Use execution roles** for ECR/Secrets access
- **Store secrets in Secrets Manager** or Parameter Store
- **Use private subnets** with NAT gateway
- **Enable CloudTrail** for API auditing

### Performance

- **Right-size CPU/memory** — monitor and adjust
- **Use Fargate Spot** for fault-tolerant workloads (70% savings)
- **Enable container insights** for monitoring
- **Use service discovery** for internal communication

### Reliability

- **Deploy across multiple AZs**
- **Configure health checks** properly
- **Set appropriate deregistration delay**
- **Use circuit breaker** for deployments

```bash
aws ecs update-service \
  --cluster my-cluster \
  --service web-service \
  --deployment-configuration '{
    "deploymentCircuitBreaker": {
      "enable": true,
      "rollback": true
    }
  }'
```

### Cost Optimization

- **Use Fargate Spot** for batch workloads
- **Right-size task resources**
- **Scale to zero** when not needed
- **Use capacity providers** for mixed Fargate/Spot

## Troubleshooting

### Task Fails to Start

**Check:**

```bash
# View stopped tasks
aws ecs describe-tasks \
  --cluster my-cluster \
  --tasks $(aws ecs list-tasks --cluster my-cluster --desired-status STOPPED --query 'taskArns[0]' --output text)
```

**Common causes:**
- Image not found (ECR permissions)
- Secrets access denied
- Network configuration (subnets, security groups)
- Resource limits exceeded

### Container Keeps Restarting

**Debug:**

```bash
# Check CloudWatch logs
aws logs get-log-events \
  --log-group-name /ecs/web-app \
  --log-stream-name "ecs/web/abc123"

# Check task details
aws ecs describe-tasks \
  --cluster my-cluster \
  --tasks task-arn \
  --query 'tasks[0].containers[0].{reason:reason,exitCode:exitCode}'
```

**Causes:**
- Health check failing
- Application crashing
- Out of memory

### Service Stuck Deploying

```bash
# Check deployment status
aws ecs describe-services \
  --cluster my-cluster \
  --services web-service \
  --query 'services[0].deployments'

# Check events
aws ecs describe-services \
  --cluster my-cluster \
  --services web-service \
  --query 'services[0].events[:5]'
```

**Causes:**
- Health check failing on new tasks
- Not enough capacity
- Target group health checks failing

### Cannot Pull Image from ECR

**Check execution role has:**

```json
{
  "Effect": "Allow",
  "Action": [
    "ecr:GetAuthorizationToken",
    "ecr:BatchCheckLayerAvailability",
    "ecr:GetDownloadUrlForLayer",
    "ecr:BatchGetImage"
  ],
  "Resource": "*"
}
```

**Also check:**
- VPC endpoint for ECR (if private subnet)
- NAT gateway (if private subnet)
- Security group allows HTTPS outbound

## References

- [ECS Developer Guide](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/)
- [ECS API Reference](https://docs.aws.amazon.com/AmazonECS/latest/APIReference/)
- [ECS CLI Reference](https://docs.aws.amazon.com/cli/latest/reference/ecs/)
- [boto3 ECS](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ecs.html)