To run tasks in ECS, up to four different roles are required. Which ones you need depends on a variety of factors. Probably the most frustrating thing for me when getting started with ECS was confusion around which of these roles needed what permissions, the purpose of each of them, how to create them, what components of the system used them, and where to configure them. This post does not attempt to be a complete introduction or reference to ECS in general, just a source to hopefully clear up confusion around IAM and ECS.

Role Types

Host Role

When running ECS on EC2, the EC2 instances hosting the containers need a role. This role gives them permission to, among other things, pull images from ECR, manage tasks in the ECS API, and put logs into cloudwatch.

Task Execution Role

When running in Fargate, there are no EC2 instances hosting your containers, so these permissions have to go somewhere. This is called a Task Execution Role. It gives the Fargate service the same permissions the EC2 instance would need. This role is not required when running tasks on EC2 backed ECS.

ECS Service-Linked Role

This is a role used by the ECS service itself to perform functions such as managing load balancer configuration, doing service discovery, as well as attaching network interfaces when using the `awsvpc` network mode. There is only one of these per account.

ECS Task Role (or Container Role)

Not to be confused with the Task Execution Role, the Task Role is used when code running inside the container needs access to AWS resources. This is equivalent to the instance profile if the code was running directly on an EC2 instance.

Creating the Required Roles in an ALKS-Controlled Account

At my company, we use a tool called ALKS to manage access to and permissions in our AWS accounts. We open sourced a Terraform provider for it, and that’s what my examples will be using. If needed, find another source for how to create the roles and use these examples for information on what policies to attach.

Host Role

This will be a standard IAM Role. First create the role itself:

resource "alks_iamrole" "ecs_host" {
  name                     = "ecs-host-role"
  type                     = "Amazon EC2"
  include_default_policies = false
}

If you use multiple clusters, you can prefix it with the name of your cluster, creating a `flume-ecs-host-role` for example.

Then attach the required policy:

resource "aws_iam_role_policy_attachment" "ecs_host_ecs_attachment" {
  role       = "${alks_iamrole.ecs_host.name}"
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role"
}

This is an AWS Managed policy, so there’s no need to create it.

Task Execution Role

Create a role with the type “Amazon EC2 Container Service Task Role” and attach the AWS provided policy to it.

resource "alks_iamrole" "task_execution_role" {
  name                     = "ecsTaskExecutionRole"
  type                     = "Amazon EC2 Container Service Task Role"
  include_default_policies = false
}

resource "aws_iam_role_policy_attachment" "task_execution_attachment" {
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy" // AWS provided policy
  role       = "${alks_iamrole.task_execution_role.name}"
}

Service Linked Role

Normally, this role would be created automatically the first time it’s needed. However, if your account is as locked down as my account at work is, you’ll need to create it manually from a privileged login.

Provide the AWS cli with credentials that have permission to create roles, and run:

$ aws iam create-service-linked-role --aws-service-name ecs.amazonaws.com

This only needs to be run once per account. Once the role is created you’ll never have to worry about it again, and you won’t even have to refer to it in any Terraform or other configuration. The ECS service will just use it if it exists.

Task Role

Create a role as normal, but give it the type of “Amazon EC2 Container Service Task Role”

resource "alks_iamrole" "config_services_container" {
  name                     = "application-name-container-role"
  type                     = "Amazon EC2 Container Service Task Role"
  include_default_policies = false
} 

There are no required attachments or other settings here. Just name it something that makes sense and attach the policies you need to it.

What about through the UI?

While it’s certainly possible to do all this through the UI, I highly recommend using a tool like Terraform to manage configuration and permissions.

Create the roles with the correct “type” in the UI, and attach the required policies to them. You should use Terraform though.

Terraforming Services and Tasks

Here’s a quick overview of which roles go where when terraforming resources. The Terraform documentation is very good for the properties I’m leaving out. See docs for EC2 instance and ECS task definition.

resource "aws_instance" "ecs_host_instance" {
  iam_instance_profile = "${var.host_role_name}" // This is the Host Role, applied to the cluster instances. This is required to allow your host access to manage tasks.
}

resource "aws_ecs_task_definition" "ecs_task_definition" {
  execution_role_arn = "${var.task_execution_role_arn}" // This is the Task Execution Role, only required on Fargate. Called "ecsTaskExecutionRole" above.
  task_role_arn      = "${var.container_role_arn}" // This is the Task Role, or Container Role. This is required only if code running in your container needs access to AWS services.
}

Which IAM Role will my code run as?

Assuming your code is using a recent version of the AWS SDK with the default credentials provider chain, i.e. not explicitly specifying where credentials are coming from, it will first attempt to get credentials from the ECS Task Role. If that fails, it will fall back to the Host Role.

Note: Certain versions of Hadoop and services running on top of it like Flume, for example, will pull in the Host Role no matter what. If, like on Fargate, there is no Host Role, Flume will not be able to find credentials.

Sources: