EKS Runtime Security Best Practices for AWS Workloads

This is part 4 of our 5-part AWS Elastic Kubernetes Service (EKS) security blog series. Don’t forget to check out our previous blog posts in the series:

Part 1 - Guide to Designing EKS Clusters for Better Security
Part 2 - Securing EKS Cluster Add-ons: Dashboard, Fargate, EC2 components, and more
Part 3 - EKS networking best practices

Following best practices for running your workloads on EKS plays a crucial part in keeping the cluster and all its workloads safe. Overly privileged pods, for example, pose a serious danger if they are compromised. Compromised or otherwise misbehaving EKS workloads are possibly the greatest threat to your cluster’s security, which makes enforcing minimal privileges for container processes critical.

In addition, properly configuring and monitoring Kubernetes RBAC in the cluster and limiting runtime privileges, enforced with pod security policies or a third-party admission controller, should be among your highest priorities for locking down security.

Follow our guidance below to protect your running workloads on AWS’s EKS.

Use Namespaces

Why: Kubernetes namespaces provide scoping for cluster objects, allowing fine-grained cluster object management. Kubernetes RBAC rules for most resource types apply at the namespace level. Controls like network policies and many add-on tools and frameworks like service meshes also often apply at the namespace scope.

What to do: Plan out how you want to assign namespaces before you start deploying workloads to your clusters. Having one namespace per application provides the best opportunity for control, although it does bring extra management overhead when assigning RBAC role privileges and default network policies. If you do decide to group more than one application into a namespace, the main criteria should be whether those applications have common RBAC requirements and whether it would be safe to grant those privileges to the service accounts and users which need Kubernetes API access in that namespace.
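
As a rough illustration of the one-namespace-per-application approach, the following Python sketch builds a Namespace manifest together with a default-deny NetworkPolicy for it. The application name billing and the policy name default-deny-all are hypothetical, and the policy only has an effect if the cluster runs a network plugin that enforces NetworkPolicy objects.

```python
import json


def namespace_manifests(app_name):
    """Return a Namespace and a default-deny NetworkPolicy for one application."""
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": app_name, "labels": {"app": app_name}},
    }
    default_deny = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        # "default-deny-all" is a hypothetical name chosen for this sketch.
        "metadata": {"name": "default-deny-all", "namespace": app_name},
        "spec": {
            # An empty podSelector selects every pod in the namespace.
            "podSelector": {},
            # Listing both types with no rules denies all ingress and egress.
            "policyTypes": ["Ingress", "Egress"],
        },
    }
    return [namespace, default_deny]


def as_kubectl_list(manifests):
    """Wrap manifests in a v1 List so they can be piped to kubectl apply -f -."""
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2)


if __name__ == "__main__":
    print(as_kubectl_list(namespace_manifests("billing")))
```

Starting each namespace from a deny-all baseline means every allowed connection has to be added deliberately, which mirrors the least-privilege approach used for RBAC.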

Use Kubernetes RBAC

Why: Kubernetes Role-Based Access Control provides the standard method for managing authorization for the Kubernetes API endpoints. The practice of creating and managing comprehensive RBAC roles that follow the principle of least privilege, in addition to performing regular audits of how those roles are delegated with role bindings, provides some of the most critical protections possible for your EKS clusters, both from external bad actors and internal misconfigurations and accidents.

What to do: Configuring Kubernetes RBAC effectively and securely requires some understanding of the Kubernetes API. You can start with the official documentation, read about some best practices, and you may also want to work through some tutorials.

Once your team has solid working knowledge of RBAC, create some internal policies and guidelines. Make sure you also regularly audit your Role permissions and RoleBindings. Pay special attention to minimizing the use of ClusterRoles and ClusterRoleBindings, as these apply globally across all namespaces and to resources that do not support namespaces. (You can use the output of kubectl api-resources in your cluster to see which resources are not namespace-scoped.)
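
One part of such an audit can be automated. The Python sketch below assumes you have exported your Roles and ClusterRoles as JSON (for example with kubectl get clusterroles -o json) and loaded them into dictionaries. It flags rules that use wildcards, which are a common sign of permissions broader than needed. The function name and the sample role are hypothetical.

```python
def find_broad_rules(role):
    """Return findings for rules in a Role or ClusterRole that use wildcards."""
    findings = []
    kind = role.get("kind", "Role")
    name = role.get("metadata", {}).get("name", "<unnamed>")
    for index, rule in enumerate(role.get("rules", [])):
        for field in ("apiGroups", "resources", "verbs"):
            if "*" in rule.get(field, []):
                findings.append(f"{kind}/{name} rule {index}: wildcard in {field}")
    return findings


if __name__ == "__main__":
    # Hypothetical role used only to demonstrate the check.
    sample = {
        "kind": "ClusterRole",
        "metadata": {"name": "ops-helper"},
        "rules": [
            {"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list"]},
            {"apiGroups": [""], "resources": ["secrets"], "verbs": ["*"]},
        ],
    }
    for finding in find_broad_rules(sample):
        print(finding)
```

A wildcard is not always wrong, but each finding is worth a manual review, especially when it appears in a ClusterRole.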

Protect the Cluster’s RBAC Authorization Configuration

Why: EKS uses a Kubernetes ConfigMap resource to grant Kubernetes RBAC privileges to AWS IAM users and roles. The aws-auth ConfigMap in the kube-system namespace in the cluster assigns IAM entities to groups for use in RBAC role bindings. Protecting this ConfigMap’s contents from unauthorized writes is absolutely critical for preventing authenticated users from increasing their RBAC permissions or removing access for other IAM users and roles.

What to do: Because the ConfigMap is a core Kubernetes resource type used by many, if not most, Kubernetes workloads, many cluster API users and some service accounts often have permission to modify ConfigMaps in at least one namespace. Cluster owners will need to perform careful, ongoing curation of RBAC permission grants to ensure that no unintended entities end up with the ability to change the aws-auth contents.

Do not grant write access to ConfigMaps through ClusterRoleBindings, which apply globally across all namespaces, including kube-system. Use RoleBindings to limit these permissions to the specific namespaces that need them.
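
The check described above can be sketched in Python. The code assumes a role and a binding that references it have been loaded as dictionaries from kubectl JSON output, and it reports whether that pair would let its subjects change aws-auth. The function names are hypothetical, and the logic is a simplification of the real RBAC evaluation (it ignores aggregated ClusterRoles, for example).

```python
WRITE_VERBS = {"create", "update", "patch", "delete", "deletecollection", "*"}


def rule_allows_configmap_write(rule):
    """True if a single RBAC rule permits writing the aws-auth ConfigMap."""
    groups = rule.get("apiGroups", [])
    resources = rule.get("resources", [])
    verbs = rule.get("verbs", [])
    names = rule.get("resourceNames", [])
    in_core_group = "" in groups or "*" in groups
    covers_configmaps = "configmaps" in resources or "*" in resources
    # An empty resourceNames list means the rule applies to every name.
    covers_aws_auth = not names or "aws-auth" in names
    can_write = bool(WRITE_VERBS.intersection(verbs))
    return in_core_group and covers_configmaps and covers_aws_auth and can_write


def can_modify_aws_auth(role, binding):
    """True if the binding grants the role's ConfigMap write access in kube-system."""
    if not any(rule_allows_configmap_write(r) for r in role.get("rules", [])):
        return False
    if binding.get("kind") == "ClusterRoleBinding":
        # Cluster-wide bindings cover every namespace, including kube-system.
        return True
    return binding.get("metadata", {}).get("namespace") == "kube-system"


if __name__ == "__main__":
    role = {"rules": [{"apiGroups": [""], "resources": ["configmaps"], "verbs": ["update"]}]}
    scoped = {"kind": "RoleBinding", "metadata": {"namespace": "billing"}}
    print(can_modify_aws_auth(role, scoped))
```

Running a check like this against every binding in the cluster gives a short list of subjects whose access to aws-auth should be reviewed.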

Limit Container Runtime Privileges

Why: Most containerized applications will not need any special host privileges on the node to function properly. By following the principle of least privilege and minimizing the capabilities of your cluster’s running containers, you can greatly reduce the potential for exploitation by malicious containers and for accidental damage by misbehaving applications.

What to do: Use the PodSpec Security Context to define the exact runtime requirements for each workload. Use Pod Security Policies and/or admission controllers like Open Policy Agent (OPA) Gatekeeper so that the Kubernetes API enforces those best practices at object creation time.

Some guidelines:

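Do not allow containers to run as root. Running as root creates by far the greatest risk, because by default root in a container is the same user as root on the node, so a process that escapes the container has root privileges on the host.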
Do not use the host network or process space. Again, these settings create the potential for compromising the node and every container running on it.
Do not allow privilege escalation.
Use a read-only root filesystem in the container.
Use the default (masked) /proc filesystem mount.
Drop unused Linux capabilities and do not add optional capabilities that your application does not absolutely require. (Available capabilities depend on the container runtime in use on the nodes. EKS uses the Docker runtime, whose documentation on runtime privileges and Linux capabilities provides two tables: the first lists capabilities granted by default, while the second lists optional capabilities that may be added.)
Use SELinux options for more fine-grained process controls.
Give each application its own Kubernetes Service Account rather than sharing or using the namespace’s default service account.
Do not mount the service account token in a container if the container does not need to access the Kubernetes API.
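
Several of the guidelines above map directly to fields in the pod spec and the container securityContext. The following Python sketch checks a pod spec, loaded as a dictionary, against them. The function name, the finding messages, and the sample service account billing-api are hypothetical, and SELinux options are not checked because the right values depend on your nodes.

```python
def check_pod_spec(pod_spec):
    """Return a list of findings for a pod spec that deviates from the guidelines."""
    findings = []
    if pod_spec.get("hostNetwork"):
        findings.append("pod uses the host network")
    if pod_spec.get("hostPID") or pod_spec.get("hostIPC"):
        findings.append("pod shares the host process or IPC namespace")
    # The token is mounted unless automountServiceAccountToken is set to false.
    if pod_spec.get("automountServiceAccountToken", True):
        findings.append("service account token is mounted (review if API access is not needed)")
    if pod_spec.get("serviceAccountName", "default") == "default":
        findings.append("pod uses the namespace's default service account")

    pod_sc = pod_spec.get("securityContext", {})
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        sc = container.get("securityContext", {})
        # Container-level runAsNonRoot overrides the pod-level setting.
        if not sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot", False)):
            findings.append(f"{name}: may run as root")
        if sc.get("privileged"):
            findings.append(f"{name}: runs privileged")
        if sc.get("allowPrivilegeEscalation", True):
            findings.append(f"{name}: privilege escalation is allowed")
        if not sc.get("readOnlyRootFilesystem", False):
            findings.append(f"{name}: root filesystem is writable")
        if sc.get("procMount", "Default") != "Default":
            findings.append(f"{name}: uses a non-default /proc mount")
        capabilities = sc.get("capabilities", {})
        if "ALL" not in capabilities.get("drop", []):
            findings.append(f"{name}: does not drop all capabilities")
        if capabilities.get("add"):
            findings.append(f"{name}: adds capabilities {capabilities['add']}")
    return findings


if __name__ == "__main__":
    hardened = {
        "serviceAccountName": "billing-api",
        "automountServiceAccountToken": False,
        "containers": [{
            "name": "app",
            "securityContext": {
                "runAsNonRoot": True,
                "allowPrivilegeEscalation": False,
                "readOnlyRootFilesystem": True,
                "capabilities": {"drop": ["ALL"]},
            },
        }],
    }
    print(check_pod_spec(hardened))
```

The hardened dictionary in the usage example doubles as a reference for the securityContext settings that satisfy the list. An application that needs a specific capability would add it back explicitly and accept the corresponding finding.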

Use Pod Security Policies

Why: Kubernetes Pod Security Policy provides a method to enforce best practices around minimizing container runtime privileges, including not running as the root user, not sharing the host node’s process or network space, not being able to access the host filesystem, enforcing SELinux, and other options. Most cluster workloads will not need special permissions. By forcing containers to use the least-required privilege, their potential for malicious exploitability or accidental damage can be minimized.

Pod Security Policies are enabled automatically for all EKS clusters starting with Kubernetes version 1.13. EKS gives them a completely permissive default policy named eks.privileged.

What to do: Create policies which enforce the recommendations under Limit Container Runtime Privileges, shown above. Policies are best tested in a non-production environment running the same applications as your production cluster, after which you can deploy them in production.

Once you have migrated all your workloads to stricter policies, remove the capability to deploy user workloads using the permissive default policy by deleting the ClusterRoleBinding eks:podsecuritypolicy:authenticated.
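
As a starting point, a restrictive policy along these lines can be sketched in Python and emitted as JSON, which kubectl accepts in place of YAML. The policy name restricted, the allowed volume types, and the group ID ranges are assumptions to adapt to your workloads. The policy/v1beta1 PodSecurityPolicy API shown here was removed in Kubernetes 1.25, so this applies only to clusters that still serve it.

```python
import json


def restricted_psp(name):
    """Build a PodSecurityPolicy manifest that enforces the runtime guidelines."""
    any_non_root_group = {"rule": "MustRunAs", "ranges": [{"min": 1, "max": 65535}]}
    return {
        "apiVersion": "policy/v1beta1",
        "kind": "PodSecurityPolicy",
        "metadata": {"name": name},
        "spec": {
            "privileged": False,
            "allowPrivilegeEscalation": False,
            "requiredDropCapabilities": ["ALL"],
            "hostNetwork": False,
            "hostPID": False,
            "hostIPC": False,
            "readOnlyRootFilesystem": True,
            "runAsUser": {"rule": "MustRunAsNonRoot"},
            # RunAsAny leaves SELinux unrestricted; tighten it if your nodes use SELinux.
            "seLinux": {"rule": "RunAsAny"},
            "supplementalGroups": any_non_root_group,
            "fsGroup": any_non_root_group,
            # Assumed set of volume types; hostPath is deliberately left out.
            "volumes": ["configMap", "emptyDir", "projected", "secret",
                        "downwardAPI", "persistentVolumeClaim"],
        },
    }


if __name__ == "__main__":
    print(json.dumps(restricted_psp("restricted"), indent=2))
```

A policy only takes effect for users and service accounts that are authorized through RBAC to use it, so it needs a matching role and binding before the permissive default binding is deleted.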

Alternatively, because plans exist to deprecate PSPs in the future and because they only apply to a subset of controls, consider deploying a configurable admission controller, described below.

Use an Admission Controller to Enforce Best Practices

Why: Kubernetes supports using admission controllers, which can be configured to evaluate requests to the Kubernetes API. In the case of validating controllers, an admission controller can deny requests that fail to meet certain requirements, while mutating controllers can make changes to the request, such as injecting a sidecar container into a pod or adding labels to an object, before the API server persists the object.
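
To make the difference between validating and mutating controllers concrete, here is a minimal Python sketch of the decision logic a webhook-based admission controller might run on an AdmissionReview request for a pod. It leaves out the HTTPS server and webhook registration entirely. The function names, the privileged-container rule, and the reviewed label are hypothetical choices for illustration.

```python
import base64
import json


def _review_response(response):
    """Wrap a response body in the AdmissionReview envelope."""
    return {"apiVersion": "admission.k8s.io/v1", "kind": "AdmissionReview", "response": response}


def validate(review):
    """Validating logic: deny pods that contain privileged containers."""
    request = review["request"]
    containers = request["object"]["spec"].get("containers", [])
    privileged = [c["name"] for c in containers
                  if c.get("securityContext", {}).get("privileged")]
    response = {"uid": request["uid"], "allowed": not privileged}
    if privileged:
        response["status"] = {
            "message": "privileged containers are not allowed: " + ", ".join(privileged)
        }
    return _review_response(response)


def mutate(review):
    """Mutating logic: add a label to the object with a JSON Patch."""
    request = review["request"]
    labels = request["object"].get("metadata", {}).get("labels")
    if labels is None:
        # The labels map must exist before a key can be added to it.
        patch = [{"op": "add", "path": "/metadata/labels", "value": {"reviewed": "true"}}]
    else:
        patch = [{"op": "add", "path": "/metadata/labels/reviewed", "value": "true"}]
    encoded = base64.b64encode(json.dumps(patch).encode()).decode()
    return _review_response({
        "uid": request["uid"],
        "allowed": True,
        "patchType": "JSONPatch",
        "patch": encoded,
    })
```

The validating function only answers allowed or denied, while the mutating function always allows the request and returns a base64-encoded patch for the API server to apply.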

One increasingly popular validating admission controller is Open Policy Agent (OPA) Gatekeeper. The Gatekeeper admission controller uses custom Kubernetes resources to configure the requirements for Kubernetes resources. Users can create policies tailored to their needs and applications to enforce a variety of best practices by preventing non-conforming objects from getting created in a cluster. While its capabilities overlap somewhat with those of Pod Security Policies, OPA allows restrictions not just on pods, but on any cluster resource using virtually any field.

What to do: You can write a custom admission controller to suit your specific needs, or install Gatekeeper or a similar tool in your cluster. Note that while some example resources for enforcing common requirements in Gatekeeper exist, the policy configuration language and management come with a rather steep learning curve. As OPA and Gatekeeper gain greater adoption, more community resources should become available.

Note that Gatekeeper requires Kubernetes version 1.14 or higher.