Protecting Kubernetes API Against CVE-2019-11253 (Billion Laughs Attack) and Other Vulnerabilities

A potentially serious, unpatched security issue was opened over the weekend in the Kubernetes GitHub repository: the way the Kubernetes API server parses YAML manifests can be abused to mount a denial-of-service attack against a cluster’s Kubernetes API service, in an instance of a “billion laughs” attack. A CVE candidate has been reserved, and although the full details of the CVE have yet to be published, the issue presents a substantial threat to Kubernetes cluster security that you can and should take steps to protect against.

How to protect your clusters

At the time of writing, no patches that fix the underlying vulnerability have been released. However, the issue once again serves as a reminder that, like all software, Kubernetes is vulnerable to zero-day exploits. Thus, mere access to your Kubernetes API server should be treated as sensitive, regardless of how tight your application-level authorization policies (i.e., Kubernetes RBAC) are.

We recommend that at this point, you take a minute to ensure that your Kubernetes API server is as shielded as possible by following these best practices:

Review your RBAC policies and make sure that only fully trustworthy entities have privileged access to your cluster’s resources. Generally speaking, access of any kind should be treated as privileged, but extra caution is required for roles that allow modifying the state of resources (the create, update, patch, delete, and deletecollection verbs).
Audit your cluster roles and role bindings regularly. Pay close attention to the privileges that subjects with low or no trust have, like unauthenticated users. Any role bindings that grant privileges to the system:anonymous user or the system:unauthenticated group should be thoroughly reviewed or removed if deemed unnecessary. You can use this script to check if there are any such role bindings in your cluster. Check out our blog post discussing Kubernetes configuration best practices for the API server for additional recommendations.
In most cases, unauthenticated users have no business whatsoever on your cluster. Keep in mind that even details such as the API server version, or the mere fact that a Kubernetes API server is running on a given host, are valuable to an attacker. Rather than merely ensuring that unauthenticated users are not granted any privileges via RBAC, you can disable anonymous access completely by passing the --anonymous-auth=false flag to both the API server and the kubelets (note that in the most recent version, 1.16, the default is true). Check your Kubernetes provider’s documentation for how to change this flag, but note that on managed providers, this isn’t always possible.
Do not expose your Kubernetes API server endpoint to the internet. Protect it with network firewalls, and only allow access from trustworthy (private) subnets or VPC networks. (Note that most cloud providers expose the API server to the internet by default.)
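The role binding audit described above can be approximated in a few lines. The following is a minimal sketch, not the script referenced in the list: it assumes you have captured the output of kubectl get clusterrolebindings,rolebindings --all-namespaces -o json, and the function name find_anonymous_bindings and the sample data are purely illustrative.

```python
import json

# Subjects that represent callers who have not authenticated at all.
UNAUTHENTICATED_SUBJECTS = {"system:anonymous", "system:unauthenticated"}


def find_anonymous_bindings(binding_list):
    """Return (kind, namespace, binding, role, subject) tuples for every
    binding that grants a role to an unauthenticated subject."""
    findings = []
    for item in binding_list.get("items", []):
        # `subjects` may be absent or null on a binding without subjects.
        for subject in item.get("subjects") or []:
            if subject.get("name") in UNAUTHENTICATED_SUBJECTS:
                meta = item.get("metadata", {})
                findings.append((
                    item.get("kind"),
                    meta.get("namespace", ""),  # empty for ClusterRoleBindings
                    meta.get("name"),
                    item.get("roleRef", {}).get("name"),
                    subject["name"],
                ))
    return findings


# Hypothetical sample in the shape kubectl emits; real clusters will differ.
sample = json.loads("""
{
  "kind": "List",
  "items": [
    {"kind": "ClusterRoleBinding",
     "metadata": {"name": "cluster-admin"},
     "roleRef": {"kind": "ClusterRole", "name": "cluster-admin"},
     "subjects": [{"kind": "Group", "name": "system:masters"}]},
    {"kind": "ClusterRoleBinding",
     "metadata": {"name": "example-public-viewer"},
     "roleRef": {"kind": "ClusterRole", "name": "view"},
     "subjects": [{"kind": "Group", "name": "system:unauthenticated"}]},
    {"kind": "RoleBinding",
     "metadata": {"name": "no-subjects", "namespace": "default"},
     "roleRef": {"kind": "Role", "name": "reader"},
     "subjects": null}
  ]
}
""")

for finding in find_anonymous_bindings(sample):
    print(finding)
```

Any binding this reports deserves a manual review; whether it should be removed depends on what the bound role actually permits.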

Explanation of the issue

The Kubernetes API service, the main point of interaction with a Kubernetes cluster’s master and its resources, is backed by the Kubernetes apiserver. The apiserver accepts incoming connections, authenticates the connection, checks if the authenticated entity is authorized to make the request, then applies the corresponding request handlers. For requests that have a data payload (such as object creations or modifications), the apiserver parses the data before applying the handlers.

The Kubernetes API accepts these data payloads in three formats: JSON, YAML, and Protocol Buffers. Among the supported formats, only YAML offers “references” (anchors and aliases, in YAML terminology), which let a document reuse previously introduced blocks, called “nodes.” References are very useful for reducing the size of YAML files in which different blocks share the same, or similar, configurations. A very simple example follows:

$ cat <<EOF | yq '.'
> ---
> block1: &default
>   value1: hello
>   value2: there
>
> block2:
>   <<: *default
>   value1: goodbye
> EOF
{
  "block1": {
    "value1": "hello",
    "value2": "there"
  },
  "block2": {
    "value1": "goodbye",
    "value2": "there"
  }
}
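For readers more familiar with Python than YAML, the merge in block2 behaves much like dictionary unpacking: the referenced block is copied in first, and keys set explicitly in block2 take precedence. The snippet below is only an analogy using plain dictionaries, not a YAML parser.

```python
# block1 plays the role of the anchored node (&default).
block1 = {"value1": "hello", "value2": "there"}

# "<<: *default" copies block1's keys; the explicit value1 then overrides the copy.
block2 = {**block1, "value1": "goodbye"}

print(block2)
```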

References to nodes can be used in nodes that are themselves referenced in other nodes. This nesting of references, and its subsequent expansion, is the root cause of the current security issue with the Kubernetes API. With several levels of nesting, a tiny YAML file can grow to several gigabytes in size once expanded. Because the apiserver neither validates uploaded YAML to detect such patterns nor imposes a hard limit on the size of an expanded document, parsing can consume excessive CPU and/or RAM and leave the apiserver unresponsive to incoming connections. This type of denial-of-service (DoS) attack is a form of the “billion laughs” attack, previously documented against XML parsers.
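To make the growth concrete, the sketch below models a document as a table of named nodes whose items are either literal strings or references to earlier nodes, then counts how many literals a full expansion would produce. It illustrates the arithmetic and what a hard expansion limit could look like; it is not the apiserver's parsing code. The names nested_document and expanded_leaves and the budget parameter are hypothetical, and the model assumes references only point at nodes defined earlier.

```python
def nested_document(levels, fanout):
    """Build `levels` named nodes; each refers `fanout` times to the node before it."""
    nodes = {"a0": ["lol"] * fanout}  # innermost node holds only literals
    for i in range(1, levels):
        nodes[f"a{i}"] = [("ref", f"a{i - 1}")] * fanout
    return nodes


def expanded_leaves(nodes, root, budget):
    """Count the literals in the full expansion of `root`.

    Raises ValueError as soon as any node's expansion exceeds `budget`.
    """
    sizes = {}  # memo: node name -> expanded literal count

    def count(name):
        if name not in sizes:
            total = 0
            for item in nodes[name]:
                # A tuple marks a reference; anything else is a single literal.
                total += count(item[1]) if isinstance(item, tuple) else 1
                if total > budget:
                    raise ValueError(
                        f"expansion of {name!r} exceeds {budget} literals"
                    )
            sizes[name] = total
        return sizes[name]

    return count(root)


doc = nested_document(levels=9, fanout=9)
written = sum(len(items) for items in doc.values())   # items in the document
expanded = expanded_leaves(doc, "a8", budget=10**12)  # literals after expansion
print(written, expanded)
```

With nine levels of nine references each, the document contains only 81 items but expands to 9^9 (roughly 387 million) literals. Because the count is memoized per node, a check like this stays cheap even when the expansion itself would not be, which is why a size budget can reject such input before any memory is committed to it.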

While YAML is the de facto standard for configuring Kubernetes cluster resources in a human-readable form, it is worth noting that the API server usually doesn’t get to see those YAML manifests: kubectl and most Kubernetes client libraries convert object definitions to JSON before sending them to the API server. This step also explains why some people initially assumed that the issue affected only kubectl, which would have been far less severe. However, with tools like curl, it is possible to feed raw YAML data to the API server. Consequently, any instance of resource exhaustion caused by such a YAML payload is unlikely to be an accident.
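The difference between what kubectl sends and what a hand-crafted request can send comes down to the request body and its Content-Type header. The sketch below only constructs two requests with Python's standard library and does not send them; the server address is a placeholder, the ConfigMap is an arbitrary harmless example, and authentication headers are omitted for brevity.

```python
import json
import urllib.request

# Placeholder address; substitute your own API server endpoint.
URL = ("https://kube-apiserver.example.invalid:6443"
       "/api/v1/namespaces/default/configmaps")

manifest = {
    "apiVersion": "v1",
    "kind": "ConfigMap",
    "metadata": {"name": "demo"},
    "data": {"greeting": "hello"},
}

# What kubectl and most client libraries do: serialize the object to JSON.
json_request = urllib.request.Request(
    URL,
    data=json.dumps(manifest).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# What a raw client such as curl can do: send YAML text for the server to parse.
yaml_body = """\
apiVersion: v1
kind: ConfigMap
metadata:
  name: demo
data:
  greeting: hello
"""
yaml_request = urllib.request.Request(
    URL,
    data=yaml_body.encode("utf-8"),
    headers={"Content-Type": "application/yaml"},
    method="POST",
)

for req in (json_request, yaml_request):
    # urllib stores header names capitalized, hence "Content-type".
    print(req.get_method(), req.get_header("Content-type"), len(req.data), "bytes")
```

In the JSON case any YAML references have already been expanded on the client, so the server never encounters them; in the YAML case the server-side parser does the expansion itself.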

Going forward

A full fix to safeguard against this “billion laughs” attack will have to be made in the Kubernetes apiserver code itself. In the meantime, the existence of unpatched issues like this one underscores the need to adopt best practices for securing your Kubernetes clusters and all their resources, especially the Kubernetes API.


By: Karen Bruner and Malte Isberner
