Welcome to the final post in our four-part series on security best practices for Azure Kubernetes Service (AKS). In the first three installments, we covered:
- how to create secure AKS clusters and container images (part 1),
- how to lock down cluster networking (part 2), and
- how to plan and enforce application runtime safeguards (part 3).
This post will close out the series by covering the routine maintenance and operational tasks required to keep your AKS clusters and infrastructure secured.
Cluster Maintenance and Other Tasks
AKS brings some specific operational maintenance requirements on top of universal best practices for maintaining secure Kubernetes clusters.
Monitoring
Why: You cannot keep your cluster secure and healthy if you do not know what's happening in it.
What to do: A number of tools and options exist for monitoring your AKS cluster. Some tools are more security-aware than others, though, so make sure you have the following areas covered at a minimum:
- Monitor your application logs and performance for anomalies.
- Enable AKS master component logs so you can view and monitor them. At a minimum, collect logs for the following components:
  - kube-apiserver – logs all calls to the cluster's Kubernetes API, including source IP addresses
  - kube-audit – Kubernetes audit events
- Audit your RBAC roles and bindings regularly.
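To illustrate the kind of monitoring these logs enable, the sketch below filters a kube-audit event for a sensitive operation. The event follows the Kubernetes audit schema, but the sample record, user, and addresses are hypothetical; in practice these records would arrive through your AKS diagnostic log pipeline rather than a shell variable.

```shell
# Hypothetical kube-audit event (Kubernetes audit event schema).
audit_event='{"kind":"Event","verb":"delete","user":{"username":"jane"},"sourceIPs":["203.0.113.7"],"objectRef":{"resource":"secrets","namespace":"prod"}}'

# Surface delete operations on secrets, with the acting user and source IP.
printf '%s\n' "$audit_event" | jq -r '
  select(.verb == "delete" and .objectRef.resource == "secrets")
  | "\(.user.username) deleted a secret in \(.objectRef.namespace) from \(.sourceIPs[0])"'
```

The same filter logic translates directly into a query against wherever you ship these logs (for example, a Log Analytics workspace).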
AKS Security Updates
Why: AKS does not fully manage keeping your cluster up-to-date with Kubernetes/AKS security patches. Keeping up with the latest security patches is foundational for good AKS and Kubernetes security.
What Azure manages:
- Makes AKS upgrades available
- Makes Windows OS upgrades available (for Windows nodes)
- Applies Linux OS patches nightly
What the user must manage:
- Rebooting/replacing Linux nodes when OS patches require it
- Applying Windows OS patches to nodes and rebooting
- Applying AKS upgrades to masters and all nodes
What to do: Most importantly, you will need to find a reliable way to watch for AKS security update announcements. Your organization will also need to automate these tasks, or create procedures for them, as applicable:
- Applying AKS upgrades for masters and all nodes
- Rebooting Linux nodes as needed for OS patches (not all patches require reboots)
- Applying AKS and OS updates to Windows nodes and rebooting them
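As a sketch of the upgrade workflow, the az CLI exposes both the version check and the upgrade itself. The resource group, cluster name, and target version below are placeholders for your environment.

```shell
# Placeholders: substitute your own resource group, cluster, and version.
RESOURCE_GROUP=myResourceGroup
CLUSTER=myAKSCluster

# List the Kubernetes versions this cluster's control plane can upgrade to.
az aks get-upgrades --resource-group "$RESOURCE_GROUP" --name "$CLUSTER" --output table

# Upgrade the control plane and all node pools to a chosen version.
az aks upgrade --resource-group "$RESOURCE_GROUP" --name "$CLUSTER" --kubernetes-version 1.18.4
```

For Linux node reboots after OS patches, the open-source kured daemon is a common way to automate reboots when a node reports that one is required.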
Managing the Azure Service Principal
Note that the managed identities feature for AKS is currently in preview.
Why: Azure uses an Azure Active Directory service principal to create and update the Azure resources an AKS cluster needs. By default, this service principal's credentials are valid for one year, after which the user must rotate them to keep the associated AKS cluster functioning properly. Automating the rotation of these static credentials on a shorter lifespan improves the security posture and reliability of your clusters and Azure resources.
What to do: Follow the Azure documentation's instructions for manually rotating the service principal's credentials.
Users can also choose managed identities to avoid the use of static credentials. Managed identities must be enabled at cluster creation time.
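A sketch of both options follows; the resource group and cluster names are placeholders, and the credential-reset flow assumes the az CLI's documented commands for updating AKS cluster credentials.

```shell
RESOURCE_GROUP=myResourceGroup
CLUSTER=myAKSCluster

# Option 1: rotate the existing service principal's credentials.
SP_ID=$(az aks show --resource-group "$RESOURCE_GROUP" --name "$CLUSTER" \
    --query servicePrincipalProfile.clientId --output tsv)
SP_SECRET=$(az ad sp credential reset --name "$SP_ID" --query password --output tsv)
az aks update-credentials --resource-group "$RESOURCE_GROUP" --name "$CLUSTER" \
    --reset-service-principal --service-principal "$SP_ID" --client-secret "$SP_SECRET"

# Option 2: create a new cluster with a managed identity instead of a
# service principal. This cannot be enabled on an existing cluster.
az aks create --resource-group "$RESOURCE_GROUP" --name "$CLUSTER" --enable-managed-identity
```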
Rotate Cluster Certificates
Why: AKS creates a number of TLS certificates for various control plane and node components. For clusters created after March 2019, these certificates are valid for 30 years. Many compliance audits require much shorter lifespans for certificates. Protecting these certificates, especially if you are not using a private cluster, is critical.
What to do: You can rotate these certificates, but be aware that doing so can require up to 30 minutes of downtime for the AKS cluster.
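The rotation itself is a single az CLI operation; the resource group and cluster names below are placeholders.

```shell
# Rotate all of the cluster's certificates. Expect up to ~30 minutes of
# downtime while control plane and node components restart with new certs.
az aks rotate-certs --resource-group myResourceGroup --name myAKSCluster
```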
The Kubernetes Dashboard
Why: AKS installs the Kubernetes Dashboard in every cluster and it cannot be deleted permanently. (If the deployment, role binding, or service objects are deleted, they automatically get recreated.)
The Kubernetes Dashboard is a major source of concern for several reasons.
- It requires a number of cluster RBAC permissions to function. Even if it has only been granted read-only permissions, the dashboard still shares information that most workloads in the cluster should not need to access. Users who need to access the information should be using their own RBAC credentials and permissions.
- AKS does not enable any authentication requirement for the dashboard. Any entity that can connect to the endpoint has full access. Although the associated Kubernetes service only has an internal endpoint, any pod in the cluster, and any user who can run kubectl port-forward in the cluster, can access it.
- The Kubernetes Dashboard has been the subject of a number of CVEs. Because of its level of access to the cluster’s Kubernetes API and its lack of internal controls, vulnerabilities can be extremely dangerous.
What to do:
Do not grant the kubernetes-dashboard service account any additional privileges, and remove any bindings to that service account. (Note that the kubernetes-dashboard-minimal role in the kube-system namespace is managed by AKS and will be recreated automatically if it gets deleted.)
Shell command to find cluster role bindings that apply to the kubernetes-dashboard service account (requires the jq tool):

kubectl get clusterrolebindings -o json | \
  jq -r '.items[]
    | select(.subjects[]?
        | (.name == "kubernetes-dashboard") and
          (.kind == "ServiceAccount") and
          (.namespace == "kube-system"))
    | .metadata.name'
Shell command to find all role bindings in all namespaces that bind to the kubernetes-dashboard service account (requires the jq tool):

kubectl get rolebindings --all-namespaces -o json | \
  jq -r '.items[]
    | select(.subjects[]?
        | (.name == "kubernetes-dashboard") and
          (.kind == "ServiceAccount") and
          (.namespace == "kube-system"))
    | [.metadata.name, .metadata.namespace]'
Use network policies to block all ingress to the dashboard.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: block-kubernetes-dashboard
  namespace: kube-system
spec:
  podSelector:
    matchLabels:
      k8s-app: kubernetes-dashboard
  policyTypes:
  - Ingress
  ingress: []
AKS Virtual Nodes
Why: Some critical Kubernetes controls, including network policies, do not apply to pods running on Virtual Nodes. Given the reduced visibility into and controls for Virtual Nodes, they may not be suitable for many workloads.
What to do: Understand the limitations of Virtual Nodes and decide if the risk is worth the convenience factor. Autoscaling node pools generally provide a suitable alternative for controlling costs and capacity for applications with variable loads or for task-based workloads, without compromising cluster security.
Thank you for following along with this comprehensive series on AKS security. As you probably noticed, it requires a sizable amount of effort and diligence to plan and run trustworthy AKS clusters.
A few other points that are worth noting:
- Application security, particularly for SaaS or other multi-tenant services, was largely outside the scope of this series. Azure and many third-party providers offer a number of options for Web Application Firewalls and other tools for controlling and monitoring the external network traffic of your services.
- AKS does not always support new Kubernetes features, even when they are generally available in open-source Kubernetes. Pay attention to AKS release notes.
- If AKS does not support a Kubernetes feature you want or need, let them know. Provide feedback through your customer support channels to help make the AKS platform more secure by default and to simplify operations.