Rancher Kubernetes Engine (RKE) Security Best Practice for Cluster Maintenance and Network Security
This is the last installment in our four-part RKE security blog series. Don’t forget to get caught up on everything you missed!
The concept of zero-trust network security can be used to address the new security challenges of cloud-native architectures. These challenges include:
- The sharing of cloud infrastructure among workloads with different levels of trust
- Smaller microservices resulting in increased complexity and a larger attack surface for applications
Microservices architecture creates a more extensive network attack surface. To address this issue, administrators and developers will have to ensure that both external and internal software-defined networks are securely configured.
Enable Network Policy
By default, network traffic in an RKE cluster is allowed between all pods and can leave the cluster network altogether. This is also the general default policy for Kubernetes clusters so that beginners can use Kubernetes without all pods being effectively firewalled. As your organization increases its usage of Kubernetes, you will need to set restrictions on the cluster’s internal traffic.
Kubernetes relies on networking plugins that follow the Container Network Interface (CNI) model to manage traffic within the cluster. With RKE, we can select the CNI plugin and the DNS provider directly in the cluster.yml file.
# Specify network plug-in (canal, calico, flannel, weave, or none)
network:
  plugin: canal
# Specify DNS provider (coredns or kube-dns)
dns:
  provider: coredns
The RKE default CNI plugin, Canal, supports network policies in Kubernetes. However, it is worth researching the various CNI options available and the network capabilities each one supports. To assist with this research, Rancher has compared the CNI plugins that it currently offers.
Network policies are implemented by the CNI plugin and applied at the namespace level, with additive permissions. Similar to RBAC permissions, once a network policy selects a pod, all traffic in the direction the policy covers is denied for that pod except for the traffic the policy explicitly allows. As other traffic becomes required, you can add ports and protocols and even define IP blocks. Kubernetes recommends a few default policies that can be implemented to secure workloads, and the to and from selectors give more fine-grained network control.
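As a minimal sketch of the additive model described above, the two policies below first deny all traffic in a namespace and then allow one specific flow. The namespace name, labels, and port are hypothetical and chosen only for illustration.

```yaml
# Deny all ingress and egress for every pod in the namespace.
# An empty podSelector selects all pods in the namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: demo            # hypothetical namespace
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
---
# Additively allow pods labeled app=frontend to reach pods labeled app=api on TCP 8080.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: demo            # hypothetical namespace
spec:
  podSelector:
    matchLabels:
      app: api               # hypothetical label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend  # hypothetical label
      ports:
        - protocol: TCP
          port: 8080         # hypothetical port
```

Because the first policy also denies egress, the frontend pods would need a matching egress rule (and usually a DNS allowance) before this flow works end to end.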
Drop Network Capabilities from Privileged Workloads
When managing security contexts, users can control the Linux capabilities of running containers. When setting a pod’s security context, the default should be:
spec:
  privileged: false
  allowPrivilegeEscalation: false
This means the container does not run in privileged mode and its processes cannot gain more privileges than their parent process. However, there may be cases in which a container requires root access. To do this safely, we can drop the capabilities that the root user will not need. CAP_NET_RAW and CAP_NET_ADMIN should be dropped whenever they are not required, as they allow for unfettered access to the network.
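One way to express this on a workload is through the container-level securityContext. The sketch below is illustrative only: the pod name and image are hypothetical. Note that Kubernetes lists capabilities without the CAP_ prefix.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: restricted-app                      # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0   # hypothetical image
      securityContext:
        privileged: false
        allowPrivilegeEscalation: false
        capabilities:
          # Drop the network capabilities the workload does not need.
          drop:
            - NET_RAW
            - NET_ADMIN
```

If a workload needs no extra capabilities at all, dropping ALL and adding back only what is required is a stricter variant of the same idea.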
RKE allows for a very smooth upgrade of Kubernetes components. The cluster.yml config file defines all of the cluster images and their versions. To upgrade any cluster component, change its image in the file and run the rke up command.
Although this workflow seems easy, it is essential to understand the dependencies of each image. For example, upgrading the core Kubernetes components without upgrading the CNI may cause network capabilities to be lost or system containers to fail.
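A common way to keep the images consistent with one another is to set a single Kubernetes version in cluster.yml and let RKE pick the matching system images. The fragment below is a sketch; the version value is a placeholder, not a real release.

```yaml
# cluster.yml (fragment)
# Assumption: replace the placeholder with a version supported by your RKE release.
# `rke config --list-version --all` prints the supported versions.
kubernetes_version: "<supported-version>"

# After editing the file, apply the change from the command line:
#   rke up --config cluster.yml
```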
RKE allows for an entire upgrade strategy to be declarative, with specified rules for managing node downtime. As users upgrade a highly available cluster, they can specify how many nodes are allowed to be unavailable during the upgrade process. That way, the upgrade will not continue if a problem, such as a compatibility issue, takes more nodes offline than allowed. For example:
upgrade_strategy:
  max_unavailable_worker: 10%
  max_unavailable_controlplane: 1
The upgrade strategy above ensures that only one control plane node is upgraded at a time, so the remaining nodes maintain quorum, and that no more than 10% of worker nodes are unavailable at once.
There is an upgrade strategy feature for most Kubernetes core components and add-ons. Before moving to production, make sure to set the upgrade strategy that works best for your team and make declarative adjustments accordingly.
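As a sketch of what a fuller declarative strategy might look like, the fragment below adds node draining and a rolling update strategy for the DNS add-on. The field names follow the RKE cluster.yml schema as I understand it, and all values are illustrative assumptions; check both against the documentation for your RKE version.

```yaml
# cluster.yml (fragment); values are illustrative, not recommendations
upgrade_strategy:
  max_unavailable_worker: 10%
  max_unavailable_controlplane: 1
  # Drain nodes before upgrading them
  drain: true
  node_drain_input:
    force: false
    ignore_daemonsets: true
    delete_local_data: false
    grace_period: -1        # -1 uses each pod's own termination grace period
    timeout: 60             # seconds to wait for the drain to finish

dns:
  provider: coredns
  # Add-ons take a Deployment-style rolling update strategy
  update_strategy:
    strategy: RollingUpdate
    rollingUpdate:
      maxUnavailable: 20%
      maxSurge: 15%
```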
Audit logging is used to identify and record API events in a Kubernetes environment. Organizations can use audit logs to monitor any changes or requests to the API server and act accordingly. Like most monitoring systems, audit logging produces a large volume of logs and information about the cluster. Therefore, it is up to administrators to set an audit policy that watches for and highlights the changes or access they want to escalate.
With RKE, audit logs are enabled by default in the cluster. However, it is always good practice to declare policy directly in the cluster.yml file.
services:
  kube-api:
    audit_log:
      enabled: true
Administrators can also set flags for the audit log, including the number of backups to keep, how long to retain them, the maximum log file size, the file format, and more.
services:
  kube-api:
    audit_log:
      enabled: true
      configuration:
        max_age: 6
        max_backup: 6
        max_size: 110
        path: /var/log/kube-audit/audit-log.json
        format: json
Typically, the audit policy is configured through a series of flags passed to the kube-apiserver, plus a policy file that defines which changes administrators wish to monitor. Below is an example of a policy that captures pod log and status requests at the Metadata level.
services:
  kube-api:
    audit_log:
      enabled: true
      configuration:
        max_age: 6
        max_backup: 6
        max_size: 110
        path: /var/log/kube-audit/audit-log.json
        format: json
        policy:
          apiVersion: audit.k8s.io/v1
          kind: Policy
          rules:
            # Log "pods/log", "pods/status" at Metadata level
            - level: Metadata
              resources:
                - group: ""
                  resources: ["pods/log", "pods/status"]
Be aware of all of the configuration options and policies to ensure that the team gets actionable information from the audit logs and actually uses it.
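A policy can hold several rules, which are evaluated in order with the first match deciding the audit level. The sketch below shows one possible layering; the choice of resources and levels is an assumption for illustration, not a recommended baseline.

```yaml
# Fragment to place under services.kube-api.audit_log.configuration
policy:
  apiVersion: audit.k8s.io/v1
  kind: Policy
  # Skip the RequestReceived stage to reduce duplicate entries
  omitStages:
    - RequestReceived
  rules:
    # Do not log read-only requests for events (high volume, low value)
    - level: None
      verbs: ["get", "list", "watch"]
      resources:
        - group: ""
          resources: ["events"]
    # Log access to secrets and configmaps without recording their contents
    - level: Metadata
      resources:
        - group: ""
          resources: ["secrets", "configmaps"]
    # Record full request and response bodies for RBAC changes
    - level: RequestResponse
      verbs: ["create", "update", "patch", "delete"]
      resources:
        - group: "rbac.authorization.k8s.io"
          resources: ["roles", "rolebindings", "clusterroles", "clusterrolebindings"]
    # Catch-all: log everything else at Metadata level
    - level: Metadata
```

Keeping secrets at the Metadata level avoids writing sensitive payloads into the audit log itself.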