Below is the transcript of the video, condensed and modified for clarity.
Some of us are pushing Kubernetes at our organizations, and some of us are getting Kubernetes pushed on us at our organizations. This marks a huge paradigm shift in infrastructure, in the way we manage software and applications, and in the way developers deploy their applications. When you think about DevOps, it’s every SRE’s dream to have developers manage their own applications. But that means they’re pushing code to production, and we’re building pipelines for people to develop and push code quickly, and from a security standpoint, that makes me a little scared.
I’m a developer, and I worked in infrastructure before I worked at StackRox, and I don’t think I thought about security a single time. Basically, what I would do was develop code, make sure it worked in Docker containers, push it, and hope for the best; and now, looking at it from a security point of view, that probably wasn’t the best way to go. So, how do we build security into rolling out our software?
Keep Kubernetes up to date
First and foremost, on Kubernetes hygiene, keep it up to date! As you may know, Kubernetes 1.13 deprecates etcd 3.0; if you’re still on 1.12 and a vulnerability comes out, your version won’t be patched, because patches are built only for the past three releases. The last place you want to be is having to update your data backplane while trying to fix a vulnerability.
You don’t want to be stuck in this scenario where you’re trying to fix a critical vulnerability but also rolling out tons of new infrastructure. I think everyone can agree that infrastructure upgrades are as scary as they come.
So, in order to upgrade, you really want to follow the community and see what’s going on.
I love this Google group. They post a ton and you’ll get a bunch of information. Just make sure you’re up to date and you understand when new CVEs are discovered, whether or not they affect your organization, and what you can do to remediate them. This Google group provides amazing resources for understanding where your risk is highest in the cluster itself.
Lock down the API server and other infrastructure components
Make sure you are using infrastructure best practices. For example, make sure that your network access is firewalled off correctly. Make sure the Kubernetes API server (which is basically the entry point into all of your infrastructure) is locked down. (I know a lot of people will restrict it to a VPC or VPN. Just ensure that the traffic to the API server is protected.) Lastly, let’s make sure that the actual host itself is locked down.
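As a sketch of what locking down the API server can mean in practice: on a kubeadm-style cluster the API server runs as a static Pod, and hardening flags live in its manifest. The flags below are real kube-apiserver flags, but the file location and the exact flag set vary by distribution and version, so treat this as illustrative:

```yaml
# Excerpt of a kube-apiserver static Pod manifest (illustrative;
# your distribution may configure these flags elsewhere)
spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    - --anonymous-auth=false          # reject unauthenticated requests
    - --authorization-mode=Node,RBAC  # authorize every API call
    - --insecure-port=0               # disable the legacy unauthenticated port
```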
Something really interesting that has come out very recently is ephemeral debugging containers. This gets me really excited, because I know that when you’re trying to debug networking in a Kubernetes cluster, it can be such a pain. The idea of these ephemeral containers is to run debugging images alongside your pods, binding to the same namespaces and allowing you to debug right there, without granting root privileges or keeping a container that’s constantly full of network tools. (I think the world where you run one application inside a container with one process is sort of the dream, but that’s been tough when you need to debug and actually try to test stuff and see if it still works.)
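As a rough sketch of what that looks like, here is an ephemeral container spec attached to an existing Pod. Note that ephemeral containers are applied through the Pod’s `ephemeralcontainers` subresource (newer kubectl versions expose this via `kubectl debug`), not by editing the Pod spec directly, and the feature’s availability depends on your Kubernetes version. The names and image here are illustrative:

```yaml
# Sketch of an ephemeral debugging container (names/image are assumptions)
ephemeralContainers:
- name: debugger
  image: nicolaka/netshoot        # an image that ships common network tools
  stdin: true
  tty: true
  targetContainerName: my-app     # share the target container's namespaces
```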
Use Role-Based Access Control (RBAC)
Your API server is locked down to your VPN, but RBAC makes sure that the people you expect to use your API server are accessing only the correct resources. DevOps teams have been doing this through continuous deployment systems. You push most of the RBAC into the continuous deployment system, which is usually hooked into version control. Then you use your normal authentication through GitHub, you have an automatic audit log of what’s changing in your infrastructure, and the only access point for modifying workloads is the continuous deployment system. In this way, you really have one service account authorized against your API server. Obviously, you still have people who need access to the cluster to debug and fix things, but now, since you have fewer users and groups, you have a much smaller number of people to manage in RBAC.
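A minimal sketch of that pattern: one service account for the CD system, bound to only the verbs it needs in the namespace it deploys to (all names here are illustrative):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cd-deployer
  namespace: team-a
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployer
  namespace: team-a
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "create", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cd-deployer-binding
  namespace: team-a
subjects:
- kind: ServiceAccount
  name: cd-deployer
  namespace: team-a
roleRef:
  kind: Role
  name: deployer
  apiGroup: rbac.authorization.k8s.io
```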
Use Namespaces to secure the workload
Containers and Kubernetes provide a new way of securing workloads because now you can look at a single application and scope down the surface area to exactly what that application uses and needs.
So, first and foremost, leverage namespaces! Don’t wait until the default namespace is full of 150 deployments, no one knows what belongs to whom or what each one does, and there’s zero segmentation between any of them. That’s a real headache for everyone. Why else should you use namespaces?
- They are great for resource usage tracking to see which team or tenant is using a lot of resources.
- They enable RBAC to be finely tuned, so you can have a certain namespace where someone just has access to those resources in that namespace.
- They allow use of generic network policies and segmentation, especially if you’re a SaaS provider and need to separate out your tenants.
- They make kubectl results more sane.
- They make API responses significantly faster
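The first two benefits above can be sketched in a couple of objects: a namespace per team plus a resource quota, so usage is both tracked and capped per tenant (names and limits are illustrative):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team-a
  labels:
    team: team-a
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "10"       # cap the team's total CPU requests
    requests.memory: 20Gi    # cap the team's total memory requests
    pods: "50"               # cap the number of pods in the namespace
```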
Use Kubernetes context to quantify and minimize your risk
In general, the way I think about workload risk is in terms of the context provided by Kubernetes deployments, as represented by the below hexagon pattern.
It starts with building your image, which means looking at the dependencies, the packages, and most importantly analyzing the vulnerabilities to discover the “known bad”.
Then, you look at how the app is configured:
- What privileges does it have on Linux?
- What privileges does it have against the API server?
- Are there weak secrets or other sensitive data (API keys for registries, sensitive databases)?
- What labels and annotations are used? This is key because it allows you to answer the question of who owns a given service (an annotation with the owner, email, or team) and is very valuable for operations when debugging. It also lets you route an issue quickly to the right owner, as opposed to playing a game of murder mystery trying to figure out who owns an application and who deployed it.
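For example, ownership metadata on a Deployment might look like the following. The specific label and annotation keys are a team convention, not anything Kubernetes requires, and all names here are illustrative:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments
  labels:
    app: payments
    team: billing                        # which team owns this workload
  annotations:
    owner: "billing-team@example.com"    # where to route issues
    repo: "github.com/example/payments"  # where the code lives
```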
Some of the most important configurations best practices that I highly recommend for containers and Kubernetes are as follows:
- Read-only root file system – this means no one can write to your file system. I present a demo of how you can thwart an attack using a read-only root file system beginning at 9:45 in the above video.
- Linux capabilities – use capability drop and add (for example, `drop` and `add` in the container’s securityContext) to limit which Linux capabilities containers are allowed to use. I present a demo of how you can limit Linux capabilities beginning at 13:10 in the above video.
- Network policies – use network policies to limit ingress and egress network communication between Pods (east-west) and from outside to Pods (north-south). I present a demo of how you can use network policies to block an incoming attack beginning at 15:30 in the above video.
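Put together, the three practices above might look like the following sketch: a container with a read-only root filesystem, all capabilities dropped except the one it needs, and a NetworkPolicy that only admits traffic from a frontend. All names, images, and labels are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api
  labels:
    app: api
spec:
  containers:
  - name: api
    image: example/api:1.0            # illustrative image
    securityContext:
      readOnlyRootFilesystem: true    # no one can write to the root filesystem
      capabilities:
        drop: ["ALL"]                 # start from zero capabilities...
        add: ["NET_BIND_SERVICE"]     # ...and add back only what's needed
    volumeMounts:
    - name: tmp
      mountPath: /tmp                 # writable scratch space, if the app needs it
  volumes:
  - name: tmp
    emptyDir: {}
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-ingress
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes: ["Ingress"]
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend               # only frontend pods may reach the api pods
```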
So, the last thing to always remember is that security is hard, more akin to a marathon than a sprint. There’s no such thing as perfect security; it’s always about monitoring, iterating, and making sure that the tools are available for people who are building the code to integrate it into their process and drive security.