Kaniko is a tool developed by Google to help build Docker container images in Kubernetes. It is an application written in Go that doesn't depend on a Docker daemon.
There are multiple reasons why you might want to use Kaniko. It is a lightweight tool that doesn't require as many permissions and privileges as Docker, and it doesn't need a running Docker daemon to communicate with over a socket.
Kubernetes' increased focus on security and isolation has reduced the number of options for building containers. Prior to K8s v1.24 there were three main options: DooD (Docker out of Docker), DinD (Docker in Docker), and daemonless tools such as Kaniko. Now only DinD and Kaniko-like tools remain viable. GitLab notably recommends Kaniko for building containers on Kubernetes.
Before jumping into the details of Kaniko, it is important to explain a bit how container images work.
Containers are based on images, which are typically composed of multiple layers:
FROM scratch can be used to build a valid Docker image that contains only the application itself, without any extra dependencies.
It can be used to build the platform image.
RUN, COPY, and ADD instructions each create a layer. The FROM clause will typically add a few layers coming from the base image.
Running the command docker history --no-trunc {IMAGE_ID} lets us see what is contained within each layer of the image:
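For example, against a hypothetical image tagged my-app:latest (the tag is only a placeholder), the command looks like this:

```sh
# Show every layer of the image, the full (untruncated) instruction that
# created it, and the size it adds.
docker history --no-trunc my-app:latest
```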
OverlayFS is the default storage driver for Docker. It is a union filesystem, layering multiple directories and presenting them as a single view. This single view is represented in the diagram below:
The upper layer is writable, while the lower layers are fully read-only. The diagram above shows four different cases handled by OverlayFS.
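To make the mechanism concrete, here is a minimal sketch of mounting an overlay filesystem by hand (the directory names are arbitrary, and the mount requires root):

```sh
# Create the lower (read-only), upper (writable), work, and merged directories.
mkdir -p /tmp/overlay/lower /tmp/overlay/upper /tmp/overlay/work /tmp/overlay/merged
echo "from the lower layer" > /tmp/overlay/lower/file.txt

# Mount the union view: reads fall through to the lower layer, writes land in the upper layer.
mount -t overlay overlay \
  -o lowerdir=/tmp/overlay/lower,upperdir=/tmp/overlay/upper,workdir=/tmp/overlay/work \
  /tmp/overlay/merged

# Modifying a file in the merged view copies it up into the upper layer,
# leaving the lower layer untouched.
echo "modified" > /tmp/overlay/merged/file.txt
ls /tmp/overlay/upper   # file.txt now appears here
```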
Docker relies on two main components: a Docker client and the Docker daemon.
When building an image, the Docker client sends the command and the build context to the Docker daemon. The daemon is in charge of pulling the base image and building the different layers.
The Docker daemon leverages OverlayFS throughout the image build process. Since the upper layer contains all the changes made at a given step, packaging those changes and incorporating them into an image layer is straightforward.
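You can peek at the overlay directories Docker keeps for an image with docker inspect (the image tag is a placeholder, and the fields shown depend on the storage driver in use):

```sh
# Print the OverlayFS directories (LowerDir, UpperDir, MergedDir, WorkDir)
# backing an image when the overlay2 storage driver is in use.
docker image inspect --format '{{ json .GraphDriver }}' my-app:latest
```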
It is worth noting that the Docker daemon typically needs to run as root, although there are now ways to run it as a non-root user (rootless mode).
Docker has been deprecated as a container runtime in Kubernetes, with the dockershim removed in v1.24. With this change, it is no longer practical to leverage a Docker out of Docker (DooD) approach for building images. In the DooD approach, a pod connects to the Docker daemon used by the Kubernetes worker. Since the workers no longer ship with the Docker runtime as of v1.24, this approach no longer works.
While the approach provided an easy path to building container images with Docker, it suffered from a number of security issues and poor isolation, and it interfered with Kubernetes scheduling, since containers started directly through the node's daemon are invisible to the scheduler.
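For reference, the DooD pattern boils down to handing a container the host's Docker socket, along these lines (the image tag is a placeholder):

```sh
# Docker out of Docker: the inner docker CLI talks to the *host's* daemon
# through the mounted socket, so anything it starts runs as a sibling
# container directly on the host, outside of Kubernetes' control.
docker run --rm \
  -v /var/run/docker.sock:/var/run/docker.sock \
  docker:19.03.13 \
  docker ps   # lists the host's containers, proving we control its daemon
```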
DinD provides a fairly easy path to building containers on Kubernetes, but it comes with several drawbacks. Docker in Docker, as the name indicates, relies on spawning child containers from within an existing container.
In terms of security, while Docker in Docker doesn't require the same level of permissions as DooD, it still requires the container to run in privileged mode. This means that the pod has root access on the node it is running on and can execute arbitrary commands on the host as root. This is because the Docker daemon requires root privileges to run (in most cases).
Besides the main problem of security, the Docker in Docker approach also faces other challenges such as compatibility issues, potential data corruption, and performance problems. Regarding performance, the major issue with DinD tends to be the lack of layer caching when using ephemeral DinD containers, although there are some workarounds (such as the one described by Applatix).
With a Docker-in-pod or sidecar setup, a privileged container runs the Docker daemon (typically through the docker:dind image) and exposes Docker's REST API. Commands are sent from the clients to the privileged container for execution.
In this approach, the Docker commands are isolated to the sidecar, which still needs to run in privileged mode. This reduces the attack surface since the host is not exposed directly.
To use Docker in Docker on Kubernetes with a CI/CD tool such as GitLab CI, it is necessary to register the agent/runner with privileged mode enabled and to enable the Docker-in-Docker service, e.g. docker:19.03.13-dind.
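As a rough sketch, a Docker-executor runner could be registered with privileged mode enabled like this (the URL and token are placeholders; the dind service itself is then declared in the job's services section):

```sh
# Register a GitLab Runner whose jobs run in privileged containers,
# which the docker:dind service needs in order to start its own daemon.
gitlab-runner register \
  --non-interactive \
  --url "https://gitlab.example.com/" \
  --registration-token "REDACTED" \
  --executor "docker" \
  --docker-image "docker:19.03.13" \
  --docker-privileged
```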
Kaniko parses the Dockerfile, retrieves the base image, and extracts the filesystem of the base image into a local directory (e.g. /kaniko/{imagename}). It then creates the different stages based on the Dockerfile and converts the commands contained within it to shell commands. Kaniko executes these steps by directly modifying the local filesystem. Since Kaniko is meant to run from within an existing container, this does not pose any issues, but if it were run directly on a host, things could go bad.
It then proceeds to snapshotting: creating a tarball of the changed files and adding files to a layer map or marking them as whiteouts as needed. To determine whether a file has changed, Kaniko leverages a hashing process that includes the modification time (mtime) of the file.
Kaniko needs to run inside a container, otherwise there might be adverse consequences. The typical way Kaniko is distributed is therefore through a container image. Kaniko offers two image variants: a base image and a debug image.
Kaniko builds images through its “executor” application from within the container:
Kaniko offers two options when building a container image: either produce a tarball of the image or push the image directly to a container registry. This can be specified using the --no-push flag of the Kaniko executor.
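A minimal invocation from inside the Kaniko container might look like the sketch below (paths and destinations are placeholders, and I believe the tarball output flag is --tarPath; treat that as an assumption and check it against your executor version):

```sh
# Build and push the image to a registry.
/kaniko/executor \
  --context /workspace \
  --dockerfile /workspace/Dockerfile \
  --destination registry.example.com/my-app:latest

# Or skip the push and write the image to a tarball instead.
/kaniko/executor \
  --context /workspace \
  --dockerfile /workspace/Dockerfile \
  --destination my-app:latest \
  --no-push \
  --tarPath /workspace/my-app.tar   # assumed flag name, verify for your version
```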
If you need to push the image to a container registry, it is necessary to add the registry to the credential store. For cloud registries such as AWS ECR, Kaniko incorporates credential helpers as part of its image. Setting up the credentials can then be done as follows:
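With ECR, for example, the bundled credential helper can be wired up by pointing the Docker config at it (the account ID and region are placeholders):

```sh
# Tell Kaniko to use the bundled ECR credential helper for this registry.
mkdir -p /kaniko/.docker
cat > /kaniko/.docker/config.json <<'EOF'
{
  "credHelpers": {
    "123456789012.dkr.ecr.eu-west-1.amazonaws.com": "ecr-login"
  }
}
EOF
```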
This can all be incorporated as part of a CI/CD job stage, such as GitLab CI's:
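As a sketch, the script section of such a job (running in the Kaniko debug image, whose shell makes this possible) could authenticate against GitLab's registry with the predefined CI variables and then call the executor:

```sh
# Authenticate against the GitLab registry using the job's predefined
# variables, then build and push the image with Kaniko.
mkdir -p /kaniko/.docker
echo "{\"auths\":{\"${CI_REGISTRY}\":{\"auth\":\"$(printf '%s:%s' "${CI_REGISTRY_USER}" "${CI_REGISTRY_PASSWORD}" | base64 | tr -d '\n')\"}}}" \
  > /kaniko/.docker/config.json

/kaniko/executor \
  --context "${CI_PROJECT_DIR}" \
  --dockerfile "${CI_PROJECT_DIR}/Dockerfile" \
  --destination "${CI_REGISTRY_IMAGE}:${CI_COMMIT_SHORT_SHA}"
```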
While only the official image is supported, it has issues and limitations that may require a more custom build. Kaniko's GitHub marks such custom builds as YMMV (your mileage may vary).
Pip might not fully re-install packages that are already present in user space but were not installed as part of building the container.
Since pip sees these packages as already present in the user directories, it will not download or modify their files. And because Kaniko relies on modification time (mtime) to detect file changes, it will not include these packages when snapshotting the filesystem and building the container's layers.
The container will appear to be properly built, but the packages will not actually be present. One easy way to get around this is to leverage pip's --force-reinstall or --ignore-installed flags.
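For instance (the requirements file name is a placeholder):

```sh
# Force pip to (re)install every package, even ones already present in the
# build environment, so the files get fresh mtimes and end up in the snapshot.
pip install --ignore-installed -r requirements.txt
# or, alternatively:
pip install --force-reinstall -r requirements.txt
```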
This was caused by a bug in an old version of the Kaniko executor (1.6.0); upgrading to 1.8.x fixed the issue.
Symlinks:
If you don't need UNIX mail, this can easily be worked around by replacing the symlink with an empty directory:
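A sketch of that workaround, assuming the Debian/Ubuntu layout where /var/spool/mail is a symlink to /var/mail (adjust the path to whichever symlink is causing trouble):

```sh
# Replace the mail spool symlink with an empty directory so the path no
# longer points at a target outside the layer being snapshotted.
rm /var/spool/mail && mkdir -p /var/spool/mail
```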
C library version:
Kaniko installs the different libraries into user space (/usr/lib), which can conflict with the existing versions available as part of the base image. Since Kaniko executes the different commands in the shell of the running container, it might end up leveraging the system library rather than the one available in user space.
In these cases, two courses of action are possible:
1. Upgrade the base image to a version that includes the updated libraries (recommended) — although this might not always be possible.
2. Overwrite the libraries with the ones available from user space (see the sketch below).
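As a purely hypothetical illustration of option 2 (the library name and paths are placeholders, not taken from Kaniko's own build files), a build step could copy the user-space version over the system one:

```sh
# Overwrite the older system copy of a library with the newer user-space one,
# then refresh the dynamic linker cache (paths and names are placeholders).
cp -f /usr/lib/libexample.so.1.2.3 /lib/x86_64-linux-gnu/libexample.so.1.2.3
ldconfig
```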
Kaniko leverages a similar approach in its own container build step:
Kaniko can also feel slow because of a lack of caching. Since Kaniko relies on short-lived containers, it does not typically cache the different layers on the local filesystem. Kaniko does, however, provide caching functionality, and one way to use it is to leverage a container registry to cache the different layers:
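A sketch of registry-backed layer caching, using a placeholder cache repository:

```sh
# Enable layer caching and store cached layers in a dedicated registry repo,
# so subsequent builds can reuse unchanged layers.
/kaniko/executor \
  --context /workspace \
  --dockerfile /workspace/Dockerfile \
  --destination registry.example.com/my-app:latest \
  --cache=true \
  --cache-repo registry.example.com/my-app/cache
```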
Kaniko is a powerful tool for building container images safely in Kubernetes. However, it does require some specific knowledge of how best to leverage it in order to avoid pitfalls and slow performance.