
Dockerized Kognitio Part 1

Users have been asking us about running Kognitio in Docker containers for some time now, and these requests have become a lot more frequent lately with the buzz around the technology. To enable this, we have created an official Kognitio Docker container image, which you can find on Docker Hub here:

https://hub.docker.com/r/kognitio/kognitio

In this series I’m going to talk about the container image and show you how to use it to make Dockerized Kognitio clusters. I’ll begin with an overview of the Docker container, our aims and the way it should be used. I will also walk through the process of creating a Dockerized Kognitio system using vanilla Docker commands. Finally, I’ll take you through an example of creating a much larger Kognitio system on a multi-node Kubernetes cluster.

What is Docker?

Docker provides a neat way to wrap up a software stack together with all of its dependencies and package it as a single unit — the Docker container image. Containers are then created using this image and executed in an isolated environment, optionally with limits applied to the use of memory, CPU and other resources. Multiple containers can be connected together to form a complex application.  Additionally, container orchestration technologies can be used to run containerized applications spread out across clusters of worker nodes.
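
For example, with the plain Docker command line (a minimal sketch; the image and names here are purely illustrative):

    # Start a container from an image, capping its memory and CPU usage
    docker run -d --name web --memory 2g --cpus 2 nginx

    # Connect containers over a user-defined network to build up an application
    docker network create appnet
    docker network connect appnet web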

Why Docker?

Dockerizing an application has a lot of benefits:

  • Minimal installation required — Since containers define a whole stack right down to the platform runtime, installation is just a case of importing or building the container image, defining a container and starting it (see the sketch after this list). This makes deployment easy and generic — there is no need to install supporting software or perform a set of prerequisite steps specific to the software.

  • Portability — Lack of external dependencies makes Docker containers very portable. Kognitio’s Docker container, for example, will run on Windows even though the Kognitio software inside it is Linux-only.

  • One way to manage everything — Dockerized applications can all be started, stopped, suspended, backed up, etc. using the same set of Docker commands. This lowers the total cost of ownership for IT administration, as administrators can manage multiple pieces of software without needing to learn the details of each one.

  • Cloud/datacenter agnostic — Docker containerized applications run just as well in the cloud as they do in on-premise datacenters. Many cloud providers have Docker orchestration products which will work in their environment. Kubernetes and other orchestration products allow the same methods to be used to run products in multiple cloud environments and outside the cloud.

  • Access to an ecosystem of technologies — Many other complementary technologies have been created around Docker, some of which I’ll discuss here. Dockerizing an application allows all of these tools to be used with it.

  • VM-like isolation with less overhead — In many ways a Docker container is similar to a VM. However, while every VM has to run its own kernel, Docker containers all share the host’s kernel, which provides isolation using features designed for this purpose (namespaces and cgroups). This has the advantage that containers can run with little or no performance penalty over native applications.
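
To illustrate the first point, installation amounts to little more than this (a sketch only; the container name is hypothetical, and the run options a real cluster needs, such as volumes, are left for part 2):

    # 'Installation' is just pulling the published image...
    docker pull kognitio/kognitio

    # ...and creating and starting a container from it
    docker run -d --name kognitio-node kognitio/kognitio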

Creating the Kognitio Docker Image

Docker actually represents a whole family of products which have been built around containerization and container orchestration (running containers on farms of worker nodes). There are multiple container orchestration technologies (e.g. Kubernetes, Docker Swarm), alternative ways to launch containers (e.g. podman), complementary technologies for backups, upgrades, etc., and even alternative Docker-compatible runtimes like VMware’s VIC.

When we designed our container image we wanted it to do things ‘the docker way’ in order to allow it to fit in with as much of the Docker ecosystem as possible, so:

  • Everything is done inside containers. When bringing Kognitio to other platforms we have sometimes used wrappers as an easy way to launch clusters. For Docker, all the commands required to create and manage a cluster are launched inside Docker containers. This allows the image to be used in environments where different launch/maintenance methods are in place (e.g. ‘kubectl exec’ instead of ‘docker exec’); see the first sketch after this list.

  • Containers are ephemeral. A persistent volume mounted in the containers stores all persistent data, metadata and configuration, so everything else is disposable. Containers can be destroyed and new ones created to replace them without the user needing to worry about losing their data. This facilitates deployments on orchestration clusters, where containers might get lost in the event of a node failure or sometimes even a scaling event. A volume sketch follows this list.

  • The container image is open and extensible. The container definition files and all supporting scripts are available in our GitHub repository (https://www.github.com/kognitio-ltd/kognitio-docker-container). Users can extend our image according to their requirements, or use it as a blueprint for making alternative images as required; a build sketch follows this list.

  • The container image wraps around an unmodified Kognitio release. At the time of writing, the container is built from release 8.2.3-rel190726 (downloaded in the Dockerfile when building the image). This makes Kognitio on Docker completely compatible with Kognitio running in other places and allows existing users to substitute in the Kognitio version they are currently using elsewhere for maximum compatibility.
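
For example, because everything is done inside containers, the same administrative step can be issued through whichever launcher the environment provides (the container and pod names here are hypothetical):

    # Open a shell inside a running Kognitio container with plain Docker...
    docker exec -it kognitio-node bash

    # ...or with kubectl when the same container runs in a Kubernetes pod
    kubectl exec -it kognitio-pod -- bash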
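
Similarly, because containers are ephemeral, all state lives on a volume that outlives any individual container. A minimal sketch with a named Docker volume (the /data mount point is an assumption, not the image’s documented path):

    # Create a named volume to hold all persistent data, metadata and configuration
    docker volume create kognitio-data

    # Mount it into a container; the container itself stays disposable
    docker run -d --name kognitio-node -v kognitio-data:/data kognitio/kognitio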
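
And because the image is open, extending it is an ordinary Docker build (the file names below are hypothetical):

    # Write a two-line Dockerfile deriving a custom image from the published one
    printf 'FROM kognitio/kognitio\nCOPY my-startup-hook.sh /opt/hooks/\n' > Dockerfile.custom

    # Build and tag the derived image
    docker build -f Dockerfile.custom -t my-org/kognitio-custom .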

Part 2

In part 2 of this series I’ll talk in more detail about how to use the Kognitio container image to make clusters.
