Dockerized Kognitio Part 2

Creating a Dockerized Kognitio cluster

In part 1 of this series I introduced Kognitio’s Docker container image. In this part I’ll provide an overview of the process of creating a Docker-based Kognitio cluster from it.  

You will need:

  • A Docker environment to run containers in.

  • The ability to create a Kognitio container image based on our image definition. If you have internet access you can pull the image directly from Docker Hub (in many environments this happens automatically if you specify kognitio/kognitio as the container image for a container). If you don’t have internet access, you can either pull the image elsewhere and transfer it, or build your own container image. Clone our public repository at https://github.com/kognitio-ltd/kognitio-docker-container if you need a base to build your own images from.

The Kognitio System ID

All Kognitio clusters must have a system ID. If you have a Kognitio license key that you want to use then this key will already specify a system ID to create your system with. If you are running without a license key you can specify any system ID. This is a string of characters up to 12 bytes long which identifies the cluster. Valid characters are [a-z0-9_].
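If you are choosing your own system ID, a quick check before initialising anything can catch mistakes early. The following is a minimal sketch of the rule described above (up to 12 bytes, characters from [a-z0-9_]). The function name is hypothetical and not part of the Kognitio tooling, and rejecting an empty ID is my assumption.

```shell
# Succeeds (exit status 0) if the argument looks like a valid system ID.
# Hypothetical helper, not part of the Kognitio tools.
# The body runs in a subshell so the locale change does not leak out.
is_valid_system_id() (
  LC_ALL=C                      # make [a-z] match ASCII lower case only
  id="$1"
  # Assumption: an empty ID is not useful, so reject it.
  [ -n "$id" ] || exit 1
  # At most 12 bytes long.
  [ "${#id}" -le 12 ] || exit 1
  # Reject any character outside the allowed set.
  case "$id" in
    *[!a-z0-9_]*) exit 1 ;;
  esac
  exit 0
)
```

For example, `is_valid_system_id mycluster_01` succeeds, while `is_valid_system_id MyCluster` fails because of the upper-case letters.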

Persistent storage

Before building a cluster you need to create a persistent storage volume to store the cluster’s data.  The Kognitio container image is ephemeral and without this you will lose data whenever a container terminates. Kognitio uses a single persistent storage volume mounted on /data and shared between all containers.

The persistent volume must be a read-write mount which is fully accessible by every container in the cluster.  In Kubernetes, for example, this is called a ReadWriteMany mount. The actual volume you create and the way you mount it will depend on your environment.

For most clusters, the persistent storage volume contains Kognitio data volumes in addition to the configuration data for the system. These are not to be confused with container volumes: they are files inside the persistent storage volume which store the internal Kognitio metadata tables as well as any data tables you create.

Before creating your persistent storage volume you need to decide on the number and size of the Kognitio data volumes to use. Each volume needs to be 10G or larger; a good default size is 100G. At any given time, a single volume will be accessed by a single Kognitio process running in one of the containers, so you will usually want to have at least one volume for each container. Volumes are created as sparse files, which means that on most platforms they will not initially use the full amount of provisioned space.
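To make the sizing decision concrete, here is a small sketch that works out a plan with one data volume per container. The function name and output format are hypothetical; the 10G minimum and 100G default come from the guidance above.

```shell
# Print a simple volume plan: one data volume per container.
# Usage: plan_volumes <container_count> [volume_size_gb]
# Hypothetical helper for planning only; it creates nothing.
plan_volumes() {
  containers="$1"
  size_gb="${2:-100}"          # 100G is the suggested default size
  if [ "$size_gb" -lt 10 ]; then
    echo "volume size must be at least 10G" >&2
    return 1
  fi
  # Total is the maximum provisioned space; sparse files start smaller.
  echo "volumes=$containers size_gb=$size_gb total_gb=$((containers * size_gb))"
}
```

Running `plan_volumes 4 100` prints `volumes=4 size_gb=100 total_gb=400`. The total is the upper bound the persistent storage volume needs to accommodate, not what will be used on day one.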

Storing Data Externally

As with all Kognitio deployment options, you can also create a Kognitio cluster which stores data externally instead of keeping it in the persistent volume (in Amazon EBS volumes, for example).  This mode of operation is outside the scope of this article and will be documented elsewhere.  In this case the persistent storage volume for the containers will be very small as it just contains configuration information.

Initialising Persistent Storage

The persistent storage volume needs to be initialised before it can be used by multiple containers at the same time. This is done by running a script to create the data volumes and build the necessary structures to link containers together into a cluster.

To initialise persistent storage, start a single container with the persistent storage volume mounted under /data. This can be one of the containers which will be part of the cluster or a temporary one which you terminate afterwards. Then run this command interactively inside that container:

kognitio-cluster-init

This will ask you to accept the EULA before asking for the system ID and data volume size/count. Once this is complete you are ready to start the rest of the containers for your cluster.
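On a single plain Docker host, the initialisation step might look roughly like the sketch below. The container name kognitio-init and the named volume kognitio-data are placeholders I have chosen, and I am assuming that the image's default command keeps the container running so that `docker exec` can be used against it. Your environment may need a different volume type or mount syntax.

```shell
# DOCKER can be overridden, e.g. DOCKER="echo docker" prints each
# command instead of running it.
DOCKER="${DOCKER:-docker}"

# Start a temporary container with the shared volume mounted on /data.
# Assumption: the image's default command keeps the container running.
start_init_container() {
  $DOCKER run -d --name kognitio-init -v kognitio-data:/data kognitio/kognitio
}

# Run the initialisation script interactively inside that container.
run_cluster_init() {
  $DOCKER exec -it kognitio-init kognitio-cluster-init
}

# Remove the temporary container once initialisation is complete.
remove_init_container() {
  $DOCKER rm -f kognitio-init
}
```

You would call `start_init_container`, then `run_cluster_init` to answer the prompts, then `remove_init_container` if you do not want to keep the container as part of the cluster.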

Creating Containers

The commands used to create containers for the cluster will depend on your environment. There are some considerations you need to bear in mind when creating them:

  • The same persistent volume needs to be mounted on /data in all containers in the Kognitio cluster. This must not be mounted in any other Kognitio clusters.

  • Kognitio will use as much RAM as you allow it to so we recommend you always place a memory limit on your Kognitio containers! We also recommend setting the swap memory limit to the same value as the memory limit in order to disable swapping for your containers. Kognitio works best without swap.

  • The minimum memory required to run a Kognitio container is 4G. Containers with less than 16G of RAM may exhibit reduced functionality under heavy loads.

  • Every container in a Kognitio cluster needs to have full, unlimited network access to all of the others using TCP and UDP on IPv4. They will discover each other’s IP addresses and link up automatically. We recommend using a private network for a Kognitio cluster with one IP address per container. Using host-based networking in Docker with multiple Kognitio containers on a node is possible but not recommended.

  • External clients only need to be able to connect to port 6550 on one or more of the containers. We recommend that external traffic be filtered to exclude other traffic.
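As an illustration of these considerations on a plain Docker host, the sketch below starts containers on a private user-defined network with matching memory and swap limits. The network name, volume name and function names are placeholders of mine; `--memory`, `--memory-swap`, `--network`, `-v` and `-p` are standard `docker run` options. Setting `--memory-swap` to the same value as `--memory` leaves the container with no swap.

```shell
# DOCKER can be overridden, e.g. DOCKER="echo docker" for a dry run.
DOCKER="${DOCKER:-docker}"

# Create a private network so each container gets its own IP address.
create_cluster_network() {
  $DOCKER network create kognitio-net
}

# Start one cluster container.
# Usage: start_kognitio_container <name> <memory> [host_port]
start_kognitio_container() {
  name="$1"
  mem="$2"
  publish=""
  # Optionally expose the Kognitio client port 6550 on a host port.
  if [ -n "${3:-}" ]; then
    publish="-p $3:6550"
  fi
  # $publish is deliberately unquoted so it splits into "-p" and the
  # mapping, or disappears entirely when empty.
  $DOCKER run -d --name "$name" \
    --network kognitio-net \
    --memory "$mem" --memory-swap "$mem" \
    $publish \
    -v kognitio-data:/data \
    kognitio/kognitio
}
```

For example, after `create_cluster_network`, `start_kognitio_container kog1 16g 9000` would start a 16G container that accepts client connections on host port 9000, and `start_kognitio_container kog2 16g` would add a second container with no published port.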

Creating a Database

Now, you can create the remaining containers to make a full cluster. Run this command inside one of the containers to check that everything is working:

wxprobe -H

This shows the number of containers (as the node count) and the total size of the Kognitio data volumes in use. For historical reasons the size is shown in decimal GB while volumes are specified in binary GB, so the figure may be larger than you expect. Depending on the container count, you may have to wait for up to a minute while the containers discover each other and link up. Wait until the correct node count is shown before continuing.
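To see why the reported size can look larger, here is a small conversion sketch. It assumes volume sizes are specified in binary gigabytes (2^30 bytes) and reported in decimal gigabytes (10^9 bytes); the function name is hypothetical and the exact rounding used by wxprobe is not something I am asserting here.

```shell
# Convert a size in binary gigabytes (2^30 bytes) to whole decimal
# gigabytes (10^9 bytes), rounded down.
gib_to_gb() {
  echo $(( $1 * 1073741824 / 1000000000 ))
}
```

`gib_to_gb 100` prints `107`, so four 100G volumes would come out at roughly 429 rather than 400.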

Now you can create a new cluster by running this command inside one of the containers: 

kognitio-create-database <newsyspassword>

This will start the Kognitio software inside the containers and populate the necessary metadata tables to create a Kognitio system. You will want to run this interactively as this command will ask for confirmation before it proceeds. When the command finishes you have a running Kognitio system ready for use.

Connecting to your Cluster

Once the Kognitio cluster is running, you need to connect to it in order to run queries. The easiest way to do this is to run the ‘wxsubmit’ command interactively inside one of the docker containers like this:

wxsubmit -s localhost sys -p <newsyspassword>

This will put you into an interactive SQL session where you can run queries. You could try this query to get back a list of docker containers:

SELECT os_node_name FROM sys.ipe_nodeinfo

But most of the time you will want to connect external clients to the cluster. The Docker containers export the Kognitio ODBC port on TCP port 6550. Clients can connect to this port on any of the Docker containers using Kognitio’s client tools or JDBC/ODBC drivers, which are available on our website. How you export this port will depend on your environment. For a simple Docker-based setup you can give one of the docker run commands ‘-p 9000:6550’ to map port 9000 on the host into the Docker container, then configure Kognitio ODBC/JDBC with port 9000 and the host’s IP address.

For detailed instructions on connecting client tools to Kognitio, see the documentation here: https://kognitio.com/documentation/latest/access/access.html.

At this point you have a working Kognitio cluster and you probably know what you want to do next. If you are new to Kognitio and just want to poke about and see what it can do, you might like to try working through the getting started guide on our documentation site here:

https://kognitio.com/documentation/latest/getstarted/kog.html

Routine Maintenance

Once the Kognitio cluster is up and running, users familiar with Kognitio can administer it using the standard ‘wx’ commands you would use to administer any other Kognitio cluster (‘wxviconf’ to edit configuration, ‘wxserver start’ to restart the server, ‘wxprobe’ to detect problems, etc). These need to be run inside one of the containers. See the documentation site at https://kognitio.com/documentation for full administration instructions.

The most important command to remember is ‘wxserver start’. You use this to restart the Kognitio services after a change to the number or size of the containers or after one or more containers have been restarted. This command reconfigures the Kognitio software to the current container configuration and makes the database available.
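On a plain Docker host, running this from outside the container might look like the sketch below. The function name is a placeholder, and I am assuming ‘wxserver start’ does not need an interactive terminal; add `-it` to the `docker exec` call if it turns out to prompt in your setup.

```shell
# DOCKER can be overridden, e.g. DOCKER="echo docker" for a dry run.
DOCKER="${DOCKER:-docker}"

# Restart the Kognitio services from inside a running cluster container.
# Usage: restart_kognitio_services <container_name>
restart_kognitio_services() {
  $DOCKER exec "$1" wxserver start
}
```

For example, `restart_kognitio_services kog1` after restarting or resizing containers, where kog1 is whatever name you gave one of your containers.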

Part 3

In Part 3 I will walk through the creation of a multi-node Kognitio cluster on a vanilla Docker environment.
