Dockerized Kognitio Part 3

This is part 3 of a series of posts about Kognitio on Docker.  You can find the first part here and the previous post here.  This post will walk through the process of creating a Kognitio cluster from Kognitio’s Docker image. The system I’ve used is a CentOS 7.5 system with Docker installed but with no special configuration settings. My system is connected to the internet for running ‘docker pull’. Your Docker environment may vary, in which case you might have to adapt these steps appropriately.

Cluster Specifications

For my example I’m running a system with 8 cores and 64G of RAM. I’m going to create 4 containers with 15G of RAM in each one in order to leave a little for the host OS. I will be using ‘mycluster’ as the system ID for the cluster and creating 4 x 100G data volumes in my persistent storage volume. Since I’m just running Docker on a single node, the persistent storage volume will just be a regular Docker volume. The node I’m running on doesn’t have 400G of disk storage, but the Kognitio data volumes are sparse files so this doesn’t actually matter unless I try to put 400G of data onto the cluster at once.
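The memory budget above is simple arithmetic, and it can be handy to keep it in a small helper when adapting the numbers to a different host. This is only a sketch: the function name memory_headroom_gb is my own invention for illustration, and the figures are the ones quoted in this post.

```shell
# Memory budget sketch. memory_headroom_gb is a hypothetical helper,
# not part of Docker or Kognitio.
# Arguments: host RAM in G, number of containers, RAM per container in G.
memory_headroom_gb() {
    host_ram_gb=$1
    containers=$2
    per_container_gb=$3
    echo $(( host_ram_gb - containers * per_container_gb ))
}

# Figures from this post: 64G host, 4 containers of 15G each.
echo "Container total: $(( 4 * 15 ))G"
echo "Left for host OS: $(memory_headroom_gb 64 4 15)G"
```

If the headroom comes out at zero or below, reduce the per-container size before creating anything.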

Pulling The Kognitio Docker Image

I need to pull the Docker container image from Docker Hub onto my node before I can create Kognitio containers. To do this I run the ‘docker pull’ command to download the image. Kognitio’s image is named ‘kognitio/kognitio’ so I run:

$ docker pull kognitio/kognitio
Using default tag: latest
latest: Pulling from kognitio/kognitio
d8d02d457314: Pull complete 
b2841a79bce7: Pull complete 
4e9b14ee5925: Pull complete 
1288091557de: Pull complete 
4b5bc1d4459d: Pull complete 
20bb9955609c: Pull complete 
c36fecd43b68: Pull complete 
2d9dd47c8bd2: Pull complete 
cb1c8c74c688: Pull complete 
Digest: sha256:27689682a9196547da9e48ea1b5ff033337384cec8f91ba3a95d954e40fb9a85
Status: Downloaded newer image for kognitio/kognitio:latest
docker.io/kognitio/kognitio:latest

Kognitio’s image is public so you don’t need to log in to pull it. Docker may ask you to log in if you have previously logged in and your login has timed out, in which case you need to either log in or run ‘docker logout’. You can see the image ready for use like this:

$ docker image ls kognitio/kognitio
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
kognitio/kognitio   latest              9415f6177997        29 hours ago        1.39GB

Creating The Network

I’m going to use a private Docker network for my Kognitio cluster. You don’t have to do this, but it is recommended that you not use the ‘host’ networking option and a private network is a good way to isolate a Kognitio cluster. We will create a network called ‘mycluster-network’:

$ docker network create mycluster-network
6f65eb74b4d6b9463f5ebcca12f2fe96b68fb62e5de91b398f138d6817fa4fef

Creating The Persistent Volume

Now I can create my persistent volume using the ‘docker volume create’ command. My volume will be a Docker volume named ‘mycluster-datavol’ so:

$ docker volume create mycluster-datavol
mycluster-datavol

Now I need to initialise this volume by creating a Kognitio container. I’m going to keep this container running after initialisation and make it the first container in the final cluster, so I’ll set it up the way I want the final containers to be set up. I will name this one ‘mycluster1’ and give it 15G of memory as above. I also disable swap because Kognitio is an in-memory product which doesn’t behave very well with swap. Setting --memory-swap to the same value as --memory turns swap off for a container. Finally, I’m going to use -d to detach and run the container in the background, and I’ll use ‘docker exec’ to communicate with it.

$ docker run --mount source=mycluster-datavol,destination=/data \
--name mycluster1 --memory 15G --memory-swap 15G \
-d --network mycluster-network kognitio/kognitio
337dde7f84c08577a40d2424d8a7f6f4d9f8fdbd4a21ca6d75cea2c3118a920e

Then we can see it running with ‘docker ps’:

$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
337dde7f84c0        kognitio/kognitio   "/bin/sh -c /opt/kog…"   36 seconds ago      Up 36 seconds       6550/tcp            mycluster1
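To double-check that swap really is off for the container, the two limits can be compared: swap is disabled when the memory limit and the memory-plus-swap limit are the same. The sketch below uses a helper name of my own (swap_disabled) and feeds it the byte value I would expect for 15G, rather than calling Docker, so treat it as an illustration of the check and not as captured output.

```shell
# swap_disabled is a hypothetical helper for illustration.
# It succeeds when a memory limit is set and equals the memory+swap limit.
swap_disabled() {
    mem_bytes=$1
    memswap_bytes=$2
    [ "$mem_bytes" -gt 0 ] && [ "$mem_bytes" -eq "$memswap_bytes" ]
}

# Against a live container the two values could be read with:
#   docker inspect --format '{{.HostConfig.Memory}} {{.HostConfig.MemorySwap}}' mycluster1
# Here the value expected for "--memory 15G --memory-swap 15G" is used directly.
limit=$(( 15 * 1024 * 1024 * 1024 ))
if swap_disabled "$limit" "$limit"; then
    echo "swap disabled"
else
    echo "swap still available"
fi
```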

Now we can initialise the persistent volume with ‘docker exec’. We need the -ti argument because this is an interactive command.

$ docker exec -ti mycluster1 kognitio-cluster-init
You must accept the Kognitio End User License Agreement to use Kodoop
Would you like to see this now (type yes to see it)?
no
Do you accept the Kognitio EULA?
Enter 'yes' to accept or anything else to reject
yes
You have accepted the EULA.
Enter a system ID for your Kognitio cluster 
(up to 12 characters, [a-z0-9_]) :
mycluster
Enter the number of storage volumes to create.
You will be asked for the size of each volume later.
Storage volumes are sparse files which hold the data stored in Kognitio.
4
Enter the maximum storage size in Gb for each storage volume.
The minimum limit for a storage volume is 10G.
100
Enter a Kognitio license key for the cluster.
If you do not have a license key enter - instead.
Kognitio can be freely used without a license for clusters with
less than 512G of RAM.
-
Kognitio WX2 Service Controller v8.02.03-rel190726 on kognitio
(c)Copyright Kognitio Ltd 2001-2019.

Service System management daemon was not running, not stopping
Kognitio WX2 Service Controller v8.02.03-rel190726 on mycluster
(c)Copyright Kognitio Ltd 2001-2019.

Starting System management daemon:    OK.
Creating volume 0
Kognitio WX2 Disk UID block updater v8.02.03-rel190726
(c)Copyright Kognitio Ltd 2010-2019.

UID BLOCK ON /data/dfs/disk0 IS NOT VALID.
WARNING:  numsecs is currently ignored by WX2 in most cases
About to write UID block to /data/dfs/disk0
    UID     : NEW_DISK_0
    SYSTEMID: mycluster
    SECTORS : 209715200
    SECSIZE : 512
    ZEROED  : 1
    TAG     : 1465402450
    CHECKSUM  : 1458772269/0 (VALID)
Creating volume 1
Kognitio WX2 Disk UID block updater v8.02.03-rel190726
(c)Copyright Kognitio Ltd 2010-2019.

UID BLOCK ON /data/dfs/disk1 IS NOT VALID.
WARNING:  numsecs is currently ignored by WX2 in most cases
About to write UID block to /data/dfs/disk1
    UID     : NEW_DISK_1
    SYSTEMID: mycluster
    SECTORS : 209715200
    SECSIZE : 512
    ZEROED  : 1
    TAG     : 1465402450
    CHECKSUM  : 1458706733/0 (VALID)
Creating volume 2
Kognitio WX2 Disk UID block updater v8.02.03-rel190726
(c)Copyright Kognitio Ltd 2010-2019.

UID BLOCK ON /data/dfs/disk2 IS NOT VALID.
WARNING:  numsecs is currently ignored by WX2 in most cases
About to write UID block to /data/dfs/disk2
    UID     : NEW_DISK_2
    SYSTEMID: mycluster
    SECTORS : 209715200
    SECSIZE : 512
    ZEROED  : 1
    TAG     : 1465402450
    CHECKSUM  : 1458641197/0 (VALID)
Creating volume 3
Kognitio WX2 Disk UID block updater v8.02.03-rel190726
(c)Copyright Kognitio Ltd 2010-2019.

UID BLOCK ON /data/dfs/disk3 IS NOT VALID.
WARNING:  numsecs is currently ignored by WX2 in most cases
About to write UID block to /data/dfs/disk3
    UID     : NEW_DISK_3
    SYSTEMID: mycluster
    SECTORS : 209715200
    SECSIZE : 512
    ZEROED  : 1
    TAG     : 1465402450
    CHECKSUM  : 1458575661/0 (VALID)
Kognitio WX2 Configuration Manager v8.02.03-rel190726 on mycluster
(c)Copyright Kognitio Ltd 2001-2019.

Completed.
WARNING:  Changes will not take effect until you restart the SMD.
Kognitio WX2 System Controller v8.02.03-rel190726 on mycluster
(c)Copyright Kognitio Ltd 2001-2019.

WXSERVER:  SMD is exiting for restart.
WXSERVER:  Connection closed by smd.

And now the volume is initialised and we are ready to make the rest of the containers.
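As a quick sanity check, the SECTORS and SECSIZE values in the output above multiply out to the 100G I asked for per volume. The helper name volume_size_gib below is just something I made up for this sketch.

```shell
# volume_size_gib is a hypothetical helper for illustration.
# Arguments: sector count and sector size in bytes, as printed in the UID block.
volume_size_gib() {
    sectors=$1
    sector_size=$2
    echo $(( sectors * sector_size / 1024 / 1024 / 1024 ))
}

# Values taken from the kognitio-cluster-init output above.
volume_size_gib 209715200 512
```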

Starting The Containers

We are going to start 3 more containers named mycluster2, mycluster3 and mycluster4. The last container, mycluster4, publishes port 6550 from the container as port 9000 on the host, which allows Kognitio clients to connect via port 9000 on the host.

$ docker run --mount source=mycluster-datavol,destination=/data --name mycluster2 \
--memory 15G --memory-swap 15G \
-d --network mycluster-network kognitio/kognitio
8d040206ef0ef1b8db8b8862547e207eb5e43bb02923a4d156b20b5ac0cb108c
$ docker run --mount source=mycluster-datavol,destination=/data --name mycluster3 \
--memory 15G --memory-swap 15G \
-d --network mycluster-network kognitio/kognitio
2a617a4ae06e9de5f23c8c573422298abe3dff9ae19ee57a211f760813bf5a12
$ docker run --mount source=mycluster-datavol,destination=/data --name mycluster4 \
--memory 15G --memory-swap 15G \
-d --network mycluster-network -p 9000:6550 kognitio/kognitio
358e3ee6a2fcf2892d26b9712bf18c9ce6bf168ad68340cba972aeb03c055d27

$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                    NAMES
358e3ee6a2fc        kognitio/kognitio   "/bin/sh -c /opt/kog…"   2 minutes ago       Up 2 minutes        0.0.0.0:9000->6550/tcp   mycluster4
2a617a4ae06e        kognitio/kognitio   "/bin/sh -c /opt/kog…"   2 minutes ago       Up 2 minutes        6550/tcp                 mycluster3
8d040206ef0e        kognitio/kognitio   "/bin/sh -c /opt/kog…"   2 minutes ago       Up 2 minutes        6550/tcp                 mycluster2
337dde7f84c0        kognitio/kognitio   "/bin/sh -c /opt/kog…"   8 minutes ago       Up 8 minutes        6550/tcp                 mycluster1
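The three ‘docker run’ commands above differ only in the container name and in the port mapping on the last one, so for a larger cluster a loop is less error-prone than copy and paste. The following is a sketch that only prints the commands. The function name container_cmd is my own, and the volume, network and image names are the ones used in this post.

```shell
# container_cmd is a hypothetical helper for illustration.
# It prints (rather than runs) the 'docker run' command for container $1.
# Only the last container ($2) publishes port 6550 as port 9000 on the host.
container_cmd() {
    index=$1
    last=$2
    publish=""
    if [ "$index" -eq "$last" ]; then
        publish=" -p 9000:6550"
    fi
    echo "docker run --mount source=mycluster-datavol,destination=/data" \
         "--name mycluster${index} --memory 15G --memory-swap 15G" \
         "-d --network mycluster-network${publish} kognitio/kognitio"
}

# Containers 2 to 4; mycluster1 already exists from the initialisation step.
for i in 2 3 4; do
    container_cmd "$i" 4
done
```

Once the printed commands look right, they can be pasted into a shell or the loop output piped to ‘sh’.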

Building A Database

Now that the containers are created, I can build the database. First, we use wxprobe to check that all the containers have linked up properly:

$ docker exec mycluster2 wxprobe -H
Kognitio WX2 Hardware Discovery Tool v8.02.03-rel190726 on mycluster
(c)Copyright Kognitio Ltd 2001-2019.

WX2 system has: 4 nodes in 1 group.
Disk resources: 428G in 4 disks.
System has 1 unique type of node.
System has 1 unique type of disk.
System RAM 61.4G, 61.4G for data processing.
32 CPUs available for data processing.

Detected node classes:
   full: 4 nodes

Detected Operating platforms:
   Linux-3.10.0-1062.1.1.el7.x86_64: 4 nodes

I picked the ‘mycluster2’ container to run this on, but it doesn’t matter which container you run it against. Note that the disk size is different from the 400G specified earlier (4x100G volumes). This is because the size is specified in binary units (a G is 1024*1024*1024 bytes) but reported in decimal units (a G is 1000*1000*1000 bytes) for historical reasons.
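The unit difference can be worked through with shell arithmetic. This is a rough sketch with a helper name of my own (volumes_in_decimal_gb). It gives a figure close to, but not identical with, the 428G that wxprobe prints, and this post doesn’t go into where the remaining small difference comes from.

```shell
# volumes_in_decimal_gb is a hypothetical helper for illustration.
# Arguments: number of volumes, size of each volume in binary G (1024^3 bytes).
# Prints the total in decimal G (1000^3 bytes), rounded down.
volumes_in_decimal_gb() {
    count=$1
    size_gib=$2
    bytes=$(( count * size_gib * 1024 * 1024 * 1024 ))
    echo $(( bytes / 1000000000 ))
}

# 4 volumes of 100G each, as created earlier.
volumes_in_decimal_gb 4 100
```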

To create a Kognitio database I can run kognitio-create-database <mysyspassword>:

$ docker exec -ti mycluster3 kognitio-create-database mysyspassword
Kognitio WX2 System Administration Tool v8.02.03-rel190726 on mycluster
(c)Copyright Kognitio Ltd 2004-2019.



-----------------------------------------


-----------------------------------------
Enter the system id to confirm that this is the correct system
(ctrl-c goes back, type ? for help)? mycluster


-----------------------------------------
WX2 system has: 4 nodes in 1 group.
Disk resources: 428G in 4 disks.
System has 1 unique type of node.
System has 1 unique type of disk.
System RAM 61.4G, 61.4G for data processing.
32 CPUs available for data processing.

Detected node classes:
   full: 4 nodes

Detected Operating platforms:
   Linux-3.10.0-1062.1.1.el7.x86_64: 4 nodes

About to erase database data and reset to defaults.
Enter to continue or ctrl-c to abort.
: 
Logging startup to startup.T_2019-09-17_17:13:39_UTC.
   -->  Cleaning up unwanted files/processes.
   -->  No processes stopped on one or more nodes.
   -->  Examining system components.
   -->  Configuring WX2 software.
Generation results:
WARNING: A shared memory pool on a node is too small for a PMA.  Memory re-use on restart will be disabled.
WARNING: If you are new to Kognitio ignore this for now and come back to it later.
WARNING: To fix this see the quickstart guide section on expanding /dev/shm.
WARNING: Using 10G links with MTU 1500.  Jumbo frames recommended.
   -->  Initialising internal storage.
   -->  Initialising Database.
   -->  Creating newsys.sql new system script
   -->  Building new database 
   -->  Logging build to newsys.T_2019-09-17_17:14:18_UTC
   -->  Creating base system tables
   -->  Creating system groups
   -->  Populating base system tables
           What follows is a snapshot of work being performed (it is not exhaustive).
           17:15:11 Currently performing action:  CREATE TABLE "SYS"."IPE_ALLLOGIN" ("USER_ID" INTEG
   -->  Setting privileges and statistics for base system tables
   -->  Set system default character set to CHARSET_LATIN1
   -->  Setting up virtual tables
   -->  Set up ODBC schema
   -->  Creating information tables
   -->  Creating import/export tables
   -->  Setting up logging and control tables
   -->  Creating system views
   -->  Creating external data browsing table/view
   -->  Creating system table comments
   -->  Set up IPE_ALLQUEUES
   -->  Loading default plugins
   -->  Finishing database configuration
           17:15:45 Currently performing action:  CREATE TABLE "ODBC"."IPE_TYPEINFO" ("LOCALE" INTEG
   -->  Replicating newsys logs to all nodes.
Syncing filename <logdir>/newsys.T_2019-09-17_17:14:18_UTC
Syncing filename <logdir>/.log_newsys
Startup complete.  SERVER IS NOW OPERATIONAL.

And now I have a working cluster. The shared memory warning is because shared memory mapping is not configured, which is not a serious issue and is outside the scope of this blog. The server can work around this. The MTU warning above is because Kognitio prefers to use jumbo frames. It is not necessary to fix this, particularly for a single-node system, but for extra performance you can enable jumbo frames for Docker networks by setting ‘mtu’ to 9000 in /etc/docker/daemon.json. For now, however, we can ignore these two warnings and run some queries.
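For reference, the daemon.json change is a single key. The sketch below just prints the fragment so it can be merged by hand into an existing file; the function name daemon_json_mtu is my own, and the Docker daemon needs a restart before the setting is picked up.

```shell
# daemon_json_mtu is a hypothetical helper for illustration.
# It prints a minimal daemon.json fragment with the given MTU.
# Merge the key into any existing /etc/docker/daemon.json instead of overwriting the file.
daemon_json_mtu() {
    cat <<EOF
{
  "mtu": $1
}
EOF
}

daemon_json_mtu 9000
```

One caveat I would check on your own Docker version: the daemon-level setting may only apply to the default bridge network. For a user-defined network such as mycluster-network, the bridge driver option com.docker.network.driver.mtu can be passed with --opt to ‘docker network create’ instead, and ‘docker network inspect’ shows which options a network was created with.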

Running Some Queries

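Let’s test it by listing the containers in the cluster. Then, just for fun, we’ll use it to add up the first 1 billion integers. The -p argument to wxsubmit is the SYS password (‘albatros’ in this session), so use whichever password you gave to kognitio-create-database: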

$ docker exec -ti mycluster4 wxsubmit -s localhost sys -p albatros
Kognitio WX2 SQL Submission Tool v8.02.03-rel190726
(c)Copyright Kognitio Ltd 1992-2019.

Connected to localhost ODBC Version 8.02.03-rel190726 Server Version 08.02.0003
>SELECT os_node_name FROM sys.ipe_nodeinfo;
OS_NODE_NAME
8d040206ef0e
337dde7f84c0
358e3ee6a2fc
2a617a4ae06e
Query           1               4 rows     ----   0:00.0   0:00.0   0:00.0
>select sum(value) from values between 1 and 1000000000;
          SUM(VALUE)
  500000000500000000
Query           2                1 row     ----   0:00.0   0:01.6   0:01.6
>quit;
TotalQueries    2         Session Time   0:09.9     ----     ----     ----
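The result of the second query can be checked against the closed form for the sum of the first n integers, n(n+1)/2. This sketch assumes a shell with 64-bit integer arithmetic, which is the usual case on a 64-bit Linux host; sum_first_n is a name I made up for it.

```shell
# sum_first_n is a hypothetical helper for illustration.
# It prints 1 + 2 + ... + n using the closed form n(n+1)/2.
sum_first_n() {
    n=$1
    echo $(( n * (n + 1) / 2 ))
}

# Should match the SUM(VALUE) returned by the query above.
sum_first_n 1000000000
```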

At this point the system is ready to connect clients and run more queries. You would do this by pointing a Kognitio ODBC/JDBC client at port 9000 of the host system (published by the mycluster4 container).

For detailed instructions on connecting client tools to Kognitio, see the documentation here: https://kognitio.com/documentation/latest/access/access.html.
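If you are scripting the client setup and don’t want to hard-code the host port, it can be derived from the address Docker reports for the published port. The helper name below (host_port_from_binding) is hypothetical, and the sample address is the one shown in the ‘docker ps’ output earlier in this post.

```shell
# host_port_from_binding is a hypothetical helper for illustration.
# It extracts the port from an address such as 0.0.0.0:9000
# by stripping everything up to and including the last colon.
host_port_from_binding() {
    binding=$1
    echo "${binding##*:}"
}

# With a live cluster the address could come from:
#   docker port mycluster4 6550/tcp
host_port_from_binding "0.0.0.0:9000"
```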

System Management

So now we have a working Kognitio system and can start to use it. For routine maintenance we can use the Docker commands to start/stop/manage the containers. We can also use the Kognitio commands via ‘docker exec’. So to check for problems I can run:

$ docker exec mycluster1 wxprobe
Kognitio WX2 Hardware Discovery Tool v8.02.03-rel190726 on mycluster
(c)Copyright Kognitio Ltd 2001-2019.

No problems found.

If I want to stop the containers, I can do so with the ‘docker stop’ command:

$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                    NAMES
358e3ee6a2fc        kognitio/kognitio   "/bin/sh -c /opt/kog…"   21 minutes ago      Up 21 minutes       0.0.0.0:9000->6550/tcp   mycluster4
2a617a4ae06e        kognitio/kognitio   "/bin/sh -c /opt/kog…"   21 minutes ago      Up 21 minutes       6550/tcp                 mycluster3
8d040206ef0e        kognitio/kognitio   "/bin/sh -c /opt/kog…"   21 minutes ago      Up 21 minutes       6550/tcp                 mycluster2
337dde7f84c0        kognitio/kognitio   "/bin/sh -c /opt/kog…"   27 minutes ago      Up 27 minutes       6550/tcp                 mycluster1
$ docker stop mycluster1 mycluster2 mycluster3 mycluster4
mycluster1
mycluster2
mycluster3
mycluster4
$ docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

And the server has been stopped. To bring the server back I can use ‘docker start’ followed by a ‘wxserver start’:

$ docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
$ docker start mycluster1 mycluster2 mycluster3 mycluster4
mycluster1
mycluster2
mycluster3
mycluster4
$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                    NAMES
358e3ee6a2fc        kognitio/kognitio   "/bin/sh -c /opt/kog…"   23 minutes ago      Up 2 seconds        0.0.0.0:9000->6550/tcp   mycluster4
2a617a4ae06e        kognitio/kognitio   "/bin/sh -c /opt/kog…"   23 minutes ago      Up 3 seconds        6550/tcp                 mycluster3
8d040206ef0e        kognitio/kognitio   "/bin/sh -c /opt/kog…"   24 minutes ago      Up 3 seconds        6550/tcp                 mycluster2
337dde7f84c0        kognitio/kognitio   "/bin/sh -c /opt/kog…"   29 minutes ago      Up 3 seconds        6550/tcp                 mycluster1
$ docker exec mycluster2 wxprobe -H
Kognitio WX2 Hardware Discovery Tool v8.02.03-rel190726 on mycluster
(c)Copyright Kognitio Ltd 2001-2019.

WX2 system has: 4 nodes in 1 group.
Disk resources: 428G in 4 disks.
System has 1 unique type of node.
System has 1 unique type of disk.
System RAM 61.4G, 61.4G for data processing.
32 CPUs available for data processing.

Detected node classes:
   full: 4 nodes

Detected Operating platforms:
   Linux-3.10.0-1062.1.1.el7.x86_64: 4 nodes

$ docker exec mycluster2 wxserver start
Kognitio WX2 System Controller v8.02.03-rel190726 on mycluster
(c)Copyright Kognitio Ltd 2001-2019.

Logging startup to startup.T_2019-09-17_17:31:18_UTC.
   -->  Cleaning up unwanted files/processes.
   -->  Examining system components.
   -->  Configuring WX2 software.
Generation results:
WARNING: Unable to recover images as no memory images detected for rscore.
WARNING: Memory image set not valid.  Rebuilding images instead.
WARNING: A shared memory pool on a node is too small for a PMA.  Memory re-use on restart will be disabled.
WARNING: If you are new to Kognitio ignore this for now and come back to it later.
WARNING: To fix this see the quickstart guide section on expanding /dev/shm.
WARNING: Using 10G links with MTU 1500.  Jumbo frames recommended.
   -->  Initialising Database.
   -->  Loading system tables, user tables and view images
Completed crimage in 00:00:23.
Startup complete.  SERVER IS NOW OPERATIONAL.
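The restart is always the same two steps, so it is a natural candidate for a small script. The sketch below defaults to a dry run that only prints the commands; run, restart_cluster and the DRY_RUN variable are names I have made up for illustration, and the container names are the ones from this post.

```shell
# run and restart_cluster are hypothetical helpers for illustration.
# With DRY_RUN=1 (the default here) commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Start the containers first, then start the Kognitio server from any one of them.
restart_cluster() {
    run docker start mycluster1 mycluster2 mycluster3 mycluster4
    run docker exec mycluster2 wxserver start
}

restart_cluster
```

Setting DRY_RUN=0 in the environment makes the same function execute the commands for real.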

Rebuilding The Cluster

Because all containers in the system are ephemeral, I can completely remove all 4 containers and replace them with a single bigger one and all of my data will be preserved. First, let’s create some simple data:

$ docker exec -ti mycluster4 wxsubmit -s localhost sys -p albatros
Kognitio WX2 SQL Submission Tool v8.02.03-rel190726
(c)Copyright Kognitio Ltd 1992-2019.

Connected to localhost ODBC Version 8.02.03-rel190726 Server Version 08.02.0003
>create schema data;
Query           1             Complete     ----   0:00.0   0:00.0   0:00.0
>create table data.test(f1 int);
Query           2             Complete     ----   0:00.0   0:00.0   0:00.0
>insert into data.test values(100), (200), (300);
Query           3      3 Rows Inserted     ----   0:00.1   0:00.1   0:00.1
>select * from data.test;
         F1
        100
        200
        300
Query           4               3 rows     ----   0:00.0   0:00.0   0:00.0

Now, let’s completely remove all the containers:

$ docker stop mycluster1 mycluster2 mycluster3 mycluster4
mycluster1
mycluster2
mycluster3
mycluster4
$ docker rm mycluster1 mycluster2 mycluster3 mycluster4
mycluster1
mycluster2
mycluster3
mycluster4
$ docker ps --all
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

The containers are completely gone. But the data volume is still there and it still has all the data from the Kognitio cluster. So all we need to do is create a new container and attach it to the persistent volume to access it again:

$ docker run --mount source=mycluster-datavol,destination=/data --name mycluster5 \
--memory 60G --memory-swap 60G \
-d --network mycluster-network -p 9000:6550 kognitio/kognitio
a2c753a7384aee62727293bb7633eecd828b407f380b758ce5c9b56c1b733288
$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                    NAMES
a2c753a7384a        kognitio/kognitio   "/bin/sh -c /opt/kog…"   10 seconds ago      Up 10 seconds       0.0.0.0:9000->6550/tcp   mycluster5
$ docker exec mycluster5 wxserver start
Kognitio WX2 System Controller v8.02.03-rel190726 on mycluster
(c)Copyright Kognitio Ltd 2001-2019.

Logging startup to startup.T_2019-09-17_17:43:12_UTC.
   -->  Cleaning up unwanted files/processes.
   -->  No processes stopped on one or more nodes.
   -->  Examining system components.
   -->  Configuring WX2 software.
Generation results:
WARNING: Unable to recover images as no memory images detected for rscore.
WARNING: Memory image set not valid.  Rebuilding images instead.
WARNING: A shared memory pool on a node is too small for a PMA.  Memory re-use on restart will be disabled.
WARNING: If you are new to Kognitio ignore this for now and come back to it later.
WARNING: To fix this see the quickstart guide section on expanding /dev/shm.
WARNING: Using 10G links with MTU 1500.  Jumbo frames recommended.
   -->  Initialising Database.
   -->  Loading system tables, user tables and view images
Completed crimage in 00:00:22.
Startup complete.  SERVER IS NOW OPERATIONAL.

And the data is still there:

$ docker exec -ti mycluster5 wxsubmit -s localhost sys -p albatros
Kognitio WX2 SQL Submission Tool v8.02.03-rel190726
(c)Copyright Kognitio Ltd 1992-2019.

Connected to localhost ODBC Version 8.02.03-rel190726 Server Version 08.02.0003
>select * from data.test;
         F1
        100
        200
        300
Query           1               3 rows     ----   0:00.0   0:00.0   0:00.0

Next Steps

If you’re an existing user or have a project you’re working on you probably know what you want to do next.  You can find a good overview of Kognitio’s SQL and approach here:

https://kognitio.com/documentation/latest/kognitio-for-sql-experts.html

If you are new to Kognitio and just want to poke about and see what it can do, you might like to try working through the getting started guide on our documentation site here:

https://kognitio.com/documentation/latest/getstarted/kog.html

Part 4

This post has shown you how to use Docker directly in a single-node environment. Part 4 will demonstrate how to use Kubernetes, a popular container orchestration technology, to run Dockerized Kognitio on a multi-node cluster.
