Dockerized Kognitio Part 4

This is part 4 of a series of articles about running Kognitio in Dockerized environments. You can find the first part here. In part 3 of this series you saw how to create Kognitio clusters using docker containers, but up until now all of the examples ran on a single server. The real power of Kognitio is its ability to connect the resources of multiple servers together, aggregating their compute power and exposing it to the user as a single Kognitio server. You could achieve this by creating a number of servers running Docker, manually starting containers on each, and connecting the container networks together to make a Kognitio server, but this would be difficult and time-consuming. This is the sort of problem that container orchestration technologies were created to solve.

Kubernetes Walkthrough

In this article we’re going to use the Kubernetes container orchestration technology to run Kognitio on a cluster of servers. We’re going to use Amazon’s managed Kubernetes service (EKS) to create a Kubernetes cluster and Amazon’s Elastic Filesystem (EFS) for storage, because these products are easy to get up and running, but nothing about the Kognitio setup will be Amazon-specific. You can get Kognitio running on any Kubernetes environment provided you have suitable shared storage available.

In this article I’m going to interact with AWS and Kubernetes using command line tools. There are other ways to achieve the same thing (the AWS Console, for example) but command lines are easier to put in a blog to show what’s going on. The command line tools in this article can be run from any client you choose to put them on. For convenience, and to keep the example clean, I’m going to spin up a helper node in AWS and run them on that. The helper node is optional; you could just as easily run these commands directly on a laptop, for example.

Getting Started

The first thing I need to do is set up a client with the command line tools. I’m going to do this with a helper node running in AWS (t2.nano instance running Amazon Linux 2). The client needs:

  • The Amazon AWS CLI tool.
  • The ‘eksctl’ command line tool for managing EKS clusters.
  • The ‘kubectl’ command line tool for managing Kubernetes clusters.

The first of these is pre-installed on Amazon Linux, but it needs to be upgraded because the version shipped with Amazon Linux is too old for the eksctl command to work properly. The others can be installed by following the AWS instructions: https://docs.aws.amazon.com/eks/latest/userguide/getting-started-eksctl.html

In my case this means running these commands on my helper node:

sudo yum install python-pip
sudo pip install awscli --upgrade

curl --silent --location "https://github.com/weaveworks/eksctl/releases/download/latest_release/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp
curl --silent -o /tmp/kubectl https://amazon-eks.s3-us-west-2.amazonaws.com/1.14.6/2019-08-22/bin/linux/amd64/kubectl
chmod +x /tmp/kubectl
sudo cp /tmp/kubectl /usr/local/bin
sudo cp /tmp/eksctl /usr/local/bin

Finally, I run ‘aws configure’ to supply the AWS access keys that allow these tools to manage services in my account, and to set the region I want to launch things in.
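
The prompts look roughly like this; the access key ID, secret key and region shown here are placeholders for your own values:

$ aws configure
AWS Access Key ID [None]: AKIAXXXXXXXXXXXXXXXX
AWS Secret Access Key [None]: <your secret access key>
Default region name [None]: eu-west-1
Default output format [None]: json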

Creating a Kubernetes cluster

Now we need to create a Kubernetes cluster, which we will do using the ‘eksctl’ command. The command below creates a simple Kubernetes cluster using r4.2xlarge instances (each has 61GiB of RAM and 8 vCPUs) as the worker nodes. I’ll start with a single node in the cluster and scale it up later to use more nodes. I also limit the worker nodes to a single availability zone within AWS because Kognitio works best when all nodes in a cluster are in one availability zone. Our storage volume will span availability zones, so the data remains available; for a production cluster we could create three node groups, one per zone, so that new containers can be started in a surviving zone in the event of a zone failure.

Normally a production EKS cluster would use a configuration file to specify the cluster details, but passing everything on the command line is a quick way to get something going.
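
For reference, the equivalent configuration file would look roughly like this. This is only a sketch against the eksctl ClusterConfig schema, so check the eksctl documentation for the exact field names supported by your version:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-kube
  region: eu-west-1
nodeGroups:
  - name: kognitio-nodes
    instanceType: r4.2xlarge
    desiredCapacity: 1
    minSize: 1
    maxSize: 20
    availabilityZones: ["eu-west-1a"]

You would then run ‘eksctl create cluster -f cluster.yaml’ instead of a long command line. For this walkthrough I’ll stick with the command line version: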

$ eksctl create cluster --name my-kube \
                        --version 1.14  \
                        --region eu-west-1 \
                        --node-zones eu-west-1a \
                        --nodegroup-name kognitio-nodes \
                        --node-type r4.2xlarge \
                        --nodes-max 20 \
                        --nodes-min 1 \
                        --nodes 1 \
                        --node-ami auto
[i]  using region eu-west-1
[i]  setting availability zones to [eu-west-1c eu-west-1a eu-west-1b]
[i]  subnets for eu-west-1c - public:192.168.0.0/19 private:192.168.96.0/19
[i]  subnets for eu-west-1a - public:192.168.32.0/19 private:192.168.128.0/19
[i]  subnets for eu-west-1b - public:192.168.64.0/19 private:192.168.160.0/19
[i]  nodegroup "kognitio-nodes" will use "ami-0497f6feb9d494baf" [AmazonLinux2/1.14]
[i]  using SSH public key "/home/ec2-user/.ssh/authorized_keys" as "eksctl-my-kube-nodegroup-kognitio-nodes-90:e1:eb:4a:20:49:17:af:3e:39:bf:d0:71:b6:33:6c" 
[i]  using Kubernetes version 1.14
[i]  creating EKS cluster "my-kube" in "eu-west-1" region
[i]  will create 2 separate CloudFormation stacks for cluster itself and the initial nodegroup
[i]  if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=eu-west-1 --name=my-kube'
[i]  CloudWatch logging will not be enabled for cluster "my-kube" in "eu-west-1"
[i]  you can enable it with 'eksctl utils update-cluster-logging --region=eu-west-1 --name=my-kube'
[i]  2 sequential tasks: { create cluster control plane "my-kube", create nodegroup "kognitio-nodes" }
[i]  building cluster stack "eksctl-my-kube-cluster"
[i]  deploying stack "eksctl-my-kube-cluster"
[i]  building nodegroup stack "eksctl-my-kube-nodegroup-kognitio-nodes"
[i]  deploying stack "eksctl-my-kube-nodegroup-kognitio-nodes"
[√]  all EKS cluster resource for "my-kube" had been created
[√]  saved kubeconfig as "/home/ec2-user/.kube/config"
[i]  adding role "arn:aws:iam::12345678910111213:role/eksctl-my-kube-nodegroup-kognitio-NodeInstanceRole-1XTUWSZVPMGIH" to auth ConfigMap
[i]  nodegroup "kognitio-nodes" has 0 node(s)
[i]  waiting for at least 1 node(s) to become ready in "kognitio-nodes"
[i]  nodegroup "kognitio-nodes" has 1 node(s)
[i]  node "ip-192-168-45-40.eu-west-1.compute.internal" is ready
[i]  kubectl command should work with "/home/ec2-user/.kube/config", try 'kubectl get nodes'
[√]  EKS cluster "my-kube" in "eu-west-1" region is ready

Once this completes I can see if it’s working by running this:

$ kubectl get nodes
NAME                                           STATUS   ROLES    AGE   VERSION
ip-192-168-63-191.eu-west-1.compute.internal   Ready    <none>   62m   v1.14.6-eks-5047ed

This should, and does, show a single worker node.

Persistent Volumes In Kubernetes

Kognitio requires that the same volume be mounted in all docker containers within a cluster. This was easy to achieve for the single node case because I could just use a docker volume, but for Kubernetes this is harder. In Kubernetes clusters the contents of the individual worker nodes are considered ephemeral, meaning that the data stored on them can be lost. This can happen if a node fails or even, in the case of EKS, if a node group is resized down and that removes a worker. Kubernetes can also put the cluster’s containers on different nodes from one another and we still need all containers to have access to the same persistent volume.

Kubernetes supplies the answer to these requirements in the form of a pluggable volume system. This allows a wide range of storage technologies to be deployed while abstracting away the details of whatever underlying storage technology is in use. When starting a container in Kubernetes, the docker volumes are replaced with Kubernetes persistent volumes. These can be created using any of the available storage drivers (see the list here https://kubernetes.io/docs/concepts/storage/volumes/) or an addon driver you supply. When containers are created, a mapping is specified which mounts the volume inside the container’s file namespace.

Persistent Volume Considerations 

Once mounted, this behaves like the docker volume in our earlier example. Mounting happens within Kubernetes so the individual containers do not need to know anything about which underlying technology is in use.

For Kognitio containers, a ‘ReadWriteMany’ volume is required. This is a type of volume which allows all containers to read and write all files and see changes made by other containers. Another important consideration is scalability. Kognitio clusters can have a lot of containers and you need a backend which doesn’t bottleneck when all of the containers access the storage volume at once.  You would not, for example, want to use a single, small NFS server to serve up the mount to a large number of containers.

Creating An EFS Filesystem

If you already use Kubernetes, you will probably have a preferred persistent volume type which you use in your environment. Some environments offer a natural choice here (containers on MapR using MapRFS, for example). For this walkthrough I’m going to use Amazon’s EFS filesystem, which provides a serverless NFS volume that scales to fit the data put into it and can provide fast IO to a large number of containers. This is a reasonably natural choice for an AWS based environment. I’ll name the filesystem mycluster-storage and create it like this:

$ aws efs create-file-system --creation-token mycluster-storage --tags Key=Name,Value=mycluster-storage --region eu-west-1
{
    "SizeInBytes": {
        "ValueInIA": 0, 
        "ValueInStandard": 0, 
        "Value": 0
    }, 
    "Name": "mycluster-storage", 
    "CreationToken": "mycluster-storage", 
    "Encrypted": false, 
    "Tags": [
        {
            "Value": "mycluster-storage", 
            "Key": "Name"
        }
    ], 
    "CreationTime": 1568810891.0, 
    "PerformanceMode": "generalPurpose", 
    "FileSystemId": "fs-ff76a434", 
    "NumberOfMountTargets": 0, 
    "LifeCycleState": "creating", 
    "OwnerId": "12345678910111213", 
    "ThroughputMode": "bursting"
}

Now I need this filesystem to be mountable from the worker nodes in my Kubernetes cluster. I need to create a ‘mount target’ for the filesystem in EFS and make it available inside the EKS cluster. I’m going to do this by creating the mount target using the VPC and security group EKS already created for the worker nodes. That way, nodes added later when EKS is scaled up will automatically have access to the storage volume too.

So I need the AWS security group for the EKS cluster’s nodes and the ID of the private subnet used by my node group. First, find the ID of the security group the EKS workers use to communicate:

$ aws ec2 describe-security-groups --filters Name=tag:Name,Values=eksctl-my-kube-cluster/ClusterSharedNodeSecurityGroup |grep GroupId |tail -1
            "GroupId": "sg-06a2f7b9b4f378345"

$ aws ec2 describe-subnets --filters Name=tag:Name,Values=eksctl-my-kube-cluster/SubnetPrivateEUWEST1A |grep SubnetId
            "SubnetId": "subnet-00312dcc51a5579bb", 

(note EUWEST1A here — this is because I used the eu-west-1a zone for the nodegroup when defining the EKS cluster above).

Now I can create an EFS mount inside the private subnet for our node group:

$ aws efs create-mount-target --file-system-id fs-ff76a434 \
               --subnet-id subnet-00312dcc51a5579bb \
               --security-group sg-06a2f7b9b4f378345
{
    "MountTargetId": "fsmt-9e3ef956", 
    "NetworkInterfaceId": "eni-0b524750e47ee78b2", 
    "FileSystemId": "fs-ff76a434", 
    "LifeCycleState": "creating", 
    "SubnetId": "subnet-00312dcc51a5579bb", 
    "OwnerId": "1234567890111213", 
    "IpAddress": "192.168.157.55"
}

Later, when I want to remove everything, I will need to remove the EFS mount target before removing the node group or EKS cluster. If I don’t, the EKS cluster removal will fail because I added a dependency EKS doesn’t know about.
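
For reference, the tear-down would look something like this (a sketch using the IDs returned above; the filesystem can only be deleted once its mount targets are gone):

$ aws efs delete-mount-target --mount-target-id fsmt-9e3ef956
$ aws efs delete-file-system --file-system-id fs-ff76a434
$ eksctl delete cluster --name my-kube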

Creating the persistent storage volume

Now I can create a persistent volume for our Kognitio cluster. This will use the inbuilt NFS storage driver to mount the EFS volume in each container. Create a persistent volume using the ‘kubectl’ command like this:

kubectl create -f - <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mycluster-volume
spec:
  capacity:
    storage: 1000Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: fs-ff76a434.efs.eu-west-1.amazonaws.com
    path: "/"
EOF

This creates the volume in Kubernetes. Note that the filesystem ID from the EFS output above is used to build the NFS server name. To make the volume available to containers, Kubernetes requires us to also create a persistent volume claim like this:

kubectl create -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mycluster-storage
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  resources:
    requests:
      storage: 1000Gi
EOF

The claim, rather than the volume itself, is what we will reference when creating containers.
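
Before moving on it’s worth checking that the claim has bound to the volume. These commands should show both with a STATUS of Bound (output omitted here):

$ kubectl get pv mycluster-volume
$ kubectl get pvc mycluster-storage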

Starting Containers

Everything is now ready to deploy the containers. This is done with a Kubernetes ‘Deployment’ entity which creates and manages the containers. Deployments in Kubernetes define a whole application by specifying how to create containers and how many of each type of container to run. In our example we only want one type of container, built from the Kognitio docker image. The number of ‘replicas’ is the Kubernetes way of saying how many of these containers we want to have.

As with the single node example, we need to create a single container first, use it to initialise the shared storage, and then expand the cluster to use multiple containers. I’ve used 50Gi as the RAM limit for each container so that it fits comfortably on a worker node with some space left over for other containers. I can define the Kubernetes deployment with a single container like this:

$ kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kognitio-mycluster
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kognitio-mycluster
  template:
    metadata:
      labels:
        app: kognitio-mycluster
    spec:
      containers:
      - name: kognitio-mycluster-db
        image: kognitio/kognitio
        resources:
            limits:
               memory: "50Gi"
        ports:
          - name: odbc
            containerPort: 6550
        volumeMounts:
            - name: mycluster-storage
              mountPath: "/data"
      volumes:
      - name: mycluster-storage
        persistentVolumeClaim:
          claimName: mycluster-storage
EOF
deployment.apps/kognitio-mycluster created

And I can check the status of the deployment with the kubectl get pods command, waiting until STATUS is Running:

$ kubectl get pods
NAME                                  READY   STATUS              RESTARTS   AGE
kognitio-mycluster-67955f6964-28ljt   0/1     ContainerCreating   0          7s
$ kubectl get pods
NAME                                  READY   STATUS    RESTARTS   AGE
kognitio-mycluster-67955f6964-28ljt   1/1     Running   0          19s

Now I can run the kognitio-cluster-init command inside the container. Instead of using the docker command I have to do this with the ‘kubectl exec’ command and the container name from above like this:

$ kubectl exec -ti kognitio-mycluster-67955f6964-28ljt kognitio-cluster-init
You must accept the Kognitio End User License Agreement to use Kognitio
Would you like to see this now (type 'yes' to see it)?
no
Do you accept the Kognitio EULA?
Enter 'yes' to accept or anything else to reject
yes
You have accepted the EULA.
Enter a system ID for your Kognitio cluster 
(up to 12 characters, [a-z0-9_]) :
mycluster
Enter the number of storage volumes to create.
You will be asked for the size of each volume later.
Storage volumes are sparse files which hold the data stored in Kognitio.
4
Enter the maximum storage size in Gb for each storage volume.
The minimum limit for a storage volume is 10G.
100
Enter a Kognitio license key for the cluster.
If you do not have a license key enter - instead.
Kognitio can be freely used without a license for clusters with
less than 512G of RAM.
-
Kognitio WX2 Service Controller v8.02.03-rel190923 on kognitio
(c)Copyright Kognitio Ltd 2001-2019.

Service System management daemon was not running, not stopping
Kognitio WX2 Service Controller v8.02.03-rel190923 on mycluster
(c)Copyright Kognitio Ltd 2001-2019.

Starting System management daemon:    OK.
Creating volume 0
Kognitio WX2 Disk UID block updater v8.02.03-rel190923
(c)Copyright Kognitio Ltd 2010-2019.

UID BLOCK ON /data/dfs/disk0 IS NOT VALID.
WARNING:  numsecs is currently ignored by WX2 in most cases
About to write UID block to /data/dfs/disk0
    UID     : NEW_DISK_0
    SYSTEMID: mycluster
    SECTORS : 209715200
    SECSIZE : 512
    ZEROED  : 1
    TAG     : 1465402450
    CHECKSUM  : 1458772269/0 (VALID)
Creating volume 1
Kognitio WX2 Disk UID block updater v8.02.03-rel190923
(c)Copyright Kognitio Ltd 2010-2019.

UID BLOCK ON /data/dfs/disk1 IS NOT VALID.
WARNING:  numsecs is currently ignored by WX2 in most cases
About to write UID block to /data/dfs/disk1
    UID     : NEW_DISK_1
    SYSTEMID: mycluster
    SECTORS : 209715200
    SECSIZE : 512
    ZEROED  : 1
    TAG     : 1465402450
    CHECKSUM  : 1458706733/0 (VALID)
Creating volume 2
Kognitio WX2 Disk UID block updater v8.02.03-rel190923
(c)Copyright Kognitio Ltd 2010-2019.

UID BLOCK ON /data/dfs/disk2 IS NOT VALID.
WARNING:  numsecs is currently ignored by WX2 in most cases
About to write UID block to /data/dfs/disk2
    UID     : NEW_DISK_2
    SYSTEMID: mycluster
    SECTORS : 209715200
    SECSIZE : 512
    ZEROED  : 1
    TAG     : 1465402450
    CHECKSUM  : 1458641197/0 (VALID)
Creating volume 3
Kognitio WX2 Disk UID block updater v8.02.03-rel190923
(c)Copyright Kognitio Ltd 2010-2019.

UID BLOCK ON /data/dfs/disk3 IS NOT VALID.
WARNING:  numsecs is currently ignored by WX2 in most cases
About to write UID block to /data/dfs/disk3
    UID     : NEW_DISK_3
    SYSTEMID: mycluster
    SECTORS : 209715200
    SECSIZE : 512
    ZEROED  : 1
    TAG     : 1465402450
    CHECKSUM  : 1458575661/0 (VALID)
Kognitio WX2 Configuration Manager v8.02.03-rel190923 on mycluster
(c)Copyright Kognitio Ltd 2001-2019.

Completed.
WARNING:  Changes will not take effect until you restart the SMD.
Kognitio WX2 System Controller v8.02.03-rel190923 on mycluster
(c)Copyright Kognitio Ltd 2001-2019.

WXSERVER:  SMD is exiting for restart.
WXSERVER:  Connection closed by smd.

Starting The Cluster

Now, all we need to do in order to create the whole Kognitio cluster is change the deployment to use more replicas. First, I need to run ‘eksctl’ to add more nodes to the cluster and ‘kubectl get nodes’ to check that the nodes were added properly:

$ eksctl scale nodegroup --cluster my-kube --nodes 4 kognitio-nodes 
[i]  scaling nodegroup stack "eksctl-my-kube-nodegroup-kognitio-nodes" in cluster eksctl-my-kube-cluster
[i]  scaling nodegroup, desired capacity from 1 to 4
$ kubectl get nodes
NAME                                           STATUS   ROLES    AGE   VERSION
ip-192-168-45-179.eu-west-1.compute.internal   Ready    <none>   47s   v1.14.6-eks-5047ed
ip-192-168-61-236.eu-west-1.compute.internal   Ready    <none>   44s   v1.14.6-eks-5047ed
ip-192-168-63-157.eu-west-1.compute.internal   Ready    <none>   46m   v1.14.6-eks-5047ed
ip-192-168-63-203.eu-west-1.compute.internal   Ready    <none>   45s   v1.14.6-eks-5047ed

Now, I can scale the deployment by changing the number of replicas from 1 to 4:

$ kubectl scale deployment.v1.apps/kognitio-mycluster --replicas=4
deployment.apps/kognitio-mycluster scaled
$ kubectl get deployments
NAME                 READY   UP-TO-DATE   AVAILABLE   AGE
kognitio-mycluster   1/4     4            1           5m36s
$ kubectl get deployments
NAME                 READY   UP-TO-DATE   AVAILABLE   AGE
kognitio-mycluster   4/4     4            4           6m3s
$ kubectl get pods
NAME                                  READY   STATUS    RESTARTS   AGE
kognitio-mycluster-67955f6964-28ljt   1/1     Running   0          6m10s
kognitio-mycluster-67955f6964-lc89d   1/1     Running   0          39s
kognitio-mycluster-67955f6964-xqtkq   1/1     Running   0          39s
kognitio-mycluster-67955f6964-xz7tk   1/1     Running   0          39s

And we are ready to create our database. Again, this is the same as it was in the simple example in part 3 but using the kubectl command instead of the docker command to initiate it. I can pick a container, run wxprobe on it to check that all containers are linked, then run kognitio-create-database:

$ kubectl exec kognitio-mycluster-67955f6964-xqtkq -- wxprobe -H
Kognitio WX2 Hardware Discovery Tool v8.02.03-rel190923 on mycluster
(c)Copyright Kognitio Ltd 2001-2019.

WX2 system has: 4 nodes in 1 group.
Disk resources: 428G in 4 disks.
System has 1 unique type of node.
System has 1 unique type of disk.
System RAM 205G, 205G for data processing.
32 CPUs available for data processing.

Detected node classes:
   full: 4 nodes

Detected Operating platforms:
   Linux-4.14.138-114.102.amzn2.x86_64: 4 nodes

$ kubectl exec -ti kognitio-mycluster-67955f6964-xqtkq kognitio-create-database
Creating Database
Kognitio WX2 System Administration Tool v8.02.03-rel190923 on mycluster
(c)Copyright Kognitio Ltd 2004-2019.



-----------------------------------------
Enter the sys password for the new database
(ctrl-c goes back, type ? for help)? mysyspassword


-----------------------------------------
Enter the system id to confirm that this is the correct system
(ctrl-c goes back, type ? for help)? mycluster


-----------------------------------------
WX2 system has: 4 nodes in 1 group.
Disk resources: 428G in 4 disks.
System has 1 unique type of node.
System has 1 unique type of disk.
System RAM 205G, 205G for data processing.
32 CPUs available for data processing.

Detected node classes:
   full: 4 nodes

Detected Operating platforms:
   Linux-4.14.138-114.102.amzn2.x86_64: 4 nodes

About to erase database data and reset to defaults.
Enter to continue or ctrl-c to abort.
: 
Logging startup to startup.T_2019-09-23_17:27:14_UTC.
   -->  Cleaning up unwanted files/processes.
   -->  No processes stopped on one or more nodes.
   -->  Examining system components.
   -->  Configuring WX2 software.
Generation results:
WARNING: A shared memory pool on a node is too small for a PMA.  Memory re-use on restart will be disabled.
WARNING: If you are new to Kognitio ignore this for now and come back to it later.
WARNING: To fix this see the quickstart guide section on expanding /dev/shm.
   -->  Initialising internal storage.
   -->  Initialising Database.
   -->  Creating newsys.sql new system script
   -->  Building new database 
   -->  Logging build to newsys.T_2019-09-23_17:27:58_UTC
   -->  Creating base system tables
   -->  Creating system groups
   -->  Populating base system tables
           What follows is a snapshot of work being performed (it is not exhaustive).
           17:28:59 Currently performing action:  CREATE TABLE "SYS"."IPE_ALLLOGIN" ("USER_ID" INTEG
   -->  Setting privileges and statistics for base system tables
   -->  Set system default character set to CHARSET_LATIN1
   -->  Setting up virtual tables
   -->  Set up ODBC schema
   -->  Creating information tables
   -->  Creating import/export tables
   -->  Setting up logging and control tables
   -->  Creating system views
   -->  Creating external data browsing table/view
   -->  Creating system table comments
   -->  Set up IPE_ALLQUEUES
   -->  Loading default plugins
   -->  Finishing database configuration
           17:30:22 Currently performing action:  CREATE TABLE "SYS"."IPE_PROHIBIT_TEXT" ("PROHIBIT"
           17:30:23 Currently performing action:  CREATE TABLE "SYS"."IPE_TYPE_TEXT" ("TYPE" CHAR(1)
           17:30:24 Currently performing action:  CREATE TABLE "SYS"."IPE_SQLTABLETYPES3" ("TABLE_TY
           17:30:25 Currently performing action:  CREATE TABLE "SYS"."IPE_FIELD" ("FMT" CHAR(32) CHA
   -->  Replicating newsys logs to all nodes.
Syncing filename <logdir>/newsys.T_2019-09-23_17:27:58_UTC
Syncing filename <logdir>/.log_newsys
Startup complete.  SERVER IS NOW OPERATIONAL.

Note the -- before wxprobe in the first kubectl command above. kubectl exec is a little different from docker exec: it needs a -- to tell it to stop processing arguments which start with a - and instead pass them on to the command you are running in the container.

Running Some Queries

Now, as in the previous example, we can run a few queries on the server:

$ kubectl exec -ti kognitio-mycluster-67955f6964-xqtkq -- wxsubmit -s localhost sys -p mysyspassword
Kognitio WX2 SQL Submission Tool v8.02.03-rel190923
(c)Copyright Kognitio Ltd 1992-2019.

Connected to localhost ODBC Version 8.02.03-rel190923 Server Version 08.02.0003
>SELECT os_node_name FROM sys.ipe_nodeinfo;
OS_NODE_NAME
kognitio-mycluster-67955f6964-xz7tk
kognitio-mycluster-67955f6964-xqtkq
kognitio-mycluster-67955f6964-lc89d
kognitio-mycluster-67955f6964-28ljt
Query           1               4 rows     ----   0:00.0   0:00.0   0:00.0
>select sum(value) from values between 1 and 1000000000;
          SUM(VALUE)
  500000000500000000
Query           2                1 row     ----   0:00.0   0:00.7   0:00.7
>select sum(cast(value as float)) from values between 1 and 10000000000;
SUM(CAST(VALUE AS FLOA
 5.00000000009679e+019
Query           3                1 row     ----   0:00.0   0:10.1   0:10.1
>TotalQueries    3         Session Time   0:32.3     ----     ----     ----

As before, I listed the container names and added up the first billion numbers. Notice how this goes significantly faster here than in the Docker blog post: that’s because I now have four nodes working on the problem instead of just one. Very small queries have a startup overhead, so they don’t really show the benefits of scaling out; to get something that takes a little longer to run I added a third query, summing the first 10 billion numbers as floats. Later, we will scale the cluster up further to see how the performance improves with more nodes.

Connecting To The Cluster

We now have a running Kognitio cluster which we can access by running commands on a container, but for a real deployment we will need to connect to the cluster with ODBC or JDBC clients. In the previous example I did this by exporting a port from the docker container onto the host network and connecting to the host’s IP. On Kubernetes, however, this is impractical because scaling events and node failures can replace containers and add new ones, meaning that our cluster is not running on any fixed host. Clients using a host IP would need updating whenever one of these events happens.

Kubernetes solves this with ‘services’. A Kubernetes service is an entity which presents ports from a collection of containers so that other entities can connect to them. There are various kinds of service which export ports in different ways, but the simplest is the default ‘ClusterIP’ type. This creates an IP address unique to the service and forwards connections to that IP to the endpoints registered with the service. A selector is used to automatically register all containers in our deployment with the service. It looks like this:

$ kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: mycluster-service
spec:
  selector:
    app: kognitio-mycluster
  ports:
    - protocol: TCP
      port: 9000
      targetPort: 6550
EOF
service/mycluster-service created
$ kubectl describe service mycluster-service
Name:              mycluster-service
Namespace:         default
Labels:            <none>
Annotations:       kubectl.kubernetes.io/last-applied-configuration:
                     {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"mycluster-service","namespace":"default"},"spec":{"ports":[{"port...
Selector:          app=kognitio-mycluster
Type:              ClusterIP
IP:                10.100.163.87
Port:              <unset>  9000/TCP
TargetPort:        6550/TCP
Endpoints:         192.168.45.87:6550,192.168.49.128:6550,192.168.53.225:6550 + 1 more...
Session Affinity:  None
Events:            <none>

This service created IP 10.100.163.87 for the Kognitio cluster and redirects port 9000 to the containers which make up the service. Connections to 10.100.163.87:9000 will go to a container chosen from the list of Endpoints above. Whenever the deployment adds or removes containers for the cluster, Kubernetes will automatically update the list of places the service redirects to. You may have spotted that 10.100.163.87 is not an internet IP address. That’s because a ClusterIP service exposes an IP address that only other containers in the cluster can reach. We will deal with exporting the service outside the Kubernetes cluster later.

Testing The Service

To test the service we can create a client container. The easiest way to do this is to launch another container with the Kognitio image which is not part of the cluster and run ‘wxsubmit’ from there. We can launch the container with the ‘kubectl run’ command:

$ kubectl run --generator=run-pod/v1 client --image=kognitio/kognitio 
pod/client created
$ kubectl get pod client
NAME     READY   STATUS    RESTARTS   AGE
client   1/1     Running   0          23s

And test the service with wxsubmit like this:

$ kubectl exec -ti client -- wxsubmit -s 10.100.163.87:9000 sys -p mysyspassword
Kognitio WX2 SQL Submission Tool v8.02.03-rel190923
(c)Copyright Kognitio Ltd 1992-2019.

Connected to 10.100.163.87:9000 ODBC Version 8.02.03-rel190923 Server Version 08.02.0003
>select system_id from sys.ipe_boot;
SYSTEM_ID
mycluster
Query           1                1 row     ----   0:00.1   0:00.1   0:00.1
>select count(*) from sys.ipe_nodeinfo;
            COUNT(*)
                   4
Query           2                1 row     ----   0:00.0   0:00.0   0:00.0
>quit;
TotalQueries    2         Session Time   0:11.4     ----     ----     ----
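
Incidentally, containers don’t have to use the raw ClusterIP address: Kubernetes DNS also gives the service a name (mycluster-service, or mycluster-service.default.svc.cluster.local in full), so from a pod in the same namespace a connection like this should work just as well:

$ kubectl exec -ti client -- wxsubmit -s mycluster-service:9000 sys -p mysyspassword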

Connecting From Outside The Cluster

The service above shows you how to export the Kognitio cluster to other Kubernetes containers, but what do you do when you want to export your Kognitio service outside the cluster? Kubernetes services provide many options for this, and I’m not going to try to explain them all here because the solution you use will depend on your environment. I will just show you one possible way to export a service.

The logical solution is to use a ‘LoadBalancer‘ service. This is a type of service which creates an internet-facing address for the service and does the same mapping as the ClusterIP service above. The LoadBalancer service type is core to Kubernetes, but it has different implementations for different environments under the hood. This means its behaviour will differ depending on the cloud environment you run on, and will be different again for an on-premise Kubernetes cluster. On AWS it creates an AWS Elastic Load Balancer with a public address.

The service looks like this:

$ kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: mycluster-external-service
spec:
  selector:
    app: kognitio-mycluster
  ports:
    - protocol: TCP
      port: 9000
      targetPort: 6550
  type: LoadBalancer
EOF
service/mycluster-external-service created
[ec2-user@ip-10-2-1-24 ~]$ kubectl describe service mycluster-external-service
Name:                     mycluster-external-service
Namespace:                default
Labels:                   <none>
Annotations:              kubectl.kubernetes.io/last-applied-configuration:
                            {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"mycluster-external-service","namespace":"default"},"spec":{"ports...
Selector:                 app=kognitio-mycluster
Type:                     LoadBalancer
IP:                       10.100.99.215
LoadBalancer Ingress:     a7857ae54dee211e9bf6406c2f8c2448-453274030.eu-west-1.elb.amazonaws.com
Port:                     <unset>  9000/TCP
TargetPort:               6550/TCP
NodePort:                 <unset>  30424/TCP
Endpoints:                192.168.45.87:6550,192.168.49.128:6550,192.168.53.225:6550 + 1 more...
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type    Reason                Age   From                Message
  ----    ------                ----  ----                -------
  Normal  EnsuringLoadBalancer  9s    service-controller  Ensuring load balancer
  Normal  EnsuredLoadBalancer   7s    service-controller  Ensured load balancer

Notice how the service looks very similar to the ‘ClusterIP’ service above, and in fact we can treat it just like that service when connecting from another container if we want to. The extra ‘LoadBalancer Ingress’ value above is the internet address of the service, and we can use it from anywhere. So back on the helper node I can grab the Kognitio client tools and make a connection from there (or you can make a connection from any other Kognitio client):

$ wget https://kognitio.com/forum/latest_linuxclients.tar.gz
2019-09-24 15:47:57 (8.32 MB/s) - ‘latest_linuxclients.tar.gz’ saved [29531522/29531522]
$ tar -xzf latest_linuxclients.tar.gz 
$ ./bin/wxsubmit -s a7857ae54dee211e9bf6406c2f8c2448-453274030.eu-west-1.elb.amazonaws.com:9000 -p mysyspassword
Kognitio WX2 SQL Submission Tool v8.01.99-s180711
(c)Copyright Kognitio Ltd 1992-2018.

Connected to a7857ae54dee211e9bf6406c2f8c2448-453274030.eu-west-1.elb.amazonaws.com:9000 ODBC Version 8.01.99-s180711 Server Version 08.02.0003
>select system_id from sys.ipe_boot;
SYSTEM_ID
mycluster
Query           1                1 row     ----   0:00.1   0:00.1   0:00.1
>TotalQueries    1         Session Time   0:15.9     ----     ----     ----

Securing The Service

Exposing the client port to the whole internet generally isn’t a very good idea, for obvious reasons. In production you would want to do some further configuration for the service; ideally you would create it before creating replicas within the cluster and secure it first. To configure this you need to manage the Elastic Load Balancer object in AWS directly. You can do this by taking the first part of the client endpoint (up to the dash) and using the AWS client tools to examine and modify it. So, for example, to get the ELB’s security group:

$ aws elb describe-load-balancers --load-balancer-names a7857ae54dee211e9bf6406c2f8c2448 |grep 'sg-'
                "sg-02c7f114f76e374a3"

I can then edit that security group in the AWS console (or wherever) to limit the IP addresses which can connect to the load balancer. Another useful tuning option is the Idle Timeout, which is the number of seconds a client connection can sit idle before the load balancer terminates it. To avoid disconnections, change it from the default of 1 minute to an hour like this:

$ aws elb modify-load-balancer-attributes --load-balancer-name a7857ae54dee211e9bf6406c2f8c2448 --load-balancer-attributes ConnectionSettings={IdleTimeout=3600}
{
    "LoadBalancerAttributes": {
        "ConnectionSettings": {
            "IdleTimeout": 3600
        }
    }, 
    "LoadBalancerName": "a7857ae54dee211e9bf6406c2f8c2448"
}
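
If you would rather restrict the client addresses from the command line instead of the console, the standard EC2 security group commands can do it. This is only a sketch: the group ID is the one returned above, the first command removes the open-to-the-world rule normally added for the service port, and 203.0.113.0/24 is a placeholder for whatever address range your clients really use:

$ aws ec2 revoke-security-group-ingress --group-id sg-02c7f114f76e374a3 \
               --protocol tcp --port 9000 --cidr 0.0.0.0/0
$ aws ec2 authorize-security-group-ingress --group-id sg-02c7f114f76e374a3 \
               --protocol tcp --port 9000 --cidr 203.0.113.0/24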

Scaling The Server

Now I’m going to scale Kognitio to run on 10 nodes instead of the 4 we have here. This is a three-step process:

  • Scale the node group using eksctl to make space available.
  • Scale the deployment in Kubernetes to grow the server. This is done by changing the replica count for the deployment.
  • Run ‘wxserver start’ on a container to reconfigure Kognitio.

And it looks like this:

$ eksctl scale nodegroup --cluster my-kube --nodes 10 kognitio-nodes 
[i]  scaling nodegroup stack "eksctl-my-kube-nodegroup-kognitio-nodes" in cluster eksctl-my-kube-cluster
[i]  scaling nodegroup, desired capacity from 4 to 10
$ kubectl get nodes
NAME                                           STATUS   ROLES    AGE   VERSION
ip-192-168-33-65.eu-west-1.compute.internal    Ready    <none>   64s   v1.14.6-eks-5047ed
ip-192-168-40-41.eu-west-1.compute.internal    Ready    <none>   67s   v1.14.6-eks-5047ed
ip-192-168-45-179.eu-west-1.compute.internal   Ready    <none>   21m   v1.14.6-eks-5047ed
ip-192-168-48-236.eu-west-1.compute.internal   Ready    <none>   66s   v1.14.6-eks-5047ed
ip-192-168-60-134.eu-west-1.compute.internal   Ready    <none>   57s   v1.14.6-eks-5047ed
ip-192-168-60-180.eu-west-1.compute.internal   Ready    <none>   67s   v1.14.6-eks-5047ed
ip-192-168-61-236.eu-west-1.compute.internal   Ready    <none>   21m   v1.14.6-eks-5047ed
ip-192-168-63-157.eu-west-1.compute.internal   Ready    <none>   66m   v1.14.6-eks-5047ed
ip-192-168-63-203.eu-west-1.compute.internal   Ready    <none>   21m   v1.14.6-eks-5047ed
ip-192-168-63-234.eu-west-1.compute.internal   Ready    <none>   66s   v1.14.6-eks-5047ed
$ kubectl scale deployment.v1.apps/kognitio-mycluster --replicas=10
deployment.apps/kognitio-mycluster scaled
$ kubectl get pods
NAME                                  READY   STATUS    RESTARTS   AGE
kognitio-mycluster-67955f6964-24hk8   1/1     Running   0          51s
kognitio-mycluster-67955f6964-28ljt   1/1     Running   0          26m
kognitio-mycluster-67955f6964-h45vq   1/1     Running   0          51s
kognitio-mycluster-67955f6964-jq6zs   1/1     Running   0          51s
kognitio-mycluster-67955f6964-kdrjk   1/1     Running   0          51s
kognitio-mycluster-67955f6964-lc89d   1/1     Running   0          21m
kognitio-mycluster-67955f6964-njjdd   1/1     Running   0          51s
kognitio-mycluster-67955f6964-wp98f   1/1     Running   0          51s
kognitio-mycluster-67955f6964-xqtkq   1/1     Running   0          21m
kognitio-mycluster-67955f6964-xz7tk   1/1     Running   0          21m
$ kubectl exec kognitio-mycluster-67955f6964-h45vq -- wxprobe -H
Kognitio WX2 Hardware Discovery Tool v8.02.03-rel190923 on mycluster
(c)Copyright Kognitio Ltd 2001-2019.

WX2 system has: 10 nodes in 1 group.
Disk resources: 428G in 4 disks.
System has 1 unique type of node.
System has 1 unique type of disk.
System RAM 512G, 512G for data processing.
80 CPUs available for data processing.

Detected node classes:
   full: 10 nodes

Detected Operating platforms:
   Linux-4.14.138-114.102.amzn2.x86_64: 10 nodes

$ kubectl exec kognitio-mycluster-67955f6964-h45vq wxserver start
Kognitio WX2 System Controller v8.02.03-rel190923 on mycluster
(c)Copyright Kognitio Ltd 2001-2019.

Logging startup to startup.T_2019-09-23_17:46:04_UTC.
   -->  Cleaning up unwanted files/processes.
   -->  No processes stopped on one or more nodes.
   -->  Examining system components.
   -->  Configuring WX2 software.
Generation results:
WARNING: Unable to recover images as no memory images detected for rscore.
WARNING: Memory image set not valid.  Rebuilding images instead.
WARNING: A shared memory pool on a node is too small for a PMA.  Memory re-use on restart will be disabled.
WARNING: If you are new to Kognitio ignore this for now and come back to it later.
WARNING: To fix this see the quickstart guide section on expanding /dev/shm.
   -->  Initialising Database.
   -->  Loading system tables, user tables and view images
Completed crimage in 00:00:53.
Startup complete.  SERVER IS NOW OPERATIONAL.

And now we can repeat the queries from above on our bigger system:

$ kubectl exec -ti kognitio-mycluster-67955f6964-xqtkq -- wxsubmit -s localhost sys -p mysyspassword
Kognitio WX2 SQL Submission Tool v8.02.03-rel190923
(c)Copyright Kognitio Ltd 1992-2019.

Connected to localhost ODBC Version 8.02.03-rel190923 Server Version 08.02.0003
>SELECT os_node_name FROM sys.ipe_nodeinfo;
OS_NODE_NAME
kognitio-mycluster-67955f6964-xz7tk
kognitio-mycluster-67955f6964-lc89d
kognitio-mycluster-67955f6964-xqtkq
kognitio-mycluster-67955f6964-wp98f
kognitio-mycluster-67955f6964-njjdd
kognitio-mycluster-67955f6964-kdrjk
kognitio-mycluster-67955f6964-jq6zs
kognitio-mycluster-67955f6964-h45vq
kognitio-mycluster-67955f6964-24hk8
kognitio-mycluster-67955f6964-28ljt
Query           1              10 rows     ----   0:00.1   0:00.1   0:00.1
>select sum(value) from values between 1 and 1000000000;
          SUM(VALUE)
  500000000500000000
Query           2                1 row     ----   0:00.1   0:00.4   0:00.4
>select sum(cast(value as float)) from values between 1 and 10000000000;
SUM(CAST(VALUE AS FLOA
 5.00000000010821e+019
Query           3                1 row     ----   0:00.1   0:04.5   0:04.5
>TotalQueries    3         Session Time   0:33.3     ----     ----     ----

Notice how the second query is now faster than it was with 4 nodes, and the final query runs roughly 2.2 times faster (10.1 seconds down to 4.5), close to the 2.5x you would expect when going from 4 to 10 nodes.

Next Steps

If you’re an existing user or have a project you’re working on you probably know what you want to do once your Kognitio cluster is up and running. You can find a good overview of Kognitio’s SQL and approach here:

https://kognitio.com/documentation/latest/kognitio-for-sql-experts.html

If you are new to Kognitio and just want to poke about and see what it can do, you might like to try working through the getting started guide on our documentation site here:

https://kognitio.com/documentation/latest/getstarted/kog.html
