Kognitio on Kubernetes on GCP

Following on from Andy MacLean's series of blogs about running Kognitio in a containerised environment, I thought I'd try to replicate the process on some other platforms. The first platform I tried was Google Cloud Platform, and it turned out to be a pleasantly straightforward process.

Andy's blogs contain much more detail and this blog should be read in conjunction with them. In particular, the final post – Dockerized Kognitio part 4 – describes the process of running Kognitio on Kubernetes on AWS (from which this GCP deployment was derived) and includes information on how to stop, start and resize the Kognitio cluster, all of which applies equally to this deployment.

Architecture

The diagram below shows how I deployed Kognitio on GCP:

[Architecture diagram: Kognitio containers running in a Kubernetes deployment, connected to a Filestore via a Kubernetes volume / persistent volume and to the client via a load balancer service]

As you can see, this is not a complex setup – the accompanying shell script creates all the required resources and deploys Kognitio in under 250 lines of code. The script is standalone in that you can run it in Google Cloud Shell and it will create a working Kognitio on Kubernetes cluster. You can also use parts of it as an example of how to deploy on existing infrastructure.

The provisioned GKE cluster consists of three 8 CPU / 64GiB nodes, each running one 56GiB Kognitio container. You can run up to a total of 512GiB of Kognitio container memory without a license, so feel free to increase the node and container sizes. If you want to trial larger systems, please contact us for a license.

I added the storage-rw scope to the Kubernetes cluster definition to give the nodes (and hence the containers) write access to the project's GCP buckets. There are several options for controlling access to storage on GCP (and other cloud providers), and this is the simplest.
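
If you are building on existing infrastructure rather than running the script, a similar cluster could be created by hand along these lines. This is a minimal sketch only – the cluster name, zone and machine type are my assumptions (e2-highmem-8 gives 8 vCPUs / 64GiB per node); the script itself is the definitive version:

# Three preemptible 8 vCPU / 64GiB nodes with write access to GCS buckets
gcloud container clusters create kognitio-cluster \
  --zone europe-west2-a \
  --num-nodes 3 \
  --machine-type e2-highmem-8 \
  --preemptible \
  --scopes gke-default,storage-rw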

For persistent storage I used a GCP Filestore mounted using a Kubernetes persistent volume / volume claim. This is a high-performance, NFS-based file system that provides a shared volume which Kognitio uses to store tables and metadata. The smallest size you can allocate is 1TB, which is plenty for testing.
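
Creating the Filestore by hand would look something like the following sketch – the instance name, zone, tier and share name are all assumptions, so treat the script as the reference:

# 1TB is the minimum Filestore allocation
gcloud filestore instances create kognitio-filestore \
  --zone europe-west2-a \
  --tier BASIC_HDD \
  --file-share name=kognitio,capacity=1TB \
  --network name=default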

If you already have a Filestore or similar you can use that instead, but it must mount an empty directory on /data, support the ReadWriteMany access mode and be POSIX compliant. Also bear in mind that disk-based table and metadata access performance is dependent on file system performance (so no thinly disguised blob stores, please).
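
The persistent volume and volume claim that expose the NFS share to the Kognitio pods look roughly like this – the resource names, share path and NFS server address are placeholders (substitute the IP address reported for your Filestore instance); the script generates the real manifests:

# Register the NFS share as a persistent volume and claim it with ReadWriteMany access
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: kognitio-pv
spec:
  capacity:
    storage: 1T
  accessModes:
    - ReadWriteMany
  nfs:
    server: 10.0.0.2      # Filestore IP address (placeholder)
    path: /kognitio       # Filestore share name (placeholder)
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kognitio-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  resources:
    requests:
      storage: 1T
EOF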

To access the cluster I used a Kubernetes LoadBalancer service, which creates a GCP load balancer. Kognitio accepts JDBC or ODBC connections on port 6550 on any node in the cluster, so a load balancer is a convenient way to provide a single connection point.
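
The service itself is small – something along these lines, where the service name and selector label are assumptions and the source range is a documentation address standing in for the CIDR block you pass to the script:

# Expose port 6550 through a GCP load balancer, restricted to one client address
kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: kognitio-lb
spec:
  type: LoadBalancer
  selector:
    app: kognitio
  ports:
    - name: odbc-jdbc
      port: 6550
      targetPort: 6550
  loadBalancerSourceRanges:
    - 203.0.113.10/32
EOF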

The script specifies preemptible nodes, so they will disappear after (at most) 24 hours. Either change this setting in the script or, when the nodes do disappear, simply reprovision them, redeploy the cluster and start the Kognitio server again as described in the Dockerized Kognitio part 4 blog.

Running the script

All files mentioned here are available on GitHub.

Copy kubernitio-gcp.sh to your Google Cloud Shell (or just clone the repo into it) and, with the session set to a suitable project (you must have permission to create a Filestore and a GKE cluster, and to provision 24 CPUs), run it with:

./kubernitio-gcp create <CIDR block>

The recommended CIDR block is <your external IP address>/32. You can use 0.0.0.0/0, but the Kognitio server will then be accessible by anyone, which is not recommended.
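
If you are not sure of your external IP address, you can plug it in directly with something like the following (this assumes the ipify service – any "what is my IP" service that returns a bare address will do):

./kubernitio-gcp create "$(curl -s https://api.ipify.org)/32"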

It should take around ten minutes to create the Kubernetes cluster and deploy Kognitio on it.

The script is automated in that it waits for resources to become available before using them, but you do have to enter some basic information about the system. The first step is Kognitio cluster initialisation:

  1. License agreement – enter “yes” to see the agreement or return to skip.
  2. Accept license agreement – enter “yes” to continue.
  3. Cluster ID – enter up to 12 lower case, numeric or underscore characters, e.g. "mycluster".
  4. Number of storage volumes – enter “8”
  5. Storage volume size – enter “100”
  6. License key – enter "-" unless you have allocated more than 512GiB of container memory, in which case enter a license key.

The Kognitio cluster will now be initialised. Although you have told it to use 800GB of disk, Kognitio uses sparse volumes where possible, so you will not see this if you look at the Filestore's usage.

The next phase is Kognitio server initialisation:

  1. SYS password – enter the password you want to use for the SYS (admin) account.
  2. System ID – enter the cluster ID you entered above.
  3. Enter to continue or ctrl-c to abort – press enter to continue.

The Kognitio server will now be initialised and you can connect to it on the IP address of the load balancer, which is printed out at the end or can be displayed by running:

./kubernitio-gcp info
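
You can also read the external IP address straight from Kubernetes – the service name below is an assumption, so check kubectl get services for the one the script actually created:

# Print the external IP assigned to the load balancer service
kubectl get service kognitio-lb -o jsonpath='{.status.loadBalancer.ingress[0].ip}'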

To delete all the resources allocated by the script you can run:

./kubernitio-gcp delete

This will delete the Kubernetes cluster and the Filestore (and therefore the database and any data you may have put in it).

Once you have built the Kognitio cluster, you can connect to it using any ODBC- or JDBC-compatible query tool.
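
As a quick sanity check before configuring a client tool, you can confirm that port 6550 is reachable from your machine with netcat (assuming the load balancer address from the previous step is in the KOGNITIO_IP variable):

# Zero-I/O port scan to verify the load balancer is accepting connections
nc -vz "$KOGNITIO_IP" 6550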

Once you are connected to the Kognitio cluster, you need some data to query. The easiest way is to use an external table to connect to data stored in GCS. The file gcs-ext-table-demo.sql in the GitHub repo contains SQL showing how to create external tables for accessing data in CSV, ORC or Parquet files. The block connector used for loading CSV files is very versatile and has many target string options for reading data in a variety of formats, including Avro.

Don’t forget to create memory images of your data before querying it.

If you have any comments or questions, please leave them below or use our community forum.
