Installing Kognitio on Azure HDInsight
This is an updated version of the blog entry first published in March which reflects the simplified installation procedure for Kognitio on Hadoop version 8.2.1 on Azure HDInsight…
HDInsight is the easiest way of getting a Hadoop cluster up and running on Microsoft’s Azure cloud platform. It’s essentially Hortonworks HDP integrated with the Azure storage options.
Installing Kognitio on HDInsight is pretty straightforward and can be accomplished by following the instructions here. This blog just provides some surrounding context for Azure.
First create an HDInsight cluster of the type “Hadoop”. A basic evaluation system can be built with two D13 v2 worker nodes (and the sizings below assume this is what you have built) but to really test the power of Kognitio on Hadoop, you need to create a system with hundreds of gigabytes of memory.
We recommend using Azure Storage rather than Data Lake Storage as the primary storage type, since it is a bit faster. Data Lake Storage will also work, but the initial database instance creation will be quite slow.
Once the cluster has been created you can install the Kognitio software. In production, Kognitio can be installed on either a head node or an edge node – we will use the primary head node, hn0-<clustername> here.
SSH into hn0-<clustername> and carry out the following actions to install Kognitio.
Create a user called kognitio to install the software under.
sudo adduser kognitio
Now log in as the kognitio user and…
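One way to switch to that user from the existing SSH session on the head node (assuming your login has sudo rights, as HDInsight SSH users normally do) is:

```shell
# Switch to the kognitio user created in the previous step; "- " gives a
# full login shell so you start in that user's home directory.
sudo su - kognitio
```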
Download Kognitio on Hadoop from our getting started page, copy the tar file to the kognitio user's home directory and then unpack it with:
tar -xvf kodoop-80201<release number>.tar.gz
This creates a directory called kodoop which contains the executable used to manage your Kognitio cluster(s) (“~/kodoop/bin/kodoop”) and various supporting files as well as directories to contain cluster data such as log files.
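If you expect to run kodoop frequently it can be convenient to add it to the kognitio user's PATH for the session; this is an optional convenience, not part of the documented install:

```shell
# Optional: put the kodoop executable (unpacked above into ~/kodoop/bin)
# on the PATH for the current shell session.
export PATH="$HOME/kodoop/bin:$PATH"
```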
You can now create a Kognitio cluster – the command below assumes you built an HDInsight Hadoop cluster with two D13 v2 worker nodes. For other configurations, see https://kognitio.com/documentation/latest/install/create-cluster.html for how to size a Kognitio cluster. If you want to leave some room for Hive etc. to run, reduce the CONTAINER_MEMSIZE parameter to 24576 (the server will still work down to 4096, but external scripts and table connectors work best with 24GB or more of memory in a container).
CONTAINER_MEMSIZE=40960 CONTAINER_VCORES=6 CONTAINER_COUNT=2 kodoop create_cluster kog1
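As a quick sanity check of the sizing above, the total memory handed to Kognitio is simply CONTAINER_MEMSIZE (in MB) multiplied by CONTAINER_COUNT:

```shell
# Sizing arithmetic for the create_cluster command above: two containers
# of 40960 MB each gives the cluster's total Kognitio memory.
CONTAINER_MEMSIZE=40960
CONTAINER_COUNT=2
echo "$(( CONTAINER_MEMSIZE * CONTAINER_COUNT / 1024 )) GB for Kognitio"
```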
The kodoop create_cluster command will take several minutes to run and will generate a few screenfuls of status messages.
When you see the message:
Initialisation complete. The initial sys password (your system ID) will appear in various logs. We recommend you change it now.
Your cluster is now running and you can connect to it using ODBC or JDBC. The cluster is called kog1, which is also the initial sys password for the database. In the YARN resource manager the application will appear as kognitio-kog1.
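As an illustrative sketch only: with the Kognitio ODBC driver installed on a client machine and a DSN pointing at the cluster (the DSN name "kog1" here is an assumption, not something the install creates for you), a quick connectivity check could use unixODBC's isql in batch mode:

```shell
# Assumes unixODBC is installed and a DSN named "kog1" has been configured
# against the Kognitio ODBC driver. Initially the sys password is the
# cluster name, so user sys / password kog1.
echo "select 1;" | isql -b kog1 sys kog1
```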
If you don’t get this message, please contact us via the community forum or via your support contact and provide all the console output.
Kognitio Console is an ODBC-based SQL submission tool incorporating many Kognitio administrative functions. It can be downloaded from the client tools section of https://kognitio.com/all-downloads/
Kognitio is an in-memory database and it requires you to explicitly load data into memory images to get the best performance from it. This is a simple process and is explained in the Images: Quick Overview documentation.
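As a rough sketch of that process (the exact statements and options are in the linked Images documentation, and the table name sales and DSN kog1 here are hypothetical), imaging typically means defining a view over the data and then creating a memory image of it:

```shell
# Sketch only: create a view over a hypothetical table "sales" and image
# it into memory. Assumes a configured DSN "kog1"; consult the Images
# documentation for the authoritative syntax.
isql -b kog1 sys kog1 <<'SQL'
create view v_sales as select * from sales;
create view image v_sales;
SQL
```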
The Parquet and ORC table connector documentation contains details of how to connect to Parquet and ORC files on various file systems, including wasb: and adl:.
Getting started with Kognitio on Hadoop contains a step by step guide to loading some retail data, imaging and then querying it.