Using Kognitio on HDP Hortonworks
Please note: this blog relates to an older version of Kognitio. To find equivalent instructions for the latest version of Kognitio, please go to this blog.
In this article we will install and configure a Kognitio cluster on to a running HDP Hadoop cluster.
In order to run Kognitio on HDP Hortonworks you will need:
- Version 8.2.00-rel170616 or later of the Kognitio on Hadoop software
(from https://kognitio.com/products/kognitio-on-hadoop )
You do not need a licensed Hortonworks installation to run Kognitio.
Getting ready to start
When running on Hadoop, the Kognitio software tarball only needs to be installed on one Hadoop node as the user running it. The tarball will typically be installed on an ‘edge node’, an interface node between the Hadoop cluster and the outside network, though you can install on any of the Hadoop cluster nodes if you do not have an edge node specifically configured.
Kognitio on Hadoop runs as a YARN application. The controlling node where you install the Kognitio tarball will act as the gateway for clients to access and control your Kognitio cluster(s).
Creating a Kognitio cluster submits a YARN application that distributes a configurable number of Kognitio software containers across the Hadoop nodes and manages their resources.
Configuring the ‘edge node’
The edge node needs to be configured so that it can be used as a controlling node for creating and managing one or more Kognitio clusters. You need a user for the software to run as, HDFS access set up for that user, and the Kognitio software installed and configured under that user. Specifically:
• Create a ‘kodoop’ user
• Create an HDFS home directory for the user
• Unpack the kodoop.tar.gz into the user’s home directory
If you already have a user account on the edge node, and an HDFS home directory for that user, then skip to the Kognitio on Hadoop install.
#!/bin/bash
sudo useradd -c "kodoop user" -d /home/kodoop -m kodoop
HADOOP_USER_NAME=hdfs hadoop fs -mkdir /user/kodoop
HADOOP_USER_NAME=hdfs hadoop fs -chown kodoop /user/kodoop
sudo mkdir ~kodoop/.ssh
sudo cp -r ~/.ssh/authorized_keys ~kodoop/.ssh
sudo chown -R kodoop ~kodoop/.ssh
This script creates a kodoop user, creates a home directory for that user on HDFS, and changes the ownership of the HDFS directory to the new user. It also copies your current authorized public keys to the new user to enable ssh access.
Kognitio on Hadoop install
Download the Kognitio on Hadoop latest release (https://kognitio.com/products/kognitio-on-hadoop/) and place it into a directory accessible to the edge node.
Unpack the tarball into the kodoop user's home directory.
tar -xvf /tmp/kodoop.tar.gz
Add kodoop to the PATH for the kodoop user.
echo 'export PATH=~/kodoop/bin:$PATH' >> ~/.bashrc
Now ssh into the edge node and type ‘kodoop‘. This will invite you to accept the EULA and display some useful links for documentation, forum support, etc. that you might need later.
Finally you can run ‘kodoop testenv’ to validate that the environment is working properly.
kodoop testenv
Kognitio Analytical Platform software for Hadoop ver80200rel170824.
(c)Copyright Kognitio Ltd 2001-2017.

KODOOP_HOME:     /home/kodoop/kodoop
HADOOP_COMMAND:  hadoop
SLIDER_COMMAND:  slider
KODOOP_LOGS:     /home/kodoop/kodoop/logs
KODOOP_RUNTIME:  /home/kodoop/kodoop/clusters

HDFS seems to be working OK
Slider's client seems to be working OK

Check finished. If problems were reported you can find extra information in /home/kodoop/kodoop/logs/commands
If there are any problems reported at this stage, follow the guidelines referring to the log file generated, and refer to the Kognitio on Hadoop forum for further support https://kognitio.com/forum/viewforum.php?f=13
Kognitio on Hadoop is now installed on the edge node and you are ready to build Kognitio clusters.
Preparing your HDP Hortonworks Hadoop cluster for Kognitio
Consult the Kognitio analytical platform requirements at https://kognitio.com/forums/viewtopic.php?f=2&t=138 and install the appropriate packages for your Linux distribution. This needs to be done for all nodes in your Hadoop cluster which can run YARN containers and for the ‘edge’ node.
Configuring /dev/shm space correctly
The /dev/shm filesystem (symlinked to /run/shm on some Linux distributions) is a tmpfs filesystem used to hold shared memory objects. The Kognitio Analytical Platform uses these objects to hold memory images. Typically this filesystem is mounted with a limit of 50% of system memory, which is not enough to run a Kognitio cluster in a container that uses most of a node’s memory. You will typically need to remount /dev/shm with the option ‘size=90%’ to allow up to 90% of system memory to be used for shared memory objects.
For a running node you can do this with:
mount /dev/shm -o remount,size=90%
This will take effect immediately but will not persist after a reboot. To persist the change after a reboot you will also need to put the change into /etc/fstab, for example:
tmpfs /dev/shm tmpfs size=90% 0 0
# was: tmpfs /dev/shm tmpfs defaults 0 0
The above change needs to be made on all slave nodes in the Hadoop cluster.
Setting user limits
Some Linux distributions ship with a configuration that sets ulimit values for users. Most of these are not a problem, but the 'nproc' limit (max user processes, ulimit -u) can cause problems when running the Kognitio software because it also counts threads. The Kognitio software is aggressively multi-threaded and can often exceed this limit. This limit should be set to a large number (minimum 100,000) for any user under which the Kognitio YARN tasks will run. Typically this will be the YARN user, but it could also be the edge node user if your Hadoop cluster is configured to run YARN jobs with setuid.
Changing the YARN user limits from Ambari
Select the YARN service -> Configs
Select the Advanced tab -> Advanced yarn-env
Update yarn_user_nproc_limit to 100000
You will be prompted to restart the Hadoop YARN service for the config change to be made.
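After the restart, you can sanity-check the limit from a shell. The sketch below prints the nproc limit for the current shell and warns if it is below the recommended minimum; on the cluster you would run it as the user that owns the YARN containers (for example sudo -u yarn sh -c 'ulimit -u', assuming the HDP default 'yarn' user).

```shell
#!/bin/sh
# Print the max-user-processes (nproc) limit for the current shell
# and warn if it is below the 100,000 minimum Kognitio recommends.
limit=$(ulimit -u)
echo "nproc limit: $limit"
if [ "$limit" != "unlimited" ] && [ "$limit" -lt 100000 ]; then
    echo "WARNING: nproc limit is below the recommended minimum of 100,000"
fi
```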
Changing the default YARN container size setting
Out of the box, Hortonworks configures YARN with limited resources: the memory allocation for containers and the number of vcores per container.
Set these as large as possible. This provides the greatest flexibility to configure one large Kognitio cluster or multiple smaller clusters.
Open Ambari, select the YARN service, and select the Configs tab. Change the 'maximum container size', 'Number of virtual cores', and 'maximum container size (vcores)' settings to be as large as possible.
NOTE: Changing the YARN container and vcore settings will prompt recommended changes to the YARN MapReduce settings. Kognitio on Hadoop does not use MapReduce jobs, so the recommended changes can be left unchecked.
After changes to the YARN configuration Ambari will prompt to restart the services that have stale configs.
Optimally, you will want a single container per node that uses nearly all of the node's memory; this gives the best performance for running production load. However, it is also possible to create smaller containers, and even to run multiple Kognitio on Hadoop clusters and containers on a single node, as you might choose to do during a development cycle.
The Kognitio container size to use depends on the workload of the Hadoop cluster. A single container cannot span multiple nodes, so make sure you do not set the maximum limit to be greater than the amount of YARN memory available on each node.
Future Kognitio on Hadoop releases will have the option to create Kognitio clusters using a percentage of available YARN memory without the user having to determine the memory available.
In addition to the containers, the Kognitio cluster also needs to be able to create a 2048MB application management container with 1 vcore.
If you set the container memory size to be equal to the capacity and put one container on each node then there won’t be any space for the management container. For this reason you should subtract 1 from the vcore count and 2048 from the memory capacity.
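The sizing arithmetic above can be sketched as a small script. The per-node YARN capacity figures used here are illustrative assumptions, not taken from any particular cluster:

```shell
#!/bin/sh
# Sketch of the container sizing arithmetic described above.
# Illustrative per-node YARN capacity (assumed values):
NODE_YARN_MEM_MB=30720     # YARN memory available on each node, in Mb
NODE_YARN_VCORES=5         # YARN vcores available on each node

# Reserve 2048 Mb and 1 vcore for Kognitio's application
# management container, leaving the rest for the Kognitio container.
CONTAINER_MEMSIZE=$((NODE_YARN_MEM_MB - 2048))
CONTAINER_VCORES=$((NODE_YARN_VCORES - 1))

echo "CONTAINER_MEMSIZE=$CONTAINER_MEMSIZE"   # 28672
echo "CONTAINER_VCORES=$CONTAINER_VCORES"     # 4
```

The two values printed are what you would pass to kodoop create_cluster, as shown in the next section.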
Creating a Kognitio on Hadoop cluster
Once the YARN container size has been configured, and the Kodoop user has been created you are ready to build a Kognitio cluster. The Kognitio cluster will be made up of a number of YARN containers. You specify the size of these containers, the vcore count for the containers and the number of containers to create.
You will also need to choose a name for the cluster, which must be 12 characters or less and can only contain lower case letters, numbers and underscores. Assuming we call it 'cluster1', we would then create a Kognitio cluster on the above example cluster like this:
CONTAINER_MEMSIZE=28000 CONTAINER_VCORES=4 CONTAINER_COUNT=6 kodoop create_cluster cluster1
This will display the following and invite you to confirm or cancel the operation:
kodoop@ip-xx-x-x-xxx:~$ CONTAINER_MEMSIZE=28000 CONTAINER_VCORES=4 CONTAINER_COUNT=6 kodoop create_cluster cluster1
Kognitio Analytical Platform software for Hadoop ver80200rel170824.
(c)Copyright Kognitio Ltd 2001-2017.
Creating Kognitio cluster with ID cluster1
=================================================================
Cluster configuration for cluster1
Containers:               6
Container memsize:        28000 Mb
Container vcores:         4
Internal storage limit:   10 Gb per store
Internal store count:     6
External gateway port:    6550
Kognitio server version:  ver80200rel170824
Cluster will use 168 Gb of ram.
Cluster will use up to 60 Gb of HDFS storage for internal data.
Data networks:            all
Management networks:      all
Edge to cluster networks: all
Using broadcast packets:  no
=================================================================
Hit ctrl-c to abort or enter to continue
The above shows the configuration of the cluster you are about to create: a 168 Gb Kognitio cluster consisting of six 28 Gb containers with 4 vcores each, plus up to 60 Gb of HDFS storage for the Kognitio internal data store. If this looks OK, hit enter and the cluster will be created.
Once creation has completed you will have a working Kognitio Analytical Platform up and running and ready to use.
If you have over-committed the available YARN resources, creation will likely fail while waiting for the ResourceManager to allocate a resource request that it cannot fulfil. The Hadoop ResourceManager UI provides information about the Kognitio cluster application and the containers created.
From Ambari select the YARN service and click the Quick Links ResourceManager UI.
Further information on the configuration settings and options is available in the Kognitio documentation (https://kognitio.com/documentation/latest/install/install-kodoop.html).
At this point you will have a working Kognitio cluster up and ready to use. If you’re already a Kognitio user you probably know what you want to do next and you can stop reading here.
Full Kognitio on Hadoop documentation to get you started on your Kognitio journey is available from our website (https://kognitio.com/documentation/latest/getstarted/quickstart.html)
You can download the Kognitio client tools (https://kognitio.com/forum/viewtopic.php?f=5&t=10) install them locally, run Kognitio console and connect to port 6550 on the edge node to start working with the server.
Alternatively you can just log into the edge node as kodoop and run 'kodoop sql <system ID>' to issue SQL locally. Log into Kognitio as the 'sys' user with the system ID as the password (it is a good idea to change this!).
kodoop sql cluster1
Kognitio Analytical Platform software for Hadoop ver80200rel170824.
(c)Copyright Kognitio Ltd 2001-2017.
Kognitio WX2 SQL Submission Tool v8.02.00-rel170824
(c)Copyright Kognitio Ltd 1992-2017.
Password:
Connected to localhost:6550 ODBC Version 8.02.00-rel170824 Server Version 08.02.0000
>alter user "SYS" alter password to "PASSWORD";
Query 1    Complete    ---- 0:00.1 0:00.1 0:00.1
There are now many ways to set up your server and get data into it, but the most common approach is to build memory images (typically view images) to run SQL against. This is usually a two-step process: first create external tables, which pull external data directly into the cluster, then create view images on top of them to pull data directly from the external source into a memory image. In some cases you may also want to create one or more regular tables and load data into them using Kognitio wxloader or another data loading tool; in that case Kognitio will store a binary representation of the data in the HDFS filesystem.