What’s new in Kognitio 8.2.3?

The current version of Kognitio is 8.2.3. The latest release is 80203rel190726. In this article we cover:

New features

  • Kognitio for the cloud

    Kognitio is now available to deploy on AWS Marketplace. This allows you to use Kognitio on-demand and to take advantage of cloud compute resources.

  • Kognitio Launcher

    Introduced to make the installation of Kognitio easier. In the initial release it is available for launching in AWS as a CloudFormation template.

  • Writable External Tables

    The Kognitio external table framework for accessing data held in external sources has been extended to allow writable external tables. The syntax is:

    CREATE EXTERNAL TABLE table_name [(column_names)]
    [COMMENT 'comment']
    FROM connector_name
    [TARGET 'target'];

    For full details on syntax see the CREATE EXTERNAL TABLE DDL.

    When upgrading Kognitio from versions prior to 8.2.3, all existing external tables are considered to be FOR SELECT ONLY and are not writable.

  • Hive Connector

    Allows seamless access to Hive tables from within Kognitio. The connector is able to discover metadata (column data, external file location and format, etc) belonging to any table in Hive. Based on this information it passes off the processing of the object to other connectors available as standard in Kognitio. To the end-user this is all seamless.

    The Hive connector is the first Metadata Connector added to the external table framework.

    For users already familiar with Kognitio and external tables there is a Hive Connector Quick Reference Sheet available.

  • Glue Connector

    Allows seamless access to AWS Glue tables from within Kognitio. The connector is able to discover metadata (column data, external file location and format, etc) belonging to any table in Glue. Based on this information it passes off the processing of the object to other connectors available as standard in Kognitio. To the end-user this is all seamless.

    The Glue connector is the second Metadata Connector added to the external table framework.

    For users already familiar with Kognitio and external tables there is a Glue Connector Quick Reference Sheet available.

    If you are deploying Kognitio on AWS then a Glue connector comes pre-installed.

  • Backup and restore directly to HDFS

    Two new client tools are available for backing up and restoring Kognitio directly into HDFS.

    wxhdfsbackup does a backup to a temporary directory located on the node where Kognitio is installed. Any data files created during the backup process are streamed directly into HDFS files. Once the backup is complete the temporary directory is tarred up and transferred to HDFS as well.

    wxhdfsrestore unpacks a metadata tarball, created by the wxhdfsbackup process, into a temporary directory on the node where Kognitio is installed. A restore is then carried out transferring any data directly from HDFS files.

    See the Backup to HDFS quick reference sheet for examples and more details.

  • Long-lived Java Daemon

    Runs alongside the rest of the Kognitio software (which is C/C++ based). Kognitio creates any Java objects it requires within this process and then communicates with these objects via RPC calls. The Java daemon is designed to be less resource intensive: it removes the need to create multiple JVM instances, so JIT compilation etc. is done once per object rather than once per operation.

  • New Java Connector API

    A new connector framework has been added that utilizes the long-lived Java daemon to enable high-performance, low-latency Java-based connectors to be created. These allow Kognitio to take advantage of existing libraries used to access file formats such as Parquet and ORC very efficiently. The existing Java-based standard connectors have all been converted to use this new API. The legacy JET connector framework is still available but is no longer supported.

  • A new and simpler API makes it easy for connectors to implement pushdown filters to perform partition elimination, etc. Further partition filter optimisations can be applied in view creation so that queries can take advantage of partition filtering when they would not normally be able to. For example, queries with filters on TIMESTAMP columns can utilize DATE partitions. See our blog post for more details.

  • A new LISTAGG function is now available. This aggregate function concatenates strings with an optional separator.

  • New ALTER TABLE options for external tables. Users can now switch connector or enable/disable the read/write capabilities of an existing external table.
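
The partition filter optimisation mentioned above (a filter on a TIMESTAMP column making use of DATE partitions) can be illustrated with a short Python sketch. This is not Kognitio code: the function name, the half-open range filter and the daily partition layout are all assumptions made for illustration.

```python
from datetime import date, datetime, timedelta
from typing import List


def date_partitions_for_range(start: datetime, end: datetime) -> List[date]:
    """Return the DATE partitions that can hold rows with start <= ts < end.

    Hypothetical helper: it only shows the idea of turning a TIMESTAMP
    range into a list of daily partitions.
    """
    if end <= start:
        return []
    # The last instant that still satisfies ts < end decides the final day.
    last_day = (end - timedelta(microseconds=1)).date()
    span = (last_day - start.date()).days
    return [start.date() + timedelta(days=n) for n in range(span + 1)]


# Hypothetical table partitioned by day for the whole of January 2019.
all_partitions = [date(2019, 1, 1) + timedelta(days=n) for n in range(31)]

# Filter: ts >= '2019-01-10 18:00:00' AND ts < '2019-01-12 00:00:00'
wanted = set(date_partitions_for_range(datetime(2019, 1, 10, 18), datetime(2019, 1, 12)))

# Only partitions that can contain matching rows need to be read.
to_scan = [p for p in all_partitions if p in wanted]
print(to_scan)
```

In this sketch only two of the 31 daily partitions are read. The rows in those partitions still need the original TIMESTAMP filter applied, since a partition can contain rows outside the requested range.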

Changes in behavior

This section outlines the changes in behavior in Kognitio version 8.2.3 when compared with the previous Kognitio 8.2.1 release.

Performance Improvements

  • Image Rebuild on start-up - any RAM images that require rebuilding on Kognitio restart are now rebuilt in the background. Kognitio becomes available prior to these images being built. Any query that is dependent on these images is queued behind the image re-creation. See information on the imaging queue for more details.

  • Optimiser improvements for temp table distributions in joins - reduces memory overhead for some queries.

  • Asymmetric querying of replicated objects - improved scaling of accesses against replicated tables, resulting in performance and concurrency improvements for some queries.

  • Simultaneous drops of views with a common parent object - improvements to locking mean this is now allowed.

  • Faster performance for INSERT-SELECT statements - where the query contains an IN subquery returning a small result set.

  • Disk space availability check on upgrades - a much smaller amount of free space is now required in order to perform an upgrade.

Memory allocation changes

There have been a number of changes in the default sizes of memory allocated to different processes in Kognitio:

  • Java-based connectors - the performance of Java connectors such as Hive, Glue, ORC and Parquet is considerably improved. The memory available for user data is reduced by 512M per node or container (not per ramstore).

  • SYS ramstore - size is now based on the node/container size. The size was previously fixed at 512M but is now the larger of 512M or 2% of memory available, up to a maximum of 3.5G. Memory available for user data is reduced by the same amount. This allows for better performance of queries on the System Tables for larger systems.

  • Shared memory pool - used on the nodes or within containers for generic processing. It has been increased to 1G from 600M. This has no effect for containers/nodes with more than 16G of RAM. For systems built from smaller nodes or containers it results in less memory available for data processing but will increase system stability and enable the use of the full range of connectors.

  • Checks for minimum container size - when deployed on Hadoop, Kognitio will now refuse to create containers smaller than 4G. When creating containers smaller than 16G a warning will be displayed indicating this is below the recommended size.

  • External Script Limit Default - this has been increased from 2G to 10G to accommodate larger memory intensive scripts.
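
The SYS ramstore sizing rule and the container size checks above can be written out as a small Python sketch. It is illustrative arithmetic only, not Kognitio's implementation: the function names are hypothetical, and treating "M" and "G" as binary units (1G = 1024M) is an assumption.

```python
MIB = 1
GIB = 1024 * MIB  # all sizes below are expressed in M; 1G = 1024M is an assumption


def sys_ramstore_mib(node_memory_mib: float) -> float:
    """Larger of 512M or 2% of node/container memory, capped at 3.5G."""
    return min(max(512 * MIB, 0.02 * node_memory_mib), 3.5 * GIB)


def check_container_size(size_mib: float) -> str:
    """Classify a Hadoop container size using the thresholds in the notes."""
    if size_mib < 4 * GIB:
        return "refuse"  # smaller than the 4G minimum
    if size_mib < 16 * GIB:
        return "warn"    # allowed, but below the recommended 16G
    return "ok"


# Example node/container sizes in G, chosen for illustration.
for gib in (2, 8, 16, 64, 256):
    size = gib * GIB
    print(f"{gib}G: container check = {check_container_size(size)}, "
          f"SYS ramstore = {sys_ramstore_mib(size):.0f}M")
```

Under these assumptions the SYS ramstore stays at 512M until a node or container exceeds 25G (where 2% passes 512M) and reaches the 3.5G cap at 175G.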

External Data Sources

  • Transaction support in connector API - providing notification of commits and rollbacks to connectors to allow external tables to be writable.

  • Improved locking options for connectors - a connector can now specify whether concurrent parallel accesses to external objects are allowed.

  • Parquet and ORC connectors replaced with new ones which use the new Java connector API (above).

  • Queries over multiple external tables - improvements to resource scheduling now allocate resources for one union branch at a time rather than allocating for all branches simultaneously. This means queries over multiple external objects via UNION are more performant.

  • Avro support now includes schema evolution - Avro files can now have different schemas provided a reader schema is specified. See the Loading Avro with Connectors Quick Reference Sheet for more details.
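
The effect of allocating resources for one UNION branch at a time can be seen with a simplified model. The Python sketch below is not a description of Kognitio's scheduler: the per-branch figures are hypothetical and resources are reduced to a single number per branch.

```python
# Hypothetical resource requirement of each UNION branch, in arbitrary units.
branch_costs = [4, 6, 3]

# Allocating for every branch simultaneously needs the sum of all requirements.
peak_all_at_once = sum(branch_costs)

# Allocating for one branch at a time only ever needs the largest requirement.
peak_one_at_a_time = max(branch_costs)

print(f"all branches at once: {peak_all_at_once} units")
print(f"one branch at a time: {peak_one_at_a_time} units")
```

In this model the peak requirement no longer grows with the number of branches, which is why a query over many external objects is less likely to be held up waiting for resources.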

Kognitio on Hadoop Changes

  • Multiple Namenodes - the HDFS connector now handles multiple namenodes, supplied as a comma-separated list, and fails over between them as appropriate.

  • In-built version of libhdfs library - this library is normally part of Hadoop. The software will default to using the Hadoop-supplied version if already present on the system but will fall back to utilizing the in-built version (and output a startup warning) if this is not found.

  • Cluster creation on Hadoop - improvements in the speed of metadata initialisation when building new clusters on Hadoop.
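
As an illustration of the comma-separated namenode list with failover, the Python sketch below splits such a list and uses the first entry that accepts a connection. It does not show the HDFS connector's actual logic: the host names, the port and the `connect` callback are hypothetical stand-ins.

```python
from typing import Callable, List


def parse_namenodes(value: str) -> List[str]:
    """Split a comma-separated namenode list, ignoring blanks and spaces."""
    return [part.strip() for part in value.split(",") if part.strip()]


def first_reachable(namenodes: List[str], connect: Callable[[str], bool]) -> str:
    """Try each namenode in order and return the first that connects."""
    for host in namenodes:
        if connect(host):
            return host
    raise ConnectionError("no namenode in the list could be reached")


def fake_connect(host: str) -> bool:
    """Stand-in for a real connection attempt: pretend the first node is down."""
    return host != "nn1.example.com:8020"


namenodes = parse_namenodes("nn1.example.com:8020, nn2.example.com:8020")
active = first_reachable(namenodes, fake_connect)
print(active)
```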

Other changes

  • Correct startup banner in kogscript exe

  • Timezone detection on RedHat based systems - pick up the correct timezone on newer RedHat based systems (using /etc/localtime). If your server is using the default timezone and this is not aligned with the default timezone on the nodes, this change may cause the SMD and SQL level timezone to change on upgrade.

  • Security Changes - there are a number of small changes in security:

    1. Only password failures are counted towards failed login attempts for account lockouts

    2. Configuration file can now be edited by a user that is not also present on the Kognitio nodes

    3. Support for user-mapping when Kognitio is not running as root via the sudo command. This is used by default in Kognitio on AWS.