What’s new in Kognitio 8.2.3?
This article covers new features in Kognitio version 8.2.3. For the latest information, see What’s New in 8.2.4.
Kognitio is now available to deploy on AWS Marketplace. This allows you to use Kognitio on-demand and to take advantage of cloud compute resources.
A new deployment option has been introduced to make installing Kognitio easier. In its initial release, it is available for launching in AWS as a CloudFormation template.
The Kognitio external table framework for accessing data held in external sources has been extended to support writable external tables. The syntax is:
CREATE EXTERNAL TABLE table_name [(column_names)] [FOR INSERT | FOR INSERT ONLY | FOR SELECT ONLY] [COMMENT 'comment'] FROM connector_name [TARGET 'target'];
For full details on syntax see the CREATE EXTERNAL TABLE DDL.
When upgrading Kognitio from a version prior to 8.2.3, all existing external tables are treated as FOR SELECT ONLY and are not writable.
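The three access clauses amount to a small permission model. The sketch below is a hypothetical Python model, not Kognitio code. It assumes, from the keyword names alone, that FOR INSERT permits both reads and writes, FOR INSERT ONLY permits writes only, and FOR SELECT ONLY permits reads only; the names `ExternalTableMode` and `mode_for_existing_table` are illustrative.

```python
from enum import Enum


class ExternalTableMode(Enum):
    """Hypothetical model of the CREATE EXTERNAL TABLE access clauses."""
    FOR_INSERT = "FOR INSERT"            # assumed: readable and writable
    FOR_INSERT_ONLY = "FOR INSERT ONLY"  # assumed: writable, not readable
    FOR_SELECT_ONLY = "FOR SELECT ONLY"  # readable, not writable

    def allows_select(self):
        return self is not ExternalTableMode.FOR_INSERT_ONLY

    def allows_insert(self):
        return self is not ExternalTableMode.FOR_SELECT_ONLY


def mode_for_existing_table(created_in, declared):
    """Return the effective mode of a table after upgrading to 8.2.3.

    created_in: version tuple such as (8, 2, 1).
    declared: the mode given in the DDL (only meaningful for 8.2.3 or later).
    Tables created before 8.2.3 are treated as FOR SELECT ONLY.
    """
    if created_in < (8, 2, 3):
        return ExternalTableMode.FOR_SELECT_ONLY
    return declared
```

Under this model, an upgraded table rejects INSERT until its mode is changed.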
The new Hive connector allows seamless access to Hive tables from within Kognitio. The connector discovers the metadata (column data, external file location and format, etc.) belonging to any table in Hive. Based on this information, it hands off processing of the object to the other connectors available as standard in Kognitio. This is all transparent to the end user.
The Hive connector is the first metadata connector added to the external table framework.
For users already familiar with Kognitio and external tables, a Hive Connector Quick Reference Sheet is available.
The new Glue connector allows seamless access to AWS Glue tables from within Kognitio. The connector discovers the metadata (column data, external file location and format, etc.) belonging to any table in Glue. Based on this information, it hands off processing of the object to the other connectors available as standard in Kognitio. This is all transparent to the end user.
The Glue connector is the second metadata connector added to the external table framework.
For users already familiar with Kognitio and external tables, a Glue Connector Quick Reference Sheet is available.
If you deploy Kognitio on AWS, a Glue connector comes pre-installed.
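Both metadata connectors follow the same pattern: look up a table's metadata, then delegate reading to a standard data connector chosen by file format. The Python sketch below illustrates only that dispatch step. The `catalog` dictionary stands in for a Hive metastore or Glue Data Catalog lookup, and `TableMetadata`, `DATA_CONNECTORS` and the connector names are hypothetical, not Kognitio identifiers.

```python
from dataclasses import dataclass


@dataclass
class TableMetadata:
    """Metadata a metadata connector might discover (hypothetical fields)."""
    columns: list
    location: str
    file_format: str


# Hypothetical mapping from file format to the standard connector that reads it.
DATA_CONNECTORS = {
    "parquet": "parquet_connector",
    "orc": "orc_connector",
    "avro": "avro_connector",
}


def resolve_table(catalog, table_name):
    """Look up a table's metadata, then pick the data connector for its files."""
    meta = catalog[table_name]  # stands in for a Hive or Glue catalog lookup
    try:
        connector = DATA_CONNECTORS[meta.file_format.lower()]
    except KeyError:
        raise ValueError(f"no standard connector for format {meta.file_format!r}") from None
    return connector, meta
```

Keeping the catalog lookup separate from the file readers is what lets a second metadata connector, such as Glue, reuse the same data connectors as Hive.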
Backup and restore directly to HDFS
Two new client tools are available for backing up and restoring Kognitio directly into HDFS.
wxhdfsbackup performs a backup to a temporary directory on the node where Kognitio is installed. Any data files created during the backup are streamed directly into HDFS files. Once the backup is complete, the temporary directory is tarred up and transferred to HDFS as well.
wxhdfsrestore unpacks a metadata tarball, created by the wxhdfsbackup process, into a temporary directory on the node where Kognitio is installed. A restore is then carried out, transferring any data directly from HDFS files.
See the Backup to HDFS quick reference sheet for examples and more details.
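The backup and restore flow can be sketched in Python, with a local directory (`store_dir`) standing in for HDFS. This only mirrors the sequence of steps described for wxhdfsbackup and wxhdfsrestore. It is not how those tools are implemented, and the function names and the `metadata.tar.gz` file name are hypothetical.

```python
import os
import tarfile
import tempfile


def backup_to_store(data_files, metadata, store_dir):
    """Back up data straight to the store and metadata via a local temp dir.

    data_files: mapping of file name -> bytes produced by the backup.
    metadata: mapping of file name -> text written to the temporary directory.
    Returns the path of the metadata tarball inside store_dir.
    """
    os.makedirs(store_dir, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        # Data files go directly into the store rather than being staged locally.
        for name, payload in data_files.items():
            with open(os.path.join(store_dir, name), "wb") as out:
                out.write(payload)
        # Metadata is written to the temporary directory on the local node.
        for name, text in metadata.items():
            with open(os.path.join(tmp, name), "w") as out:
                out.write(text)
        # When the backup completes, the temp directory is tarred into the store.
        tarball = os.path.join(store_dir, "metadata.tar.gz")
        with tarfile.open(tarball, "w:gz") as tar:
            tar.add(tmp, arcname="metadata")
    return tarball


def restore_from_store(store_dir, tarball):
    """Unpack the metadata tarball locally, then read data from the store."""
    metadata = {}
    with tempfile.TemporaryDirectory() as tmp:
        with tarfile.open(tarball, "r:gz") as tar:
            tar.extractall(tmp)
        meta_dir = os.path.join(tmp, "metadata")
        for name in os.listdir(meta_dir):
            with open(os.path.join(meta_dir, name)) as f:
                metadata[name] = f.read()
    data = {}
    for name in os.listdir(store_dir):
        if name != os.path.basename(tarball):
            with open(os.path.join(store_dir, name), "rb") as f:
                data[name] = f.read()
    return metadata, data
```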
Long-lived Java Daemon
The long-lived Java daemon runs alongside the rest of the Kognitio software (which is C/C++ based). Kognitio creates any Java objects it requires within this process and then communicates with them via RPC calls. The daemon is designed to be less resource-intensive: it removes the need to create multiple JVM instances, so JIT compilation and similar work is done once per object rather than once per operation.
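The saving comes from paying start-up costs once rather than per operation. The Python sketch below is a cost-counting model, not Kognitio's RPC protocol. `LongLivedDaemon`, `JavaObjectStub` and `per_operation_cost` are hypothetical names, and `object_inits` stands in for one-off work such as class loading and JIT warm-up.

```python
class JavaObjectStub:
    """Placeholder for a Java object living inside the daemon."""

    def __init__(self, name):
        self.name = name
        self.calls = 0

    def handle(self, request):
        self.calls += 1
        return f"{self.name}:{request}"


class LongLivedDaemon:
    """One JVM for the life of the server; objects are created once and reused."""

    def __init__(self):
        self.jvm_starts = 1   # the JVM starts once, alongside the server
        self.object_inits = 0
        self._objects = {}

    def rpc(self, object_name, request):
        """Route a request to a named object, creating it on first use only."""
        handler = self._objects.get(object_name)
        if handler is None:
            self.object_inits += 1  # one-off cost: class loading, JIT warm-up, etc.
            handler = self._objects[object_name] = JavaObjectStub(object_name)
        return handler.handle(request)


def per_operation_cost(n_operations):
    """Per-operation model: every operation starts a JVM and builds its objects."""
    return {"jvm_starts": n_operations, "object_inits": n_operations}
```

With five reads through one object, the daemon model pays one JVM start and one initialisation, while the per-operation model pays five of each.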
New Java Connector API
A new connector framework has been added that uses the long-lived Java daemon to enable high-performance, low-latency Java-based connectors. These allow Kognitio to take advantage of existing libraries for accessing file formats such as Parquet and ORC very efficiently. The existing Java-based standard connectors have all been converted to use this new API. The legacy JET connector framework is still available but is no longer supported.
The new, simpler API makes it easy for connectors to implement pushdown filters that perform partition elimination and similar optimisations. Further partition filter optimisations can be applied at view creation, so queries can take advantage of partition filtering where they normally would not be able to. For example, queries with filters on TIMESTAMP columns can use DATE partitions. See our blog post for more details.
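The TIMESTAMP-to-DATE case reduces to a small calculation: a range predicate on a timestamp column bounds which date partitions can hold matching rows. The Python sketch below shows that reasoning under two assumptions: each row is stored in the partition for the date of its timestamp, and the filter is the half-open range `ts_low <= ts < ts_high`. `partitions_to_scan` is a hypothetical helper, not Kognitio's optimiser code.

```python
from datetime import date, datetime, timedelta


def partitions_to_scan(partitions, ts_low, ts_high):
    """Return the DATE partitions that can hold rows with ts_low <= ts < ts_high.

    A row with timestamp ts is assumed to live in partition ts.date(), so only
    partitions from ts_low.date() up to the date of the last instant before
    ts_high need to be read. All other partitions are eliminated.
    """
    first = ts_low.date()
    last = (ts_high - timedelta(microseconds=1)).date()
    return [p for p in partitions if first <= p <= last]
```

For example, a filter from 2019-05-02 12:00 up to (but excluding) 2019-05-04 00:00 needs only the partitions for 2 and 3 May.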
A new LISTAGG function is now available. It is an aggregate function that concatenates strings, with an optional separator.
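As a rough model of what such an aggregate computes, the Python sketch below groups rows by a key and joins each group's strings. The NULL handling follows the usual SQL convention for LISTAGG and is an assumption: NULL values are skipped, and a group with only NULLs yields NULL. Values are joined in input order because no ordering clause is modelled. `listagg` here is a hypothetical helper, not the Kognitio function's definition.

```python
def listagg(rows, key, value, separator=""):
    """Group rows by `key` and concatenate each group's `value` strings.

    None (NULL) values are skipped; a group with no non-NULL values maps to None.
    The separator defaults to an empty string when none is given.
    """
    groups = {}
    for row in rows:
        bucket = groups.setdefault(row[key], [])
        if row[value] is not None:
            bucket.append(row[value])
    return {k: (separator.join(vs) if vs else None) for k, vs in groups.items()}
```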
New ALTER TABLE options for external tables allow users to switch the connector, or enable/disable the read/write capabilities, of an existing external table.
Changes in behavior
This section outlines the changes in behavior in Kognitio version 8.2.3 compared with the previous Kognitio 8.2.1 release.
Image rebuild on start-up - any RAM images that need rebuilding when Kognitio restarts are now rebuilt in the background. Kognitio becomes available before these images have been built. Any query that depends on these images is queued behind the image re-creation. See information on the imaging queue for more details.
Optimiser improvements for temp table distributions in joins - reduces memory overhead for some queries.
Asymmetric querying of replicated objects - improved scaling of accesses against replicated tables, resulting in performance and concurrency improvements for some queries.
Simultaneous drops of views with a common parent object - improvements to locking now allow this.
Faster performance for INSERT-SELECT statements - where the query contains an IN subquery returning a small result set.
Disk space availability check on upgrades - a much smaller amount of free space is now required in order to perform an upgrade
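The image-rebuild behaviour (server available at once, dependent queries waiting) is a common background-work pattern. The Python sketch below shows it with one `threading.Event` per image. `ImageRebuildQueue`, `start_rebuild` and `run_query` are hypothetical names, and the model ignores everything the real imaging queue does beyond ordering.

```python
import threading


class ImageRebuildQueue:
    """Rebuild RAM images in the background while queries are accepted."""

    def __init__(self, image_names):
        # One event per image; it is set once that image has been rebuilt.
        self._ready = {name: threading.Event() for name in image_names}

    def start_rebuild(self, rebuild):
        """Rebuild each image on a background thread, signalling as each completes."""
        def worker():
            for name, event in self._ready.items():
                rebuild(name)
                event.set()

        thread = threading.Thread(target=worker, daemon=True)
        thread.start()
        return thread

    def run_query(self, depends_on, query):
        """Hold a query until every image it depends on has been rebuilt."""
        for name in depends_on:
            event = self._ready.get(name)
            if event is not None:  # images not awaiting rebuild are already usable
                event.wait()
        return query()
```

A query that touches no pending image runs immediately, which is why the system can open for work before the rebuild finishes.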
Memory allocation changes
There have been a number of changes to the default amounts of memory allocated to different processes in Kognitio:
Java-based connectors - the performance of Java connectors such as Hive, Glue, ORC and Parquet has improved considerably. The memory available for user data is reduced by 512M per node or container (not per ramstore).
SYS ramstore - its size is now based on the node/container size. It was previously fixed at 512M but is now the larger of 512M and 2% of available memory, up to a maximum of 3.5G. The memory available for user data is reduced by the same amount. This improves the performance of queries on the system tables on larger systems.
Shared memory pool - used on nodes or within containers for generic processing. It has increased from 600M to 1G. This has no effect on containers/nodes with more than 16G of RAM. For systems built from smaller nodes or containers, it leaves less memory available for data processing but increases system stability and enables use of the full range of connectors.
Checks for minimum container size - when deployed on Hadoop, Kognitio now refuses to create containers smaller than 4G. When creating containers smaller than 16G, a warning is displayed indicating that this is below the recommended size.
External script limit default - this has increased from 2G to 10G to accommodate larger, memory-intensive scripts.
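The SYS ramstore rule and the Hadoop container-size checks are simple enough to express directly. The Python sketch below encodes them as stated above, assuming sizes in megabytes with 1G = 1024M and that "available memory" is the node or container size passed in. The function names are hypothetical, and this is not how Kognitio computes its allocations internally.

```python
import warnings

MB_PER_GB = 1024  # assumption: binary units, 1G = 1024M


def sys_ramstore_mb(node_memory_mb):
    """SYS ramstore size: the larger of 512M and 2% of memory, capped at 3.5G."""
    return min(max(512, 0.02 * node_memory_mb), 3.5 * MB_PER_GB)


def check_container_size(container_mb):
    """Apply the minimum-size rules for containers on Hadoop."""
    if container_mb < 4 * MB_PER_GB:
        raise ValueError(f"container of {container_mb}M is below the 4G minimum")
    if container_mb < 16 * MB_PER_GB:
        warnings.warn(f"container of {container_mb}M is below the recommended 16G")
```

Under these assumptions, the 2% term only exceeds the 512M floor on nodes larger than 25G (2% of 25600M is 512M), and it reaches the 3.5G cap at 175G (2% of 179200M is 3584M).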
External Data Sources
Transaction support in connector API - providing notification of commits and rollbacks to connectors to allow external tables to be writable.
Improved locking options for connectors - connectors can now specify whether concurrent parallel accesses to external objects are allowed.
Parquet and ORC connectors replaced with new ones that use the new Java connector API (above).
Queries over multiple external tables - improvements to resource scheduling now allocate resources for one union branch at a time rather than for all branches simultaneously. This means queries over multiple external objects via UNION perform better.
Avro support now includes schema evolution - Avro files can now have different schemas, provided a reader schema is specified. See the Loading Avro with Connectors Quick Reference Sheet for more details.
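Schema evolution with a reader schema follows the record-resolution rules in the Apache Avro specification: fields are matched by name, fields only the writer has are ignored, and fields only the reader has take the reader's default (an error if there is none). The Python sketch below applies those rules to an already-decoded record. It is a simplified illustration that ignores type promotion and aliases, and `resolve_record` and `NO_DEFAULT` are hypothetical names, not part of the Kognitio Avro connector.

```python
NO_DEFAULT = object()  # sentinel: the reader field declares no default


def resolve_record(writer_record, reader_fields):
    """Project a record written with one schema onto a reader schema.

    writer_record: dict of field name -> value, decoded with the writer schema.
    reader_fields: list of (name, default) pairs; use NO_DEFAULT for none.
    """
    result = {}
    for name, default in reader_fields:
        if name in writer_record:
            result[name] = writer_record[name]  # matched by name
        elif default is not NO_DEFAULT:
            result[name] = default              # reader-only field: use its default
        else:
            raise ValueError(f"field {name!r} missing from data and has no default")
    # Writer-only fields never reach `result`, so they are ignored.
    return result
```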
Kognitio on Hadoop Changes
Multiple namenodes - the HDFS connector now accepts multiple namenodes supplied as a comma-separated list and handles failover between them.
In-built version of the libhdfs library - this library is normally part of Hadoop. The software defaults to the Hadoop-supplied version if it is present on the system, and falls back to the in-built version (with a startup warning) if it is not.
Cluster creation on Hadoop - improvements in the speed of metadata initialisation when building new clusters on Hadoop.
Correct startup banner in kogscript exe
Timezone detection on Red Hat-based systems - the correct timezone is now picked up on newer Red Hat-based systems (using /etc/localtime). If your server is using the default timezone and this is not aligned with the default timezone on the nodes, this change may cause the SMD and SQL-level timezone to change on upgrade.
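Handling a comma-separated namenode list amounts to parsing the list and trying each entry until one connects. The Python sketch below shows that pattern with a caller-supplied `connect` function that raises `ConnectionError` on failure. The helper names and the `nn1:8020` style addresses are hypothetical, and real HDFS high-availability failover involves more than this (for example, active/standby state).

```python
def parse_namenodes(spec):
    """Split a comma-separated namenode list, ignoring blanks and whitespace."""
    return [n.strip() for n in spec.split(",") if n.strip()]


def connect_with_failover(spec, connect):
    """Try each namenode in order and return the first successful connection."""
    namenodes = parse_namenodes(spec)
    if not namenodes:
        raise ValueError("no namenodes supplied")
    errors = []
    for namenode in namenodes:
        try:
            return connect(namenode)
        except ConnectionError as exc:
            errors.append(f"{namenode}: {exc}")  # remember why, then fail over
    raise ConnectionError("all namenodes failed: " + "; ".join(errors))
```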
Security changes - there are a number of small changes to security:
Only password failures are counted towards failed login attempts for account lockouts
Configuration file can now be edited by a user that is not also present on the Kognitio nodes
Support for user-mapping when Kognitio is not running as root, via the sudo command. This is used by default in Kognitio on AWS.
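The lockout change means other kinds of login failure no longer move an account towards lockout. The Python sketch below models that counting rule. The failure reasons other than a bad password, the `max_attempts` threshold and the `LockoutTracker` class are hypothetical, and the sketch does not model how or when Kognitio resets the count.

```python
from enum import Enum, auto


class LoginFailure(Enum):
    BAD_PASSWORD = auto()
    CONNECTION_LIMIT = auto()  # hypothetical non-password failure reasons
    UNKNOWN_USER = auto()


class LockoutTracker:
    """Count only password failures towards an account lockout threshold."""

    def __init__(self, max_attempts):
        self.max_attempts = max_attempts
        self.failures = {}

    def record_failure(self, user, reason):
        """Record a failed login; return True if the account is now locked."""
        if reason is LoginFailure.BAD_PASSWORD:
            self.failures[user] = self.failures.get(user, 0) + 1
        return self.is_locked(user)

    def is_locked(self, user):
        return self.failures.get(user, 0) >= self.max_attempts
```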