Kognitio for SQL experts¶
or how to get the most out of Kognitio.
Kognitio is a mature ANSI compliant SQL database with a comprehensive language feature set - at it’s simplest level, it behaves just like any other JDBC / ODBC accessible SQL database.
It is ACID compliant but it is designed primarily for analytical query performance so don’t expect to run heavy transaction processing workloads on it.
The biggest difference is that Kognitio is an in-memory database with declarative memory management - you need to tell it what to put into memory. If you don’t load data into memory, it will work but performance will be poor compared to in-memory speeds.
Data is loaded into memory using create image statements which create an in memory representation of a physical table or view. This is explained in Using Kognitio Memory. While having to explicitly manage memory may seem like extra work, the in-memory architecture does not require indexes (which can be very time consuming to create and maintain) and so, particularly for ad-hoc workloads, Kognitio is easier to manage.
Kognitio was designed from the outset to be simple to maintain - the majority of system administration is done using SQL. Product installation, fault finding etc may require setting configuration parameters and accessing log files but all the normal system administration tasks are performed using SQL extensions which makes administration simpler to automate and reduces complexity.
All versions of Kognitio allow the creation of both internal and external tables.
Internal tables are just like tables in any other database and are stored on the underlying storage layer for the platform (Linux disk, HDFS, MapR-FS). The data in them is only acccessible by Kognitio, they are fully transactional and performance is dependent on the performance of the underlying filesystem and whether they are imaged in RAM or not.
External tables are used to access data on a wide variety of platforms from flat files in HDFS to ORC or Parquet files in S3. They are based around connectors with Kognitio providing many as standard.
Connectors can directly define the type and location of the files or objects they access or they can connect to Hive or Glue metastores where the type, location and table definition will be automatically determined by the connector.
Custom connectors can also be created for unusual data sources - these are easy to write using any of the common Linux programming languages such as Python or BASH.
Tables are explained in more detail in Kognitio Tables
Kognitio also has the ability to integrate non SQL analytics into a SQL workflow.
“External scripts” are a mechanism that allows data to be
selected using a SQL statement, piped through any Linux compatible program and then
returned as if a simple
SELECT had been executed. Data can be partitioned
and sorted so that the external code is presented with data in the most convenient
way and the Kognitio script scheduler automatically manages parallelism.
This is a very powerful facility which can be used for operations as diverse as trend
forecasting using the statistical programming language
R or time series analysis in
C++. Since the results of the external processing are returned in the same way
as the result of a
SELECT statement, the complexity of the external processing
can be hidden from the users or from BI tools providing an effective and robust
method of deploying advanced analytics. This allows your data scientists to package
their work for end user consumption and to get on with further analysis rather than
having to support end user in the running of their analyses.