External tables

External tables allow you to process data that is held outside Kognitio. For each external data source, you’ll need a connector. You can use one of our standard connectors or define your own custom connectors For example, to use Kognitio to access data held in an HDFS file system, we provide a standard connector called HDFS. This is pre-installed when you create a Kognitio on Hadoop cluster.

Usage

CREATE EXTERNAL TABLE table-name [({column-name [data-type], ...})]
[FOR INSERT | FOR INSERT ONLY | FOR SELECT ONLY]
[COMMENT 'comment']
FROM connector-name
TARGET target-string

Notes

The connector-name specifies the connector used to access the external data source. For more details on connectors see the connector DDL or the connector overview.

The target-string specifies the location of the external data source via the connector as well as other target attributes including error handing and controlling parallel threads.

In order to be able to create an external table a user must have the create external table privilege on the schema (or system wide) and the connect privilege on connector-name. See privileges for more details.

As of Kognitio version 8.2.3 there are 3 options to control the read/write behaviour of an external table:

  • FOR INSERT - table is both readable and writable
  • FOR INSERT ONLY - writable only. For exporting data from Kognitio
  • FOR SELECT ONLY - readable only. This is the default option for backwards compatiblity purposes.

Note that currently Kognitio standard connectors only support FOR SELECT ONLY read only external tables. If you want to be able to use writable external tables you need to create custom connectors. For an example see creating and using a writable CSV connector.

Example 1: Using an anonymous external table

It is also possible to have anonymous external tables by placing the EXTERNAL TABLE definition into a SELECT statement, for example:

SELECT * FROM (
    EXTERNAL TABLE (c1 varchar) FROM hdfs
    TARGET 'file /my/hdfs/file.txt
    FMT_FIELD_SEPARATOR ":"
    FMT_IGNORE_EXTRA_FIELDS yes'
) dt

The advantages of anonymous external tables are:

  • Users without create external table privilege are allowed to use connectors to access external data in an ad-hoc fashion.
  • The external table is defined and used in one statement without the need to create an object
  • The SQL query indicates where the data is coming from.

Example 2: Using the Kognitio standard ORC connector

Kognitio has a number of standard connectors for accessing data in external data sources. One of these allows Kognitio to access data stored in ORC file format:

CREATE EXTERNAL TABLE t_orc_1000( f_pk INT,
                                  f_tinyint TINYINT,
                                  f_smallint SMALLINT
                                 )
FROM HDFS_ORC
TARGET 'uri_path "/user/hive/warehouse/jet_test.db/orc_1000"'

This external table connects Kognitio to ORC formatted files stored in the orc_1000 Hive table in the jet_test Hive database. For more details see the ORC Connector page.

Once an external table is created data can be selected directly from it:

SELECT TOP 100 *
FROM t_orc_1000;

However it is more common to create a view and pull the data into Kognitio RAM for extremely fast access for all users that have access to it:

CREATE VIEW v_orc_1000 AS
SELECT *
FROM t_orc_1000;

CREATE VIEW IMAGE v_orc_1000;

SELECT *
FROM v_orc_1000;