Kognitio External Table Connectors

External table connectors are used in the creation of external tables which can access a wide variety of data sources. The connector contains the code required to connect to the data source and the external table provides the definition of exactly what data to access and (if it can’t be determined from the data source) what format that data is in.

For example, an external table which uses the Amazon S3 connector to read a CSV file must supply the column definitions it expects in the CSV file, and possibly details such as how to handle NULL values. An external table that uses the Hive connector only needs to specify the location of the Hive table (schema and table name); the connector determines the column definitions from the Hive metastore.
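The division of responsibility described above can be modelled in a few lines of Python. This is only an illustrative sketch and not Kognitio code: the classes `Connector`, `FlatFileConnector`, `MetastoreConnector` and `ExternalTable`, the `describe` method, and the sample locations and column lists are all hypothetical names invented for the illustration.

```python
from typing import Dict, List, Optional, Tuple

Column = Tuple[str, str]  # (column name, SQL type)


class Connector:
    """Hypothetical connector: holds the logic for reaching a data source."""

    def describe(self, location: str) -> Optional[List[Column]]:
        # Default: the source cannot report its own column definitions.
        return None


class FlatFileConnector(Connector):
    """Models a flat-file source such as a CSV file, which carries no schema."""


class MetastoreConnector(Connector):
    """Models a source with a catalogue of table definitions, as Hive has."""

    def __init__(self, catalogue: Dict[str, List[Column]]) -> None:
        self.catalogue = catalogue

    def describe(self, location: str) -> Optional[List[Column]]:
        return self.catalogue.get(location)


class ExternalTable:
    """Hypothetical external table: says which data to read and, if needed, its format."""

    def __init__(
        self,
        connector: Connector,
        location: str,
        declared_columns: Optional[List[Column]] = None,
    ) -> None:
        self.connector = connector
        self.location = location
        self.declared_columns = declared_columns

    def columns(self) -> List[Column]:
        # Prefer what the data source reports; fall back to the declaration.
        discovered = self.connector.describe(self.location)
        if discovered is not None:
            return discovered
        if self.declared_columns is not None:
            return self.declared_columns
        raise ValueError(f"{self.location}: no column definitions available")


# CSV-style source: the external table has to declare its columns.
csv_table = ExternalTable(
    FlatFileConnector(),
    "sales/2020.csv",
    declared_columns=[("id", "INTEGER"), ("amount", "DECIMAL(10,2)")],
)

# Hive-style source: only the location is given; columns come from the catalogue.
hive_table = ExternalTable(
    MetastoreConnector({"retail.sales": [("id", "INTEGER"), ("region", "VARCHAR(20)")]}),
    "retail.sales",
)

print(csv_table.columns())
print(hive_table.columns())
```

The point of the model is that the column definitions have exactly one of two origins: the data source itself, or the external table definition.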

Standard Connectors

A number of connectors are available as standard within Kognitio. These are outlined in the tables below. Depending on your Kognitio deployment option, some of these will be pre-installed (marked with *):

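Available standard connectors by data source and file format

Table / File Format | Amazon S3 | Glue  | Hadoop File System [1] | Hive  | Another Kognitio System | Another Database via ODBC
--------------------+-----------+-------+------------------------+-------+-------------------------+--------------------------
Table               | N/A       | Glue* | N/A                    | Hive  | Cross                   | ODBC
Flat file           | S3*       | Glue* | HDFS                   | Hive  | N/A                     | N/A
JSON [2]            | S3*       | N/A   | HDFS                   | N/A   | N/A                     | N/A
ORC                 | ORC*      | Glue* | ORC                    | Hive  | N/A                     | N/A
Parquet             | Parquet*  | Glue* | Parquet                | Hive  | N/A                     | N/A
Avro                | S3*       | Glue* | HDFS                   | Hive  | N/A                     | N/A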

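Available standard connectors by data source and file format

Table / File Format | Amazon S3 | Glue  | Hadoop File System [1] | Hive  | Another Kognitio System | Another Database via ODBC
--------------------+-----------+-------+------------------------+-------+-------------------------+--------------------------
Table               | N/A       | Glue  | N/A                    | Hive* | Cross                   | ODBC
Flat file           | S3        | Glue  | HDFS*                  | Hive* | N/A                     | N/A
JSON [2]            | S3        | N/A   | HDFS*                  | N/A   | N/A                     | N/A
ORC                 | ORC*      | Glue  | ORC*                   | Hive* | N/A                     | N/A
Parquet             | Parquet*  | Glue  | Parquet*               | Hive* | N/A                     | N/A
Avro                | S3        | Glue  | HDFS*                  | Hive* | N/A                     | N/A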

Attention

To deploy Kognitio on MapR systems we now recommend using Containerized Kognitio. The “Kognitio on MapR” product is therefore no longer available to new users.

Existing users can still receive updates via support channels. Please contact us via the community forum or our support portal if you want to migrate to a containerized deployment.

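Available Connectors by Data Source and File Format

Table / File Format | Amazon S3 | Glue  | Hadoop File System [1] | Hive           | Another Kognitio System | Another Database via ODBC
--------------------+-----------+-------+------------------------+----------------+-------------------------+--------------------------
Table               | N/A       | Glue  | N/A                    | Hive           | Cross                   | ODBC
Flat file           | S3        | Glue  | MAPRFS*                | Hive           | N/A                     | N/A
JSON [2]            | S3        | N/A   | MAPRFS*                | N/A            | N/A                     | N/A
ORC                 | ORC*      | Glue  | ORC*                   | Hive           | N/A                     | N/A
Parquet             | Parquet*  | Glue  | Parquet*               | Hive           | N/A                     | N/A
Avro                | S3        | Glue  | MAPRFS*                | In Development | N/A                     | N/A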

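Available Connectors by Data Source and File Format

Table / File Format | Amazon S3 | Glue  | Hadoop File System [1] | Hive  | Another Kognitio System | Another Database via ODBC
--------------------+-----------+-------+------------------------+-------+-------------------------+--------------------------
Table               | N/A       | Glue  | N/A                    | Hive  | Cross                   | ODBC
Flat file           | S3        | Glue  | HDFS                   | Hive  | N/A                     | N/A
JSON [2]            | S3        | N/A   | HDFS                   | N/A   | N/A                     | N/A
ORC                 | ORC       | Glue  | ORC                    | Hive  | N/A                     | N/A
Parquet             | Parquet   | Glue  | Parquet                | Hive  | N/A                     | N/A
Avro                | S3        | Glue  | HDFS                   | Hive  | N/A                     | N/A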

  1. The Kognitio HDFS, ORC and Parquet connectors use the generic interface provided by the Hadoop file system. They can therefore be used to access any filesystem supported by Hadoop and configured on your platform. This includes HDFS, MapR-FS, Microsoft Azure WASB and Amazon S3 amongst others.

  2. Kognitio treats JSON as a special format of flat file. You therefore use the same connector, but must provide additional information in the external table CREATE statement. For more details on parsing JSON in Kognitio, see how to load JSON.

  3. Kognitio treats Avro as a special format of flat file. Avro files contain a description of the data schema that is designed to be read from the file when it is accessed. The Kognitio HDFS connector automatically builds the table definition from the Avro schema when the format is specified in the target string of either the connector or external table. See the Avro quick reference sheet for more details. The Hive connector will detect when an Avro file is being used by an external table and will use the Avro format automatically.
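To illustrate the idea in note 3 of building a table definition from an Avro schema, here is a minimal Python sketch. It is not the Kognitio implementation: the function name `columns_from_avro_schema`, the type mapping in `AVRO_TO_SQL`, and the sample schema are assumptions made for the example, and only flat records of primitive types (optionally made nullable through a union with `"null"`) are handled. An Avro schema is itself a JSON document, so the standard `json` module is enough to read it.

```python
import json
from typing import List, Tuple

# Assumed mapping from Avro primitive types to SQL types (illustrative only;
# the types a real connector chooses may differ).
AVRO_TO_SQL = {
    "boolean": "BOOLEAN",
    "int": "INTEGER",
    "long": "BIGINT",
    "float": "REAL",
    "double": "DOUBLE PRECISION",
    "string": "VARCHAR",
    "bytes": "VARBINARY",
}


def columns_from_avro_schema(schema_json: str) -> List[Tuple[str, str, bool]]:
    """Return (column name, SQL type, nullable) for each field of a record schema."""
    schema = json.loads(schema_json)
    if not isinstance(schema, dict) or schema.get("type") != "record":
        raise ValueError("expected a top-level Avro record schema")

    columns = []
    for field in schema["fields"]:
        avro_type = field["type"]
        nullable = False
        # A union such as ["null", "string"] marks an optional field.
        if isinstance(avro_type, list):
            non_null = [t for t in avro_type if t != "null"]
            nullable = len(non_null) < len(avro_type)
            if len(non_null) != 1:
                raise ValueError(f"unsupported union for field {field['name']!r}")
            avro_type = non_null[0]
        # Nested records, arrays, maps and logical types are out of scope here.
        if not isinstance(avro_type, str) or avro_type not in AVRO_TO_SQL:
            raise ValueError(f"unsupported Avro type for field {field['name']!r}")
        columns.append((field["name"], AVRO_TO_SQL[avro_type], nullable))
    return columns


# Hypothetical schema used only to exercise the function.
SAMPLE_SCHEMA = """
{"type": "record", "name": "Sale",
 "fields": [
   {"name": "id", "type": "long"},
   {"name": "region", "type": ["null", "string"]},
   {"name": "amount", "type": "double"}
 ]}
"""

for name, sql_type, nullable in columns_from_avro_schema(SAMPLE_SCHEMA):
    print(name, sql_type, "NULL" if nullable else "NOT NULL")
```

Because the schema travels with the data, no column list has to be written by hand, which is the convenience the note describes.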

Custom Connectors

In addition to the standard connectors, custom connectors can be built quite easily, for example:

  • Basic connector - uses output from the Linux df command to report disk usage on the hardware running Kognitio.

Contact us via the community forum or our support portal if you have a maintenance contract.
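As a rough illustration of the data-preparation half of such a basic connector, the Python sketch below turns `df` output into delimited rows. How Kognitio invokes a custom connector and what record format it expects are not described here, so the function names `parse_df`, `to_csv` and `read_df` and the choice of CSV output are assumptions. The sketch relies on the POSIX output format of `df -Pk`, which prints one line per filesystem with sizes in 1024-byte blocks.

```python
import csv
import io
import subprocess
from typing import List, Tuple

Row = Tuple[str, int, int, int, int, str]


def parse_df(output: str) -> List[Row]:
    """Parse `df -Pk` output into (filesystem, blocks, used, available, pct, mount)."""
    rows = []
    for line in output.splitlines()[1:]:  # first line is the column header
        # Split into at most six fields so a mount point containing spaces survives.
        parts = line.split(None, 5)
        if len(parts) != 6:
            continue
        filesystem, blocks, used, available, capacity, mount = parts
        try:
            rows.append(
                (filesystem, int(blocks), int(used), int(available),
                 int(capacity.rstrip("%")), mount)
            )
        except ValueError:
            # Skip lines whose numeric fields are not numbers (e.g. "-").
            continue
    return rows


def to_csv(rows: List[Row]) -> str:
    """Render rows as CSV text, one filesystem per line."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def read_df() -> List[Row]:
    """Run df on the local machine and parse its output."""
    result = subprocess.run(
        ["df", "-Pk"], capture_output=True, text=True, check=True
    )
    return parse_df(result.stdout)
```

Calling `to_csv(read_df())` on a Linux host produces one CSV line per mounted filesystem; keeping the parsing in `parse_df` separate from the command invocation makes it possible to check the parsing against captured output.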

References

Connectors:

  • Hive - Access ORC, Parquet and text files stored in Hive tables.

  • HDFS - Access data stored in HDFS.

  • MapR-FS - Access data stored in MapR-FS.

  • S3 - Access data stored in Amazon S3 buckets.

  • ORC - Access data stored in ORC format on the Hadoop file system.

  • Parquet - Access data stored in Parquet format on the Hadoop file system.

  • Cross Instance - High speed access to data in another Kognitio database in the same server farm.

  • Unloader - Access data in another Kognitio database.

  • ODBC - Access data stored in a database using ODBC.

  • ODBC - MySQL - Specific instructions for accessing data in MySQL via ODBC.

Support: