Kognitio External Table Connectors

External table connectors are used to create external tables, which can access a wide variety of data sources. The connector contains the code required to connect to the data source, while the external table defines exactly what data to access and (if it can’t be determined from the data source) what format that data is in.

For example, an external table which uses the Amazon S3 connector to read a CSV file will need to supply the column definitions it expects in the CSV file, and possibly details about how to handle NULL values and so on. An external table that uses the Hive connector only needs to specify the location of the Hive table (schema and table name); the connector determines the column definitions from the Hive metastore.
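For illustration, the two cases might look roughly like the sketch below. The connector names (csv_on_s3, hive_conn), the column definitions and the target-string attributes are placeholders rather than exact syntax for any particular installation:

    -- Sketch only: connector names and target-string attributes are assumptions.
    -- S3/CSV: the external table must describe the columns itself.
    create external table web_logs (
        log_date  date,
        url       varchar(1000),
        hits      integer
    )
    from csv_on_s3
    target 'file logs/web_logs.csv';

    -- Hive: only the schema and table name are needed; the connector reads the
    -- column definitions from the Hive metastore.
    create external table hive_sales
    from hive_conn
    target 'schema retail, table sales';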

Standard Connectors

There are a number of connectors available as standard within Kognitio. These are outlined in the tables below. Depending on your Kognitio deployment option, some of these are pre-installed (marked with *):

Kognitio on AWS

Available standard connectors by data source and file format
Table / File Format  Amazon S3  Glue   Hadoop File System [1]  Hive            Another Kognitio System  Another Database via ODBC
Table                N/A        Glue*  N/A                     Hive            Cross                    ODBC
Flat file            S3*        Glue*  HDFS                    Hive            N/A                      N/A
JSON [2]             S3*        N/A    HDFS                    N/A             N/A                      N/A
ORC                  ORC*       Glue*  ORC                     Hive            N/A                      N/A
Parquet              Parquet*   Glue*  Parquet                 Hive            N/A                      N/A
Avro [3]             S3*        Glue*  HDFS                    Hive            N/A                      N/A

Kognitio on Hadoop

Available standard connectors by data source and file format
Table / File Format  Amazon S3  Glue   Hadoop File System [1]  Hive            Another Kognitio System  Another Database via ODBC
Table                N/A        Glue   N/A                     Hive*           Cross                    ODBC
Flat file            S3         Glue   HDFS*                   Hive*           N/A                      N/A
JSON [2]             S3         N/A    HDFS*                   N/A             N/A                      N/A
ORC                  ORC*       Glue   ORC*                    Hive*           N/A                      N/A
Parquet              Parquet*   Glue   Parquet*                Hive*           N/A                      N/A
Avro [3]             S3         Glue   HDFS*                   Hive*           N/A                      N/A

Kognitio on MapR

Available standard connectors by data source and file format
Table / File Format  Amazon S3  Glue   Hadoop File System [1]  Hive            Another Kognitio System  Another Database via ODBC
Table                N/A        Glue   N/A                     Hive            Cross                    ODBC
Flat file            S3         Glue   MAPRFS*                 Hive            N/A                      N/A
JSON [2]             S3         N/A    MAPRFS*                 N/A             N/A                      N/A
ORC                  ORC*       Glue   ORC*                    Hive            N/A                      N/A
Parquet              Parquet*   Glue   Parquet*                Hive            N/A                      N/A
Avro [3]             S3         Glue   MAPRFS*                 In Development  N/A                      N/A

Kognitio standalone

Available standard connectors by data source and file format
Table / File Format  Amazon S3  Glue   Hadoop File System [1]  Hive            Another Kognitio System  Another Database via ODBC
Table                N/A        Glue   N/A                     Hive            Cross                    ODBC
Flat file            S3         Glue   HDFS                    Hive            N/A                      N/A
JSON [2]             S3         N/A    HDFS                    N/A             N/A                      N/A
ORC                  ORC        Glue   ORC                     Hive            N/A                      N/A
Parquet              Parquet    Glue   Parquet                 Hive            N/A                      N/A
Avro [3]             S3         Glue   HDFS                    Hive            N/A                      N/A
  [1] The Kognitio HDFS, ORC and Parquet connectors use the generic interface provided by the Hadoop file system. They can therefore be used to access any filesystem supported by Hadoop and configured on your platform, including HDFS, MapR-FS, Microsoft Azure WASB and Amazon S3, amongst others.
  [2] Kognitio treats JSON as a special format of flat file. You therefore use the same connector, but must provide additional information in the external table CREATE statement. For more details on parsing JSON in Kognitio, see how to load JSON.
  [3] Kognitio treats Avro as a special format of flat file. Avro files contain a description of the data schema that is designed to be read from the file when it is accessed. The Kognitio HDFS connector automatically builds the table definition from the Avro schema when the format is specified in the target string of either the connector or the external table. See the Avro quick reference sheet for more details. The Hive connector detects when an external table uses an Avro file and applies the Avro format automatically.
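As a rough illustration of note [3], an Avro-backed external table might be created as sketched below; the connector name, file path and the exact attribute used to select the Avro format are assumptions, so check the Avro quick reference sheet for the syntax your version expects:

    -- Sketch only: hdfs_conn, the path and the format attribute are assumptions.
    create external table sales_avro
    from hdfs_conn
    target 'file /data/sales/*.avro, format avro';
    -- No column list is given: the connector builds the table definition from
    -- the schema embedded in the Avro files.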

Custom Connectors

In addition to the standard connectors, custom connectors can be built relatively easily, for example:

  • Basic connector - uses output from the Linux df command to report disk usage on the hardware running Kognitio (see the sketch below)
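A minimal sketch of how such a connector might be defined and queried follows; the connector-creation syntax, script path and column layout shown here are assumptions for illustration, so refer to the basic connector example above for the exact details:

    -- Sketch only: creation syntax, script path and columns are assumptions.
    create connector disk_usage
    command '/opt/kognitio/connectors/df_connector.sh';  -- script wrapping the df command

    create external table node_disk_usage (
        filesystem  varchar(200),
        size_kb     bigint,
        used_kb     bigint,
        avail_kb    bigint,
        use_pct     varchar(10),
        mounted_on  varchar(200)
    )
    from disk_usage;

    select * from node_disk_usage;  -- disk usage for each filesystem reported by df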

For help building custom connectors, contact us via the community forum or, if you have a maintenance contract, via our support portal.

References

Connectors:

  • Hive - Access ORC, Parquet and text files stored in Hive tables.
  • HDFS - Access data stored in HDFS.
  • MapR-FS - Access data stored in MapR-FS.
  • S3 - Access data stored in Amazon S3 buckets.
  • ORC - Access data stored in ORC format on the Hadoop file system.
  • Parquet - Access data stored in Parquet format on the Hadoop file system.
  • Cross Instance - High-speed access to data in another Kognitio database in the same server farm.
  • Unloader - Access data in another Kognitio database.
  • ODBC - Access data stored in a database using ODBC.
  • ODBC - MySQL - Specific instructions for accessing data in MySQL via ODBC.

Support: