Kognitio External Table Connectors
External table connectors are used in the creation of external tables which can access a wide variety of data sources. The connector contains the code required to connect to the data source and the external table provides the definition of exactly what data to access and (if it can’t be determined from the data source) what format that data is in.
For example, an external table which uses the Amazon S3 connector to read a CSV file will need to supply the column definitions it expects in the CSV file, and possibly details about how to handle NULL values and so on. An external table that uses the Hive connector only needs to specify the location of the Hive table (schema and table name); the connector determines the column definitions from the Hive metastore.
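As a sketch of how these two cases differ in practice (the connector names and target-string keys below are assumptions for illustration; consult the documentation for the connectors installed on your system):

```sql
-- CSV over S3: column definitions must be supplied, because the
-- connector cannot derive them from a plain CSV file.
-- Connector name and target-string keys here are illustrative only.
create external table ext_sales (
    sale_id  int,
    region   varchar(30),
    amount   decimal(10,2)
)
from S3 target 'bucket my-bucket, file sales/2020.csv';

-- Hive: only the table location is needed; the connector reads the
-- column definitions from the Hive metastore.
create external table ext_hive_sales
from HIVE target 'schema default, table sales';
```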
There are a number of connectors available as standard within Kognitio; these are outlined below. Depending on your Kognitio deployment option, some of these will be pre-installed (marked with *):
The Kognitio HDFS, ORC and Parquet connectors use the generic interface provided by the Hadoop file system. They can therefore be used to access any filesystem supported by Hadoop and configured on your platform. This includes HDFS, MapR-FS, Microsoft Azure WASB and Amazon S3 amongst others.
Kognitio treats JSON as a special format of flat file. Therefore you use the same connector, but must provide additional information in the external table CREATE statement. For more details on parsing JSON in Kognitio, see how to load JSON.
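A minimal sketch of the shape such a CREATE statement might take (the connector name and the `fmt_json` option are assumptions, not confirmed syntax; the JSON loading guide documents the exact options your version supports):

```sql
-- Hypothetical: read JSON records through the flat-file connector.
-- 'fmt_json 1' is an assumed target-string option name.
create external table ext_events (
    event_id   int,
    event_type varchar(50)
)
from HDFS target 'file /data/events.json, fmt_json 1';
```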
Kognitio treats Avro as a special format of flat file. Avro files contain a description of the data schema that is designed to be read from the file when it is accessed. The Kognitio HDFS connector automatically builds the table definition from the Avro schema when the format is specified in the target string of either the connector or external table. See the Avro quick reference sheet for more details. The Hive connector will detect when an Avro file is being used by an external table and will use the Avro format automatically.
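Because the schema travels with the Avro file, the external table can omit the column list entirely. A hedged sketch of what such a definition might look like (the `format avro` target-string key is an assumption; see the Avro quick reference sheet for the exact syntax):

```sql
-- Hypothetical: no column list is needed because the HDFS connector
-- builds the table definition from the Avro schema embedded in the file.
create external table ext_avro
from HDFS target 'file /data/events.avro, format avro';
```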
In addition to the standard connectors, Custom Connectors can be built quite easily. For example, a basic connector could use the output of the Linux df command to provide disk usage figures for the hardware running Kognitio.
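A hedged sketch of how such a df-based custom connector might be wired up (the CREATE CONNECTOR form, script path, and column names are all assumptions for illustration; the Custom Connectors documentation describes the real interface):

```sql
-- Hypothetical: register a shell script that emits df output in a
-- delimited form, then expose it through an external table.
create connector disk_usage
command '/opt/kognitio/connectors/df_connector.sh';

create external table ext_disk_usage (
    filesystem varchar(200),
    used_kb    bigint,
    avail_kb   bigint
)
from disk_usage;
```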
Hive - Access ORC, Parquet and text files stored in Hive tables.
HDFS - Access data stored in HDFS.
MapR-FS - Access data stored in MapR-FS.
S3 - Access data stored in Amazon S3 buckets.
ORC - Access data stored in ORC format on the Hadoop file system.
Parquet - Access data stored in Parquet format on the Hadoop file system.
Cross Instance - High speed access to data in another Kognitio database in the same server farm.
Unloader - Access data in another Kognitio database.
ODBC - Access data stored in a database using ODBC.
ODBC - MySQL - Specific instructions for accessing data in MySQL via ODBC.