CREATE TABLE

CREATE TABLE is used to create Kognitio internal tables. These store data within Kognitio. This is the common way to hold data in Kognitio standalone. Internal tables will be stored on local disk. For Kognitio on Hadoop data is more likely to be accessed from external tables with the data stored in external file system such as HDFS. However, it is possible to create Kognitio internal tables in Kognitio on Hadoop provided that you have configured your Kognitio cluster to have enough space to hold the data within its allocated HDFS file space. See Cluster configuration for more details.

In its basic form the CREATE TABLE statement creates a new Kognitio internal table and defines the columns in it. By default, an image of the table is also placed in Kognitio RAM.

The user can also specify if and how a table should be distributed in RAM, and also generate the table definition from a SELECT statement.

It is also possible to create tables that are resident only in RAM or on disk.

Usage

CREATE [RAM ONLY] TABLE mytable
[({column-name [data-type]
               [{NOT NULL | NULL}]
               [{UNIQUE | PRIMARY KEY}]
               [references-spec]
               [DEFAULT default-spec]
               [CHECK (constraint-expression)],...
  [UNIQUE ({column-name},...) |
   PRIMARY KEY ({column-name},...) |
   FOREIGN KEY ({column-name},...) |
   CHECK ((constraint-expression),...)]})]
[DISK |
 IMAGE ({column-name},...)[RANDOM |
                           REPLICATED |
                           HASHED [ON]({column-name},...) [RANDOM |
                                                           REPLICATED [rvc-list |
                                                           VALUES (hash-value-list)]]]
                          [COMPRESSED]
                          [PARTITION IMAGE BY ({column-name}, …)]]
[SET SLABS TO slab-list |
              SCHEMA DEFAULT |
              ALL]
[FOR | FROM | AS select-statement]

Notes

  • RAM ONLY - use to create a RAM ONLY Table. These are used if you wish to load and manipulate data, but don’t need to keep a disk copy of the results of the initial load, or intermediate results and manipulations.

  • column-name - must start with a letter, but can contain digits and the underscore (_) character. The maximum permitted length is 128 characters. Where a select_statment is not specified, both column name s and data types must specified after the CREATE TABLE statement When a select-statement is used column-name is optional as they can be inferred from the select-statement.

  • data-type - specifies the type of data for the specific column, for example, CHARACTER (and length) or NUMERIC. If a select-statment is used then a data-type cannot be specified as it is inferred from the select-statment. Otherwise each column-name must have a data-type defined.

    For CHAR/VARCHAR data types the character set can be specified, see Using National Character Sets for more details.

  • NULL | NOT NULL - if a column is defined as NOT NULL, each row must contain a value for that column. A column defined as NULL doesn’t require a value for each row this is default behaviour. Normally, the keyword NULL is just omitted.

  • UNIQUE | PRIMARY KEY - is a unique identifier. It can be a single column or a combination of columns such that no two rows of the table have the same value for that column or column combination. No proper subset of the columns within that column combination has the uniqueness property (that is, none of the columns mentioned is irrelevant for unique identification purposes).

    Several columns could satisfy the requirements to be the PRIMARY KEY, but only one can be designated as such, the others will typically be specified as UNIQUE.

  • FOREIGN KEY - is a column or combination of columns in one base table T2, whose values are required to match values of the PRIMARY KEY in some other base table TI.

  • references-spec - specifies a column in another table, which the column you are creating refers to (referential integrity)

    REFERENCES table[({column-ref},...)]
                  [ON DELETE {RESTRICT | CASCADE | SET NULL | SET DEFAULT}]
                  [ON UPDATE {RESTRICT | CASCADE | SET NULL | SET DEFAULT}]
    

    Referential integrity forces values in one column to be the same as those in another. For example, suppose column A is declared as a FOREIGN KEY dependent on column B, the referenced key. Only values that appear in column B can be inserted into column A. You can set the following options for UPDATE and DELETE in column B:

    • CASCADE - automatically UPDATE/DELETE all matching foreign key values.

    • RESTRICT - inhibit UPDATE and/or DELETE if foreign key references remain.

    • SET DEFAULT - set foreign key to its default value.

    • SET NULL - set foreign key to NULL.

    Note that if you want to use referential integrity to maintain integrity during INSERT, UPDATE and DELETE operations, all relevant columns of all tables involved must be in RAM (for example, if you need to ensure a column is UNIQUE, it must be in RAM; if you need to ensure a set of columns form a PRIMARY KEY, those columns must be in RAM).

  • DEFAULT default-spec - specifies a default value to be placed in a column, where the user doesn’t provide a value on INSERT. This value can be a literal, a literal expression, or the keyword NULL. Note that IMPORT doesn’t use default-specs.

  • CHECK constraint-expression - can apply to multiple columns (table level) or to a single column (column level). Note that a CHECK constraint cannot reference another table.

    IMPORT doesn’t enforce CHECK constraints.

  • DISK - is specified if the table should be created on disk only. By default a Kognitio table is create both on disk and in RAM. Adding DISK into the CREATE TABLE statements means that a RAM IMAGE is not created. The default behaviour of creating a RAM image can be modified by changing the Kognitio system parameter “def_table_loc” value from 0 (RAM images) to 1 (disk only).

  • IMAGE ({column-name},...) - is used to define which columns will be placed in RAM when data is loaded or inserted into the table

  • RANDOM | REPLICATED | HASHED ON - specifies how the table will be held in RAM. For full details of options available when creating table images see CREATE TABLE IMAGE.

  • COMPRESSED - specifies that the image in RAM will be compressed. This can add considerable processing time to queries depending on the form of the data.For more details see: CREATE TABLE IMAGE.

  • PARTITION IMAGE BY - for very large data sets it may also be useful to partition the data in RAM to enable even faster reading. For more details see CREATE TABLE IMAGE.

  • SET SLABS - slabs are used in Kognitio to allow disk to separated for easier administration of available resources. See the Kognitio Guide PDF Guide for full details of disk store slabs.

  • FOR | FROM | AS - are used to create a table from a specified select-statement; that can be any SQL query.

    FOR builds a table definition and create an empty table using the select-statement.

    FROM and AS are idential they build a table definition, creates the table and populates it using the select-statement.

    Although the syntax allows you to create and populate a table in one step (using FROM or AS) Kognitio strongly recommend that you create the table first (using a FOR clause) and then populate it with a separate INSERT-SELECT statement. The reason is that all locks associated with the table creation continue to be held while the table is populated; which may take many minutes for large data sets.

    When using a select-statement to create a table it is possible to specify some ordering which should improve the efficiency of accessing data directly from disk. Adding an ORDER BY to the INSERT-SELECT or CREATE TABLE AS statement will cause individual nodes to order results before writing to disk. Kognitio processes data, including writing to disk, in parallel so the ordered results from each node will be interleaved onto disk. Therefore Kognitio ordering to disk is referred to as partial ordering. It will improve future reads of the data but does not guarantee absolute ordering.

Example 1: Simple Tables

The PART table is one of the five related tables (part, supplier, partsupp, customer and ordertab) used in the Kognitio Training Demo throughout this documentation. The tables have certain columns in common, so for example, there is a partkey column in both the part and partsupp tables. A system of prefixes distinguishes columns with similar names in the different tables, so for example, all the columns in the part table start with p_ and all the columns in partsupp start with ps_:

CREATE TABLE part( p_partkey INT NOT NULL,
                   p_name VARCHAR(55) NOT NULL,
                   p_mfgr CHAR(25) NOT NULL,
                   p_brand CHAR(10) NOT NULL,
                   p_type VARCHAR(25) NOT NULL,
                   p_size INTEGER NOT NULL,
                   p_container CHAR(10) NOT NULL,
                   p_retailprice DECIMAL(12, 2) NOT NULL,
                   p_comment VARCHAR(23) NOT NULL,
                   PRIMARY KEY(P_PARTKEY))

The part table includes the keywords NOT NULL for all columns, so every row must have an entry for every column. Since p_partkey is a key column, no rows can have a duplicate value.

The primary key: p_partkey is used to form a relational link to the partsupp table:

CREATE TABLE partsupp(PS_PARTKEY int not NULL,
                      PS_SUPPKEY int not NULL,
                      PS_AVAILQTY integer not NULL,
                      PS_SUPPLYCOST decimal(12,2) not NULL,
                      PS_COMMENT varchar(199) not NULL,
                      PRIMARY KEY (PS_PARTKEY, PS_SUPPKEY));

Note the partsupp` table also has a primary key, but it is a combination key involving two columns (``ps_partkey and ps_suppkey). This means that uniqueness is determined by the combined values in the two columns, so there can be duplicate values for ps_partkey but not for ps_partkey and ps_suppkey in combination. Consequently, a part can be supplied by more than one supplier, but all parts are unique for a supplier.

Example 2: Default Values

This alternative definition of the partsupp table has a default value ('no comment') in the ps_comment column:

CREATE TABLE partsupp( ps_partkey INT,
                       ps_suppkey INT,
                       ps_availqty INTEGER,
                       ps_supplycost decimal(12, 2),
                       ps_comment VARCHAR(199) DEFAULT 'no comment')
                       PRIMARY KEY (PS_PARTKEY, PS_SUPPKEY));

Example 3: Check Constraints

Another definition of the partsupp table has a check constraint on the partkey column, which checks that any value you enter is over 100:

CREATE TABLE partsupp( ps_partkey INT NOT NULL,
                       CHECK(ps_partkey > 100),
                       ps_suppkey INT NOT NULL,
                       ps_availqty INTEGER NOT NULL,
                       ps_supplycost DECIMAL(12, 2) NOT NULL)

Example 4: References Spec

The customer table has a column c_nationkey. n_nationkey is the PRIMARY KEY for the table nation. We can use referential integrity to ensure that no customer record can be inserted into the customer table, unless there is already a record giving the nation details in the nation table.

Add a reference-spec to the nation table in the customer table as follows:

CREATE TABLE customer( c_custkey INT NOT NULL,
                       c_name VARCHAR(25),
                       c_address VARCHAR(40),
                       c_postcode CHAR(9),
                       c_nationkey INTEGER REFERENCES nation.n_nationkey,
                       c_phone CHAR(15),
                       c_acctbal DECIMAL(12, 2),
                       c_flags INT,
                       PRIMARY KEY(c_custkey))

Note this can also be done with a FOREIGN KEY definition at the end of the table definition.

There is a performance penalty if referential integrity is used.

Example 5: Tables with More Than One Unique Group

The following table is defined with two unique groups. The first is the primary key (id) and the second group includes name and schema_id:

CREATE TABLE ipe_table( schema_id INTEGER NOT NULL,
                        name CHAR(32) NOT NULL,
                        id INTEGER NOT NULL,
                        owner INTEGER NOT NULL,
                        type CHAR(1) NOT NULL,
                        create_time TIMESTAMP(0),
                        PRIMARY KEY(id),
                        UNIQUE(name, schema_id),
                        FOREIGN KEY(schema_id) REFERENCES ipe_schema)

The Kognitio Guide PDF gives additional information on unique groups.