CREATE TABLE IMAGE

Use the CREATE TABLE IMAGE statement to set up a RAM image of a table or selected columns from a table. Any changes to the table are reflected in RAM as well as on disk. Because the image is in RAM, queries run significantly faster on a table image. For more information on table images, see the Kognitio Guide.

Note: When you create a table, by default, a RAM image is also created. It is only possible to create one table image of any particular table at any one time.

Usage

--Standard Image
CREATE TABLE IMAGE table[(column-list)]
[COMPRESSED]
[PARTITION IMAGE BY (column-list)]
[WHERE <local predicates>]

--Replicated Image
CREATE TABLE IMAGE table REPLICATED
[COMPRESSED]
[PARTITION IMAGE BY (column-list)]
[WHERE <local predicates>]

--Hashed Image
CREATE TABLE IMAGE table[(column-list)] HASHED [ON] (column-list)
[COMPRESSED]
[PARTITION IMAGE BY (column-list)]
[WHERE <local predicates>]

--Partial Hashed Image
CREATE TABLE IMAGE table[(column-list)] HASHED [ON](column-list) RANDOM | REPLICATED
                                        [IN (select-list) | VALUES(RVC-list) | (hash-value-list)]
[COMPRESSED]
[PARTITION IMAGE BY (column-list)]
[WHERE <local predicates>]

On Kognitio, when data is brought into RAM by creating an IMAGE the default behaviour is to distribute the data randomly across all RAMStores. The Kognitio extensions to SQL provide alternate distributions and imaging strategies that allow certain processing to operate more efficiently.

Replicated Image

The REPLICATED clause specifies that a complete copy of a selected table is loaded onto every RAMStore. It is most commonly used to perform a join between two tables, where one table is very large and the second (replicated) table is much smaller. A table image can be both fragmented (see below) and replicated.

Hashed Image

Use the HASHED clause to select a column (or columns) to use as a key to hash distribute an image across the available RAM. If two table/view images have columns with identical values and each is selected as a hash key so that both images are hash distributed, then pairs of rows with the same key from the two images are placed on the same RAMStore. Any query involving a join between the two hashed images based on hashed column(s) runs efficiently, because the rows being joined are located together on the same RAMStore.

In addition to speeding up join performance any query involving all the HASHED column(s)in a GROUP BY statement will also run more efficiently .Due to the co-location of the rows for each distinct value(s) of the HASHED column(s) the GROUP BY can be satisfied locally without data re-distribution.

Compression and Partitioning Images

It is also possible to compress data in memory for user-created images using the COMPRESSED option. The rate of compression is highly dependent on the structure of the data.

When the objects held in Kognitio RAM Images are large these can be broken down into a number of partitions to reduce the need for full RAM scans using the PARTITION IMAGE clause. Only the partitions required to satisfy a query are scanned. If the user has selective predicates in a query then if these were also specified in the PARTITION IMAGE BY clause Kognitio will automatically restrict the rows to be scanned to a qualified sub-set of partitions in the table/view image and improve query performance. users are strongly advised to use partitioning in these circumstances.

Using partitioning also improves compression. For more details of compression and partitioning see the :ref:Kognitio PDF Guide.

The optional WHERE clause at the end of the statement allows only rows that match the provided local predicates to be in RAM, with other rows just existing on disk. Updates will put new rows into the appropriate location. This is called a horizontally fragmented image, as opposed to a vertically fragemented image where only a subset of columns is put into RAM).

If both the table images are dropped, the compiler may create temporary copies of the hashed table images for you, depending on the query and the data demographics.

Partial Hashing for skewed data

If one or more values occur far more frequently than others, for example NULL or a default, then the hashed image may be skewed in memory, and occupy far more space on one RAMStore than on the others (assuming that it fits at all).

Partial distributions handle joins when a large table is severely skewed. Partial hashing provides an alternative to straightforward hashing. There are two forms

  • Partial hashed/random
  • Partial hashed/replicated.

Partial hashing makes use of a list of exception values, which identify the values that are causing the skewing. Most rows hash in the same way as a normal hashed distribution, but if a value is an exception, then depending on the type of partial distribution, it is either

  • Given to a random RAMStore (partial hashed/random), or
  • Replicated (Partial hashed/replicated).

The final statement in the usage section is used to create partially hashed distributions. One way to specify the exception values is as an RVC-list, meaning that individual values are enclosed in parentheses. This allows distributions that are hashed on more than one column to be defined. This is illustrated by the following extract from a worked example in the Kognitio Guide that discusses partial distributions in detail.:

CREATE VIEW IMAGE telco_sumview HASHED ON(Orig_State)
                                REPLICATED VALUES(('NY'), ('CA'), ('PA'), ('NJ'),
                                                  ('IL'), ('MD'), ('MA'), ('TX'))

It is also possible to specify the exception values via a SELECT statement. This is frequently the best way to identify the values in a production environment where the distribution of the underlying data may be constantly changing. For example:

CREATE VIEW IMAGE telco_sumview HASHED ON(Orig_State)
                                REPLICATED IN (SELECT Orig_State
                                               FROM (SELECT Orig_State, COUNT(Orig_State) cs, MAX(x) mx
                                                     FROM telco_demo,
                                                     (SELECT COUNT(*)
                                                      FROM telco_demo) AS dt(x)
                                                      GROUP BY Orig_State) AS dt2
                                                      WHERE 100.0 * cs / mx > 5.0);

SQL does not permit the use of aliases in the WHERE or HAVING clauses of a SELECT, this could result in some expressions being repeated; making the SQL more verbose and prone to error if the expressions are complex and/or regularly changed. To avoid this problem the above example uses derived tables so that the expressions are specified only once.

HAVING clauses and derived tables are both described later in this document and in the Kognitio Guide.

Vertically Fragmented Table Images

There may be insufficient space to fit the complete image of a large table into RAM. If this happens, you can make a vertically fragmented image of the table, which loads only the most frequently accessed columns into RAM and leaves the rest on disk. The data on disk is still accessible.

Hashed and replicated images, partial hashing, and vertically fragmented images are discussed in detail in the Kognitio Guide.

Example 1: Dropping and Creating a Table Image

Create a table image of the CUSTOMER table. When you create the table, by default a RAM image is created at the same time, so this image must be dropped before you create another one. (There are various reasons for dropping an image, including making space available for an image of another large table or view, or wishing to instantiate a replicated or hashed image as in Examples 2 and 3.):

DROP TABLE IMAGE customer

Once the image has been dropped, you can give the CREATE TABLE IMAGE command to create the image with a different distribution:

CREATE TABLE IMAGE customer REPLICATED

Note: It is possible to specify the required table image distribution when the table is created, however the default behaviour will still be as specified above.

Example 2: Replicated Table Images

The CUSTOMER and SUPPLIER tables each have a column for nation (s_nationkey and c_nationkey), with an INT data type. The NATION table can be used to map country names to these identifiers. Since the NATION table will be small a replicated image is appropriate:

CREATE TABLE IMAGE nation REPLICATED

Example 3: Hashed Distribute Two Fragmented Table Images

This example creates fragmented, hashed tables images for the PART and PARTSUPP tables. The columns part.p_partkey and partsupp.ps_partkey are the key columns used to hash distribute the tables. Pairs of rows with the same hash key from the two tables are placed on the same RAMStores and so these tables will be distributed in a way suitable for joining on their partkey columns.

You must drop existing table images for PART and PARTSUPP before you can create the hashed table images, so the following may be necessary:

DROP TABLE IMAGE part;
DROP TABLE IMAGE partsupp;

Now create the new images:

CREATE TABLE IMAGE part(p_partkey, p_name, p_mfgr, p_type,
                        p_size, p_container, p_retailprice)
HASHED ON(p_partkey);

CREATE TABLE IMAGE partsupp(ps_partkey, ps_suppkey,
                            ps_availqty, ps_supplycost)
HASHED ON(ps_partkey);