Images: Quick Overview

You can create an image of any view or (internal) table in Kognitio. An image is an in-memory copy of an internal table, or of the data returned by a view definition.

Images are in memory

Kognitio images are always held in memory, so working with data in images is much faster than working with data in disk-based tables.

Images are snapshots

An image is a snapshot of the data in a table or the results of the SQL in a view. So if the data in the table changes after you make the image, the data in the image doesn’t automatically change.

Often this is OK because you’ll create an image for a particular task and use it immediately.

If you need to update an image, drop it and then recreate it. For example:

DROP VIEW IMAGE demo_ret.v_ret_sale;
CREATE VIEW IMAGE demo_ret.v_ret_sale;

There are different types of image

Images can be:
  • random
  • replicated
  • hashed

Note there are further options outlined in Table Images and View Images.

Random images

A Kognitio cluster consists of a number of nodes. Each node has a number of ramstores.

By default, Kognitio distributes the rows of an image evenly across all the ramstores on all the nodes. It writes rows in batches to ramstores chosen at random.

Suppose you create an image that has 3000 rows of data. If you have 4 nodes in your cluster and 6 ramstores per node, there are 4 × 6 = 24 ramstores in total, so each ramstore should receive about 3000 / 24 = 125 rows.

RANDOM is the default, so the keyword is optional. The following statement has the same effect with or without it:

CREATE VIEW IMAGE demo_ret.v_ret_sale RANDOM;

Replicated images

For relatively small tables or views (lookup tables, for example) you can create a replicated image:

CREATE VIEW IMAGE demo_ret.v_ret_sale REPLICATED;

Kognitio writes every row of the image to every ramstore on every node. This gives you excellent join performance at the expense of memory.

Hashed images

If you have a large table or view that is often joined or grouped by a particular column (or set of columns), it is worth considering hashing the image:

CREATE VIEW IMAGE demo_ret.v_ret_sale HASHED ON (BASKETNO);

All data with a particular value for that column (BASKETNO in the example above) is then located in the same place in memory: in the same Kognitio ramstore. When this column is used in a JOIN or GROUP BY clause, the data is already in position for fast local processing, and data redistribution is minimized.
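
As an illustration, the query below is the kind of statement that should benefit from the hashed image above. It uses only the view and column named in the example; the alias line_count is made up for this sketch.

```sql
-- Count the rows in each basket.
-- With the image hashed on BASKETNO, all rows for a given basket
-- are expected to sit in the same ramstore, so the grouping should
-- need little or no data redistribution.
SELECT basketno,
       COUNT(*) AS line_count   -- illustrative alias
FROM demo_ret.v_ret_sale
GROUP BY basketno;
```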

Note that hashing on a column with severely skewed data is often not possible, because the rows that share a value cannot all be colocated in the same ramstore. You may get an error: RS0001 Insufficient RAM for table / view. In this case, consider using partial hashing.
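
Before hashing an image, it may help to check how skewed the candidate column is. The query below is a rough sketch in standard SQL, not a documented Kognitio procedure; the aliases per_value, row_count, largest_group, total_rows and distinct_values are illustrative names.

```sql
-- Summarise how rows are spread across BASKETNO values.
-- The inner query counts rows per value; the outer query reports
-- the largest single group next to the total row count.
SELECT MAX(row_count) AS largest_group,
       SUM(row_count) AS total_rows,
       COUNT(*)       AS distinct_values
FROM (
    SELECT basketno, COUNT(*) AS row_count
    FROM demo_ret.v_ret_sale
    GROUP BY basketno
) AS per_value;
```

If largest_group is a large share of total_rows, a few values dominate the column, and a hashed image on it is more likely to run into the memory error described above.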