CREATE VIEW IMAGE

Use the CREATE VIEW IMAGE statement to create a RAM image of a view. A view image is queried in the same way as a table image. Where columns are either calculated or derived from other columns, results are fetched significantly faster than for a normal view (that isn’t in RAM), since each node has immediate access to the rows held in its RAMStore.

Usage

CREATE VIEW IMAGE image[(column-list)]
[AS SELECT select-list]
[COMPRESSED]
[PARTITION IMAGE BY (column-list)]
[ORDER IMAGE BY (column-list)]

CREATE VIEW IMAGE image[(column-list)]
HASHED [ON](column-list)
[AS SELECT select-list]
[COMPRESSED]
[PARTITION IMAGE BY (column-list)]
[ORDER IMAGE BY (column-list)]

CREATE VIEW IMAGE image[(column-list)]
REPLICATED
[AS SELECT select-list]
[COMPRESSED]
[PARTITION IMAGE BY (column-list)]
[ORDER IMAGE BY (column-list)]

CREATE VIEW IMAGE image[(column-list)]
HASHED [ON](column-list)
RANDOM | REPLICATED
[IN (select-list) | VALUES (RVC-list) | (hash-value-list)]
[COMPRESSED]
[PARTITION IMAGE BY (column-list)]
[ORDER IMAGE BY (column-list)]

Notes

For additional information on the final form above, which creates a partially hashed distribution, see CREATE TABLE IMAGE.

It is possible to compress data in memory for user-created images.

Images can be broken down into a number of partitions to reduce the need for full scans of the data in RAM; only the partitions required to satisfy a query are scanned. If the user has selective predicates in a query then ensuring these are specified in the PARTITION IMAGE BY clause will restrict the rows to be scanned to a qualified sub-set of partitions in the image and so improve query performance; users are strongly advised to use partitioning in these circumstances.

For more details of compression, sorting and partitioning see the Kognitio Guide in PDF Documentation.

Best practice for creating images

Although the CREATE VIEW IMAGE syntax allows you to create a view and
its image in one step, Kognitio recommend that you create the view first,
and then create the image. The reason is that all locks associated with the view creation
continue to be held for the duration of the image creation, which may take many minutes for large images.

For example, suppose you want to create a new view and view image (called NEWVIEW). If you give the following type of command:

CREATE VIEW IMAGE newview
AS SELECT ...

the view and image are created in one step, and locks are held during the select process. However, if you give the command:

CREATE VIEW newview AS
SELECT ...

followed by:

CREATE VIEW IMAGE newview

Locks are released before the possibly time-consuming selection process.

A view image has a similar relationship to the underlying view as a table image has to the underlying table. (Tables and views themselves are very different, as tables physically exist, but a view is only a definition.) However a view image will be a snapshot of the view at the point in time the image is created, whereas a table image is always synchronised with the underlying table with any updates to or deletes from the table being reflected in the image.

If you give the CREATE IMAGE command following a Kognitio restart, any view (or table) images established prior to the shutdown are recreated automatically.

Using a view image lets users load complex views of data into RAM. Scanning these will be faster because any complex joins or aggregations are only performed when the image is generated. View images are also more economical with space than table images, as the row headers for view images are smaller. Views can be built up in stages. If images of intermediate views exist, they are used to create the final view image. For more information, see the Kognitio Guide.

HASHED and Replicated Clauses

On Kognitio, data is normally distributed randomly across all RAMStores. The Kognitio extensions to SQL provide alternate distributions that allow certain joins to operate more efficiently. These distributions are hashed and replicated.

Use the HASHED clause to select the columns to use as keys to hash distribute an image across the available RAM. If two table/view images have columns with identical values and each is selected as a hash key so that both images are hash distributed, then pairs of rows with the same key from the two images end up on the same RAMStore. Any query involving a join between the two hashed images runs fast, because the rows being joined are located together on the same RAMStore.

The key columns for each view must be identical, unless the column is a string or an integer. (For strings the key columns forming the pair could be a CHAR on one table and a VARCHAR on the other, and pairs of integers might be INT4 and INT8.)

The REPLICATED clause specifies that a complete copy of a selected view is loaded onto every available RAMStore in the processing network.

Partial Distributions

Partial distributions handle joins when a large view is severely skewed. Partial hashing provides an alternative to straightforward hashing. There are two forms

  • Partial hashed/random
  • Partial hashed/replicated.

Partial hashing makes use of a list of exception values, which identify the values that are causing the skewing. The exception values can be obtained using the HASH_VALUE and HASH_MPID functions. Most rows hash in the same way as a normal hashed distribution, but if a value is an exception, then depending on the type of partial distribution, it is either

  • Given to a random RAMStore (partial hashed/random), or
  • Replicated (Partial hashed/replicated).

Example: Basic View Images

Create View Images for the views created as examples in Create View reference:

CREATE VIEW IMAGE uk_customers HASHED ON (custno);
CREATE VIEW IMAGE cust_short_pcode HASHED ON (custno);
CREATE VIEW IMAGE supp_nation REPLICATED;
CREATE VIEW IMAGE smalltown_suburbs HASHED ON(custno)

The three views all containing the customer information are hashed on custno column. This is to faciliate fast joining on this column. Any join containing this column can now be done locally in Kognitio without having to re-distribute or repliacte the data to satisfy the join. The custno is a good choice for hashing in these examples as there are likely to be lots of distinct customers in the data. This means that data is more likely to be evenly distributed on this column meaning any processing required for the join will be evenly distributed. The supp_nation view is replicated. It is a look-up view holding information about the suppliers. There are less suppliers than customers so this view is small. Therefore it is replicated so that when it joined to large views containing the supplier identifier this can be carried out without having to redistribute or replicate data to satisfy the join in the query processing.