Different type of imaging techniques in kognitio

by gnanasekar » Thu Mar 22, 2018 9:00 am

Hello All,

Can anyone please explain the imaging techniques available in Kognitio on Hadoop?

It would be great if you could cover the techniques below.

1) Replicated

2) REPLICATED PARTITION IMAGE BY (column_name)

3) HASHED ON (column_name) PARTITION IMAGE BY (column_name)

Thanks in advance.

Re: Different type of imaging techniques in kognitio

by markc » Fri Mar 23, 2018 8:54 am

You can find information on images in the documentation at:
https://kognitio.com/documentation/late ... rview.html
https://kognitio.com/documentation/late ... age-create
https://kognitio.com/documentation/late ... age-create
http://www.kognitio.com/forums/latest_810_pdf.zip - if you extract the Kognitio Guide from this ZIP file and look in section 2.2 it has a lot of information on RAM images.

In summary, an image can be distributed as:
  1. random - rows can be put on any ram store
  2. replicated - a copy of every row is put on every ram store
  3. hashed - rows are put on a ram store based on the values in the columns used for hashing, so rows with common values for those columns will be co-located on the same ram store
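
To make the three distributions concrete, here is a small Python toy model. It is only a sketch of the idea, not how Kognitio implements it: the store count, the function names and the use of crc32 as the hash are all assumptions made up for illustration.

```python
import random
import zlib
from collections import defaultdict

NUM_RAM_STORES = 4  # hypothetical number of ram stores, for illustration only


def distribute_random(rows, num_stores=NUM_RAM_STORES, seed=0):
    """Random: each row may land on any ram store."""
    rng = random.Random(seed)
    stores = defaultdict(list)
    for row in rows:
        stores[rng.randrange(num_stores)].append(row)
    return stores


def distribute_replicated(rows, num_stores=NUM_RAM_STORES):
    """Replicated: every ram store holds a full copy of the rows."""
    return {store: list(rows) for store in range(num_stores)}


def distribute_hashed(rows, key, num_stores=NUM_RAM_STORES):
    """Hashed: the hash of the key column decides the ram store."""
    stores = defaultdict(list)
    for row in rows:
        # crc32 is a stand-in; the hash Kognitio really uses is not assumed here.
        store = zlib.crc32(repr(row[key]).encode()) % num_stores
        stores[store].append(row)
    return stores


rows = [{"customer_id": i % 5, "amount": i} for i in range(20)]
hashed = distribute_hashed(rows, "customer_id")
replicated = distribute_replicated(rows)
```

In this model, all rows sharing a customer_id end up in exactly one entry of `hashed`, while every entry of `replicated` holds all 20 rows.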

As discussed in the links above on table and view images, there are then further options including:
  1. dealing with skew when hashing - see the section on partial hashing
  2. reducing scan time for large objects by providing a partitioning clause for an image
  3. compressing images to reduce the space used for storing them
  4. sorting images, which is currently used mainly to improve compression, but may allow other Kognitio optimizations in future.

Re: Different type of imaging techniques in kognitio

by kaduswapnali456 » Fri Dec 07, 2018 8:35 am

replicated - here a copy of the object is placed in every ram store process. This is typically used for dimension objects to allow them to be joined to large objects, regardless of whether those objects are randomly distributed or hashed.

partitioned (deciding whether to partition or not is independent of whether you are replicating/randomising/hashing) - this allows the ram store to partition its data on an attribute. The main benefit is that whole partitions can be eliminated from scans, reducing the amount of data processed. Note, though, the further comments on partitioning in the documentation.
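
As a rough picture of partition elimination, the Python sketch below groups one ram store's rows by a partitioning column and then scans only the partitions a query needs. The column name sale_month and both helper functions are hypothetical; this models the idea only and says nothing about Kognitio's internals.

```python
from collections import defaultdict


def build_partitions(rows, column):
    """Group the rows held by one ram store by the partitioning column."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[column]].append(row)
    return partitions


def scan(partitions, wanted, predicate=lambda row: True):
    """Scan only partitions whose key is in `wanted`.

    Returns the matching rows and the number of rows actually read.
    """
    rows_read = 0
    matches = []
    for key, part in partitions.items():
        if key not in wanted:
            continue  # partition eliminated: its rows are never touched
        for row in part:
            rows_read += 1
            if predicate(row):
                matches.append(row)
    return matches, rows_read


# 12 months with 100 rows each, partitioned by month.
rows = [{"sale_month": m, "amount": a} for m in range(1, 13) for a in range(100)]
parts = build_partitions(rows, "sale_month")

# A query for a single month reads one partition instead of all twelve.
matches, rows_read = scan(parts, wanted={3})
```

Here the scan for month 3 reads 100 rows rather than all 1200, which is the saving that partitioning is after.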

hashed - hashing on an attribute distributes the data according to that attribute's value. For example, in a retail system you might hash the customer table by customer_id and do the same with the transaction table; any given transaction is then located on the same ram store as the relevant customer record. Note that this distribution is prone to skew, so consult the documentation for details on using partial distributions to defeat it.
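
The retail example can be sketched in Python as follows. This is a toy model under stated assumptions: four ram stores, crc32 standing in for the real hash, and made-up table contents. It shows why the join can be done locally on each ram store, and how one very common key value skews the distribution.

```python
import zlib

NUM_RAM_STORES = 4  # hypothetical number of ram stores


def store_for(value, num_stores=NUM_RAM_STORES):
    """Pick a ram store from the hashed column value (crc32 is a stand-in hash)."""
    return zlib.crc32(repr(value).encode()) % num_stores


def place(rows, key):
    """Distribute rows across ram stores by hashing the given column."""
    stores = {s: [] for s in range(NUM_RAM_STORES)}
    for row in rows:
        stores[store_for(row[key])].append(row)
    return stores


customers = [{"customer_id": c} for c in range(100)]
transactions = [{"txn_id": t, "customer_id": t % 100} for t in range(1000)]

cust_stores = place(customers, "customer_id")
txn_stores = place(transactions, "customer_id")

# Join each ram store's transactions against only its own customers.
joined = 0
for s in range(NUM_RAM_STORES):
    local_ids = {c["customer_id"] for c in cust_stores[s]}
    joined += sum(1 for t in txn_stores[s] if t["customer_id"] in local_ids)

# Skew: 900 of 1000 transactions belong to one customer, so one store gets them all.
skewed = [{"txn_id": t, "customer_id": 0 if t < 900 else t} for t in range(1000)]
skew_counts = {s: len(rs) for s, rs in place(skewed, "customer_id").items()}
```

Because both tables are hashed on customer_id, every one of the 1000 transactions finds its customer on the same ram store, so `joined` is 1000 without any data moving between stores. In the skewed case, one entry of `skew_counts` holds at least 900 rows while the rest share the remainder.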