Isolate partitions

For Kognitio external scripts there are different ways to partition data to be processed. ISOLATE PARTITIONS starts a new invocation of the external script for each partition and makes it much easier to use than the others as the change in partitions don’t need to be handles by the script. However there is a small overhead cost when starting and stopping each of the isolated scripts which can amount to a lot when many partitions are present.

There are three more partition strategies available in Kognitio and the corresponding script for the default, seperate and mix partition strategies are also available.

Example: python script using isolate partitions to calculate averages

Consider the following external script using ISOLATE PARTITIONS:

CREATE EXTERNAL SCRIPT AvgOverPartID_Iso ENVIRONMENT python27
RECEIVES(part_id, invalue FLOAT)
PARTITION BY part_id
SENDS(part_id, avg1 FLOAT, count1 INTEGER)
ISOLATE PARTITIONS
SCRIPT S'EOF(
#
import csv,sys
input = csv.reader(sys.stdin)
result = csv.writer(sys.stdout)
#
#Initialise variables
sum1=0
count1=0
#
# Process one row at a time.
for row in input:
    # Check it is not empty
    if len(row)>0:
        # add value to sum and increment count
        sum1=sum1+float(row[1])
        count1=count1+1
#
# End of input. Get part_id and write average and count
if count1>0:
    partid=row[0]
    avg1=sum1/count1
    result.writerow([partid,avg1,count1])
)EOF';

Using the ISOLATE PARTITIONS strategy generally leads to the simplest form of code within the external script. Unlike default or seperate partition strategies there is no need to test for the start of a new partition. Kognitio automatically creates a new script invocation for each new partition value.

In this example the code is simply an initialisation of variables followed by processing the average row by row. The results, including the part_id value, are then written out at the end.

This example produces identical results to the code in the default partition strategy and can be used to produce product average prices using the Kognitio retail data set and the SQL below:

SELECT p.product_name
   ,dt1.avg1 avg_price
   ,CAST(dt1.count1 as INTEGER) numtrans
FROM ( EXTERNAL SCRIPT AvgOverPartID_Iso
       FROM ( SELECT prodno, price/100.00
              FROM demo_ret.v_ret_sale
              WHERE storeno BETWEEN 1 AND 5)
     ) dt1
JOIN demo_ret.v_ret_product p
ON dt1.part_id = p.prodno
ORDER BY 3 DESC;