Separate partitions

For Kognitio external scripts there are different ways to partition data to be processed. SEPARATE PARTITIONS inserts a blank line between partitions so that external scripts can know they’ve reached the end of one and the start of another.

There are three more partition strategies available in Kognitio and the corresponding script for the default, isolate and mix partition strategies are also available.

Example: python script using separate partitions to calculate averages

Consider the following external script using SEPERATE PARTITIONS:

CREATE EXTERNAL SCRIPT AvgOverPartID_Sep ENVIRONMENT python27
RECEIVES(part_id, invalue FLOAT) PARTITION BY part_id
SENDS(part_id, avg1 FLOAT, count1 FLOAT)
SEPARATE PARTITIONS
SCRIPT S'EOF(
#
import csv,sys
input = csv.reader(sys.stdin)
result = csv.writer(sys.stdout)
#
#Initialise process control vars no previous part_id needed
sum1=0 #Initialise sum
count1=0 #Initialise count
#
#Process one row at a time.
for row in input:
    # Check row is empty so blank line between part_ids
    if len(row)>0:
    # if row not increment sum and count
        partid=row[0]
        sum1=sum1+float(row[1])
        count1=count1+1
    # if empty write out avg and count and reset
    else:
        avg1=sum1/count1
        result.writerow([partid,avg1,count1])
        partid=''
        sum1=0
        count1=0
#
#write out final result after loop finished
if count1!=0:
    avg1=sum1/count1
    result.writerow([partid,avg1,count1])
)EOF';

In this external script, unlike in the default partition example, tracking the previous partition value is no longer needed. In order to identify the end of each partition the script only needs to checks for a blank line by testing the length of the row. These blank lines are inserted between each partitions automatically when using SEPERATE PARTITIONS.

The external script above produces identical results to the default partition strategy and can be used to produce product average prices using the Kognitio retail data set and the following SQL:

SELECT p.product_name
    ,dt1.avg1 avg_price
    ,CAST(dt1.count1 as INTEGER) numtrans
FROM ( EXTERNAL SCRIPT AvgOverPartID_Sep
       FROM ( SELECT prodno, price/100.00
              FROM demo_ret.v_ret_sale
              WHERE storeno BETWEEN 1 AND 5)
          ) dt1
JOIN demo_ret.v_ret_product p
ON dt1.part_id = p.prodno
ORDER BY 3 DESC;