Partitioning by column in external scripts¶

This advanced example shows how to organize the data sent to Kognitio external scripting with the default behaviour. Typically you’d want all the data of one kind to go to one invocation of an external script to be processed together, each of these defined by some column in your input data is called a partition.

Read about the different partition strategies and the advanced configuration for other ways to get the most from external scripts.

Example: python script using partitions to calculate averages¶

The script below shows a simple python example for calculating the average of the `invalue` field received by the script. The script is partitioned on the `part_id` field and this means an average (`avg1`) and count (`count1`) is calculated and sent back from the external script invocations for each part_id value:

```CREATE EXTERNAL SCRIPT AvgOverPartID_Def ENVIRONMENT python27
PARTITION BY part_id
SENDS( part_id, avg1 DECIMAL(8,2), sum1 DECIMAL(8,2),count1 INT)
REQUIRES 1 GB RAM
SCRIPT S'EOF(
import csv,sys
result = csv.writer(sys.stdout)
#
#Initialise process control vars
prevpartid=""   #Holds the previous partition id
sum1=0           #Initialise sum
count1=0         #Initialise count
#
# Process one row at a time.
for row in input:
# Check row is not empty and set nstop
if len(row)>0:
# Has part id changed?
# if not increment sum and count
if row[0]==prevpartid:
sum1=sum1+float(row[1])
count1=count1+1
# if it has write out sum and count and reset
# do not write out if it is first row of input or if the sum and count are still zero
else:
if prevpartid!="" and count1>0 and sum1>0:
avg1=sum1/count1
result.writerow([prevpartid,avg1,sum1,count1])
prevpartid=row[0]
sum1=float(row[1])
count1=1
#
#write out final result after loop finished
if count1!=0:
avg1=sum1/count1
result.writerow([prevpartid,avg1,sum1,count1])
)EOF';
```

Within the Python script the previous value of `part_id` is stored (in `prevpartid`) so that when the value of `part_id` changes the average and count is calculated and sent back to Kognitio then the script varaibles are re-initialised. Keeping track of changes in `part_id` value is necessary when the ```DEFAULT PARTITIONS``` strategy is used. See the partitioning strategies section to understand how other partitioning strategies can be applied to the same example.

Note

the data type of part_id is not defined, it is therefore taken from data type in the query results used as input into the external script. This data type is simply carried through to the results the script sends back to Kognitio because the values for part_id are just copied and not changed by the python code.

The script can be used to calculate averages of any field used as the partition field. For example, in the SQL query below, average product price is produced:

```SELECT p.product_name
,dt1.avg1 avg_price
,CAST(dt1.count1 as INTEGER) numtrans
FROM ( EXTERNAL SCRIPT AvgOverPartID_Def
FROM ( SELECT prodno, price/100.00
FROM demo_ret.v_ret_sale
WHERE storeno BETWEEN 1 AND 5)
) dt1
JOIN demo_ret.v_ret_product p
ON dt1.part_id = p.prodno
ORDER BY 3 DESC;
```

Note

```SELECT p.product_name