Sending header information to an external script

This advanced example shows how to send column header information to a Kognitio external scripting, useful if the script environments needs names references rather index to access or identify the data. See the advanced configuration for other ways to get the most from external scripts.

Example: passing header information into R for use in model building

In this anonymous inline external script the column_headers attribute is used in the INPUT syntax to pass the column names into the R external script where they are used explicitly in the linear model function (lm) call to produce a daily sales forecast.

EXTERNAL SCRIPT using ENVIRONMENT rscript
RECEIVES(saledate DATE, dow_id VARCHAR(3), day_id INTEGER, part_id INTEGER, dailysales INTEGER )
INPUT 'column_headers on'
ORDER BY part_id, day_id
SENDS (r_output VARCHAR(10000) )
LIMIT 1 THREADS
REQUIRES 400 mb RAM
SCRIPT S'EOF(
#
# Read in the data
input1<-read.csv(file=file("stdin"), header=TRUE)
#
# Fit linear model to row id and day of week
fit1=lm(DAILYSALES ~ DAY_ID+DOW_ID, input1)
#
# assign summary of linear model to sum1
sum1<-summary(fit1)
#
# Get quantiles of residuals
quant1<-quantile(fit1$residuals)
#
# Get probability associated with F statistic
pval1<-pf(summary(fit1)$fstatistic[1], summary(fit1)$fstatistic[2],
summary(fit1)$fstatistic[3], lower.tail=FALSE)
#
# write output using cat function
cat(file="", paste("coeff", input1$PART_ID[1],names(fit1$coefficient), fit1$coefficients, sum1$coefficients[,2],sum1$coefficients[,3], sum1$coefficients[,4], sep="|",collapse="\n"), "\n")
cat(file="", paste("residual", input1$PART_ID[1], sum1$sigma,quant1[1], quant1[2], quant1[3], quant1[4], quant1[5], sep="|",collapse="\n"), "\n")
cat(file="", paste("modeldesc", input1$PART_ID[1],paste(fit1$call)[1], paste(fit1$call)[2], sep="|", collapse="\n"),"\n")
cat(file="", paste("model", input1$PART_ID[1], input1$SALEDATE,input1$DAILYSALES, fit1$fitted.values, fit1$residuals, sep="|",collapse="\n"), "\n")
cat(file="", paste("stats", input1$PART_ID[1], sum1$d[2],sum1$r.squared, sum1$adj.r.squared, sum1$fstatistic[1],sum1$fstatistic[2], sum1$fstatistic[3], pval1, sep="|",collapse="\n"), "\n")
)EOF'
from (
       select date '2017-01-01', 'mon',1,1, value from values between 1 and 25
       union all
       select date '2017-01-02', 'tue',2,1, value from values between 26 and 50
);

Passing column headers in the R scripts means we can refer to the columns by name instead of by index. This means that the R script is generally easier to write and to follow. Explicit columns can be referenced easily and new columns can be added to the input without effecting the processing.