Single node restriction

This is another Kognitio external script example, this time using a single-node restriction instead of the default behaviour. We want to know which R packages are installed on the nodes for our advanced analytics in external scripting. Assuming the typical setup, where every node has the same packages installed, we don’t need to list the packages on more than one node because they will all be the same.

Example: using node restriction to find installed R packages

Revisiting the R package listing external script introduced in controlling invocations, it is possible to restrict the code to a single thread by using LIMIT 1 THREAD PER NODE and by restricting the nodes that the script is executed on.

First, find the names of the nodes from the Kognitio metadata using:

SELECT * FROM sys.ipe_nodeinfo;

We’re looking for the WX2_NODE_NAME column. Pick one node name, copy it, and make a note of the IP address in the corresponding OS_NODE_NAME.
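If the full output of the metadata query is hard to scan, a narrower query can help. The following is a minimal sketch that assumes only the two column names mentioned above (WX2_NODE_NAME and OS_NODE_NAME) and that sys.ipe_nodeinfo can be sorted like an ordinary table. It has not been checked against a particular Kognitio version.

```sql
-- Sketch: list only the two columns needed for this example.
-- Column names are taken from the text above; the ORDER BY is an
-- assumption added purely to make the output easier to read.
SELECT wx2_node_name,
       os_node_name
FROM sys.ipe_nodeinfo
ORDER BY wx2_node_name;
```

Any row of this result provides a node name to copy, together with the IP address to note down.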

Now paste the node name into the RUN ON clause in the SQL below, in place of <your_node_name>. This will restrict invocations to the node we’ve picked:

EXTERNAL SCRIPT USING ENVIRONMENT rscript
SENDS(node_name VARCHAR(100),connector_num INT,package_name VARCHAR(100),version_no VARCHAR(100),libpath VARCHAR(100),pkg_priority VARCHAR(20))
REQUIRES 200 MB RAM
RUN ON '<your_node_name>'
SCRIPT S'EOF(
    #
    # Get the connector number and the hostname of the node running this invocation
    cnum <- Sys.getenv("WX2_CONNECTOR_NUM")
    hname <- Sys.info()[["nodename"]]
    #
    # Get the package name, library path, version and priority (base, recommended, NA)
    pkgs <- as.data.frame(installed.packages())[, 1:4]
    #
    # Drop row names
    rownames(pkgs) <- NULL
    #
    # Print one comma-separated row per package, in the column order declared in SENDS
    # (sep = "" stops cat from adding a trailing space to the last row)
    cat(file = "", paste(hname, cnum, pkgs[, 1], pkgs[, 3], pkgs[, 2], pkgs[, 4], sep = ",", collapse = "\n"), "\n", sep = "")
)EOF'
;

This SQL is executed without creating an external script object on the system and is known as an anonymous or inline external script.

In the results, you’ll notice that the IP address in the node_name column matches the one we noted down from the OS_NODE_NAME, and that it is the only one to appear. Try changing the node name in the RUN ON clause to a different one and note the IP address that is returned.
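To cross-check the result without scrolling back through the full metadata listing, the chosen node can be looked up directly. This is a sketch only: it assumes WX2_NODE_NAME is a character column that can be compared with a string literal, and <your_node_name> is the same placeholder used in the RUN ON clause above.

```sql
-- Sketch: look up the OS_NODE_NAME for the node used in RUN ON.
-- Replace <your_node_name> with the value pasted into the script;
-- the equality comparison is an assumption about the column type.
SELECT wx2_node_name,
       os_node_name
FROM sys.ipe_nodeinfo
WHERE wx2_node_name = '<your_node_name>';
```

The IP address in the OS_NODE_NAME returned here should be the one that appears in the node_name column of the script output.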

This is not an efficient way to limit execution to a single thread and is for illustration purposes only. Because all invocations must go to the specified node, that node becomes a bottleneck if lots of queries use this script. For single-thread invocations it is better to use LIMIT ONE THREADS so that the resource scheduler can create the required single script invocation on any node, i.e. the one with the most available resource. This results in a more balanced workload when the script is used in multiple concurrent queries.