Single node restriction
This is another Kognitio external script example, in which a single node restriction is used instead of the default behaviour. Here we want to know which R packages are installed on the nodes for our advanced analytics in external scripting. Assuming the typical setup, where every node has the same packages installed, we don't need to list the packages on more than one node because the results will all be the same.
Example: using node restriction to find installed R packages
Revisiting the R package listing external script introduced in controlling invocations, it is possible to restrict the code to a single invocation by using LIMIT 1 THREAD PER NODE and restricting the nodes that the script is executed on.
First find the name of the nodes from the Kognitio metadata using:
SELECT * FROM sys.ipe_nodeinfo;
We're looking for the WX2_NODE_NAME values. Pick one, copy it and make a note of the IP address in the node name. Now paste the name into the RUN ON clause in the SQL below. This will restrict invocations to the node we've picked:
EXTERNAL SCRIPT using ENVIRONMENT rscript
SENDS(node_name varchar(100),connector_num int,package_name varchar(100),version_no varchar(100),libpath VARCHAR(100),pkg_priority varchar(20))
REQUIRES 200 MB ram
RUN ON '<your_node_name>'
SCRIPT S'EOF(
#
#Get hostname and connector number
cnum<- Sys.getenv("WX2_CONNECTOR_NUM")
hname <- toString(as.data.frame(Sys.info()["nodename"])[1,1])
#
#Get the package names,version and priority (base,recommended,NA)
pkgs<-as.data.frame(installed.packages())[c(1:4)]
#
#Drop row names
rownames(pkgs) <- NULL
#
#Print out result
cat(file="",paste(hname,cnum,pkgs[,1],pkgs[,3],pkgs[,2],pkgs[,4],sep=",",collapse="\n"),"\n")
)EOF'
;
This SQL is executed without creating an external script object on the system and is known as an anonymous or inline external script.
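The R body of the script above can also be tried in a local R session before it is wrapped in SQL, which makes it easier to see the comma-separated rows that the SENDS clause expects. The following is only a sketch under a few assumptions: the helper name format_package_rows is hypothetical and not part of the original script, and outside Kognitio the WX2_CONNECTOR_NUM environment variable is normally unset, so the connector number field will be empty.

```r
# Standalone sketch of the script body, intended for a local R session.
# format_package_rows is a hypothetical helper introduced for illustration.
format_package_rows <- function(hname, cnum, pkgs) {
  # pkgs columns are expected in the order returned by installed.packages():
  # Package, LibPath, Version, Priority.
  # Output order matches the SENDS clause:
  # node_name, connector_num, package_name, version_no, libpath, pkg_priority
  paste(hname, cnum, pkgs[, 1], pkgs[, 3], pkgs[, 2], pkgs[, 4],
        sep = ",", collapse = "\n")
}

# Outside Kognitio this variable is normally unset, so cnum is "".
cnum  <- Sys.getenv("WX2_CONNECTOR_NUM")
hname <- unname(Sys.info()["nodename"])

# Keep the first four columns of the installed package matrix.
pkgs <- as.data.frame(installed.packages(), stringsAsFactors = FALSE)[, 1:4]
rownames(pkgs) <- NULL

out <- format_package_rows(hname, cnum, pkgs)

# Print only the first three rows to keep the output short.
cat(head(strsplit(out, "\n", fixed = TRUE)[[1]], 3), sep = "\n")
```

Packages with no priority appear with NA in the last field, matching the (base,recommended,NA) note in the original script comments.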
You'll notice that the IP address in the node name matches the one we noted down and is the only one to appear. Try changing the node name to a different one and note the IP address that appears in the results.
This is not an efficient way to limit a script to a single thread and is shown for illustration purposes only.
Because all invocations must go to the specified node, that node becomes a bottleneck if many queries use this script.
For single-thread invocations it is better to use LIMIT ONE THREADS so that the resource scheduler can create the required single script invocation on any node, i.e. the one with the most available resource. This results in a more balanced workload when the script is utilised in multiple concurrent