Using R in external scripts

Kognitio and R. Threads and nodes

by skkirkham » Tue Jan 28, 2014 6:18 pm

Hi all

Part 2 of a series of topics introducing Kognitio external scripting using R is now available.

Following on from the basics, this PDF and its accompanying examples look at controlling script invocations within Kognitio. This is particularly important with R, where data sets held in RAM can get large quickly.

The flexible external scripting interface means you can control how many R script processes you run (threads) and where they run (nodes). As in the Python version, we again concentrate on finding an average over all the rows of data passed to the script. The simplicity of this example allows us to focus on the Kognitio external script functionality. Because of R's "greedy" use of RAM, we look at how to use Kognitio's parallelism to run over larger data sets, i.e. to start thinking about how to parallelise your analytics.
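To illustrate what parallelising an average can look like, here is a minimal R sketch. It is not taken from the PDF: the function names and the sample data are hypothetical, and it assumes each script invocation only sees its own share of the rows. The idea is that each invocation returns a sum and a row count rather than a mean, so that the partial results can be combined correctly even when the invocations receive different numbers of rows.

```r
# Sketch only: per-invocation partial results for a parallel average.
# Assumption: each invocation sees just its own share of the rows.

# Summarise one chunk as a (sum, count) pair rather than a mean,
# so that chunks of different sizes combine correctly.
partial_stats <- function(values) {
  values <- values[!is.na(values)]
  c(sum = sum(values), count = length(values))
}

# Combine the partial results from all invocations into one overall mean.
combine_stats <- function(partials) {
  totals <- Reduce(`+`, partials)
  unname(totals["sum"] / totals["count"])
}

# Hypothetical data split unevenly across three invocations.
chunks <- list(c(1, 2, 3, 4), c(10, 20), c(5))
partials <- lapply(chunks, partial_stats)
overall <- combine_stats(partials)

# Averaging the per-chunk means instead gives a different (wrong) answer
# whenever the chunks differ in size.
naive <- mean(sapply(chunks, mean))

cat("overall mean: ", overall, "\n")
cat("mean of means:", naive, "\n")
```

In this sketch the final combining step could equally be done in SQL over the rows returned by the script invocations.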

It is also important to remember: if you can code the problem in Kognitio using SQL, do so, as it will run faster. Why, I hear you ask?
1) Kognitio SQL is extremely mature and the software has been designed as MPP from the start. This means the Kognitio SQL optimiser uses the system's parallelism in the most effective way.
2) Kognitio SQL is converted into machine code for faster, more efficient execution.
3) When you write external scripts, the script processes must be invoked before data can stream through them. There is an overhead associated with this no matter how efficient your code is. It will be particularly noticeable for simpler tasks that execute quickly (like averaging).

The challenge I issued when we published the Python module still stands: if anyone can create an external script in Kognitio that runs faster than an equivalent SQL query on Kognitio, then I would love to hear about it.


Note: if you haven't done so already, you will need to create an R script environment on your Kognitio system. An example of the script environment creation command is:


create script environment RSCRIPT command '/usr/local/R/bin/Rscript --vanilla --slave'

Kognitio uses Rscript, the scripting front end to R, which is located in the same directory as the standard R executable. The options keep start-up work to a minimum (--vanilla skips saving and restoring the workspace and reading the site and user profile files, and --slave makes R run as quietly as possible) and are designed for programs, such as Kognitio, that use R to produce results. For more details see the CRAN document "An Introduction to R", Appendix B: Invoking R (cran.r-project.org).
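As a rough idea of the kind of script Rscript would run in this environment, here is a minimal sketch of the averaging example. It assumes the rows arrive on standard input with one numeric value per line; check the PDF for the actual format Kognitio uses. The function name average_rows is hypothetical, and a text connection stands in for standard input so the sketch can be run on its own.

```r
# Sketch only: average the numeric values read from a connection.
# Assumption: one numeric value per line; blank or unparseable
# lines are ignored.
average_rows <- function(con) {
  lines <- readLines(con)
  values <- suppressWarnings(as.numeric(lines))
  values <- values[!is.na(values)]
  if (length(values) == 0) {
    return(NA_real_)  # no usable rows were passed in
  }
  mean(values)
}

# Stand-in for standard input so the sketch runs by itself.
demo <- textConnection(c("1.5", "2.5", "", "4"))
result <- average_rows(demo)
close(demo)

cat(result, "\n")

# In a script run by Rscript, the call would instead be:
#   average_rows(file("stdin"))
```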
