Scripting in R

by quant123 » Tue Dec 24, 2013 11:17 pm

Hi

Are there any guides, examples or tutorials on using R scripting in Kognitio?

Thanks.

Re: Scripting in R

by markc » Wed Dec 25, 2013 11:10 am

There is an overview of external scripting in the release notes, and there are some R examples in the full documentation (in the Kognitio Guide, I believe). Both are available as attachments to this topic:

http://www.kognitio.com/forums/viewtopic.php?f=2&t=3

If you have specific questions which are not covered in those documents, please feel free to post them.

Regards,
Mark.

Re: Scripting in R

by quant123 » Sat Jan 04, 2014 4:54 pm

Mark,

Thank you for the links. I found the most helpful examples in kognitio-guide-v80100.pdf

I was playing with both R and Python scripting, and this is indeed a very powerful feature. For me, being able to run scripts within the SQL query is a huge benefit. It saves me from having to write external programs for calculations that can't be done in SQL. Not to mention the parallel processing.

However, I ran into a problem trying to load external libraries in an R script.
For example, below is a query with a simple R script:

DROP EXTERNAL SCRIPT rtest1;
CREATE EXTERNAL SCRIPT rtest1
ENVIRONMENT rsint
RECEIVES(...)
SENDS(priceout INTEGER, closeout INTEGER, sumout VARCHAR )
SCRIPT S'endofr(
options(error = expression(q("no")))
#library(methods)

mydata<-read.csv(file=file("stdin"), header=FALSE)
sink(, type="message")
dim1<-dim(mydata)
mydata$V3<-mydata$V2+10

output1<-array(0,c(dim1[1],3))
output1[,1]<- mydata$V1
output1[,2]<- mydata$V2
output1[,3]<- mydata$V3

#write.table(mydata, row.names = FALSE, col.names = FALSE, sep = "," )
write.table(output1, row.names = FALSE, col.names = FALSE, sep = "," )

)endofr';

and then to run the script, use: EXTERNAL SCRIPT rtest1 FROM (SELECT 55, 25);


The script works fine unless I try to load a library (i.e. if I uncomment the "#library(methods)" line). I get the same problem with any other library.
Am I missing a step? Do I need to do anything else to load external libraries when using an R script?

Note that loading libraries etc. works fine when using the RStudio IDE on the same computer.

Thank you.

Re: Scripting in R

by markc » Sat Jan 04, 2014 7:35 pm

Thanks for the feedback.

I've asked someone from our Analytical team to respond on this. In the interim, the first thing that occurs to me is that your Kognitio system does not have the libraries installed on every node. I'm not sure if you are using a 1 node system, with that 1 node being the "same computer" you mention in the previous comment - if that is the case, I am barking up the wrong tree.

Could you add some more detail on that (i.e. are you on a 1 node Kognitio system, and is that the "same computer" you refer to which works fine with the RStudio IDE), and also indicate how the script fails when you include libraries. Can you also tar up the contents of /var/log/wx2 and attach them to this topic, as that will let us see any extra information logged in the smd directory when you try this - if you could also let us know the date and time when you had a script with libraries fail, we can check the log files for that time.

Regards,
Mark.

Re: Scripting in R

by quant123 » Sun Jan 05, 2014 3:36 am

Mark,
Thanks for the reply.
wx2 tar attached.
On January 4th:
10:00 PM - ran a successful query without library(methods)
10:02 PM - ran CREATE SCRIPT with library(methods) included
10:03 PM - ran the script; it failed to return any data

I am using a single node, and the libraries are installed on it, since I can call them from the R IDE that runs on the same computer. So the only possibility would be that the R script for some reason does not see the same libraries. Then again, I tried various libraries, and the "methods" library that I tried comes pre-loaded with R, so I would expect the script to see it. I tried both 32-bit and 64-bit R scripts with the same results.

In this case when the script fails, it does not return any errors at all - just no results (no data returned at all).

Let me know if you need more info.

Thanks.
Attachments
wx2.tar.gz
(2.54 MiB)

Re: Scripting in R

by skkirkham » Mon Jan 06, 2014 11:26 am

Hi ....
I took a look in your serverdbg file to see what the issue with your script is. This is where Kognitio writes all the errors received from R. The following is key:

Code:

T_2014-01-03_21:34:03_EST: RS 7 S 136814 R F70016 LO:Script stderr: Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
T_2014-01-03_21:34:03_EST: RS 7 S 136814 R F70016 LO:Script stderr:   no lines available in input

Some of the script invocations made by Kognitio receive no input data at all. This results in errors in the R read.table command. Therefore, within your R script you need to test that the "file" you are reading is not empty. A simple way to do this in R is to use the "try" command. In your R script do:

Code:

mydata <- try(read.csv(file=file("stdin"), header=FALSE), silent=TRUE)
if (!inherits(mydata, "try-error")) {
...<your R code goes here>...
}

The object mydata will be populated as normal if there is data, but if there is an error mydata will instead be an object of class 'try-error'. This is tested in the subsequent if statement, so the main code is only executed if your mydata object has been populated, i.e. there is some input. Note that in the try call the optional silent=TRUE suppresses the error message, so that the script returns error-free if the input is empty. (See the R help on the try command for more details.)

Why are you getting scripts with zero input in Kognitio?
When you execute an external script in Kognitio, the default behaviour is for one script to be invoked for each RAM store on the system. In a typical Kognitio system there are between 1 and 3 RAM stores per CPU core.
Whether data is subsequently passed to a script invocation by Kognitio depends on a number of things, in particular settings controlling parallelism and the size of input data. However you must always handle the possibility of no input data in any external script you create (regardless of language).
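
To make the point about empty input concrete, here is a minimal sketch of a script body that tolerates an invocation with no rows. The function names read_input and process are hypothetical (they are not part of R or Kognitio), and the "+ 10" calculation is simply borrowed from the script earlier in this thread. The connection is passed in as an argument so the same code can be tried outside Kognitio with a textConnection.

```r
# Hypothetical helper: returns NULL when the connection delivers no rows,
# instead of letting read.csv stop the script with an error.
read_input <- function(con) {
  result <- try(read.csv(file = con, header = FALSE), silent = TRUE)
  if (inherits(result, "try-error")) {
    return(NULL)
  }
  result
}

# Hypothetical main routine: adds 10 to the second column and writes CSV rows.
process <- function(con, out = stdout()) {
  mydata <- read_input(con)
  if (is.null(mydata)) {
    return(invisible(0L))  # no input: write nothing and finish cleanly
  }
  mydata$V3 <- mydata$V2 + 10
  write.table(mydata, file = out, row.names = FALSE,
              col.names = FALSE, sep = ",")
  invisible(nrow(mydata))
}

# Trying it outside Kognitio with in-memory input:
process(textConnection("55,25"))       # one row of input
process(textConnection(character(0)))  # empty input, no output, no error
```

Inside an external script you would presumably finish with process(file("stdin")) rather than the two textConnection calls.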

With regard to external scripting examples, we are building up a library to help users build out their knowledge of external scripts in various languages, initially concentrating on Python and R. I will post again on this thread to direct you to these as they are made available.

Regards
Sharon
KACE team

Re: Scripting in R

by quant123 » Tue Jan 07, 2014 3:28 am

Hi Sharon,

Thanks for looking into this.
The errors that you are referring to happened on 1/3/14, and those may just be a result of my previous testing and experimenting with scripting.
My question is about the problem when trying to load an external library in R, and the error that I recreated on 1/4/14 at 10:03 PM (please see my previous post).
Is there anything in the logs with that time stamp? I am not sure that zero input is the problem in this case, since the script runs just fine unless I try to load a library. When I do, there are simply no query results.
My script is in my previous post; it's simple to replicate.

Also thanks for letting me know about the serverdbg - it is very useful to be able to see script errors, and I have been looking for the way to do this.

Great to hear that more documentation and examples are coming. I look forward to it!
R / Python is an excellent choice too, IMHO.

Thanks.

Re: Scripting in R

by skkirkham » Tue Jan 07, 2014 5:29 pm

Hi quant123.
I looked through your serverdbg file for errors on the 4th Jan and found the following:

Code:

T_2014-01-04_22:03:05_EST: RS 7 S 136814 R 1E10016 LO:Script stderr: Error in dyn.load(file, DLLpath = DLLpath, ...) : 
T_2014-01-04_22:03:05_EST: RS 7 S 136814 R 1E10016 LO:Script stderr:   unable to load shared object '/usr/lib64/R/library/methods/libs/methods.so':
T_2014-01-04_22:03:05_EST: RS 7 S 136814 R 1E10016 LO:Script stderr:   /usr/lib64/R/library/methods/libs/methods.so: failed to map segment from shared object: Cannot allocate memory
T_2014-01-04_22:03:05_EST: RS 7 S 136814 R 1E10016 LO:Script stderr: Error: package or namespace load failed for 'methods'

It looks to me as though your script invocations do not have enough RAM available to allocate the memory required for the packages you wish to use.
Below I've gone through some steps (briefly) that should resolve your issue. I will post a more detailed explanation in the future.

Regards
Sharon

Set the maximum RAM limit for a script
In Kognitio there is a parameter called max_script_ram that controls the maximum RAM available to an individual script invocation. You can check its value (if you have the correct privileges) using this SQL query:

Code:

select PVALUE from SYS.IPE_ALL_PARAM where PNAME='max_script_ram'; 

If nothing is returned, the default value of 200 (MB) applies.
Your script is probably hitting this limit when you try to load the packages. You can set this limit to any value via SQL using:

Code:

set parameter max_script_ram to <N>;

Here N is in MB. You need to be careful how you set this maximum value though. It should not exceed the RAM available to external script execution on a node, otherwise you may get OOM (out of memory) errors; see below.
By default an external script is created with no limit on the number of threads and no limit on the RAM required to run it. Therefore, when a script is invoked, the number of threads defaults to the number of RAM stores on your system and the RAM limit per script invocation is set to the value of max_script_ram (i.e. 200MB by default).

Controlling parallelism
To try to solve your problem, I would start by limiting the script execution to a single thread per node and setting the RAM required equal to your new max_script_ram value, i.e. remove the parallelism on a node. If your script is already created, you can do this via SQL using:

Code:

alter external script <scriptname> set limit 1 thread per node;
alter external script <scriptname> set requires <N> GB RAM;

Does your script run now? If so, you can start to play around with the combination of number of threads and RAM required, i.e. re-introduce parallelism. For example, if a script runs on a single thread allocated 4GB RAM, I would next try 2 threads per node and 2GB of RAM. Does this run too? If so, can I go to 4 threads and 1GB RAM, and so on? Note that the "requires <N> MB RAM" setting is always overridden by the max_script_ram value, even if it is set higher. The number of threads per node is not allowed to exceed the number of RAM stores. If your script still gives the same error when set to a single thread per node, then you need to change the amount of RAM available to external scripting on each node; see below.
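
The thread/RAM trade-off described above is simple arithmetic, and a rough sketch in R may help when planning the next combination to try. Both helper names are hypothetical (they are not Kognitio functions), the 4000 MB node budget is an assumed figure for illustration, and the cap in effective_limit_mb just restates the rule that max_script_ram (200 MB by default) overrides a higher "requires" value.

```r
# Hypothetical planning helpers; these are not Kognitio functions.

# RAM each invocation can be given if `threads` invocations share
# one node's budget for external scripts.
ram_per_thread_mb <- function(node_budget_mb, threads) {
  stopifnot(node_budget_mb > 0, threads >= 1)
  floor(node_budget_mb / threads)
}

# The RAM limit that actually applies to one invocation: the "requires"
# value, capped at max_script_ram (200 MB unless it has been changed).
effective_limit_mb <- function(requires_mb, max_script_ram_mb = 200) {
  min(requires_mb, max_script_ram_mb)
}

node_budget_mb <- 4000  # assumed RAM available to scripts on one node
for (threads in c(1, 2, 4)) {
  cat(sprintf("%d thread(s) per node: %d MB each\n",
              threads,
              as.integer(ram_per_thread_mb(node_budget_mb, threads))))
}
```

For instance, effective_limit_mb(1024) shows that asking for 1GB has no effect while max_script_ram is still at its default, whereas effective_limit_mb(1024, 4000) leaves the request untouched.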

Setting system RAM allocated to external script execution
When you install Kognitio, by default it allocates itself 90% of the RAM available on the system. This means that any external scripts are executed using the remaining available RAM (minus system operation overhead). So an approximate estimate of the RAM available for external scripts is 10% of system RAM. You can increase the RAM available to external scripts in the system config file; see pages 268-269 of the Kognitio Guide (http://www.kognitio.com/forums/latest_810_pdf.zip) for details. As an example, below are the [boot options] in the config file on one of the systems I use. (Each node has 32GB RAM.)

Code:

[boot options]
raid_cluster_size=4
external_tables=yes
external_scripts=yes
min_fixed_pool=5000000000

The last line sets aside 5GB RAM per node for external scripting and the OS overhead. (The default would be 3.2GB, i.e. 10% of 32GB.) This means Kognitio takes the remaining 28GB RAM per node (approx). I have max_script_ram set to 4000 (4GB) on this system to leave some RAM for other system processes.
If you are running Kognitio on a relatively small system, you can specify the RAM via min_fixed_pool accordingly. Note that if you use min_fixed_pool=N and N is less than 100, this is interpreted as a percentage rather than a number of bytes.
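
As a quick sanity check on those numbers, here is a small R sketch of how a min_fixed_pool value would translate into bytes under the rule just described (values below 100 are percentages, anything else is bytes). The function name fixed_pool_bytes is hypothetical, and the sketch assumes decimal gigabytes (1GB = 1e9 bytes); it only mirrors the description in this post and is not taken from Kognitio itself.

```r
# Hypothetical helper mirroring the rule described above:
# a value below 100 is a percentage of node RAM, otherwise it is bytes.
fixed_pool_bytes <- function(min_fixed_pool, node_ram_bytes) {
  stopifnot(min_fixed_pool > 0, node_ram_bytes > 0)
  if (min_fixed_pool < 100) {
    node_ram_bytes * min_fixed_pool / 100
  } else {
    min_fixed_pool
  }
}

node_ram_bytes <- 32e9  # a 32GB node, assuming 1GB = 1e9 bytes

# 10 is read as 10% of node RAM (the 3.2GB default mentioned above)
cat(fixed_pool_bytes(10, node_ram_bytes) / 1e9, "GB\n")

# 5000000000 is read as a byte count (the 5GB setting in the config above)
cat(fixed_pool_bytes(5000000000, node_ram_bytes) / 1e9, "GB\n")
```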

Re: Scripting in R

by quant123 » Wed Jan 08, 2014 2:04 am

Wow, that's a lot of great info!
I have a feeling that memory settings may be it.
I will give it a shot and I will post the results shortly.

Thank you!

Re: Scripting in R

by quant123 » Thu Jan 09, 2014 1:30 am

That worked!

What I had to do is:
1. execute alter external script <scriptname> set requires 1 GB RAM; (or more than 1GB if needed)
2. add the following line to my script: requires 1 GB RAM (or more than 1GB if needed).

and that was it. Now I can load libraries and run R scripts with no problem. I didn't have to do anything with threads.
I noticed that if I try to work with a data set that is too large, I have to increase the RAM or the script doesn't return any results (this is normal, of course). So it is important to be aware of this.
I guess it would make sense to use the above two lines with every script?

It is also good to learn about serverdbg - it is difficult to build a script with no error feedback from R / Python...

I am really liking the R scripting feature. I can now use R and the existing R libraries to do calculations with R script embedded inside the SQL query, and get final results into Excel, instead of having to painstakingly do most of the calculations with Excel.

Sharon, thank you very much for your help.

Re: Scripting in R

by skkirkham » Thu Jan 09, 2014 2:20 pm

Hi quant123

No problem, glad to help get you up and running.

If you think 1GB is going to be the most common setting for your external scripting then, rather than having to set "requires 1GB" in every script, you could set the max_script_ram parameter to this value, and it will then be applied to all scripts where no requires statement is present. It's up to you: you can explicitly declare the RAM requirement on each script if you prefer.

If you want to look at Python examples too check out the new introductory examples here. The KACE team are working on some R examples that will be posted soon.

Sharon

Re: Scripting in R

by quant123 » Fri Jan 10, 2014 1:01 am

OK, thank you.
However, if I don't enter the "requires 1GB" line in the script, then I get the error: "HY000[Kognitio][WX2 Driver][192.168.10.77:6550] RS0023: Error writing to external script pipe"

Also, the new tutorials look great!

Re: Scripting in R

by Atanu Mitra » Tue Jul 01, 2014 11:56 am

Hi all,

I started using Kognitio only a few hours ago, and my project involves porting R code to Kognitio. But whenever I run some code, I get the following error:

Expected end-of-request at or near select, offset 376 "..ite('Bye', stdout()) } )X'; -->select<--"

The code I tried to run is this very simple one:

drop external script DEMO_MOB.RTEST;
create external script DEMO_MOB.RTEST environment RSINT
receives (T varchar) sends (T varchar)
script
S'X(
eof <- FALSE
con <- file('stdin', 'r')
while (!eof){
l <- readLines(con, n = 1, warn = FALSE)
if (length(l) == 0) {
eof <- TRUE
break
}
write(l, stdout())
write('Bye', stdout())
}
)X';
select * from
(external script DEMO_MOB.RTEST from (select 'Hello World')) a;


The error points to the third-to-last line of the code, i.e.

)X';

Can anybody please help me sort this out?

Re: Scripting in R

by skkirkham » Tue Jul 01, 2014 5:10 pm

Hi Atanu

The good news is that I have just copied your code and run it without issue here.

I suspect the error you are seeing is due to how you are submitting the SQL code. I'm assuming you are using the Kognitio Console.
If you are not using Console then let us know which SQL submission tool you are using and we will look into it further.

Console has 2 modes of SQL submission: the left hand SQL button is for single queries only. If I put your code into this I get the error that you are reporting.
The button next to this on the right with SQL in front of a script is what you need. This is the scripting window which allows multiple SQL statements to be submitted. You can step through and debug etc. When I ran your code in this mode it worked fine.

To be honest I only ever use the script mode myself.

There are some docs on Console functionality here. In the Console guide, chapter 2, page xii covers the different SQL submission buttons.

Hope this helps. Let me know if you have further questions.

Regards
Sharon

Re: Scripting in R

by ChakLeung » Thu Jul 03, 2014 9:11 am

Just to add to the above reply.

Atanu, I can see that you tried to handle the case of zero input using the "eof" variable and breaking once it reads a line with zero length. I believe the break is causing some issues as it returns a nonzero exit code. I've modified it slightly where the length of the line is a condition for the while loop to continue.

Code:

drop external script DEMO_MOB.RTEST;
create external script DEMO_MOB.RTEST environment RSCRIPT
receives (T varchar) 
sends (T varchar)
script 
S'EOF(
con <- file('stdin', 'r')
while (length(l <- readLines(con, n = 1,warn = FALSE))>0){
    write(l, stdout())
    write('Bye', stdout())
}
)EOF';

select * from
(external script DEMO_MOB.RTEST from (select 'Hello World')) a;

You can also choose to use "limit 1 threads" under the sends statement (or use "alter external script <scriptname> set limit 1 threads") for testing purposes.

The above is fine if the data is being streamed line by line, but should you want to handle whole CSV files, you can check the number of rows once you have stored them. Check the code in slide 4 here for an example:

Kognitio External Training R Part 1
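
For the whole-file case, a row-count check might look roughly like the sketch below. This is an illustration written for this thread, not the code from the slide, and read_all_rows is a hypothetical name. It reads every line first, so that an empty input yields a zero-row data frame instead of the "no lines available in input" error from read.csv.

```r
# Hypothetical helper: read every input line, then parse them as CSV.
# An empty input gives a data frame with zero rows rather than an error.
read_all_rows <- function(con) {
  lines <- readLines(con, warn = FALSE)
  lines <- lines[nzchar(lines)]  # ignore blank lines
  if (length(lines) == 0) {
    return(data.frame())
  }
  read.csv(text = lines, header = FALSE, stringsAsFactors = FALSE)
}

# Trying it outside Kognitio with in-memory input:
rows <- read_all_rows(textConnection(c("Hello World", "Goodbye")))
if (nrow(rows) > 0) {
  # Only produce output when there is something to process.
  write.table(rows, row.names = FALSE, col.names = FALSE,
              sep = ",", quote = FALSE)
}
```

In an external script the connection would presumably be file("stdin") in place of the textConnection used here.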