- Your Kognitio system must be running version 8 or 8.1
- Kognitio must have R (and any packages you wish to use) installed on all its nodes
- A Kognitio script environment that runs RScript must be set up on your Kognitio system and you need EXECUTE privileges on this
- You need to be granted the CREATE external script privilege
- Posts: 21
- Joined: Mon Oct 07, 2013 12:15 pm
by AmeliaWad » Thu Aug 27, 2015 3:26 pm
I'm using the RODBC package (sqlSave method) to load data back into a Kognitio database after undertaking analytics in R studio. This is taking a very long time. I think because the even with fast=TRUE option set it is loading data one row at a time.
Does anyone have experience loading data from R - are there any kognitio specific packages (like RORACLE) or maybe ODBC driver options that need to be set to enable bulk loading?
by skkirkham » Tue Sep 01, 2015 9:27 am
We do not have a specific R package for connecting to kognitio that takes advantage of our parallel loading capabilities. I have not specifically tested writing data back to Kognitio using the sqlSave method in the RODBC package but I expect it will be slow as you have suggested because ODBC is not an efficient way to move large data sets around. When working with R we tend to utilise our external scripting capabilities
Data that is already resident in Kognitio is passed through to R and results written back out to Kognitio via our external scripting interface. There are some pre-requisites to using this:
If you cannot get access to external scripting capability then please can you post some more details on what you are trying to do through the RODBC sqlSave method. How big is the result sets in MB/GB, number rows and columns. Finally can you provide details of the options you are using in the call to saveSQL. We will take a quick look to see if we can help speed things up but fundamentally the speed you are going get will be restricted using ODBC and by the networking between where you are running the R and the Kognitio system.
by AmeliaWad » Tue Sep 01, 2015 5:12 pm
Thanks for getting back to me. We are going to have the external scripting in the future - but it is not ready yet.
The test data set was about 100k rows and took 90 minutes to upload. But the final version will be a lot bigger - 180MB as csv.
I have a sense that calling the bulk loader from R would be the way forward. Is this possible? Does the loader need to be on the kognitio machine - can it be remote?
by AmeliaWad » Tue Sep 01, 2015 5:13 pm
prod4 <-odbcConnect(uid="user", pwd="password", dsn="PROD4", rows_at_time = 1000 , believeNRows = FALSE)
sqlSave(prod4, testrow, tablename = "RIM_RESULTS_TEST", append = FALSE,
rownames = FALSE, colnames = FALSE,
safer = FALSE,
fast = TRUE, test = FALSE)
by skkirkham » Wed Sep 02, 2015 12:01 pm
If you are looking at writing results in the region of 200MB then external scripts is definitely the way to go long term.
In the interim calling the client bulk loader is certainly preferable to using sqlSave. I have had a look at sqlSave and the "fast=true" option changes column types from using the default varchars everywhere. However you are still going to get single row inserts and I can't see how to change this.
We will take a look at writing an R function that can call the bulk loader and pipe output to it.
by ChakLeung » Thu Sep 03, 2015 2:09 pm
We found that it's possible to load from R via the system2 command in R: link.
This will call wxloader from where Kognitio client tools are installed on your machine(just replace the pathname I've used if it's in a different location) and I've attached an example but you will need to create the table via console first before loading it. Please be aware of the data types when creating your table as well as the format of your data in R to ensure the match. Notable ones to be careful of include date which should be in "YYYY-MM-DD" format and time which should be in "HH:MM:SS" format.
Code: Select all
system2('C:\\Program Files\\Kognitio Ltd\\wx2clients64\\wxloader.exe', args=c("-text --load-no-nulls","--record-type auto", "--no-statistics","--use-checksums","-s <ip_address>","-u <user_name>","-p <password>", "-t <table_name> -"),stdout=TRUE,input=apply(format(data),1,paste, collapse=","))
Lastly you need to replace what I've labelled as "data" with your data.
Note that for "input" the "system2" command prefers a character vector which the "apply" will do for us and apply a delimiter as well.
We'll also be looking into creating a function for this.
Hope this was helpful.
by skkirkham » Thu Sep 03, 2015 2:39 pm
For this to work on a Linux OS you need to install our Linux client tools from here. This should contain what you need, i.e. wxloader.
We have only tested this on Windows so let us know if works on Linux or not too
Who is online
Users browsing this forum: No registered users and 1 guest