Forum

General topics applicable to all languages
Contributor
Offline
User avatar
Posts: 21
Joined: Mon Oct 07, 2013 12:15 pm

Passing many datasets to external scripts using Kog scripts

by ChakLeung » Mon Jun 23, 2014 11:58 am

Hi all,

With the increasing use of external scripting, it may be useful to utilise some of Kognitio console's other functionalities to be more efficient in our coding. Below are links to the console sub-forum and documentation.

Kognitio Console sub-forum with Kog script questions

Console downloads and documentation

What are Kog scripts?
Kog scripts are an implementation of the Lua language within Kognitio. They allow for much better variable management than standard SQL and can also run "naked" SQL within their code without any special functions such as "SQL(<Query>)". They are not treated like a separate entity like external scripts (where they have their own place in the schemas) but are integrated into the SQL script itself.

There are more details on the motivation and reasoning behind Kog scripts and why Lua was chosen in the console sub-forum.

To change SQL scripts into Kog scripts, open the "options" tab under the script window and on the far right side select "Kog".

Problem
Consider the situation where an external script is already available and different sets of data need to be passed to it. If there were a high number of data sets then this would result in many repetitive duplications of the same SQL query with only a minor adjustment to the data source for the external script.

The script for this instance may look something like the following.

Code: Select all

select * from  (external script <EXT_SCRIPT_NAME> from (select * from <SCHEMA>.<DATA_SET1>))ext;

select * from  (external script <EXT_SCRIPT_NAME> from (select * from <SCHEMA>.<DATA_SET2>))ext;

select * from  (external script <EXT_SCRIPT_NAME> from (select * from <SCHEMA>.<DATA_SET3>))ext;
This leads to very cluttered looking scripts and, once they get very large, are very prone to user error which could be devastating if a large script is being executed over out of office hours.

Solution
To produce a more compact script, the variable management features in the Kog scripting can be used simply and makes it much easier to edit in new data sources.
Consider the following code.

Code: Select all

schema = '<MY_SCHEMA>'
datasets={'<DATA_SET1>','<DATA_SET2>','<DATA_SET3>','<DATA_SET4>','<DATA_SET5>','<DATA_SET6>','<DATA_SET7>', }

for i,v in pairs(datasets) do
    select * from  (external script <EXT_SCRIPT_NAME> from (select * from $schema..'.'..v))ext;
end
Notice that the data sets are defined in a Lua table structure and the script which executes the external script can be iterated for every entry in that table. To reference this a “$” is used and can even be used for specific variables as seen with the schema definition. It is even possible to pass these variables into external scripts as parameters allowing for more flexibility, for example passing the number of rows beforehand so that the external script knows exactly how it should handle and structure the incoming data. Note that the “pairs” is for iterating over a strings index otherwise “ipairs” should be used.

The benefits of this are that many different datasets can be passed to an external script in a much more manageable way (although the above example is rather simple, the advantages will start to show as the script starts to grow in size). This is shown by, which is also good practice, using a format which can be largely modified via some variables at the top like the example above where the schema and datasets are defined before the queries are run.

03/07/14 Edit: Adjusted code to reflect changes to the "dot" bug.
Last edited by ChakLeung on Thu Jul 03, 2014 1:59 pm, edited 1 time in total.
Reply with quote Top
Contributor
Offline
Posts: 38
Joined: Mon Jan 06, 2014 10:36 am

Re: Passing many datasets to external scripts using Kog scri

by skkirkham » Wed Jun 25, 2014 4:17 pm

Rather than defining a variable "dot" you can also include the "." on the end of the schema name if you want

Code: Select all

schema = '<MY_SCHEMA>.'
datasets={'<DATA_SET1>','<DATA_SET2>','<DATA_SET3>','<DATA_SET4>','<DATA_SET5>','<DATA_SET6>','<DATA_SET7>', }

for i,v in pairs(datasets) do
    select * from  (external script <EXT_SCRIPT_NAME> from (select * from $schema..v))ext;
end
Reply with quote Top
Contributor
Offline
Posts: 183
Joined: Wed May 29, 2013 2:10 pm
Location: Bracknell

Re: Passing many datasets to external scripts using Kog scri

by MikeAtkinson » Thu Jun 26, 2014 5:09 pm

The use of '..' like that is a result of a bug in Console kog script implementation. This is now fixed, from the 80100s140628 release.

Kog script variables within a SQL statement should be terminated by any non-alphanumeric character. So "$schema." is the contents of the variable schema followed by a '.'

Code: Select all

schema = '<MY_SCHEMA>'
datasets={'<DATA_SET1>','<DATA_SET2>','<DATA_SET3>','<DATA_SET4>','<DATA_SET5>','<DATA_SET6>','<DATA_SET7>', }

for i,v in pairs(datasets) do
    select * from  (external script <EXT_SCRIPT_NAME> from (select * from $schema.$v))ext;
end
Similarly

Code: Select all

a = '1'
b = '2'
select $a,$b;  -- result = 1,2

Code: Select all

a = '1'
b = '2'
select $a$b;  -- result = 12

Code: Select all

a = 'a'
b = 'b'
select '$a$b';  -- result = 'ab'
Reply with quote Top
Contributor
Offline
Posts: 183
Joined: Wed May 29, 2013 2:10 pm
Location: Bracknell

Re: Passing many datasets to external scripts using Kog scri

by MikeAtkinson » Thu Jun 26, 2014 5:26 pm

Kog scripts are based on the Lua scripting language, with the added feature that SQL statements may be written without extra quotes or escapes.

Within a SQL statement Lua variables are accessed by placing a '$' before the variable. More complex Lua expressions, including function calls may be enclosed within $().

Code: Select all

x = 'name'
y = 'hello'
z = 2
function sq(z) return z*z; end
create table n (name varchar);
insert into n values ('mike');
select $x, '$y ', $(z+1), $sq(3), $(sq(4)+1) from n; -- result single row = mike, hello, 3, 9, 17
Under the hood Kog scripts are converted into Lua using a built in function called sql. So before execution the statements would be converted to:

Code: Select all

x = 'name'
y = 'hello'
z = 2
function sq(z) return z*z; end
sql([[create table n (name varchar)]]);
sql([[insert into n values ('mike')]]);
sql([[select ]]..x..[[, ']]..y..[[ ', ]]..(z+1)..[[, ]]..sq(3)..[[, ]]..(sq(4)+1)..[[ from n]]);
Lua has several ways of quoting text [[ ]] are used as they do not clash with uses of quotes in SQL.
Reply with quote Top
Contributor
Offline
User avatar
Posts: 21
Joined: Mon Oct 07, 2013 12:15 pm

Re: Passing many datasets to external scripts using Kog scri

by ChakLeung » Thu Jul 03, 2014 2:07 pm

skkirkham wrote:Rather than defining a variable "dot" you can also include the "." on the end of the schema name if you want

Code: Select all

schema = '<MY_SCHEMA>.'
datasets={'<DATA_SET1>','<DATA_SET2>','<DATA_SET3>','<DATA_SET4>','<DATA_SET5>','<DATA_SET6>','<DATA_SET7>', }

for i,v in pairs(datasets) do
    select * from  (external script <EXT_SCRIPT_NAME> from (select * from $schema..v))ext;
end
This is a very good suggestion as long as the user does not need to reference the schema on it's own. For example if the user wanted to add a predicate "where schema = <SCHEMA_NAME>" or if the schema name needs to be passed to an external script as parameter. It is always possible to remove the "." in these cases (via some lua function or SQL's strchop) but having the them separately would probably be the most trouble-free solution especially if it is referenced this way multiple times as removing them afterwards would increase run times.
Reply with quote Top

Who is online

Users browsing this forum: No registered users and 1 guest

cron