Hadoop… Let’s not throw the baby out with the bath water again!

03

May
2017
Posted By : Roger Gaskell
Categories :#AnalyticsNews, Blog

Here we go again! Suddenly the industry seems to have turned on Hadoop. Headlines saying "it's hit the wall" and "it's failed" have recently appeared, and some are suggesting that organisations look at alternative solutions. Granted, Hadoop has its limitations and has not lived up to the massive hype that surrounded it a year or two ago, but then nothing ever does.

I admit I was not a fan of Hadoop when it first appeared; it seemed like a step backwards. It was very complicated to install, unreliable, and difficult to use, but still it caught the industry's imagination. Engineers liked it because it was "proper engineering", not a shrink-wrapped, productionised product, and the business was seduced by the idea of free software. Pretty quickly it became an unstoppable runaway train, and the answer to "life, the universe and everything" was no longer 42 but Hadoop.

Great expectations generally lead to disappointment, and this is Hadoop's problem. We hyped it up to such an extent that it was always going to be impossible for it to live up to the expectations, no matter how much it improved, and it has improved immeasurably! Hadoop is following the Gartner Hype Cycle (one of the cleverest and most accurate representations of how the perception of technology evolves) perfectly. It's just that for Hadoop the curve is enormous!

So what do I mean by let's not throw out the baby with the bathwater again? In Hadoop's early days the hot topic was NoSQL. The message was that SQL was dead. The problem with SQL was that it was difficult to write the complicated mathematical algorithms required for advanced analytics and, as the name suggests, it relies on the data having structure. Advanced analytical algorithms are easier to implement, and unstructured data easier to handle, in languages such as R and Python. All perfectly true, but advanced analytics is just the tip of the data analytics triangle, and the rest of the space is very well served by traditional SQL. Traditional BI reporting and self-service data visualisation tools are still massively in demand and generally use SQL to access data. Even unstructured data is usually processed to give it structure before it is analysed. So when the NoSQL bandwagon claimed SQL was dead, it was effectively throwing out the most widespread and convenient way for business users to access Hadoop-based data, in favour of something that only developers and data scientists could use.

Of course sense eventually prevailed, NoSQL morphed into Not-Only SQL, and now everyone and his brother is implementing SQL-on-Hadoop solutions. The delay has been costly though, and the perceived lack of fully functional, high-performance SQL support is one of the key reasons why Hadoop is currently under pressure. I say perceived because there are already very good SQL-on-Hadoop solutions out there if people are willing to look outside the Apache box, but this is not a marketing piece so I will say no more on that subject. My point is that the IT industry has a history of using small weaknesses to suddenly turn on otherwise very useful technologies. There will always be those whose interests are best served by telling the industry that something is broken and we need to throw it away and start again. The IT industry's problem is that it is often too easily led astray by these minority groups.

Hadoop has come a long way in a short time and, although it has problems, there is a large community of people working to fix them. Some point to the lack of new Apache Hadoop projects as a sign of Hadoop's demise; I would argue that this is a positive thing, with the community now focused on making and finding stuff that works properly rather than constantly chasing the shiniest, coolest new project! I think that Hadoop is finally maturing.

This post first appeared on LinkedIn on April 18, 2017.

Hadoop’s biggest problem, and how to fix it

05

Apr
2017
Posted By : Mark Chopping

Introduction

Hadoop was seen as a silver bullet for many companies, but recently there has been an increase in critical headlines like:

  1. Hadoop Has Failed Us, Tech Experts Say
  2. You’re doing Hadoop and Spark wrong, and they will probably fail
  3. Has Hadoop Failed? That’s the Wrong Question

The problem

Dig behind the headlines, and a major issue is that users cannot query data in Hadoop in the manner they are used to with commercial database products.

From the Datanami article:

  • Hadoop's strengths lie in serving as a cheap storage repository and for processing ETL batch workloads, Johnson says. But it's ill-suited for running interactive, user-facing applications.
  • It's better than a data warehouse in that you have all the raw data there, but it's a lot worse in that it's so slow.
  • "At the Hive layer, it's kind of OK. But people think they're going to use Hadoop for data warehouse…are pretty surprised that this hot new technology is 10x slower than what they're using before," Johnson says. "[Kudu, Impala, and Presto] are much better than Hive. But they are still pretty far behind where people would like them to be."

The Register article based on a Gartner research talk recognises Hadoop’s strength for ETL processing, but highlights the issues with SQL-handling on Hadoop.

The Podium Data article states “Hadoop is terrible as a relational database”, and “Hadoop failed only in the sense that inflated expectations could never be met compared to mature commercial offerings.”

“The Growing Need for SQL for Hadoop” talks about the need for SQL for Hadoop. The ideal is to be “on Hadoop”, and thus processing data within the Hadoop cluster, rather than “off Hadoop” where data has to be extracted from Hadoop for processing.

Similarly, Rick van der Lans talks about “What Do You Mean, SQL Can’t Do Big Data?”, emphasising the need for SQL solutions when working with big data platforms.

RCA of the problem

There can be many reasons why current SQL-on-Hadoop products are not performant.

Possibilities include:

  • overhead of starting and stopping processes for interactive workloads – to run relatively simple queries quickly, you need to reduce latency. If you have a lot of overhead for starting and stopping containers to run tasks, that is a big impediment to interactive usage, even if the actual processing is very efficient.
  • product immaturity – a lot of commercial databases have built on the shoulders of giants. For example, this wiki lists a set of products that derive from PostgreSQL, including Greenplum, Netezza, ParAccel, Redshift and Vertica. This gives these products a great start in avoiding a lot of mistakes made in the past, particularly in areas such as SQL optimisation. In contrast, most of the SQL-on-Hadoop products are built from scratch, so developers have to learn and solve problems that were long since addressed in commercial database products. That is why we see great projects like Presto only now starting to add a cost-based optimiser, and Impala unable to handle a significant number of TPC-DS queries (which is why Impala TPC-DS benchmarks tend to show fewer than 80 queries, rather than the full 99 from the query set).
  • evolution from batch processing – if a product like Hive starts off based on Map-Reduce, its developers won't start working on incremental improvements to latency, as they won't have any effect. Similarly, if Hive is then adopted for a lot of batch processing, there is less incentive to work on reducing latency. The Hive 2 LLAP project aims to improve matters in this area, but in benchmarks such as the AtScale one reported by Datanami it still lags behind Impala and SparkSQL.

Solution

Whilst benchmarks show that SQL on Hadoop solutions like Hive, Impala and SparkSQL are all continually improving, they still cannot provide the performance that business users need.

Kognitio have an SQL engine originally developed for standalone clusters of commodity servers, and used by a host of enterprise companies. Due to this heritage, the software has a proven history of working effectively with tools like Tableau and MicroStrategy, and delivering leading SQL performance with concurrent query workloads – just the sort of problems that people are currently trying to address with data in Hadoop. The Kognitio SQL engine has been migrated to Hadoop, and could be the solution a lot of users of Hive, Impala and SparkSQL need today.

It has the following attributes:

  • free to use with no limits on scalability, functionality, or duration of use
  • mature in terms of query optimisation and functionality
  • performant, particularly with concurrent SQL query workloads
  • can be used both on-premise and in the cloud

For further information about Kognitio On Hadoop, try:

 

This post first appeared on LinkedIn on March 23, 2017.

Strata + Hadoop World – San Jose

22

Mar
2017
Posted By : Sharon Kirkham
Categories :#AnalyticsNews, Blog

The Kognitio team had a great trip to Strata + Hadoop World in San Jose last week and we would like to say a big thank you to everyone who stopped by for a chat about getting enterprise-level performance for their SQL on Hadoop. We look forward to hearing from you when you try out Kognitio on Hadoop.

At the start of the conference we released our benchmarking whitepaper, in which Kognitio outperformed Impala and Spark in a TPC-DS benchmarking exercise. This proved to be of great interest and kept us all really busy on the stand. Conversations ranged from people who have been using Hadoop for a while and are having problems serving data to their end-user applications, such as Tableau and QlikSense, right through to those who are just starting out on their Hadoop journey and wanted to understand what Kognitio can bring to their solution stack.

The subject matter of the conference sessions suggests that there is a period of consolidation going on within the Apache® Hadoop® solution stack. Most sessions discussed how to get the most from the more established projects and the challenges of enterprise adoption. There was very little new research presented, which was a bit disappointing.

 

Marcel Kornacker and Mostafa Mokhtar from Cloudera presented a really interesting talk on optimising Impala performance. They had also been using the TPC-DS query set for benchmarking, but obviously had to use a cut-down version of the query set (75 out of 99 queries). The optimisation details will be useful for us to follow for Impala when we do the next round of benchmarking after Kognitio 8.2 is released in April. Their benchmarks were at the 1TB and 10TB scale. Increasing scale to 10TB and concurrency above 10 streams is something that we would definitely like to do during the next set of benchmarks.

From a maths perspective it was great to see Bayesian inference in the data science mix. Michael Lee Williams from Fast Forward Labs presented a great overview. I will certainly be checking out some of the algorithms and tools with a view to parallelising them within Kognitio's external scripting framework.

Data streaming also continues to be at the forefront of the conference. It was clear from the number of sessions that more companies (such as Capital One) have experiences they want to share, as well as plenty of contributions from established technology leaders such as Confluent. It is certainly something that we are thinking about here.

If you didn’t make it to our booth at San Jose we hope to see you at one of these upcoming events:

DWS17, Munich – we'll be on Booth #1003.

See us at the next Strata Data Conference in London, 23-25 May 2017 – Booth #511.

 

The differences between KogScript running in Console and kogscript.exe

24

Feb
2017
Posted By : Michael Atkinson
Tags : KogScript, Lua, SQL, programming, how to, console window
Categories :Blog, Kognitio How To

KogScript is a general purpose programming language based on Lua. It may be run within Console, which acts as an integrated development environment for it, or run from the command line tool kogscript.exe.

The KogScript support within Console and in kogscript.exe is compiled from the same source files, and the two are identical with the following exceptions.
 

sqldebug

In kogscript.exe the global variable sqldebug is set to 3. The sqldebug variable is used to control how much information is output for each SQL query within the KogScript as it is executed. A level of 3 shows the query status and table contents.

In Console, sqldebug is not defined, so it acts as if it had a value of 0: no information is output to the terminal when SQL is executed. Instead, Console integrates the queries into the script view as info and table entries, and stores them in the script history.

withinConsole

When running within Console, the global variable withinConsole is set to true.
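As a rough sketch (the branching logic here is illustrative, not part of the product), a script can test withinConsole so that the same file behaves sensibly in both environments, for example quietening kogscript.exe's per-query output:

if withinConsole then
    -- Running inside Console: sqldebug is undefined, so query output
    -- goes to the script view and script history instead.
    print("Running inside Kognitio Console")
else
    -- Running under kogscript.exe: sqldebug defaults to 3.
    -- Set it to 0 to suppress per-query output for large scripts.
    sqldebug = 0
    print("Running under kogscript.exe")
end

t = select * from sys.ipe_user;
print("Query returned " .. t.numRows .. " rows")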
 

Console integration

When running within Console there are two extra global tables holding data and methods used to integrate scripts with Console (within the table kog) and access Qt objects and methods (within the table qt).

An example of how to use the qt methods may be found in the Examples tool as “Using Qt from Kog Scripts”. This shows how to open Qt dialogs, and use Qt grids and views.

The kog global table holds functions to access the main elements of the Console application user interface. From those it is possible to access the rest of the user interface. There are about 100 classes and thousands of objects available. To find out which methods, variables and enumerations are available for one of the kog functions use list(), as in:

mw = kog.mainWindow()
list(mw)

Which gives output like:

 

Loader and Unloader

To run either of these from a KogScript, the script needs to be run within Console, as they make use of the Console integration to run the Loader or Unloader tools built into Console. The example below shows how to run the Unloader from within Console.

-- Get a reference to the console main window.
mw = kog.mainWindow();

config =[[
FILENAME=C:/TEMP/simplevector_test.csv
TABLE="MIKE"."A_SIMPLE_VECTOR"
AUTOUNLOAD=1
SILENT=1]];

-- Now run the unloader using the configuration above. 
mw:runUnloader(config)

 

Conclusion

KogScript within Console and the kogscript.exe command line tool are identical as far as running SQL queries and the Lua language are concerned.

Where they differ is that within Console there are extra features to enable integration with Console itself.

What’s new in Kognitio Console version 80199s170123

07

Feb
2017
Posted By : Michael Atkinson
Tags : What's new in Kognitio Console
Categories :Blog, Kognitio How To

SQL in expression contexts

Up until this release, SQL could only occur in Lua statements or assignments. For example, in Kognitio Scripts we can write SQL within the script as a statement or assign the results table to a variable.

drop table hr.employee;
t = select * from sys.ipe_user; 

This release of Kognitio Console adds the ability to place SQL in Lua expression contexts. So now within Kognitio Scripts we can place SQL in most places where a Lua expression may occur.

In the example below the data for the Kognitio users are read from the table sys.ipe_user as a Lua table. The rows element of that table is then passed to the apply function which passes each row in turn to the printRow function. The printRow function prints the NAME element of the row to the console, which is attached to the Kognitio Console Script Variables pane.

function apply(t, fn)
    for i, row in pairs(t) do fn(row) end
end
 
function printRow(row) print(row.NAME) end
 
apply( (select * from sys.ipe_user).rows, printRow) 

This feature currently has the limitation that it is not easy for the parser to determine where the SQL query ends. The code below fails because the ',' character does not end an SQL query, so the parser treats ", printRow" as part of the query. The failure occurs at run time, when the SQL is parsed by the Kognitio Analytical Platform and the statement select * from sys.ipe_user, printRow errors.

function apply(t, fn)
    for i, row in pairs(t.rows) do fn(row) end
end
 
function printRow(row) print(row.NAME) end
 
apply(select * from sys.ipe_user, printRow) -- ERRORS: , does not end the SQL query

Instead we use ‘;’ to end the SQL expression:

apply(select * from sys.ipe_user; printRow) -- note the use of ';' to end the SQL

Of particular use while debugging scripts is the form:

-- any expression of the form print(<sql query>), e.g.
print(select * from sys.ipe_user) 

which outputs the formatted query results to the Kognitio Console "Script Variables" pane.

Lua lfs module

The Lua File System module adds common file system manipulation commands to Lua. The lfs module is adapted from the Kepler project and offers a portable way to access the underlying directory structure and file attributes.

lfs.attributes (filepath [, aname | atable])

Returns a table with the file attributes corresponding to filepath (or nil followed by an error message in case of error). If the second optional argument is given and is a string, then only the value of the named attribute is returned (this use is equivalent to lfs.attributes(filepath)[aname], but the table is not created and only one attribute is retrieved from the OS). If a table is passed as the second argument, it is filled with attributes and returned instead of a new table. The attributes are described as follows; attribute mode is a string, all the others are numbers, and the time-related attributes use the same time reference as os.time:

  • dev: on Unix systems, the device that the inode resides on; on Windows systems, the drive number of the disk containing the file
  • ino: on Unix systems, the inode number; on Windows systems this has no meaning
  • mode: string representing the associated protection mode (the values could be file, directory, link, socket, named pipe, char device, block device or other)
  • nlink: number of hard links to the file
  • uid: user-id of owner (Unix only, always 0 on Windows)
  • gid: group-id of owner (Unix only, always 0 on Windows)
  • rdev: on Unix systems, the device type for special file inodes; on Windows systems the same as dev
  • access: time of last access
  • modification: time of last data modification
  • change: time of last file status change
  • size: file size, in bytes
  • permissions: file permissions string
  • blocks: blocks allocated for the file (Unix only)
  • blksize: optimal file system I/O block size (Unix only)

This function uses stat internally thus if the given filepath is a symbolic link, it is followed (if it points to another link the chain is followed recursively) and the information is about the file it refers to. To obtain information about the link itself, see function lfs.symlinkattributes.

lfs.chdir (path)

Changes the current working directory to the given path.

Returns true in case of success or nil plus an error string.

lfs.currentdir ()

Returns a string with the current working directory or nil plus an error string.

iter, dir_obj = lfs.dir (path)

Lua iterator over the entries of a given directory. Each time the iterator is called with dir_obj it returns a directory entry's name as a string, or nil if there are no more entries. You can also iterate by calling dir_obj:next(), and explicitly close the directory before the iteration has finished with dir_obj:close(). Raises an error if path is not a directory.

lfs.link (old, new[, symlink])

Creates a link. The first argument is the object to link to and the second is the name of the link. If the optional third argument is true, the link will be a symbolic link (by default, a hard link is created).

lfs.lock (filehandle, mode[, start[, length]])

Locks a file or a part of it. This function works on open files; the file handle should be specified as the first argument. The string mode could be either r (for a read/shared lock) or w (for a write/exclusive lock). The optional arguments start and length can be used to specify a starting point and its length; both should be numbers.

Returns true if the operation was successful; in case of error, it returns nil plus an error string.

lfs.lock_dir (path[, seconds_stale])

Creates a lockfile (called lockfile.lfs) in path if it does not exist and returns the lock. If the lock already exists, it checks whether it is stale, using the second parameter (the default for the second parameter is INT_MAX, which in practice means the lock will never be stale). To free the lock call lock:free().

In case of any errors it returns nil and the error message. In particular, if the lock exists and is not stale it returns the “File exists” message.

lfs.mkdir (dirname)

Creates a new directory. The argument is the name of the new directory.

Returns true if the operation was successful; in case of error, it returns nil plus an error string.

lfs.rmdir (dirname)

Removes an existing directory. The argument is the name of the directory.

Returns true if the operation was successful; in case of error, it returns nil plus an error string.

lfs.setmode (file, mode)

Sets the writing mode for a file. The mode string can be either “binary” or “text”.

Returns true followed by the previous mode string for the file, or nil followed by an error string in case of errors. On non-Windows platforms, where the two modes are identical, setting the mode has no effect, and the mode is always returned as binary.

lfs.symlinkattributes (filepath [, aname])

Identical to lfs.attributes except that it obtains information about the link itself (not the file it refers to). On Windows this function does not yet support links, and is identical to lfs.attributes.

lfs.touch (filepath [, atime [, mtime]])

Sets the access and modification times of a file. This function is a binding to the utime function.

The first argument is the filename, the second argument (atime) is the access time, and the third argument (mtime) is the modification time. Both times are provided in seconds (which should be generated with the Lua standard function os.time). If the modification time is omitted, the access time provided is used; if both times are omitted, the current time is used.

Returns true if the operation was successful; in case of error, it returns nil plus an error string.

lfs.unlock (filehandle[, start[, length]])

Unlocks a file or a part of it. This function works on open files; the file handle should be specified as the first argument. The optional arguments start and length can be used to specify a starting point and its length; both should be numbers.

Returns true if the operation was successful; in case of error, it returns nil plus an error string.
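As a short illustrative sketch (the directory scanned is simply the current working directory), several of the functions above can be combined in a Kog script to walk a directory and report basic file attributes:

local dir = lfs.currentdir()
print("Scanning " .. dir)

for entry in lfs.dir(dir) do
    if entry ~= "." and entry ~= ".." then
        local attr = lfs.attributes(dir .. "/" .. entry)
        if attr and attr.mode == "file" then
            -- size is in bytes; modification uses the same time reference as os.time
            print(entry .. "  " .. attr.size .. " bytes, modified " ..
                  os.date("%Y-%m-%d %H:%M:%S", attr.modification))
        end
    end
end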

Improvements to Kognitio Script execution and debugging

Metatables with a __tostring method have been added to the table returned by an SQL query. This allows formatted printing of those tables.

t = select * from sys.ipe_user;
print(t) --[[ prints the text below:
|NAME         |ID|DFLT_SCNO|SEC_CLASS|SCH_CLASS|STATUS|PTIME              |
|SYS          |0 |0        |0        |0        |0     |2016-12-21 11:19:57|
|GRP_LOG      |3 |0        |0        |0        |8     |2016-12-21 11:20:15|
|GRP_MONITOR  |1 |0        |0        |0        |8     |2016-12-21 11:20:15|
|GRP_DISKUSAGE|4 |0        |0        |0        |8     |2016-12-21 11:20:15|
|GRP_DBG      |2 |0        |0        |0        |8     |2016-12-21 11:20:15|
]]

This may also be used in the kogscript.exe command line tool.

The SQL error string is added to the table returned from the SQL query, and this is also printed out by the metatable __tostring method:

t = select * from nonexistent;
if t.status ~= 0 then 
    print("Query failed: " .. t.error) --[[ prints the text below:
    Query failed: ERROR: [Kognitio][WX2 Driver][latest] CI3013: Table SYS.NONEXISTENT does not exist, offset 15
    ]]
end
 
print(t) --[[ prints the text below:
ERROR: [Kognitio][WX2 Driver][latest] CI3013: Table SYS.NONEXISTENT does not exist, offset 15
]]

Breakpoints may now also be placed on Lua statements within the Kognitio Script, even if the “Trace Kog Script Lines” option is not set. After the breakpoint is hit, single stepping then steps through each Lua line.

Timestamps are now converted to a table with year, month, day, hour, min and sec fields:

t = select * from sys.ipe_allview_img;
list(t.rows[1].CREATE_TIME) --[[ prints the text below:
lua::number day = 16
lua::number min = 45
lua::number month = 1
lua::number year = 2017
lua::number hour = 10
lua::number sec = 34
]]
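Because these field names match those accepted by Lua's standard os.time function, a converted timestamp can be turned into a numeric time for further arithmetic; a minimal sketch using the same query as above:

t = select * from sys.ipe_allview_img;
created = t.rows[1].CREATE_TIME              -- table with year, month, day, hour, min, sec
secs = os.time(created)                      -- seconds since the epoch
print(os.date("%Y-%m-%d %H:%M:%S", secs))    -- formatted timestamp
print("Age in days: " .. math.floor(os.difftime(os.time(), secs) / 86400))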

Beginner tips for Kognitio scripting

24

Jan
2017
Posted By : Chak Leung
Tags : Kognitio scripts, lua
Categories :Blog, Kognitio How To

Standard ANSI SQL, whilst a great language for database querying, doesn't allow for commonly used features like looping or extensive variable usage. There are variants which support some of this, e.g. Transact-SQL, but nothing was available on Kognitio until the introduction of Kognitio scripts.

But two languages in one script? How does this work?

Kognitio scripts (also referred to as 'Kog' scripts) are based on the Lua language, where SQL statements can be written together with Lua code for extra functionality. They allow typical procedural language operations without the need for extensive workarounds in a non-procedural language like SQL.

It can be a bit overwhelming for SQL-only users, so we'll go over some useful tips for getting started; for more in-depth reading, refer to the console scripting guide on the forum.

To work with Kog scripts in Kognitio Console make sure that your scripting pane options tab has settings similar to those below. This will allow you to work effectively in debug mode.

 

Variable and function integration (for management)

Set variables via:

use_schema = ''
use_table = 'example'

To call these arguments in SQL just add a $ before the variable name:

create table $use_schema.$use_table(mycolumn varchar);

This is very useful if users share a generic script to be used on different data sets. Rather than changing the schema/table for every query in the script, managing all the variables at the top of the script makes changing the source data and script parameters much more organised.

Note that if you use variables in Lua-only code then you don't need to reference them with "$". Here the variable "a" is used in a loop:

a = 0
for i=1,5 do
    a = a +1;
end

Functions work in a similar way:

function square_num(a)
    return a*a;
end
select $square_num(value) from values between 1 and 5;

Remember that Lua is case sensitive and is typically written in lower case ("and" is a reserved word, but "And" or "AND" could be used as variable names).

The variable name isn’t restricted to alphanumeric characters, however if it doesn’t begin with one then you’ll need to enclose with brackets e.g. “$(_myvar)”. You can also reference a Lua table this way “$(myluatable[1])” which refers to the first column.

When stepping through a Kog script in Kognitio Console, you can always view the currently defined variables via the script variables pane (view > panes > script variables).

Try: Managing variables can be extended further via local variables. Try creating two functions which take a global variable as an argument and alter it to suit each function's needs. Take this further by using the global table _G.
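One possible starting point for that exercise (a sketch only; the function and variable names are invented for illustration):

counter = 0                                  -- a global variable

function bump_by_one()
    local step = 1                           -- local: only visible inside this function
    counter = counter + step
end

function bump_by_name(name, amount)
    _G[name] = (_G[name] or 0) + amount      -- update any global by name via the _G table
end

bump_by_one()
bump_by_name('counter', 10)
print(counter)                               -- prints 11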

Saving results to Lua tables e.g. column names/table names

You may have used the feature "$Col1" in Kognitio's SQL scripting to retrieve a result from a previously run query. However, this limits you to only the first row of results. A solution is to save the results to a Lua table.

Declare a table and insert into it via:

mytable= {};
table.insert(mytable,'element_1');
print(mytable[1]);

To set predefined elements:

mytable = {'element_1','element_2'}

To save results from an SQL query:

average_values = select avg(value), avg(value*value) from values between 1 and 10;

Example: Save column names for an SQL table/view into a Lua table

mycolumns = select seq,name 
from SYS.IPE_COLUMN
where table_id in (select table_id 
                   from SYS.IPE_VIEW
                   where schema_id in (select id from SYS.IPE_ALLSCHEMA
                                       where name = '')
                   and name = ''
                   )
order by 1;
print(mycolumns.rows[1].NAME);

This saves the column names from a specified SQL table into a Lua table and prints the name of the first column. The Lua table returned by a query has the following variables, taken from our console scripting guide:

  • numRows: integer, number of rows in the result set
  • numCols: integer, number of columns in the result set
  • status: integer, 0 = OK
  • colNames: table of column names
  • rows: table of result rows

Try: You’ll notice that you can also access a Lua tables column names via the colNames table variable so you can save an SQL table into a Lua table and call the colNames variable to get them rather than fetching it from the SYS tables.

Note: Returning the results of an SQL query to Lua tables should be restricted to small result sets, as these are stored in the memory of the local system where Kognitio Console is running.

Looping with Lua and SQL

A basic version is possible in ANSI SQL but becomes a bit limited, as you're only able to work with an incrementing defined counter. In Kog scripting you can use bread-and-butter loops like the classic "for" and "while" loops. These let you set index limits, pair an index with a value, and iterate over Lua-defined constructs like arrays and tables.

for i = 1, 5 do
    ...
end

i = 0
while i < 5 do
    ...
    i = i + 1
end

The “pairs” notation to include an index:

for a,b in pairs() do
   ...    
end

To integrate this with SQL queries, just add them into the loop; remember to end each query with a semicolon.

for i=1,5 do
    select avg(value) from values between 1 and $i ;
end

Example: Writing SQL automatically via concatenation (when there’s lots to write)

In the previous tip we saw that we can save the results of an SQL query to a Lua table; we can now iterate through them by calling the table name and adding a suffix for the rows. Let's take the column names of a previous table, for example.

mycolumns = select seq,name 
from SYS.IPE_COLUMN
where table_id in (
                   select table_id from SYS.IPE_VIEW
                   where schema_id in (select id from SYS.IPE_ALLSCHEMA
                                       where name = '')
                   and name = ''
                   )
order by 1
;

And apply it to our concatenation. The aim is to write the SQL syntax to calculate the average and maximum for every column, whilst excluding the unique ID column.

myschema ='CHAK'
mytable = 'CONCAT_STRINGS_EG'

--Table and view columns are stored in different system tables, 
--toggle with flag below depending on data source
--1= table, 2= view
type_flag = 1
if type_flag==1 then
    table_type = "IPE_TABLE"
    selection = 'id'
elseif type_flag==2 then
    table_type = "IPE_VIEW"
    selection = 'table_id'
end

drop table $myschema.$mytable;
create table $myschema.$mytable 
(id int, sale_price decimal(5,2), amount_paid decimal(5,2), no_items int);

insert into $myschema.$mytable
select 
row_number() over(),value*(value mod 5), ceiling(value*(value mod 5)),generate_key() 
from values between 1 and 5;

mycolumns = select seq,name from "SYS"."IPE_COLUMN"
where table_id in (
    select $selection from "SYS".$table_type
    where schema_id in (select id from "SYS"."IPE_ALLSCHEMA"
                       where name = '$myschema')
    and name = '$mytable'
    )
order by 1
;

col_average = ''
col_maximum = ''
for x,y in pairs(mycolumns.rows) do
   --Skip x=1 because that is the ID and we don't want to calculate metrics for it
    if x==2 then
        col_average = 'avg('..y.NAME..') avg_'..y.NAME
        col_maximum = 'max('..y.NAME..') max_'..y.NAME
    elseif x>2 then
        col_average = col_average..',avg('..y.NAME..') avg_'..y.NAME
        col_maximum = col_maximum..',max('..y.NAME..') max_'..y.NAME
    end
end

select id, $col_average from $myschema.$mytable group by 1;
select id, $col_maximum from $myschema.$mytable group by 1;

This will concatenate each of the three column names so that you can insert them into another SQL statement later via "$col_average" and "$col_maximum", saving the need to type out each column name, especially if this code is required in different queries. It is extremely useful when producing a generic version for different datasets, as the column names don't need to be redefined. You could even wrap the whole code in a "for" loop and repeat it for different tables or views.

Try: Note that x is the index and y is the value in the above. Since SQL columns are separated by commas, we use the index x to detect the first column and avoid adding a comma in front of it. Conversely, you could change it so that a comma is only added when the index is not the first one.
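That alternative could look something like the sketch below (same mycolumns result as above; only the comma handling changes):

col_average = ''
for x, y in pairs(mycolumns.rows) do
    if x > 1 then                            -- still skip the ID column at x == 1
        if col_average ~= '' then
            col_average = col_average .. ','  -- prepend a comma only once something is there
        end
        col_average = col_average .. 'avg(' .. y.NAME .. ') avg_' .. y.NAME
    end
end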

Kognitio modules (Section off code)

As you build up your code you may find you have functions which would be useful elsewhere, i.e. in other scripts. You could just copy them across, but this can make scripts unnecessarily verbose. Instead, putting them in Kog modules allows you to keep them in another file to be called up later when needed.

Firstly, you'll need to set the path to your Kog modules. The example below is for a local machine folder, but this will also work for network drives, allowing multiple users to contribute and access them.

_G.package.path = _G.package.path .. ";?.kog";

Then you can load any files in there via "require", similar to module/library/package loading in other languages:

 scr1 = require "Kog_Module_Script1"
 scr1.run()

The first line designates that module as scr1 and the second runs it. In our example we’re going to run two modules depending on the time of day: the average during the day and the sum at the end of the day.

Getting the times:

select time '09:00:00';
stime = Col1;
print(stime)
select time '17:00:00';
etime =  Col1;
print(etime)
ctime=os.date("%X");
print(ctime)

if ctime > stime and ctime < etime then
    mod1 = require ""
    mod1.run()
elseif ctime > etime then
    mod2 = require ""
    mod2.run()
else
    print("Not ready")
end

Which opens the module:

local x = {}
x.name = "Average Script"
function x.run()
    select avg(value) from values between 1 and 5;
end
return x

or:

local x = {}
x.name = "Sum Script"
function x.run()
    select sum(value) from values between 1 and 5;
end
return x

depending on the time of day. The idea is that you can also define functions in there and build up libraries to call in future development, rather than inserting copious amounts of code from past scripts and making them very verbose.

Try: Replace the data source in the SQL scripts with your own data set and then insert a function into each script, starting off by printing the time of day and building up the complexity.
