External tables and scripts and user mapping

27

Apr
2017
Posted By : Ben Cohen 0 Comment
external scripts, tables, user mapping, sql query
Categories :Blog, Kognitio How To

Introduction

Kognitio has two mechanisms for external programs to provide or operate on data during an SQL query: external tables and external scripts. These can be written in the user’s preferred language, for example R, Python, shell or even C.

External tables are used like normal tables in SQL queries but the rows are supplied by an external “connector” program rather than being stored by WX2. The connector could generate the rows itself
(TPC-H data, for example) or fetch them from another database, HDFS or an internet data source.

External scripts are table functions in SQL which take a subquery “argument” and return a table. They allow users to create complex analytics or achieve functionality that the database itself doesn’t provide. The script is run under the “script environment” interpreter with the subquery rows as input, and the output rows will be returned to the SQL query.

Here is an example of an external table and an external script. There are more examples in
wx2/current/examples/excon_* and wx2/current/examples/exscripts.sql under your WX2 install directory.

-- External table to return the current environment for each connector instance.
create connector EXCON_ENV
    -- adjust this path if required
    command '/opt/kognitio/wx2/current/examples/excon_env';
create external table ENV (CONNECTOR_NUM int, 
                           NAME varchar(10000),
                           VALUE varchar(10000))
    from EXCON_ENV;

-- Results: environment of each connector
--          e.g. (0, 'WX2_TNO', '629'), (0, 'WX2_METHOD', 'LOAD'), ...
select * from ENV;

-- External script to perform an arbitrary regular expression substitution
-- using the stream editor, sed.
create script environment SED command '/bin/sed -f';
create external script REGEXP environment SED
    receives (TEXT varchar) input 'format "%{quoted=0}s\\n"'
    sends (SUBST_TEXT varchar) output 'format "%0c\\n"'
    script 's/ko\(.*\)/i\1n/';

-- Results: ('ignition')
external script REGEXP from (select 'kognitio');

Kodoop

The following sections on user mapping apply only to Kognitio running in appliance mode.

For Kognitio on Hadoop, external connectors and scripts currently run as the same user that runs the Kodoop processes. This is expected to change in the future to a broadly similar scheme tailored to Hadoop.

Why map users?

The examples above are simple so it doesn’t matter which Linux user runs the external script or connector processes, but it can do in practice. You might want your script to run as a user that has already been set up, perhaps with SSH keys or a uid needed for NFS.

It is also preferable not to execute user-supplied code as a privileged user – or in this case one with access to database files.

Kognitio chooses the Linux user based on the Kognitio user and the query context. By default – unless it has been configured otherwise as described below – it will use the user wxextern, which is created when WX2 is installed. (But wxextern is not created automatically by upgrade from versions earlier than 8.1.)

Allowed Linux users

Kognitio doesn’t simply allow scripts or connectors to run as any Linux user. By default only wxextern is allowed. Any other user must be allowed by name or uid (user id) by editing the “[security]” section of the config file as follows.

A Linux user is allowed if:

  1. The user name is listed in the config file setting “ext_users_allowed” comma-separated list of user names; or
  2. The config file settings “min_ext_user_id” and “max_ext_user_id” are set to positive numbers with min_ext_user_id <= uid <= max_ext_user_id, and the user must not be listed in “ext_users_denied” comma-separated list of user names.

(By default ext_users_allowed is “$extern” and ext_users_denied is “root“. Strictly, in (2) only one of min_ext_user_id and max_ext_user_id needs to be non-negative, with the inequality ignored for the negative value.)

You need to restart WX2 (“wxserver start“) after changing any of these config file settings for them to take effect.

User map strings

Kognitio attempts to map from a Kognitio user to a Linux user using a user map string. This is a comma-separated list of any of the following elements:

User map element Linux user to match
name User name
& Same as Kognitio user (e.g. SYS)
$extern Kognitio extern user, by default wxextern
$admin Kognitio admin user, by default wxadmin
$master Kognitio master user, by default wxroot
$owner Kognitio owner user, by default root

It will try each name in turn and will use the first one which, when converted to lower case, gives the name of a Linux user which is allowed by the security parameters. If no valid match is found then it will fail with the error “ET0111: User mapping for the external operation was invalid”.

For example, for the user map string “kognitio,&,$extern” for a Kognitio user called MEPHISTOPHELES it will try Linux users kognitio, mephistopheles and wxextern.

Note that this is evaluated separately on each node, so it is recommended that the Linux user names and uids are synchronised across the cluster. Otherwise it is possible for different user names to be matched, the same name being matched but the uids differing, or names being matched on only a subset of nodes (the ET0111 error will be raised in this case).

If an ET0111 error is raised the Kognitio user and the user map string are logged, as of version 8.2, in the server debug log file (look at `wxlogd smd`/serverdbg.*), which should be helpful to see why it didn’t work. For example:

T_2017-04-21_09:06:12_BST: ET: Invalid user mapping (ET0111) on 'hp-rack2-enc2-11' for 'SYS' -> 'kognitio,&,$extern'

Which user map string is used?

When Kognitio is about to launch an external script or connector it obtains a user map string. It first looks in IPE_EXTERNAL_USER_MAP for a user map string based on the execution context; if it doesn’t find one then it uses at the config file setting “[boot options] default_usermap“, which defaults to “&,$extern“. It then tries to map the current Kognitio user to a Linux user as described above.

For external scripts the search is performed in this order matching against the columns (OBJECT_TYPE, OBJECT_ID, USER_ID) of IPE_EXTERNAL_USER_MAP.

OBJECT_TYPE OBJECT_ID USER_ID
PRIVDOMAIN_EXT_SCRIPT(13) Script Id UserId
PRIVDOMAIN_EXT_SCRIPT(13) Script Id -1
PRIVDOMAIN_SCRIPT_ENV(10) Script Environment Id UserId
PRIVDOMAIN_SCRIPT_ENV(10) Script Environment Id -1
PRIVDOMAIN_EXT_SCRIPT(13) -1 User Id
PRIVDOMAIN_EXT_SCRIPT(13) -1 -1
-1 -1 User Id
-1 -1 -1

For external tables the search is performed in this order matching against the columns (OBJECT_TYPE, OBJECT_ID, USER_ID) of IPE_EXTERNAL_USER_MAP.

OBJECT_TYPE OBJECT_ID USER_ID
PRIVDOMAIN_TABLE(3) Table Id User Id
PRIVDOMAIN_TABLE(3) Table Id -1
PRIVDOMAIN_CONNECTOR(9) Connector Id User Id
PRIVDOMAIN_CONNECTOR(9) Connector Id -1
PRIVDOMAIN_TABLE(3) -1 User Id
PRIVDOMAIN_TABLE(3) -1 -1
-1 -1 User Id
-1 -1 -1

The administrator can add and remove mappings in IPE_EXTERNAL_USER_MAP using this special syntax.

CREATE EXTERNAL USER MAP
  [OBJECT_TYPE=[EXTERNAL SCRIPT|SCRIPT ENVIRONMENT|EXTERNAL TABLE|CONNECTOR],]
  [OBJECT_ID = 1234,]
  [USER_ID = 1234,]
  USERMAP = '<user map>';

DROP EXTERNAL USER MAP
  [OBJECT_TYPE=[EXTERNAL SCRIPT|SCRIPT ENVIRONMENT|EXTERNAL TABLE|CONNECTOR],]
  [OBJECT_ID = 1234,]
  [USER_ID = 1234];

Hadoop’s biggest problem, and how to fix it

05

Apr
2017
Posted By : Mark Chopping 0 Comment
Tags :  ,

Introduction

Hadoop was seen as a silver bullet for many companies, but recently there has been an increase in critical headlines like:

  1. Hadoop Has Failed Us, Tech Experts Say
  2. You’re doing Hadoop and Spark wrong, and they will probably fail
  3. Has Hadoop Failed? That’s the Wrong Question

The problem

Dig behind the headlines, and a major issue is the inability for users to query data in Hadoop in the manner they are used to with commercial database products.

From the Datanami article:

  • Hadoop’s strengths lie in serving as a cheap storage repository and for processing ETL batch workloads, Johnson says. But it’s ill-suited for running interactive, user-facing applications
  • It’s better than a data warehouse in that have all the raw data there, but it’s a lot worse in that it’s so slow
  • “At the Hive layer, it’s kind of OK. But people think they’re going to use Hadoop for data warehouse…are pretty surprised that this hot new technology is 10x slower that what they’re using before,” Johnson says. “[Kudo, Impala, and Presto] are much better than Hive. But they are still pretty far behind where people would like them to be.”

The Register article based on a Gartner research talk recognises Hadoop’s strength for ETL processing, but highlights the issues with SQL-handling on Hadoop.

The Podium Data article states “Hadoop is terrible as a relational database”, and “Hadoop failed only in the sense that inflated expectations could never be met compared to mature commercial offerings.”

“The Growing Need for SQL for Hadoop” talks about the need for SQL for Hadoop. The ideal is to be “on Hadoop”, and thus processing data within the Hadoop cluster, rather than “off Hadoop” where data has to be extracted from Hadoop for processing.

Similarly, Rick van der Lans talks about “What Do You Mean, SQL Can’t Do Big Data?”, emphasising the need for SQL solutions when working with big data platforms.

RCA of the problem

There can be many reasons for current SQL-on-hadoop products not being performant.

Possibilities include:

  • overhead of starting and stopping processes for interactive workloads – to run relatively simple queries quickly, you need to reduce latency. If you have a lot of overhead for starting and stopping containers to run tasks, that is a big impediment to interactive usage, even if the actual processing is very efficient
  • product immaturity – a lot of commercial databases have built on the shoulders of giants. For example, this wiki lists a set of products that derive from PostgreSQL, including Greenplum, Netezza, ParAccel, Redshift, Vertica. This gives these products a great start in avoiding a lot of mistakes made in the past, particularly in areas such as SQL optimisation. In contrast, most of the SQL-on-hadoop products are built from scratch, and so developers have to learn and solve problems that were long-since addressed in commercial database products. That is why we see great projects like Presto only starting to add a cost-based optimiser now, and Impala not being able to handle a significant number of TPC-DS queries (which is why Impala TPC-DS benchmarks tend to show less than 80 queries, rather than the full 99 from the query set).
  • evolution from batch processing – if a product like Hive starts off based on Map-Reduce, its developers won’t start working on incremental improvements to latency, as they won’t have any effect. Similarly, if Hive is then adopted for a lot of batch processing, there is less incentive to work on reducing latency. Hive 2 with LLAP project aims to improve matters in this area, but in benchmarks such as this AtScale one reported by Datanami it still lags behind Impala and SparkSQL.

Solution

Whilst benchmarks show that SQL on Hadoop solutions like Hive, Impala and SparkSQL are all continually improving, they still cannot provide the performance that business users need.

Kognitio have an SQL engine originally developed for standalone clusters of commodity servers, and used by a host of enterprise companies. Due to this heritage, the software has a proven history of working effectively with tools like Tableau and MicroStrategy, and delivering leading SQL performance with concurrent query workloads – just the sort of problems that people are currently trying to address with data in Hadoop. The Kognitio SQL engine has been migrated to Hadoop, and could be the solution a lot of users of Hive, Impala and SparkSQL need today.

It has the following attributes:

  • free to use with no limits on scalability, functionality, or duration of use
  • mature in terms of query optimisation and functionality
  • performant, particularly with concurrent SQL query workloads
  • can be used both on-premise and in the cloud

For further information about Kognitio On Hadoop, try:

 

This post first appeared on LinkedIn on March 23, 2017.

Simple performance checks against your hardware cluster

24

Mar
2017
Posted By : Simon Darkin 1 Comment
performance hardware cluster, cpu. benchmarks

Kognitio have a lot of experience commissioning clusters of new hardware for our MPP software product. As part of that process, we’ve developed a number of steps for validating the performance of new clusters, and these are the topic of this blog entry.

 

There are many Linux based benchmarking tools on the market however they are not usually installed by default, in which case some simple command line tools can be used to quickly establish if there is a potential hardware issue that warrants further investigation.    The following hardware components are covered by this topic:

  • CPU
  • Disk
  • Networking
  • RAM

 

CPU   

A slow CPU or core could have an adverse effect on query performance, and so with the use of basic command line tools you can help identify laggards.  A background ‘for’ loop can be employed to ensure all cores/threads are tested simultaneously.

 

Integer arithmetic test

Invoked 8 times to run simultaneously against 8 cores

 

for i in `seq 1 8`; do time $(i=0; while (( i < 999999 )); do (( i ++ )); done)& done; wait

 

this will return the time taken to increment an integer over the specified range.  A comparison of the time taken by each core will help identify outliers

 

real    0m8.908s
user    0m8.765s
sys     0m0.140s

real    0m8.943s
user    0m8.789s
sys     0m0.156s

real    0m8.997s
user    0m8.761s
sys     0m0.112s

real    0m9.000s
user    0m8.853s
sys     0m0.144s

real    0m9.023s
user    0m8.881s
sys     0m0.140s

real    0m9.028s
user    0m8.861s
sys     0m0.168s

real    0m9.034s
user    0m8.857s
sys     0m0.176s

real    0m9.073s
user    0m8.781s
sys     0m0.156s

 

Whilst the test is running you can check that each core is under load by running top and expanding the output to show all cores.  If you do encounter outliers in the arithmetic test then you can use the output from top to identify which core(s) remain busy when others have finished

 

Cpu0  : 98.3%us,  1.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

Cpu1  : 99.0%us,  1.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

Cpu2  : 98.7%us,  1.3%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

Cpu3  : 99.3%us,  0.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

Cpu4  : 98.7%us,  1.3%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

Cpu5  : 98.0%us,  2.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

Cpu6  : 98.3%us,  1.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

Cpu7  : 98.7%us,  1.3%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

 

Compression test

As with the arithmetic test this example loops around 8 times so that 8 cores are tested simultaneously.  Data is written to /dev/null to avoid any overhead associated with disk IO.

 

for i in `seq 1 8`; do dd if=/dev/zero bs=1000 count=1000000 | gzip >/dev/null&  done; wait

 

this will return the rate at which each core is able to compress 1 GB of data

 

1000000000 bytes (1.0 GB) copied, 11.9277 seconds, 83.8 MB/s

1000000000 bytes (1.0 GB) copied, 11.9277 seconds, 83.8 MB/s

1000000000 bytes (1.0 GB) copied, 11.9545 seconds, 83.7 MB/s

1000000000 bytes (1.0 GB) copied, 11.9799 seconds, 83.5 MB/s

1000000000 bytes (1.0 GB) copied, 11.9831 seconds, 83.5 MB/s

1000000000 bytes (1.0 GB) copied, 12.0085 seconds, 83.3 MB/s

1000000000 bytes (1.0 GB) copied, 12.0382 seconds, 83.1 MB/s

1000000000 bytes (1.0 GB) copied, 12.2655 seconds, 81.5 MB/s

 

With Kognitio software installed you can use the wxtool command  to run the compression test simultaneously against all database nodes to aid comparison across the cluster as a whole.  You can download the software for free at http://kognitio.com/free-download/

 

wxtool -a '{can DB}' -S 'for i in `seq 1 8`; do dd if=/dev/zero bs=1000 count=1000000 | gzip >/dev/null&  done; wait'

 

For node kap1-1 (ecode 0, 866 bytes):

1000000000 bytes (1.0 GB) copied, 11.9422 seconds, 83.7 MB/s

1000000000 bytes (1.0 GB) copied, 11.9659 seconds, 83.6 MB/s

1000000000 bytes (1.0 GB) copied, 11.9876 seconds, 83.4 MB/s

1000000000 bytes (1.0 GB) copied, 12.0142 seconds, 83.2 MB/s

1000000000 bytes (1.0 GB) copied, 12.1293 seconds, 82.4 MB/s

1000000000 bytes (1.0 GB) copied, 12.3754 seconds, 80.8 MB/s

1000000000 bytes (1.0 GB) copied, 12.4132 seconds, 80.6 MB/s

1000000000 bytes (1.0 GB) copied, 12.4386 seconds, 80.4 MB/s

For node kap1-3 (ecode 0, 864 bytes):

1000000000 bytes (1.0 GB) copied, 11.8398 seconds, 84.5 MB/s

1000000000 bytes (1.0 GB) copied, 11.8661 seconds, 84.3 MB/s

1000000000 bytes (1.0 GB) copied, 11.8893 seconds, 84.1 MB/s

1000000000 bytes (1.0 GB) copied, 11.9165 seconds, 83.9 MB/s

1000000000 bytes (1.0 GB) copied, 11.946 seconds, 83.7 MB/s

1000000000 bytes (1.0 GB) copied, 11.953 seconds, 83.7 MB/s

1000000000 bytes (1.0 GB) copied, 11.9637 seconds, 83.6 MB/s

1000000000 bytes (1.0 GB) copied, 12.2996 seconds, 81.3 MB/s

For node kap1-3 (ecode 0, 866 bytes):

1000000000 bytes (1.0 GB) copied, 11.8757 seconds, 84.2 MB/s

1000000000 bytes (1.0 GB) copied, 11.8846 seconds, 84.1 MB/s

1000000000 bytes (1.0 GB) copied, 11.9178 seconds, 83.9 MB/s

1000000000 bytes (1.0 GB) copied, 11.9243 seconds, 83.9 MB/s

1000000000 bytes (1.0 GB) copied, 11.9377 seconds, 83.8 MB/s

1000000000 bytes (1.0 GB) copied, 11.9834 seconds, 83.4 MB/s

1000000000 bytes (1.0 GB) copied, 12.3367 seconds, 81.1 MB/s

1000000000 bytes (1.0 GB) copied, 12.3942 seconds, 80.7 MB/s

For node kap1-4 (ecode 0, 864 bytes):

1000000000 bytes (1.0 GB) copied, 11.91 seconds, 84.0 MB/s

1000000000 bytes (1.0 GB) copied, 11.9291 seconds, 83.8 MB/s

1000000000 bytes (1.0 GB) copied, 11.9448 seconds, 83.7 MB/s

1000000000 bytes (1.0 GB) copied, 11.9498 seconds, 83.7 MB/s

1000000000 bytes (1.0 GB) copied, 12.1232 seconds, 82.5 MB/s

1000000000 bytes (1.0 GB) copied, 12.3896 seconds, 80.7 MB/s

1000000000 bytes (1.0 GB) copied, 12.4449 seconds, 80.4 MB/s

1000000000 bytes (1.0 GB) copied, 12.4504 seconds, 80.3 MB/s

 

 

 

DISKS

Having just one underperforming disk in the system can significantly impact query performance against disk based tables. Here are some simple tests to help identify any anomalies.

 

Iterative write speed test with dd.

 

WARNING: THIS WRITE TEST WILL OVERWRITE DATA AT THE SPECIFIED DESTINATION AND SO MAKE SURE YOU AVOID AREAS THAT CONTAIN SYSTEM DATA OR USER DATA THAT YOU WISH TO KEEP

 

for i in `seq 1 3`; do echo "Loop $i"; dd if=/dev/zero of=/dev/cciss/c0d0p2 bs=10000 count=100000 conv=fsync; echo ""; done

 

this will return the duration and rate at data can be written out to disk.  In this example 1 GB of data is repeatedly written to a raw partition.  Note that fsync is used to flush the writeback cache and ensures data is written to the physical media.

 

Loop 1
100000+0 records in
100000+0 records out
1000000000 bytes (1.0 GB) copied, 13.6466 seconds, 73.3 MB/s

Loop 2
100000+0 records in
100000+0 records out
1000000000 bytes (1.0 GB) copied, 12.8324 seconds, 77.9 MB/s

Loop 3
100000+0 records in
100000+0 records out
1000000000 bytes (1.0 GB) copied, 12.4271 seconds, 80.5 MB/s

 

With Kognitio software installed, the test can be expanded to run on all database nodes allowing for easy comparison of all disks in the system

 

wxtool -a '{can DB}' -S 'for i in `seq 1 3`; do echo "Loop $i"; dd if=/dev/zero of=/dev/cciss/c0d0p2 bs=10000 count=100000 conv=fsync; echo ""; done'

 

Iterative read speed test with dd

 

for i in `seq 1 3`; do let skip=$i*5000; echo "Loop $i - skip = $skip"; sync ; echo 3 >/proc/sys/vm/drop_caches; dd if=/dev/cciss/c0d0p2 of=/dev/null bs=1000 count=1000000 skip=$skip ;echo ""; done

 

this will return the rate at which data can be read from disk.  In this example 1 GB of data is being read from a raw partition, adjusting the offset and flushing the buffer on each iteration to ensure data is being read from the physical media.

 

Loop 1 - skip = 5000
1000000+0 records in
1000000+0 records out
1000000000 bytes (1.0 GB) copied, 14.4355 seconds, 69.3 MB/s

Loop 2 - skip = 10000
1000000+0 records in
1000000+0 records out
1000000000 bytes (1.0 GB) copied, 12.9884 seconds, 77.0 MB/s

Loop 3 - skip = 15000
1000000+0 records in
1000000+0 records out
1000000000 bytes (1.0 GB) copied, 12.6045 seconds, 79.3 MB/s

 

With Kognitio software installed, the test can be expanded to run on all database nodes to aid comparison across the entire system.

 

wxtool -a '{can DB}' -S 'for i in `seq 1 3`; do let skip=$i*5000; echo "Loop $i - skip = $skip"; sync ; echo 3 >/proc/sys/vm/drop_caches; dd if=/dev/cciss/c0d0p2 of=/dev/null bs=1000 count=1000000 skip=$skip ;echo ""; done'

 

Iterative read speed test with hdparm

 

for i in `seq 1 3`; do echo "Loop $i"; hdparm --direct -t /dev/cciss/c0d0p2; echo ""; done

 

this will return the rate at which data can be read sequentially from disk without any file system overhead.

 

Loop 1
/dev/cciss/c0d0p2:
Timing O_DIRECT disk reads:  236 MB in  3.01 seconds =  78.40 MB/sec

Loop 2
/dev/cciss/c0d0p2:
Timing O_DIRECT disk reads:  236 MB in  3.02 seconds =  78.09 MB/sec

Loop 3
/dev/cciss/c0d0p2:
Timing O_DIRECT disk reads:  230 MB in  3.01 seconds =  76.30 MB/sec

 

With Kognitio software installed, the test can be expanded to run on all database nodes to aid comparison across the entire system.

 

wxtool -a '{can DB}' -S 'for i in `seq 1 3`; do echo "Loop $i"; hdparm --direct -t /dev/cciss/c0d0p2; echo ""; done'

 

Disk based table scan

 

If the cluster is running Kognitio database software you can initiate a scan of a large disk based table and review the output from wxtop in order to spot any disk store processes that remain busy for a significant period after others have finished.   For accurate results you should ensure there is no concurrent activity when performing this test.

 

select *
from <large disk based table>
where <condition unlikely to be true>;

 

Monitor the output from wxtop and look out for any disk store processes that remain busy when all or most others have finished.

 

PID       NODE        PROCESS                           SIZE      TIME
15784       kap1-1      WXDB(55): Diskstore             258036       100
22064       kap1-2      WXDB(18): Diskstore             257176        86
25179       kap1-3      WXDB(73): Diskstore top         258200        84
31237       kap1-4      WXDB(37): Diskstore             258068        77

 

If a disk store process does appear to lag behind, then you should eliminate the possibility of it being attributable to data skew by checking the row counts across all of the disks using the following query

 

select
mpid,
sum(nrows) nrows
from ipe_ftable
where table_id = <table_id being scanned>
group by 1
order by 2 desc;

 

 

NETWORKING

You can test the network links between nodes using some simple netcat commands.  This will allow you to spot links that are underperforming.

 

Link speed test using dd and netcat

 

The name and options associated with the netcat binary will depend on the Linux installation, however with Kognitio software installed you can use wxnetread and wxnetwrite for the data transfer regardless.

 

Setup a listening process on the node performing the read

 

netcat -l -p 2000 > /dev/null &

 

use dd to generate some data and pipe through netcat to the IP and port of the node performing the read

 

dd if=/dev/zero bs=1000 count=1000000 | netcat 14.4.21.2 2000

 

this will return the rate at which data can be copied over the link

 

1000000000 bytes (1.0 GB) copied, 8.54445 seconds, 117 MB/s

 

The same test as above this time using wxnetread/wxnetwrite

 

wxnetread -l -p 2000 > /dev/null &

 

dd if=/dev/zero bs=1000 count=1000000 | wxnetwrite 14.4.21.2 2000

 

1000000000 bytes (1.0 GB) copied, 8.5328 seconds, 117 MB/s

 

Shape tests

With Kognitio software installed you can run Shape tests to measure the speed at which RAM based data is re-distributed between nodes

 

wxtester -s <dsn> -u sys -p <password> -Ishape 5000 9000 1

 

Once the tests have been running for a few minutes you can navigate to the logs directory and check the data rate

 

cd `wxlogd wxtester`
grep TSTSHN results | gawk '{ if ($5==64) print ((($3*$5)/$6)/<number of database nodes>)/1048576 }'

 

With older generation hardware you can expect to see performance of  40MB/s/node given sufficient network bandwidth.       With newer hardware, for example HP Gen9 servers with 2x 56Gb/s links per node this increases to 90MB/s/core.

 

 

RAM

Benchmarking RAM performance is best left to a dedicated test suite, however you can perform a very simple write/read speed test using dd in conjunction with a temporary file storage facility in RAM, which at the very least can show up a mismatch in performance between nodes.

 

Write and read speed test using dd

 

mkdir RAM
mount tmpfs -t tmpfs RAM
cd RAM
dd if=/dev/zero of=ram_data bs=1M count=1000

 

this will return the rate at which data is written to RAM

 

1048576000 bytes (1.0 GB) copied, 1.06452 seconds, 985 MB/s

 

dd if=ram_data of=/dev/null bs=1M count=1000

 

this will return the rate at which data is read from RAM

 

1048576000 bytes (1.0 GB) copied, 0.6346 seconds, 1.7 GB/s

 

Strata + Hadoop World – San Jose

22

Mar
2017
Posted By : Sharon Kirkham Comments are off
Categories :#AnalyticsNews, Blog

The Kognitio team had a great trip to Strata + Hadoop World in San Jose last week and we would like to say a big thank you to everyone who stopped by for a chat about getting enterprise level performance for their SQL on Hadoop. We look forwarding to hearing from you when you try out Kognitio on Hadoop.

At the start of the conference we released our benchmarking whitepaper in which Kognitio outperformed Impala and Spark in a TPC-DS benchmarking exercise. This proved to be of great interest and kept us all really busy on the stand. Conversations ranged from people who have been using Hadoop a while and are having problems serving data to their end-user applications such as Tableau and Qliksense right through to those that are just starting out on their Hadoop journey and wanted to understand what Kognitio can bring to their solution stack.

The subject matter of the conference sessions indicates that there is a period of consolidation going on within the Apache® Hadoop® solution stack. Most topics were discussing how to get the most from more established projects and the challenges of enterprise adoption. There was very little new research presented which was a bit disappointing.

 

Marcel Kornacker and Mostafa Mokhtar from Cloudera presented a talk on optimising Impala performance that was really interesting. They had also been using the TPC-DS query set for benchmarking but obviously had to use a cut down version of the query set (75 out of 99 queries). The optimisation details will be useful for us to follow for Impala when we do the next round of benchmarking after Kognitio 8.2 is released in April. Their benchmarks were at the 1 TB and 10TB scale. Increasing scale to 10TB and concurrency above 10 streams is something that we would definitely like to do during the next set of benchmarks.

From a maths perspective it was great to see Bayesian inference in the data science mix. Michael Lee Williams from Fast Forward Labs presented a great overview. I will certainly be checking out some of algorithms and tools with a view to parallelising them within Kognitio’s external scripting framework.

Data streaming also continues to be at the forefront of the conference . It was clear from the number of sessions in the conference that more companies (such as Capital One) have experiences they want to share as well as plenty of contributions from established technology leaders such as Confluent. It is certainly something that we are thinking about here.

If you didn’t make it to our booth at San Jose we hope to see you at one of these upcoming events:

DWS17, Munich, Sponsor, Big Data

We’ll be on Booth #1003.

See us at the next Strata Data Conference in London

23-25 May 2017

Booth #511.

 

Participate in the Kognitio Console beta test program

10

Mar
2017
Posted By : Michael Atkinson Comments are off
Kognitio console, beta test program
Categories :Blog, Kognitio How To

Kognitio Console is Kognitio’s client side management program for the Kognitio Analytical Platform.

Some of its features are:

  • It allows inspection of the metadata tree for schemas, tables, views, external scripts, users, connectors, sessions, queues, etc.
  • It also gives an object view to each of these metadata objects, allowing their inspection and management.
  • There are also lots of tools, wizards and widgets to browse data in Hadoop, load and unload data, identify problem queries and many more.
  • There are also a set of reports and dashboards to monitor the state of Kognitio systems.
  • Macros may be written to extend Kognitio Console, the reports and dashboards are written in these XML macros.
  • Ad-Hoc queries may be executed.
  • KogScripts may be executed and debugged. KogScript is based on Lua but has Kognitio specific enhancements to run SQL natively.
  • It is an integrated development environment (IDE) for KogScripts, and external scripts running in Kognitio.

All this makes Kognitio Console a versatile tool, suitable for database admins, analysts and power users.

Kognitio Console is in constant development, beta and release candidate versions are provided from our update website. Beta and release candidates will be announced on the Kognitio Console forums

There are two ways of obtaining these betas:

Note that this is beta software, so has not gone through a full QA. However, we endeavour to make sure it has no obvious bugs and it will have passed our Console smoke tests. If you experience crashes or other bugs or deficiencies in these beta and release candidates, please report them to us, this will help us make the next release more stable.

By installing it into a different location than the default, you can have both this and the last stable releases installed at the same time.

Monitoring

02

Mar
2017
Posted By : Deborah Martin Comments are off
monitoring data, pnp4nagios
Categories :Blog, Data Science

So, we are now in an era where “Big Data” matters. We have oodles of choice when it comes to storing this data and a myriad of languages and tools to help us extract and report on that data.

But what about monitoring. Do you know if you’ll run out of disk storage in the next month, next six months ? Do you know if your database is performing as well as it should ? Do you need more RAM to satisfy your query requirements ?

These are questions invariably asked when there is a problem. When a database is initially commissioned and it is all new and shiny, the focus is usually to get up and running. Who cares what might happen in six months’ time, just tell me what my customers are doing!!

And then disaster strikes. Six months down the road, space is getting tight, the data feeds are growing and you’ve committed to keeping 3 years worth of data on the database at any one time.
It’s a perfect storm…

I can think of numerous times in the last 20 years when the above scenario and has happened and, dare I say it, could have been avoided if monitoring had been in place. And it doesn’t have to be confined to databases. Any resource you rely on, can and will eventually run out of steam.

For KAP, we monitor various aspects of a database to ensure its running smoothly. When it isn’t, we are told by the monitoring system so that we can proactively do something about it.

So, if there is a predictive disk failure, we can, 99% of the time, fix it before it becomes an issue by hot-swapping out the failed disks. No downtime. No bother. Happy client.

Typically, we also monitor query queues, disk states, KAP backups, JDBC-ODBC bridge, slab management, hardware health for NICs, Port monitoring and VPN monitoring among others.

We can also make our plugins available to clients. These plugins are compatible with Nagios Core (for more information, click here : Nagios Core) which is widely used and open source.

There are many other monitoring systems out there that will also allow our plugins to be easily adapted if required.

We can also produce reports using addons from the monitoring data which gives us an insight on the usage of any given database. For example, disk storage can often be underestimated – whether it be because the data grew faster than expected or that the retention of the data was longer than expected – both can have an impact. By, monitoring disk usage, we can see just how quickly the disk storage is being used and look to try and increase that storage with minimal impact to the client. We can do the same for RAM.

We use pnp4nagios (for more information, click here : pnp4nagios) which is an open source graphing tool that utilizes the data generated by Nagios Core.

 

The differences between KogScript running in Console and kogscript.exe

24

Feb
2017
Posted By : Michael Atkinson Comments are off
KogScript, Lua, SQL, programming, how to, console window
Categories :Blog, Kognitio How To

KogScript is a general purpose programming language based on Lua. It may be run within Console, which acts as an integrated development environment for it, or run from the command line tool kogscript.exe.

The KogScript running within Console and via kogscript.exe are compiled from the same source files and are identical with the following exceptions.
 

sqldebug

In kogscript.exe the global variable sqldebug is set to 3. The sqldebug variable is used to control how much information is output on each sql query within the KogScript as it is executed. A level of 3 shows the query status, and table contents.

In Console sqldebug is not defined, so it acts as if it had a value of 0, this means no information is output to the terminal on sql execution, instead Console has integration of queries within the script view as info and table and stored in the script history.

withinConsole

When running within Console the global variable withinConsole is set to true
 

Console integration

When running within Console there are two extra global tables holding data and methods used to integrate scripts with Console (within the table kog) and access Qt objects and methods (within the table qt).

An example of how to use the qt methods may be found in the Examples tool as “Using Qt from Kog Scripts”. This shows how to open Qt dialogs, and use Qt grids and views.

The kog global table holds functions to access the main elements of the Console application user interface. From those it is possible to access the rest of the user interface. There are about 100 classes and thousands of objects available. To find out which methods, variables and enumerations are available for one of the kog functions use list(), as in:

mw = kog.mainWindow()
list(mw)

Which gives outout like:

 

Loader and Unloader

To run either of these from KogScript, the script needs to be run within Console as they make use of Console integration to run the Loader or Unloader tools built into Console. The example below shows how to run Unloader from within Console.

-- Get a reference to the console main window.
mw = kog.mainWindow();

config =[[
FILENAME=C:/TEMP/simplevector_test.csv
TABLE="MIKE"."A_SIMPLE_VECTOR"
AUTOUNLOAD=1
SILENT=1]];

-- Now run the unloader using the configuration above. 
mw:runUnloader(config)

 

Conclusion

KogScript within Console and the kogscript.exe command line tools are identical as far as running SQL queries and the Lua language.

Where they differ is that within Console there are extra features to enable integration with Console itself.

How to generate an identity / auto_increment column in Kognitio

17

Feb
2017
Posted By : Michael Atkinson Comments are off
Categories :Blog, Kognitio How To

Occasionally we would like the value of the primary key field to be created automatically every time a new record is inserted. That is we would like an auto-incrementable field in a table.

What other databases do

Other databases use a variety of syntax to do this.

MySQL uses an AUTO_INCREMENT keyword, which is used within the table definition

CREATE TABLE fruit (
         ID int NOT NULL AUTO_INCREMENT,
         name varchar(255) NOT NULL,
         PRIMARY KEY (ID) );

 

The SQL Server syntax uses the IDENTITY keyword, again within the table definition, in this example starting at 1 and incrementing by 1.

CREATE TABLE city (
         ID int IDENTITY(1,1) PRIMARY KEY,
         name varchar(255) NOT NULL,
         PRIMARY KEY (ID) );

 

For Oracle you first need to create a sequence and then use its nextval attribute during inserts.

CREATE SEQUENCE seq_people
        MINVALUE 1
        START WITH 1
        INCREMENT BY 1
        CACHE 10;
INSERT INTO People (ID,FirstName,LastName)
VALUES (seq_people.nextval,'Mike','Atkinson');

 

Kognitio’s row_number syntax.

Kognitio does not have a way of creating a unique value for a field directly in the table definition, instead it has two ways of adding unique values to resultsets; and then these resultsets may be inserted into a table. The first uses the ROW_NUMBER function.

The ROW_NUMBER function assigns to each row in the partition or query a sequence number starting from one. This is only a number used in the context of the result set, if the result changes, the ROW_NUMBER will change. The ROW_NUMBER expression takes an ORDER BY statement with the column to be used for the row count with an OVER operator.

In the example below we use ROW_NUMBER to replace the subject name with an id value which is a foreign key reference into the subjects table.

set schema yorschema;
create table facts_origin (subject varchar, predicate varchar, object varchar);
create table subjects (id int, name varchar not null, primary key(id) );
create table facts (subject int, predicate varchar, object varchar, 
               foreign key (subject) references subjects(id) );

insert into facts_origin values ('Peter', 'Brother', 'Sarah'), 
                                                           ('Peter', 'Brother', 'Anne'), 
                                                           ('John', 'Father', 'Peter');


insert into subjects
select with
    ids(name) as (select distinct subject from facts_origin)
select row_number() over (order by name) as "id", name  from ids;

insert into facts
    select s.id, f.predicate, f.object
        from facts_origin f, subjects s
        where f.subject=s.name;

select * from subjects;
select * from facts;
Which gives a subjects table of: And a facts table of:
ID NAME
1 John
2 Peter
SUBJECT PREDICATE OBJECT
1 Father Peter
2 Brother Anne
2 Brother Sarah

 

Kognitio’s generate_key syntax.

For an individual SELECT statement the GENERATE_KEY function generates a unique key for each row. These rows may then be inserted into a table.

The result is an INT8 data type with a value greater than or equal to zero. The results are not typically contiguous, but are guaranteed to be unique for an individual SELECT statement. The values generated are dependent upon the number of Kognitio nodes and the distribution of the data. Rerunning a query may not generate the same results. Multiple occurrences of GENERATE_KEY in a SELECT list will all produce the same result within a single row.

GENERATE_KEY can only be used in the SELECT list, it cannot be used in WHERE, HAVING, GROUP BY or ORDER BY clauses. Within the SELECT list you can perform arithmetic on the columns containing the GENERATE_KEY function.

insert into subjects
select with
    ids(name) as (select distinct subject from facts_origin)
select generate_key(), name  from ids;

The Kognitio SQL Guide has examples of good practice for ensuring that addition INSERT statements have unique keys and how to use them in ETL processes.

 

A more complete example

The following example shows many techniques, it uses both row_number and generate_key functions and illustrates how to use them to add entries to existing tables.

It also illustrates the use of KogScript functions, KogScript variables and a more complex “Grandparent” query.

set schema yourschema;

-- drop tables, they may have been created by previous runs of the script
drop table facts_origin;
drop table facts;
drop table entities;
drop table preds;

-- create tables to use
create table facts_origin (subject varchar, predicate varchar, object varchar);
create table entities (id int, name varchar not null, primary key(id) );
create table preds (id int, name varchar not null, primary key(id) );
create table facts (subject int, predicate int, object int,
               foreign key (subject) references entities(id),
               foreign key (predicate) references preds(id),
               foreign key (object) references entities(id) );

-- add some facts, a small genealogy
insert into facts_origin
     values ('Peter', 'Brother', 'Sarah'), ('Peter', 'Brother', 'Anne'), ('John', 'Father', 'Peter');
insert into facts_origin
     values ('Peter', 'Father', 'James'), ('Jane', 'Mother', 'Peter'), ('Mary', 'Mother', 'John');

-- create the entities table
insert into entities
select with
    ents(name) as (select distinct subject from facts_origin union select object from facts_origin)
select row_number() over (order by name) as "id", name  from ents;


-- create the preds table
insert into preds
select with
    ids(name) as (select distinct predicate from facts_origin)
select generate_key(), name  from ids;

-- create the facts table, this uses references to the entities and preds tables
insert into facts
    select s.id, p.id, o.id
        from facts_origin f, entities s, preds p, entities o
        where f.subject=s.name and f.predicate=p.name and f.object=o.name;

-- check the tables are what we expect
select * from entities;
select * from preds;
select * from facts;

-- Query for all facts where 'Peter' is the subject
with
  -- select the subject id
  s(id) as (select id from entities where name='Peter'),
  -- Filter the facts table for that subject id
  f(subject, predicate, object) as (select subject, predicate, object from facts, s 
                        where subject in (s.id) )
  -- Then join the filter results with the subjects table to get the subject name
select s.name as "SUBJECT", p.name, o.name as "OBJECT"
    from f, entities s, preds p, entities o 
    where f.subject=s.id and f.predicate=p.id and f.object=o.id;


-- Use a function to add entities if they do not already occur in the entities table.
function addEntity(p)
   local t = select id from entities where name='$p';
   if #t.rows~=1  then
       insert into entities
           select mk + 1, '$p' from (select max(id) from entities) dt(mk);
                 t = select id from entities where name='$p';
       assert(#t.rows==1)
   end
   return t.rows[1][1];
end

-- Use a function to add predicates if they do not already occur in the preds table.
function addPred(p)
   local t = select id from preds where name='$p';
   if #t.rows~=1  then
       insert into preds
           select generate_key() + mk + 1, '$p' from (select max(id) from preds) dt(mk);
                 t = select id from preds where name='$p';
       assert(#t.rows==1)
   end
   return t.rows[1][1];
end


-- adding facts
function addFact(s,p,o)
    local sub = addEntity(s) – use local to avoid polluting the global namespace
    local pred = addPred(p)
    local obj = addEntity(o)
    local t = select count(*) as "count" from facts
                               where subject=$sub and predicate=$pred and object=$obj;
    if #t.rows~=1 or t.rows[1].count==0 then
         insert into facts values ($sub,$pred,$obj);
    end

end

addFact('Garry', 'Father', 'Mary')


-- grandparent view.

parent = addPred('Parent')
grandparent = addPred('Grandparent')

-- Now form the query as a view, so that it may be reused easily
create view grandparent
select with
    -- Define the father_of subquery
    father_of(s,p,o) as  (select subject, predicate, object from facts
                 where predicate in (select id from preds where name='Father') ),
    -- Define the mother_of subquery
    mother_of(s,p,o) as (select subject, predicate, object from facts
                 where predicate in (select id from preds where name='Mother') ),
    -- Define the parent_of subquery, this uses the Parent predicate id we have defined above
    parent_of(s,p,o) as  (select s,$parent,o from father_of union
                 select s,$parent,o from mother_of),
    -- Define the grandparent_of subquery, this uses the Grandparent predicate id we 
    -- have defined above
    grandparent_of(s,p,o) as (select p.s, $grandparent, c.o from parent_of p, parent_of c where p.o=c.s)
-- Now output the result in a human readable form
select s.name as "SUBJECT", p.name as "PREDICATE", o.name as "OBJECT"
    from grandparent_of f, entities s, preds p, entities o 
    where f.s=s.id and f.p=p.id and f.o=o.id;


-- Now we can perform various queries
-- Who are John's grandchildren?
select object as "Grandchild" from grandparent where subject='John';

-- Who are Peter's grandparents?
select subject as "Grandparent" from grandparent where object='Peter';

-- Is Jane the grandparent of James?
select 'Yes' from grandparent where subject='Jane' and object='James';

And here is the entity diagram for the facts table

 

Summary

Various databases have different ways of creating unique values in a field. The Kognitio Analytical Platform has two methods GENERATE_KEY and ROW_NUMBER, both of which generate unique values in a subquery that may then be inserted into a table, giving a column in the table with a unique value as a result.

Kits, cats, sacks, wives: unfolding JSON data into tables

14

Feb
2017
Posted By : Graeme Cole Comments are off
unfolding json data into tables, csv
Categories :Blog, Kognitio How To
Tags :  ,,

Each wife had seven sacks…

Converting simple, flat JSON data into CSV or another tabular format is a problem addressed by many freely-available JSON-to-CSV converters. However, these are usually only useful when the JSON is a series of flat objects without any nested arrays.

In practice, real-world JSON files often have nested arrays containing more objects, each of which may contain arrays of their own. They tend to look like the old riddle As I was going to St Ives:

As I was going to St. Ives,
I met a man with seven wives.
Each wife had seven sacks,
Each sack had seven cats,
Each cat had seven kits.
Kits, cats, sacks, wives,
How many were there going to St. Ives?

(more…)

What’s new in Kognitio Console version 80199s170123

07

Feb
2017
Posted By : Michael Atkinson Comments are off
Whats new in Kognitio Console
Categories :Blog, Kognitio How To

SQL in expression contexts

 Up until this release SQL could only occur in Lua statements or assignments. For example in Kognitio Scripts we can write SQL within the script as a statement or assigning the results table to a variable.

drop table hr.employee;
t = select * from sys.ipe_user; 

This release of Kognitio Console adds the ability to place SQL in Lua expression contexts. So now within Kognitio Scripts we can place SQL in most places where a Lua expression may occur.

In the example below the data for the Kognitio users are read from the table sys.ipe_user as a Lua table. The rows element of that table is then passed to the apply function which passes each row in turn to the printRow function. The printRow function prints the NAME element of the row to the console, which is attached to the Kognitio Console Script Variables pane.

function apply(t, fn)
    for i, row in pairs(t) do fn(row) end
end
 
function printRow(row) print(row.NAME) end
 
apply( (select * from sys.ipe_user).rows, printRow) 

This feature currently has the limitation that it is not easy for the parser to determine where the SQL query ends. The code below fails because the ‘,’ character does not end an SQL query so it thinks that the , printRow is part of the query. The failure is at run time when the SQL is parsed in the Kognitio Analytical Platform and the SQL select * from sys.ipe_user, printRow errors.

function apply(t, fn)
    for i, row in pairs(t.rows) do fn(row) end
end
 
function printRow(row) print(row.NAME) end
 
apply(select * from sys.ipe_user, printRow) -- ERRORS: , does not end the SQL query

Instead we use ‘;’ to end the SQL expression:

apply(select * from sys.ipe_user; printRow) – Note use of ; to end SQL

Of particular use while debugging scripts is the form:

– any expression of the form print(<sql query>), e.g.
print(select * from sys.ipe_user) 

which outputs the formatted query results to Kognitio Console “Script Variables” pane.

Lua lfs module

The Lua File System module adds common file system manipulation commands to Lua. The lfs module is adapted from the Kepler project  and offers a portable way to access the underlying directory structure and file attributes.

lfs.attributes (filepath [, aname | atable])

Returns a table with the file attributes corresponding to filepath (or nil followed by an error message in case of error). If the second optional argument is given and is a string, then only the value of the named
attribute is returned (this use is equivalent to
 lfs.attributes(filepath)[aname], but the table is not created and only one attribute is retrieved from the O.S.). if a table is passed as the second argument, it is filled with attributes and returned instead of a new table. The attributes are described as follows; attribute mode is a string, all the others are numbers, and the time related attributes use the same time reference of os.time:

dev

on Unix systems, this represents the device that the inode resides on. On Windows systems, represents the drive number of the disk containing the file

ino

on Unix systems, this represents the inode number. On Windows systems this has no meaning

mode

string representing the associated protection mode (the values could be file, directory, link, socket, named pipe, char device, block device or other)

nlink

number of hard links to the file

uid

user-id of owner (Unix only, always 0 on Windows)

gid

group-id of owner (Unix only, always 0 on Windows)

rdev

on Unix systems, represents the device type, for special file inodes. On Windows systems represents the same as dev

access

time of last access

modification

time of last data modification

change

time of last file status change

size

file size, in bytes

permissions

file permissions string

blocks

block allocated for file; (Unix only)

blksize

optimal file system I/O blocksize; (Unix only)

This function uses stat internally thus if the given filepath is a symbolic link, it is followed (if it points to another link the chain is followed recursively) and the information is about the file it refers to. To obtain information about the link itself, see function lfs.symlinkattributes.

lfs.chdir (path)

Changes the current working directory to the given path.

Returns true in case of success or nil plus an error string.

lfs.currentdir ()

Returns a string with the current working directory or nil plus an error string.

iter, dir_obj = lfs.dir (path)

Lua iterator over the entries of a given directory. Each time the iterator is called with dir_obj it returns a directory entry’s name as a string, or nil if there are no more entries. You can also iterate by calling dir_obj:next(), and explicitly close the directory before the iteration finished with dir_obj:close(). Raises an error if path is not a directory.

lfs.link (old, new[, symlink])

Creates a link. The first argument is the object to link to and the second is the name of the link. If the optional third argument is true, the link will by a symbolic link (by default, a hard link is created).

lfs.lock (filehandle, mode[, start[, length]])

Locks a file or a part of it. This function works on open files; the file handle should be specified as the first argument. The string mode could be either r (for a read/shared lock) or w (for a write/exclusive lock). The optional arguments start and length can be used to specify a starting point and its length; both should be numbers.

Returns true if the operation was successful; in case of error, it returns nil plus an error string.

fs.lock_dir (path, [seconds_stale])

Creates a lockfile (called lockfile.lfs) in path if it does not exist and returns the lock. If the lock already exists checks if it’s stale, using the second parameter (default for the second parameter is INT_MAX, which in practice means the lock will never be stale. To free the the lock call lock:free()

In case of any errors it returns nil and the error message. In particular, if the lock exists and is not stale it returns the “File exists” message.

lfs.mkdir (dirname)

Creates a new directory. The argument is the name of the new directory.

Returns true if the operation was successful; in case of error, it returns nil plus an error string.

lfs.rmdir (dirname)

Removes an existing directory. The argument is the name of the directory.

Returns true if the operation was successful; in case of error, it returns nil plus an error string.

lfs.setmode (file, mode)

Sets the writing mode for a file. The mode string can be either “binary” or “text”.

Returns true followed the previous mode string for the file, or nil followed by an error string in case of errors. On non-Windows platforms, where the two modes are identical, setting the mode has no effect, and the mode is always returned as binary.

lfs.symlinkattributes (filepath [, aname])

Identical to lfs.attributes except that it obtains information about the link itself (not the
file it refers to). On Windows this function does not yet support links, and is identical to
 lfs.attributes.

lfs.touch (filepath [, atime [, mtime]])

Set access and modification times of a file. This function is a bind to utime function.

The first argument is the filename, the second argument (atime) is the access time, and the third argument (mtime) is the modification time. Both times are provided in seconds (which should be
generated with Lua standard function 
os.time). If the modification time is omitted, the access time provided is used; if both times are omitted, the current time is used.

Returns true if the operation was successful; in case of error, it returns nil plus an error string.

lfs.unlock (filehandle[, start[, length]])

Unlocks a file or a part of it. This function works on open files; the file handle should be specified as the first argument. The optional arguments start andlength can be used to specify a starting point and its length; both should be numbers.

Returns true if the operation was successful; in case of error, it returns nil plus an error string.

Improvements to Kognitio Script execution and debugging

Metatables with a __tostring method have been added to the table returned by an SQL query. This allows formatted printing of those tables.

t = select * from sys.ipe_user;
print(t) –[[ prints the text below:
|NAME         |ID|DFLT_SCNO|SEC_CLASS|SCH_CLASS|STATUS|PTIME              |
|SYS          |0 |0        |0        |0        |0     |2016-12-21 11:19:57|
|GRP_LOG      |3 |0        |0        |0        |8     |2016-12-21 11:20:15|
|GRP_MONITOR  |1 |0        |0        |0        |8     |2016-12-21 11:20:15|
|GRP_DISKUSAGE|4 |0        |0        |0        |8     |2016-12-21 11:20:15|
|GRP_DBG      |2 |0        |0        |0        |8     |2016-12-21 11:20:15|
]]

This may also be used in the kogscript.exe command line tool.

The SQL error string is added to the table returned from the SQL query, and this is also printed out by the metatable __tostring method

t = select * from nonexistent;
if t.status ~= 0 then 
    print("Query failed: " .. t.error) --[[ prints the text below:
    Query failed: ERROR: [Kognitio][WX2 Driver][latest] CI3013: Table SYS.NONEXISTENT does not exist, offset 15
    ]]
end
 
print(t) –[[ prints the text below:
ERROR: [Kognitio][WX2 Driver][latest] CI3013: Table SYS.NONEXISTENT does not exist, offset 15
]]

Breakpoints may now also be placed on Lua statements within the Kognitio Script, even if the “Trace Kog Script Lines” option is not set. After the breakpoint is hit, single stepping then steps through each Lua line.

Timestamps are now converted to a table with year, month, day, hour, min and sec fields:

t = select * from sys.ipe_allview_img;
list(t.rows[1].CREATE_TIME) –[[ prints the text below:
lua::number day = 16
lua::number min = 45
lua::number month = 1
lua::number year = 2017
lua::number hour = 10
lua::number sec = 34
]]

Facebook

Twitter

LinkedId