Using input data in external scripts

The majority of Kognitio external scripting will also contain a declaration of the input-column(s) naming the data that the script RECEIVES from Kognitio. The exception is any script that is used to generate data such as the example on the anonymous inline external script page. The data-type for the input-column(s) in the RECEIVES declaration are optional. If these are not included then the data types are taken automatically from the corresponding data types of the query being used as input into the external script. This means different data types can be passed to the same script in subsequent calls provided the external script can handle different data types in the same input variable.

Example: perl script to parse words from input

A simple task that SQL is not designed for is the ability to parse information, such as individual words, out of large text fields. The user will not typically know how many pieces of information are contained in each field therefore for each input row there can be many output rows.

The programming language perl has powerful tools often used for text manipulation:

CREATE EXTERNAL SCRIPT myschema.word_split ENVIRONMENT perl
RECEIVES (sometext)
SENDS(words VARCHAR(200))
SCRIPT S'EOF(
while(<>)
{   #remove trailing \n
    chomp();
    #
    #replace punc. and numbers with white space
    s/[\,\.\!\_\{\}\(\)\#\=\+\:\\0-9]/ /g;
    #
    # split on whitespace of any length
    foreach $c (split(/ +/))
    {
        #print if only made up of letters
        if($c =~ /^[a-zA-Z]+$/)
        {
            print "$c\n"
        }
    }
}
)EOF'

Regular expressions identify and output individual words within the sometext input field. Note the data type for sometext is undefined so this is assigned from the query supplying the input. This is particularly useful for handling different character sets such as UTF8 with the same script.

The inclusion of a while(<>) loop means input data of any size is handled including the possibility that no data is fed to the script invocations. This can happen when the external script is invoked in parallel for efficient processing or there is no input data resulting from the query feeding the script. All external script must handle zero data input without erroring.

The script can be used with any textual field as input from a simple sentence, google search url to results of a SQL query:

--Query 1 simple sentence
EXTERNAL SCRIPT myschema.word_split
from (SELECT 'The quick brown box jumped over the lazy dog.');

--Query 2 google search url
EXTERNAL SCRIPT myschema.word_split
from (SELECT 'https://www.google.co.uk/?gws_rd=ssl#q=aesop%27s+fables+and+greek+myths');

--Query 3 results from query
EXTERNAL SCRIPT myschema.word_split
from (SELECT command FROM sys.ipe_command);

The first 10 results from each query are shown below:

../../_images/word_split.png