CREATE EXTERNAL SCRIPT

CREATE EXTERNAL SCRIPT creates Kognitio external scripts. These are callable objects within the SQL schema that take data from the database and pass it to an external process on the nodes, mainly to perform tasks that are difficult to do in SQL. Any Linux-compatible executable will work (Bash, Python, R, etc.), but a script environment must be created first.

Usage

CREATE EXTERNAL SCRIPT script-name
ENVIRONMENT script-environment
[ RECEIVES ({input-column [data-type]}, ...) ]
[ INPUT 'input-attribute-list' ]
[ PARTITION BY part-column, ... ORDER BY order-column, ... ]
SENDS ({result-column [data-type]}, ...)
[ OUTPUT 'output-attribute-list' ]
[ {LIMIT n THREADS | LIMIT m THREADS PER NODE | SET NO THREADS LIMIT} ]
[ REQUIRES k {MB|GB} RAM ]
[ [NOT] RUN ON {('{node-name}', ...) | ALL} ]
[ {DEFAULT | ISOLATE | SEPARATE | MIX} PARTITIONS ]
SCRIPT S'extscript(
    Code in language matching script-environment
)extscript';

Notes

Each external script must have the following 3 elements:

  • A declared script-environment that can run the code between the quotes in the SCRIPT clause. See how to set up script environments or refer to the script environment DDL syntax for more details.

  • A result-column declaration, which defines the data field(s) the external script SENDS back to Kognitio. The format is:

    result-column1 data-type1, result-column2 data-type2, ...
    

    The data-type for each result-column sent back to Kognitio must be included unless the name exactly matches an input-column name.

  • A SCRIPT clause containing the code to be executed by the script-environment. The code must be enclosed in single quotes. Optionally, to help delimit the script, put an S before the opening quote and wrap the code in a named pair of brackets, such as extscript( ... )extscript.
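
As an illustration of the three required elements, a minimal declaration might look like the sketch below. The script name node_names, the environment bash_env and the column node_name are hypothetical. The sketch assumes that bash_env is an existing script environment that runs Bash, and that whatever the script writes to standard output is read back as result rows.

```sql
-- Minimal sketch: only ENVIRONMENT, SENDS and SCRIPT are present.
-- All names are hypothetical; bash_env is assumed to exist already.
CREATE EXTERNAL SCRIPT node_names
ENVIRONMENT bash_env
SENDS (node_name VARCHAR(100))
SCRIPT S'extscript(
    hostname
)extscript';
```

Under those assumptions, each invocation would report the name of the host it ran on.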

All other elements of an external script declaration are optional. However, the following clause is essential in any script that is supplied with data by Kognitio:

  • RECEIVES - defines the data that the script receives from Kognitio. The input-column declarations have the same format as the result-column declarations. The data received is specified via an SQL query. For more details, see the external script invocation syntax, or the creating external scripts section for some simple examples.
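
A declaration that takes input might be sketched as follows. The names echo_sales, bash_env, store_id and amount are hypothetical, and the sketch assumes that rows arrive on the script's standard input and are read back from its standard output in the same layout. Because both result-column names exactly match input-column names, their data types are omitted from SENDS.

```sql
-- Sketch only: hypothetical names, and bash_env is assumed to exist.
-- Assumes input rows arrive on stdin and results are read from stdout.
CREATE EXTERNAL SCRIPT echo_sales
ENVIRONMENT bash_env
RECEIVES (store_id INT, amount DECIMAL(10,2))
SENDS (store_id, amount)  -- types inherited from the matching input columns
SCRIPT S'extscript(
    cat
)extscript';
```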

All other syntax is used to set the external script configuration and control the flow of data into invocations of the external script:

  • INPUT and OUTPUT - for advanced formatting of data the script RECEIVES from and SENDS back to Kognitio.
  • PARTITION BY - clause declaring the columns whose values are used to split the data as it is passed into the script invocations.
  • ORDER BY - clause declaring the columns whose values are used to order the data as it is passed into the script invocations.
  • LIMIT n THREADS - limits the total number of simultaneous invocations of an external script across the whole cluster, and therefore controls the execution parallelism of the external script.
  • LIMIT m THREADS PER NODE - limits the number of simultaneous invocations of the external script on each node. This is largely redundant since the introduction of the Kognitio external memory resource scheduler. However, there may be specific circumstances when it needs to be applied, such as finding out about code package installations.
  • REQUIRES k MB/GB RAM - controls the maximum amount of RAM that each script invocation can be allocated. This is a key setting both for scripts that need more memory in order to run and for maximising parallelism.
  • RUN ON - used to limit the nodes that an external script runs on. This is useful if you have a limited number of licenses for a particular piece of third-party software, or if a binary is not available across all nodes in your Hadoop cluster.
  • PARTITIONS - declaration that sets the partitioning strategy controlling how the partitioned data is presented to the invocations. Only applicable if the PARTITION BY clause is present.

For more information about how to use these optional arguments, see the advanced configuration section.
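
To show where some of the optional clauses sit in a declaration, here is a sketch that follows the clause order given under Usage. All names (rows_per_invocation, bash_env and the columns) and the values 4 and 512 are hypothetical and are not tuning recommendations. As before, it assumes that bash_env exists and that data is exchanged over standard input and output.

```sql
-- Sketch only: hypothetical names and limits; bash_env is assumed to exist.
CREATE EXTERNAL SCRIPT rows_per_invocation
ENVIRONMENT bash_env
RECEIVES (store_id INT, sale_time TIMESTAMP, amount DECIMAL(10,2))
PARTITION BY store_id ORDER BY sale_time  -- split by store, order by time
SENDS (row_count INT)
LIMIT 4 THREADS       -- at most 4 simultaneous invocations cluster-wide
REQUIRES 512 MB RAM   -- memory ceiling for each invocation
SCRIPT S'extscript(
    wc -l
)extscript';
```

Under those assumptions, each invocation would return the number of rows it was handed. How rows are grouped per invocation depends on the partitioning strategy, which this sketch leaves at its default.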