Formatting Input and Output

The input that a Kognitio external script receives, and the output it sends back to Kognitio when invoked, are fully configurable in much the same way as data that is loaded into or unloaded from Kognitio, including through external connectors. This section outlines the format attributes most commonly applied to external scripts. A full list is included in the quick reference for target string format attributes.

Some useful formatting attributes for data include:

  • column_headers - controls whether information about the input data is passed into external scripts. By default, only the column names are passed, as the first row of every script invocation.

  • column_header_format - changes the format of the information passed by column_headers. The default setting, 0, passes column names only; 1 passes extra information about each column (see Passing additional column information below).

  • fmt_percent_encode - the default setting for external scripts is 1, which makes fields easy to split on the field separator. All fields are expected to be unquoted, and any problematic characters in a field (e.g. the field separator) are replaced with a % symbol followed by the character's hex code. For example, a comma is encoded as %2C.

  • fmt_field_seperator - specifies the separator that appears between the columns (fields) in the input or output. By default, Kognitio does not expect external script strings to be quoted, so if strings contain commas (the default separator) that are not percent encoded, those commas can be misinterpreted as delimiters. This attribute allows the delimiter to be changed to a string that will not appear in the data.
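
The Python sketch below shows how a script might read input under these defaults: the first line is treated as the column-name header, each later row is split on the field separator, and every field is percent-decoded with the standard library's urllib.parse.unquote. It also percent-encodes output fields, which assumes the output is parsed the same way. The comma separator, the set of characters chosen for encoding, and the function names decode_row, encode_field, process and main are assumptions for illustration, not part of Kognitio's interface.

```python
import sys
from urllib.parse import unquote

# Assumption: the default comma separator is in use.
SEPARATOR = ","


def decode_row(line, sep=SEPARATOR):
    """Split one input row on the separator, then percent-decode each field."""
    return [unquote(field) for field in line.rstrip("\r\n").split(sep)]


def encode_field(value, sep=SEPARATOR):
    """Percent-encode '%', separator characters and line breaks so the field parses cleanly."""
    encoded = []
    for ch in str(value):
        if ch == "%" or ch in sep or ch in "\r\n":
            # Encode each UTF-8 byte as %XX, e.g. ',' -> '%2C'.
            encoded.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
        else:
            encoded.append(ch)
    return "".join(encoded)


def process(lines, sep=SEPARATOR):
    """Yield one output line per input row; this example echoes each row back unchanged."""
    rows = iter(lines)
    header = next(rows, None)
    if header is None:
        return  # this invocation received no input
    columns = decode_row(header, sep)
    for line in rows:
        record = dict(zip(columns, decode_row(line, sep)))
        # A real script would transform `record` here.
        yield sep.join(encode_field(v, sep) for v in record.values()) + "\n"


def main():
    """Entry point for the deployed script: read stdin, write stdout."""
    sys.stdout.writelines(process(sys.stdin))
```

Calling main() at module level turns this into the script body. If fmt_field_seperator is changed, only the sep argument needs to follow it; encoding '%' as well as the separator keeps the round trip lossless.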

Example: Passing header information into R for use in model building

Example: Using a format string to produce a list of files on a node

Passing additional column information

In some circumstances it may be useful to pass more information about the input columns than just their names. To do this, set the additional format attribute column_header_format to 1. An extended header row of the form COLNUM:NAME:TYPESTR:NULLFLAG[:OPT1][:OPT2] is then passed into each script invocation.

The main information required is likely to be the data type, held in TYPESTR, and possibly NULLFLAG. The other fields are largely for Kognitio internal or diagnostic purposes, but they are listed in the table below for completeness.
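
One way to turn the extended header row into usable metadata is sketched below in Python. It assumes that each column's entry is colon-separated as described above, that entries for different columns are separated by the field separator (a comma here), that COLNUM and the OPTn values are integers, and that column names contain no raw colons. Whitespace around each part is stripped, so entries parse with or without spaces after the colons. ColumnInfo and parse_extended_header are hypothetical names, and NULLFLAG is kept as raw text because its exact encoding is not described here.

```python
from dataclasses import dataclass, field
from typing import List
from urllib.parse import unquote


@dataclass
class ColumnInfo:
    colnum: int     # COLNUM: internal sequence value
    name: str       # NAME: column name
    typestr: str    # TYPESTR: data type string, e.g. "INT8" or "STRING"
    nullflag: str   # NULLFLAG: nullability indicator, kept as raw text
    opts: List[int] = field(default_factory=list)  # OPT1, OPT2 when present


def parse_extended_header(header_line: str, sep: str = ",") -> List[ColumnInfo]:
    """Parse an extended header row (column_header_format=1) into ColumnInfo records."""
    columns = []
    for entry in header_line.rstrip("\r\n").split(sep):
        parts = [part.strip() for part in entry.split(":")]
        if len(parts) < 4:
            raise ValueError(f"unexpected header entry: {entry!r}")
        colnum, name, typestr, nullflag, *opts = parts
        columns.append(
            ColumnInfo(
                colnum=int(colnum),
                name=unquote(name),  # decode in case the name was percent-encoded
                typestr=typestr,
                nullflag=nullflag,
                opts=[int(opt) for opt in opts],
            )
        )
    return columns
```

With the parsed list, a script can look up each column's TYPESTR before converting the string values in later rows.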

Field      Description

COLNUM     An internal sequence value for the column
NAME       Column name
TYPESTR    Data type as a string, e.g. INT1, INT8, REAL, TIMESTAMP, STRING, STRING32
NULLFLAG   Indicates whether the column is nullable
OPTn       Optional fields whose meaning depends on the data type:
             • TIME, TIMESTAMP, INTERVAL_DT, INTERVAL_DM: OPT1 is a bitmask describing which columns are present
             • STRING, STRING16, STRING32: OPT1 is the length (a negative value indicates the maximum length of a variable-length string); OPT2 is the character set number
             • DECIMAL: OPT1 is the precision; OPT2 is the scale
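
The type-dependent OPTn rules in the table can be captured in a small lookup, sketched below in Python. It takes the TYPESTR value and the OPTn values (as integers) for one column. The function name interpret_opts and the dictionary keys are hypothetical; the code encodes only the rules in the table, and types not listed there are returned without extra detail.

```python
from typing import Dict, List

STRING_TYPES = {"STRING", "STRING16", "STRING32"}
TEMPORAL_TYPES = {"TIME", "TIMESTAMP", "INTERVAL_DT", "INTERVAL_DM"}


def interpret_opts(typestr: str, opts: List[int]) -> Dict[str, object]:
    """Map a column's OPT1/OPT2 values to named properties according to its type."""
    info: Dict[str, object] = {"type": typestr}
    if typestr in STRING_TYPES and opts:
        length = opts[0]
        # A negative OPT1 marks a variable-length string; its magnitude is the maximum length.
        info["variable_length"] = length < 0
        info["length"] = abs(length)
        if len(opts) > 1:
            info["charset"] = opts[1]  # character set number
    elif typestr == "DECIMAL" and opts:
        info["precision"] = opts[0]
        if len(opts) > 1:
            info["scale"] = opts[1]
    elif typestr in TEMPORAL_TYPES and opts:
        # Bitmask of which parts are present; individual bit meanings are not covered here.
        info["presence_bitmask"] = opts[0]
    return info
```

Keeping the raw bitmask rather than decoding it avoids guessing at bit assignments that this section does not document.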