Introduction

Kognitio has two mechanisms for external programs to provide or operate on data during an SQL query: external tables and external scripts. These can be written in the user’s preferred language, for example R, Python, shell or even C.

External tables are used like normal tables in SQL queries but the rows are supplied by an external “connector” program rather than being stored by WX2. The connector could generate the rows itself (TPC-H data, for example) or fetch them from another database, HDFS or an internet data source.

External scripts are table functions in SQL which take a subquery “argument” and return a table. They allow users to create complex analytics or achieve functionality that the database itself doesn’t provide. The script is run under the “script environment” interpreter with the subquery rows as input, and the output rows will be returned to the SQL query.

Here is an example of an external table and an external script. There are more examples in
wx2/current/examples/excon_* and wx2/current/examples/exscripts.sql under your WX2 install directory.

-- External table to return the current environment for each connector instance.
create connector EXCON_ENV
    -- adjust this path if required
    command '/opt/kognitio/wx2/current/examples/excon_env';
create external table ENV (CONNECTOR_NUM int, 
                           NAME varchar(10000),
                           VALUE varchar(10000))
    from EXCON_ENV;

-- Results: environment of each connector
--          e.g. (0, 'WX2_TNO', '629'), (0, 'WX2_METHOD', 'LOAD'), ...
select * from ENV;

-- External script to perform an arbitrary regular expression substitution
-- using the stream editor, sed.
create script environment SED command '/bin/sed -f';
create external script REGEXP environment SED
    receives (TEXT varchar) input 'format "%{quoted=0}s\\n"'
    sends (SUBST_TEXT varchar) output 'format "%0c\\n"'
    script 's/ko\(.*\)/i\1n/';

-- Results: ('ignition')
external script REGEXP from (select 'kognitio');

Kognitio on Hadoop

The following sections on user mapping apply only to Kognitio running in appliance mode.

For Kognitio on Hadoop, external connectors and scripts currently run as the same user that runs the Kognitio on Hadoop processes. This is expected to change in the future to a broadly similar scheme tailored to Hadoop.

Why map users?

The examples above are simple so it doesn’t matter which Linux user runs the external script or connector processes, but it can do in practice. You might want your script to run as a user that has already been set up, perhaps with SSH keys or a uid needed for NFS.

It is also preferable not to execute user-supplied code as a privileged user – or in this case one with access to database files.

Kognitio chooses the Linux user based on the Kognitio user and the query context. By default – unless it has been configured otherwise as described below – it will use the user wxextern, which is created when WX2 is installed. (But wxextern is not created automatically by upgrade from versions earlier than 8.1.)

Allowed Linux users

Kognitio doesn’t simply allow scripts or connectors to run as any Linux user. By default only wxextern is allowed. Any other user must be allowed by name or uid (user id) by editing the “[security]” section of the config file as follows.

A Linux user is allowed if:

  1. The user name is listed in the config file setting “ext_users_allowed” comma-separated list of user names; or
  2. The config file settings “min_ext_user_id” and “max_ext_user_id” are set to positive numbers with min_ext_user_id <= uid <= max_ext_user_id, and the user must not be listed in “ext_users_denied” comma-separated list of user names.

(By default ext_users_allowed is “$extern” and ext_users_denied is “root“. Strictly, in (2) only one of min_ext_user_id and max_ext_user_id needs to be non-negative, with the inequality ignored for the negative value.)

You need to restart WX2 (“wxserver start“) after changing any of these config file settings for them to take effect.

User map strings

Kognitio attempts to map from a Kognitio user to a Linux user using a user map string. This is a comma-separated list of any of the following elements:

User map element Linux user to match
name User name
& Same as Kognitio user (e.g. SYS)
$extern Kognitio extern user, by default wxextern
$admin Kognitio admin user, by default wxadmin
$master Kognitio master user, by default wxroot
$owner Kognitio owner user, by default root

It will try each name in turn and will use the first one which, when converted to lower case, gives the name of a Linux user which is allowed by the security parameters. If no valid match is found then it will fail with the error “ET0111: User mapping for the external operation was invalid”.

For example, for the user map string “kognitio,&,$extern” for a Kognitio user called MEPHISTOPHELES it will try Linux users kognitio, mephistopheles and wxextern.

Note that this is evaluated separately on each node, so it is recommended that the Linux user names and uids are synchronised across the cluster. Otherwise it is possible for different user names to be matched, the same name being matched but the uids differing, or names being matched on only a subset of nodes (the ET0111 error will be raised in this case).

If an ET0111 error is raised the Kognitio user and the user map string are logged, as of version 8.2, in the server debug log file (look at `wxlogd smd`/serverdbg.*), which should be helpful to see why it didn’t work. For example:

T_2017-04-21_09:06:12_BST: ET: Invalid user mapping (ET0111) on 'hp-rack2-enc2-11' for 'SYS' -> 'kognitio,&,$extern'

Which user map string is used?

When Kognitio is about to launch an external script or connector it obtains a user map string. It first looks in IPE_EXTERNAL_USER_MAP for a user map string based on the execution context; if it doesn’t find one then it uses at the config file setting “[boot options] default_usermap“, which defaults to “&,$extern“. It then tries to map the current Kognitio user to a Linux user as described above.

For external scripts the search is performed in this order matching against the columns (OBJECT_TYPE, OBJECT_ID, USER_ID) of IPE_EXTERNAL_USER_MAP.

PRIVDOMAIN_EXT_SCRIPT(13)Script Id-1PRIVDOMAIN_SCRIPT_ENV(10)Script Environment IdUserIdPRIVDOMAIN_SCRIPT_ENV(10)Script Environment Id-1PRIVDOMAIN_EXT_SCRIPT(13)-1User IdPRIVDOMAIN_EXT_SCRIPT(13)-1-1-1-1User Id

OBJECT_TYPE OBJECT_ID USER_ID
PRIVDOMAIN_EXT_SCRIPT(13) Script Id UserId
-1 -1 -1

For external tables the search is performed in this order matching against the columns (OBJECT_TYPE, OBJECT_ID, USER_ID) of IPE_EXTERNAL_USER_MAP.

PRIVDOMAIN_TABLE(3)Table Id-1PRIVDOMAIN_CONNECTOR(9)Connector IdUser IdPRIVDOMAIN_CONNECTOR(9)Connector Id-1PRIVDOMAIN_TABLE(3)-1User IdPRIVDOMAIN_TABLE(3)-1-1-1-1User Id

OBJECT_TYPE OBJECT_ID USER_ID
PRIVDOMAIN_TABLE(3) Table Id User Id
-1 -1 -1

The administrator can add and remove mappings in IPE_EXTERNAL_USER_MAP using this special syntax.

CREATE EXTERNAL USER MAP
  [OBJECT_TYPE=[EXTERNAL SCRIPT|SCRIPT ENVIRONMENT|EXTERNAL TABLE|CONNECTOR],]
  [OBJECT_ID = 1234,]
  [USER_ID = 1234,]
  USERMAP = '<user map>';

DROP EXTERNAL USER MAP
  [OBJECT_TYPE=[EXTERNAL SCRIPT|SCRIPT ENVIRONMENT|EXTERNAL TABLE|CONNECTOR],]
  [OBJECT_ID = 1234,]
  [USER_ID = 1234];