How to generate an identity / auto_increment column in Kognitio
Occasionally we would like the value of the primary key field to be created automatically every time a new record is inserted. That is we would like an auto-incrementable field in a table.
What other databases do
Other databases use a variety of syntax to do this.
MySQL uses an AUTO_INCREMENT keyword, which is used within the table definition
CREATE TABLE fruit (
ID int NOT NULL AUTO_INCREMENT,
name varchar(255) NOT NULL,
PRIMARY KEY (ID) );
The SQL Server syntax uses the IDENTITY keyword, again within the table definition, in this example starting at 1 and incrementing by 1.
CREATE TABLE city (
ID int IDENTITY(1,1) PRIMARY KEY,
name varchar(255) NOT NULL,
PRIMARY KEY (ID) );
For Oracle you first need to create a sequence and then use its nextval attribute during inserts.
CREATE SEQUENCE seq_people
MINVALUE 1
START WITH 1
INCREMENT BY 1
CACHE 10;
INSERT INTO People (ID,FirstName,LastName)
VALUES (seq_people.nextval,'Mike','Atkinson');
Kognitio’s row_number syntax.
Kognitio does not have a way of creating a unique value for a field directly in the table definition, instead it has two ways of adding unique values to resultsets; and then these resultsets may be inserted into a table. The first uses the ROW_NUMBER function.
The ROW_NUMBER function assigns to each row in the partition or query a sequence number starting from one. This is only a number used in the context of the result set, if the result changes, the ROW_NUMBER will change. The ROW_NUMBER expression takes an ORDER BY statement with the column to be used for the row count with an OVER operator.
In the example below we use ROW_NUMBER to replace the subject name with an id value which is a foreign key reference into the subjects table.
set schema yorschema; create table facts_origin (subject varchar, predicate varchar, object varchar); create table subjects (id int, name varchar not null, primary key(id) ); create table facts (subject int, predicate varchar, object varchar, foreign key (subject) references subjects(id) ); insert into facts_origin values ('Peter', 'Brother', 'Sarah'), ('Peter', 'Brother', 'Anne'), ('John', 'Father', 'Peter'); insert into subjects select with ids(name) as (select distinct subject from facts_origin) select row_number() over (order by name) as "id", name from ids; insert into facts select s.id, f.predicate, f.object
from facts_origin f, subjects s where f.subject=s.name; select * from subjects; select * from facts;
Which gives a subjects table of: | And a facts table of: | ||||||||||||||||||
|
|
Kognitio’s generate_key syntax.
For an individual SELECT statement the GENERATE_KEY function generates a unique key for each row. These rows may then be inserted into a table.
The result is an INT8 data type with a value greater than or equal to zero. The results are not typically contiguous, but are guaranteed to be unique for an individual SELECT statement. The values generated are dependent upon the number of Kognitio nodes and the distribution of the data. Rerunning a query may not generate the same results. Multiple occurrences of GENERATE_KEY in a SELECT list will all produce the same result within a single row.
GENERATE_KEY can only be used in the SELECT list, it cannot be used in WHERE, HAVING, GROUP BY or ORDER BY clauses. Within the SELECT list you can perform arithmetic on the columns containing the GENERATE_KEY function.
insert into subjects select with ids(name) as (select distinct subject from facts_origin) select generate_key(), name from ids;
The Kognitio SQL Guide has examples of good practice for ensuring that addition INSERT statements have unique keys and how to use them in ETL processes.
A more complete example
The following example shows many techniques, it uses both row_number and generate_key functions and illustrates how to use them to add entries to existing tables.
It also illustrates the use of KogScript functions, KogScript variables and a more complex “Grandparent” query.
set schema yourschema; -- drop tables, they may have been created by previous runs of the script drop table facts_origin; drop table facts; drop table entities; drop table preds; -- create tables to use create table facts_origin (subject varchar, predicate varchar, object varchar); create table entities (id int, name varchar not null, primary key(id) ); create table preds (id int, name varchar not null, primary key(id) ); create table facts (subject int, predicate int, object int, foreign key (subject) references entities(id), foreign key (predicate) references preds(id), foreign key (object) references entities(id) ); -- add some facts, a small genealogy insert into facts_origin values ('Peter', 'Brother', 'Sarah'), ('Peter', 'Brother', 'Anne'), ('John', 'Father', 'Peter'); insert into facts_origin values ('Peter', 'Father', 'James'), ('Jane', 'Mother', 'Peter'), ('Mary', 'Mother', 'John'); -- create the entities table insert into entities select with ents(name) as (select distinct subject from facts_origin union select object from facts_origin) select row_number() over (order by name) as "id", name from ents; -- create the preds table insert into preds select with ids(name) as (select distinct predicate from facts_origin) select generate_key(), name from ids; -- create the facts table, this uses references to the entities and preds tables insert into facts select s.id, p.id, o.id from facts_origin f, entities s, preds p, entities o where f.subject=s.name and f.predicate=p.name and f.object=o.name; -- check the tables are what we expect select * from entities; select * from preds; select * from facts; -- Query for all facts where 'Peter' is the subject with -- select the subject id s(id) as (select id from entities where name='Peter'), -- Filter the facts table for that subject id f(subject, predicate, object) as (select subject, predicate, object from facts, s where subject in (s.id) ) -- Then join the filter results with the subjects table to get the subject name select s.name as "SUBJECT", p.name, o.name as "OBJECT" from f, entities s, preds p, entities o where f.subject=s.id and f.predicate=p.id and f.object=o.id; -- Use a function to add entities if they do not already occur in the entities table. function addEntity(p) local t = select id from entities where name='$p'; if #t.rows~=1 then insert into entities select mk + 1, '$p' from (select max(id) from entities) dt(mk); t = select id from entities where name='$p'; assert(#t.rows==1) end return t.rows[1][1]; end -- Use a function to add predicates if they do not already occur in the preds table. function addPred(p) local t = select id from preds where name='$p'; if #t.rows~=1 then insert into preds select generate_key() + mk + 1, '$p' from (select max(id) from preds) dt(mk); t = select id from preds where name='$p'; assert(#t.rows==1) end return t.rows[1][1]; end -- adding facts function addFact(s,p,o) local sub = addEntity(s) – use local to avoid polluting the global namespace local pred = addPred(p) local obj = addEntity(o) local t = select count(*) as "count" from facts where subject=$sub and predicate=$pred and object=$obj; if #t.rows~=1 or t.rows[1].count==0 then insert into facts values ($sub,$pred,$obj); end end addFact('Garry', 'Father', 'Mary') -- grandparent view. parent = addPred('Parent') grandparent = addPred('Grandparent') -- Now form the query as a view, so that it may be reused easily create view grandparent select with -- Define the father_of subquery father_of(s,p,o) as (select subject, predicate, object from facts where predicate in (select id from preds where name='Father') ), -- Define the mother_of subquery mother_of(s,p,o) as (select subject, predicate, object from facts where predicate in (select id from preds where name='Mother') ), -- Define the parent_of subquery, this uses the Parent predicate id we have defined above parent_of(s,p,o) as (select s,$parent,o from father_of union select s,$parent,o from mother_of), -- Define the grandparent_of subquery, this uses the Grandparent predicate id we -- have defined above grandparent_of(s,p,o) as (select p.s, $grandparent, c.o from parent_of p, parent_of c where p.o=c.s) -- Now output the result in a human readable form select s.name as "SUBJECT", p.name as "PREDICATE", o.name as "OBJECT" from grandparent_of f, entities s, preds p, entities o where f.s=s.id and f.p=p.id and f.o=o.id; -- Now we can perform various queries -- Who are John's grandchildren? select object as "Grandchild" from grandparent where subject='John'; -- Who are Peter's grandparents? select subject as "Grandparent" from grandparent where object='Peter'; -- Is Jane the grandparent of James? select 'Yes' from grandparent where subject='Jane' and object='James';
And here is the entity diagram for the facts table
Summary
Various databases have different ways of creating unique values in a field. The Kognitio Analytical Platform has two methods GENERATE_KEY and ROW_NUMBER, both of which generate unique values in a subquery that may then be inserted into a table, giving a column in the table with a unique value as a result.