Blogs

How to generate an identity / auto_increment column in Kognitio

Occasionally we would like the value of the primary key field to be created automatically every time a new record is inserted. That is we would like an auto-incrementable field in a table.

What other databases do

Other databases use a variety of syntax to do this.

MySQL uses an AUTO_INCREMENT keyword, which is used within the table definition

CREATE TABLE fruit (
         ID int NOT NULL AUTO_INCREMENT,
         name varchar(255) NOT NULL,
         PRIMARY KEY (ID) );

 

The SQL Server syntax uses the IDENTITY keyword, again within the table definition, in this example starting at 1 and incrementing by 1.

CREATE TABLE city (
         ID int IDENTITY(1,1) PRIMARY KEY,
         name varchar(255) NOT NULL,
         PRIMARY KEY (ID) );

 

For Oracle you first need to create a sequence and then use its nextval attribute during inserts.

CREATE SEQUENCE seq_people
        MINVALUE 1
        START WITH 1
        INCREMENT BY 1
        CACHE 10;
INSERT INTO People (ID,FirstName,LastName)
VALUES (seq_people.nextval,'Mike','Atkinson');

 

Kognitio’s row_number syntax.

Kognitio does not have a way of creating a unique value for a field directly in the table definition, instead it has two ways of adding unique values to resultsets; and then these resultsets may be inserted into a table. The first uses the ROW_NUMBER function.

The ROW_NUMBER function assigns to each row in the partition or query a sequence number starting from one. This is only a number used in the context of the result set, if the result changes, the ROW_NUMBER will change. The ROW_NUMBER expression takes an ORDER BY statement with the column to be used for the row count with an OVER operator.

In the example below we use ROW_NUMBER to replace the subject name with an id value which is a foreign key reference into the subjects table.

set schema yorschema;
create table facts_origin (subject varchar, predicate varchar, object varchar);
create table subjects (id int, name varchar not null, primary key(id) );
create table facts (subject int, predicate varchar, object varchar, 
               foreign key (subject) references subjects(id) );

insert into facts_origin values ('Peter', 'Brother', 'Sarah'), 
                                                           ('Peter', 'Brother', 'Anne'), 
                                                           ('John', 'Father', 'Peter');


insert into subjects
select with
    ids(name) as (select distinct subject from facts_origin)
select row_number() over (order by name) as "id", name  from ids;

insert into facts
    select s.id, f.predicate, f.object
        from facts_origin f, subjects s
        where f.subject=s.name;

select * from subjects;
select * from facts;
Which gives a subjects table of: And a facts table of:
ID NAME
1 John
2 Peter
SUBJECT PREDICATE OBJECT
1 Father Peter
2 Brother Anne
2 Brother Sarah

 

Kognitio’s generate_key syntax.

For an individual SELECT statement the GENERATE_KEY function generates a unique key for each row. These rows may then be inserted into a table.

The result is an INT8 data type with a value greater than or equal to zero. The results are not typically contiguous, but are guaranteed to be unique for an individual SELECT statement. The values generated are dependent upon the number of Kognitio nodes and the distribution of the data. Rerunning a query may not generate the same results. Multiple occurrences of GENERATE_KEY in a SELECT list will all produce the same result within a single row.

GENERATE_KEY can only be used in the SELECT list, it cannot be used in WHERE, HAVING, GROUP BY or ORDER BY clauses. Within the SELECT list you can perform arithmetic on the columns containing the GENERATE_KEY function.

insert into subjects
select with
    ids(name) as (select distinct subject from facts_origin)
select generate_key(), name  from ids;

The Kognitio SQL Guide has examples of good practice for ensuring that addition INSERT statements have unique keys and how to use them in ETL processes.

 

A more complete example

The following example shows many techniques, it uses both row_number and generate_key functions and illustrates how to use them to add entries to existing tables.

It also illustrates the use of KogScript functions, KogScript variables and a more complex “Grandparent” query.

set schema yourschema;

-- drop tables, they may have been created by previous runs of the script
drop table facts_origin;
drop table facts;
drop table entities;
drop table preds;

-- create tables to use
create table facts_origin (subject varchar, predicate varchar, object varchar);
create table entities (id int, name varchar not null, primary key(id) );
create table preds (id int, name varchar not null, primary key(id) );
create table facts (subject int, predicate int, object int,
               foreign key (subject) references entities(id),
               foreign key (predicate) references preds(id),
               foreign key (object) references entities(id) );

-- add some facts, a small genealogy
insert into facts_origin
     values ('Peter', 'Brother', 'Sarah'), ('Peter', 'Brother', 'Anne'), ('John', 'Father', 'Peter');
insert into facts_origin
     values ('Peter', 'Father', 'James'), ('Jane', 'Mother', 'Peter'), ('Mary', 'Mother', 'John');

-- create the entities table
insert into entities
select with
    ents(name) as (select distinct subject from facts_origin union select object from facts_origin)
select row_number() over (order by name) as "id", name  from ents;


-- create the preds table
insert into preds
select with
    ids(name) as (select distinct predicate from facts_origin)
select generate_key(), name  from ids;

-- create the facts table, this uses references to the entities and preds tables
insert into facts
    select s.id, p.id, o.id
        from facts_origin f, entities s, preds p, entities o
        where f.subject=s.name and f.predicate=p.name and f.object=o.name;

-- check the tables are what we expect
select * from entities;
select * from preds;
select * from facts;

-- Query for all facts where 'Peter' is the subject
with
  -- select the subject id
  s(id) as (select id from entities where name='Peter'),
  -- Filter the facts table for that subject id
  f(subject, predicate, object) as (select subject, predicate, object from facts, s 
                        where subject in (s.id) )
  -- Then join the filter results with the subjects table to get the subject name
select s.name as "SUBJECT", p.name, o.name as "OBJECT"
    from f, entities s, preds p, entities o 
    where f.subject=s.id and f.predicate=p.id and f.object=o.id;


-- Use a function to add entities if they do not already occur in the entities table.
function addEntity(p)
   local t = select id from entities where name='$p';
   if #t.rows~=1  then
       insert into entities
           select mk + 1, '$p' from (select max(id) from entities) dt(mk);
                 t = select id from entities where name='$p';
       assert(#t.rows==1)
   end
   return t.rows[1][1];
end

-- Use a function to add predicates if they do not already occur in the preds table.
function addPred(p)
   local t = select id from preds where name='$p';
   if #t.rows~=1  then
       insert into preds
           select generate_key() + mk + 1, '$p' from (select max(id) from preds) dt(mk);
                 t = select id from preds where name='$p';
       assert(#t.rows==1)
   end
   return t.rows[1][1];
end


-- adding facts
function addFact(s,p,o)
    local sub = addEntity(s) – use local to avoid polluting the global namespace
    local pred = addPred(p)
    local obj = addEntity(o)
    local t = select count(*) as "count" from facts
                               where subject=$sub and predicate=$pred and object=$obj;
    if #t.rows~=1 or t.rows[1].count==0 then
         insert into facts values ($sub,$pred,$obj);
    end

end

addFact('Garry', 'Father', 'Mary')


-- grandparent view.

parent = addPred('Parent')
grandparent = addPred('Grandparent')

-- Now form the query as a view, so that it may be reused easily
create view grandparent
select with
    -- Define the father_of subquery
    father_of(s,p,o) as  (select subject, predicate, object from facts
                 where predicate in (select id from preds where name='Father') ),
    -- Define the mother_of subquery
    mother_of(s,p,o) as (select subject, predicate, object from facts
                 where predicate in (select id from preds where name='Mother') ),
    -- Define the parent_of subquery, this uses the Parent predicate id we have defined above
    parent_of(s,p,o) as  (select s,$parent,o from father_of union
                 select s,$parent,o from mother_of),
    -- Define the grandparent_of subquery, this uses the Grandparent predicate id we 
    -- have defined above
    grandparent_of(s,p,o) as (select p.s, $grandparent, c.o from parent_of p, parent_of c where p.o=c.s)
-- Now output the result in a human readable form
select s.name as "SUBJECT", p.name as "PREDICATE", o.name as "OBJECT"
    from grandparent_of f, entities s, preds p, entities o 
    where f.s=s.id and f.p=p.id and f.o=o.id;


-- Now we can perform various queries
-- Who are John's grandchildren?
select object as "Grandchild" from grandparent where subject='John';

-- Who are Peter's grandparents?
select subject as "Grandparent" from grandparent where object='Peter';

-- Is Jane the grandparent of James?
select 'Yes' from grandparent where subject='Jane' and object='James';

And here is the entity diagram for the facts table

 

Summary

Various databases have different ways of creating unique values in a field. The Kognitio Analytical Platform has two methods GENERATE_KEY and ROW_NUMBER, both of which generate unique values in a subquery that may then be inserted into a table, giving a column in the table with a unique value as a result.

Leave a Reply

Your email address will not be published nor used for any other purpose. Required fields are marked *