Discussions specific to version 8.1

Using Hadoop in Kognitio Console

by MikeAtkinson » Thu May 30, 2013 2:51 pm

My recent posts showed how to use scripts and external tables to browse system files and other databases. This post shows how to do the same thing with Hadoop.

The first step is to install a Java JRE (we use the IBM JRE), then compile and install libhdfs. This is fairly complicated and is not the subject of this post.

Once that is done, it should be possible to start the Hadoop plugin and set it active:

Code: Select all

create module hadoop mode active;

Then create the HDFS connector:

Code: Select all

create connector hdfscon source hdfs target 'namenode "100.0.0.12:9000"';

Now we can use a query to list a directory, where %1 is replaced by the Hadoop path:

Code: Select all

select * from (external table from hdfscon target 'list "%1" ')et;

An anonymous external table can likewise show the contents of a file on Hadoop, where %1 is replaced by the file path:

Code: Select all

select * from (external table (text varchar(32000)) from hdfscon target 'file "%1", format "%0c\n" ')et;

Using similar external tables, it is possible to browse the contents of Hadoop from Kognitio Console connected through a Kognitio Analytics Platform instance.
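
As an illustration, here are the two queries from this post with %1 filled in by hand. The directory /user/andy is an assumption, inferred from the example file path /user/andy/testfile used elsewhere in this post; substitute paths that exist on your own cluster.

```sql
-- Directory listing: %1 replaced by a directory path (assumed to exist).
select * from (external table from hdfscon target 'list "/user/andy" ')et;

-- File contents: %1 replaced by a file path.
select * from (external table (text varchar(32000)) from hdfscon target 'file "/user/andy/testfile", format "%0c\n" ')et;
```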

I've put together an external data source browser for Console. An admin, or a user with write privileges on the new system table SYS.IPE_ALLEXTERNAL_DATA_SOURCE, would insert the following rows into that table. Users who also have the "connect" privilege on the HDFSCON connector would then be able to browse Hadoop.

Code: Select all

insert into sys.ipe_allexternal_data_source values(
    'hadoop',
    'LS',
    ( select id from sys.ipe_external_connector where name='HDFSCON' ),
    NULL,
    'select * from (external table from hdfscon target ''list "%1" '')et;',
    'CONNECTOR_TYPE hadoop'
);    
insert into sys.ipe_allexternal_data_source values(
    'hadoop',
    'SAMPLE',
    ( select id from sys.ipe_external_connector where name='HDFSCON' ),
    NULL,
    'select * from (external table (%1) from hdfscon target ''file "%2", %3 '')et fetch first 100 rows only;',
    ''
);    
insert into sys.ipe_allexternal_data_source values(
    'hadoop',
    'CAT',
    ( select id from sys.ipe_external_connector where name='HDFSCON' ),
    NULL,
    'select * from (external table (text varchar(32000)) from hdfscon target ''file "%1", %2 '')et;',
    'MENU "view file"
     ACTION "sample_view" '
);
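
To make the templates concrete, here is what the SAMPLE entry might look like once its placeholders are filled in. This is a hand-expanded sketch, not output captured from Console: it assumes %1 carries the column definitions, %2 the file path and %3 the format clause, reusing the values from the file query earlier in this post.

```sql
-- SAMPLE template with assumed values:
--   %1 = text varchar(32000)
--   %2 = /user/andy/testfile
--   %3 = format "%0c\n"
select * from (external table (text varchar(32000)) from hdfscon target 'file "/user/andy/testfile", format "%0c\n" ')et fetch first 100 rows only;
```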


I've created a Wizard to take the hard work out of creating data sources that can be used by Console's External Data Browser.

Using the sys.ipe_external_data_source view, Kognitio Console presents a list of available data sources to the user. The user may then browse them and create external tables based on those data sources.
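
To check what Console will see, the view can presumably be queried directly. This is a minimal sketch that assumes your user has select access to the view; the column layout is not described in this post, so the query just selects everything.

```sql
-- List the data sources visible to the current user (column names not assumed).
select * from sys.ipe_external_data_source;
```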

In the screenshot below, Kognitio Console is browsing "hadoop" connected through the HDFSCON connector. The contents of data files on Hadoop may be viewed; in this case, the file is /user/andy/testfile.

[Image: screenshot of Kognitio Console browsing "hadoop" through the HDFSCON connector]

Re: Using Hadoop in Kognitio Console

by anonymous2 » Fri Jun 14, 2013 12:52 pm

Thanks Mike, very helpful.