Forum

Information and discussion related to the Kognitio on Hadoop product
Contributor
Offline
User avatar
Posts: 384
Joined: Thu May 23, 2013 4:48 pm

problem handling NULL values in HDFS external table

by markc » Mon Dec 05, 2016 9:57 am

I'm creating external tables on top of HDFS data.

NULL values in the data are causing errors, which I can then see in SYS.IPE_CONV_ERROR. What can I do to fix this?
Reply with quote Top
Contributor
Offline
User avatar
Posts: 384
Joined: Thu May 23, 2013 4:48 pm

Re: problem handling NULL values in HDFS external table

by markc » Mon Dec 05, 2016 10:01 am

More information on any errors you are seeing should be visible in the external table data conversion logging table and you can see this by running the following SQL, which it sounds like you've already found:

select * from sys.ipe_conv_error

The error message for each operation should contain a transaction number which corresponds to the tno column in the above object.

By default the software expects NULL values in a column to be blank (e.g. 1,2,,4) to load 1, 2, NULL, 4. Some products use a null string instead to represent a null value and this will cause a conversion error. The external table wizard in Kognitio console isn't currently able to define external tables like this but you can manually create an external table which will work using SQL. You can see the external table definition in use for an existing object by looking at the create statement in the external table creation wizard, by looking at the create text in an existing external table using Kognitio console or by running the SQL 'explain <external table name>'.

To define an external table which uses a custom null string you need to set the fmt_null_value attribute to an appropriate value in the target string portion of the create statement. The target string is a comma separated list of 'attribute value' pairs. You also need to be careful that the data type of the column containing the custom null is correct. Kognitio console will sometimes autodetect a column with custom nulls as a string instead of using the native column type.

So if, for example, I want to connect to CSV that looks like this:

1,3,6,NULL,7
1,2,3,4,7

I can do so with an external table creation statement like this:

CREATE EXTERNAL TABLE "ANDY"."CSVFILE" (
"C1" INTEGER,
"C2" INTEGER,
"C3" INTEGER,
"C4" INTEGER,
"C5" INTEGER) FROM "HDFS" TARGET 'file "/user/andy/csvfile.txt", fmt_null_value "NULL"'
Reply with quote Top

Who is online

Users browsing this forum: No registered users and 1 guest

cron