Forum

General topics applicable to all languages
Contributor
Offline
User avatar
Posts: 386
Joined: Thu May 23, 2013 4:48 pm

example reverse script errors when applied twice

by markc » Fri Dec 06, 2013 11:44 am

The reverse script in the examples/exscripts.sql file of the release gives ET010F when called twice on data from e.g. SYS.IPE_BASE like this:

EXTERNAL SCRIPT reverse FROM (EXTERNAL SCRIPT reverse FROM (SELECT substring(text FROM 1 FOR 80) from IPE_BASE));
ET010F: Invalid external data record, see SYS.IPE_CONV_ERROR for details (tno 1174)

When I look in IPE_CONV_ERROR I see entries like this:

NAME SESSION TNO STMT USER_ID TABLE_ID MPID EDATE ETIME FILENAME RECORD_FILE_OFFSET RECORD_NO COLUMN_NO RECORD_CHAR_POS ECODE MESSAGE RECORD
16 1174 4 0 (null) 41 2013-12-05 14:17:29 (null) (null) 0 -1 1162412564 IE0214: Invalid string representation 435245415445205441424C4520353225323253595335322532322E35322532324950455F5245464D4F44455F544558543532253232202835322532325550444154454D4FADC3323220494E544547455235322532432035322532325445585435322532322056415243484152283332290A
16 1174 4 0 (null) 20 2013-12-05 14:17:29 (null) (null) 0 -1 1162412564 IE0214: Invalid string representation 435245415445205441424C452035322532324F448BC332322E35322532324950455F54595045494E464F3532253232202835322532324C4F43414C45353225323220494E5445474552204E4F54204E554C4C3532253243203532253232545950455F4E414D4535322532322056415243480A
...

Why is this going wrong?
Reply with quote Top
Contributor
Offline
User avatar
Posts: 386
Joined: Thu May 23, 2013 4:48 pm

Re: example reverse script errors when applied twice

by markc » Fri Dec 06, 2013 11:51 am

The problem is caused because we use percent-encoding by default when passing data between the server and an external script in either direction. The reason we use percent-encoding rather than CSV-style quoting when outputting data records for external scripts is that it's easier to parse a record if you can use the Perl or Python split function rather than having to use a CSV module or write your own de-quoting code. So percent-encoding provides us with a simple way of ensuring that the field separator, quote character or record terminator do not appear anywhere in the record. The output of the script is expected to be encoded in the same way.

So if you give the REVERSE script a string like this:

foobar"

This gets encoded as:

foobar%22

The Python script reverses that, so it outputs this:

22%raboof

And because the two characters following the % character aren't both legal hex digits, it's not taken as a percent encoding sequence and is taken literally. So you get 22%raboof as output.

However, if you give the REVERSE script the following input:

14"

This gets encoded as:

14%22

The Python script reverses it, like so:

22%41

Then the server takes the %41 to mean "the character with codepoint 0x41", which is A. So what you get is this:

22A

The reason you were getting unusual characters by reversing the contents of TEXT in IPE_BASE was that all column names are quoted, and the quotes get percent-encoded, so there's lots of potential to trip over this. Take the example:

UN TON REGETNI 22%DI_RETSULC22%( 22√ěTNARG_SKCOL_ROX_EPI22%.22%SYS22% ELBAT YLNO MAR ETAERC

The name of the table is IPE_XOR_LOCKS_GRANTED. Since a double quote follows GRANTED in the original create text, when that gets encoded and then reversed it looks like 22%DETNARG, and when that's loaded back in it sees the %DE and happily converts it to a lowercase thorn for you. Then when you try to reverse the result of that, by applying the script to its own output, Python reverses the two bytes that make up that character and that makes an invalid sequence.

You probably don't want to rewrite the Python script so it identifies percent-encoding sequences and refrains from reversing those parts of the string. The solution is that in this case, since your script isn't picking the record apart into individual fields, it's better for it to be quoted like ordinary CSV, which can be reversed without breaking it, than for it to be percent-encoded. You can do that by supplying the appropriate input and output attribute values when you create the script environment:

create script environment pythonnotencoded
command '/usr/bin/python'
input 'fmt_percent_encode 0'
output 'fmt_percent_encode 0';

create external script reversenotencoded
environment pythonnotencoded
RECEIVES(text VARCHAR)
SENDS(textout VARCHAR)
SCRIPT '
import sys
for line in sys.stdin:
print line[0:-1][::-1]
';

external script reversenotencoded from (select 'GRANTED"')

Result:
"DETNARG

However, note that this still won't help if you feed multibyte characters into your script - this will still cause the script to output invalid UTF-8 sequences.
Reply with quote Top

Who is online

Users browsing this forum: No registered users and 1 guest

cron