As a database tester, sometimes there’s a need to create tables with randomised data for testing. This could be because there are security and privacy issues preventing the use of production data, or it could help in testing for robustness against unusual characters, integer ranges and so on.  There are various ways to achieve this. One flexible method is to use Kognitio’s external scripting feature.

External scripts are Kognitio’s way of passing data to and from other Linux environments such as Python, Bash and R, described in section 10 of the Kognitio Guide (you can download the documentation from http://www.kognitio.com/forums/viewtopic.php?f=2&t=3/).

If external scripting for Python is already enabled in a Kognitio system, then the following SQL will create a table with randomised data. The SQL and embedded Python code can be modified to specify the number and type of columns, the number of rows, and any desired ranges or characteristics for the data.


create table t4 as select * from
(external script using environment python
sends (a int, b varchar, c date, d timestamp) -- these fields must be consistent with the fields in each row returned by Python
limit 1 threads -- remove this setting to produce larger amounts of data, but be aware of how many threads will run and how the data size will multiply!
script '
import csv, sys, random, string, time
csvwriter = csv.writer(sys.stdout)
for i in range(100): # we want 100 rows of random data
          row = []
          # first field should be an integer between 100 and 1000 (or bigint, tinyint):
          row.append( random.randrange( 100, 1000 ) )
          # then a string field of length 20(varchar, char, etc):
          row.append( "".join(random.SystemRandom().choice(string.ascii_uppercase + string.digits) for _ in range(20)))
          # a date field:
row.append(time.strftime("%Y-%m-%d", time.localtime(random.randrange(1000000000, 1150000000))))
          # and a timestamp field:
row.append(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(random.randrange(1000000000, 1150000000))))
          # send the finished row back to Kognitio:
csvwriter.writerow(row)
'
) fd

 

 

The above SQL produces a table with rows like:

468, 6O6SZVX2NQVWT0VYV69O, 2004-03-09, 2004-10-29, 17:28:35.000000

In the event of errors, it may be helpful to review section 10.4 of the Kognitio Guide, or contact Kognitio support.