Changing RAM requirements

By default, Kognitio external scripts are allocated 150MB of address space per invocation, but a script can be created, or invoked in the case of an anonymous script, with a specified amount of memory.

The less memory a script is allocated, the more copies of it Kognitio can run in parallel, but the script still needs enough memory to do whatever processing is required.

To determine how much memory to allocate, we can give the script more memory than it needs and then measure its peak usage at the end of the script. To do this we look in the system file /proc/<pid>/status, which contains information about the process. The field we are interested in is VmPeak, which records the peak virtual memory size of the process in kB.

The anonymous external script below loads some basic libraries and outputs the peak usage so far. It then allocates a large array to show how this affects the memory usage. The script prints the information, which then appears in the results, and also writes it to stderr, which appears in the sys.ipe_script_debug table. When using this technique in your own script, write only to stderr so that the information stays out of your results.

EXTERNAL SCRIPT USING ENVIRONMENT local_python
SENDS(mem_usage varchar)
OUTPUT 'fmt_field_separator "~"'
LIMIT 1 THREADS
REQUIRES 300 MB RAM
SCRIPT S'python(
import os, re, math, sys
#
# function to return the minimum amount of memory that needs to
# be allocated in the external script REQUIRES clause
#
def getMinimumRequires():
  pid = os.getpid()
  with open("/proc/{}/status".format(pid)) as f:
    for line in f:
      if(line.startswith("VmPeak")):
        kb = re.sub(r".*:\s*(\d*)\s.*\s", r"\1", line)
        return(int(math.ceil(float(kb) / 1024)))

print("With minimum libraries = {} MB".format(getMinimumRequires()))
sys.stderr.write("With minimum libraries = {} MB\n".format(getMinimumRequires()))

# allocate a 10M element array
buckets=[0]*10000000

print("After allocating array = {} MB".format(getMinimumRequires()))
sys.stderr.write("After allocating array = {} MB\n".format(getMinimumRequires()))
)python';

If you run the above script you will obtain results like:

MEM_USAGE
With minimum libraries = 24 MB
After allocating array = 101 MB

For this script to run successfully, the REQUIRES n MB RAM clause must be set to at least 101. If it is set lower, the line that allocates the buckets array will fail.

The same measurement can be made from an R script:

EXTERNAL SCRIPT USING ENVIRONMENT local_R
SENDS(mem_usage varchar)
OUTPUT 'fmt_field_separator "~"'
LIMIT 1 THREADS
REQUIRES 300 MB RAM
SCRIPT S'EOF(
getMinimumRequires = function() {
  pid = Sys.getpid()
  filename = sprintf("/proc/%d/status", pid)
  con = file(filename, "r")
  # close the connection however the function exits
  on.exit(close(con))
  pattern <- "(.*):\\s*(\\d*).*"
  while ( TRUE ) {
    line = readLines(con, n = 1)
    if ( length(line) == 0 ) {
      break
    }
    if(identical(gsub(pattern, "\\1", line, perl=TRUE), "VmPeak")) {
       return(ceiling(as.numeric(gsub(pattern, "\\2", line, perl=TRUE)) / 1024))
    }
  }
}
print(sprintf("With minimum libraries = %d MB", getMinimumRequires()))
write(sprintf("With minimum libraries = %d MB", getMinimumRequires()), stderr())

result <- vector(mode = "list", 10000000)

print(sprintf("After allocating vector = %d MB", getMinimumRequires()))
write(sprintf("After allocating vector = %d MB", getMinimumRequires()), stderr())
)EOF';

If you run the above script you will obtain results like:

MEM_USAGE
With minimum libraries = 166 MB
After allocating vector = 240 MB

For this script to run successfully, the REQUIRES n MB RAM clause must be set to at least 240. If it is set lower, the line that allocates the result vector will fail.

You can obtain the output from stderr using the SQL query:

SELECT *
FROM sys.ipe_script_debug
WHERE tno = (SELECT max(tno)
             FROM sys.ipe_script_debug)
ORDER BY pid, seq;
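Once the debug messages have been retrieved, the memory figures can be pulled out of the message text programmatically. The following is a minimal Python sketch, not part of Kognitio itself. It assumes the messages are already available as plain strings and follow the "label = n MB" shape shown in the sample results. The name peak_mb_from_messages is illustrative, and no assumption is made about the column layout of sys.ipe_script_debug.

```python
import re

# Assumed message shape, taken from the sample results: "<label> = <n> MB".
_MB_PATTERN = re.compile(r"=\s*(\d+)\s*MB\s*$")


def peak_mb_from_messages(messages):
    """Return the largest MB figure found in the messages, or None if there is none."""
    values = []
    for message in messages:
        match = _MB_PATTERN.search(message)
        if match:
            values.append(int(match.group(1)))
    return max(values) if values else None
```

Messages that do not match the assumed shape are skipped, so unrelated stderr output from the script does not affect the result.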

To get the minimum memory requirement for your own script, copy the getMinimumRequires function into it and put the line that writes the result of getMinimumRequires() to stderr at the end of the program.
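If a Python script has several exit paths, one way to make sure the measurement is still written last is to register the stderr write with the standard atexit module. This is a minimal sketch under stated assumptions rather than a Kognitio-recommended pattern. It assumes a Linux /proc filesystem and a normal interpreter exit. The names get_minimum_requires and report_peak are illustrative, and the optional status_path parameter exists only so the function can be exercised against a sample file.

```python
import atexit
import math
import os
import sys


def get_minimum_requires(status_path=None):
    """Return the VmPeak value in whole MB (rounded up), or None if unavailable."""
    if status_path is None:
        status_path = "/proc/{}/status".format(os.getpid())
    try:
        with open(status_path) as f:
            for line in f:
                if line.startswith("VmPeak:"):
                    # The line looks like "VmPeak:    123456 kB".
                    kb = int(line.split()[1])
                    return int(math.ceil(kb / 1024.0))
    except OSError:
        # No /proc entry (for example, not running on Linux).
        pass
    return None


def report_peak(stream=sys.stderr):
    """Write the peak usage to the given stream (stderr by default)."""
    stream.write("Peak memory = {} MB\n".format(get_minimum_requires()))


# Register once near the top of the script; the function runs at interpreter exit.
atexit.register(report_peak)
```

Handlers registered with atexit are not run if the process is killed by a signal or calls os._exit(), so a script that is terminated that way will not report a value.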

The above example is very simple and will always use the same amount of memory. Your script may use different amounts depending on the data it processes. If so, use this technique to get a baseline value and increase it by a suitable margin, then examine the values over time to determine a safe minimum value.
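The arithmetic for turning a set of observed peaks into a REQUIRES value can be sketched as follows. The function name suggested_requires_mb and the 25% default margin are illustrative assumptions, not Kognitio recommendations. Choose a margin that reflects how much your script's memory use varies with its input.

```python
import math


def suggested_requires_mb(observed_peaks_mb, margin=0.25):
    """Return the largest observed peak plus a fractional margin, rounded up to whole MB."""
    if not observed_peaks_mb:
        raise ValueError("at least one observed peak is needed")
    if margin < 0:
        raise ValueError("margin must not be negative")
    return int(math.ceil(max(observed_peaks_mb) * (1.0 + margin)))


# Peaks of 101, 96 and 110 MB over three runs: 110 * 1.25 = 137.5, rounded up.
print(suggested_requires_mb([101, 96, 110]))  # prints 138
```

The result is the number to use as n in the REQUIRES n MB RAM clause. Re-run the calculation as more measurements accumulate.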