Ordering Input Data

Ordering is generally used in conjunction with partitioning in Kognitio external scripts to ensure the data within a given partition is passed to the script invocation in a specified order. This means the code within the script does not have to perform any sorting of its own.

Kognitio is extremely fast at data manipulation due to its use of in-memory massively parallel processing. It therefore makes sense to do any required sorting in Kognitio before passing the data to an external script process.

When the ORDER BY order-column(s) clause is used in the external script syntax on its own (without partitioning), the data from the local ram-store that forms the input stream is ordered before the corresponding script invocation receives it.

When ORDER BY order-column(s) is used in conjunction with PARTITION BY part-column(s), the data forming the input to the external script is simply sorted by part-column(s), order-column(s) prior to being received by the script invocations. This ensures that the input data rows are received in the correct partitions and that the rows within each partition are ordered correctly.
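
As an illustration of what this allows on the script side, the following is a minimal Python sketch and not Kognitio-supplied code. It assumes the script receives its rows as CSV text, already sorted by a partition column followed by an order column. The column names (`customer_id`, `event_time`), the sample data and the function name are all hypothetical. Because the rows arrive pre-sorted, the script can process each partition as a stream and never has to sort or buffer it.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter


def first_and_last_per_partition(lines):
    """Stream CSV rows already sorted by (customer_id, event_time).

    Yields (customer_id, first_event_time, last_event_time) for each
    partition without sorting or holding a whole partition in memory.
    """
    reader = csv.reader(lines)
    # groupby only merges *adjacent* rows with equal keys, so it relies
    # on the input arriving grouped by the partition column.
    for customer_id, rows in groupby(reader, key=itemgetter(0)):
        first = next(rows)
        last = first
        for last in rows:  # walk to the final row of the partition
            pass
        yield customer_id, first[1], last[1]


# Hypothetical input, shaped as it might arrive when sorted by
# customer_id (partition column) and then event_time (order column).
sample = io.StringIO(
    "c1,2024-01-01\n"
    "c1,2024-01-05\n"
    "c2,2024-01-02\n"
    "c2,2024-01-03\n"
    "c2,2024-01-09\n"
)
result = list(first_and_last_per_partition(sample))
```

In a real script the same function could be given `sys.stdin` instead of the in-memory sample, assuming the input is delivered as CSV on standard input.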

It is important to remember that Kognitio is a parallel system in which queries are made up of a number of processes. Even when results appear to be fully ordered, there is no guarantee that the order will be identical the next time the query is run, unless the SQL uses an ORDER BY clause on fields that uniquely identify each result row. The same logic applies when using ORDER BY in external scripts. For a script invocation to receive data in an exactly repeatable order, the ORDER BY clause in the external script syntax must uniquely identify each row of script input data within its partition.
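
The effect of a non-unique ordering key can be modelled with a small Python sketch. This is an analogy only: it makes no claim about how Kognitio sorts internally, and the rows and key functions are hypothetical. Two "runs" contain the same rows arriving in different orders, as might happen when parallel processes deliver data differently. Sorting on a key with ties gives different sequences for the two runs, while adding a column that makes the key unique gives the same sequence both times.

```python
def ordered(rows, key):
    # Python's sorted() is stable: rows with equal keys keep their
    # arrival order, so ties expose whatever order the rows arrived in.
    return sorted(rows, key=key)


# Hypothetical rows: (partition, order_value, row_id)
run_a = [("p1", 10, "x"), ("p1", 10, "y"), ("p1", 20, "z")]
# Same rows, different arrival order.
run_b = [("p1", 10, "y"), ("p1", 20, "z"), ("p1", 10, "x")]


def by_order_only(row):
    # Key with ties: two rows share (partition, order_value).
    return (row[0], row[1])


def by_unique_key(row):
    # Adding row_id makes the key unique within the partition.
    return (row[0], row[1], row[2])


non_unique_a = ordered(run_a, by_order_only)
non_unique_b = ordered(run_b, by_order_only)
unique_a = ordered(run_a, by_unique_key)
unique_b = ordered(run_b, by_unique_key)

print(non_unique_a == non_unique_b)  # tied rows differ between runs
print(unique_a == unique_b)          # order is repeatable
```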