Analytic Functions Overview

Using analytic functions in your business Intelligence queries provides the following benefits:

  • Improved query processing — using these functions results in better performance, because the system no longer must perform complex procedural processing and instead can perform simple SQL queries.

  • Enhanced productivity — you can perform complex analysis with clearer, more concise code. The code is quicker to formulate and easy to maintain.

  • Standardized syntax — because these functions are part of the ANSI standard, they are supported in many software packages.

Analytic functions compute an aggregate value based on a group of rows. The group of rows is called a window and is defined by the analytic clause. Analytic functions differ from aggregate functions in that while an aggregate returns one row for each group, analytic functions return all rows in the window.

For each row, a “sliding” window of rows is defined. The window determines the range of rows used to perform the calculations for the “current row”. Window sizes can be based on either a physical number of rows or a numeric interval.

Analytic functions are the last set of operations performed in a query except for the final ORDER BY clause. All joins and all WHERE, GROUP BY, and HAVING clauses are completed before the analytic functions are processed. Therefore, analytic functions can appear only in the SELECT list or ORDER BY clause.

Example Data Set

All analytical function examples are based on the following table and data:

CREATE TABLE calls (
    subscriber VARCHAR(20), -- Calling number
    cdate DATE, -- Date of call
    ctime TIME, -- Time of call
    csecs INTEGER, -- Length of call
    ccost DECIMAL(6,2), -- Cost of call
    calledno VARCHAR(20)) -- Called number

SELECT * FROM CALLS

SUBSCRIBER

CDATE

CTIME

CSECS

CCOST

CALLEDNO

01277824068

2007-09-14

15:32:57

12

0.20

01344300770

01285720653

2007-09-17

15:34:58

200

0.50

01344300770

01285720653

2007-09-18

15:36:58

10

0.20

01344300770

01277824068

2007-09-18

15:38:58

127

0.50

01285720653

01344300770

2007-09-19

15:41:58

110

0.25

01277824068

01277824068

2007-09-14

15:31:57

120

0.25

01344300770

01277824068

2007-09-17

15:33:57

20

0.20

01344300770

01285720653

2007-09-17

15:35:58

1

0.20

01344300770

01285720653

2007-09-18

15:37:58

242

0.50

01344300770

01277824068

2007-09-19

15:39:58

414

0.75

01285720653

Analytical Function Syntax

The Syntax of Ranking and Windowing functions is:

analytic-function ( [value-expr] [,...] )
OVER ( query-partition-clause
       order-by-clause
       windowing-clause)

analytic-function

Specify the name of an analytic function, Kognitio provides many analytic functions such as AVG, COUNT, DENSE_RANK, FIRST, LAG, LAST, LEAD, MAX, MIN, NTILE, RATIO_TO_REPORT, RANK, ROW_NUMBER, STDDEV, SUM, VARIANCE.

query-partition-clause

[PARTITION BY column-reference [,...]]

The PARTITION BY clause logically breaks a single result set into N partitions, according to the criteria set by the partition expressions. The analytic functions are applied to each partition independently; they are reset for each partition.

order-by-clause

ORDER BY {value-expression [ASC | DESC]} [,...]

The ORDER BY clause specifies how the data is sorted within each group (partition).

windowing-clause

[ROWS | RANGE
{window-start} |
{BETWEEN window-bound AND window-bound}]

where window-start is:

UNBOUNDED PRECEDING |
literal PRECEDING |
CURRENT ROW |

and window-bound is:

UNBOUNDED PRECEDING |
literal PRECEDING |
CURRENT ROW |
literal FOLLOWING |
UNBOUNDED FOLLOWING

The windowing clause defines a sliding or anchored window of data, on which the analytic function will operate, within a group. This clause can be used to have the analytic function compute its value based on any arbitrary sliding or anchored window within a group.

The default window is an anchored window that simply starts at the first row of a group and continues to the current row.

Windows can be specified using one of two criteria: RANGES of data values or ROWS offset from the current row. The existence of an ORDER BY in an analytic function will add a default window clause of RANGE UNBOUNDED PRECEDING, i.e. includes all rows in our partition with a value equal to or less than our value specified by the ORDER BY clause.

Note: To simply get all previous rows before you in the window that utilizes an ORDER BY clause you need to use ROW UNBOUNDED PRECEDING.

Example 1: Using Row Windows to obtain moving averages

Row Windows specify the physical number of rows to include in the window. The following example uses a sliding window within a group and computes the sum of the current row’s CCOST column plus the previous 2 rows in that group:

SELECT subscriber,
       ccost,
       CAST(AVG(ccost) OVER (
               PARTITION BY subscriber
               ORDER BY ccost ROWS 2 PRECEDING)
            AS DEC(5,3)) moving_avg
FROM calls
ORDER BY 1, 2, 3;

SUBSCRIBER

CCOST

MOVING_AVG

01277824068

0.20

0.200

01277824068

0.20

0.200

01277824068

0.25

0.217

01277824068

0.50

0.317

01277824068

0.75

0.500

01285720653

0.20

0.200

01285720653

0.20

0.200

01285720653

0.50

0.300

01285720653

0.50

0.400

01344300770

0.25

0.250

The PARTITON BY clause means the AVG (CCOST) is computed for each subscriber. The AVG (ccost) is ‘ reset ‘ as the subscriber changes. The ORDER BY CCOST clause sorts the data for each subscriber by CCOST; this allows the window clause: ROWS 2 PRECEDING, to access the 2 rows prior to the current row in a group in order to average the costs.

CAST is used to round the moving average to 3 decimal places.

With ROW partitions the data may be of any type and the order by may include many columns.

Example 2: Using Range Windows to count similar records

Range windows collect rows together based on a WHERE clause. For example RANGE 5 PRECEDING will generate a sliding window that has the set of all rows in the group such that they are equal to or within 5 units preceding the value of the current row. These units must be numeric comparisons and it is not valid to use RANGE with data types other than numbers.

The following example counts the calls with a cost that are less than (or equal to) but within £0.50 of the cost of the current call. The range window goes back £0.50 from the current row’s call cost and then counts the rows within this range.

Note: when another row has the same value as the current row it will be included within the range and therefore the count, even if it appears after the current row in the results:

SELECT subscriber,
       ccost,
       COUNT(*) OVER(
           PARTITION BY SUBSCRIBER
           ORDER BY CCOST ASC RANGE 0.5 PRECEDING) Within_50p
FROM CALLS
ORDER BY SUBSCRIBER, CCOST ASC;

SUBSCRIBER

CCOST

WITHIN_50P

01277824068

0.20

2

01277824068

0.20

2

01277824068

0.25

3

01277824068

0.50

4

01277824068

0.75

3

01285720653

0.20

2

01285720653

0.20

2

01285720653

0.50

4

01285720653

0.50

4

01344300770

0.25

1

The PARTITON BY clause means the range and count are applied within each subscriber separately. The ORDER BY CCOST clause sorts the data for each subscriber by CCOST; this allows the window clause: RANGE 0.5 PRECEDING, to access the rows prior to the current row and check if they are within 50p of the cuurent row.

Example 3: Running Totals and Sub-Totals

This example shows how to calculate a “Running Total” for the entire query. This is done using the entire ordered result set. In addition we compute a running sub-total for each subscriber:

SELECT subscriber, cdate, ctime, ccost,
    SUM(ccost) OVER(
        ORDER BY cdate, ctime) rtot,
    SUM(ccost) OVER(
        PARTITION BY subscriber
        ORDER BY cdate, ctime) AS subtot
FROM calls
ORDER BY rtot

SUBSCRIBER

CDATE

CTIME

CCOST

RTOT

SUBTOT

01277824068

2007-09-14

15:31:57

0.25

0.25

0.25

01277824068

2007-09-14

15:32:57

0.20

0.45

0.45

01277824068

2007-09-17

15:33:57

0.20

0.65

0.65

01285720653

2007-09-17

15:34:58

0.50

1.15

0.50

01285720653

2007-09-17

15:35:58

0.20

1.35

0.70

01285720653

2007-09-18

15:36:58

0.20

1.55

0.90

01285720653

2007-09-18

15:37:58

0.50

2.05

1.40

01277824068

2007-09-18

15:38:58

0.50

2.55

1.15

01277824068

2007-09-19

15:39:58

0.75

3.30

1.90

01344300770

2007-09-19

15:41:58

0.25

3.55

0.25

In the RTOT calculation there is not PARTITION BY clause so this means the the whole data set is treated as one. The ORDER BY cdate,ctime clause sorts the data and gives the running total RTOT. This is used in the final ORDER BY to order the complete result set.

For the SUBTOT calucaltions the PARTITON BY clause means the SUM(CCOST) is computed for each subscriber. The SUM(ccost) is started at zer0 for each subscriber allowing sub-totals to be calculated. The ORDER BY cdate,ctime clause sorts the data for each subscriber. From the 4th row we can see that the SUBTOT equals CCOST as this is first row for SUBSCRIBER 01285720653. Looking at SUBSCRIBER 01277824068 and comparing the SUBTOT for rows 3 and 8; the difference in the SUBTOT values is 0.5. This is equal to CCOST in row 8 because rows 5, 6, 7 belong to another SUBSCRIBER.