# Analytic Functions Overview¶

Using analytic functions in your business Intelligence queries provides the following benefits:

• Improved query processing — using these functions results in better performance, because the system no longer must perform complex procedural processing and instead can perform simple SQL queries.

• Enhanced productivity — you can perform complex analysis with clearer, more concise code. The code is quicker to formulate and easy to maintain.

• Standardized syntax — because these functions are part of the ANSI standard, they are supported in many software packages.

Analytic functions compute an aggregate value based on a group of rows. The group of rows is called a window and is defined by the analytic clause. Analytic functions differ from aggregate functions in that while an aggregate returns one row for each group, analytic functions return all rows in the window.

For each row, a “sliding” window of rows is defined. The window determines the range of rows used to perform the calculations for the “current row”. Window sizes can be based on either a physical number of rows or a numeric interval.

Analytic functions are the last set of operations performed in a query except for the final ORDER BY clause. All joins and all WHERE, GROUP BY, and HAVING clauses are completed before the analytic functions are processed. Therefore, analytic functions can appear only in the SELECT list or ORDER BY clause.

## Example Data Set¶

All analytical function examples are based on the following table and data:

```CREATE TABLE calls (
subscriber VARCHAR(20), -- Calling number
cdate DATE, -- Date of call
ctime TIME, -- Time of call
csecs INTEGER, -- Length of call
ccost DECIMAL(6,2), -- Cost of call
calledno VARCHAR(20)) -- Called number

SELECT * FROM CALLS
```

SUBSCRIBER

CDATE

CTIME

CSECS

CCOST

CALLEDNO

01277824068

2007-09-14

15:32:57

12

0.20

01344300770

01285720653

2007-09-17

15:34:58

200

0.50

01344300770

01285720653

2007-09-18

15:36:58

10

0.20

01344300770

01277824068

2007-09-18

15:38:58

127

0.50

01285720653

01344300770

2007-09-19

15:41:58

110

0.25

01277824068

01277824068

2007-09-14

15:31:57

120

0.25

01344300770

01277824068

2007-09-17

15:33:57

20

0.20

01344300770

01285720653

2007-09-17

15:35:58

1

0.20

01344300770

01285720653

2007-09-18

15:37:58

242

0.50

01344300770

01277824068

2007-09-19

15:39:58

414

0.75

01285720653

# Analytical Function Syntax¶

The Syntax of Ranking and Windowing functions is:

```analytic_function ( [value_expr] [,...] )
OVER ( query_partition_clause
order_by_clause
windowing_clause)
```

## analytic_function¶

Specify the name of an analytic function, Kognitio provides many analytic functions such as AVG, COUNT, DENSE_RANK, FIRST, LAG, LAST, LEAD, MAX, MIN, NTILE, RATIO_TO_REPORT, RANK, ROW_NUMBER, STDDEV, SUM, VARIANCE.

## query_partition_clause¶

```[PARTITION BY column-reference [,...]]
```

The `PARTITION` BY clause logically breaks a single result set into N partitions, according to the criteria set by the partition expressions. The analytic functions are applied to each partition independently; they are reset for each partition.

## order_by_clause¶

```ORDER BY {value-expression [ASC | DESC]} [,...]
```

The `ORDER` BY clause specifies how the data is sorted within each group (partition).

## windowing_clause¶

```[ROWS | RANGE
{window-start} |
{BETWEEN window-bound AND window-bound}]
```

where `window-start` is:

```UNBOUNDED PRECEDING |
literal PRECEDING |
CURRENT ROW |
```

and `window-bound` is:

```UNBOUNDED PRECEDING |
literal PRECEDING |
CURRENT ROW |
literal FOLLOWING |
UNBOUNDED FOLLOWING
```

The windowing clause defines a sliding or anchored window of data, on which the analytic function will operate, within a group. This clause can be used to have the analytic function compute its value based on any arbitrary sliding or anchored window within a group.

The default window is an anchored window that simply starts at the first row of a group and continues to the current row.

Windows can be specified using one of two criteria: a `RANGE` of data values or `ROWS` offset from the current row. The existence of an `ORDER BY` in an analytic function will add a default window clause of ```RANGE UNBOUNDED PRECEDING```, i.e. includes all rows in our partition with a value equal to or less than our value specified by the `ORDER BY` clause.

Note: To simply get all previous rows before you in the window that utilizes an `ORDER BY` clause you need to use `ROW UNBOUNDED PRECEDING`.

## Example 1: Using Row Windows to obtain moving averages¶

Row Windows specify the physical number of rows to include in the window. The following example uses a sliding window within a group and computes the sum of the current row’s CCOST column plus the previous 2 rows in that group:

```SELECT subscriber,
ccost,
CAST(AVG(ccost) OVER (
PARTITION BY subscriber
ORDER BY ccost ROWS 2 PRECEDING)
AS DEC(5,3)) moving_avg
FROM calls
ORDER BY 1, 2, 3;
```

SUBSCRIBER

CCOST

MOVING_AVG

01277824068

0.20

0.200

01277824068

0.20

0.200

01277824068

0.25

0.217

01277824068

0.50

0.317

01277824068

0.75

0.500

01285720653

0.20

0.200

01285720653

0.20

0.200

01285720653

0.50

0.300

01285720653

0.50

0.400

01344300770

0.25

0.250

The `PARTITON BY` clause means the `AVG(ccost)` is computed for each subscriber. The `AVG(ccost)` is “reset” as the subscriber changes. The `ORDER BY ccost` clause sorts the data for each subscriber by `ccost`; this allows the window clause: `ROWS 2 PRECEDING`, to access the 2 rows prior to the current row in a group in order to average the costs.

`CAST` is used to round the moving average to 3 decimal places.

With `ROW` partitions the data may be of any type and the order by may include many columns.

## Example 2: Using Range Windows to count similar records¶

Range windows collect rows together based on a `WHERE` clause. For example `RANGE 5 PRECEDING` will generate a sliding window that has the set of all rows in the group such that they are equal to or within 5 units preceding the value of the current row. These units must be numeric comparisons and it is not valid to use RANGE with data types other than numbers.

The following example counts the calls with a cost that are less than (or equal to) but within £0.50 of the cost of the current call. The range window goes back £0.50 from the current row’s call cost and then counts the rows within this range.

Note: when another row has the same value as the current row it will be included within the range and therefore the count, even if it appears after the current row in the results:

```SELECT subscriber,
ccost,
COUNT(*) OVER(
PARTITION BY SUBSCRIBER
ORDER BY CCOST ASC RANGE 0.5 PRECEDING) Within_50p
FROM CALLS
ORDER BY SUBSCRIBER, CCOST ASC;
```

SUBSCRIBER

CCOST

WITHIN_50P

01277824068

0.20

2

01277824068

0.20

2

01277824068

0.25

3

01277824068

0.50

4

01277824068

0.75

3

01285720653

0.20

2

01285720653

0.20

2

01285720653

0.50

4

01285720653

0.50

4

01344300770

0.25

1

The `PARTITON BY` clause means the `RANGE` and `COUNT` are applied within each subscriber separately. The `ORDER BY ccost` clause sorts the data for each subscriber by `ccost`; this allows the window clause: `RANGE 0.5 PRECEDING`, to access the rows prior to the current row and check if they are within 50p of the cuurent row.

## Example 3: Running Totals and Sub-Totals¶

This example shows how to calculate a “Running Total” for the entire query. This is done using the entire ordered result set. In addition we compute a running sub-total for each subscriber:

```SELECT subscriber,
cdate,
ctime,
ccost,
SUM(ccost) OVER(ORDER BY cdate, ctime) rtot,
SUM(ccost) OVER(PARTITION BY subscriber
ORDER BY cdate, ctime) AS subtot
FROM calls
ORDER BY rtot
```

SUBSCRIBER

CDATE

CTIME

CCOST

RTOT

SUBTOT

01277824068

2007-09-14

15:31:57

0.25

0.25

0.25

01277824068

2007-09-14

15:32:57

0.20

0.45

0.45

01277824068

2007-09-17

15:33:57

0.20

0.65

0.65

01285720653

2007-09-17

15:34:58

0.50

1.15

0.50

01285720653

2007-09-17

15:35:58

0.20

1.35

0.70

01285720653

2007-09-18

15:36:58

0.20

1.55

0.90

01285720653

2007-09-18

15:37:58

0.50

2.05

1.40

01277824068

2007-09-18

15:38:58

0.50

2.55

1.15

01277824068

2007-09-19

15:39:58

0.75

3.30

1.90

01344300770

2007-09-19

15:41:58

0.25

3.55

0.25

In the `rtot` calculation there is no `PARTITION BY` clause so this means the the whole data set is treated as one. The `ORDER BY` cdate,ctime clause sorts the data and gives the running total. This is used in the final `ORDER BY` to order the complete result set.

For the `subtot` calucaltions the `PARTITON BY` clause means the `SUM(ccost)` is computed for each subscriber. The `SUM(ccost)` is started at zero for each subscriber allowing sub-totals to be calculated. The `ORDER BY cdate,ctime` clause sorts the data for each subscriber. From the 4th row we can see that the `subtot` equals `ccost` as this is first row for subscriber 01285720653. Looking at subscriber 01277824068 and comparing the `subtot` for rows 3 and 8; the difference in the `subtot` values is 0.5. This is equal to `ccost` in row 8 because rows 5, 6, 7 belong to another subscriber.