The SELECT statement - full syntax

Use the SELECT statement to retrieve data from a database, and return it in the form of a table of query results.

Usage

[ [SELECT] WITH vname AS (select-statement) [,vname AS...] ]
SELECT [TOP row-count] [ALL | DISTINCT] {* | select-list}
FROM { { {table | view [{TABLESAMPLE [method] (percentage) [REPEATABLE (arg)]}] }
       |  joined-table
       | derived-table
       | literal }
      [correlation] }
, ...
[WHERE search-condition]
[GROUP BY { column-name | column-number | GROUPING SETS (column-list), ... } ]
[HAVING search-condition]
[ORDER BY { {column-name | column-number}
            [ASC | DESC]
            [NULLS {FIRST | LAST | MAX | MIN}], ... }]
[{AT {NOW | FULL_HISTORY | at_mode }]
[FETCH FIRST row-count ROWS ONLY | LIMIT row-count ]

Basic Functionality

As outlined in basic SELECT statement:

The SELECT Clause

The select clause specifies which columns to include. The asterisk (*) specifies “all columns”. If you don’t use the asterisk, you must give the names of the columns to include.

TOP row-count defines the maximum number of rows that are to be returned. The default is to return all rows. Use ORDER BY to specify the sequence the rows are returned. It is also possible to specify the maximum number of rows to be returned by using the FETCH FIRST row-count ROWS ONLY clause.

DISTINCT is a keyword that means, where identical rows exist, only one row is returned. The default is ALL.

Columns are returned in the order in which they are specified in the SELECT clause. If the asterisk is used then the columns are returned in the order that they are defined in the underlying table or tables.

Example 1: SELECT *

Select all columns from the CUST table:

SELECT * FROM cust

Example 2: SELECT DISTINCT

Use the DISTINCT keyword to select the distinct values in the CUSTNO column of the table CUST:

SELECT DISTINCT custno
FROM cust

The FROM Clause

The FROM clause specifies the table(s) or view(s) to select rows from

FROM
{ { { table | view
      [{TABLESAMPLE [method] (percentage) [REPEATABLE (arg)]}]} |
      joined-table | derived-table | literal}
  [correlation]}, ...

A derived table is a SQL construct consisting of a SELECT statement embedded in the FROM clause of another SELECT statement. Derived table support is required for full ANSI-92 SQL compliance. Derived tables are referred to by a variety of names, including: table subqueries, nested queries, and table value constructors (the formal ANSI-92 SQL name).

Derived tables let you develop queries in a modular way. This means that using them is quite similar to using views and view images, but there are obvious advantages for one-off queries. They can also remove duplication and make queries easier to read. Derived tables are discussed in detail in the Kognitio Guide. There are also various examples in this manual.

Example 3: Using literal values in SELECT and FROM

It is also possible to use literal values instead of a table or view:

SELECT 2 * pi() * r AS circumference
FROM ( VALUES (100), (200), (400) ) AS radius(r)

The FROM clause can be completely omitted if the SELECT statement is a constant expression. For example:

SELECT 2 * pi() * 100 AS circumference

Example 4: Using Table Name Aliases

The following three-way join, lists each partkey, name, and supplier for all supplied parts:

SELECT p.p_partkey, s.s_name, p.p_name
FROM part p, supplier s, partsupp ps
WHERE p.p_partkey = ps.ps_partkey AND
      s.s_suppkey = ps.ps_suppkey
ORDER BY 1, 2

The FROM clause lists three tables, and gives an alias for each (p for the PART table, s for the SUPPLIER table, and ps for the PARTSUPP table). The aliases are used to identify the tables in the select-list and also in the WHERE clause.

TABLESAMPLE

The TABLESAMPLE [method] (percentage) [REPEATABLE(arg)] clause causes a sample of the table’s contents to be selected. The method can be either the default SYSTEM, or BERNOULLI, the percentage is a value between 0 and 100 indicating the size of the sample that should be returned and the REPEATABLE(arg) is a positive integer that will ensure repeatability of the results by being used as the random seed that is used to identify the table sample, (i.e. if you use the same argument on the same table you will get the same sample); if arg is zero or omitted then the sample returned will be random.

Note: The BERNOULLI method was implemented because it is part of the SQL standard; however using the BERNOULLI method is likely to be slow due to the requirement to handle identical rows independently.

The WHERE Clause

The WHERE clause gives the join criteria and search-condition (to restrict the number of rows returned).

[WHERE search-condition]

You can use a Join expression instead of, or in addition to, join criteria in the WHERE clause. Note that there is a difference, as join expressions are applied before any WHERE condition.

Example 5: Adding a Search Condition

The following WHERE limits the result set to parts costing less than £100.00:

SELECT p_partkey, p_type, p_retailprice
FROM part
WHERE p_retailprice < 100
ORDER BY 3, 1

You can quickly change the query to find details of the rows that have been omitted (parts costing more than £99.99), by changing the less than operator (<) in the search-condition to the equal to or greater than operator (>=).

Example 6: Self-Join using WHERE

The following self-join shows all pairs of parts that are in the same container. Note that the < clause prevents (x, x) pairs as well as both combinations of (x, y); that is (y, x) and (x, y):

SELECT p1.p_partkey part1,
       p2.p_partkey part2,
       p1.p_container container
FROM part p1, part p2
WHERE p1.p_container = p2.p_container AND
p1.p_partkey < p2.p_partkey
ORDER BY p1.p_container, 1, 2

For more information on self-joins see the Kognitio Guide.

Example 7: Alternative WHERE Conditions

Example 6 has two conditions in the WHERE clause that work in combination to reduce the number of rows returned. Sometimes, you may want to retrieve rows that meet just one out of a series of conditions. To do this use OR. The example selects rows that meet either condition:

SELECT c_custkey, c_name, c_address
FROM customer
WHERE (c_name LIKE '%Brown%') OR
      (c_name LIKE '%Andy%')
ORDER BY 2

Using brackets is not necessary here but in more complex examples their use allows you to collect conditions together.

AT NOW

AT NOW is a Kognitio extension to SQL that shows the current state of a table excluding any currently active transactions that might be altering the contents. AT FULL_HISTORY is primarily for use by the archiving mechanism as it shows all the records in a table—including those that have been updated.

In Kognitio version 8.2.0 onwards, this functionality has been expanded see AT-modes for more details.

The WITH Clause

The WITH clause allows “views” to be temporarily created on a per-query basis. If the vname is the same as an existing table or view, it takes priority over the existing object. WITH clauses cannot be nested but they can refer to previously defined WITH clauses.

WITH Clauses are very similar to derived tables which are discussed in this manual and the Kognitio Guide.

Example 8: using WITH for simplifying queries

Use a WITH clause to simplify a query to count the total number of line items excluding the group of line items that occur most infrequently:

SELECT WITH
-- lncnt counts the occurrences per linenumber
lncnt(ln, c) AS (SELECT l_linenumber,
                 COUNT(l_orderkey)
                 FROM lineitem GROUP BY 1),
-- lnmin determines the line number that occurs least
lnmin(mn) AS (SELECT ln
              FROM lncnt
              WHERE c IN (SELECT MIN(c)
                          FROM lncnt))
-- Main query: return the count of line numbers excluding the minimum
SELECT COUNT(l_linenumber)
FROM lineitem,
     lnmin
WHERE l_linenumber NOT IN (mn);

The GROUP BY Clause

The GROUP BY clause specifies columns to form groups from the rows returned by the SELECT. Each group contains identical values in the specified column(s)

| [GROUP BY {column-name | column-number |
             GROUPING SETS (column-list)}, ...]

A column-name in the GROUP BY clause can refer to any column from any table in the FROM clause. You can use a column-number to reference any column in the select-list—but note that this is a Kognitio extension to SQL.

If you include a GROUP BY clause in a SELECT statement, the columns in the select-list that aren’t included in the grouping must be made up of aggregate functions (for example, AVG, MAX). In other words, the columns not used in the grouping must be reduced to a single value for each group.

GROUPING SETS are discussed in more detail in OLAP Functions

Example 9: Simple GROUP BY

This query counts the numbers of suppliers for each part in the PARTSUPP table. It groups on the part, counts the number of rows (COUNT(*)) and displays the rows in descending order of suppliers:

SELECT ps_partkey AS part, COUNT(*) AS suppliers
FROM partsupp
GROUP BY ps_partkey
ORDER BY suppliers desc, 1

The HAVING Clause

Use the HAVING clause to identify groups of rows that appear in a logical table defined by a SELECT statement

[HAVING search-condition]

The search-condition applies to

  • columns specified within a GROUP BY clause

  • columns created by the use of aggregate functions, or

  • expressions that contain column functions.

If you include a GROUP BY clause before the HAVING clause, the search-condition in the HAVING clause applies to each group formed by like values in the columns specified by the GROUP BY clause.

If you don’t include a GROUP BY clause, then the search-condition in the HAVING clause applies to the single row returned after aggregating the entire logical table defined by the SELECT statement.

Example 10: GROUP BY and HAVING

Where more than one supplier supplies a part, display the maximum and minimum prices:

SELECT ps_partkey part,
       MIN(ps_supplycost),
       MAX(ps_supplycost)
FROM partsupp
GROUP BY 1
HAVING COUNT(*) > 1
ORDER BY 1

Note the search-condition in the HAVING clause does not have to appear in the SELECT list

Example 11: HAVING with No GROUP BY Clause

If HAVING is used in a statement with no GROUP BY clause, then it must be in an aggregated row:

SELECT MAX(o_totalprice) AS totalprice_high
FROM ORDERTAB
HAVING MAX(o_totalprice) > 200

The ORDER BY Clause

Using ORDER BY is the only way to specify the sequence of rows returned from a SELECT statement—if you don’t use an ORDER BY clause, the order of returned results is unspecified

ORDER BY {{column-name | column-number}
[ASC | DESC] [NULLS {FIRST | LAST | MAX | MIN}],...}

You can use the column-name or column-number, or a combination of the two. But, if the column-name is ambiguous, you must use a relative column-number in the clause.

You can specify ascending order (ASC) or descending order (DESC) for each column returned. If neither is specified, ASC is assumed.

You can specify whether NULLS should come before (FIRST/MIN) or after (LAST/MAX) any other values. The default is LAST/MAX.

Note: If you don’t specify the order for a column, Kognitio returns the results in random order. If you run the same query a second time, the random order may be different from that in the first result set.

Example 12: Changing the Sorted Order

Example 5 includes an ORDER BY clause, which sorts the data by p_retailprice (column 3) and then by p_partkey (column 1):

SELECT p_partkey, p_type, p_retailprice
FROM part
WHERE p_retailprice < 100
ORDER BY 3, 1

If you change the ORDER BY clause to:

SELECT p_partkey, p_type, p_retailprice
FROM part
WHERE p_retailprice < 100
ORDER BY p_retailprice, p_partkey

You get the same result, because you are just using column names instead of numbers.

If you changed the ORDER BY clause to:

SELECT p_partkey, p_type, p_retailprice
FROM part
WHERE p_retailprice < 100
ORDER BY p_partkey, p_retailprice

Then you would first sort by p_partkey and then p_retailprice. Since p_partkey is unique, you would never actually sort by price, note that this is not the same as the earlier example where several parts may have the same price, which would then be sorted by part number.

Example 13: Sorting in Descending Order

So far, the majority of result sets have been sorted in ascending order (lower through higher), which is the default. You can also specify descending order (higher through lower) for any column. The statement below sorts the results by p_retailprice in descending order, but any parts with the same value are sorted on p_partkey in ascending order. We also ensure that any p_retailprice values that are NULL are returned first:

SELECT p_partkey, p_type, p_retailprice
FROM part
WHERE p_retailprice < 100
ORDER BY p_retailprice DESC NULLS FIRST, p_partkey