The SELECT statement

Fundemental to using SQL queries to access data is the SELECT statement. There is a vast amount of functionality in this command so we provide basic functioanlity here and details of the full syntax in the DDL.

SELECT [TOP row-count] [ALL | DISTINCT] {* | select-list}
FROM { {table | view} | joined-table | derived-table | literal }
, ...
[WHERE search-condition]
[GROUP BY { column-name | column-number ]
[HAVING having-condition]
[ORDER BY {{ column-name | column-number}
           [ASC | DESC] ]
[FETCH FIRST row-count ROWS ONLY | LIMIT row-count]
  • SELECT clause - specifies which columns to include. The asterisk (*) specifies “all columns”. If you don’t use the asterisk, you must give the names of the columns to include in the select-list. Columns are returned in the order in which they are specified in the SELECT clause. If the asterisk is used then the columns are returned in the order that they are defined in the underlying tables or views.

  • TOP row-count - defines the maximum number of rows that are to be returned. The default is to return all rows. Use ORDER BY to specify the sequence the rows are returned. It is also possible to specify the maximum number of rows to be returned by using the FETCH FIRST row-count ROWS ONLY or LIMIT row-count clause. These are equivalent.

  • DISTINCT - is a keyword that means, where identical rows exist, only one row is returned. The default is ALL.

  • FROM clause - specifies where the SELECT rows come from. This can be:

    1. a table or view

    2. joined-table - specifies a JOIN between 2 or more tables or views. These are very common in SQL. See example 3 below and the types of JOIN available for more details

    3. derived-table - a SQL construct consisting of a SELECT statement embedded in the FROM clause of the SELECT statement. Derived tables are referred to by a variety of names, including: table subqueries, nested queries, and table value constructors (the formal ANSI-92 SQL name). Derived tables let you develop queries in a modular way. Using them is very similar to using views but in one-off queries using derived tables means objects do not have to be created. They can also remove duplication and make queries easier to read. For an example See step 7 in the Getting Started queries

    4. literal - any fixed value, set of values or generated series

  • WHERE clause - gives the search-condition that restricts the number of rows returned. It can also be used for performing joins instead of or in addtion to using a join-table statement. Note that there is a difference, as join-table expressions are always applied before any WHERE condition.

  • GROUP BY - specifies columns to form groups from the rows returned by the SELECT. Each group contains identical values in the specified column(s) and each group will return a single row in the results.

    A column-name in the GROUP BY clause can refer to any column from any table in the FROM clause. You can use a column-number to reference any column in the select-list but note that this is a Kognitio extension to SQL.

    If you include a GROUP BY clause in a SELECT statement, the columns in the select-list that aren’t included in the GROUP BY must be made up of aggregate functions (for example, AVG, MAX). In other words, the columns not used in the GROUP BY must return a single value for each group.

  • HAVING clause - identifies groups of rows that appear in a logical table defined by a SELECT statement: or in other words acts on the results of the SELECT. The having-condition applies to: #. columns specified within a GROUP BY clause #. columns created by the use of aggregate functions, or #. expressions that contain column functions.

    If you include a GROUP BY clause before the HAVING clause, the having-condition applies to each group formed by like values in the columns specified by the GROUP BY clause.

    If you don’t include a GROUP BY clause, then the having-condition applies to the single row returned after aggregating the entire logical table defined by the SELECT statement.

  • ORDER BY Clause - specify the sequence of rows returned from a SELECT statement. If you don’t use an ORDER BY clause, the order of returned results is unspecified.

    You can use the column-name or column-number, or a combination of the two. But, if the column-name is ambiguous, you must use a relative column-number in the clause.

    Specify the order as either ascending (ASC) or descending (DESC) for each column returned. If neither is specified, ASC is assumed.

    You can specify whether NULLS should come before (FIRST/MIN) or after (LAST/MAX) any other values. The default is LAST/MAX.

    Note: If you don’t include a column in ORDER BY, Kognitio returns the results in random order. If you run the same query a second time, the random order may be different from that in the first result set.

Examples

Various examples of different SELECT functionality are given below. These use the Kognitio training data set.

Example 1: SELECT *

Select all columns from the CUST table:

SELECT * FROM cust

Example 2: SELECT DISTINCT

Use the DISTINCT keyword to select the distinct values in the CUSTNO column of the table CUST:

SELECT DISTINCT custno
FROM cust

Example 3: Using Table JOINS and Aliases

The following three-way join, lists each partkey, name, and supplier for all supplied parts:

SELECT p.p_partkey, s.s_name, p.p_name
FROM part p
JOIN  partsupp ps
ON p.p_partkey = ps.ps_partkey
JOIN supplier s
ON s.s_suppkey = ps.ps_suppkey
ORDER BY 1, 2

The FROM clause lists three tables, and gives an alias for each (p for the PART table, s for the SUPPLIER table, and ps for the PARTSUPP table). The aliases are used to identify the tables in the select-list and in the JOIN clause.

An alternative way of writing this query is to use the WHERE clause to perform the JOIN:

SELECT p.p_partkey, s.s_name, p.p_name
FROM part p, supplier s, partsupp ps
WHERE p.p_partkey = ps.ps_partkey
AND s.s_suppkey = ps.ps_suppkey
ORDER BY 1, 2

Example 4: Using literal values in SELECT and FROM

It is also possible to use literal values instead of a table or view:

SELECT 2 * pi() * r AS circumference
FROM ( VALUES (100), (200), (400) ) AS radius(r)

The FROM clause can be completely omitted if the SELECT statement is a constant expression. For example:

SELECT 2 * pi() * 100 AS circumference

Example 5: Adding a Search Condition

The following WHERE limits the result set to parts costing less than £100.00:

SELECT p_partkey, p_type, p_retailprice
FROM part
WHERE p_retailprice < 100
ORDER BY 3, 1

You can quickly change the query to find details of the rows that have been omitted (parts costing more than £99.99), by changing the less than operator (<) in the search-condition to the equal to or greater than operator (>=).

Example 6: Self-Join using WHERE

The following self-join shows all pairs of parts that are in the same container. Note that the < clause prevents (x, x) pairs as well as both combinations of (x, y); that is (y, x) and (x, y):

SELECT p1.p_partkey part1,
       p2.p_partkey part2,
       p1.p_container container
FROM part p1, part p2
WHERE p1.p_container = p2.p_container
AND p1.p_partkey < p2.p_partkey
ORDER BY p1.p_container, 1, 2

Example 7: Alternative WHERE Conditions

Example 6 has two conditions in the WHERE clause that work in combination to reduce the number of rows returned. Sometimes, you may want to retrieve rows that meet just one out of a series of conditions. To do this use OR. The example selects rows that meet either condition:

SELECT c_custkey, c_name, c_address
FROM customer
WHERE (c_name LIKE '%Brown%')
OR (c_name LIKE '%Andy%')
ORDER BY 2

Using brackets is not necessary here but in more complex examples their use allows you to collect conditions together.

Example 8: Simple GROUP BY and ORDER BY

This query counts the numbers of suppliers for each part in the PARTSUPP table. It groups on the part, counts the number of rows (COUNT(*)) and displays the rows in descending order of suppliers:

SELECT ps_partkey AS part, COUNT(*) AS suppliers
FROM partsupp
GROUP BY ps_partkey
ORDER BY suppliers DESC, 1

Example 10: Using GROUP BY and HAVING clauses

Where more than one supplier supplies a part, display the maximum and minimum prices:

SELECT ps_partkey part,
       MIN(ps_supplycost),
       MAX(ps_supplycost)
FROM partsupp
GROUP BY 1
HAVING COUNT(*) > 1
ORDER BY 1

Note the having-condition in the HAVING clause does not have to appear in the SELECT list.

Example 11: HAVING with No GROUP BY Clause

If HAVING is used in a statement with no GROUP BY clause, then it must be in an aggregated row:

SELECT MAX(o_totalprice) AS totalprice_high
FROM ORDERTAB
HAVING MAX(o_totalprice) > 200

Example 12: Changing the Sorted Order

Example 5 includes an ORDER BY clause, which sorts the data by p_retailprice (column 3) and then by p_partkey (column 1):

SELECT p_partkey, p_type, p_retailprice
FROM part
WHERE p_retailprice < 100
ORDER BY 3, 1

If you change the ORDER BY clause to:

SELECT p_partkey, p_type, p_retailprice
FROM part
WHERE p_retailprice < 100
ORDER BY p_retailprice, p_partkey

You get the same result, because you are just using column names instead of numbers.

If you changed the ORDER BY clause to:

SELECT p_partkey, p_type, p_retailprice
FROM part
WHERE p_retailprice < 100
ORDER BY p_partkey, p_retailprice

Then you would first sort by p_partkey and then p_retailprice. Since p_partkey is unique, you would never actually sort by price, note that this is not the same as the earlier example where several parts may have the same price, which would then be sorted by part number.

Example 13: Sorting in Descending Order

So far, the majority of result sets have been sorted in ascending order (lower through higher), which is the default. You can also specify descending order (higher through lower) for any column. The statement below sorts the results by p_retailprice in descending order, but any parts with the same value are sorted on p_partkey in ascending order. We also ensure that any p_retailprice values that are NULL are returned first:

SELECT p_partkey, p_type, p_retailprice
FROM part
WHERE p_retailprice < 100
ORDER BY p_retailprice DESC NULLS FIRST, p_partkey