Skip to content

6. Advanced SQL

6.1. Introduction

In this chapter, we present

  • additional syntaxes of the SELECT statement that make it a very powerful query command, particularly for querying multiple tables at once.
  • extended syntaxes of commands already covered

To illustrate the various orders, we will work with the following tables used for order management in a small-to-medium-sized book distribution company:

6.1.1. the CLIENTS table

It stores information about the company's customers:

 

Image

ID
A unique identifier for the customer - primary key
NAME
customer name
STATUS
I=Individual, E=Business, A=Government
FIRST NAME
first name for an individual
CONTACT
Name of the contact person at the client’s location (in the case of a company or government agency)
STREET
Client address - street
CITY
City
ZIP
ZIP
Phone
Phone
SINCE
Since when have you been a customer?
DEBTOR
Y (Yes) if the customer owes money to the company and N (No) otherwise.

6.1.2. The ARTICLES table

It stores information about the products sold, in this case books. Its structure is as follows:

Image

ISBN
A unique identifier for a book (ISBN = International Standard Book Number) - primary key
TITLE
Title of the book
PUBLISHER_CODE
Code uniquely identifying a publisher
AUTHOR
Author's name
SUMMARY
Book summary
QTY
Quantity sold this year
QTEANPREC
Quantity sold the previous year
LAST SALE
Date of last sale
QTERECUE
Quantity of the last delivery
LASTDELIVERY
Date of last delivery
SELLPRICE
Selling price
COST
Purchase cost
MINCDE
Minimum Order Quantity
MINSTOCK
Minimum stock level
QTESTOCK
Quantity in stock

Its contents could be as follows:

Image

6.1.3. the ORDERS table

It stores information about orders placed by customers. Its structure is as follows:

Image

NOCMD
A unique identifier for an order - primary key
CUSTID
Customer ID for this order - foreign key - reference to CUSTOMERS(ID)
ORDER_DATE
Date this order was entered
CANCELED
O (Yes) if the order was canceled and N (No) otherwise.

Image

6.1.4. The DETAILS table

It contains the details of an order, i.e., the titles and quantities of the books ordered. Its structure is as follows:

Image

NOCMD
Order number - foreign key referencing the NOCMD column of the COMMANDES table
ISBN
Ordered book number - foreign key referencing the ISBN column in the BOOKS table
QTE
Ordered quantity

Its contents could be as follows:

Image

Above, we see that order No. 3 (NOCMD) involves three books. This means that the customer ordered three books at the same time. This customer’s records can be found in the [ORDERS] table, where we see that order No. 3 was placed by customer No. 5. The [CUSTOMERS] table tells us that customer No. 5 is the company NetLogos in Segré.

6.2. The SELECT Statement

Here, we aim to deepen our understanding of the SELECT statement by introducing new syntaxes for it.

6.2.1. Syntax of a multi-table query

syntax
SELECT column1, column2, ...
FROM table1, table2, ..., tablep
WHERE condition
ORDER BY ...
action
The novelty here is that the columns column1, column2, ... come from multiple tables table1, table2, ... If two tables have columns with the same name, the ambiguity is resolved using the notation tablei.columnj. The condition can apply to columns from different tables.

How it works

1
The Cartesian product of tables table1, table2, ..., tablep is constructed. If n_i is the number of rows in table_i, the resulting table has n₁*n₂*...*n_p rows containing all the columns from the different tables.
2
The WHERE condition is applied to this table. A new table is thus produced
3
This table is sorted according to the method specified in ORDER.
4
The columns specified after SELECT are displayed.

Examples

We use the tables presented earlier. We want to see the details of orders placed after September 25:

SQL>select details.nocmd, isbn, qte from orders, details
  where orders.order_date > '25-sep-91'
  and details.nocmd=orders.nocmd

Image

Note that after FROM, we list the names of all the tables whose columns we are referencing. In the previous example, the selected columns all belong to the DETAILS table. However, the condition references the ORDERS table. Hence the need to list the latter after FROM. The operation that tests for equality between columns in two different tables is often called an equijoin.

The SELECT query could also have been written as follows:

SQL> select details.nocmd, isbn, qte from commandes
    inner join details on details.nocmd=commandes.nocmd
    where orders.order_date > '25-Sep-91'

Let’s continue with our examples. We want the same result as before but with the title of the ordered book, rather than its ISBN:

SQL>select orders.order_id, items.title, details.quantity
  from orders, items, details
  where orders.order_date > '25-Sep-91'
  and details.orderNo = orders.orderNo
  and details.isbn=articles.isbn

Image

The same result is obtained with the following SQL query, which is less readable:

SQL> select details.order_id, articles.title, details.quantity from details
    inner join orders on details.order_id=orders.order_id
    inner join articles on details.isbn=articles.isbn
    where orders.order_date > '25-Sep-91'

Above, two inner joins are performed with the [DETAILS] table:

  • one with the [ORDERS] table to access the order date of a book
  • one with the [ARTICLES] table to access the title of the ordered book

We also want the name of the customer placing the order:

SQL>select orders.order_id, items.title, qty, customers.name
  from orders,details,items,customers
  where orders.order_date > '25-Sep-91'
  and details.orderID=orders.orderID
  and details.isbn = items.isbn
  and orders.customer_id = customers.id

Image

We also want the order dates and a display of these dates in descending order:

SQL>select orders.orderId, orders.orderDate, items.title, qty, customers.name
    from orders, details, items, customers
    where orders.order_date > '25-Sep-91'
    and details.orderId=orders.orderId
    and details.isbn = articles.isbn
    and orders.customer_id = customers.id
    order by orders.order_date descending

Image

Here are a few rules to follow when creating joins:

  1. After SELECT, list the columns you want to display. If the column exists in multiple tables, precede it with the table name.
  2. After FROM, list all the tables that will be queried by the SELECT statement, i.e., the tables containing the columns listed after SELECT and WHERE.

6.2.2. Self-join

We want to find the books that have a higher retail price than the book 'Using SQL':

SQL>select a.title from articles a, articles b
  where b.title='Using SQL'
  and a.selling_price > b.selling_price

Image

The two tables in the join are identical here: the articles table. To distinguish them, we give them aliases: from articles a, articles b. The alias for the first table is a, and the alias for the second is b. This syntax can be used even if the tables are different. When using an alias, it must be used throughout the SELECT statement in place of the table it refers to.

6.2.3. Outer Join

We want to identify the customers who made a purchase in September, along with the order date. The other customers are displayed without this date:

SQL>SELECT clients.name, orders.order_date FROM clients
    LEFT OUTER JOIN orders  ON clients.id = orders.idcli
    where order_date between '01-Sep-91' and '30-Sep-91'

Image

It is surprising here that we do not get the correct result. We should have all the customers present in the [CLIENTS] table, which is not the case. When we think about how an outer join works, we realize that customers who haven’t made a purchase have been matched with an empty row in the ORDERS table and therefore with an empty date (a NULL value in SQL terminology). This date doesn’t meet the condition set for the date, so the corresponding customer isn’t displayed. Let’s try something else:

SQL>select clients.name, orders.order_date from clients
    left outer join orders  on customers.id=orders.id
    where (orders.order_date between '01-sep-91' and '30-sep-91')
        or (orders.order_date is null)

Image

This time, we get the correct answer to our question.

6.2.4. Nested queries

syntax
SELECT column[s] FROM table[s]
WHERE expression query operator
ORDER BY ...
How it works
A query is a SELECT statement that returns a set of 0, 1, or more values. We then have a WHERE condition of the type
expression operator (val1, val2, ..., vali)
expression and vali must be of the same type. If the query returns a single value, we are reduced to a condition of the type
expression operator value
which we are familiar with. If the query returns a list of values, we can use the following operators:
IN
expression IN (val1, val2, ..., vali): true if expression evaluates to one of the elements in the list vali.
NOT IN
opposite of IN
ANY
must be preceded by =, !=, >, >=, <, <=
expression >= ANY (val1, val2, .., valn): true if expression is >= one of the values vali in the list
ALL
must be preceded by =, !=, >, >=, <, <=
expression >= ALL (val1, val2, .., valn): true if expression is >= all valid values in the list
EXISTS
query: true if the query returns at least one row.

Examples

Let’s revisit the question already solved by an equijoin: display the titles with a selling price higher than that of the book 'Using SQL'.

SQL>select title from ARTICLES
    where selling_price > (select selling_price from ARTICLES where title='Using SQL')

Image

This solution seems more intuitive than the equijoin. We perform an initial filter with a SELECT, then a second one on the resulting set. We can perform several filters in sequence this way.

We want to find the titles with a selling price higher than the average selling price:

SQL> select title from ARTICLES
    where selling_price > (select avg(selling_price) from ARTICLES)

Image

Which customers ordered the titles returned by the previous query?

SQL> select distinct idcli from ORDERS, DETAILS
    where DETAILS.isbn in
    (select isbn from ARTICLES where selling_price
        > (select avg(selling_price) from ITEMS))
    and ORDERS.order_id = DETAILS.order_id

Image

Explanation

  1. We select from the DETAILS table the ISBN codes found among the books with a price higher than the average book price.
  2. In the rows selected in the previous step, the customer ID (IDCLI) is not present. It is found in the ORDERS table. The link between the two tables is established via the order number (NOCMD), hence the join condition ORDERS.nocmd=DETAILS.nocmd.
  3. A single customer may have purchased one of the relevant books multiple times, in which case their IDCLI code would appear multiple times. To avoid this, we place the DISTINCT keyword after SELECT. DISTINCT generally eliminates duplicates from the rows returned by a SELECT query.
  4. To retrieve the customer's name, we would need to perform an additional join between the ORDERS and CUSTOMERS tables, as shown in the following query.
SQL> select distinct CLIENTS.name from ORDERS, DETAILS, CLIENTS
    where DETAILS.isbn in
    (select isbn from ITEMS where selling_price
        > (select avg(selling_price) from ITEMS))
    and ORDERS.order_id = DETAILS.order_id
    and ORDERS.CUSTID=CUSTOMERS.ID

Image

Find customers who haven't placed an order since September 24:

SQL>select name from CUSTOMERS
    where clients.id not in
    (select distinct orders.customer_id from orders where order_date >= '24-Sep-91')

Image

We have seen that rows can be filtered in ways other than using the WHERE clause: by using the HAVING clause in conjunction with the GROUP BY clause. The HAVING clause filters groups of rows.

Just like with the WHERE clause, the syntax


     HAVING expression operator query 

is possible, with the constraint already mentioned that expression must be one of the expressions expri in the clause


     GROUP BY expr1, expr2, ...

Examples

What are the sales figures for books priced over 200F?

First, let’s display the quantities sold by title:

SQL>select ARTICLES.title, sum(qty) QTY from ARTICLES, DETAILS
    where DETAILS.isbn=ARTICLES.isbn
    group by title

Image

Now, let’s filter the titles:

SQL> select ARTICLES.title, sum(qty) QTY from ARTICLES, DETAILS
    where DETAILS.isbn=ARTICLES.isbn
    group by title
    where title is in (select title from ARTICLES where retail_price > 200)

Image

Perhaps more obviously, we could have written:

SQL>select ARTICLES.title, sum(qty) QTY from ARTICLES, DETAILS
    where DETAILS.isbn=ARTICLES.isbn
    and ARTICLES.selling_price > 200
    group by title

Image

6.2.5. Nested Queries

In the case of nested queries, there is a parent query (the outermost query) and a child query (the innermost query). The parent query is evaluated only after the child query has been fully evaluated.

Correlated queries have the same syntax, with the following minor difference: the child query performs a join on the parent query’s table. In this case, the parent-child query pair is evaluated repeatedly for each row in the parent table.

Example

Let’s revisit the example where we want the names of customers who haven’t placed an order since September 24:

SQL> 
select name from customers
    where not exists
        (select customer_id from orders
            where order_date >= '24-Sep-91'
                and orders.customer_id = customers.id)

Image

The parent query operates on the customers table. The child query performs a join between the customers and orders tables. This is therefore a correlated query. For each row in the clients table, the child query runs: it searches for the client’s ID in orders placed after September 24. If it finds none (not exists), the client’s name is displayed. Then, it moves on to the next row in the clients table.

6.2.6. Criteria for Writing the SELECT Statement

We have seen, on several occasions, that it is possible to obtain the same result using different SELECT statements. Let’s take an example: Display customers who have placed an order:

Join

SQL>
select distinct name from customers,orders
    where customers.id = orders.id

Image

Nested queries

SQL> 
SELECT name FROM customers
    where id in (select idcli from orders)

returns the same result.

Correlated queries

SQL>
SELECT name FROM customers
    where exists (select * from orders where orders.idcli=clients.id)

returns the same result.

Authors Christian MAREE and Guy LEDANT, in their book 'SQL: Introduction, Programming, and Mastery,' suggest a few selection criteria:

Performance

The user does not know how the DBMS "manages" to find the results they request. It is therefore only through experience that they will discover that one query is more efficient than another. MAREE and LEDANT assert from experience that correlated queries generally seem slower than nested queries or joins.

Formulation

Formulation using nested queries is often more readable and intuitive than joins. However, it is not always usable. Two points in particular should be noted:

  • The tables containing the columns specified in the SELECT clause (SELECT col1, col2, ...) must be listed after the FROM keyword. The Cartesian product of these tables is then performed, which is known as a join.
  • When the query displays results from a single table, and filtering the rows of that table requires consulting another table, nested queries can be used.

6.3. Syntax Extensions

For convenience, we have mostly presented abbreviated syntaxes for the various commands. In this section, we present their expanded syntaxes. They are self-explanatory because they are analogous to those of the widely studied SELECT command.

INSERT

syntax1
INSERT INTO table (col1, col2, ...) VALUES (val1, val2, ...)
syntax2
INSERT INTO table (col1, col2, ..) (query)
explanation
These two syntaxes have been presented

DELETE

syntax1
DELETE FROM table WHERE condition
explanation
This syntax is well-known. Note that the condition may contain a query using the syntax WHERE expression operator (query)

UPDATE

syntax1
UPDATE table
SET col1=expr1, col2=expr2, ...
WHERE condition
explanation
This syntax has already been presented. Note that the condition may contain a query using the syntax WHERE expression operator (query)
syntax2
UPDATE table
SET (col1, col2, ..) = query1, (cola, colb, ..) = query2, ...
WHERE condition
explanation
The values assigned to the various columns may come from a query.