6. Advanced SQL

6.1. Introduction

In this chapter, we present

additional syntaxes of the SELECT statement that make it a very powerful query command, particularly for querying multiple tables at once.
extended syntaxes of commands already covered

To illustrate the various orders, we will work with the following tables used for order management in a small-to-medium-sized book distribution company:

6.1.1. the CLIENTS table

It stores information about the company's customers:

`ID`	A unique identifier for the customer - primary key
`NAME`	customer name
`STATUS`	I=Individual, E=Business, A=Government
`FIRST NAME`	first name for an individual
`CONTACT`	Name of the contact person at the client’s location (in the case of a company or government agency)
`STREET`	Client address - street
`CITY`	City
`ZIP`	ZIP
`Phone`	Phone
`SINCE`	Since when have you been a customer?
`DEBTOR`	Y (Yes) if the customer owes money to the company and N (No) otherwise.

6.1.2. The ARTICLES table

It stores information about the products sold, in this case books. Its structure is as follows:

`ISBN`	A unique identifier for a book (ISBN = International Standard Book Number) - primary key
`TITLE`	Title of the book
`PUBLISHER_CODE`	Code uniquely identifying a publisher
`AUTHOR`	Author's name
`SUMMARY`	Book summary
`QTY`	Quantity sold this year
`QTEANPREC`	Quantity sold the previous year
`LAST SALE`	Date of last sale
`QTERECUE`	Quantity of the last delivery
`LASTDELIVERY`	Date of last delivery
`SELLPRICE`	Selling price
`COST`	Purchase cost
`MINCDE`	Minimum Order Quantity
`MINSTOCK`	Minimum stock level
`QTESTOCK`	Quantity in stock

Its contents could be as follows:

6.1.3. the ORDERS table

It stores information about orders placed by customers. Its structure is as follows:

`NOCMD`	A unique identifier for an order - primary key
`CUSTID`	Customer ID for this order - foreign key - reference to CUSTOMERS(ID)
`ORDER_DATE`	Date this order was entered
`CANCELED`	O (Yes) if the order was canceled and N (No) otherwise.

6.1.4. The DETAILS table

It contains the details of an order, i.e., the titles and quantities of the books ordered. Its structure is as follows:

`NOCMD`	Order number - foreign key referencing the NOCMD column of the COMMANDES table
`ISBN`	Ordered book number - foreign key referencing the ISBN column in the BOOKS table
`QTE`	Ordered quantity

Its contents could be as follows:

Above, we see that order No. 3 (NOCMD) involves three books. This means that the customer ordered three books at the same time. This customer’s records can be found in the [ORDERS] table, where we see that order No. 3 was placed by customer No. 5. The [CUSTOMERS] table tells us that customer No. 5 is the company NetLogos in Segré.

6.2. The SELECT Statement

Here, we aim to deepen our understanding of the SELECT statement by introducing new syntaxes for it.

6.2.1. Syntax of a multi-table query

`syntax`	SELECT column1, column2, ... FROM table1, table2, ..., tablep WHERE condition ORDER BY ...
`action`	The novelty here is that the columns column1, column2, ... come from multiple tables table1, table2, ... If two tables have columns with the same name, the ambiguity is resolved using the notation tablei.columnj. The condition can apply to columns from different tables.

How it works

`1`	The Cartesian product of tables table1, table2, ..., tablep is constructed. If n_i is the number of rows in table_i, the resulting table has n₁n₂...n_p* rows containing all the columns from the different tables.
`2`	The WHERE condition is applied to this table. A new table is thus produced
`3`	This table is sorted according to the method specified in ORDER.
`4`	The columns specified after SELECT are displayed.

Examples

We use the tables presented earlier. We want to see the details of orders placed after September 25:

SQL>select details.nocmd,isbn,qte from commandes,details
  where commandes.datecmd>'25-sep-91'
  and details.nocmd=commandes.nocmd

Note that after FROM, we list the names of all the tables whose columns we are referencing. In the previous example, the selected columns all belong to the DETAILS table. However, the condition references the ORDERS table. Hence the need to list the latter after FROM. The operation that tests for equality between columns in two different tables is often called an equijoin.

The SELECT query could also have been written as follows:

SQL> select details.nocmd,isbn,qte from commandes
    inner join details on details.nocmd=commandes.nocmd
    where commandes.datecmd>'25-sep-91'

Let’s continue with our examples. We want the same result as before but with the title of the ordered book, rather than its ISBN:

SQL>select commandes.nocmd, articles.titre, details.qte
  from commandes,articles,details
  where commandes.datecmd>'25-sep-91'
  and details.nocmd=commandes.nocmd
  and details.isbn=articles.isbn

The same result is obtained with the following SQL query, which is less readable:

SQL> select details.nocmd,articles.titre,details.qte from details
    inner join commandes on details.nocmd=commandes.nocmd
    inner join articles on  details.isbn=articles.isbn
    where commandes.datecmd>'25-sep-91'

Above, two inner joins are performed with the [DETAILS] table:

one with the [ORDERS] table to access the order date of a book
one with the [ARTICLES] table to access the title of the ordered book

We also want the name of the customer placing the order:

SQL>select commandes.nocmd, articles.titre, qte ,clients.nom
  from commandes,details,articles,clients
  where commandes.datecmd>'25-sep-91'
  and details.nocmd=commandes.nocmd
  and details.isbn=articles.isbn
  and commandes.idcli=clients.id

We also want the order dates and a display of these dates in descending order:

SQL>select commandes.nocmd, commandes.datecmd, articles.titre, qte ,clients.nom
    from commandes,details,articles,clients
    where commandes.datecmd>'25-sep-91'
    and details.nocmd=commandes.nocmd
    and details.isbn=articles.isbn
    and commandes.idcli=clients.id
    order by commandes.datecmd descending

Here are a few rules to follow when creating joins:

After SELECT, list the columns you want to display. If the column exists in multiple tables, precede it with the table name.
After FROM, list all the tables that will be queried by the SELECT statement, i.e., the tables containing the columns listed after SELECT and WHERE.

6.2.2. Self-join

We want to find the books that have a higher retail price than the book 'Using SQL':

SQL>select a.titre from articles a, articles b
  where b.titre='Using SQL'
  and a.prixvente>b.prixvente

The two tables in the join are identical here: the articles table. To distinguish them, we give them aliases: from articles a, articles b. The alias for the first table is a, and the alias for the second is b. This syntax can be used even if the tables are different. When using an alias, it must be used throughout the SELECT statement in place of the table it refers to.

6.2.3. Outer Join

We want to identify the customers who made a purchase in September, along with the order date. The other customers are displayed without this date:

SQL>select clients.nom,commandes.datecmd from clients
    left outer join commandes  on clients.id=commandes.idcli
    where datecmd between '01-sep-91' and '30-sep-91'

It is surprising here that we do not get the correct result. We should have all the customers present in the [CLIENTS] table, which is not the case. When we think about how an outer join works, we realize that customers who haven’t made a purchase have been matched with an empty row in the ORDERS table and therefore with an empty date (a NULL value in SQL terminology). This date doesn’t meet the condition set for the date, so the corresponding customer isn’t displayed. Let’s try something else:

SQL>select clients.nom,commandes.datecmd from clients
    left outer join commandes  on clients.id=commandes.idcli
    where (commandes.datecmd between '01-sep-91' and '30-sep-91')
        or (commandes.datecmd is null)

This time, we get the correct answer to our question.

6.2.4. Nested queries

syntax

SELECT column[s] FROM table[s]

WHERE expression query operator

ORDER BY ...

How it works

A query is a SELECT statement that returns a set of 0, 1, or more values. We then have a WHERE condition of the type

expression operator (val1, val2, ..., vali)

expression and vali must be of the same type. If the query returns a single value, we are reduced to a condition of the type

expression operator value

which we are familiar with. If the query returns a list of values, we can use the following operators:

IN

expression IN (val1, val2, ..., vali): true if expression evaluates to one of the elements in the list vali.

NOT IN

opposite of IN

ANY

must be preceded by =, !=, >, >=, <, <=

expression >= ANY (val1, val2, .., valn): true if expression is >= one of the values vali in the list

ALL

must be preceded by =, !=, >, >=, <, <=

expression >= ALL (val1, val2, .., valn): true if expression is >= all valid values in the list

EXISTS

query: true if the query returns at least one row.

Examples

Let’s revisit the question already solved by an equijoin: display the titles with a selling price higher than that of the book 'Using SQL'.

SQL>select titre from ARTICLES
    where prixvente > (select prixvente from ARTICLES where titre='Using SQL')

This solution seems more intuitive than the equijoin. We perform an initial filter with a SELECT, then a second one on the resulting set. We can perform several filters in sequence this way.

We want to find the titles with a selling price higher than the average selling price:

SQL> select titre from ARTICLES
    where prixvente > (select avg(prixvente) from ARTICLES)

Which customers ordered the titles returned by the previous query?

SQL>select distinct idcli from COMMANDES,DETAILS
    where DETAILS.isbn in
    (select isbn from ARTICLES where prixvente
        > (select avg(prixvente) from ARTICLES))
    and COMMANDES.nocmd=DETAILS.nocmd

Explanation

We select from the DETAILS table the ISBN codes found among the books with a price higher than the average book price.
In the rows selected in the previous step, the customer ID (IDCLI) is not present. It is found in the ORDERS table. The link between the two tables is established via the order number (NOCMD), hence the join condition ORDERS.nocmd=DETAILS.nocmd.
A single customer may have purchased one of the relevant books multiple times, in which case their IDCLI code would appear multiple times. To avoid this, we place the DISTINCT keyword after SELECT. DISTINCT generally eliminates duplicates from the rows returned by a SELECT query.
To retrieve the customer's name, we would need to perform an additional join between the ORDERS and CUSTOMERS tables, as shown in the following query.

SQL> select distinct CLIENTS.nom from COMMANDES,DETAILS,CLIENTS
    where DETAILS.isbn in
    (select isbn from ARTICLES where prixvente
        > (select avg(prixvente) from ARTICLES))
    and COMMANDES.nocmd=DETAILS.nocmd
    and COMMANDES.IDCLI=CLIENTS.ID

Find customers who haven't placed an order since September 24:

SQL>select nom from CLIENTS
    where clients.id not in
    (select distinct commandes.idcli from commandes where datecmd>='24-sep-91')

We have seen that rows can be filtered in ways other than using the WHERE clause: by using the HAVING clause in conjunction with the GROUP BY clause. The HAVING clause filters groups of rows.

Just like with the WHERE clause, the syntax


     HAVING expression opérateur requête

is possible, with the constraint already mentioned that expression must be one of the expressions expri in the clause


     GROUP BY expr1, expr2, ...

Examples

What are the sales figures for books priced over 200F?

First, let’s display the quantities sold by title:

SQL>select ARTICLES.titre,sum(qte) QTE from ARTICLES, DETAILS
    where DETAILS.isbn=ARTICLES.isbn
    group by titre

Now, let’s filter the titles:

SQL> select ARTICLES.titre,sum(qte) QTE from ARTICLES, DETAILS
    where DETAILS.isbn=ARTICLES.isbn
    group by titre
    having titre in (select titre from ARTICLES where prixvente>200)

Perhaps more obviously, we could have written:

SQL>select ARTICLES.titre,sum(qte) QTE from ARTICLES, DETAILS
    where DETAILS.isbn=ARTICLES.isbn
    and ARTICLES.prixvente>200
    group by titre

6.2.5. Nested Queries

In the case of nested queries, there is a parent query (the outermost query) and a child query (the innermost query). The parent query is evaluated only after the child query has been fully evaluated.

Correlated queries have the same syntax, with the following minor difference: the child query performs a join on the parent query’s table. In this case, the parent-child query pair is evaluated repeatedly for each row in the parent table.

Example

Let’s revisit the example where we want the names of customers who haven’t placed an order since September 24:

SQL> 
select nom from clients
    where not exists
        (select idcli from commandes
            where datecmd>='24-sep-91'
                and commandes.idcli=clients.id)

The parent query operates on the customers table. The child query performs a join between the customers and orders tables. This is therefore a correlated query. For each row in the clients table, the child query runs: it searches for the client’s ID in orders placed after September 24. If it finds none (not exists), the client’s name is displayed. Then, it moves on to the next row in the clients table.

6.2.6. Criteria for Writing the SELECT Statement

We have seen, on several occasions, that it is possible to obtain the same result using different SELECT statements. Let’s take an example: Display customers who have placed an order:

Join

SQL>
select distinct nom from clients,commandes
    where clients.id=commandes.idcli

Nested queries

SQL> 
select nom from clients
    where id in (select idcli from commandes)

returns the same result.

Correlated queries

SQL>
select nom from clients
    where exists (select * from commandes where commandes.idcli=clients.id)

returns the same result.

Authors Christian MAREE and Guy LEDANT, in their book 'SQL: Introduction, Programming, and Mastery,' suggest a few selection criteria:

Performance

The user does not know how the DBMS "manages" to find the results they request. It is therefore only through experience that they will discover that one query is more efficient than another. MAREE and LEDANT assert from experience that correlated queries generally seem slower than nested queries or joins.

Formulation

Formulation using nested queries is often more readable and intuitive than joins. However, it is not always usable. Two points in particular should be noted:

The tables containing the columns specified in the SELECT clause (SELECT col1, col2, ...) must be listed after the FROM keyword. The Cartesian product of these tables is then performed, which is known as a join.

When the query displays results from a single table, and filtering the rows of that table requires consulting another table, nested queries can be used.

6.3. Syntax Extensions

For convenience, we have mostly presented abbreviated syntaxes for the various commands. In this section, we present their expanded syntaxes. They are self-explanatory because they are analogous to those of the widely studied SELECT command.

INSERT

`syntax1`	INSERT INTO table (col1, col2, ...) VALUES (val1, val2, ...)
`syntax2`	INSERT INTO table (col1, col2, ..) (query)
`explanation`	These two syntaxes have been presented

DELETE

`syntax1`	DELETE FROM table WHERE condition
`explanation`	This syntax is well-known. Note that the condition may contain a query using the syntax WHERE expression operator (query)

UPDATE

`syntax1`	UPDATE table SET col1=expr1, col2=expr2, ... WHERE condition
`explanation`	This syntax has already been presented. Note that the condition may contain a query using the syntax WHERE expression operator (query)

`syntax2`	UPDATE table SET (col1, col2, ..) = query1, (cola, colb, ..) = query2, ... WHERE condition
`explanation`	The values assigned to the various columns may come from a query.