Introduction
SQL is the predominant language for database queries, and proficiency in SQL is important for accurate data querying. This requires a comprehensive understanding of the sequence in which SQL executes its clauses. Debugging your SQL script effectively and creating precise queries necessitates knowledge of how a database interprets and executes your SQL query.
In this article, we will discuss the specific sequence in which the clauses of an SQL query execute. However, if your query includes sub-queries or Common Table Expressions (CTE), bear in mind that these will always be executed first before any action takes place on the main query. Nonetheless, the execution order of clauses within a CTE or subquery remains unchanged.
We will be referencing the following two tables:
Customers
customer_id | customer |
---|---|
1 | Bruce Wayne |
2 | Clark Kent |
3 | Tony Stark |
4 | Bruce Banner |
5 | Peter Parker |
Purchases
purchase_id | item | price | customer_id |
---|---|---|---|
1 | Red Cape | 3.75 | 2 |
2 | Web Shooter | 9.26 | 5 |
3 | Batarang | 23.24 | 1 |
4 | Smoke Pellet | 2.99 | 1 |
5 | Red Boots | 17.41 | 2 |
6 | Sunglasses | 299.99 | 3 |
7 | Lab Coat | 74.23 | 4 |
Here is our SQL query, which identifies the two customers who have spent the most money, excluding purchases exceeding $200 and customers whose total purchases are less than $10:
SELECT
customers.customer_id,
customers.customer,
SUM(price) as
total_money_spent
FROM customers
INNER JOIN
purchases
on customers.customer_id = purchases.customer_id
WHERE
price < 200
GROUP BY
customers.customer_id,
customers.customer
HAVING
total_money_spent > 10
ORDER BY
total_money_spent desc
LIMIT
2
Here is the sequence of execution, breaking down what occurs at each stage:
FROM
(including joins:INNER JOIN
,LEFT JOIN
,RIGHT JOIN
,OUTER JOIN
,CROSS JOIN
, etc.)WHERE
GROUP BY
HAVING
SELECT
ORDER BY
LIMIT
Step 1: FROM and JOINS
FROM customers
INNER JOIN
purchases
on customers.customer_id = purchases.customer_id
The complete customers
table is invoked and combined with the purchases
table based on the customer_id
, resulting in a new primary table that includes the matches from both tables. Therefore, after our query execution, the database assembles this primary table:
customer_id | customer | purchase_id | purchase | price |
---|---|---|---|---|
1 | Bruce Wayne | 3 | Batarang | 23.24 |
1 | Bruce Wayne | 4 | Smoke Pellet | 2.99 |
2 | Clark Kent | 1 | Red Cape | 3.75 |
2 | Clark Kent | 5 | Red Boots | 17.41 |
3 | Tony Stark | 6 | Sunglasses | 299.99 |
4 | Bruce Banner | 7 | Lab Coat | 74.23 |
5 | Peter Parker | 2 | Web Shooter | 9.26 |
Step 2: The WHERE Clause
WHERE
price < 200
The WHERE
clause serves as our filter, enabling us to omit undesired data from the main table and retain the data we wish to view. In this scenario, we are retaining all purchases below $200, thereby excluding the sunglasses purchase valued at $299.99.
Note: It's important to bear in mind that you can't utilize a WHERE
clause on any columns that are undergoing aggregation (sum, avg, etc) in the statement. For this purpose, you'll need to use the HAVING
clause, which we'll discuss later. If you are aggregating, the WHERE
clause will omit rows BEFORE
the aggregation commences. Therefore, in the context of our table, we will NOT be excluding total purchases exceeding $200.
customer_id | customer | purchase_id | purchase | price |
---|---|---|---|---|
1 | Bruce Wayne | 3 | Batarang | 23.24 |
1 | Bruce Wayne | 4 | Smoke Pellet | 2.99 |
2 | Clark Kent | 1 | Red Cape | 3.75 |
2 | Clark Kent | 5 | Red Boots | 17.41 |
4 | Bruce Banner | 7 | Lab Coat | 74.23 |
5 | Peter Parker | 2 | Web Shooter | 9.26 |
Step 3: Group Columns with GROUP BY
GROUP BY
customers.customer_id,
customers.customer
As there is an aggregation (SUM
) in our query, GROUP BY
will execute, aggregating the price by the two non-aggregated columns (customer_id
and customer
).
Note: When there is an aggregation in your statement, it's imperative to group by all non-aggregated columns that you are incorporating into your query. In this instance, since we are including both customer_id
and customer
in our SELECT
clause, we must GROUP BY
both of these columns. The sequence here is deliberate as it ensures that the query executes GROUP BY
with their distinct ID first and then their name.
customer_id | customer | price |
---|---|---|
1 | Bruce Wayne | 26.23 |
21.16 | ||
4 | Bruce Banner | 74.23 |
Step 4: The HAVING Clause
HAVING
total_money_spent > 10
Check out our hands-on, practical guide to learning Git, with best-practices, industry-accepted standards, and included cheat sheet. Stop Googling Git commands and actually learn it!
This clause enables us to filter by our aggregation as this occurs post-GROUP BY
, when the data has been aggregated. It can't be used in place of a WHERE
clause. In this scenario, we're excluding total purchases under $10 (poor Peter Parker).
customer_id | customer | price | |
---|---|---|---|
1 | Bruce Wayne | 26.23 | |
2 | Clark Kent | Peter Parker | 21.16 |
4 | Bruce Banner | 74.23 |
Step 5: SELECT Statement
SELECT
customers.customer_id,
customers.customer,
SUM(price) as
total_money_spent
This clause specifies the particular columns we wish to extract from the primary table we've assembled. In this instance, since we've used the GROUP BY
clause and performed aggregation, we've already narrowed down to these columns. However, outside of aggregation, this step is pivotal to ensure that we're only extracting the desired data and assigning a suitable alias to each column, if required.
customer_id | customer | total_money_spent |
---|---|---|
1 | Bruce Wayne | 26.23 |
2 | Clark Kent | 21.16 |
4 | Bruce Banner | 74.23 |
Step 6: Utilizing ORDER BY
ORDER BY
total_money_spent DESC
This clause facilitates the sorting of the table. We can ORDER BY
column ASC
(Ascending, which is the default) or ORDER BY
column DESC
(descending). Additionally, you can ORDER BY
multiple columns. In this instance, we are sorting by our aggregated column in descending order (note that ORDER BY
is one of the few instances where the aggregated column's alias can be used because it occurs after the SELECT
clause)
customer_id | customer | total_money_spent |
---|---|---|
4 | Bruce Banner | 74.23 |
1 | Bruce Wayne | 26.23 |
2 | Clark Kent | 21.16 |
Step 7: Setting LIMIT
LIMIT
2
This clause restricts the number of data rows we wish to retrieve. For instance, in our scenario, we are interested in only the top two customers. Therefore, setting a LIMIT
of 2 confines our results to Bruce Banner and Bruce Wayne.
customer_id | customer | total_money_spent |
---|---|---|
4 | Bruce Banner | 74.23 |
1 | Bruce Wayne | 26.23 |
When you're merging multiple SQL queries vertically using Union or Union All, it's important to remember that both ORDER BY
and LIMIT
indicate the conclusion of the clause. Here's a brief example:
SELECT
customer_id,
customer
FROM customers
WHERE
customers.customer = 'Clark Kent'
UNION
SELECT
customer_id,
customer
FROM customers
WHERE
customer = 'Bruce Wayne'
ORDER BY
customer_id
LIMIT
2
While the above query won't result in any errors, the ORDER BY
and LIMIT
won't execute until after the queries have been merged into a single table, instead of just on the second query. Consider the following example as well:
SELECT
customer_id,
customer
FROM customers
WHERE
customers.customer = 'Clark Kent'
ORDER BY
customer_id
UNION
SELECT
customer_id,
customer
FROM customers
WHERE
customer = 'Bruce Wayne'
This example will generate an error because ORDER BY
indicates the termination of the query. The same applies to LIMIT
, so it is impossible to execute the second SQL query after UNION
at the bottom. If you wish to include more than one ORDER BY
and/or LIMIT
in a UNION
/UNION ALL
statement, you must use parentheses to enclose the queries:
(SELECT
customer_id,
customer
FROM customers
WHERE
customers.customer = 'Clark Kent'
ORDER BY
customer_id)
UNION
(SELECT
customer_id,
customer
FROM customers
WHERE
customer = 'Bruce Wayne'
ORDER BY
customer_id)
Conclusion
Understanding the order in which SQL clauses execute can enhance your ability to write dynamic, accurate, and efficient queries that extract the precise data required for your projects. This understanding will also aid in troubleshooting your SQL queries when data retrieval is erroneous, by facilitating careful tracking of your steps in the sequence of execution until the problem is identified. Happy querying!