Order of Execution of SQL Clauses

Order of Execution of SQL Clauses

Introduction

SQL is the predominant language for database queries, and proficiency in SQL is important for accurate data querying. This requires a comprehensive understanding of the sequence in which SQL executes its clauses. Debugging your SQL script effectively and creating precise queries necessitates knowledge of how a database interprets and executes your SQL query.

In this article, we will discuss the specific sequence in which the clauses of an SQL query execute. However, if your query includes sub-queries or Common Table Expressions (CTE), bear in mind that these will always be executed first before any action takes place on the main query. Nonetheless, the execution order of clauses within a CTE or subquery remains unchanged.

We will be referencing the following two tables:

Customers

customer_id customer
1 Bruce Wayne
2 Clark Kent
3 Tony Stark
4 Bruce Banner
5 Peter Parker

Purchases

purchase_id item price customer_id
1 Red Cape 3.75 2
2 Web Shooter 9.26 5
3 Batarang 23.24 1
4 Smoke Pellet 2.99 1
5 Red Boots 17.41 2
6 Sunglasses 299.99 3
7 Lab Coat 74.23 4

Here is our SQL query, which identifies the two customers who have spent the most money, excluding purchases exceeding $200 and customers whose total purchases are less than $10:

SELECT 
   customers.customer_id,
   customers.customer,
   SUM(price) as 
   total_money_spent
FROM customers 

INNER JOIN
    purchases
on customers.customer_id = purchases.customer_id

WHERE
    price < 200

GROUP BY 
   customers.customer_id,
   customers.customer

HAVING
   total_money_spent > 10

ORDER BY
   total_money_spent desc

LIMIT
   2

Here is the sequence of execution, breaking down what occurs at each stage:

  • FROM (including joins: INNER JOIN, LEFT JOIN, RIGHT JOIN, OUTER JOIN, CROSS JOIN, etc.)
  • WHERE
  • GROUP BY
  • HAVING
  • SELECT
  • ORDER BY
  • LIMIT

Step 1: FROM and JOINS

FROM customers 

INNER JOIN
    purchases
on customers.customer_id = purchases.customer_id

The complete customers table is invoked and combined with the purchases table based on the customer_id, resulting in a new primary table that includes the matches from both tables. Therefore, after our query execution, the database assembles this primary table:

customer_id customer purchase_id purchase price
1 Bruce Wayne 3 Batarang 23.24
1 Bruce Wayne 4 Smoke Pellet 2.99
2 Clark Kent 1 Red Cape 3.75
2 Clark Kent 5 Red Boots 17.41
3 Tony Stark 6 Sunglasses 299.99
4 Bruce Banner 7 Lab Coat 74.23
5 Peter Parker 2 Web Shooter 9.26

Step 2: The WHERE Clause

WHERE
    price < 200

The WHERE clause serves as our filter, enabling us to omit undesired data from the main table and retain the data we wish to view. In this scenario, we are retaining all purchases below $200, thereby excluding the sunglasses purchase valued at $299.99.

Note: It's important to bear in mind that you can't utilize a WHERE clause on any columns that are undergoing aggregation (sum, avg, etc) in the statement. For this purpose, you'll need to use the HAVING clause, which we'll discuss later. If you are aggregating, the WHERE clause will omit rows BEFORE the aggregation commences. Therefore, in the context of our table, we will NOT be excluding total purchases exceeding $200.

customer_id customer purchase_id purchase price
1 Bruce Wayne 3 Batarang 23.24
1 Bruce Wayne 4 Smoke Pellet 2.99
2 Clark Kent 1 Red Cape 3.75
2 Clark Kent 5 Red Boots 17.41
4 Bruce Banner 7 Lab Coat 74.23
5 Peter Parker 2 Web Shooter 9.26

Step 3: Group Columns with GROUP BY

GROUP BY 
   customers.customer_id,
   customers.customer

As there is an aggregation (SUM) in our query, GROUP BY will execute, aggregating the price by the two non-aggregated columns (customer_id and customer).

Note: When there is an aggregation in your statement, it's imperative to group by all non-aggregated columns that you are incorporating into your query. In this instance, since we are including both customer_id and customer in our SELECT clause, we must GROUP BY both of these columns. The sequence here is deliberate as it ensures that the query executes GROUP BY with their distinct ID first and then their name.

customer_id customer price
1 Bruce Wayne 26.23
21.16
4 Bruce Banner 74.23

Step 4: The HAVING Clause

HAVING
   total_money_spent > 10
Free eBook: Git Essentials

Check out our hands-on, practical guide to learning Git, with best-practices, industry-accepted standards, and included cheat sheet. Stop Googling Git commands and actually learn it!

This clause enables us to filter by our aggregation as this occurs post-GROUP BY, when the data has been aggregated. It can't be used in place of a WHERE clause. In this scenario, we're excluding total purchases under $10 (poor Peter Parker).

customer_id customer price
1 Bruce Wayne 26.23
2 Clark Kent Peter Parker 21.16
4 Bruce Banner 74.23

Step 5: SELECT Statement

SELECT 
   customers.customer_id,
   customers.customer,
   SUM(price) as 
   total_money_spent

This clause specifies the particular columns we wish to extract from the primary table we've assembled. In this instance, since we've used the GROUP BY clause and performed aggregation, we've already narrowed down to these columns. However, outside of aggregation, this step is pivotal to ensure that we're only extracting the desired data and assigning a suitable alias to each column, if required.

customer_id customer total_money_spent
1 Bruce Wayne 26.23
2 Clark Kent 21.16
4 Bruce Banner 74.23

Step 6: Utilizing ORDER BY

ORDER BY
   total_money_spent DESC

This clause facilitates the sorting of the table. We can ORDER BY column ASC (Ascending, which is the default) or ORDER BY column DESC (descending). Additionally, you can ORDER BY multiple columns. In this instance, we are sorting by our aggregated column in descending order (note that ORDER BY is one of the few instances where the aggregated column's alias can be used because it occurs after the SELECT clause)

customer_id customer total_money_spent
4 Bruce Banner 74.23
1 Bruce Wayne 26.23
2 Clark Kent 21.16

Step 7: Setting LIMIT

LIMIT
   2

This clause restricts the number of data rows we wish to retrieve. For instance, in our scenario, we are interested in only the top two customers. Therefore, setting a LIMIT of 2 confines our results to Bruce Banner and Bruce Wayne.

customer_id customer total_money_spent
4 Bruce Banner 74.23
1 Bruce Wayne 26.23

When you're merging multiple SQL queries vertically using Union or Union All, it's important to remember that both ORDER BY and LIMIT indicate the conclusion of the clause. Here's a brief example:

SELECT
    customer_id,
    customer
FROM customers

WHERE
    customers.customer = 'Clark Kent'

UNION 

SELECT
    customer_id,
    customer
FROM customers

WHERE
    customer = 'Bruce Wayne'

ORDER BY
    customer_id

LIMIT
    2

While the above query won't result in any errors, the ORDER BY and LIMIT won't execute until after the queries have been merged into a single table, instead of just on the second query. Consider the following example as well:

SELECT
    customer_id,
    customer
FROM customers

WHERE
    customers.customer = 'Clark Kent'

ORDER BY
    customer_id

UNION 

SELECT
    customer_id,
    customer
FROM customers

WHERE
    customer = 'Bruce Wayne'

This example will generate an error because ORDER BY indicates the termination of the query. The same applies to LIMIT, so it is impossible to execute the second SQL query after UNION at the bottom. If you wish to include more than one ORDER BY and/or LIMIT in a UNION/UNION ALL statement, you must use parentheses to enclose the queries:

(SELECT
    customer_id,
    customer
FROM customers

WHERE
    customers.customer = 'Clark Kent'

ORDER BY
    customer_id) 

UNION 

(SELECT
    customer_id,
    customer
FROM customers

WHERE
    customer = 'Bruce Wayne'

ORDER BY
    customer_id)

Conclusion

Understanding the order in which SQL clauses execute can enhance your ability to write dynamic, accurate, and efficient queries that extract the precise data required for your projects. This understanding will also aid in troubleshooting your SQL queries when data retrieval is erroneous, by facilitating careful tracking of your steps in the sequence of execution until the problem is identified. Happy querying!

Last Updated: June 27th, 2023
Was this article helpful?

Improve your dev skills!

Get tutorials, guides, and dev jobs in your inbox.

No spam ever. Unsubscribe at any time. Read our Privacy Policy.

Make Clarity from Data - Quickly Learn Data Visualization with Python

Learn the landscape of Data Visualization tools in Python - work with Seaborn, Plotly, and Bokeh, and excel in Matplotlib!

From simple plot types to ridge plots, surface plots and spectrograms - understand your data and learn to draw conclusions from it.

© 2013-2024 Stack Abuse. All rights reserved.

AboutDisclosurePrivacyTerms