Common Table Expressions (CTEs)

Last modified: March 30, 2025
You are here:

SQL is a powerful language to retrieve data from relational databases. It allows us to ask complex questions and retrieve insights. However, as our questions become more sophisticated, our SQL queries can often grow into intimidating, multi-layered behemoths. Nested subqueries, complex joins, and repeated logic can quickly obscure a sql query, making it difficult to read, debug, and maintain. Common Table Expression (CTE) might offer a simple solution. Think of CTEs as a way to bring structure, clarity, and elegance to your SQL statements. They are a powerful feature, available in almost every relational database system, designed to simplify complex queries and make your SQL code significantly more understandable.

This article will dive deep into the world of CTEs: what they are, how to structure them, and, most importantly, when they become indispensable tools in your data manipulation arsenal. We’ll cap it off with two practical examples showcasing their power.

What Exactly is a Common Table Expression (CTE)?

At its core, a Common Table Expression is a named, temporary result set that you define within the execution scope of a single SQL statement. This can a SELECT but also INSERT, UPDATE, DELETE, and MERGE statements can use CTEs.

Think of it like defining a temporary view or creating a variable that holds the result of a subquery, but one that only exists for the duration of the main query it’s associated with. Once the main query finishes executing, the CTE is gone.

CTEs are defined using the WITH keyword, placed before the main SQL statement. They allow you to break down a complex query into smaller, logical, named building blocks.

How to Use CTEs: The Syntax Breakdown

The basic syntax for a CTE looks like this:

WITH cte_name [(column_name1, column_name2, ...)] AS (
    -- The query definition for the CTE
    SELECT column1, column2
    FROM your_table
    WHERE condition
)
-- The main query that uses the CTE
SELECT *
FROM cte_name
WHERE some_other_condition;

CTE components are:

  1. WITH Clause: This keyword signals the start of one or more CTE definitions.
  2. cte_name: This is the name you give to your temporary result set. Choose descriptive names that reflect the data or calculation the CTE represents (e.g., HighValueCustomers, DepartmentSalaries, ProductSalesSummary).
  3. (column_name1, column_name2, ...) (Optional): You can explicitly list the column names for your CTE. If you omit this, the CTE inherits the column names from the SELECT list within its definition. Explicitly naming columns can improve readability, especially if the CTE’s query involves complex expressions or functions where default names might be unclear.
  4. AS Keyword: This separates the CTE name (and optional column list) from its defining query.
  5. (SELECT ...): This is the standard SQL query that generates the result set for the CTE. It can be as simple or complex as needed, involving joins, aggregations, window functions, etc.
  6. Main Query: This is the final SELECT, INSERT, UPDATE, or DELETE statement that can reference the cte_name as if it were a regular table or view. This main query must immediately follow the CTE definition(s).

Defining Multiple CTEs:

You can define multiple CTEs in a single WITH clause, separating them with commas. Crucially, a later CTE can reference preceding CTEs defined within the same WITH clause.

WITH
  cte1_name AS (
      SELECT id, value
      FROM source_table1
      WHERE condition1
  ), -- Comma separates CTE definitions
  cte2_name AS (
      SELECT cte1.id, cte1.value, t2.extra_data
      FROM cte1_name cte1 -- cte2 references cte1
      JOIN source_table2 t2 ON cte1.id = t2.foreign_id
      WHERE condition2
  )
-- Main query using one or both CTEs
SELECT cte2.id, cte2.extra_data, cte1.value -- Can reference columns from both if joined/needed
FROM cte2_name cte2
JOIN cte1_name cte1 ON cte2.id = cte1.id -- Example join in main query
WHERE final_condition;

Notice how cte2_name can directly query cte1_name. This chaining capability is one of the most powerful aspects of CTEs for breaking down sequential logic!

When Should You Use Common Table Expressions?

While you could often achieve the same results using derived tables (subqueries in the FROM clause) or temporary tables, CTEs offer distinct advantages in specific scenarios:

  1. Improving Readability and Maintainability: This is arguably the biggest win. Complex queries involving multiple levels of nesting or intricate logic become significantly easier to understand when broken down into named CTEs. Each CTE handles a specific piece of logic, making the final query read more like a sequence of steps rather than a tangled mess. Debugging also becomes simpler, as you can often test each CTE’s logic independently.

  2. Reducing Repetition (Code Reusability within a Query): If you need to use the same calculated result set multiple times within a single query (e.g., in different parts of a UNION ALL or joining it to different tables), a CTE allows you to define it once and reference it multiple times by name. This avoids repeating the same subquery logic, making the code shorter and less prone to copy-paste errors.

  3. Simplifying Complex Joins and Aggregations: CTEs are excellent for performing intermediate calculations or aggregations. You can create a CTE to calculate per-group summaries (like average sales per region) and then join that CTE back to the original data or another CTE for further analysis (like finding products selling above the regional average).

  4. Handling Hierarchical or Sequential Data: CTEs provide a clean mechanism for querying data with inherent hierarchies, like organizational charts (employee-manager relationships) or bill-of-materials structures (parts and sub-parts). They can achieve this using a specific structure that refers back to itself, allowing traversal through levels of the hierarchy. (We’ll see an example of this shortly).

  5. Creating More Organized INSERT, UPDATE, DELETE Statements: You can use CTEs to define the set of rows you want to modify or insert data from, making these DML statements clearer, especially when the criteria for selecting the target rows are complex.

Important Note on Performance: While CTEs drastically improve readability, they don’t automatically guarantee better performance compared to an equivalent subquery. Modern database optimizers are often smart enough to treat them similarly. The primary benefit is developer productivity and code clarity, although in some complex cases, structuring with CTEs might help the optimizer find a better execution plan. Don’t choose CTEs purely for perceived speed gains; choose them for structure and maintainability.

Examples: CTEs in Action

Here are two cases to illustrate the power of CTEs with two common scenarios where they significantly simplify the query:

Example 1: Finding Departments with Above-Average Salaries

Imagine we have two tables: Employees (with EmployeeID, Name, DepartmentID, Salary) and Departments (with DepartmentID, DepartmentName). We want to find all departments whose average salary is higher than the overall average salary across the entire company.

Without CTEs, this might involve nested subqueries, making it hard to follow:

-- Harder to read version using subqueries
SELECT
    d.DepartmentName
FROM
    Departments d
JOIN
    (SELECT DepartmentID, AVG(Salary) AS AvgDeptSalary FROM Employees GROUP BY DepartmentID) AS DeptAvg
    ON d.DepartmentID = DeptAvg.DepartmentID
WHERE
    DeptAvg.AvgDeptSalary > (SELECT AVG(Salary) FROM Employees); -- Subquery for overall average

Now, let’s refactor this using CTEs:

WITH
  -- CTE 1: Calculate the average salary for each department
  DepartmentAverages AS (
    SELECT
      DepartmentID,
      AVG(Salary) AS AvgSalaryForDept
    FROM Employees
    GROUP BY DepartmentID
  ),

  -- CTE 2: Calculate the overall average salary across the company
  CompanyAverage AS (
    SELECT
      AVG(Salary) AS OverallAvgSalary
    FROM Employees
  )

-- Main Query: Join department averages with company average and filter
SELECT
  d.DepartmentName,
  da.AvgSalaryForDept
FROM
  DepartmentAverages da
JOIN
  Departments d ON da.DepartmentID = d.DepartmentID
CROSS JOIN -- Join with the single-row CompanyAverage CTE
  CompanyAverage ca -- No ON clause needed for cross join to single row
WHERE
  da.AvgSalaryForDept > ca.OverallAvgSalary
ORDER BY
  d.DepartmentName;

Why the CTE version is better:

  • Clarity: The logic is broken into distinct, named steps: calculate department averages (DepartmentAverages), calculate the company average (CompanyAverage), then compare them.
  • Readability: The main query is much cleaner, simply joining the pre-calculated results and applying the final filter.
  • Maintainability: If you need to change how department averages are calculated (e.g., exclude certain employee types), you only modify the DepartmentAverages CTE. The overall structure remains clear.

Example 2: Navigating an Employee Hierarchy

Let’s say we have an Employees table with EmployeeID, Name, and ManagerID (which is the EmployeeID of the employee’s manager; it’s NULL for the CEO). We want to find a specific manager (say, Employee ID 10) and all the employees who report up to them, directly or indirectly, through the management chain.

This is a classic hierarchical data problem. CTEs provide an elegant way to traverse this structure using a self-referencing pattern.

WITH EmployeeHierarchy AS (
    -- Anchor Member: Start with the top-level manager(s) of interest
    SELECT
        EmployeeID,
        Name,
        ManagerID,
        0 AS HierarchyLevel -- Start level at 0
    FROM
        Employees
    WHERE
        EmployeeID = 10 -- Our starting manager

    UNION ALL -- Combine anchor with recursive results

    -- Recursive Member: Find employees whose manager is already in the hierarchy
    SELECT
        e.EmployeeID,
        e.Name,
        e.ManagerID,
        eh.HierarchyLevel + 1 -- Increment level for each step down
    FROM
        Employees e
    INNER JOIN
        EmployeeHierarchy eh ON e.ManagerID = eh.EmployeeID -- Join employee to manager found in previous step
)

-- Main Query: Select all employees found in the hierarchy
SELECT
    EmployeeID,
    Name,
    ManagerID,
    HierarchyLevel
FROM
    EmployeeHierarchy
ORDER BY
    HierarchyLevel, Name;

Here is how this works:

  1. WITH EmployeeHierarchy AS (...): Defines the CTE.
  2. Anchor Member: The first SELECT statement is the base case. It selects the starting point(s) – in this case, the manager with EmployeeID = 10. It initializes the HierarchyLevel to 0.
  3. UNION ALL: This operator combines the results of the anchor member with the results of the recursive member. UNION ALL is typically used here for performance as duplicates are unlikely in standard hierarchies and it avoids the overhead of checking for them.
  4. Recursive Member: The second SELECT statement is where the "recursion" happens. It selects employees (e) whose ManagerID matches the EmployeeID of someone already in the EmployeeHierarchy result set (eh). It joins the Employees table back to the CTE itself. It also increments the HierarchyLevel.
  5. Execution Flow: The database engine first executes the anchor member. Then, it repeatedly executes the recursive member, using the results from the previous iteration as the input for the EmployeeHierarchy reference in the INNER JOIN, until the recursive member returns no more rows (i.e., it has reached the bottom of the hierarchy branch).
  6. Main Query: Finally, the main SELECT * FROM EmployeeHierarchy retrieves all the rows accumulated by the anchor and recursive members.

This CTE elegantly traverses the management chain downwards from the specified manager, providing a clear list of all direct and indirect reports along with their level in the hierarchy relative to the starting manager. Achieving this with traditional SQL often involves complex self-joins or procedural code.

Conclusion

Common Table Expressions are more than just syntactic sugar; they are a fundamental tool for writing clearer, more structured, and more maintainable SQL. By allowing you to name and organize intermediate steps in your data processing logic, CTEs transform potentially convoluted queries into understandable workflows.

While they might seem a little different at first compared to traditional subqueries, the benefits in readability and organization, especially for complex analytical queries or hierarchical data traversal, are substantial. The next time you find yourself wrestling with a multi-level subquery or needing to reuse a calculation within a single statement, give CTEs a try. You might find they become an indispensable part of your SQL toolkit, helping you tame the complexity and focus on the insights hidden within your data.

Was this article helpful?
Dislike 0
Views: 1