Photo by DATAIDEA

Subqueries

Learn how to use queries within queries to solve complex data problems.

What is a Subquery?

A subquery (also called an inner query or nested query) is a query within another SQL query. Subqueries can be used in SELECT, FROM, WHERE, and HAVING clauses.

Sample Data

employees table:

+----+-----------+-----------+--------+
| id | name      | department| salary |
+----+-----------+-----------+--------+
| 1  | Alice     | IT        | 75000  |
| 2  | Bob       | Sales     | 50000  |
| 3  | Charlie   | IT        | 80000  |
| 4  | Diana     | HR        | 60000  |
| 5  | Eva       | Sales     | 55000  |
+----+-----------+-----------+--------+

Subquery in WHERE Clause

The most common use of subqueries.

Find employees earning more than the average salary

SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

Result:

+-----------+--------+
| name      | salary |
+-----------+--------+
| Alice     | 75000  |
| Charlie   | 80000  |
+-----------+--------+

How it works: 1. Inner query calculates: AVG(salary) = 64000 2. Outer query uses this value: WHERE salary > 64000

Find the highest-paid employee

SELECT name, salary
FROM employees
WHERE salary = (SELECT MAX(salary) FROM employees);

Subquery with IN Operator

Use subqueries that return multiple values with IN.

Find employees in departments that have more than one employee

SELECT name, department
FROM employees
WHERE department IN (
    SELECT department
    FROM employees
    GROUP BY department
    HAVING COUNT(*) > 1
);

Result:

+-----------+-----------+
| name      | department|
+-----------+-----------+
| Alice     | IT        |
| Bob       | Sales     |
| Charlie   | IT        |
| Eva       | Sales     |
+-----------+-----------+

Find employees in IT or Sales (simple example)

SELECT name, salary
FROM employees
WHERE department IN (
    SELECT DISTINCT department
    FROM employees
    WHERE department IN ('IT', 'Sales')
);

Subquery in FROM Clause

Use a subquery as a temporary table (also called a derived table).

Calculate department statistics

SELECT
    dept_stats.department,
    dept_stats.avg_salary,
    dept_stats.employee_count
FROM (
    SELECT
        department,
        AVG(salary) AS avg_salary,
        COUNT(*) AS employee_count
    FROM employees
    GROUP BY department
) AS dept_stats
WHERE dept_stats.avg_salary > 60000;

Result:

+-----------+------------+----------------+
| department| avg_salary | employee_count |
+-----------+------------+----------------+
| IT        | 77500      | 2              |
+-----------+------------+----------------+

!!! note "Alias Required" Subqueries in FROM clause must have an alias (e.g., AS dept_stats).

Subquery in SELECT Clause

Return a single value in the SELECT list.

Show each employee with the department average

SELECT
    name,
    salary,
    department,
    (SELECT AVG(salary)
     FROM employees e2
     WHERE e2.department = e1.department) AS dept_avg
FROM employees e1;

Result:

+-----------+--------+-----------+----------+
| name      | salary | department| dept_avg |
+-----------+--------+-----------+----------+
| Alice     | 75000  | IT        | 77500    |
| Bob       | 50000  | Sales     | 52500    |
| Charlie   | 80000  | IT        | 77500    |
| Diana     | 60000  | HR        | 60000    |
| Eva       | 55000  | Sales     | 52500    |
+-----------+--------+-----------+----------+

Correlated Subqueries

A subquery that references columns from the outer query.

Find employees earning more than their department average

SELECT name, department, salary
FROM employees e1
WHERE salary > (
    SELECT AVG(salary)
    FROM employees e2
    WHERE e2.department = e1.department
);

Result:

+-----------+-----------+--------+
| name      | department| salary |
+-----------+-----------+--------+
| Charlie   | IT        | 80000  |
| Eva       | Sales     | 55000  |
+-----------+-----------+--------+

How it works: - For each row in outer query, inner query executes - Inner query uses the current row's department - Compares the salary with that department's average

EXISTS Operator

Tests for the existence of rows in a subquery. Returns TRUE if subquery returns any rows.

Find departments that have employees

SELECT DISTINCT department
FROM employees e1
WHERE EXISTS (
    SELECT 1
    FROM employees e2
    WHERE e2.department = e1.department
    AND e2.salary > 70000
);

Result:

+-----------+
| department|
+-----------+
| IT        |
+-----------+

NOT EXISTS Example

Find departments with no high earners (salary > 70000):

SELECT DISTINCT department
FROM employees e1
WHERE NOT EXISTS (
    SELECT 1
    FROM employees e2
    WHERE e2.department = e1.department
    AND e2.salary > 70000
);

Multiple Row Subqueries

ANY Operator

Compare with any value returned by the subquery.

-- Find employees earning more than ANY Sales employee
SELECT name, salary
FROM employees
WHERE salary > ANY (
    SELECT salary
    FROM employees
    WHERE department = 'Sales'
);

This is equivalent to "greater than the minimum Sales salary."

ALL Operator

Compare with all values returned by the subquery.

-- Find employees earning more than ALL Sales employees
SELECT name, salary
FROM employees
WHERE salary > ALL (
    SELECT salary
    FROM employees
    WHERE department = 'Sales'
);

This is equivalent to "greater than the maximum Sales salary."

Nested Subqueries

Subqueries within subqueries.

Find employees in the highest-paying department

SELECT name, department, salary
FROM employees
WHERE department = (
    SELECT department
    FROM employees
    GROUP BY department
    ORDER BY AVG(salary) DESC
    LIMIT 1
);

Result:

+-----------+-----------+--------+
| name      | department| salary |
+-----------+-----------+--------+
| Alice     | IT        | 77500  |
| Charlie   | IT        | 80000  |
+-----------+-----------+--------+

Common Table Expressions (CTE)

An alternative to subqueries that's often more readable. Uses WITH clause.

Basic CTE

WITH dept_averages AS (
    SELECT
        department,
        AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department
)
SELECT
    e.name,
    e.salary,
    e.department,
    da.avg_salary
FROM employees e
JOIN dept_averages da ON e.department = da.department
WHERE e.salary > da.avg_salary;

Multiple CTEs

WITH
high_earners AS (
    SELECT *
    FROM employees
    WHERE salary > 60000
),
dept_stats AS (
    SELECT
        department,
        COUNT(*) AS emp_count
    FROM high_earners
    GROUP BY department
)
SELECT *
FROM dept_stats
WHERE emp_count > 1;

Subquery Performance Tips

Use JOINs when possible - Often faster than subqueries ```sql -- Instead of this: SELECT name FROM employees WHERE department IN (SELECT department FROM departments WHERE location = 'NY');

-- Use this: SELECT e.name FROM employees e JOIN departments d ON e.department = d.department WHERE d.location = 'NY'; ```

Use EXISTS instead of IN for large datasets sql -- Better performance with large subquery results: SELECT name FROM employees e WHERE EXISTS ( SELECT 1 FROM orders o WHERE o.employee_id = e.id );
Avoid correlated subqueries when possible - They execute once per row

Practice Exercises

Using this products and sales tables:

products:

+----+----------+-------+
| id | name     | price |
+----+----------+-------+
| 1  | Laptop   | 1000  |
| 2  | Mouse    | 25    |
| 3  | Keyboard | 75    |
+----+----------+-------+

sales:

+----+------------+----------+
| id | product_id | quantity |
+----+------------+----------+
| 1  | 1          | 5        |
| 2  | 2          | 20       |
| 3  | 1          | 3        |
+----+------------+----------+

Try these queries:

Find products with above-average price
Find products that have been sold
Find the most expensive product
Calculate total revenue per product
Find products sold more than once

??? success "Solutions" ```sql -- 1. Above-average price SELECT name, price FROM products WHERE price > (SELECT AVG(price) FROM products);

-- 2. Products that have been sold
SELECT name
FROM products
WHERE id IN (SELECT DISTINCT product_id FROM sales);

-- 3. Most expensive product
SELECT name, price
FROM products
WHERE price = (SELECT MAX(price) FROM products);

-- 4. Total revenue per product (using CTE)
WITH product_revenue AS (
    SELECT
        p.name,
        SUM(s.quantity * p.price) AS revenue
    FROM products p
    JOIN sales s ON p.id = s.product_id
    GROUP BY p.id, p.name
)
SELECT * FROM product_revenue
ORDER BY revenue DESC;

-- 5. Products sold more than once
SELECT p.name
FROM products p
WHERE (
    SELECT COUNT(*)
    FROM sales s
    WHERE s.product_id = p.id
) > 1;
```

!!! tip "When to Use Subqueries" - When you need to filter based on aggregate values - When the subquery is simple and returns few rows - When you need to reference the result multiple times (use CTE) - For readability in complex queries

Previous: SQL Joins