SQL Patterns 7–9: Self-Join, Cross Join, Market Basket
💡 Insight
Join-based patterns. The key insight: SQL can join a table to ITSELF.
Once you understand this, a whole category of questions becomes easy.
🧠 MEMORY MAP
🧠 JOIN PATTERNS = "SCM"
JOIN PATTERNS"SCM"
SSelf-Join: same table twice (hierarchy, pairs, comparisons)
CCross Join: every row × every row (combinations, grids)
MMarket Basket: self-join on order_id (items bought together)
SELF-JOIN KEY RULE:
Use TWO aliases for the same table: FROM employees e1 JOIN employees e2
The JOIN condition defines the RELATIONSHIP between the two copies:
→ e2.manager_id = e1.id (hierarchy)
→ e1.score - e2.score <= 5 (similarity)
→ e1.item_id < e2.item_id (pairs, avoid duplicates)
CROSS JOIN RULE
N rows × M rows = N×M rows (Cartesian product)
Use only when you WANT every combination (grids, matchups, combinations)
Never accidentally CROSS JOIN large tables → explodes data!
MARKET BASKET KEY RULE
Self-join order_items ON same order_id BUT different product
Use: o1.product_id < o2.product_id→prevents (A,B) AND (B,A) both appearing
PATTERN 7: SELF-JOIN
What Is It?
Join a table to itself to compare rows within the same table, traverse hierarchies, or find relationships between records of the same type.
Recognize It When You See:
- "Employee and their manager" (same employees table)
- "Find pairs that share [something]"
- "Users who both attended [same event]"
- "Salary higher than [their own manager]"
- "Students who scored within N points of each other"
- "Players who beat [same opponent] twice"
THE TEMPLATE
sql
-- TEMPLATE: Self-Join (two aliases for same table)
-- Pattern A: Hierarchy (employee → manager)
SELECT
e.name AS employee_name,
e.salary AS employee_salary,
m.name AS manager_name,
m.salary AS manager_salary
FROM employees e
JOIN employees m ON e.manager_id = m.employee_id -- join condition defines relationship
WHERE e.salary > m.salary; -- filter condition
-- Pattern B: Finding pairs (avoid duplicates with < operator)
SELECT
t1.id AS id1,
t2.id AS id2,
t1.col AS col1,
t2.col AS col2
FROM table t1
JOIN table t2
ON t1.shared_key = t2.shared_key -- what links them
AND t1.id < t2.id -- prevent (A,B) AND (B,A), prevent self-pairs
WHERE ABS(t1.value - t2.value) <= 5; -- similarity condition
SOLVED EXAMPLE — Q42 (Amazon): Employees earning more than their manager
Problem:
Table: employees (employee_id, name, salary, manager_id)
manager_id references employee_id in the same table.
Find employees who earn MORE than their direct manager.
sql
SELECT
e.name AS employee_name,
e.salary AS employee_salary,
m.name AS manager_name,
m.salary AS manager_salary
FROM employees e
JOIN employees m
ON e.manager_id = m.employee_id -- link employee to their manager
WHERE e.salary > m.salary -- filter: only employees earning more
ORDER BY e.salary DESC;
-- ⚠️ TRAP: Don't forget to handle NULLs!
-- Top-level managers have manager_id = NULL → JOIN filters them out (correct!)
-- Use INNER JOIN → rows without a manager are excluded automatically
SOLVED EXAMPLE — Q43 (Facebook): Friend recommendations via shared private events
Problem:
Table: event_rsvp (user_id, event_id, event_type)
Table: friendships (user_id_1, user_id_2)
Recommend friends: users who attended 2+ same PRIVATE events but aren't yet friends.
Return: user_id_1, user_id_2, shared_event_count
sql
WITH private_rsvp AS (
-- Step 1: Only private events
SELECT user_id, event_id
FROM event_rsvp
WHERE event_type = 'private'
),
shared_events AS (
-- Step 2: Self-join to find users who attended same events
SELECT
r1.user_id AS user_1,
r2.user_id AS user_2,
COUNT(r1.event_id) AS shared_events
FROM private_rsvp r1
JOIN private_rsvp r2
ON r1.event_id = r2.event_id -- same event
AND r1.user_id < r2.user_id -- avoid (A,B) and (B,A), avoid self-pairs
GROUP BY r1.user_id, r2.user_id
HAVING COUNT(r1.event_id) >= 2 -- attended 2+ same events
),
existing_friends AS (
-- Step 3: Normalize friendships table (ensure both directions covered)
SELECT user_id_1 AS u1, user_id_2 AS u2 FROM friendships
UNION
SELECT user_id_2, user_id_1 FROM friendships
)
-- Step 4: Exclude pairs who are already friends
SELECT s.user_1, s.user_2, s.shared_events
FROM shared_events s
LEFT JOIN existing_friends f
ON s.user_1 = f.u1 AND s.user_2 = f.u2
WHERE f.u1 IS NULL; -- not already friends
PATTERN 7 QUESTIONS
| Q# | Question | Key Insight |
|---|---|---|
| Q42 | Employees earning more than their manager | e JOIN e ON e.manager_id = m.id, WHERE e.salary > m.salary |
| Q43 | Friend recommendations via shared events | Self-join events + HAVING count >= 2 + exclude existing friends |
| Q44 | Student pairs in same class within 5 score points | Self-join ON same class + ABS(s1.score - s2.score) <= 5 |
| Q45 | Cheapest two-stop flight routes | Self-join flights ON f1.destination = f2.origin |
| Q46 | Players who beat same opponent at least twice | Self-join matches + GROUP BY player + HAVING COUNT >= 2 |
| Q47 | Managers with 5+ direct reports | GROUP BY manager_id + HAVING COUNT(*) >= 5 (no self-join needed!) |
PATTERN 8: CROSS JOIN — COMBINATION GENERATION
What Is It?
Generate every possible combination of rows from two tables. The result has N × M rows (Cartesian product). Used deliberately for: grid creation, tournament schedules, filling coverage gaps.
Recognize It When You See:
- "All possible combinations of X and Y"
- "All matchups / every team plays every other team"
- "Generate all size-color combinations"
- "Fill in missing date × product combinations"
- "Find zero-sales days per product"
THE TEMPLATE
sql
-- TEMPLATE A: Simple combination generation
SELECT
a.col1,
b.col2
FROM table_a a
CROSS JOIN table_b b; -- generates every row of a paired with every row of b
-- TEMPLATE B: Self-cross-join for pairs (e.g., tournament matchups)
SELECT
t1.team_name AS home_team,
t2.team_name AS away_team
FROM teams t1
CROSS JOIN teams t2
WHERE t1.team_id < t2.team_id; -- prevents (A vs B) AND (B vs A), prevents self-matches
-- TEMPLATE C: Fill gaps — generate full grid then LEFT JOIN to actuals
WITH all_dates AS (
SELECT generate_series(
'2024-01-01'::date,
'2024-12-31'::date,
'1 day'::interval
)::date AS date_val
-- Hive/Spark: Use recursive CTE or pre-built date table
),
all_products AS (SELECT DISTINCT product_id FROM products)
SELECT
d.date_val,
p.product_id,
COALESCE(s.revenue, 0) AS revenue -- 0 for days with no sales
FROM all_dates d
CROSS JOIN all_products p
LEFT JOIN daily_sales s
ON d.date_val = s.sale_date
AND p.product_id = s.product_id;
SOLVED EXAMPLE — Q51 (McKinsey): All 3-topping pizza combinations with total cost
Problem:
Table: toppings (topping_name, ingredient_cost)
Generate all possible 3-topping combinations.
Return: topping1, topping2, topping3, total_cost (sorted alphabetically).
No topping can be used twice in a combination.
sql
SELECT
t1.topping_name AS topping_1,
t2.topping_name AS topping_2,
t3.topping_name AS topping_3,
t1.ingredient_cost + t2.ingredient_cost + t3.ingredient_cost AS total_cost
FROM toppings t1
JOIN toppings t2 ON t1.topping_name < t2.topping_name -- ensures t1 < t2 < t3 alphabetically
JOIN toppings t3 ON t2.topping_name < t3.topping_name -- avoids all duplicate combos
ORDER BY total_cost, topping_1, topping_2, topping_3;
-- ⚠️ Using < on the name column (lexicographic order) ensures:
-- (Basil, Mushroom, Peppers) appears but NOT (Mushroom, Basil, Peppers)
-- This gives exactly C(n,3) = n!/3!(n-3)! unique combinations
PATTERN 8 QUESTIONS
| Q# | Question | Key Insight |
|---|---|---|
| Q48 | All size-color combinations | CROSS JOIN sizes × colors |
| Q49 | Full date × product grid for zero-fill reporting | CROSS JOIN dates × products + LEFT JOIN sales |
| Q50 | Round-robin tournament schedule | CROSS JOIN teams WHERE t1.id < t2.id |
| Q51 | 3-topping pizza combinations with cost | Three-way JOIN with t1 < t2 < t3 |
| Q52 | Region × category grid for monthly report | CROSS JOIN regions × categories + LEFT JOIN actuals |
PATTERN 9: MARKET BASKET / PRODUCT PAIRS
What Is It?
Find items (products, events, attributes) that frequently appear TOGETHER in the same transaction/order/session. The secret weapon: self-join on the grouping key (order_id).
Recognize It When You See:
- "Products/items frequently bought together"
- "Most common pair in same order"
- "Items purchased in the same transaction"
- "Co-occurrence count"
- "Lift / confidence / association rules"
THE TEMPLATE
sql
-- TEMPLATE: Market Basket — find co-occurring item pairs
WITH item_pairs AS (
SELECT
o1.product_id AS product_1,
o2.product_id AS product_2,
COUNT(DISTINCT o1.order_id) AS co_purchase_count
FROM order_items o1
JOIN order_items o2
ON o1.order_id = o2.order_id -- same order
AND o1.product_id < o2.product_id -- avoid (A,B) AND (B,A), avoid (A,A)
GROUP BY o1.product_id, o2.product_id
ORDER BY co_purchase_count DESC
)
SELECT
p1.product_name AS product_1,
p2.product_name AS product_2,
ip.co_purchase_count
FROM item_pairs ip
JOIN products p1 ON ip.product_1 = p1.product_id
JOIN products p2 ON ip.product_2 = p2.product_id
ORDER BY co_purchase_count DESC
LIMIT 5; -- top 5 pairs
SOLVED EXAMPLE — Q53 (Interview Query Hard): Top 5 product pairs bought together
Problem:
Table: transactions (order_id, user_id, product_id, quantity)
Table: products (product_id, product_name)
Find the 5 product pairs most frequently purchased in the same order.
Return: product1, product2, purchase_count
sql
WITH product_pairs AS (
SELECT
t1.product_id AS product_1_id,
t2.product_id AS product_2_id,
COUNT(DISTINCT t1.order_id) AS times_bought_together
FROM transactions t1
JOIN transactions t2
ON t1.order_id = t2.order_id -- same order
AND t1.product_id < t2.product_id -- canonical pair, no duplicates
GROUP BY t1.product_id, t2.product_id
)
SELECT
p1.product_name AS product_1,
p2.product_name AS product_2,
pp.times_bought_together
FROM product_pairs pp
JOIN products p1 ON pp.product_1_id = p1.product_id
JOIN products p2 ON pp.product_2_id = p2.product_id
ORDER BY pp.times_bought_together DESC
LIMIT 5;
SOLVED EXAMPLE — Q55: Product pairs with co-purchase > 100 and lift score
Problem:
Table: orders (order_id, product_id)
Find product pairs where co-purchase count > 100.
Also compute LIFT = pair_frequency / (freq_A × freq_B)
(Lift > 1 means the pair appears MORE than expected by chance)
sql
WITH total_orders AS (
SELECT COUNT(DISTINCT order_id) AS total FROM orders
),
product_freq AS (
-- How often does each product appear in ANY order?
SELECT product_id, COUNT(DISTINCT order_id) AS freq
FROM orders
GROUP BY product_id
),
pair_freq AS (
-- How often do pairs appear in the SAME order?
SELECT
o1.product_id AS prod_1,
o2.product_id AS prod_2,
COUNT(DISTINCT o1.order_id) AS pair_count
FROM orders o1
JOIN orders o2
ON o1.order_id = o2.order_id
AND o1.product_id < o2.product_id
GROUP BY o1.product_id, o2.product_id
HAVING COUNT(DISTINCT o1.order_id) > 100
)
SELECT
pf.prod_1,
pf.prod_2,
pf.pair_count,
-- LIFT = (pair_count / total) / ((freq_A / total) × (freq_B / total))
-- = pair_count × total / (freq_A × freq_B)
ROUND(
pf.pair_count * t.total * 1.0 / (f1.freq * f2.freq),
3) AS lift
FROM pair_freq pf
JOIN product_freq f1 ON pf.prod_1 = f1.product_id
JOIN product_freq f2 ON pf.prod_2 = f2.product_id
CROSS JOIN total_orders t
ORDER BY lift DESC;
-- Lift interpretation:
-- > 1.0: products bought together MORE than expected → strong association
-- = 1.0: products bought together exactly as expected → no association
-- < 1.0: products bought together LESS than expected → negative association
PATTERN 9 QUESTIONS
| Q# | Question | Key Insight |
|---|---|---|
| Q53 | Top 5 product pairs most frequently bought together | Self-join ON order_id, prod1 < prod2, GROUP BY + LIMIT |
| Q54 | Product most commonly bought alongside Product X | Same self-join but filter one side = Product X |
| Q55 | Pairs where co-purchase > 100 + lift score | Add product_freq and total_orders CTEs for lift calculation |
| Q56 | Menu items ordered together > 30% of the time | confidence = pair_count / item_count; filter HAVING conf > 0.3 |
⚠️ COMMON TRAPS IN JOIN PATTERNS
🧠 Memory Map
SELF-JOIN TRAPS:
TRAP 1: Forgetting id < id2→duplicate pairs
Without: SELECT * FROM t a JOIN t b ON a.shared = b.shared
→ Returns (A,B) AND (B,A) and (A,A) → pairs are doubled + self-pairs!
With: AND a.id < b.id→each pair appears exactly ONCE, no self-pairs
TRAP 2: Manager NULL handling
Top-level employees have manager_id = NULL
Inner JOIN on manager_id excludes them (good for "emp earns more than manager")
Use LEFT JOIN if you want to return ALL employees including top-level
CROSS JOIN TRAPS
TRAP 3: Accidental cross join on large tables
FROM table1, table2 ← old syntax for CROSS JOIN (comma = cross join!)
If you meant to write a regular JOIN and forgot the ON clause→massive data explosion
Always: write CROSS JOIN explicitly when you mean it, or INNER/LEFT JOIN with ON clause
MARKET BASKET TRAPS
TRAP 4: COUNT(order_id) vs COUNT(DISTINCT order_id)
If same product appears twice in one order (quantity > 1), order_id appears twice
Use COUNT(DISTINCT order_id) to count ORDERS (not line items)
Use COUNT(*) if you want to count total line item co-occurrences
TRAP 5: Performance on large order tables
Self-join on order_items can explode: 10M rows × 10M rows = 100 trillion pairs!
Always add PARTITION to narrow the self-join (e.g., same date or same user)
Or: pre-filter to only frequent products before the self-join