🗃️
SQL
SQL Fundamentals — The Questions Interviewers Always Ask First
🗃️
🗃️
SQL · Section 8 of 9

SQL Fundamentals — The Questions Interviewers Always Ask First

SQL Fundamentals — The Questions Interviewers Always Ask First

Pro Tip
Why this file exists: Patterns (SQL_01–SQL_06) test problem-solving. But before that, interviewers test foundational understanding. If you fumble ACID or can't explain normalization, you won't reach the coding round. Format: Every topic follows — Definition → Simple Explanation → Analogy → Code → Interview Tip → What NOT to say.

SECTION 1: ACID PROPERTIES

Definition (1 line each)

PropertyDefinition
AtomicityA transaction is all-or-nothing — every statement succeeds, or none do.
ConsistencyA transaction moves the database from one valid state to another — all rules/constraints are satisfied.
IsolationConcurrent transactions behave as if they ran one after another.
DurabilityOnce committed, the data survives crashes, power failures, and restarts.

Simple Explanation

Think of a bank transfer — you move ₹5000 from Account A to Account B.

PropertyWhat it means for the transfer
AtomicityEither BOTH the debit from A AND credit to B happen, or NEITHER happens. No half-transfers.
ConsistencyTotal money before = total money after. The system never shows ₹5000 vanished into thin air.
IsolationIf someone checks balances mid-transfer, they see either the before-state or the after-state — never the debit without the credit.
DurabilityOnce the bank says "transfer complete," even if the server crashes 1 second later, the transfer is permanent.

Real-world Analogy

ACID is like a legal contract:

  • Atomicity = The contract is either fully signed by both parties or it's void.
  • Consistency = The contract can't violate any laws.
  • Isolation = Two contracts being signed simultaneously don't interfere with each other.
  • Durability = Once signed and filed, a fire in the building doesn't erase the contract (it's backed up).

Code Example — Transaction in Action

sql
-- Bank transfer: Move ₹5000 from account 101 to account 202

BEGIN TRANSACTION;

  -- Step 1: Debit sender
  UPDATE accounts
  SET balance = balance - 5000
  WHERE account_id = 101;

  -- Step 2: Credit receiver
  UPDATE accounts
  SET balance = balance + 5000
  WHERE account_id = 202;

  -- Step 3: Log the transfer
  INSERT INTO transfer_log (from_acct, to_acct, amount, txn_time)
  VALUES (101, 202, 5000, CURRENT_TIMESTAMP);

COMMIT;
-- If ANY step fails → ROLLBACK undoes everything (Atomicity)
sql
-- What happens on failure:
BEGIN TRANSACTION;

  UPDATE accounts SET balance = balance - 5000 WHERE account_id = 101;
  -- ^^^ This succeeds

  UPDATE accounts SET balance = balance + 5000 WHERE account_id = 999;
  -- ^^^ Account 999 doesn't exist — ERROR

ROLLBACK;
-- Both updates are undone. Account 101 keeps its original balance.

Interview Tip

When asked "Explain ACID," always use one example (bank transfer) and map all 4 properties to it. Don't give 4 separate examples — it shows disconnected thinking.

What NOT to say

  • "Atomicity means the transaction is small" — No. It means indivisible, not small.
  • "Consistency means data looks the same everywhere" — That's replication consistency, not ACID consistency.
  • "Durability means data is replicated" — Replication helps, but durability fundamentally means written to non-volatile storage (disk/WAL).

SECTION 2: TRANSACTION ISOLATION LEVELS

The 3 Problems Isolation Levels Solve

ProblemWhat happensExample
Dirty ReadYou read data that another transaction hasn't committed yet. If that transaction rolls back, you read garbage.Transaction A updates a salary to 90K but hasn't committed. Transaction B reads 90K. Transaction A rolls back. B now has a value that never existed.
Non-Repeatable ReadYou read the same row twice and get different values because another transaction modified it between your two reads.You read salary = 80K. Another transaction updates it to 90K and commits. You read again — now it's 90K. Same query, different result.
Phantom ReadYou run the same query twice and get different rows because another transaction inserted/deleted rows between your two queries.You run SELECT * FROM employees WHERE dept = 'Eng' → 10 rows. Another transaction inserts a new engineer. You run the same query → 11 rows.

The 4 Isolation Levels

LevelDirty ReadNon-Repeatable ReadPhantom ReadPerformance
READ UNCOMMITTEDPossiblePossiblePossibleFastest
READ COMMITTEDPreventedPossiblePossibleFast
REPEATABLE READPreventedPreventedPossibleMedium
SERIALIZABLEPreventedPreventedPreventedSlowest

Memory trick: Each level going down adds one more guarantee. Think of it as progressively locking more things.

Simple Explanation

  • READ UNCOMMITTED — "I'll read whatever is there, even uncommitted changes." Used almost never in production.
  • READ COMMITTED — "I'll only read committed data, but if you change it after I read it, I see the new value on my next read." This is the default in PostgreSQL, Oracle, SQL Server.
  • REPEATABLE READ — "Once I read a row, its value won't change for the rest of my transaction." This is the default in MySQL/InnoDB.
  • SERIALIZABLE — "Full lockdown. Everything behaves as if transactions ran one by one." Safest but slowest.

Code Example

sql
-- Setting isolation level (PostgreSQL syntax)
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

BEGIN;
  SELECT balance FROM accounts WHERE account_id = 101;
  -- If another transaction changes this row and commits,
  -- a second SELECT in this transaction WILL see the new value.
  -- (Non-repeatable read is possible)
COMMIT;
sql
-- Setting isolation level (MySQL syntax)
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

START TRANSACTION;
  SELECT balance FROM accounts WHERE account_id = 101;
  -- Result: 50000

  -- Another session updates balance to 60000 and commits.

  SELECT balance FROM accounts WHERE account_id = 101;
  -- Result: STILL 50000 (snapshot from start of transaction)
COMMIT;
sql
-- SQL Server syntax
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

BEGIN TRANSACTION;
  SELECT * FROM orders WHERE customer_id = 42;
  -- No other transaction can INSERT/UPDATE/DELETE rows
  -- matching customer_id = 42 until this transaction ends.
COMMIT;

Interview Tip

Know your database's default isolation level. PostgreSQL and Oracle default to READ COMMITTED. MySQL/InnoDB defaults to REPEATABLE READ. Being able to say this shows real-world experience.

What NOT to say

  • "SERIALIZABLE means everything runs one at a time" — Not literally. The DB uses locking/MVCC to simulate serial execution while allowing some concurrency.
  • "READ UNCOMMITTED is never used" — It's used in analytics/reporting where speed matters and slight inconsistency is acceptable.

SECTION 3: NORMALIZATION

What is Normalization?

Definition: The process of organizing tables to minimize redundancy and eliminate update/insert/delete anomalies.

Simple Explanation: Instead of storing everything in one giant spreadsheet with tons of repeated data, you split it into smaller, related tables.

1NF — First Normal Form

Rule: Every column holds atomic (indivisible) values. No repeating groups, no arrays, no comma-separated lists.

📐 Architecture Diagram
❌ NOT in 1NF:
┌─────────┬──────────────────────┐
│ student │ courses              │
├─────────┼──────────────────────┤
│ Alice   │ Math, Physics, Chem  │  ← Multiple values in one cell
│ Bob     │ Math, English        │
└─────────┴──────────────────────┘

✅ In 1NF:
┌─────────┬──────────┐
│ student │ course   │
├─────────┼──────────┤
│ Alice   │ Math     │
│ Alice   │ Physics  │
│ Alice   │ Chem     │
│ Bob     │ Math     │
│ Bob     │ English  │
└─────────┴──────────┘

2NF — Second Normal Form

Rule: Must be in 1NF + every non-key column depends on the ENTIRE primary key (no partial dependencies).

This only matters when you have a composite primary key.

📐 Architecture Diagram
❌ In 1NF but NOT 2NF:
PK = (student_id, course_id)

┌────────────┬───────────┬──────────────┬──────────────────┐
│ student_id │ course_id │ student_name │ course_name      │
├────────────┼───────────┼──────────────┼──────────────────┤
│ 1          │ C101      │ Alice        │ Mathematics      │
│ 1          │ C102      │ Alice        │ Physics          │
│ 2          │ C101      │ Bob          │ Mathematics      │
└────────────┴───────────┴──────────────┴──────────────────┘

Problem: student_name depends ONLY on student_id (partial dependency)
         course_name depends ONLY on course_id (partial dependency)

✅ In 2NF — Split into 3 tables:

students:          courses:            enrollments:
┌────┬───────┐    ┌───────┬─────────┐  ┌────────────┬───────────┐
│ id │ name  │    │ id    │ name    │  │ student_id │ course_id │
├────┼───────┤    ├───────┼─────────┤  ├────────────┼───────────┤
│ 1  │ Alice │    │ C101  │ Math    │  │ 1          │ C101      │
│ 2  │ Bob   │    │ C102  │ Physics │  │ 1          │ C102      │
└────┴───────┘    └───────┴─────────┘  │ 2          │ C101      │
                                       └────────────┴───────────┘

3NF — Third Normal Form

Rule: Must be in 2NF + no transitive dependencies (non-key column depends on another non-key column).

📐 Architecture Diagram
❌ In 2NF but NOT 3NF:

┌─────────────┬───────────┬────────────────┬──────────────┐
│ employee_id │ dept_id   │ dept_name      │ dept_head    │
├─────────────┼───────────┼────────────────┼──────────────┤
│ 1           │ D10       │ Engineering    │ Alice        │
│ 2           │ D10       │ Engineering    │ Alice        │  ← Redundancy!
│ 3           │ D20       │ Marketing      │ Bob          │
└─────────────┴───────────┴────────────────┴──────────────┘

Problem: dept_name and dept_head depend on dept_id, NOT on employee_id.
         employee_id → dept_id → dept_name (transitive dependency)

✅ In 3NF — Split:

employees:                    departments:
┌─────────────┬───────────┐  ┌─────────┬─────────────┬───────────┐
│ employee_id │ dept_id   │  │ dept_id │ dept_name   │ dept_head │
├─────────────┼───────────┤  ├─────────┼─────────────┼───────────┤
│ 1           │ D10       │  │ D10     │ Engineering │ Alice     │
│ 2           │ D10       │  │ D20     │ Marketing   │ Bob       │
│ 3           │ D20       │  └─────────┴─────────────┴───────────┘
└─────────────┴───────────┘

BCNF — Boyce-Codd Normal Form

Definition: A stricter version of 3NF — every determinant must be a candidate key.

When it matters: When a table has multiple overlapping candidate keys. In practice, if your table is in 3NF, it's usually in BCNF too. Know it exists and can explain it if asked, but don't overthink it.

Denormalization — When and Why

Definition: Intentionally adding redundancy back into normalized tables to improve read performance.

Use Denormalization When...Don't Denormalize When...
Read-heavy workloads (dashboards, reports)Write-heavy OLTP systems
Query requires many JOINs that slow down readsData integrity is critical
Data warehouse / analytics layerSmall tables where JOINs are cheap
Pre-aggregated tables for BI toolsFrequently updated columns
sql
-- Normalized: Requires JOIN every time
SELECT o.order_id, c.customer_name, o.total
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;

-- Denormalized: customer_name stored directly in orders table
-- Faster reads, but you must update it everywhere if the name changes
SELECT order_id, customer_name, total
FROM orders_denormalized;

Interview Tip

Say: "In OLTP systems I normalize to 3NF to prevent anomalies. In data warehouses and analytics, I denormalize (star schema / snowflake schema) because reads vastly outnumber writes and JOINs are expensive at scale."

What NOT to say

  • "Always normalize" — Shows no practical experience with data warehousing.
  • "1NF means each cell has one value, and that's basically it" — You must also mention no duplicate rows (need a primary key).

SECTION 4: INDEXES AND QUERY OPTIMIZATION

What is an Index?

Definition: A data structure (typically a B-Tree) that allows the database to find rows without scanning the entire table.

Simple Explanation: An index is like the index at the back of a textbook. Instead of reading every page to find "ACID properties," you look up "ACID" in the index → it says "page 47" → you go directly there.

How B-Tree Index Works (Simplified)

Without index: Full Table Scan — reads ALL 10 million rows
With index: B-Tree lookup — reads ~20 nodes to find the row
B-Tree structure (balanced tree):
[M]
/ \
[D,H] [R,V]
/ | \ / | \
[A-C][E-G][I-L][N-Q][S-U][W-Z]
Points to actual rows on disk

Clustered vs Non-Clustered Index

FeatureClustered IndexNon-Clustered Index
What it doesSorts and stores the actual table data in orderCreates a separate structure pointing to table data
How many per tableOnly 1 (because data can only be physically sorted one way)Multiple (as many as needed)
SpeedFaster for range queries on the indexed columnSlightly slower (extra lookup to actual data)
DefaultPrimary key creates a clustered index by default (in SQL Server, MySQL/InnoDB)Manually created indexes are non-clustered
AnalogyA phone book sorted by last name (the data IS the index)An index at the back of a book (separate from the content)

When to Create an Index / When NOT to

Create Index WhenDon't Create Index When
Column is used in WHERE, JOIN, ORDER BY frequentlyTable is small (< few thousand rows) — full scan is faster
Column has high cardinality (many distinct values)Column has low cardinality (e.g., gender: M/F)
Table is read-heavyTable is write-heavy (indexes slow down INSERT/UPDATE/DELETE)
Query returns a small % of rowsQuery returns most rows (index won't help)

Code Examples

sql
-- Basic index
CREATE INDEX idx_customers_email
ON customers (email);

-- Composite index (multi-column — order matters!)
CREATE INDEX idx_orders_cust_date
ON orders (customer_id, order_date);
-- ✅ This helps: WHERE customer_id = 42 AND order_date > '2025-01-01'
-- ✅ This helps: WHERE customer_id = 42 (leftmost prefix)
-- ❌ This does NOT help: WHERE order_date > '2025-01-01' (skips first column)

-- Covering index: includes all columns the query needs
-- The DB can answer the query entirely from the index without touching the table
CREATE INDEX idx_orders_covering
ON orders (customer_id, order_date)
INCLUDE (total_amount, status);

-- Unique index
CREATE UNIQUE INDEX idx_users_email
ON users (email);

-- Drop an index
DROP INDEX idx_customers_email;

How to Read EXPLAIN PLAN

sql
-- PostgreSQL
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE customer_id = 42
  AND order_date > '2025-01-01';

Key things to look for in the output:

What you seeWhat it meansGood or Bad?
Seq ScanFull table scan — reading every rowBad for large tables
Index ScanUsing an index to find rowsGood
Index Only ScanAnswered entirely from index (covering index)Best
Bitmap Index ScanIndex scan + bitmap for multiple conditionsGood
Nested LoopFor each row in table A, scan table BBad for large tables
Hash JoinBuild hash table on smaller table, probe with largerGood for equi-joins
SortSorting in memory or on diskWatch for Sort Method: external merge (disk sort = slow)
actual timeReal execution time in millisecondsLower = better
rowsEstimated vs actual rows returnedLarge mismatch = stale statistics, run ANALYZE
sql
-- Example EXPLAIN output (PostgreSQL):
-- Index Scan using idx_orders_cust_date on orders
--   Index Cond: (customer_id = 42)
--   Filter: (order_date > '2025-01-01')
--   Rows Removed by Filter: 12
--   actual time=0.045..0.089 rows=38 loops=1

-- Translation: Used the index on customer_id, then filtered by date.
-- Very fast (0.089 ms). Returned 38 rows.

Interview Tip

💡 Interview Tip
If an interviewer asks "How would you optimize this slow query?", follow this framework:
  1. Run EXPLAIN — find the bottleneck (Seq Scan? Sort on disk? Nested Loop?)
  2. Check indexes — is the WHERE/JOIN column indexed?
  3. Check statisticsANALYZE table_name to update stats
  4. Check query rewrite — Can you avoid SELECT *? Can you push filters earlier?

What NOT to say

  • "Just add an index on every column" — Too many indexes slow down writes and waste storage.
  • "I'd look at the query and guess what's slow" — Always say you'd check EXPLAIN first.

SECTION 5: IN vs EXISTS vs JOIN — PERFORMANCE

Quick Comparison

FeatureINEXISTSJOIN
Use whenChecking against a small list or subquery with few resultsChecking existence in a correlated subqueryYou need columns from both tables
ReturnsMatches from the listTRUE/FALSE (stops at first match)All matching rows (including duplicates if not careful)
NULL handlingDangerousIN (NULL) never matchesSafe — handles NULLs properlyDepends on join type
PerformanceGood for small lists, bad for large subqueriesGood for large subqueries (short-circuits)Usually best for large tables (optimizer handles well)

Code Examples — Same Question, 3 Approaches

Question: Find customers who have placed at least one order.

sql
-- Approach 1: IN
SELECT customer_name
FROM customers
WHERE customer_id IN (
    SELECT customer_id FROM orders
);
-- Works fine for small orders table
-- PROBLEM: If orders.customer_id has NULLs, IN can behave unexpectedly

-- Approach 2: EXISTS (generally preferred for existence checks)
SELECT customer_name
FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o
    WHERE o.customer_id = c.customer_id
);
-- Stops scanning orders as soon as it finds ONE match (short-circuit)
-- Handles NULLs safely

-- Approach 3: JOIN
SELECT DISTINCT c.customer_name
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id;
-- Need DISTINCT because a customer with 5 orders appears 5 times
-- Best when you also need order columns in the output

The NULL Trap with IN and NOT IN

sql
-- This is the classic trap interviewers love:

-- Suppose orders.customer_id has values: (1, 2, NULL)

-- NOT IN with NULLs: RETURNS NOTHING!
SELECT * FROM customers
WHERE customer_id NOT IN (SELECT customer_id FROM orders);
-- SQL evaluates: customer_id != 1 AND customer_id != 2 AND customer_id != NULL
-- Anything compared to NULL → UNKNOWN → entire WHERE clause → UNKNOWN → row excluded
-- Result: ZERO rows returned, regardless of data!

-- NOT EXISTS: Works correctly
SELECT * FROM customers c
WHERE NOT EXISTS (
    SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id
);
-- Returns customers who genuinely have no orders

Performance Rules of Thumb

ScenarioBest ChoiceWhy
Small static listIN ('A', 'B', 'C')Simple, readable
"Does a matching row exist?"EXISTSShort-circuits, NULL-safe
Large tables, need columns from bothJOINOptimizer can use hash/merge join
Anti-join (rows that DON'T match)NOT EXISTS or LEFT JOIN ... IS NULLNOT IN fails with NULLs

Interview Tip

If asked "IN or EXISTS?", say: "For existence checks, I default to EXISTS because it short-circuits and handles NULLs safely. For small literal lists, IN is fine. But honestly, modern optimizers often generate the same execution plan for both — I'd verify with EXPLAIN."

What NOT to say

  • "IN and EXISTS are the same thing" — They handle NULLs differently, and EXISTS short-circuits.
  • "JOIN is always faster" — Not always. For a simple existence check, EXISTS avoids the need for DISTINCT.

SECTION 6: NULL HANDLING

What is NULL?

Definition: NULL means unknown or missing — it is NOT zero, NOT an empty string, NOT false.

Key rule: Any operation involving NULL produces NULL (with a few exceptions).

NULL Behavior Cheat Sheet

OperationResultWhy
NULL = NULLNULL (not TRUE!)Unknown = Unknown → Unknown
NULL != NULLNULL (not TRUE!)Same reason
NULL > 5NULLCan't compare unknown to 5
NULL + 10NULLUnknown + 10 = Unknown
NULL AND TRUENULLUnknown AND TRUE = Unknown
NULL OR TRUETRUEEven if unknown is FALSE, TRUE OR FALSE = TRUE
NULL AND FALSEFALSEEven if unknown is TRUE, TRUE AND FALSE = FALSE

IS NULL vs = NULL

sql
-- ❌ WRONG — this will NEVER find NULL rows
SELECT * FROM employees WHERE manager_id = NULL;
-- Evaluates to NULL (not TRUE), so no rows returned

-- ✅ CORRECT
SELECT * FROM employees WHERE manager_id IS NULL;

-- ✅ Also correct
SELECT * FROM employees WHERE manager_id IS NOT NULL;

NULL in Aggregations

sql
-- Sample data:
-- sales: (100, 200, NULL, 300, NULL)

SELECT
    COUNT(*)          AS total_rows,      -- 5 (counts ALL rows including NULLs)
    COUNT(amount)     AS non_null_count,  -- 3 (skips NULLs)
    SUM(amount)       AS total,           -- 600 (NULLs ignored)
    AVG(amount)       AS average,         -- 200 (600/3, NOT 600/5!)
    MIN(amount)       AS minimum,         -- 100 (NULLs ignored)
    MAX(amount)       AS maximum          -- 300 (NULLs ignored)
FROM sales;

-- KEY INSIGHT: AVG ignores NULLs.
-- If you want NULLs treated as 0:
SELECT AVG(COALESCE(amount, 0)) FROM sales;  -- 600/5 = 120

NULL in JOINs

sql
-- NULLs NEVER match in JOINs
-- If orders.customer_id = NULL and customers.customer_id = NULL,
-- they will NOT join together because NULL = NULL → NULL (not TRUE)

SELECT *
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;
-- Rows with NULL customer_id are silently dropped from both sides

COALESCE, NULLIF, IFNULL

sql
-- COALESCE: Returns the first non-NULL value
SELECT COALESCE(phone, mobile, email, 'No Contact')
FROM employees;
-- If phone is NULL → try mobile → try email → fallback to 'No Contact'

-- COALESCE for safe calculations
SELECT
    employee_name,
    base_salary + COALESCE(bonus, 0) + COALESCE(commission, 0) AS total_comp
FROM employees;
-- Without COALESCE: if bonus is NULL, total_comp becomes NULL

-- NULLIF: Returns NULL if the two values are equal
SELECT NULLIF(actual_count, 0)  -- Returns NULL if actual_count is 0
FROM inventory;
-- Useful to prevent division-by-zero:
SELECT total / NULLIF(count, 0) AS average  -- Returns NULL instead of error
FROM summary;

-- IFNULL (MySQL) / ISNULL (SQL Server): Two-argument COALESCE
SELECT IFNULL(phone, 'N/A') FROM employees;        -- MySQL
SELECT ISNULL(phone, 'N/A') FROM employees;        -- SQL Server
SELECT COALESCE(phone, 'N/A') FROM employees;      -- Standard SQL (use this)

Interview Tip

⚠️ Common Trap
The #1 NULL trap interviewers set: COUNT(*) vs COUNT(column). Always clarify: "COUNT(*) counts all rows; COUNT(column) skips NULLs." Then mention that AVG also skips NULLs, which changes the denominator.

What NOT to say

  • "NULL means zero" or "NULL means empty string" — NULL means unknown. '' != NULL in most databases (except Oracle, where '' = NULL).
  • "I just use WHERE column = NULL" — Must use IS NULL. This is a dealbreaker mistake.

SECTION 7: COMMON STRAIGHT QUESTIONS (Quick-Fire Q&A)

DELETE vs TRUNCATE vs DROP

FeatureDELETETRUNCATEDROP
What it doesRemoves specific rows (with WHERE)Removes all rowsRemoves the entire table (structure + data)
WHERE clauseYesNoN/A
LoggedFully logged (row by row)Minimally logged (deallocates pages)Fully logged
RollbackYes (within transaction)Depends on DB (Yes in PostgreSQL, No in Oracle)Depends on DB
Triggers firedYesNoNo
Resets identity/auto-incrementNoYesN/A
SpeedSlowestFastFast
sql
DELETE FROM orders WHERE order_date < '2020-01-01';  -- Remove old orders
TRUNCATE TABLE temp_staging;                          -- Empty staging table
DROP TABLE IF EXISTS old_backup;                      -- Remove table entirely

WHERE vs HAVING

FeatureWHEREHAVING
FiltersIndividual rows BEFORE groupingGroups AFTER GROUP BY
Can use aggregates?NoYes
Execution orderRuns firstRuns after GROUP BY
sql
-- WHERE filters rows, HAVING filters groups
SELECT department, COUNT(*) AS emp_count
FROM employees
WHERE status = 'active'          -- Filter rows BEFORE grouping
GROUP BY department
HAVING COUNT(*) > 5;             -- Filter groups AFTER grouping

Execution order: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT

UNION vs UNION ALL

FeatureUNIONUNION ALL
DuplicatesRemoves duplicatesKeeps all rows (including duplicates)
PerformanceSlower (needs sort/distinct)Faster (no dedup step)
When to useWhen you truly need unique resultsDefault choice — use this unless you need dedup
sql
-- UNION ALL is almost always what you want in data pipelines
SELECT name FROM employees_us
UNION ALL
SELECT name FROM employees_eu;

-- UNION only when dedup is needed
SELECT email FROM customers
UNION
SELECT email FROM newsletter_subscribers;

CHAR vs VARCHAR

FeatureCHAR(n)VARCHAR(n)
StorageFixed-length (always uses n bytes, padded with spaces)Variable-length (uses only what's needed + overhead)
PerformanceSlightly faster for fixed-length dataBetter for variable-length data
Use forCountry codes (CHAR(2)), state codes (CHAR(2)), flags (CHAR(1))Names, emails, addresses — anything variable
sql
CREATE TABLE users (
    country_code  CHAR(2),        -- Always exactly 2 chars: 'US', 'IN', 'UK'
    email         VARCHAR(255)    -- Could be 5 chars or 200 chars
);

PRIMARY KEY vs UNIQUE KEY

FeaturePRIMARY KEYUNIQUE KEY
NULLsNot allowedAllowed (one NULL in most DBs, multiple in PostgreSQL)
How many per tableOnly 1Multiple
Creates indexYes (clustered by default in SQL Server/MySQL)Yes (non-clustered)
PurposeUniquely identifies each rowEnforces uniqueness on alternate columns
sql
CREATE TABLE employees (
    employee_id   INT PRIMARY KEY,             -- One per table, no NULLs
    email         VARCHAR(255) UNIQUE,         -- Must be unique, but can be NULL
    ssn           VARCHAR(11) UNIQUE           -- Another unique constraint
);

View vs Materialized View

FeatureViewMaterialized View
Stores data?No (it's a saved query)Yes (stores query results on disk)
PerformanceRe-runs query every timeFast reads (pre-computed)
FreshnessAlways currentStale until refreshed
Use caseAccess control, simplify complex queriesDashboards, expensive aggregations
sql
-- Regular View (no data stored)
CREATE VIEW active_customers AS
SELECT customer_id, name, email
FROM customers
WHERE status = 'active';

-- Materialized View (PostgreSQL)
CREATE MATERIALIZED VIEW monthly_sales AS
SELECT DATE_TRUNC('month', order_date) AS month,
       SUM(total) AS revenue
FROM orders
GROUP BY 1;

-- Must manually refresh:
REFRESH MATERIALIZED VIEW monthly_sales;

Stored Procedure vs Function

FeatureStored ProcedureFunction
ReturnsZero, one, or multiple result setsMust return a single value or table
Used in SELECT?NoYes (SELECT my_function(col))
Side effectsCan INSERT, UPDATE, DELETEUsually read-only (varies by DB)
Transaction controlCan use BEGIN/COMMIT/ROLLBACKCannot (in most DBs)
Use caseComplex business logic, ETL stepsCalculations, transformations
sql
-- Stored Procedure (PostgreSQL)
CREATE PROCEDURE transfer_funds(sender INT, receiver INT, amount DECIMAL)
LANGUAGE plpgsql AS $$
BEGIN
    UPDATE accounts SET balance = balance - amount WHERE id = sender;
    UPDATE accounts SET balance = balance + amount WHERE id = receiver;
    COMMIT;
END;
$$;

CALL transfer_funds(101, 202, 5000);

-- Function (PostgreSQL)
CREATE FUNCTION get_full_name(first_name TEXT, last_name TEXT)
RETURNS TEXT AS $$
BEGIN
    RETURN first_name || ' ' || last_name;
END;
$$ LANGUAGE plpgsql;

SELECT get_full_name(first_name, last_name) FROM employees;

Trigger — What Is It?

Definition: A trigger is a block of code that automatically executes when a specific event (INSERT, UPDATE, DELETE) happens on a table.

sql
-- Audit trigger: Log every salary change
CREATE TRIGGER log_salary_change
AFTER UPDATE OF salary ON employees
FOR EACH ROW
BEGIN
    INSERT INTO salary_audit (employee_id, old_salary, new_salary, changed_at)
    VALUES (OLD.employee_id, OLD.salary, NEW.salary, CURRENT_TIMESTAMP);
END;
📝 Note
Interview note: Be ready to say when NOT to use triggers: "Triggers add hidden logic that's hard to debug and can cause cascading performance issues. I prefer application-level logic or CDC (Change Data Capture) in data engineering pipelines."

Temp Table vs CTE vs Subquery

FeatureTemp TableCTE (WITH ... AS)Subquery
ScopeEntire session (until dropped or session ends)Single query onlySingle query only
Indexed?Yes (you can add indexes)NoNo
Materialized?Yes (data stored in tempdb)Usually not (inlined by optimizer)Usually not
Reusable in same query?YesYes (reference multiple times)No (must repeat)
Use caseComplex ETL, need intermediate results across multiple queriesBreaking complex queries into readable stepsSimple one-off filters
sql
-- Temp Table
CREATE TEMPORARY TABLE high_value_orders AS
SELECT * FROM orders WHERE total > 10000;
-- Can now run multiple queries against it, add indexes, etc.

-- CTE (preferred for interview coding)
WITH high_value AS (
    SELECT * FROM orders WHERE total > 10000
),
customer_totals AS (
    SELECT customer_id, SUM(total) AS total_spent
    FROM high_value
    GROUP BY customer_id
)
SELECT * FROM customer_totals WHERE total_spent > 50000;

-- Subquery (avoid for complex logic — hard to read)
SELECT *
FROM (
    SELECT customer_id, SUM(total) AS total_spent
    FROM orders
    WHERE total > 10000
    GROUP BY customer_id
) sub
WHERE total_spent > 50000;

Interview Tip

💡 Interview Tip
In interviews, always use CTEs. They show structured thinking, are easy to explain step by step, and make your SQL readable. Say: "I prefer CTEs for clarity, but in production ETL I might use temp tables if I need to index intermediate results."

QUICK REFERENCE CARD — Print This Page

🧠 READ UNCOMMITTED → READ COMMITTED → REPEATABLE READ → SERIALIZABLE
ACID: Atomicity (all-or-nothing) | Consistency (valid state) | Isolation (no interference) | Durability (survives crash)
ISOLATION LEVELS (low → high protection)
READ UNCOMMITTEDREAD COMMITTED → REPEATABLE READ → SERIALIZABLE
NORMAL FORMS
1NF: Atomic values, no repeating groups
2NF: 1NF + no partial dependencies
3NF: 2NF + no transitive dependencies
INDEX RULES
Create: High cardinality + used in WHERE/JOIN + read-heavy table
Skip: Low cardinality + write-heavy table + small table
NULL RULES
NULLNULL → NULL (not TRUE!)
Use IS NULL, never = NULL
COUNT(*) counts NULLs, COUNT(col) skips them
AVG skips NULLs (changes denominator!)
NOT IN with NULLsreturns NOTHING. Use NOT EXISTS.
QUICK PAIRS
DELETE (rows, logged) vs TRUNCATE (all rows, fast) vs DROP (table gone)
WHERE (before GROUP BY) vs HAVING (after GROUP BY)
UNION (dedup) vs UNION ALL (keep all — faster, default choice)
View (saved query) vs Materialized View (saved result)
CTE (readable, single query) vs Temp Table (persistent, indexable)