SQL Fundamentals — The Questions Interviewers Always Ask First
SECTION 1: ACID PROPERTIES
Definition (1 line each)
| Property | Definition |
|---|---|
| Atomicity | A transaction is all-or-nothing — every statement succeeds, or none do. |
| Consistency | A transaction moves the database from one valid state to another — all rules/constraints are satisfied. |
| Isolation | Concurrent transactions behave as if they ran one after another. |
| Durability | Once committed, the data survives crashes, power failures, and restarts. |
Simple Explanation
Think of a bank transfer — you move ₹5000 from Account A to Account B.
| Property | What it means for the transfer |
|---|---|
| Atomicity | Either BOTH the debit from A AND credit to B happen, or NEITHER happens. No half-transfers. |
| Consistency | Total money before = total money after. The system never shows ₹5000 vanished into thin air. |
| Isolation | If someone checks balances mid-transfer, they see either the before-state or the after-state — never the debit without the credit. |
| Durability | Once the bank says "transfer complete," even if the server crashes 1 second later, the transfer is permanent. |
Real-world Analogy
ACID is like a legal contract:
- Atomicity = The contract is either fully signed by both parties or it's void.
- Consistency = The contract can't violate any laws.
- Isolation = Two contracts being signed simultaneously don't interfere with each other.
- Durability = Once signed and filed, a fire in the building doesn't erase the contract (it's backed up).
Code Example — Transaction in Action
-- Bank transfer: Move ₹5000 from account 101 to account 202
BEGIN TRANSACTION;
-- Step 1: Debit sender
UPDATE accounts
SET balance = balance - 5000
WHERE account_id = 101;
-- Step 2: Credit receiver
UPDATE accounts
SET balance = balance + 5000
WHERE account_id = 202;
-- Step 3: Log the transfer
INSERT INTO transfer_log (from_acct, to_acct, amount, txn_time)
VALUES (101, 202, 5000, CURRENT_TIMESTAMP);
COMMIT;
-- If ANY step fails → ROLLBACK undoes everything (Atomicity)
-- What happens on failure:
BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 5000 WHERE account_id = 101;
-- ^^^ This succeeds
UPDATE accounts SET balance = balance + 5000 WHERE account_id = 999;
-- ^^^ Account 999 doesn't exist — ERROR
ROLLBACK;
-- Both updates are undone. Account 101 keeps its original balance.
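The rollback behavior above can be reproduced with Python's built-in sqlite3 module. This is a minimal sketch, assuming an in-memory database; a CHECK constraint stands in for the "account 999 doesn't exist" failure:

```python
import sqlite3

# Minimal atomicity demo: the second UPDATE violates a CHECK constraint,
# and the ROLLBACK undoes the successful first UPDATE as well.
conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # autocommit mode; we manage BEGIN/COMMIT ourselves
conn.execute("""CREATE TABLE accounts (
    account_id INTEGER PRIMARY KEY,
    balance    INTEGER NOT NULL CHECK (balance >= 0))""")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(101, 50000), (202, 10000)])

try:
    conn.execute("BEGIN")
    conn.execute("UPDATE accounts SET balance = balance - 5000 WHERE account_id = 101")
    # Simulated failing step: this debit would push the balance negative
    conn.execute("UPDATE accounts SET balance = balance - 999999 WHERE account_id = 101")
    conn.execute("COMMIT")
except sqlite3.IntegrityError:
    conn.execute("ROLLBACK")  # Atomicity: the first UPDATE is undone too

balance = conn.execute("SELECT balance FROM accounts WHERE account_id = 101").fetchone()[0]
print(balance)  # 50000: the partial debit never became visible
```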
Interview Tip
When asked "Explain ACID," always use one example (bank transfer) and map all 4 properties to it. Don't give 4 separate examples — it shows disconnected thinking.
What NOT to say
- "Atomicity means the transaction is small" — No. It means indivisible, not small.
- "Consistency means data looks the same everywhere" — That's replication consistency, not ACID consistency.
- "Durability means data is replicated" — Replication helps, but durability fundamentally means written to non-volatile storage (disk/WAL).
SECTION 2: TRANSACTION ISOLATION LEVELS
The 3 Problems Isolation Levels Solve
| Problem | What happens | Example |
|---|---|---|
| Dirty Read | You read data that another transaction hasn't committed yet. If that transaction rolls back, you read garbage. | Transaction A updates a salary to 90K but hasn't committed. Transaction B reads 90K. Transaction A rolls back. B now has a value that never existed. |
| Non-Repeatable Read | You read the same row twice and get different values because another transaction modified it between your two reads. | You read salary = 80K. Another transaction updates it to 90K and commits. You read again — now it's 90K. Same query, different result. |
| Phantom Read | You run the same query twice and get different rows because another transaction inserted/deleted rows between your two queries. | You run SELECT * FROM employees WHERE dept = 'Eng' → 10 rows. Another transaction inserts a new engineer. You run the same query → 11 rows. |
The 4 Isolation Levels
| Level | Dirty Read | Non-Repeatable Read | Phantom Read | Performance |
|---|---|---|---|---|
| READ UNCOMMITTED | Possible | Possible | Possible | Fastest |
| READ COMMITTED | Prevented | Possible | Possible | Fast |
| REPEATABLE READ | Prevented | Prevented | Possible | Medium |
| SERIALIZABLE | Prevented | Prevented | Prevented | Slowest |
Memory trick: Each level going down adds one more guarantee. Think of it as progressively locking more things.
Simple Explanation
- READ UNCOMMITTED — "I'll read whatever is there, even uncommitted changes." Used almost never in production.
- READ COMMITTED — "I'll only read committed data, but if you change it after I read it, I see the new value on my next read." This is the default in PostgreSQL, Oracle, SQL Server.
- REPEATABLE READ — "Once I read a row, its value won't change for the rest of my transaction." This is the default in MySQL/InnoDB.
- SERIALIZABLE — "Full lockdown. Everything behaves as if transactions ran one by one." Safest but slowest.
Code Example
-- Setting isolation level (PostgreSQL syntax)
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN;
SELECT balance FROM accounts WHERE account_id = 101;
-- If another transaction changes this row and commits,
-- a second SELECT in this transaction WILL see the new value.
-- (Non-repeatable read is possible)
COMMIT;
-- Setting isolation level (MySQL syntax)
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION;
SELECT balance FROM accounts WHERE account_id = 101;
-- Result: 50000
-- Another session updates balance to 60000 and commits.
SELECT balance FROM accounts WHERE account_id = 101;
-- Result: STILL 50000 (snapshot from start of transaction)
COMMIT;
-- SQL Server syntax
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
SELECT * FROM orders WHERE customer_id = 42;
-- No other transaction can INSERT/UPDATE/DELETE rows
-- matching customer_id = 42 until this transaction ends.
COMMIT;
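To see the committed-read guarantee outside a SQL shell, here is a sketch with two sqlite3 connections to the same file. SQLite is only a stand-in for the servers above, but it illustrates the point: a reader never sees another connection's uncommitted write.

```python
import os
import sqlite3
import tempfile

# Writer holds an open, uncommitted UPDATE while a second connection reads.
path = os.path.join(tempfile.mkdtemp(), "bank.db")
writer = sqlite3.connect(path, isolation_level=None)
reader = sqlite3.connect(path, isolation_level=None)

writer.execute("CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, balance INTEGER)")
writer.execute("INSERT INTO accounts VALUES (101, 50000)")

writer.execute("BEGIN")
writer.execute("UPDATE accounts SET balance = 99999 WHERE account_id = 101")

# No dirty read: the reader sees the last committed value, not 99999
seen = reader.execute("SELECT balance FROM accounts WHERE account_id = 101").fetchone()[0]
print(seen)  # 50000

writer.execute("ROLLBACK")
```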
Interview Tip
Know your database's default isolation level. PostgreSQL and Oracle default to READ COMMITTED. MySQL/InnoDB defaults to REPEATABLE READ. Being able to say this shows real-world experience.
What NOT to say
- "SERIALIZABLE means everything runs one at a time" — Not literally. The DB uses locking/MVCC to simulate serial execution while allowing some concurrency.
- "READ UNCOMMITTED is never used" — It's used in analytics/reporting where speed matters and slight inconsistency is acceptable.
SECTION 3: NORMALIZATION
What is Normalization?
Definition: The process of organizing tables to minimize redundancy and eliminate update/insert/delete anomalies.
Simple Explanation: Instead of storing everything in one giant spreadsheet with tons of repeated data, you split it into smaller, related tables.
1NF — First Normal Form
Rule: Every column holds atomic (indivisible) values. No repeating groups, no arrays, no comma-separated lists.
❌ NOT in 1NF:
┌─────────┬─────────────────────┐
│ student │ courses             │
├─────────┼─────────────────────┤
│ Alice   │ Math, Physics, Chem │ ← Multiple values in one cell
│ Bob     │ Math, English       │
└─────────┴─────────────────────┘
✅ In 1NF:
┌─────────┬─────────┐
│ student │ course  │
├─────────┼─────────┤
│ Alice   │ Math    │
│ Alice   │ Physics │
│ Alice   │ Chem    │
│ Bob     │ Math    │
│ Bob     │ English │
└─────────┴─────────┘
2NF — Second Normal Form
Rule: Must be in 1NF + every non-key column depends on the ENTIRE primary key (no partial dependencies).
This only matters when you have a composite primary key.
❌ In 1NF but NOT 2NF:
PK = (student_id, course_id)
┌────────────┬───────────┬──────────────┬──────────────────┐
│ student_id │ course_id │ student_name │ course_name │
├────────────┼───────────┼──────────────┼──────────────────┤
│ 1 │ C101 │ Alice │ Mathematics │
│ 1 │ C102 │ Alice │ Physics │
│ 2 │ C101 │ Bob │ Mathematics │
└────────────┴───────────┴──────────────┴──────────────────┘
Problem: student_name depends ONLY on student_id (partial dependency)
course_name depends ONLY on course_id (partial dependency)
✅ In 2NF — Split into 3 tables:
students: courses: enrollments:
┌────┬───────┐ ┌───────┬─────────┐ ┌────────────┬───────────┐
│ id │ name │ │ id │ name │ │ student_id │ course_id │
├────┼───────┤ ├───────┼─────────┤ ├────────────┼───────────┤
│ 1 │ Alice │ │ C101 │ Math │ │ 1 │ C101 │
│ 2 │ Bob │ │ C102 │ Physics │ │ 1 │ C102 │
└────┴───────┘ └───────┴─────────┘ │ 2 │ C101 │
└────────────┴───────────┘
3NF — Third Normal Form
Rule: Must be in 2NF + no transitive dependencies (non-key column depends on another non-key column).
❌ In 2NF but NOT 3NF:
┌─────────────┬───────────┬────────────────┬──────────────┐
│ employee_id │ dept_id │ dept_name │ dept_head │
├─────────────┼───────────┼────────────────┼──────────────┤
│ 1 │ D10 │ Engineering │ Alice │
│ 2 │ D10 │ Engineering │ Alice │ ← Redundancy!
│ 3 │ D20 │ Marketing │ Bob │
└─────────────┴───────────┴────────────────┴──────────────┘
Problem: dept_name and dept_head depend on dept_id, NOT on employee_id.
employee_id → dept_id → dept_name (transitive dependency)
✅ In 3NF — Split:
employees: departments:
┌─────────────┬───────────┐ ┌─────────┬─────────────┬───────────┐
│ employee_id │ dept_id │ │ dept_id │ dept_name │ dept_head │
├─────────────┼───────────┤ ├─────────┼─────────────┼───────────┤
│ 1 │ D10 │ │ D10 │ Engineering │ Alice │
│ 2 │ D10 │ │ D20 │ Marketing │ Bob │
│ 3 │ D20 │ └─────────┴─────────────┴───────────┘
└─────────────┴───────────┘
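The payoff of the split can be verified in a few lines of Python's sqlite3 (same tables as above): fixing the department head is now a single-row UPDATE, and a JOIN reconstructs the original wide view.

```python
import sqlite3

# 3NF in action: department facts live in exactly one row each.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, dept_id TEXT);
    CREATE TABLE departments (dept_id TEXT PRIMARY KEY, dept_name TEXT, dept_head TEXT);
    INSERT INTO employees VALUES (1, 'D10'), (2, 'D10'), (3, 'D20');
    INSERT INTO departments VALUES ('D10', 'Engineering', 'Alice'), ('D20', 'Marketing', 'Bob');
""")

# One UPDATE fixes the head for every Engineering employee (no update anomaly)
conn.execute("UPDATE departments SET dept_head = 'Dave' WHERE dept_id = 'D10'")

rows = conn.execute("""
    SELECT e.employee_id, d.dept_name, d.dept_head
    FROM employees e JOIN departments d ON e.dept_id = d.dept_id
    ORDER BY e.employee_id
""").fetchall()
print(rows)  # [(1, 'Engineering', 'Dave'), (2, 'Engineering', 'Dave'), (3, 'Marketing', 'Bob')]
```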
BCNF — Boyce-Codd Normal Form
Definition: A stricter version of 3NF — every determinant must be a candidate key.
When it matters: When a table has multiple overlapping candidate keys. In practice, if your table is in 3NF, it's usually in BCNF too. Know that it exists and be able to explain it if asked, but don't overthink it.
Denormalization — When and Why
Definition: Intentionally adding redundancy back into normalized tables to improve read performance.
| Use Denormalization When... | Don't Denormalize When... |
|---|---|
| Read-heavy workloads (dashboards, reports) | Write-heavy OLTP systems |
| Query requires many JOINs that slow down reads | Data integrity is critical |
| Data warehouse / analytics layer | Small tables where JOINs are cheap |
| Pre-aggregated tables for BI tools | Frequently updated columns |
-- Normalized: Requires JOIN every time
SELECT o.order_id, c.customer_name, o.total
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;
-- Denormalized: customer_name stored directly in orders table
-- Faster reads, but you must update it everywhere if the name changes
SELECT order_id, customer_name, total
FROM orders_denormalized;
Interview Tip
Say: "In OLTP systems I normalize to 3NF to prevent anomalies. In data warehouses and analytics, I denormalize (star schema / snowflake schema) because reads vastly outnumber writes and JOINs are expensive at scale."
What NOT to say
- "Always normalize" — Shows no practical experience with data warehousing.
- "1NF means each cell has one value, and that's basically it" — You must also mention no duplicate rows (need a primary key).
SECTION 4: INDEXES AND QUERY OPTIMIZATION
What is an Index?
Definition: A data structure (typically a B-Tree) that allows the database to find rows without scanning the entire table.
Simple Explanation: An index is like the index at the back of a textbook. Instead of reading every page to find "ACID properties," you look up "ACID" in the index → it says "page 47" → you go directly there.
How B-Tree Index Works (Simplified)
A B-Tree keeps the indexed values sorted in a balanced tree. The root and intermediate nodes route a lookup down to a leaf, which stores the key plus a pointer to the row (or the row itself, in a clustered index). The tree stays shallow, so a lookup touches only a handful of pages (O(log n)) instead of scanning the whole table, and the sorted leaves make range queries and ORDER BY on the key cheap.
Clustered vs Non-Clustered Index
| Feature | Clustered Index | Non-Clustered Index |
|---|---|---|
| What it does | Sorts and stores the actual table data in order | Creates a separate structure pointing to table data |
| How many per table | Only 1 (because data can only be physically sorted one way) | Multiple (as many as needed) |
| Speed | Faster for range queries on the indexed column | Slightly slower (extra lookup to actual data) |
| Default | Primary key creates a clustered index by default (in SQL Server, MySQL/InnoDB) | Manually created indexes are non-clustered |
| Analogy | A phone book sorted by last name (the data IS the index) | An index at the back of a book (separate from the content) |
When to Create an Index / When NOT to
| Create Index When | Don't Create Index When |
|---|---|
| Column is used in WHERE, JOIN, ORDER BY frequently | Table is small (< few thousand rows) — full scan is faster |
| Column has high cardinality (many distinct values) | Column has low cardinality (e.g., gender: M/F) |
| Table is read-heavy | Table is write-heavy (indexes slow down INSERT/UPDATE/DELETE) |
| Query returns a small % of rows | Query returns most rows (index won't help) |
Code Examples
-- Basic index
CREATE INDEX idx_customers_email
ON customers (email);
-- Composite index (multi-column — order matters!)
CREATE INDEX idx_orders_cust_date
ON orders (customer_id, order_date);
-- ✅ This helps: WHERE customer_id = 42 AND order_date > '2025-01-01'
-- ✅ This helps: WHERE customer_id = 42 (leftmost prefix)
-- ❌ This does NOT help: WHERE order_date > '2025-01-01' (skips first column)
-- Covering index: includes all columns the query needs
-- The DB can answer the query entirely from the index without touching the table
-- (INCLUDE syntax works in PostgreSQL 11+ and SQL Server; in MySQL, add the
-- extra columns to the index key instead)
CREATE INDEX idx_orders_covering
ON orders (customer_id, order_date)
INCLUDE (total_amount, status);
-- Unique index
CREATE UNIQUE INDEX idx_users_email
ON users (email);
-- Drop an index
DROP INDEX idx_customers_email;
How to Read EXPLAIN PLAN
-- PostgreSQL
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE customer_id = 42
AND order_date > '2025-01-01';
Key things to look for in the output:
| What you see | What it means | Good or Bad? |
|---|---|---|
| Seq Scan | Full table scan — reading every row | Bad for large tables |
| Index Scan | Using an index to find rows | Good |
| Index Only Scan | Answered entirely from index (covering index) | Best |
| Bitmap Index Scan | Index scan + bitmap for multiple conditions | Good |
| Nested Loop | For each row in table A, scan table B | Bad for large tables |
| Hash Join | Build hash table on smaller table, probe with larger | Good for equi-joins |
| Sort | Sorting in memory or on disk | Watch for Sort Method: external merge (disk sort = slow) |
| actual time | Real execution time in milliseconds | Lower = better |
| rows | Estimated vs actual rows returned | Large mismatch = stale statistics, run ANALYZE |
-- Example EXPLAIN output (PostgreSQL):
-- Index Scan using idx_orders_cust_date on orders
-- Index Cond: (customer_id = 42)
-- Filter: (order_date > '2025-01-01')
-- Rows Removed by Filter: 12
-- actual time=0.045..0.089 rows=38 loops=1
-- Translation: Used the index on customer_id, then filtered by date.
-- Very fast (0.089 ms). Returned 38 rows.
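PostgreSQL output like the above needs a running server; as a self-contained stand-in, SQLite's EXPLAIN QUERY PLAN demonstrates the same leftmost-prefix behavior for the composite index. The extra `total` column is made up here so the index is not covering:

```python
import sqlite3

# EXPLAIN QUERY PLAN shows whether the composite index is usable.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY, customer_id INTEGER,
    order_date TEXT, total REAL)""")
conn.execute("CREATE INDEX idx_orders_cust_date ON orders (customer_id, order_date)")

def plan(sql):
    # The 4th column of each plan row is a human-readable step description
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Leftmost prefix present: the index is usable
p1 = plan("SELECT * FROM orders WHERE customer_id = 42 AND order_date > '2025-01-01'")
# First index column skipped: falls back to a full scan
p2 = plan("SELECT * FROM orders WHERE order_date > '2025-01-01'")
print(p1)  # e.g. SEARCH orders USING INDEX idx_orders_cust_date (...)
print(p2)  # e.g. SCAN orders
```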
Interview Tip
- Run EXPLAIN — find the bottleneck (Seq Scan? Sort on disk? Nested Loop?)
- Check indexes — is the WHERE/JOIN column indexed?
- Check statistics — run ANALYZE table_name to update stats
- Check query rewrite — can you avoid SELECT *? Can you push filters earlier?
What NOT to say
- "Just add an index on every column" — Too many indexes slow down writes and waste storage.
- "I'd look at the query and guess what's slow" — Always say you'd check EXPLAIN first.
SECTION 5: IN vs EXISTS vs JOIN — PERFORMANCE
Quick Comparison
| Feature | IN | EXISTS | JOIN |
|---|---|---|---|
| Use when | Checking against a small list or subquery with few results | Checking existence in a correlated subquery | You need columns from both tables |
| Returns | Matches from the list | TRUE/FALSE (stops at first match) | All matching rows (including duplicates if not careful) |
| NULL handling | Dangerous — a NULL in the list/subquery makes NOT IN return nothing | Safe — handles NULLs properly | Depends on join type |
| Performance | Good for small lists, bad for large subqueries | Good for large subqueries (short-circuits) | Usually best for large tables (optimizer handles well) |
Code Examples — Same Question, 3 Approaches
Question: Find customers who have placed at least one order.
-- Approach 1: IN
SELECT customer_name
FROM customers
WHERE customer_id IN (
SELECT customer_id FROM orders
);
-- Works fine for small orders table
-- PROBLEM: If orders.customer_id has NULLs, IN can behave unexpectedly
-- Approach 2: EXISTS (generally preferred for existence checks)
SELECT customer_name
FROM customers c
WHERE EXISTS (
SELECT 1 FROM orders o
WHERE o.customer_id = c.customer_id
);
-- Stops scanning orders as soon as it finds ONE match (short-circuit)
-- Handles NULLs safely
-- Approach 3: JOIN
SELECT DISTINCT c.customer_name
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id;
-- Need DISTINCT because a customer with 5 orders appears 5 times
-- Best when you also need order columns in the output
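A quick sanity check with Python's sqlite3 (tiny made-up dataset) confirms the three approaches agree, and that the JOIN needs DISTINCT:

```python
import sqlite3

# Same question three ways: customers with at least one order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, customer_name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1), (11, 1);  -- Alice has two orders
""")

q_in = conn.execute(
    "SELECT customer_name FROM customers "
    "WHERE customer_id IN (SELECT customer_id FROM orders)").fetchall()
q_exists = conn.execute(
    "SELECT customer_name FROM customers c "
    "WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id)").fetchall()
q_join = conn.execute(
    "SELECT DISTINCT c.customer_name FROM customers c "
    "JOIN orders o ON c.customer_id = o.customer_id").fetchall()

print(q_in == q_exists == q_join == [('Alice',)])  # True
```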
The NULL Trap with IN and NOT IN
-- This is the classic trap interviewers love:
-- Suppose orders.customer_id has values: (1, 2, NULL)
-- NOT IN with NULLs: RETURNS NOTHING!
SELECT * FROM customers
WHERE customer_id NOT IN (SELECT customer_id FROM orders);
-- SQL evaluates: customer_id != 1 AND customer_id != 2 AND customer_id != NULL
-- Anything compared to NULL → UNKNOWN → entire WHERE clause → UNKNOWN → row excluded
-- Result: ZERO rows returned, regardless of data!
-- NOT EXISTS: Works correctly
SELECT * FROM customers c
WHERE NOT EXISTS (
SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id
);
-- Returns customers who genuinely have no orders
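The trap reproduces in a few lines of sqlite3 (one order row with a NULL customer_id):

```python
import sqlite3

# NOT IN vs NOT EXISTS when the subquery returns a NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, customer_name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, NULL);
""")

not_in = conn.execute(
    "SELECT customer_name FROM customers "
    "WHERE customer_id NOT IN (SELECT customer_id FROM orders)").fetchall()
print(not_in)  # [] -- the NULL poisons every comparison

not_exists = conn.execute(
    "SELECT customer_name FROM customers c "
    "WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id)").fetchall()
print(not_exists)  # [('Carol',)] -- Carol genuinely has no orders
```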
Performance Rules of Thumb
| Scenario | Best Choice | Why |
|---|---|---|
| Small static list | IN ('A', 'B', 'C') | Simple, readable |
| "Does a matching row exist?" | EXISTS | Short-circuits, NULL-safe |
| Large tables, need columns from both | JOIN | Optimizer can use hash/merge join |
| Anti-join (rows that DON'T match) | NOT EXISTS or LEFT JOIN ... IS NULL | NOT IN fails with NULLs |
Interview Tip
If asked "IN or EXISTS?", say: "For existence checks, I default to EXISTS because it short-circuits and handles NULLs safely. For small literal lists, IN is fine. But honestly, modern optimizers often generate the same execution plan for both — I'd verify with EXPLAIN."
What NOT to say
- "IN and EXISTS are the same thing" — They handle NULLs differently, and EXISTS short-circuits.
- "JOIN is always faster" — Not always. For a simple existence check, EXISTS avoids the need for DISTINCT.
SECTION 6: NULL HANDLING
What is NULL?
Definition: NULL means unknown or missing — it is NOT zero, NOT an empty string, NOT false.
Key rule: Any operation involving NULL produces NULL (with a few exceptions).
NULL Behavior Cheat Sheet
| Operation | Result | Why |
|---|---|---|
| NULL = NULL | NULL (not TRUE!) | Unknown = Unknown → Unknown |
| NULL != NULL | NULL (not TRUE!) | Same reason |
| NULL > 5 | NULL | Can't compare unknown to 5 |
| NULL + 10 | NULL | Unknown + 10 = Unknown |
| NULL AND TRUE | NULL | Unknown AND TRUE = Unknown |
| NULL OR TRUE | TRUE | Even if unknown is FALSE, TRUE OR FALSE = TRUE |
| NULL AND FALSE | FALSE | Even if unknown is TRUE, TRUE AND FALSE = FALSE |
IS NULL vs = NULL
-- ❌ WRONG — this will NEVER find NULL rows
SELECT * FROM employees WHERE manager_id = NULL;
-- Evaluates to NULL (not TRUE), so no rows returned
-- ✅ CORRECT
SELECT * FROM employees WHERE manager_id IS NULL;
-- ✅ Also correct
SELECT * FROM employees WHERE manager_id IS NOT NULL;
NULL in Aggregations
-- Sample data:
-- sales: (100, 200, NULL, 300, NULL)
SELECT
COUNT(*) AS total_rows, -- 5 (counts ALL rows including NULLs)
COUNT(amount) AS non_null_count, -- 3 (skips NULLs)
SUM(amount) AS total, -- 600 (NULLs ignored)
AVG(amount) AS average, -- 200 (600/3, NOT 600/5!)
MIN(amount) AS minimum, -- 100 (NULLs ignored)
MAX(amount) AS maximum -- 300 (NULLs ignored)
FROM sales;
-- KEY INSIGHT: AVG ignores NULLs.
-- If you want NULLs treated as 0:
SELECT AVG(COALESCE(amount, 0)) FROM sales; -- 600/5 = 120
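The numbers in the comments above can be verified directly with Python's sqlite3:

```python
import sqlite3

# COUNT(*) counts NULL rows; COUNT(col), SUM, and AVG skip them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?)",
                 [(100,), (200,), (None,), (300,), (None,)])

row = conn.execute("""
    SELECT COUNT(*), COUNT(amount), SUM(amount), AVG(amount),
           AVG(COALESCE(amount, 0))
    FROM sales
""").fetchone()
print(row)  # (5, 3, 600, 200.0, 120.0)
```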
NULL in JOINs
-- NULLs NEVER match in JOINs
-- If orders.customer_id = NULL and customers.customer_id = NULL,
-- they will NOT join together because NULL = NULL → NULL (not TRUE)
SELECT *
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;
-- Rows with NULL customer_id are silently dropped from both sides
COALESCE, NULLIF, IFNULL
-- COALESCE: Returns the first non-NULL value
SELECT COALESCE(phone, mobile, email, 'No Contact')
FROM employees;
-- If phone is NULL → try mobile → try email → fallback to 'No Contact'
-- COALESCE for safe calculations
SELECT
employee_name,
base_salary + COALESCE(bonus, 0) + COALESCE(commission, 0) AS total_comp
FROM employees;
-- Without COALESCE: if bonus is NULL, total_comp becomes NULL
-- NULLIF: Returns NULL if the two values are equal
SELECT NULLIF(actual_count, 0) -- Returns NULL if actual_count is 0
FROM inventory;
-- Useful to prevent division-by-zero:
SELECT total / NULLIF(count, 0) AS average -- Returns NULL instead of error
FROM summary;
-- IFNULL (MySQL) / ISNULL (SQL Server): Two-argument COALESCE
SELECT IFNULL(phone, 'N/A') FROM employees; -- MySQL
SELECT ISNULL(phone, 'N/A') FROM employees; -- SQL Server
SELECT COALESCE(phone, 'N/A') FROM employees; -- Standard SQL (use this)
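One caveat before a runnable check: SQLite quietly returns NULL on division by zero, whereas most servers raise an error, which is exactly why the NULLIF guard matters on those systems. The function semantics themselves are portable:

```python
import sqlite3

# NULLIF returns NULL when its two arguments are equal;
# COALESCE picks the first non-NULL argument.
conn = sqlite3.connect(":memory:")

pair = conn.execute("SELECT NULLIF(5, 0), NULLIF(0, 0)").fetchone()
print(pair)  # (5, None)

guarded = conn.execute("SELECT 10 / NULLIF(0, 0)").fetchone()[0]
print(guarded)  # None, instead of a division-by-zero error

fallback = conn.execute("SELECT COALESCE(NULL, NULL, 'N/A')").fetchone()[0]
print(fallback)  # N/A
```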
Interview Tip
COUNT(*) vs COUNT(column). Always clarify: "COUNT(*) counts all rows; COUNT(column) skips NULLs." Then mention that AVG also skips NULLs, which changes the denominator.
What NOT to say
- "NULL means zero" or "NULL means empty string" — NULL means unknown, and '' != NULL in most databases (Oracle is the exception: it treats '' as NULL).
- "I just use WHERE column = NULL" — Must use IS NULL. This is a dealbreaker mistake.
SECTION 7: COMMON STRAIGHT QUESTIONS (Quick-Fire Q&A)
DELETE vs TRUNCATE vs DROP
| Feature | DELETE | TRUNCATE | DROP |
|---|---|---|---|
| What it does | Removes specific rows (with WHERE) | Removes all rows | Removes the entire table (structure + data) |
| WHERE clause | Yes | No | N/A |
| Logged | Fully logged (row by row) | Minimally logged (deallocates pages) | Fully logged |
| Rollback | Yes (within transaction) | Depends on DB (Yes in PostgreSQL, No in Oracle) | Depends on DB |
| Triggers fired | Yes | No | No |
| Resets identity/auto-increment | No | Yes | N/A |
| Speed | Slowest | Fast | Fast |
DELETE FROM orders WHERE order_date < '2020-01-01'; -- Remove old orders
TRUNCATE TABLE temp_staging; -- Empty staging table
DROP TABLE IF EXISTS old_backup; -- Remove table entirely
WHERE vs HAVING
| Feature | WHERE | HAVING |
|---|---|---|
| Filters | Individual rows BEFORE grouping | Groups AFTER GROUP BY |
| Can use aggregates? | No | Yes |
| Execution order | Runs first | Runs after GROUP BY |
-- WHERE filters rows, HAVING filters groups
SELECT department, COUNT(*) AS emp_count
FROM employees
WHERE status = 'active' -- Filter rows BEFORE grouping
GROUP BY department
HAVING COUNT(*) > 5; -- Filter groups AFTER grouping
Execution order: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT
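The row-vs-group distinction is easy to verify with sqlite3 (small made-up staff table):

```python
import sqlite3

# WHERE prunes rows before grouping; HAVING filters the groups after.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, status TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [("A", "Eng", "active"), ("B", "Eng", "active"),
                  ("C", "Eng", "inactive"), ("D", "Sales", "active")])

result = conn.execute("""
    SELECT department, COUNT(*) AS emp_count
    FROM employees
    WHERE status = 'active'      -- rows filtered first: C is dropped
    GROUP BY department
    HAVING COUNT(*) > 1          -- groups filtered after: Sales (1 row) is dropped
""").fetchall()
print(result)  # [('Eng', 2)]
```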
UNION vs UNION ALL
| Feature | UNION | UNION ALL |
|---|---|---|
| Duplicates | Removes duplicates | Keeps all rows (including duplicates) |
| Performance | Slower (needs sort/distinct) | Faster (no dedup step) |
| When to use | When you truly need unique results | Default choice — use this unless you need dedup |
-- UNION ALL is almost always what you want in data pipelines
SELECT name FROM employees_us
UNION ALL
SELECT name FROM employees_eu;
-- UNION only when dedup is needed
SELECT email FROM customers
UNION
SELECT email FROM newsletter_subscribers;
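Row counts make the difference concrete (sqlite3 sketch with one overlapping name):

```python
import sqlite3

# UNION dedups; UNION ALL keeps everything.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees_us (name TEXT);
    CREATE TABLE employees_eu (name TEXT);
    INSERT INTO employees_us VALUES ('Alice'), ('Bob');
    INSERT INTO employees_eu VALUES ('Bob'), ('Carol');
""")

union_all = conn.execute(
    "SELECT name FROM employees_us UNION ALL SELECT name FROM employees_eu").fetchall()
union = conn.execute(
    "SELECT name FROM employees_us UNION SELECT name FROM employees_eu").fetchall()
print(len(union_all), len(union))  # 4 3 -- 'Bob' appears twice only in UNION ALL
```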
CHAR vs VARCHAR
| Feature | CHAR(n) | VARCHAR(n) |
|---|---|---|
| Storage | Fixed-length (always uses n bytes, padded with spaces) | Variable-length (uses only what's needed + overhead) |
| Performance | Slightly faster for fixed-length data | Better for variable-length data |
| Use for | Country codes (CHAR(2)), state codes (CHAR(2)), flags (CHAR(1)) | Names, emails, addresses — anything variable |
CREATE TABLE users (
country_code CHAR(2), -- Always exactly 2 chars: 'US', 'IN', 'UK'
email VARCHAR(255) -- Could be 5 chars or 200 chars
);
PRIMARY KEY vs UNIQUE KEY
| Feature | PRIMARY KEY | UNIQUE KEY |
|---|---|---|
| NULLs | Not allowed | Allowed (multiple NULLs in PostgreSQL/MySQL/Oracle; SQL Server permits only one) |
| How many per table | Only 1 | Multiple |
| Creates index | Yes (clustered by default in SQL Server/MySQL) | Yes (non-clustered) |
| Purpose | Uniquely identifies each row | Enforces uniqueness on alternate columns |
CREATE TABLE employees (
employee_id INT PRIMARY KEY, -- One per table, no NULLs
email VARCHAR(255) UNIQUE, -- Must be unique, but can be NULL
ssn VARCHAR(11) UNIQUE -- Another unique constraint
);
View vs Materialized View
| Feature | View | Materialized View |
|---|---|---|
| Stores data? | No (it's a saved query) | Yes (stores query results on disk) |
| Performance | Re-runs query every time | Fast reads (pre-computed) |
| Freshness | Always current | Stale until refreshed |
| Use case | Access control, simplify complex queries | Dashboards, expensive aggregations |
-- Regular View (no data stored)
CREATE VIEW active_customers AS
SELECT customer_id, name, email
FROM customers
WHERE status = 'active';
-- Materialized View (PostgreSQL)
CREATE MATERIALIZED VIEW monthly_sales AS
SELECT DATE_TRUNC('month', order_date) AS month,
SUM(total) AS revenue
FROM orders
GROUP BY 1;
-- Must manually refresh:
REFRESH MATERIALIZED VIEW monthly_sales;
Stored Procedure vs Function
| Feature | Stored Procedure | Function |
|---|---|---|
| Returns | Zero, one, or multiple result sets | Must return a single value or table |
| Used in SELECT? | No | Yes (SELECT my_function(col)) |
| Side effects | Can INSERT, UPDATE, DELETE | Usually read-only (varies by DB) |
| Transaction control | Can use BEGIN/COMMIT/ROLLBACK | Cannot (in most DBs) |
| Use case | Complex business logic, ETL steps | Calculations, transformations |
-- Stored Procedure (PostgreSQL)
CREATE PROCEDURE transfer_funds(sender INT, receiver INT, amount DECIMAL)
LANGUAGE plpgsql AS $$
BEGIN
UPDATE accounts SET balance = balance - amount WHERE id = sender;
UPDATE accounts SET balance = balance + amount WHERE id = receiver;
COMMIT;
END;
$$;
CALL transfer_funds(101, 202, 5000);
-- Function (PostgreSQL)
CREATE FUNCTION get_full_name(first_name TEXT, last_name TEXT)
RETURNS TEXT AS $$
BEGIN
RETURN first_name || ' ' || last_name;
END;
$$ LANGUAGE plpgsql;
SELECT get_full_name(first_name, last_name) FROM employees;
Trigger — What Is It?
Definition: A trigger is a block of code that automatically executes when a specific event (INSERT, UPDATE, DELETE) happens on a table.
-- Audit trigger: Log every salary change
-- (Shown in SQLite-style syntax; Oracle uses :OLD/:NEW, MySQL has no
-- "OF salary" clause, and PostgreSQL attaches a separate trigger function)
CREATE TRIGGER log_salary_change
AFTER UPDATE OF salary ON employees
FOR EACH ROW
BEGIN
INSERT INTO salary_audit (employee_id, old_salary, new_salary, changed_at)
VALUES (OLD.employee_id, OLD.salary, NEW.salary, CURRENT_TIMESTAMP);
END;
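The trigger above happens to be valid SQLite, so it can be exercised end to end with Python's sqlite3:

```python
import sqlite3

# Fire the audit trigger with one salary UPDATE and inspect the log.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, salary INTEGER);
    CREATE TABLE salary_audit (employee_id INTEGER, old_salary INTEGER,
                               new_salary INTEGER, changed_at TEXT);
    INSERT INTO employees VALUES (1, 80000);

    CREATE TRIGGER log_salary_change
    AFTER UPDATE OF salary ON employees
    FOR EACH ROW
    BEGIN
        INSERT INTO salary_audit
        VALUES (OLD.employee_id, OLD.salary, NEW.salary, CURRENT_TIMESTAMP);
    END;
""")

conn.execute("UPDATE employees SET salary = 90000 WHERE employee_id = 1")
audit = conn.execute(
    "SELECT employee_id, old_salary, new_salary FROM salary_audit").fetchall()
print(audit)  # [(1, 80000, 90000)]
```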
Temp Table vs CTE vs Subquery
| Feature | Temp Table | CTE (WITH ... AS) | Subquery |
|---|---|---|---|
| Scope | Entire session (until dropped or session ends) | Single query only | Single query only |
| Indexed? | Yes (you can add indexes) | No | No |
| Materialized? | Yes (data stored on disk, e.g. tempdb in SQL Server) | Usually not (inlined by optimizer) | Usually not |
| Reusable in same query? | Yes | Yes (reference multiple times) | No (must repeat) |
| Use case | Complex ETL, need intermediate results across multiple queries | Breaking complex queries into readable steps | Simple one-off filters |
-- Temp Table
CREATE TEMPORARY TABLE high_value_orders AS
SELECT * FROM orders WHERE total > 10000;
-- Can now run multiple queries against it, add indexes, etc.
-- CTE (preferred for interview coding)
WITH high_value AS (
SELECT * FROM orders WHERE total > 10000
),
customer_totals AS (
SELECT customer_id, SUM(total) AS total_spent
FROM high_value
GROUP BY customer_id
)
SELECT * FROM customer_totals WHERE total_spent > 50000;
-- Subquery (avoid for complex logic — hard to read)
SELECT *
FROM (
SELECT customer_id, SUM(total) AS total_spent
FROM orders
WHERE total > 10000
GROUP BY customer_id
) sub
WHERE total_spent > 50000;