PySpark Interview — Question Bank
Structured coding questions with full PySpark solutions Pattern: B=Basic | M=Medium | H=Hard | S=Scenario Add new questions at the bottom — never renumber existing ones
MASTER TRACKING TABLE
| Q# | Question | Company | Level | Topic | Solved |
|---|---|---|---|---|---|
| Q01 | Filter employees earning > 50k | General | B | filter/select | ✓ |
| Q02 | Count orders per customer | General | B | groupBy/agg | ✓ |
| Q03 | Find duplicate rows by email | General | B | dropDuplicates/groupBy | ✓ |
| Q04 | Total sales by date | General | B | groupBy/sum | ✓ |
| Q05 | Add a new derived column | General | B | withColumn | ✓ |
| Q06 | Read CSV + handle nulls | General | B | na.fill/isNull | ✓ |
| Q07 | Word count in text column | General | B | flatMap/reduceByKey | ✓ |
| Q08 | Find max salary per department | Amazon/Google | M | groupBy/max | ✓ |
| Q09 | Rank employees by salary per dept | Amazon | M | window/dense_rank | ✓ |
| Q10 | Top 3 salaries per department | Amazon | M | window/dense_rank | ✓ |
| Q11 | Second highest salary | Amazon | M | window/dense_rank | ✓ |
| Q12 | Running total of revenue | General | M | window/sum | ✓ |
| Q13 | 7-day rolling average | General | M | window/avg | ✓ |
| Q14 | Employees earning more than their manager | M | self-join | ✓ | |
| Q15 | Customers with no orders (NOT IN) | General | M | left_anti join | ✓ |
| Q16 | Deduplicate — keep latest record | General | M | window/row_number | ✓ |
| Q17 | MoM revenue change (%) | General | M | window/lag | ✓ |
| Q18 | Pivot: rows to columns | General | M | pivot | ✓ |
| Q19 | Explode array column + count tags | General | M | explode/groupBy | ✓ |
| Q20 | Find consecutive purchase days | Amazon | H | window/lag+filter | ✓ |
| Q21 | Session ID assignment (30-min gap) | General | H | window/lag+sum | ✓ |
| Q22 | Temperature rise from previous day | Amazon/Google | H | window/lag | ✓ |
| Q23 | Longest streak of active |