PySpark Interview — Question Bank
Structured coding questions with full PySpark solutions.
Pattern: B=Basic | M=Medium | H=Hard | S=Scenario
Add new questions at the bottom — never renumber existing ones.
MASTER TRACKING TABLE
| Q# | Question | Company | Level | Topic | Solved |
|---|---|---|---|---|---|
| Q01 | Filter employees earning > 50k | General | B | filter/select | ✓ |
| Q02 | Count orders per customer | General | B | groupBy/agg | ✓ |
| Q03 | Find duplicate rows by email | General | B | dropDuplicates/groupBy | ✓ |
| Q04 | Total sales by date | General | B | groupBy/sum | ✓ |
| Q05 | Add a new derived column | General | B | withColumn | ✓ |
| Q06 | Read CSV + handle nulls | General | B | na.fill/isNull | ✓ |
| Q07 | Word count in text column | General | B | flatMap/reduceByKey | ✓ |
| Q08 | Find max salary per department | Amazon/Google | M | groupBy/max | ✓ |
| Q09 | Rank employees by salary per dept | Amazon | M | window/dense_rank | ✓ |
| Q10 | Top 3 salaries per department | Amazon | M | window/dense_rank | ✓ |
| Q11 | Second highest salary | Amazon | M | window/dense_rank | ✓ |
| Q12 | Running total of revenue | General | M | window/sum | ✓ |
| Q13 | 7-day rolling average | General | M | window/avg | ✓ |
| Q14 | Employees earning more than their manager | General | M | self-join | ✓ |
| Q15 | Customers with no orders (NOT IN) | General | M | left_anti join | ✓ |
| Q16 | Deduplicate — keep latest record | General | M | window/row_number | ✓ |
| Q17 | MoM revenue change (%) | General | M | window/lag | ✓ |
| Q18 | Pivot: rows to columns | General | M | pivot | ✓ |
| Q19 | Explode array column + count tags | General | M | explode/groupBy | ✓ |
| Q20 | Find consecutive purchase days | Amazon | H | window/lag+filter | ✓ |
| Q21 | Session ID assignment (30-min gap) | General | H | window/lag+sum | ✓ |
| Q22 | Temperature rise from previous day | Amazon/Google | H | window/lag | ✓ |
| Q23 | Longest streak of active days | General | H | gaps & islands | ✓ |
| Q24 | Products bought together (market basket) | Amazon | H | self-join | ✓ |
| Q25 | Funnel conversion rates | Meta/TikTok | H | pivot/conditional agg | ✓ |
| Q26 | Deduplicate with complex multi-key logic | General | H | window/row_number | ✓ |
| Q27 | Handle data skew with salting | Uber/Google | S | repartition/salting | ✓ |
| Q28 | Optimize slow join with broadcast | General | S | broadcast hint | ✓ |
| Q29 | Read multiple sources + track origin | General | S | union/input_file_name | ✓ |
| Q30 | Flatten nested JSON to flat table | General | S | struct/explode | ✓ |
| Q31 | Process 200 GB data with 16 GB executor memory | FAANG | S | partitioning/streaming | ✓ |
| Q32 | Read bad/corrupt data — 3 modes | General | S | badRecordsPath/modes | ✓ |
| Q33 | Recursively read all files in directory tree | General | S | recursiveFileLookup | ✓ |
| Q34 | OOM during groupBy on 500M rows | General | S | reduceByKey/AQE | ✓ |
| Q35 | Job succeeds on 10 GB, fails on 500 GB on Monday | General | S | dynamic alloc/AQE | ✓ |
| Q36 | Schema evolution — new column added to source | General | S | mergeSchema/unionByName | ✓ |
| Q37 | Estimate partitions needed for a job | General | S | partition math | ✓ |
| Q38 | Avoid recomputing expensive DataFrame 3 times | General | S | cache/persist strategy | ✓ |
| Q39 | Read only specific file types from mixed directory | General | S | pathGlobFilter | ✓ |
| Q40 | Handle late-arriving data in daily ETL | General | S | watermark/filter/upsert | ✓ |
SECTION 1: BASIC QUESTIONS (B)
Q01 — Filter employees earning > 50k
from pyspark.sql.functions import col
employees = spark.read.parquet("employees/")
result = employees \
.filter(col("salary") > 50000) \
.select("emp_id", "name", "dept", "salary") \
.orderBy(col("salary").desc())
result.show()
Q02 — Count orders per customer
from pyspark.sql.functions import count, sum, col
result = orders.groupBy("customer_id") \
.agg(
count("order_id").alias("total_orders"),
sum("amount").alias("total_spent")
) \
.orderBy(col("total_spent").desc())
result.show()
Q03 — Find duplicate rows by email
from pyspark.sql.functions import count, col
# Method 1: groupBy + filter
duplicates = users.groupBy("email") \
.agg(count("*").alias("cnt")) \
.filter(col("cnt") > 1)
# Method 2: join back to get full rows
result = users.join(duplicates, "email", "inner") \
.select("user_id", "email", "created_at", "cnt") \
.orderBy("email")
result.show()
Q04 — Total sales by date
from pyspark.sql.functions import sum, col, to_date
result = transactions \
.withColumn("date", to_date(col("date"))) \
.groupBy("date") \
.agg(sum("amount").alias("daily_sales")) \
.orderBy("date")
result.show()
Q05 — Add derived column (categorize salary)
from pyspark.sql.functions import col, when
result = employees.withColumn(
"salary_band",
when(col("salary") > 100000, "High")
.when(col("salary") >= 50000, "Mid")
.otherwise("Low")
)
result.show()
Q06 — Read CSV + handle nulls
from pyspark.sql.types import StructType, StructField, IntegerType, StringType, DoubleType
schema = StructType([
StructField("emp_id", IntegerType(), nullable=True),
StructField("name", StringType(), nullable=True),
StructField("dept", StringType(), nullable=True),
StructField("salary", DoubleType(), nullable=True)
])
df = spark.read \
.schema(schema) \
.option("header", True) \
.csv("employees.csv")
result = df \
.na.drop(subset=["emp_id"]) \
.na.fill({"salary": 0.0, "dept": "Unknown"})
# drops rows where emp_id is null, then fills remaining nulls
# (comments cannot follow a line-continuation backslash — keep them on their own lines)
result.show()
Q07 — Word count (RDD style)
# RDD approach (classic interview question)
rdd = sc.textFile("data/text_file.txt")
word_counts = rdd \
.flatMap(lambda line: line.lower().split()) \
.map(lambda word: (word, 1)) \
.reduceByKey(lambda a, b: a + b) \
.sortBy(lambda x: x[1], ascending=False)
word_counts.take(10)
# DataFrame approach (prefer in practice)
from pyspark.sql.functions import explode, split, lower, col
df = spark.read.text("data/text_file.txt")
result = df \
.withColumn("word", explode(split(lower(col("value")), "\\s+"))) \
.groupBy("word") \
.count() \
.orderBy(col("count").desc()) \
.limit(10)
result.show()
SECTION 2: MEDIUM QUESTIONS (M)
Q08 — Max salary per department
from pyspark.sql.functions import max, col
result = employees.groupBy("dept") \
.agg(max("salary").alias("max_salary")) \
.orderBy(col("max_salary").desc())
result.show()
Q09 — Rank employees by salary per department
from pyspark.sql.window import Window
from pyspark.sql.functions import dense_rank, col
w = Window.partitionBy("dept").orderBy(col("salary").desc())
result = employees.withColumn("rank", dense_rank().over(w)) \
.select("emp_id", "name", "dept", "salary", "rank") \
.orderBy("dept", "rank")
result.show()
Q10 — Top 3 salaries per department
from pyspark.sql.window import Window
from pyspark.sql.functions import dense_rank, col
w = Window.partitionBy("dept").orderBy(col("salary").desc())
result = employees \
.withColumn("rnk", dense_rank().over(w)) \
.filter(col("rnk") <= 3) \
.select("dept", "name", "salary", "rnk") \
.orderBy("dept", "rnk")
result.show()
# KEY INSIGHT: dense_rank → tied salaries get same rank, all appear
# If you used rank() → gaps: ranks 1,1,3 (skips 2)
# If you used row_number() → only 3 rows max, misses ties
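The three ranking semantics can be sanity-checked without Spark; a plain-Python sketch (illustration only, not Spark code) of what each function returns on tied values:

```python
# Salaries pre-sorted descending, with a tie at 90
salaries = [100, 90, 90, 80]

row_number = list(range(1, len(salaries) + 1))                            # ties broken arbitrarily
rank = [1 + sum(s > x for s in salaries) for x in salaries]               # ties share a rank, next rank skips
dense_rank = [1 + len({s for s in salaries if s > x}) for x in salaries]  # ties share a rank, no gaps

# row_number → [1, 2, 3, 4]
# rank       → [1, 2, 2, 4]
# dense_rank → [1, 2, 2, 3]
```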
Q11 — Second highest salary
from pyspark.sql.window import Window
from pyspark.sql.functions import dense_rank, col
w = Window.orderBy(col("salary").desc())
result = employees \
.withColumn("rnk", dense_rank().over(w)) \
.filter(col("rnk") == 2) \
.select("salary") \
.distinct()
# Handle case where no second salary exists (count() is an action, fine for a small check)
if result.count() == 0:
    # explicit DDL schema needed — a lone None cannot be type-inferred
    result = spark.createDataFrame([(None,)], "salary double")
result.show()
Q12 — Running total of revenue
from pyspark.sql.window import Window
from pyspark.sql.functions import sum, col
w = Window.orderBy("date").rowsBetween(Window.unboundedPreceding, Window.currentRow)
result = revenue \
.withColumn("running_total", sum("amount").over(w)) \
.select("date", "amount", "running_total") \
.orderBy("date")
result.show()
Q13 — 7-day rolling average
from pyspark.sql.window import Window
from pyspark.sql.functions import avg, col, round
w = Window.orderBy("date").rowsBetween(-6, Window.currentRow) # 6 prior + current
result = daily_sales \
.withColumn("rolling_7d_avg", round(avg("sales").over(w), 2)) \
.select("date", "sales", "rolling_7d_avg") \
.orderBy("date")
result.show()
Q14 — Employees earning more than their manager
from pyspark.sql.functions import col
emp = employees.alias("e")
mgr = employees.alias("m")
result = emp.join(mgr, col("e.manager_id") == col("m.emp_id"), "inner") \
.filter(col("e.salary") > col("m.salary")) \
.select(
col("e.emp_id"),
col("e.name").alias("employee"),
col("e.salary").alias("emp_salary"),
col("m.name").alias("manager"),
col("m.salary").alias("mgr_salary")
)
result.show()
Q15 — Customers with no orders
from pyspark.sql.functions import col
# Method 1: left_anti join (cleanest, most efficient)
result = customers.join(orders, "customer_id", "left_anti") \
.select("customer_id", "name")
# Method 2: left join + filter null
result = customers.join(orders, "customer_id", "left") \
.filter(col("order_id").isNull()) \
.select(customers.customer_id, customers.name)
result.show()
# ⚡ Prefer left_anti — cleaner, Catalyst handles it well
Q16 — Deduplicate — keep latest record
from pyspark.sql.window import Window
from pyspark.sql.functions import row_number, col
w = Window.partitionBy("user_id", "event_type").orderBy(col("created_at").desc())
result = events \
.withColumn("rn", row_number().over(w)) \
.filter(col("rn") == 1) \
.drop("rn")
result.show()
# ⚡ Use row_number (not rank/dense_rank) — guarantees exactly 1 row per group
Q17 — Month-over-Month revenue change
from pyspark.sql.window import Window
from pyspark.sql.functions import lag, col, round
w = Window.orderBy("year_month")
result = monthly_revenue \
.withColumn("prev_revenue", lag("revenue", 1).over(w)) \
.withColumn(
"mom_pct_change",
round(
(col("revenue") - col("prev_revenue")) / col("prev_revenue") * 100,
2
)
) \
.select("year_month", "revenue", "prev_revenue", "mom_pct_change") \
.orderBy("year_month")
result.show()
# ⚠️ First row: prev_revenue = null → mom_pct_change = null (expected)
# ⚠️ If prev_revenue can be 0: wrap denominator in nullif logic manually
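The zero-denominator guard can be sketched as a plain function (hypothetical helper name; in the column expression the same guard is `when(col("prev_revenue") != 0, ...)`):

```python
def mom_pct_change(prev, curr):
    # None for the first month (no prior row) or a zero base, mirroring null semantics
    if prev is None or prev == 0:
        return None
    return round((curr - prev) / prev * 100, 2)
```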
Q18 — Pivot: rows to columns (long → wide)
from pyspark.sql.functions import sum
result = sales.groupBy("product").pivot("month").agg(sum("revenue"))
result.show()
# Output:
# product | Jan | Feb | Mar
# --------|------|------|-----
# iPhone | 1000 | 1200 | 900
# Optimization: provide list of pivot values to avoid extra scan
months = ["Jan", "Feb", "Mar", "Apr", "May"]
result = sales.groupBy("product").pivot("month", months).agg(sum("revenue"))
Q19 — Explode array column + count tags
from pyspark.sql.functions import explode, col, count
# explode creates one row per tag
result = posts \
.withColumn("tag", explode(col("tags"))) \
.groupBy("tag") \
.agg(count("*").alias("usage_count")) \
.orderBy(col("usage_count").desc()) \
.limit(5)
result.show()
SECTION 3: HARD QUESTIONS (H)
Q20 — Find consecutive purchase days
from pyspark.sql.window import Window
from pyspark.sql.functions import lag, col, datediff, count
from pyspark.sql.functions import row_number
# Step 1: Deduplicate (1 row per customer per date)
deduped = orders.select("customer_id", "order_date").distinct()
# Step 2: Assign row number per customer ordered by date
w = Window.partitionBy("customer_id").orderBy("order_date")
numbered = deduped.withColumn("rn", row_number().over(w))
# Step 3: Island key = date - rn (constant for consecutive dates)
from pyspark.sql.functions import date_sub, expr
islands = numbered.withColumn(
"island_key",
expr("date_sub(order_date, rn)") # Spark SQL: date - rn = island constant
)
# Step 4: Group by island, find streaks >= 3
result = islands.groupBy("customer_id", "island_key") \
.agg(count("*").alias("streak_len")) \
.filter(col("streak_len") >= 3) \
.select("customer_id") \
.distinct()
result.show()
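The island-key trick above can be seen in miniature without Spark (plain-Python sketch):

```python
from datetime import date, timedelta

# Consecutive dates minus their row number collapse to a single constant;
# any gap shifts the constant and starts a new island.
order_dates = [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 3), date(2024, 1, 7)]
island_keys = [d - timedelta(days=rn) for rn, d in enumerate(order_dates, start=1)]
# the first three dates share one key (a 3-day streak); Jan 7 begins a new island
```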
Q21 — Session ID assignment (gap > 30 minutes = new session)
from pyspark.sql.window import Window
from pyspark.sql.functions import lag, col, sum as _sum, unix_timestamp, when
w_order = Window.partitionBy("user_id").orderBy("event_time")
# Step 1: Flag new session start (gap > 30 min or first event)
flagged = events.withColumn(
"prev_time", lag("event_time", 1).over(w_order)
).withColumn(
"new_session",
when(
col("prev_time").isNull() |
((unix_timestamp("event_time") - unix_timestamp("prev_time")) > 1800),
1
).otherwise(0)
)
# Step 2: Cumulative sum of new_session flags = session_id
w_cum = Window.partitionBy("user_id").orderBy("event_time") \
.rowsBetween(Window.unboundedPreceding, Window.currentRow)
result = flagged.withColumn(
"session_id",
_sum("new_session").over(w_cum)
).select("user_id", "event_time", "session_id")
result.show()
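The flag-then-cumulative-sum pattern is easiest to verify in miniature (plain-Python sketch; event times in minutes, one user, already sorted):

```python
from itertools import accumulate

# gap > 30 minutes (or the first event) opens a new session
event_minutes = [0, 10, 50, 55, 120]
new_session = [1] + [1 if cur - prev > 30 else 0
                     for prev, cur in zip(event_minutes, event_minutes[1:])]
session_ids = list(accumulate(new_session))
# the running sum of flags labels each event with its session number
```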
Q22 — Days with temperature higher than previous day
from pyspark.sql.window import Window
from pyspark.sql.functions import lag, col, datediff
w = Window.orderBy("record_date")
result = weather \
.withColumn("prev_temp", lag("temperature", 1).over(w)) \
.withColumn("prev_date", lag("record_date", 1).over(w)) \
.filter(
(col("temperature") > col("prev_temp")) &
(datediff(col("record_date"), col("prev_date")) == 1) # must be consecutive day
) \
.select("id", "record_date", "temperature")
result.show()
# ⚠️ KEY: must check datediff == 1 — data may have gaps (missing dates)
# Without this check, you'd compare non-adjacent days
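The caveat is easy to demonstrate in plain Python: lag compares adjacent rows, not adjacent days.

```python
from datetime import date

readings = [(date(2024, 1, 1), 10), (date(2024, 1, 2), 12), (date(2024, 1, 5), 15)]

# naive: previous-row comparison wrongly includes Jan 5 (its previous row is Jan 2)
naive = [d for (pd_, pt), (d, t) in zip(readings, readings[1:]) if t > pt]
# strict: also require the previous row to be exactly one day earlier
strict = [d for (pd_, pt), (d, t) in zip(readings, readings[1:])
          if t > pt and (d - pd_).days == 1]
```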
Q23 — Longest streak of consecutive active days per user
from pyspark.sql.window import Window
from pyspark.sql.functions import row_number, col, count, max as _max
from pyspark.sql.functions import expr
# Step 1: Deduplicate (1 row per user per date)
deduped = logins.select("user_id", "login_date").distinct()
# Step 2: Row number per user ordered by date
w = Window.partitionBy("user_id").orderBy("login_date")
numbered = deduped.withColumn("rn", row_number().over(w))
# Step 3: island_key = date - rn (constant for consecutive sequences)
islands = numbered.withColumn(
"island_key",
expr("date_sub(login_date, rn)")
)
# Step 4: Count streak length per island
streak_lengths = islands.groupBy("user_id", "island_key") \
.agg(count("*").alias("streak_len"))
# Step 5: Max streak per user
result = streak_lengths.groupBy("user_id") \
.agg(_max("streak_len").alias("longest_streak")) \
.orderBy(col("longest_streak").desc())
result.show()
Q24 — Products bought together (market basket)
from pyspark.sql.functions import col, count
o1 = order_items.alias("o1")
o2 = order_items.alias("o2")
result = o1.join(
o2,
(col("o1.order_id") == col("o2.order_id")) &
(col("o1.product_id") < col("o2.product_id")), # avoid duplicates + self-pairs
"inner"
) \
.groupBy(col("o1.product_id").alias("product_a"),
col("o2.product_id").alias("product_b")) \
.agg(count("*").alias("co_purchase_count")) \
.orderBy(col("co_purchase_count").desc()) \
.limit(5)
result.show()
# ⚡ KEY: o1.product_id < o2.product_id ensures each pair appears once
# Without this: (A,B) and (B,A) both appear = duplicates
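The "<" join condition is the DataFrame equivalent of unordered combinations, which plain Python makes explicit:

```python
from itertools import combinations

basket = ["B", "A", "C"]  # products in one order
pairs = list(combinations(sorted(basket), 2))
# each pair appears exactly once, smaller id first, no self-pairs
```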
Q25 — Funnel conversion rates
from pyspark.sql.functions import col, when, max as _max
# Method: MAX(CASE WHEN) per user — count users, not events
funnel = events.groupBy("user_id").agg(
_max(when(col("event_type") == "view", 1).otherwise(0)).alias("viewed"),
_max(when(col("event_type") == "add_to_cart", 1).otherwise(0)).alias("carted"),
_max(when(col("event_type") == "purchase", 1).otherwise(0)).alias("purchased")
)
from pyspark.sql.functions import sum as _sum, lit
summary = funnel.agg(
_sum("viewed").alias("total_views"),
_sum("carted").alias("total_carted"),
_sum("purchased").alias("total_purchased")
)
# Compute conversion rates — collect once (each collect() triggers a full job)
totals = summary.collect()[0]
total_views = totals["total_views"]
total_carted = totals["total_carted"]
total_purch = totals["total_purchased"]
print(f"View → Cart: {total_carted/total_views*100:.1f}%")
print(f"Cart → Purchase: {total_purch/total_carted*100:.1f}%")
print(f"Overall: {total_purch/total_views*100:.1f}%")
# ⚡ KEY: MAX(CASE WHEN) per user ensures each user counted once per stage
# Even if user viewed 10 times, they count as 1 viewer
Q26 — Dedup with composite key (keep most recent per key combo)
from pyspark.sql.window import Window
from pyspark.sql.functions import row_number, col
w = Window.partitionBy("id", "dept").orderBy(col("updated_at").desc())
result = cdc_events \
.withColumn("rn", row_number().over(w)) \
.filter(col("rn") == 1) \
.drop("rn")
result.show()
# ⚡ row_number guarantees exactly 1 row per partition (unlike rank/dense_rank)
# Always use row_number for deduplication, not rank
SECTION 4: SCENARIO QUESTIONS (S)
Q27 — Handle data skew with salting
from pyspark.sql.functions import rand, concat, lit, col, explode, array
SALT_FACTOR = 10
# APPROACH 1: Broadcast (if countries table is small — BEST approach)
from pyspark.sql.functions import broadcast
result = orders.join(broadcast(countries), "country_code")
# APPROACH 2: Salting (when both tables are large and skewed)
# Step 1: Salt the large orders table
orders_salted = orders.withColumn(
"salted_key",
concat(col("country_code"), lit("_"),
(rand() * SALT_FACTOR).cast("int").cast("string"))
)
# Step 2: Explode countries to match all salt values (0 to SALT_FACTOR-1)
countries_exploded = countries.withColumn(
"salt", explode(array([lit(i) for i in range(SALT_FACTOR)]))
).withColumn(
"salted_key",
concat(col("country_code"), lit("_"), col("salt").cast("string"))
)
# Step 3: Join on salted key; drop the salt helpers and the duplicate country_code
result = orders_salted.join(countries_exploded, "salted_key", "inner") \
.drop("salted_key", "salt") \
.drop(countries_exploded["country_code"])
result.show()
# APPROACH 3: Enable AQE (automatic, no code change)
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
# AQE detects hot partitions at runtime and splits them
Q28 — Optimize slow join with broadcast hint
from pyspark.sql.functions import broadcast
# Problem: Catalyst doesn't auto-broadcast (store_metadata stats not computed)
# Default autoBroadcastJoinThreshold = 10 MB
# SOLUTION 1: Manual broadcast hint (works immediately)
result = transactions.join(
broadcast(store_metadata), # forces BroadcastHashJoin, no shuffle on store_metadata
"store_id"
)
# SOLUTION 2: Increase threshold (if you want auto-broadcast for similar tables)
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "52428800") # 50 MB
# SOLUTION 3: Cache small table + broadcast
store_metadata.cache()
store_metadata.count() # trigger cache
result = transactions.join(broadcast(store_metadata), "store_id")
# VERIFICATION: Check join type in plan
result.explain("formatted")
# Look for: BroadcastHashJoin (not SortMergeJoin)
# IMPACT: Sort-Merge Join = 2 shuffles + 2 sorts + merge per 100M rows
# Broadcast Hash Join = 0 shuffles + hash lookup per 100M rows
# Speedup: often 10-50x for large fact + small dim joins
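A tiny helper (hypothetical name, plain Python) keeps the bytes arithmetic for the threshold honest:

```python
def mb_to_bytes(mb: int) -> str:
    # Spark confs accept the value in bytes, passed as a string
    return str(mb * 1024 * 1024)

# e.g. spark.conf.set("spark.sql.autoBroadcastJoinThreshold", mb_to_bytes(50))
```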
Q29 — Read multiple sources + track file origin
from pyspark.sql.functions import input_file_name, col, lit
# APPROACH 1: Glob pattern (all files match same schema)
df = spark.read.parquet("s3://bucket/data/year=2024/month=*/") \
.withColumn("source_file", input_file_name())
# APPROACH 2: Read separately + unionByName (different schemas)
jan = spark.read.parquet("s3://bucket/data/year=2024/month=01/") \
.withColumn("source_month", lit("Jan"))
feb = spark.read.parquet("s3://bucket/data/year=2024/month=02/") \
.withColumn("source_month", lit("Feb"))
mar = spark.read.parquet("s3://bucket/data/year=2024/month=03/") \
.withColumn("source_month", lit("Mar"))
# allowMissingColumns=True fills missing cols with null instead of failing
combined = jan.unionByName(feb, allowMissingColumns=True) \
.unionByName(mar, allowMissingColumns=True)
combined.show()
# APPROACH 3: Python list of paths
paths = [
"s3://bucket/2024-01/",
"s3://bucket/2024-02/",
"s3://bucket/2024-03/"
]
df = spark.read.parquet(*paths).withColumn("source_file", input_file_name())
Q30 — Flatten nested JSON to flat table
from pyspark.sql.functions import col, explode_outer
# Read raw JSON (schema inferred or explicit)
raw = spark.read.option("multiLine", True).json("orders/*.json")
# raw schema:
# order_id: long
# customer: struct<id: long, name: string, city: string>
# items: array<struct<sku: string, qty: int, price: double>>
# Step 1: Explode items array (one row per item)
exploded = raw.withColumn("item", explode_outer("items")) # outer = keep null items
# Step 2: Select and flatten nested fields with dot notation
result = exploded.select(
col("order_id"),
col("customer.id").alias("customer_id"),
col("customer.name").alias("customer_name"),
col("customer.city").alias("city"),
col("item.sku").alias("sku"),
col("item.qty").alias("qty"),
col("item.price").alias("price")
)
result.show()
# ⚡ explode_outer vs explode:
# explode → drops orders with null/empty items array
# explode_outer → keeps the order row with null item fields (safer)
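The two semantics can be modeled in plain Python on (order_id, items) rows (illustration only, not Spark itself):

```python
rows = [(1, ["a", "b"]), (2, []), (3, None)]

def explode_rows(rows):
    # inner explode: orders with null/empty items vanish from the output
    return [(oid, item) for oid, items in rows for item in (items or [])]

def explode_outer_rows(rows):
    # outer explode: those orders survive with a null item
    out = []
    for oid, items in rows:
        if items:
            out.extend((oid, item) for item in items)
        else:
            out.append((oid, None))
    return out
```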
SECTION 5: LARGE DATA + MEMORY + INFRASTRUCTURE SCENARIOS
Q31 — Process 200 GB data with only 16 GB executor memory
# ── STEP 1: ESTIMATE RIGHT PARTITION COUNT ──────────────────────────────────
# Rule: each partition should be ~100-200 MB in memory
# 200 GB data → 200,000 MB / 128 MB per partition = ~1,600 partitions
data_size_gb = 200
target_partition_mb = 128
n_partitions = int((data_size_gb * 1024) / target_partition_mb) # = 1600
spark.conf.set("spark.sql.shuffle.partitions", str(n_partitions))
# ── STEP 2: TUNE EXECUTOR MEMORY ─────────────────────────────────────────────
# With 16 GB executor, 4 cores → each core gets ~4 GB working memory
# Memory layout:
# spark.executor.memory = 14g (leave 2g for OS + overhead)
# spark.executor.memoryOverhead = 2g (native/Python overhead)
# spark.memory.fraction = 0.6 → 8.4g for Spark execution + storage
# Each core → ~2g execution memory for shuffle/joins
# ── STEP 3: AVOID CACHING 200 GB ──────────────────────────────────────────────
# DO NOT: df.cache() ← 200 GB won't fit, causes eviction thrashing
# DO: Process in a streaming fashion (pipeline partitions through)
# ── STEP 4: FILTER EARLY — reduce data before joins ──────────────────────────
from pyspark.sql.functions import col
df = spark.read.parquet("s3://bucket/data/") \
.filter(col("date") >= "2024-01-01") \
.filter(col("region") == "US") \
.select("id", "amount", "date")
# filters are pushed down to the Parquet reader; select prunes to only the needed columns
# ── STEP 5: AVOID WIDE OPERATIONS ON FULL 200 GB ──────────────────────────────
# Bad: join 200 GB table with another 200 GB table (400 GB shuffle)
# Good: pre-aggregate one side first, then join reduced result
pre_agg = large_df.groupBy("customer_id") \
.agg(sum("amount").alias("total")) # 200 GB → maybe 10 GB after aggregation
result = pre_agg.join(customers, "customer_id") # join 10 GB, not 200 GB
# ── STEP 6: ENABLE AQE ───────────────────────────────────────────────────────
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
# AQE will auto-split/merge partitions based on actual sizes
# ── STEP 7: WRITE IN PARTITIONS, NOT COLLECT ─────────────────────────────────
# NEVER: result.collect() ← pulls 200 GB to Driver → Driver OOM
# DO: result.write.parquet("s3://output/") ← each executor writes its partitions
# ── STEP 8: USE DISK SPILL (not OOM) ─────────────────────────────────────────
# If a shuffle really must exceed memory, configure spill to disk:
spark.conf.set("spark.executor.memory", "14g")
spark.conf.set("spark.memory.fraction", "0.6")
# Execution pool will spill to disk automatically (slowdown vs OOM)
# ── FULL CONFIG FOR THIS SCENARIO ────────────────────────────────────────────
from pyspark.sql import SparkSession
spark = SparkSession.builder \
.config("spark.executor.memory", "14g") \
.config("spark.executor.cores", "4") \
.config("spark.executor.memoryOverhead", "2g") \
.config("spark.sql.shuffle.partitions", "1600") \
.config("spark.sql.adaptive.enabled", "true") \
.config("spark.sql.adaptive.coalescePartitions.enabled", "true") \
.config("spark.sql.adaptive.advisoryPartitionSizeInBytes", "134217728") \
.config("spark.sql.files.maxPartitionBytes", "134217728") \
.getOrCreate()
# both size settings above are 134217728 bytes = 128 MB
Q32 — Reading bad/corrupt data — 3 modes + handling
# ── THE 3 READ MODES ─────────────────────────────────────────────────────────
#
# PERMISSIVE (default): Parse what you can, set corrupt record to null
# Adds _corrupt_record column with the bad raw line
# DROPMALFORMED: Silently drop bad rows, keep good ones
# FAILFAST: Throw exception on first bad record (fail the job)
# ── MODE 1: PERMISSIVE — capture bad rows for inspection ─────────────────────
from pyspark.sql.types import StructType, StructField, IntegerType, StringType, DoubleType
schema = StructType([
StructField("id", IntegerType(), nullable=True),
StructField("name", StringType(), nullable=True),
StructField("amount", DoubleType(), nullable=True),
StructField("_corrupt_record", StringType(), nullable=True) # captures bad rows
])
df = spark.read \
.schema(schema) \
.option("mode", "PERMISSIVE") \
.option("columnNameOfCorruptRecord", "_corrupt_record") \
.option("header", True) \
.csv("data/vendor_feed.csv")
# Separate good and bad records
good_records = df.filter(col("_corrupt_record").isNull()).drop("_corrupt_record")
bad_records = df.filter(col("_corrupt_record").isNotNull())
print(f"Good rows: {good_records.count()}")
print(f"Bad rows: {bad_records.count()}")
# Save bad records for investigation / reprocessing
bad_records.write.mode("overwrite").json("s3://bucket/bad-records/")
# ── MODE 2: DROPMALFORMED — silently drop bad rows ────────────────────────────
df_clean = spark.read \
.schema(schema) \
.option("mode", "DROPMALFORMED") \
.option("header", True) \
.csv("data/vendor_feed.csv")
# Good for: non-critical data where a few bad rows are acceptable
# ── MODE 3: FAILFAST — crash on any bad row ───────────────────────────────────
df_strict = spark.read \
.schema(schema) \
.option("mode", "FAILFAST") \
.option("header", True) \
.csv("data/vendor_feed.csv")
# Good for: financial/compliance data where ANY bad row = stop everything
# ── BADRECORDSPATH — Spark 2.2+ auto-save bad records ────────────────────────
df = spark.read \
.schema(schema) \
.option("badRecordsPath", "s3://bucket/bad-records/") \
.option("header", True) \
.csv("data/vendor_feed.csv")
# Spark automatically writes all bad records + reasons to badRecordsPath
# Each bad record file: { "path": "...", "reason": "...", "record": "..." }
# ── JSON-SPECIFIC: multiline + corrupt handling ───────────────────────────────
# multiLine is needed when JSON arrays/objects span multiple lines
df_json = spark.read \
.option("mode", "PERMISSIVE") \
.option("columnNameOfCorruptRecord", "_corrupt_record") \
.option("multiLine", True) \
.json("data/*.json")
# ── PARQUET corrupt file handling ─────────────────────────────────────────────
spark.conf.set("spark.sql.files.ignoreCorruptFiles", "true") # skip corrupt files
spark.conf.set("spark.sql.files.ignoreMissingFiles", "true") # skip missing files
df_parquet = spark.read.parquet("s3://bucket/data/")
# Great for: reading historical S3 data where some files may be deleted/corrupt
# ── NULL HANDLING AFTER READ ──────────────────────────────────────────────────
# After a PERMISSIVE read, values that failed to cast become null
df_clean = df \
.filter(col("id").isNotNull()) \
.na.fill({"amount": 0.0, "name": "UNKNOWN"})
# drops rows where id failed to parse, then fills the other nulls
Q33 — Recursively read all files from a directory tree
# ── METHOD 1: recursiveFileLookup (Spark 3.0+) ────────────────────────────────
# Reads ALL files recursively under the root path — ignores partition structure
df = spark.read \
.option("recursiveFileLookup", "true") \
.parquet("s3://bucket/data/")
# Reads every .parquet file in the entire /data/ tree recursively
# ── METHOD 2: Glob patterns (most flexible) ───────────────────────────────────
# All files 2 levels deep:
df = spark.read.parquet("s3://bucket/data/*/*/")
# Base path of a partitioned table (partition discovery handles key=value subdirs):
df = spark.read.parquet("s3://bucket/data/")
# Specific year range:
df = spark.read.parquet("s3://bucket/data/2024/*/")
# Multiple specific months:
df = spark.read.parquet(
"s3://bucket/data/2024/01/",
"s3://bucket/data/2024/02/",
"s3://bucket/data/2024/03/"
)
# ── METHOD 3: pathGlobFilter — read only specific file types ──────────────────
# In a mixed directory (CSVs + Parquets + JSONs), read only CSVs:
df = spark.read \
.option("pathGlobFilter", "*.csv") \
.option("recursiveFileLookup", "true") \
.csv("s3://bucket/mixed-data/")
# Read only files matching a date pattern:
df = spark.read \
.option("pathGlobFilter", "2024-01-*.parquet") \
.parquet("s3://bucket/data/")
# ── METHOD 4: modifiedAfter / modifiedBefore — time-based filtering ───────────
# Read only files modified within a given time window (Spark 3.1+):
df = spark.read \
.option("modifiedAfter", "2024-01-01T00:00:00") \
.option("modifiedBefore", "2024-01-02T00:00:00") \
.option("recursiveFileLookup", "true") \
.parquet("s3://bucket/data/")
# ── METHOD 5: Python glob → collect paths → pass to Spark ────────────────────
import boto3
# For S3: list parquet files under a prefix
# (list_objects_v2 returns at most 1000 keys — use get_paginator for larger prefixes)
s3 = boto3.client("s3")
bucket, prefix = "my-bucket", "data/2024/"
all_paths = [
f"s3://{bucket}/{obj['Key']}"
for obj in s3.list_objects_v2(Bucket=bucket, Prefix=prefix)["Contents"]
if obj["Key"].endswith(".parquet")
]
df = spark.read.parquet(*all_paths)
# For local filesystem:
import glob
all_paths = glob.glob("/data/**/*.parquet", recursive=True) # recursive=True is key!
df = spark.read.parquet(*all_paths)
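The key-filtering step in METHOD 5 is plain Python and easy to verify in isolation — a minimal sketch (bucket and key names here are illustrative, not from a real listing):

```python
def parquet_paths(bucket, keys):
    """Turn raw S3 object keys into s3:// paths, keeping only .parquet files."""
    return [f"s3://{bucket}/{k}" for k in keys if k.endswith(".parquet")]

keys = [
    "data/2024/01/part-0.parquet",
    "data/2024/01/_SUCCESS",          # marker file — dropped by the extension filter
    "data/2024/02/part-1.parquet",
]
paths = parquet_paths("my-bucket", keys)
# → only the two part-*.parquet keys survive, as full s3:// paths
```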
# ── ADD SOURCE FILE TRACKING ──────────────────────────────────────────────────
from pyspark.sql.functions import input_file_name
df = spark.read \
.option("recursiveFileLookup", "true") \
.parquet("s3://bucket/data/") \
.withColumn("source_file", input_file_name())
df.show(truncate=False)
# Each row knows exactly which file it came from
# ── VALIDATE WHAT WAS READ ────────────────────────────────────────────────────
df.select("source_file").distinct().show(100, truncate=False)
# Shows all unique file paths that contributed to the DataFrame
Q34 — OOM during groupBy on 500M rows
# ── DIAGNOSE THE PROBLEM ──────────────────────────────────────────────────────
# collect_list() = collects ALL values per key into memory on ONE executor
# If "US" has 400M events → collect_list creates 400M element list in 1 executor
# That's the OOM culprit, not groupBy itself
# ── SOLUTION 1: Don't use collect_list — aggregate instead ────────────────────
# Bad (OOM):
df.groupBy("country").agg(collect_list("event"))
# Good (if you need count/sum/stats, not the full list):
from pyspark.sql.functions import count, countDistinct, sum
df.groupBy("country").agg(
count("event").alias("event_count"),
countDistinct("event").alias("unique_events")
)
# ── SOLUTION 2: Use collect_set with limit ────────────────────────────────────
from pyspark.sql.functions import slice, collect_list, col
# Sample first N values per key (avoid unbounded collection)
from pyspark.sql.window import Window
from pyspark.sql.functions import row_number
w = Window.partitionBy("country").orderBy("event_time")
df_limited = df.withColumn("rn", row_number().over(w)) \
.filter(col("rn") <= 1000) # max 1000 events per country
result = df_limited.groupBy("country").agg(collect_list("event").alias("top_events"))
# ── SOLUTION 3: Increase shuffle partitions to reduce per-partition size ───────
spark.conf.set("spark.sql.shuffle.partitions", "2000") # more, smaller partitions
# NOTE: executor memory is fixed at launch — set it via spark-submit (--executor-memory 16g), not spark.conf.set
# More partitions = less data per task = less memory per task
# ── SOLUTION 4: Pre-filter to reduce data volume ─────────────────────────────
# Before the groupBy, eliminate data you don't need
df_filtered = df.filter(col("date") >= "2024-01-01") \
.filter(col("is_valid") == True) \
.select("country", "event", "event_time") # column pruning
result = df_filtered.groupBy("country").agg(count("event"))
# ── SOLUTION 5: Use aggregateByKey (RDD) for full control ─────────────────────
# When DataFrame API can't express what you need
rdd = df.select("country", "event").rdd.map(lambda r: (r.country, r.event))
# Pre-aggregate locally: build set of unique events per partition
result_rdd = rdd.aggregateByKey(
set(), # initial accumulator = empty set
lambda acc, val: acc | {val}, # within partition: union set
lambda acc1, acc2: acc1 | acc2 # cross partition: union sets
)
result = spark.createDataFrame(
result_rdd.map(lambda x: (x[0], list(x[1]))),
["country", "unique_events"]
)
# ── SOLUTION 6: Enable AQE skew handling ──────────────────────────────────────
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.skewedPartitionFactor", "3")
# AQE splits the "US" partition into smaller sub-partitions automatically
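The set-union fold that `aggregateByKey` performs in Solution 5 can be sanity-checked in plain Python — this mirrors the per-key merge logic on toy data (not the real 500M-row input):

```python
def union_fold(pairs):
    """Build {key: set(values)} the way aggregateByKey's set-union accumulator does."""
    acc = {}
    for country, event in pairs:
        acc.setdefault(country, set()).add(event)  # within-partition: add to set
    return acc  # cross-partition merge would just union these dicts' sets

pairs = [("US", "click"), ("US", "view"), ("US", "click"), ("IN", "view")]
result = union_fold(pairs)
# → {"US": {"click", "view"}, "IN": {"view"}} — duplicates collapse per key
```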
Q35 — Job works on 10 GB weekdays, fails on 500 GB Mondays
# ── ROOT CAUSE: Static configs tuned for 10 GB, fail at 500 GB ─────────────────
# ── SOLUTION 1: Enable AQE (auto-adapts at runtime) ───────────────────────────
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
# AQE merges tiny partitions on small days, splits large ones on big days
# shuffle.partitions can be set to 2000 and AQE will coalesce on small days
# ── SOLUTION 2: Dynamic Allocation (scale executors to data) ───────────────────
spark.conf.set("spark.dynamicAllocation.enabled", "true")
spark.conf.set("spark.dynamicAllocation.minExecutors", "5")
spark.conf.set("spark.dynamicAllocation.maxExecutors", "200") # scale up for Monday
spark.conf.set("spark.shuffle.service.enabled", "true")
# NOTE: dynamic allocation settings take effect at session/cluster launch, not mid-job
# On 10 GB day: ~5 executors. On 500 GB Monday: auto-scales to 100+ executors
# ── SOLUTION 3: Data-driven shuffle partition calculation ─────────────────────
# Before running the main job, estimate data size and set partitions
def estimate_partitions(spark, path, target_mb=128):
    """Estimate optimal partition count from input data size."""
    jvm = spark.sparkContext._jvm
    hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
    fs = jvm.org.apache.hadoop.fs.FileSystem.get(jvm.java.net.URI.create(path), hadoop_conf)
    size_bytes = fs.getContentSummary(jvm.org.apache.hadoop.fs.Path(path)).getLength()
    size_mb = size_bytes / (1024 * 1024)
    return max(200, int(size_mb / target_mb))
n_parts = estimate_partitions(spark, "s3://bucket/data/")
spark.conf.set("spark.sql.shuffle.partitions", str(n_parts))
# ── SOLUTION 4: Add checkpoint mid-job (survive retries on large runs) ─────────
spark.sparkContext.setCheckpointDir("s3://bucket/checkpoints/")
df = spark.read.parquet("s3://bucket/data/")
# After expensive operation: checkpoint to cut lineage + save progress
after_join = df.join(reference, "id").filter(...)
after_join.cache()
checkpointed = after_join.checkpoint()  # checkpoint() returns a NEW DataFrame — keep the return value
after_join.unpersist()  # release cache; reuse `checkpointed` from here on
# ── SOLUTION 5: Partition the input table by date for pruning ─────────────────
# Write data partitioned by date so Monday read only reads weekend partition
df.write \
.partitionBy("year", "month", "day") \
.mode("append") \
.parquet("s3://bucket/data-partitioned/")
# Monday read: only reads Sat+Sun partitions (not entire history)
df_weekend = spark.read.parquet("s3://bucket/data-partitioned/") \
.filter(col("year") == 2024) \
.filter(col("month") == 1) \
.filter(col("day").isin(6, 7)) # Saturday + Sunday only
# ── COMPLETE ROBUST ETL PATTERN ───────────────────────────────────────────────
from datetime import datetime, timedelta
def run_etl(spark, date_str):
    spark.conf.set("spark.sql.adaptive.enabled", "true")
    spark.conf.set("spark.dynamicAllocation.enabled", "true")
    spark.conf.set("spark.dynamicAllocation.maxExecutors", "200")
    spark.conf.set("spark.sql.shuffle.partitions", "2000")  # AQE coalesces down
    df = spark.read.parquet(f"s3://bucket/data/date={date_str}/") \
        .filter(col("is_valid") == True) \
        .select("id", "amount", "category")
    result = df.groupBy("category") \
        .agg(sum("amount").alias("total"))
    result.write.mode("overwrite") \
        .parquet(f"s3://bucket/output/date={date_str}/")
Q36 — Schema evolution — new column added to source
# ── PROBLEM: Reading old + new files fails with schema mismatch ───────────────
# old files: (id, amount, category)
# new files: (id, amount, category, discount_pct) ← new column
# spark.read.parquet("all/") → AnalysisException: schema mismatch
# ── SOLUTION 1: mergeSchema option (Parquet + Delta native) ───────────────────
df = spark.read \
.option("mergeSchema", "true") \
.parquet("s3://bucket/data/")  # mergeSchema unions all file schemas; missing cols = null
# Old file rows: discount_pct = null
# New file rows: discount_pct = actual value
# ── SOLUTION 2: unionByName with allowMissingColumns ──────────────────────────
old_df = spark.read.parquet("s3://bucket/data/before-2024-03-01/")
new_df = spark.read.parquet("s3://bucket/data/from-2024-03-01/")
combined = old_df.unionByName(new_df, allowMissingColumns=True)
# old_df rows: discount_pct = null (added automatically)
# new_df rows: discount_pct = actual value
# ── SOLUTION 3: Define explicit schema that includes all columns ───────────────
from pyspark.sql.types import StructType, StructField, LongType, DoubleType, StringType
full_schema = StructType([
StructField("id", LongType(), nullable=True),
StructField("amount", DoubleType(), nullable=True),
StructField("category", StringType(), nullable=True),
StructField("discount_pct", DoubleType(), nullable=True) # new col, nullable
])
# Read old files with explicit schema → missing columns filled with null
old_df = spark.read.schema(full_schema).parquet("s3://bucket/data/before-2024-03-01/")
# No error — schema is applied, discount_pct = null for all old rows
# ── SOLUTION 4: Delta Lake (best for production schema evolution) ─────────────
# When writing:
new_df.write \
.format("delta") \
.option("mergeSchema", "true") \
.mode("append") \
.save("s3://bucket/delta-table/")  # mergeSchema allows new columns on append
# Or set globally:
spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")
# ── HANDLE NULL in new column after merge ─────────────────────────────────────
from pyspark.sql.functions import coalesce, lit
combined = combined.withColumn(
"discount_pct",
coalesce(col("discount_pct"), lit(0.0)) # treat missing as 0% discount
)
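Conceptually, `unionByName(..., allowMissingColumns=True)` pads each side to the union of column names, filling the gaps with null. A pure-Python sketch of that padding, using dict-rows as a stand-in for DataFrame rows:

```python
def pad_rows(rows, all_cols):
    """Fill columns a row lacks with None, like allowMissingColumns=True does."""
    return [{c: row.get(c) for c in all_cols} for row in rows]

old = [{"id": 1, "amount": 9.5, "category": "a"}]          # pre-evolution schema
cols = ["id", "amount", "category", "discount_pct"]        # union of old + new schemas
padded = pad_rows(old, cols)
# → old rows now carry discount_pct=None, matching the merged schema
```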
Q37 — How to estimate the right number of partitions for a job
# ── THE FORMULA ───────────────────────────────────────────────────────────────
# Target partition size: 100-200 MB per partition (sweet spot)
# Too small: task scheduling overhead + small file problem
# Too large: OOM risk, low parallelism
# FORMULA: n_partitions = ceil(data_size_MB / target_partition_MB)
# For 500 GB: 500,000 MB / 128 MB = ~3,900 partitions → set to 4000
data_size_gb = 500
target_part_mb = 128
n_partitions = int((data_size_gb * 1024) / target_part_mb) # 4000
spark.conf.set("spark.sql.shuffle.partitions", str(n_partitions))
# ── ALSO: SET INPUT PARTITION SIZE ────────────────────────────────────────────
# Controls how Spark splits input files into tasks
spark.conf.set("spark.sql.files.maxPartitionBytes", str(128 * 1024 * 1024)) # 128 MB
# ── CHECK CURRENT PARTITION COUNT ─────────────────────────────────────────────
df = spark.read.parquet("s3://bucket/500gb-data/")
print(f"Input partitions: {df.rdd.getNumPartitions()}")
# Should be ~3900-4000 for 500 GB at 128 MB per partition
# ── VERIFY PARTITION BALANCE ──────────────────────────────────────────────────
from pyspark.sql.functions import spark_partition_id, count
df.withColumn("pid", spark_partition_id()) \
.groupBy("pid") \
.agg(count("*").alias("rows_in_partition")) \
.describe("rows_in_partition") \
.show()  # check mean, stddev, min, max
# Healthy: stddev/mean < 0.3 (relatively even)
# Skewed: max >> mean (one partition has many more rows)
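The stddev/mean rule of thumb above is just the coefficient of variation — a pure-Python version of the same check (illustrative helper, not a Spark API):

```python
import statistics

def is_balanced(row_counts, threshold=0.3):
    """True if partition row counts are relatively even (CV = stddev/mean below threshold)."""
    cv = statistics.pstdev(row_counts) / statistics.mean(row_counts)
    return cv < threshold

# Even partitions pass; one giant partition fails the check
print(is_balanced([100, 105, 95]))        # → True  (CV ≈ 0.04)
print(is_balanced([100, 100, 10000]))     # → False (CV ≈ 1.37 — skewed)
```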
# ── CORES vs PARTITIONS RELATIONSHIP ─────────────────────────────────────────
# Rule: partitions should be a MULTIPLE of total cores
# 50 executors × 4 cores = 200 total cores
# Good partition counts: 200, 400, 800, 1600, 4000 (multiples of 200)
# This ensures all cores stay busy (no idle cores waiting)
total_cores = 50 * 4 # 200
# Round partition count to nearest multiple of total_cores
import math
n_partitions_adjusted = math.ceil(n_partitions / total_cores) * total_cores # 4000
spark.conf.set("spark.sql.shuffle.partitions", str(n_partitions_adjusted))
# ── LET AQE DO FINAL TUNING ───────────────────────────────────────────────────
# Set partitions to a safe HIGH number → AQE coalesces down to right size
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
spark.conf.set("spark.sql.adaptive.advisoryPartitionSizeInBytes", "134217728") # 128 MB
spark.conf.set("spark.sql.shuffle.partitions", "8000") # AQE will reduce this
# INTERVIEW SUMMARY:
# 1. Formula: data_MB / 128 MB = partition count
# 2. Round to multiple of total executor cores
# 3. Set input maxPartitionBytes = same target (128 MB)
# 4. Enable AQE to auto-tune at runtime
# 5. Verify with describe() on partition sizes
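The whole recipe — size formula plus rounding up to a core multiple — fits in one helper; the defaults below match the 500 GB / 200-core example in this answer:

```python
import math

def shuffle_partitions(data_gb, target_mb=128, total_cores=200):
    """data_MB / target_MB, rounded UP to the nearest multiple of total cores."""
    raw = math.ceil(data_gb * 1024 / target_mb)         # 500 GB → 4000 raw partitions
    return math.ceil(raw / total_cores) * total_cores   # round to a multiple of 200

print(shuffle_partitions(500))   # → 4000
print(shuffle_partitions(10))    # → 200 (floor of one full wave of cores)
```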
Q38 — Avoid recomputing an expensive DataFrame 3 times
# ── PROBLEM: Without caching ─────────────────────────────────────────────────
expensive_df = raw.join(reference, "id") \
.groupBy("region") \
.agg(sum("amount").alias("total"))  # expensive join + aggregation
# Each action below triggers a FULL recomputation of expensive_df:
expensive_df.write.parquet("s3://output/") # compute #1
expensive_df.filter(col("total") > 1e6).show() # compute #2
anomalies = expensive_df.filter(col("total") > avg_total)  # compute #3 (avg_total computed separately)
# ── SOLUTION: Cache before first use ──────────────────────────────────────────
expensive_df = raw.join(reference, "id") \
.groupBy("region") \
.agg(sum("amount").alias("total"))
# Cache (materialize into executor memory/disk)
expensive_df.cache()
expensive_df.count() # TRIGGER the cache — forces computation NOW, stores result
# Now all 3 uses hit the cache, not recompute:
expensive_df.write.parquet("s3://output/") # reads from cache
expensive_df.filter(col("total") > 1e6).show() # reads from cache
anomalies = expensive_df.filter(col("total") > 1000000) # reads from cache
# Release cache when done
expensive_df.unpersist()
# ── CHOOSE STORAGE LEVEL ──────────────────────────────────────────────────────
from pyspark.storagelevel import StorageLevel
# If expensive_df fits in memory:
expensive_df.persist(StorageLevel.MEMORY_ONLY) # fastest reads
# If it's large or memory is limited:
expensive_df.persist(StorageLevel.MEMORY_AND_DISK) # spills to disk if needed
# If you want to be safe on a shared cluster:
expensive_df.persist(StorageLevel.DISK_ONLY) # always on disk (slowest)
# ── ALTERNATIVE: CHECKPOINT (if lineage is long) ──────────────────────────────
spark.sparkContext.setCheckpointDir("s3://bucket/checkpoints/")
expensive_df.cache()
expensive_df.count() # trigger cache
checkpointed = expensive_df.checkpoint()  # checkpoint() returns a NEW DataFrame — keep the return value
expensive_df.unpersist()  # release cache (the S3 checkpoint backs `checkpointed` now)
# Now: reuse `checkpointed` — it reads from the S3 checkpoint, not the original lineage
# Advantage: survives executor failures (no recompute from original data)
# ── WRITE ONCE, READ MULTIPLE TIMES PATTERN (most reliable) ────────────────────
# For production: materialize to Parquet, read it back multiple times
expensive_df.write.mode("overwrite").parquet("s3://tmp/expensive-result/")
materialized = spark.read.parquet("s3://tmp/expensive-result/")
materialized.write.parquet("s3://output/")
materialized.filter(col("total") > 1e6).show()
anomalies = materialized.filter(col("total") > 1000000)
# Trade-off: extra write cost → but most resilient, works across sessions
Q39 — Read only specific file types from a mixed directory
# ── METHOD 1: pathGlobFilter (Spark 3.0+) ─────────────────────────────────────
df = spark.read \
.option("pathGlobFilter", "*.csv") \
.option("recursiveFileLookup", "true") \
.option("header", "true") \
.csv("s3://bucket/mixed-directory/")  # only .csv files, searched recursively
# Multiple extensions (Hadoop glob syntax):
df = spark.read \
.option("pathGlobFilter", "*.{csv,tsv}") \
.option("recursiveFileLookup", "true") \
.csv("s3://bucket/mixed-directory/")  # csv OR tsv
# ── METHOD 2: Glob pattern in path ────────────────────────────────────────────
# CSVs one level down (note: Hadoop globs have no recursive ** — use recursiveFileLookup for any depth):
df = spark.read.csv("s3://bucket/mixed-directory/*/*.csv", header=True)
# All csvs matching date pattern:
df = spark.read.csv("s3://bucket/data/2024-01-*.csv", header=True)
# ── METHOD 3: Manually list + filter using boto3 (S3) ────────────────────────
import boto3
s3 = boto3.client("s3")
bucket, prefix = "my-bucket", "mixed-directory/"
response = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)  # returns at most 1000 keys — paginate for more
csv_paths = [
f"s3://{bucket}/{obj['Key']}"
for obj in response.get("Contents", [])
if obj["Key"].endswith(".csv") and not obj["Key"].endswith(".tmp")
]
print(f"Found {len(csv_paths)} CSV files")
df = spark.read.option("header", True).csv(csv_paths)  # csv() takes a LIST of paths (not *args — the 2nd positional arg is schema)
# ── METHOD 4: Exclude patterns using Python filter ────────────────────────────
import glob, os
all_files = glob.glob("/data/**/*", recursive=True)
# Keep only .parquet files, exclude hidden files and .tmp
parquet_files = [
f for f in all_files
if f.endswith(".parquet")
and not os.path.basename(f).startswith("_") # exclude _SUCCESS, _metadata
and not os.path.basename(f).startswith(".") # exclude hidden files
]
df = spark.read.parquet(*parquet_files)
# ── VERIFY WHAT WAS READ ──────────────────────────────────────────────────────
from pyspark.sql.functions import input_file_name
df_with_source = df.withColumn("src", input_file_name())
df_with_source.select("src").distinct().show(20, truncate=False)
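Method 4's exclusion rules are worth factoring into a predicate you can test standalone — the same logic as the list comprehension above:

```python
import os

def is_data_parquet(path):
    """Keep .parquet data files; skip Spark markers (_SUCCESS, _metadata) and hidden files."""
    name = os.path.basename(path)
    return path.endswith(".parquet") and not name.startswith(("_", "."))

print(is_data_parquet("/data/2024/part-0.parquet"))   # → True
print(is_data_parquet("/data/2024/_SUCCESS"))         # → False (Spark marker)
print(is_data_parquet("/data/2024/.tmp.parquet"))     # → False (hidden file)
```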
Q40 — Handle late-arriving data in daily ETL
# ── STRATEGY 1: Reprocess last N days (simple, reliable) ──────────────────────
# Instead of processing only "today", always reprocess last 3 days
# Late data from yesterday will be picked up in today's run of "yesterday"
from datetime import datetime, timedelta
def run_etl_with_late_data(spark, run_date, lookback_days=3):
    """Process last N days to catch late arrivals."""
    dates = [
        (run_date - timedelta(days=i)).strftime("%Y-%m-%d")
        for i in range(lookback_days)
    ]
    df = spark.read.parquet("s3://bucket/raw/") \
        .filter(col("event_date").isin(dates))  # read last 3 days
    # Dynamic partition overwrite: only the touched partitions are replaced (idempotent)
    spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
    df.write \
        .partitionBy("event_date") \
        .mode("overwrite") \
        .parquet("s3://bucket/processed/")
# Overwrites Sat+Sun+Mon partitions — Mon run fixes any late Sat/Sun data
# ── STRATEGY 2: Delta Lake MERGE (upsert late records) ────────────────────────
from delta.tables import DeltaTable
def upsert_late_records(spark, new_records_df):
    """Upsert new/late records into existing Delta table."""
    delta_table = DeltaTable.forPath(spark, "s3://bucket/delta/events/")
    delta_table.alias("target").merge(
        new_records_df.alias("source"),
        "target.event_id = source.event_id"  # match on unique event ID
    ).whenMatchedUpdateAll() \
     .whenNotMatchedInsertAll() \
     .execute()  # update if exists (idempotent), insert if new
# ── STRATEGY 3: Separate late arrival table ────────────────────────────────────
# Detect late records at write time
from pyspark.sql.functions import current_timestamp, datediff, to_date, col
df = spark.read.parquet("s3://bucket/raw/today/") \
.withColumn("processing_ts", current_timestamp()) \
.withColumn("days_late",
datediff(col("processing_ts"), col("event_date"))
)
# Separate on-time vs late
on_time = df.filter(col("days_late") == 0)
late = df.filter(col("days_late") > 0)
# Write on-time to regular partition
on_time.write.partitionBy("event_date").mode("append") \
.parquet("s3://bucket/events/")
# Write late records to separate location for auditing + reprocessing
late.write.mode("append") \
.parquet(f"s3://bucket/late-arrivals/{datetime.today().strftime('%Y-%m-%d')}/")
# ── STRATEGY 4: Watermark filter — ignore very old late data ──────────────────
MAX_LATE_DAYS = 7 # SLA: data older than 7 days is rejected
from pyspark.sql.functions import current_date, date_sub
df_filtered = df.filter(
col("event_date") >= date_sub(current_date(), MAX_LATE_DAYS)
)
# Log how much data was rejected
total = df.count()
accepted = df_filtered.count()
rejected = total - accepted
print(f"Rejected {rejected} records older than {MAX_LATE_DAYS} days")
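Strategy 1's date window is easy to unit-test in isolation — a standalone version of the lookback computation used in `run_etl_with_late_data`:

```python
from datetime import date, timedelta

def lookback_dates(run_date, lookback_days=3):
    """ISO date strings a run on run_date should reprocess (today backwards)."""
    return [(run_date - timedelta(days=i)).isoformat() for i in range(lookback_days)]

# A Monday run reprocesses Mon + Sun + Sat, catching weekend late arrivals
print(lookback_dates(date(2024, 1, 8)))
# → ['2024-01-08', '2024-01-07', '2024-01-06']
```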
SECTION 6: AZURE CLOUD STORAGE + REAL-WORLD READING
Q41 — Read from Azure Blob Storage (old storage / wasbs://)
# Azure Blob Storage uses the wasbs:// (secure) or wasb:// protocol
# Format: wasbs://<container>@<storage-account>.blob.core.windows.net/<path>
# ── METHOD 1: Account Key (simplest, not for production) ─────────────────────
storage_account = "mystorageacct"
account_key = "your_account_key_here" # from Azure Portal → Access Keys
spark.conf.set(
f"fs.azure.account.key.{storage_account}.blob.core.windows.net",
account_key
)
df = spark.read.parquet(
"wasbs://raw-data@mystorageacct.blob.core.windows.net/parquet/orders/2024/"
)
df.show()
# ── METHOD 2: SAS Token (scoped access, time-limited) ────────────────────────
sas_token = "sv=2023-01-01&ss=b&srt=sco&sp=rl&..." # from Azure Portal
spark.conf.set(
f"fs.azure.sas.raw-data.{storage_account}.blob.core.windows.net",
sas_token
)
df = spark.read.csv(
"wasbs://raw-data@mystorageacct.blob.core.windows.net/csv/customers/",
header=True,
inferSchema=True
)
# ── METHOD 3: Service Principal / App Registration (production standard) ──────
# Used in Databricks / ADF / Synapse pipelines
tenant_id = "your-tenant-id"
client_id = "your-app-client-id"
client_secret = "your-app-client-secret"
spark.conf.set(
f"fs.azure.account.auth.type.{storage_account}.blob.core.windows.net",
"OAuth"
)
spark.conf.set(
f"fs.azure.account.oauth.provider.type.{storage_account}.blob.core.windows.net",
"org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
)
spark.conf.set(
f"fs.azure.account.oauth2.client.id.{storage_account}.blob.core.windows.net",
client_id
)
spark.conf.set(
f"fs.azure.account.oauth2.client.secret.{storage_account}.blob.core.windows.net",
client_secret
)
spark.conf.set(
f"fs.azure.account.oauth2.client.endpoint.{storage_account}.blob.core.windows.net",
f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
)
df = spark.read.parquet(
"wasbs://raw-data@mystorageacct.blob.core.windows.net/parquet/orders/"
)
Q42 — Read from Azure Data Lake Storage Gen2 (ADLS Gen2 / abfss://)
# ADLS Gen2 uses abfss:// (hierarchical namespace enabled on storage account)
# Format: abfss://<filesystem>@<storage-account>.dfs.core.windows.net/<path>
#
# KEY DIFFERENCE from Blob Storage:
# Blob: wasbs://<container>@<account>.blob.core.windows.net/
# ADLS2: abfss://<filesystem>@<account>.dfs.core.windows.net/
# ↑ "dfs" endpoint vs "blob" endpoint
# ↑ abfss:// vs wasbs://
storage_account = "datalakeprod"
filesystem = "silver" # like a container in Blob, but hierarchical
# ── METHOD 1: Account Key ─────────────────────────────────────────────────────
spark.conf.set(
f"fs.azure.account.key.{storage_account}.dfs.core.windows.net",
"your-account-key"
)
df = spark.read.parquet(
f"abfss://{filesystem}@{storage_account}.dfs.core.windows.net/processed/transactions/year=2024/"
)
df.show()
# ── METHOD 2: Service Principal (production) ──────────────────────────────────
tenant_id = "your-tenant-id"
client_id = "your-service-principal-client-id"
client_secret = "your-service-principal-secret"
spark.conf.set(
f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net",
"OAuth"
)
spark.conf.set(
f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net",
"org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
)
spark.conf.set(
f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net",
client_id
)
spark.conf.set(
f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net",
client_secret
)
spark.conf.set(
f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net",
f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
)
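The five OAuth settings differ only in the account/endpoint suffix, so they can be generated instead of written out five times. A sketch of a helper (the function name and parameters are illustrative, not a Spark or Azure API) that builds the conf dict for either the blob or dfs endpoint:

```python
def oauth_confs(account, tenant_id, client_id, client_secret, endpoint="dfs"):
    """Build the fs.azure OAuth conf entries for one storage account."""
    host = f"{account}.{endpoint}.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{host}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{host}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{host}": client_id,
        f"fs.azure.account.oauth2.client.secret.{host}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{host}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }

# Usage: for k, v in oauth_confs("datalakeprod", tid, cid, secret).items():
#            spark.conf.set(k, v)
```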
# Read Parquet from ADLS Gen2
orders = spark.read.parquet(
"abfss://silver@datalakeprod.dfs.core.windows.net/processed/transactions/"
)
# Read Delta table from ADLS Gen2
delta_df = spark.read.format("delta").load(
"abfss://gold@datalakeprod.dfs.core.windows.net/curated/orders_final/"
)
# ── METHOD 3: Managed Identity (Databricks / Synapse — NO SECRETS) ─────────────
# In Azure Databricks with cluster-level Managed Identity configured:
# No spark.conf needed at all! Azure handles auth transparently.
df = spark.read.parquet(
"abfss://silver@datalakeprod.dfs.core.windows.net/processed/transactions/"
)
# Works automatically when Managed Identity is assigned to Databricks workspace
# ── READ MULTIPLE FORMATS FROM ADLS GEN2 ─────────────────────────────────────
base = "abfss://silver@datalakeprod.dfs.core.windows.net"
# CSV
customers = spark.read.option("header", True).csv(f"{base}/raw/customers/")
# JSON
events = spark.read.option("multiLine", True).json(f"{base}/raw/events/")
# Parquet (partitioned)
transactions = spark.read.parquet(f"{base}/processed/transactions/year=2024/month=*/")
# Delta
gold_layer = spark.read.format("delta").load(f"{base}/gold/fact_orders/")
# ORC
hive_data = spark.read.orc(f"{base}/hive-warehouse/sales/")
| Feature | Blob Storage | ADLS Gen2 |
|---|---|---|
| Protocol | wasbs:// | abfss:// |
| Endpoint | .blob.core.windows.net | .dfs.core.windows.net |
| Hierarchical namespace | NO (flat) | YES (folder structure) |
| ACLs (fine-grained) | NO | YES (POSIX-style ACLs) |
| Big data performance | Lower | Higher (optimized I/O) |
| Use for | General blob store | Data Lake / Delta Lake |
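The two protocols address the same storage account through different endpoints. A minimal sketch (hypothetical account/container names) of a helper that builds either URI, to make the scheme/endpoint difference from the table above concrete:

```python
def storage_path(protocol: str, container: str, account: str, path: str) -> str:
    """Build a Spark-readable URI for Blob (wasbs) or ADLS Gen2 (abfss)."""
    suffix = {"wasbs": "blob.core.windows.net", "abfss": "dfs.core.windows.net"}[protocol]
    return f"{protocol}://{container}@{account}.{suffix}/{path.lstrip('/')}"

# Same data, two protocols — only the scheme and endpoint suffix differ
blob_uri = storage_path("wasbs", "silver", "datalakeprod", "raw/customers/")
adls_uri = storage_path("abfss", "silver", "datalakeprod", "raw/customers/")
```

Only the `abfss://` form goes through the ADLS Gen2 driver and benefits from the hierarchical namespace.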
Q43 — Databricks: Mount ADLS Gen2 and read without long paths
# ── STEP 1: MOUNT THE STORAGE (run once, persists across cluster restarts) ────
configs = {
"fs.azure.account.auth.type": "OAuth",
"fs.azure.account.oauth.provider.type":
"org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
"fs.azure.account.oauth2.client.id": dbutils.secrets.get("kv-scope", "sp-client-id"),
"fs.azure.account.oauth2.client.secret": dbutils.secrets.get("kv-scope", "sp-secret"),
"fs.azure.account.oauth2.client.endpoint":
f"https://login.microsoftonline.com/{dbutils.secrets.get('kv-scope','tenant-id')}/oauth2/token"
}
# Mount the "silver" filesystem to /mnt/silver
dbutils.fs.mount(
source = "abfss://silver@datalakeprod.dfs.core.windows.net/",
mount_point = "/mnt/silver",
extra_configs = configs
)
# Mount the "gold" filesystem to /mnt/gold
dbutils.fs.mount(
source = "abfss://gold@datalakeprod.dfs.core.windows.net/",
mount_point = "/mnt/gold",
extra_configs = configs
)
# ── STEP 2: READ USING SIMPLE /mnt/ PATHS (in any notebook) ──────────────────
# Instead of: abfss://silver@datalakeprod.dfs.core.windows.net/processed/orders/
# Just use:
df = spark.read.parquet("/mnt/silver/processed/orders/")
customers = spark.read.option("header", True).csv("/mnt/silver/raw/customers/")
gold_df = spark.read.format("delta").load("/mnt/gold/curated/fact_sales/")
# ── STEP 3: LIST AND MANAGE MOUNTS ───────────────────────────────────────────
# List all current mounts
display(dbutils.fs.mounts())
# Check if mount exists before mounting (avoid error on duplicate)
mounted = any(m.mountPoint == "/mnt/silver" for m in dbutils.fs.mounts())
if not mounted:
dbutils.fs.mount(source="abfss://silver@...", mount_point="/mnt/silver",
extra_configs=configs)
# List files in mounted path
dbutils.fs.ls("/mnt/silver/processed/orders/")
# Unmount when needed
dbutils.fs.unmount("/mnt/silver")
# ── USE SECRETS (NEVER HARDCODE CREDENTIALS) ──────────────────────────────────
# In Databricks, always use dbutils.secrets — never paste keys in notebooks
# Create secret scope linked to Azure Key Vault:
# dbutils.secrets.get(scope="my-kv-scope", key="storage-account-key")
# ── ALTERNATIVE: Unity Catalog External Location (modern Databricks) ──────────
# Instead of mounts, Unity Catalog manages storage access centrally:
# CREATE EXTERNAL LOCATION silver_lake
# URL 'abfss://silver@datalakeprod.dfs.core.windows.net/'
# WITH (STORAGE CREDENTIAL my_credential);
#
# Then read directly:
df = spark.read.parquet("abfss://silver@datalakeprod.dfs.core.windows.net/data/")
# Access controlled by Unity Catalog permissions, not per-notebook config
Q44 — Read from Azure Synapse Analytics (SQL Pool) via PySpark
# ── METHOD 1: JDBC (simple but slow — single thread, no parallelism) ──────────
synapse_url = (
"jdbc:sqlserver://myworkspace.sql.azuresynapse.net:1433;"
"database=mydb;encrypt=true;trustServerCertificate=false;"
"hostNameInCertificate=*.sql.azuresynapse.net;loginTimeout=30"
)
df = spark.read.format("jdbc") \
.option("url", synapse_url) \
.option("dbtable", "dbo.FactSales") \
.option("user", "sqladminuser") \
.option("password", dbutils.secrets.get("scope", "synapse-password")) \
.option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
.load()
# ⚠️ TRAP: Default JDBC = 1 partition = 1 thread = very slow for large tables
# Fix: Add parallel read config:
df = spark.read.format("jdbc") \
.option("url", synapse_url) \
.option("dbtable", "dbo.FactSales") \
.option("user", "sqladminuser") \
.option("password", dbutils.secrets.get("scope", "synapse-password")) \
.option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
.option("numPartitions", "20") \ # parallel reads
.option("partitionColumn", "SaleID") \ # split by this column
.option("lowerBound", "1") \
.option("upperBound", "10000000") \ # estimated max SaleID
.load()
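Under the hood, Spark turns `lowerBound`/`upperBound`/`numPartitions` into one WHERE clause per partition, each scanning a disjoint ID range. A rough pure-Python sketch of that splitting logic (approximate — Spark's internal stride arithmetic differs slightly), useful for reasoning about skew when bounds are badly estimated:

```python
def jdbc_partition_predicates(column, lower, upper, num_partitions):
    """Approximate the per-partition WHERE clauses Spark generates
    for a partitioned JDBC read (first range also catches NULLs)."""
    stride = (upper - lower) // num_partitions
    bounds = [lower + stride * i for i in range(1, num_partitions)]
    preds = []
    for i in range(num_partitions):
        lo = bounds[i - 1] if i > 0 else None
        hi = bounds[i] if i < num_partitions - 1 else None
        if lo is None:
            preds.append(f"{column} < {hi} OR {column} IS NULL")
        elif hi is None:
            preds.append(f"{column} >= {lo}")
        else:
            preds.append(f"{column} >= {lo} AND {column} < {hi}")
    return preds

# e.g. jdbc_partition_predicates("SaleID", 1, 10_000_000, 4) yields 4 disjoint ranges
```

If real SaleIDs cluster near one end of the bounds, most partitions come back empty and one does all the work — estimate bounds from MIN/MAX on the source table.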
# ── METHOD 2: Synapse Connector for Databricks (PolyBase — fast!) ──────────────
# Azure Synapse Analytics connector uses PolyBase/COPY INTO via ADLS staging
# Much faster than JDBC for large tables (parallel bulk export)
df = spark.read \
.format("com.databricks.spark.sqldw") \
.option("url", synapse_url) \
.option("tempDir", "abfss://temp@datalakeprod.dfs.core.windows.net/staging/") \
.option("forwardSparkAzureStorageCredentials", "true") \
.option("dbTable", "dbo.FactSales") \
.load()
# ── METHOD 3: Push query down (read only what you need) ───────────────────────
query = """
(SELECT SaleID, CustomerID, Amount, SaleDate
FROM dbo.FactSales
WHERE SaleDate >= '2024-01-01'
AND Region = 'APAC') AS filtered_sales
"""
df = spark.read.format("jdbc") \
.option("url", synapse_url) \
.option("dbtable", query) \ # push filter to Synapse, not pull all data
.option("user", "sqladminuser") \
.option("password", dbutils.secrets.get("scope", "synapse-password")) \
.load()
Q45 — Read from Azure Event Hubs / Kafka into PySpark (Structured Streaming)
# Azure Event Hubs is Kafka-compatible → use Spark Kafka connector
# Event Hubs Kafka endpoint: <namespace>.servicebus.windows.net:9093
# ── CONNECTION STRING from Azure Portal → Event Hubs → Shared Access Policies
connection_string = dbutils.secrets.get("scope", "eventhub-connection-string")
# Format: "Endpoint=sb://ns.servicebus.windows.net/;SharedAccessKeyName=...;SharedAccessKey=..."
# Build Kafka SASL config from Event Hubs connection string
SASL_CONFIG = (
f'org.apache.kafka.common.security.plain.PlainLoginModule required '
f'username="$ConnectionString" '
f'password="{connection_string}";'
)
# ── READ AS STREAMING DATAFRAME ───────────────────────────────────────────────
stream_df = spark.readStream \
.format("kafka") \
.option("kafka.bootstrap.servers",
"mynamespace.servicebus.windows.net:9093") \
.option("subscribe", "my-event-hub-name") \ # topic = event hub name
.option("kafka.security.protocol", "SASL_SSL") \
.option("kafka.sasl.mechanism", "PLAIN") \
.option("kafka.sasl.jaas.config", SASL_CONFIG) \
.option("startingOffsets", "latest") \ # or "earliest"
.load()
# ── PARSE THE MESSAGE (value is binary → deserialize) ────────────────────────
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType
event_schema = StructType([
StructField("event_id", StringType(), True),
StructField("user_id", StringType(), True),
StructField("amount", DoubleType(), True),
StructField("event_time", TimestampType(), True)
])
parsed_df = stream_df \
.select(
col("key").cast("string").alias("partition_key"),
from_json(col("value").cast("string"), event_schema).alias("data"),
col("timestamp").alias("ingest_time")
) \
.select("partition_key", "data.*", "ingest_time")
# ── WRITE STREAM TO ADLS GEN2 / DELTA ────────────────────────────────────────
query = parsed_df.writeStream \
.format("delta") \
.outputMode("append") \
.option("checkpointLocation",
"abfss://checkpoints@datalake.dfs.core.windows.net/eventhub-stream/") \
.start("abfss://silver@datalakeprod.dfs.core.windows.net/streaming/events/")
query.awaitTermination()
# ── READ AS BATCH (for one-time historical read) ──────────────────────────────
batch_df = spark.read \
.format("kafka") \
.option("kafka.bootstrap.servers",
"mynamespace.servicebus.windows.net:9093") \
.option("subscribe", "my-event-hub-name") \
.option("kafka.security.protocol", "SASL_SSL") \
.option("kafka.sasl.mechanism", "PLAIN") \
.option("kafka.sasl.jaas.config", SASL_CONFIG) \
.option("startingOffsets", "earliest") \ # read ALL historical events
.option("endingOffsets", "latest") \
.load()
Q46 — Full Azure Data Lake Architecture + PySpark read pattern (Medallion)
MEDALLION ARCHITECTURE ON AZURE:
External Sources
(Blob, RDBMS, APIs, Event Hubs)
↓ [ADF / Databricks ingest]
┌─────────────────────────────────────┐
│ BRONZE Layer (raw, immutable) │ ← abfss://bronze@datalake.dfs...
│ Format: raw CSV/JSON/Parquet │ ← partition by ingest_date
│ No transformation, as-is │
└─────────────────────────────────────┘
↓ [Databricks / Synapse Spark]
┌─────────────────────────────────────┐
│ SILVER Layer (cleaned, conformed) │ ← abfss://silver@datalake.dfs...
│ Format: Delta Lake │ ← partition by event_date
│ Deduped, nulls handled, typed │
└─────────────────────────────────────┘
↓ [Databricks / Synapse Spark]
┌─────────────────────────────────────┐
│ GOLD Layer (business aggregates) │ ← abfss://gold@datalake.dfs...
│ Format: Delta Lake │ ← partition by region/product
│ Fact + Dimension tables, KPIs │
└─────────────────────────────────────┘
↓ [Synapse SQL Pool / Power BI]
Dashboards / Reports / ML Models
# ── READ PATTERN FOR EACH LAYER ───────────────────────────────────────────────
# Setup (done once per session in production via cluster config or Managed Identity)
spark.conf.set(
"fs.azure.account.auth.type.datalakeprod.dfs.core.windows.net", "OAuth"
)
spark.conf.set(
"fs.azure.account.oauth.provider.type.datalakeprod.dfs.core.windows.net",
"org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
)
spark.conf.set(
"fs.azure.account.oauth2.client.id.datalakeprod.dfs.core.windows.net",
dbutils.secrets.get("kv-scope", "sp-client-id")
)
spark.conf.set(
"fs.azure.account.oauth2.client.secret.datalakeprod.dfs.core.windows.net",
dbutils.secrets.get("kv-scope", "sp-secret")
)
spark.conf.set(
"fs.azure.account.oauth2.client.endpoint.datalakeprod.dfs.core.windows.net",
"https://login.microsoftonline.com/<tenant-id>/oauth2/token"
)
from pyspark.sql.functions import col

BASE = "abfss://{layer}@datalakeprod.dfs.core.windows.net"
# ── BRONZE: read raw files ─────────────────────────────────────────────────────
bronze_csv = spark.read \
.option("header", True) \
.option("mode", "PERMISSIVE") \
.option("columnNameOfCorruptRecord", "_corrupt") \
.csv(f"{BASE.format(layer='bronze')}/raw/orders/ingest_date=2024-01-15/")
# ── SILVER: read cleaned Delta ─────────────────────────────────────────────────
silver_orders = spark.read \
.format("delta") \
.load(f"{BASE.format(layer='silver')}/orders/") \
.filter(col("event_date") >= "2024-01-01") # partition pruning
# Time travel: audit/debugging
silver_yesterday = spark.read.format("delta") \
.option("versionAsOf", 10) \
.load(f"{BASE.format(layer='silver')}/orders/")
# ── GOLD: read aggregated Delta for reporting ──────────────────────────────────
gold_kpis = spark.read \
.format("delta") \
.load(f"{BASE.format(layer='gold')}/fact_daily_revenue/") \
.filter(col("region") == "APAC")
# ── WRITE BACK TO SILVER (after transformation) ────────────────────────────────
from delta.tables import DeltaTable
DeltaTable.forPath(
spark,
f"{BASE.format(layer='silver')}/orders/"
).alias("target").merge(
bronze_csv.alias("source"),
"target.order_id = source.order_id"
).whenMatchedUpdateAll() \
.whenNotMatchedInsertAll() \
.execute()
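Conceptually, the MERGE above is a keyed upsert: a source row replaces the matching target row on `order_id`, and unmatched source rows are inserted. A tiny pure-Python model of those semantics (illustration only — not Delta's implementation):

```python
def upsert(target: dict, source_rows: list, key: str) -> dict:
    """Model of MERGE whenMatchedUpdateAll / whenNotMatchedInsertAll:
    matching keys are overwritten, new keys are inserted."""
    merged = dict(target)          # copy; target table itself is immutable here
    for row in source_rows:
        merged[row[key]] = row     # matched → update all; not matched → insert all
    return merged

target = {1: {"order_id": 1, "amount": 10.0}}
source = [{"order_id": 1, "amount": 12.5}, {"order_id": 2, "amount": 7.0}]
result = upsert(target, source, "order_id")
```

One implication worth knowing: if the source contains duplicate keys, Delta's MERGE fails with a multiple-match error, whereas this sketch silently keeps the last one — dedupe the source first.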
| Storage Type | Protocol | Endpoint suffix |
|---|---|---|
| Azure Blob (classic) | wasbs:// | .blob.core.windows.net |
| ADLS Gen1 | adl:// | .azuredatalakestore.net |
| ADLS Gen2 (modern) | abfss:// | .dfs.core.windows.net |
| Azure Files | (SMB/NFS) | .file.core.windows.net |
Q47 — Add to tracking table
Update the master tracking table at the top of this file:
| Q41 | Read from Azure Blob Storage (wasbs://) | Azure | S | wasbs/account-key/SAS/SP | ✓ |
| Q42 | Read from ADLS Gen2 (abfss://) | Azure | S | abfss/OAuth/SP/MI | ✓ |
| Q43 | Mount ADLS Gen2 in Databricks | Databricks | S | dbutils.fs.mount/secrets | ✓ |
| Q44 | Read from Azure Synapse SQL Pool | Azure | S | JDBC/Synapse connector | ✓ |
| Q45 | Read from Azure Event Hubs (Kafka) | Azure | S | Kafka/Structured Streaming | ✓ |
| Q46 | Medallion architecture + full read pattern | Databricks | S | Bronze/Silver/Gold layers | ✓ |