4-Day Azure Databricks Interview Prep
AMADEUS CONTEXT (Frame Every Answer With This)
- Domain: Travel industry — flight bookings, pricing, passenger data (PII/GDPR)
- Scale: Billions of transactions/day, 200+ airlines, 100+ countries
- Cloud: Microsoft Azure (strategic partnership with Amadeus)
- Legacy: Oracle databases, Java-based core systems → migrating to modern data stack
- Key Concerns: Data governance (GDPR), low-latency pricing, CDC from legacy systems, cost at scale
- Stack from JD: Databricks, Azure Synapse, Azure DevOps, Kafka, Hadoop/Hive, BigQuery, Sqoop
Tip: Always use travel domain examples — bookings, passenger records, flight schedules, fare pricing, loyalty programs
4-DAY SCHEDULE
Memory Map
DAY 1 (6-7 hours): DELTA LAKE + LAKEHOUSE FOUNDATION
Delta Lake Internals (transaction log, checkpoints, ACID)
MERGE INTO — all scenarios + optimization
OPTIMIZE / VACUUM / Z-ORDER / Liquid Clustering
Deletion Vectors, Schema Evolution, Table Properties
Time Travel & Recovery
Lakehouse Architecture (vs Data Lake vs Warehouse)
NEW 2025-2026: Delta Lake 4.x features, Predictive Optimization
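To anchor the Day 1 maintenance topics, here is a minimal Databricks SQL sketch — the `bookings` table and its columns are illustrative, not from a real schema:

```sql
-- Compact small files and co-locate data by common filter columns
OPTIMIZE bookings ZORDER BY (airline_code, booking_date);

-- Remove data files no longer referenced by the transaction log.
-- 168 hours = the 7-day default; shortening it can break time travel.
VACUUM bookings RETAIN 168 HOURS;

-- Time travel: query the table as it was at an earlier version
SELECT COUNT(*) FROM bookings VERSION AS OF 42;

-- Recovery: roll the table back after a bad write
RESTORE TABLE bookings TO VERSION AS OF 42;
```

Note the tension between VACUUM retention and time travel: you can only travel back as far as the files that VACUUM has not yet deleted.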
DAY 2 (6-7 hours): ETL PATTERNS + PIPELINE DESIGN
Medallion Architecture (Bronze/Silver/Gold) — design decisions
SCD Type 1, 2, 3 implementation in Databricks SQL
CDC patterns (Oracle → Kafka → Delta Lake)
Auto Loader (modes, schema evolution, vs COPY INTO)
Lakeflow Declarative Pipelines (formerly DLT)
Scenario-Based Pipeline Design (Amadeus-style)
Pipeline Monitoring & Data Quality Framework
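For the SCD Type 2 item above, a minimal two-step Databricks SQL sketch. Table names, columns, and the single change-detection column (`loyalty_tier`) are all illustrative — a real dimension would compare several attributes or a hash:

```sql
-- Step 1: close out current rows whose tracked attribute changed
MERGE INTO dim_passenger AS tgt
USING stg_passenger AS src
  ON tgt.passenger_id = src.passenger_id AND tgt.is_current = true
WHEN MATCHED AND tgt.loyalty_tier <> src.loyalty_tier THEN
  UPDATE SET tgt.is_current = false, tgt.end_date = current_date();

-- Step 2: insert new versions — brand-new keys, plus keys whose
-- current row was just closed in step 1 (no current row remains)
INSERT INTO dim_passenger
SELECT src.passenger_id, src.loyalty_tier,
       current_date() AS start_date,
       CAST(NULL AS DATE) AS end_date,
       true AS is_current
FROM stg_passenger AS src
LEFT JOIN dim_passenger AS tgt
  ON tgt.passenger_id = src.passenger_id AND tgt.is_current = true
WHERE tgt.passenger_id IS NULL;
```

A common interview follow-up is how to do this in a single MERGE: union the source with a marker column so changed rows appear twice (once to close, once to insert).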
DAY 3 (6-7 hours): AZURE DATABRICKS PLATFORM + GOVERNANCE
Unity Catalog (hierarchy, RBAC, row/column security, lineage)
NEW: ABAC (Attribute-Based Access Control)
NEW: Predictive Optimization
Delta Sharing (open protocol, cross-org)
Photon Engine (when to use, when not)
NEW: Serverless Compute & Serverless Workspaces
Azure-Specific: ADLS Gen2, Key Vault, Service Principals
Managed vs External Tables in Unity Catalog
Data Governance at Scale (GDPR, PII masking)
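For the row/column security and PII masking items above, a hedged Unity Catalog SQL sketch — table, column, and group names (`passengers`, `pii_readers`, `team_<region>`) are placeholders:

```sql
-- Column mask: only members of pii_readers see raw email addresses
CREATE OR REPLACE FUNCTION mask_email(email STRING)
RETURNS STRING
RETURN CASE WHEN is_account_group_member('pii_readers')
            THEN email ELSE '***REDACTED***' END;

ALTER TABLE passengers ALTER COLUMN email SET MASK mask_email;

-- Row filter: each regional team sees only its own market's rows
CREATE OR REPLACE FUNCTION region_filter(region STRING)
RETURNS BOOLEAN
RETURN is_account_group_member(concat('team_', region));

ALTER TABLE passengers SET ROW FILTER region_filter ON (region);
```

This pattern keeps GDPR enforcement in the catalog rather than in per-pipeline code: every consumer (SQL, Python, BI tool) gets the same masked view automatically.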
DAY 4 (5-6 hours): PRODUCTION, CI/CD, COST + MOCK INTERVIEWS
Databricks Workflows & Orchestration (vs Airflow)
Declarative Automation Bundles (formerly Asset Bundles) — CI/CD
Azure DevOps + Databricks integration
Job Cluster vs All-Purpose vs Serverless cost comparison
Cost Management Strategies
NEW: Multi-table Transactions, Lakebase, Compatibility Mode
System Design Questions (leadership-level)
Production Debugging Scenarios
MOCK: 10 most-likely Amadeus interview questions
PRIORITY MATRIX
MUST KNOW (Will definitely be asked — 60%)
- Delta Lake MERGE + SCD Type 2 (write code)
- Medallion Architecture — design decisions per layer
- Unity Catalog — 3-level namespace, security model
- Auto Loader + Lakeflow Declarative Pipelines basics
- CDC pipeline design (Oracle → Kafka → Delta Lake)
- OPTIMIZE / VACUUM / Z-ORDER vs Liquid Clustering
- Performance Tuning — Spark UI, partition tuning, data skew
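For the Auto Loader + Lakeflow Declarative Pipelines item, a short SQL sketch of a bronze-to-silver flow. Paths, table names, and the quality constraint are illustrative; exact syntax for referencing upstream streaming tables varies by runtime version:

```sql
-- Bronze: incrementally ingest raw booking files (Auto Loader semantics)
CREATE OR REFRESH STREAMING TABLE bronze_bookings
AS SELECT *, _metadata.file_path AS source_file
FROM STREAM read_files('/mnt/landing/bookings', format => 'json');

-- Silver: cleaned rows, with a data-quality expectation that
-- drops records missing a PNR instead of failing the pipeline
CREATE OR REFRESH STREAMING TABLE silver_bookings (
  CONSTRAINT valid_pnr EXPECT (pnr IS NOT NULL) ON VIOLATION DROP ROW
)
AS SELECT pnr, passenger_id, fare_amount, booking_ts
FROM STREAM(bronze_bookings);
```

In an interview, contrast this with COPY INTO: Auto Loader tracks ingested files in checkpoint state and scales to millions of files, while COPY INTO suits simpler, lower-volume batch loads.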
SHOULD KNOW (High probability — 25%)
- Photon Engine — when it helps/doesn't
- Databricks Workflows vs Airflow
- Job Cluster vs All-Purpose vs Serverless
- Azure integration (ADLS Gen2, Key Vault, Service Principals)
- Data Governance / GDPR handling
- CI/CD with Declarative Automation Bundles
- Schema Evolution, Change Data Feed
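The Change Data Feed item above is easy to demonstrate in two statements — table name and version range are illustrative:

```sql
-- Enable CDF so Delta records row-level changes per commit
ALTER TABLE silver_bookings
SET TBLPROPERTIES (delta.enableChangeDataFeed = true);

-- Read inserts/updates/deletes between two table versions;
-- _change_type is insert, delete, update_preimage, or update_postimage
SELECT pnr, fare_amount, _change_type, _commit_version
FROM table_changes('silver_bookings', 101, 110);
```

CDF is the usual answer to "how do you propagate only changed rows from Silver to Gold without reprocessing the whole table."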
NICE TO KNOW (Differentiators — 15%)
- ABAC (Attribute-Based Access Control) — new 2025
- Serverless Workspaces (GA Jan 2026)
- Lakebase (GA Azure Mar 2026)
- Multi-table Transactions (BEGIN ATOMIC)
- Delta Lake 4.x features (Variant type, Type Widening)
- Predictive Optimization
- Iceberg interoperability via Unity Catalog
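Of the differentiators above, the Variant type is the easiest to show concretely. A hedged sketch, assuming a runtime with VARIANT support (table and JSON shape are invented for illustration):

```sql
-- VARIANT stores semi-structured data without locking in a schema up front
CREATE TABLE raw_events (event_id BIGINT, payload VARIANT);

INSERT INTO raw_events
VALUES (1, parse_json('{"type":"booking","fare":{"amount":420.5}}'));

-- Extract fields with the : path operator, casting as needed
SELECT payload:type::string   AS event_type,
       payload:fare.amount::double AS fare_amount
FROM raw_events;
```

A good framing: VARIANT replaces the old "store JSON as a string and parse on every read" pattern with a binary-encoded column that is both faster to query and schema-flexible.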
FILES
| Day | File | Hours |
|---|---|---|
| 1 | DB_01_Delta_Lake_Deep_Dive.md | 6-7h |
| 2 | DB_02_ETL_Pipelines_Databricks.md | 6-7h |
| 3 | DB_03_Azure_Platform_Governance.md | 6-7h |
| 4 | DB_04_Production_CICD_MockInterview.md | 5-6h |
LEARNING APPROACH
Since you are NEW to Databricks but experienced in data engineering, every question follows this pattern:
- WHAT IS IT? → Simple 2-3 line explanation in plain English
- WHY DO WE NEED IT? → Real problem it solves (with a travel/booking example)
- HOW DOES IT WORK? → Technical details + code with comments on every line
- WHEN TO USE / NOT USE? → Practical decision guide
- 🧠 INTERVIEW TIP → How to answer this confidently
You learn the basics WHILE studying the interview questions — no separate basics doc needed.
Every code block has comments explaining WHAT each line does and WHY. Every concept has a real-world analogy (like a travel booking system).
HOW TO USE
- Read the DB_ file for each day — basics are embedded inside each question
- Every code block has comments — read the comments to understand the basics
- For deeper PySpark details, refer to your existing files (01, 02)
- Practice answering OUT LOUD — senior interviews test communication
- Frame every answer with Amadeus context: "In a travel booking pipeline..."
- Day 4 Mock questions — time yourself, 3-5 minutes per answer
- Key interview pattern: SCENARIO > DEFINITION. They won't ask "What is Unity Catalog?" — they'll ask "How would you govern PII data across 50 teams?"