ETL Patterns — Quick Recall
🗺️ Memory Map
How to use this file:
- ⚡ = Must remember (95% chance of being asked)
- 🔑 = Key concept (core understanding needed)
- ⚠️ = Common trap (interviewers love to test this)
- 🧠 = Memory Map (mnemonic/acronym — memorize this!)
- 📝 = One-liner (flash-card style — cover answer, test yourself)
🧠 MASTER MEMORY MAP — Day 2
🧠 ETL PIPELINE DESIGN = "MACS-L"
ETL PIPELINE DESIGN"MACS-L"
MMedallion Architecture (Bronze → Silver → Gold)
AAuto Loader (streaming file ingestion)
CCDC (Change Data Capture — Oracle → Kafka → Delta)
SSCD Type 2 (Slowly Changing Dimensions — history tracking)
LLakeflow Declarative Pipelines (formerly DLT)
MEDALLION"BSG" (like the TV show Battlestar Galactica)
BBronze (raw, as-is, append-only)
SSilver (clean, deduplicated, validated)
GGold (aggregated, business-ready)
AUTO LOADER"DNS"
DDirectory listing mode (small folders, simple)
NNotification mode (huge folders, uses Azure Event Grid)
SSchema evolution (auto-detects new columns → rescue)
SECTION 1: MEDALLION ARCHITECTURE
🧠 Memory Map: Medallion Layers
🧠 BRONZE = Raw ingredients from market (dirty, uncut)
Think: COOKING ANALOGY
BRONZERaw ingredients from market (dirty, uncut)
→ Append only, never update
→ Add metadata: _ingested_at, _source_file, _batch_id
→ Engineers read this for debugging
SILVERWashed, cut, measured ingredients
→ Deduplicated (MERGE/upsert)
→ Schema validated, nulls handled
→ PII tagged (Unity Catalog)
→ Engineers + Scientists read this
GOLDFinal dish served to customer
→ Aggregated (daily revenue, route stats)
→ Star schema for BI tools
→ Power BI, Tableau read this
KEY DESIGN DECISIONS per layer:
Bronze: partition by ingestion_date (when data arrived)
Silver: cluster by busine