🧱 Databricks · Section 12 of 17
Day 2: ETL Patterns & Pipeline Design — Quick Recall Guide
🗺️ Memory Map
How to use this file:
  • ⚡ = Must remember (95% chance of being asked)
  • 🔑 = Key concept (core understanding needed)
  • ⚠️ = Common trap (interviewers love to test this)
  • 🧠 = Memory Map (mnemonic/acronym — memorize this!)
  • 📝 = One-liner (flash-card style — cover answer, test yourself)
Reading strategy: Read Memory Maps FIRST → then Direct Questions → then Mid-Level.

🧠 MASTER MEMORY MAP — Day 2

🧠 ETL PIPELINE DESIGN = "MACS-L"
  M — Medallion Architecture (Bronze → Silver → Gold)
  A — Auto Loader (streaming file ingestion)
  C — CDC (Change Data Capture — Oracle → Kafka → Delta)
  S — SCD Type 2 (Slowly Changing Dimensions — history tracking)
  L — Lakeflow Declarative Pipelines (formerly DLT)

MEDALLION = "BSG" (like the TV show Battlestar Galactica)
  B — Bronze (raw, as-is, append-only)
  S — Silver (clean, deduplicated, validated)
  G — Gold (aggregated, business-ready)

AUTO LOADER = "DNS"
  D — Directory listing mode (small folders, simple)
  N — Notification mode (huge folders, uses Azure Event Grid)
  S — Schema evolution (auto-detects new columns → rescue)

SECTION 1: MEDALLION ARCHITECTURE

🧠 Memory Map: Medallion Layers

🧠 Think: COOKING ANALOGY
BRONZE = Raw ingredients from market (dirty, uncut)
→ Append only, never update
→ Add metadata: _ingested_at, _source_file, _batch_id
→ Engineers read this for debugging
SILVER = Washed, cut, measured ingredients
→ Deduplicated (MERGE/upsert)
→ Schema validated, nulls handled
→ PII tagged (Unity Catalog)
→ Engineers + Scientists read this
GOLD = Final dish served to customer
→ Aggregated (daily revenue, route stats)
→ Star schema for BI tools
→ Power BI, Tableau read this
KEY DESIGN DECISIONS per layer:
Bronze: partition by ingestion_date (when data arrived)
Silver: cluster by business_key (booking_id)
Gold: cluster by query pattern (airport, date)

⚡ MUST KNOW DIRECT QUESTIONS

Q1: What is Medallion Architecture?

A data organization pattern with 3 layers: Bronze (raw), Silver (clean), Gold (business-ready). Data flows through layers with increasing quality.

Q2: What goes into the Bronze layer?

Raw data exactly as received — no transformations. Just append with metadata columns (_ingested_at, _source_file). Never update or delete in Bronze.

Q3: What goes into the Silver layer?

Cleaned, deduplicated, validated data. Apply MERGE for upserts, fix data types, remove nulls, apply business keys, tag PII columns.

Q4: What goes into the Gold layer?

Business-ready aggregations, KPIs, star schema. Optimized for fast queries by BI tools (Power BI, Tableau). Pre-computed metrics like daily revenue per route.

Q5: Who reads each layer?

Bronze = data engineers (debugging). Silver = engineers + data scientists. Gold = business analysts + BI tools.

⚠️ Q6: When would you skip a layer?

  • Skip Silver: Source data is already clean → Bronze straight to Gold
  • Skip Gold: Real-time dashboard reads from Silver directly
  • Add layers: "Feature layer" between Silver and Gold for ML features

Q7: What is late-arriving data?

Data that arrives after its actual event time. Example: A booking from March 1 arrives on March 5 (airline system was offline). Handle with MERGE on business key — it updates Silver even if it arrives late.

🔑 MID-LEVEL QUESTIONS

Q8: How do you handle late-arriving data in each layer?

  • Bronze: Just append — it's append-only anyway. Add _ingested_at to track when it arrived
  • Silver: MERGE on business key — upserts handle late data naturally
  • Gold: Recompute affected aggregation windows (reprocess March 1 aggregations)
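The strategies above can be sketched in plain Python. This is an illustrative stand-in for a Delta MERGE plus a Gold window recompute, not PySpark; the field names (booking_id, booking_date, amount) are assumptions:

```python
from datetime import date

def upsert_silver(silver: dict, event: dict) -> None:
    """Silver strategy: MERGE on business key, so a late row simply upserts."""
    silver[event["booking_id"]] = event

def affected_windows(late_batch: list) -> set:
    """Gold strategy: find which aggregation windows a late batch touches."""
    return {e["booking_date"] for e in late_batch}

def daily_revenue(silver: dict, day: date) -> float:
    """Recompute one Gold window from Silver."""
    return sum(e["amount"] for e in silver.values() if e["booking_date"] == day)

# A March 1 booking arrives on March 5: upsert it, then recompute only March 1
silver = {}
late = {"booking_id": "B1", "booking_date": date(2024, 3, 1), "amount": 100.0}
upsert_silver(silver, late)
recomputed = {d: daily_revenue(silver, d) for d in affected_windows([late])}
```

The key point the sketch shows: only the windows in `affected_windows` get reprocessed, not the whole Gold table.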

Q9: What metadata should Bronze have?

5 essential columns:

  1. _ingested_at — when data was loaded (TIMESTAMP)
  2. _source_file — which file it came from (STRING)
  3. _batch_id — which batch run loaded it (STRING)
  4. _source_system — which system sent it (STRING)
  5. _raw_payload — original data if JSON (for debugging)
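A minimal Python sketch of wrapping one raw record with these five columns. In a real pipeline they'd be added with PySpark `withColumn`; this toy version just shows the shape:

```python
import json
import uuid
from datetime import datetime, timezone

def to_bronze(raw_line: str, source_file: str, source_system: str, batch_id: str) -> dict:
    """Wrap one raw record with the five Bronze metadata columns."""
    return {
        "_ingested_at": datetime.now(timezone.utc).isoformat(),
        "_source_file": source_file,
        "_batch_id": batch_id,
        "_source_system": source_system,
        "_raw_payload": raw_line,       # original payload kept for debugging
        **json.loads(raw_line),         # parsed fields alongside the metadata
    }

row = to_bronze('{"booking_id": "B1"}', "bookings_001.json",
                "oracle_bookings", str(uuid.uuid4()))
```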

Q10: Medallion vs Data Mesh — what's the difference?

  • Medallion = centralized pipeline (one team owns Bronze→Silver→Gold)
  • Data Mesh = decentralized (each domain team owns their own Bronze→Silver→Gold)
  • Can combine both: each domain team follows Medallion within their domain

SECTION 2: SCD (SLOWLY CHANGING DIMENSIONS)

🧠 Memory Map: SCD Types

🧠 SCD = "How do you handle dimension changes over time?"
Think: A PASSENGER CHANGES THEIR ADDRESS
SCD Type 0 = "Never change" → Ignore the update (keep original forever)
SCD Type 1 = "Overwrite" → Replace old address with new (no history)
SCD Type 2 = "Add new row" → Keep old row + add new row (FULL history) ⭐
SCD Type 3 = "Add new column" → Add previous_address column (limited history)
Remember: "0-N-O-C" = Zero change, No history, Old+new rows, Column added
⚡ INTERVIEW STAR: SCD TYPE 2 = Most asked! Know the code!
SCD Type 2 columns:
  booking_id → business key (never changes)
  passenger_name → can change
  address → can change
  start_date → when this version became active
  end_date → when this version was replaced (NULL = current)
  is_current → TRUE for latest version, FALSE for old

⚡ MUST KNOW DIRECT QUESTIONS

Q11: What is SCD?

Slowly Changing Dimension — how you handle changes to dimension tables over time. Example: a passenger changes their address — do you overwrite, keep history, or ignore?

Q12: What is SCD Type 1?

Overwrite the old value with the new value. No history kept. Simple but you lose what the old value was.

Q13: What is SCD Type 2?

Add a new row for each change. Keep the old row with end_date and is_current = false. Full history preserved. Most common in data warehousing.
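The Type 2 mechanics (close the current row, append a new one) can be simulated in plain Python. This is a sketch of the logic only, not the Delta MERGE itself:

```python
from datetime import date

def scd2_apply(dim: list, key: str, new_row: dict, today: date) -> None:
    """SCD Type 2: close the current version, then append the new version."""
    for row in dim:
        if row[key] == new_row[key] and row["is_current"]:
            row["end_date"] = today        # close the old version
            row["is_current"] = False
    dim.append({**new_row, "start_date": today,
                "end_date": None, "is_current": True})

dim = [{"booking_id": "B1", "address": "Old St", "start_date": date(2024, 1, 1),
        "end_date": None, "is_current": True}]
scd2_apply(dim, "booking_id",
           {"booking_id": "B1", "address": "New Ave"}, date(2024, 3, 1))
```

After the call, the table holds two rows for B1: the closed historical row and the new current one, which is exactly what the MERGE in Q16 produces.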

Q14: What is SCD Type 3?

Add a column for the previous value (e.g., current_address + previous_address). Limited history — only tracks the last change.

⚠️ Q15: Why is SCD Type 2 the hardest to implement with MERGE?

Because for ONE changed record, you need TWO operations:

  1. UPDATE the old row (close it: set end_date, is_current = false)
  2. INSERT a new row (open it: start_date = today, is_current = true)
Standard MERGE can't do both for the same key — that's why we use the merge_key trick.

🔑 MID-LEVEL QUESTIONS

Q16: Explain the merge_key trick for SCD Type 2 (MOST IMPORTANT)

Problem: MERGE matches on booking_id. For a changed record, you need to UPDATE the old row AND INSERT a new row. But MERGE only does one action per matched key. Solution: Create a merge_key column in the staged data:

  • Row to UPDATE: merge_key = booking_id (matches target → WHEN MATCHED → UPDATE)
  • Row to INSERT: merge_key = NULL (never matches → WHEN NOT MATCHED → INSERT)
sql
-- Stage TWO rows per changed record, then MERGE on merge_key
MERGE INTO dim_passenger USING (
  -- Row 1: Close the old record (merge_key = booking_id → will MATCH)
  SELECT booking_id AS merge_key, booking_id, address,
         current_timestamp() AS end_date, false AS is_current
  FROM changes WHERE change_type = 'U'
  UNION ALL
  -- Row 2: Insert the new record (merge_key = NULL → will NOT MATCH → INSERT)
  SELECT NULL AS merge_key, booking_id, new_address AS address,
         CAST(NULL AS TIMESTAMP) AS end_date, true AS is_current
  FROM changes WHERE change_type = 'U'
) staged_changes
ON dim_passenger.booking_id = staged_changes.merge_key
   AND dim_passenger.is_current = true
WHEN MATCHED THEN
  UPDATE SET end_date = current_timestamp(), is_current = false
WHEN NOT MATCHED THEN
  INSERT (booking_id, address, start_date, end_date, is_current)
  VALUES (staged_changes.booking_id, staged_changes.address,
          current_timestamp(), NULL, true)

Q17: What is the EASY way to do SCD Type 2?

Use Lakeflow apply_changes — it handles SCD Type 2 automatically:

python
import dlt

dlt.create_streaming_table("dim_passenger")  # Target must be declared first

dlt.apply_changes(
  target="dim_passenger",       # Target streaming table
  source="raw_passengers",      # Source stream
  keys=["passenger_id"],        # Business key
  sequence_by="updated_at",     # How to order changes
  stored_as_scd_type=2          # SCD Type 2 automatically!
)
No merge_key trick needed — Lakeflow handles the UPDATE + INSERT logic internally.

SECTION 3: CDC (CHANGE DATA CAPTURE)

🧠 Memory Map: CDC Pipeline

🧠 CDC = "Capture every change from source database"
AMADEUS CDC PIPELINE
Oracle DB → Debezium → Kafka → Auto Loader → Delta Lake
  Oracle DB: Source (legacy booking system)
  Debezium: Reads Oracle's redo logs (no impact on DB performance)
  Kafka: Message bus (buffers + distributes changes)
  Auto Loader: Reads from Kafka landing zone in ADLS
  Delta Lake: Final destination (Bronze → Silver → Gold)
Remember: "O-D-K-A-D" = Oracle → Debezium → Kafka → Auto Loader → Delta
CDC message types:
  "c" = CREATE (new booking)
  "u" = UPDATE (booking changed)
  "d" = DELETE (booking cancelled)
  "r" = READ (initial snapshot/backfill)
Remember: "CUDR" (sounds like "could-er")
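A toy Python applier for these four ops. It sketches how Silver derives the current state from a stream of CDC events; the event shape loosely follows Debezium's op/after fields and is an assumption:

```python
def apply_cdc(state: dict, event: dict) -> None:
    """Fold one Debezium-style event into a key-to-row dict ('r' replays like 'c')."""
    op, key = event["op"], event["key"]
    if op in ("c", "r", "u"):
        state[key] = event["after"]     # create/snapshot/update: keep latest image
    elif op == "d":
        state.pop(key, None)            # delete: booking cancelled

state = {}
for ev in [{"op": "c", "key": "B1", "after": {"status": "booked"}},
           {"op": "u", "key": "B1", "after": {"status": "changed"}},
           {"op": "d", "key": "B1", "after": None}]:
    apply_cdc(state, ev)
# After create, update, delete: B1 is gone from current state
```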

⚡ MUST KNOW DIRECT QUESTIONS

Q18: What is CDC?

Change Data Capture — captures every INSERT, UPDATE, DELETE from a source database and sends it downstream in real-time or near-real-time.

Q19What is Debezium?

Open-source CDC connector that reads database transaction logs (Oracle redo logs, MySQL binlog) and streams changes to Kafka. Zero impact on source database performance.

Q20: Why use Debezium instead of direct queries?

  • Direct queries = poll the database → high load on source, miss deletes, miss intermediate changes
  • Debezium = reads transaction logs → zero load on source, captures ALL changes including deletes

⚠️ Q21: What is CDF (Change Data Feed)? How is it different from CDC?

CDC = capture changes FROM an external source (Oracle → Delta).
CDF = track changes WITHIN Delta Lake (Delta → downstream).
CDF is a Delta Lake feature that records which rows changed (insert/update/delete) in a Delta table. Downstream consumers can read ONLY the changes instead of the full table.

sql
-- Enable CDF on a table
ALTER TABLE bookings SET TBLPROPERTIES ('delta.enableChangeDataFeed' = true);
-- Read only changes since version 5
SELECT * FROM table_changes('bookings', 5);

Q22: What columns does CDF add?

3 columns automatically added:

  • _change_type — "insert", "update_preimage", "update_postimage", "delete"
  • _commit_version — which Delta version
  • _commit_timestamp — when the change happened

🔑 MID-LEVEL QUESTIONS

Q23: Design a CDC pipeline for Amadeus (Oracle → Delta Lake)

Oracle (bookings DB)
↓ Debezium reads redo logs
Kafka (topic: bookings.changes)
↓ Kafka Connect writes JSON to ADLS
ADLS Gen2 (/raw/cdc/bookings/)
↓ Auto Loader (cloudFiles) reads new files
Bronze Delta Table (raw CDC events — all c/u/d records)
↓ MERGE on booking_id + apply CDC logic
Silver Delta Table (current state of all bookings)
↓ Aggregate
Gold Delta Table (daily booking stats per route)

Q24: How do you handle CDC deletes in Silver?

Two approaches:

  1. Soft delete: Set is_deleted = true, keep the row (for audit trail)
  2. Hard delete: Actually delete from Silver (for GDPR compliance)
sql
MERGE INTO silver_bookings t USING bronze_cdc s
ON t.booking_id = s.booking_id
WHEN MATCHED AND s.op = 'd' THEN DELETE  -- Hard delete
WHEN MATCHED AND s.op = 'u' THEN UPDATE SET ...
WHEN NOT MATCHED AND s.op = 'c' THEN INSERT ...

Q25: What if CDC events arrive out of order?

Use a sequence column (like updated_at timestamp or lsn — log sequence number). During MERGE, only apply the change if the incoming record is NEWER:

sql
WHEN MATCHED AND s.updated_at > t.updated_at THEN UPDATE SET ...
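The same guard in plain Python, with a dict standing in for the target table (a sketch, not PySpark):

```python
def merge_if_newer(target: dict, incoming: dict, key: str, seq: str) -> bool:
    """Apply an incoming CDC row only if its sequence value is newer."""
    current = target.get(incoming[key])
    if current is not None and current[seq] >= incoming[seq]:
        return False                    # stale event: skip it
    target[incoming[key]] = incoming
    return True

bookings = {}
merge_if_newer(bookings, {"booking_id": "B1", "updated_at": 5, "fare": 100},
               "booking_id", "updated_at")
# An older event (updated_at = 3) arrives late and must NOT overwrite
applied = merge_if_newer(bookings, {"booking_id": "B1", "updated_at": 3, "fare": 90},
                         "booking_id", "updated_at")
```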

SECTION 4: AUTO LOADER

🧠 Memory Map: Auto Loader

📐 Architecture Diagram
AUTO LOADER = "Automatically ingest new files as they arrive"

cloudFiles = the Spark source name for Auto Loader

TWO MODES:
    Directory Listing → Scans folder for new files (simple, small folders)
    File Notification → Azure Event Grid notifies when new file arrives (scalable, huge folders)

    ┌─────────────────────────────────────────────┐
    │ < 10,000 files/day → Directory Listing      │
    │ > 10,000 files/day → File Notification      │
    └─────────────────────────────────────────────┘

SCHEMA EVOLUTION:
    New column appears → Auto Loader adds it automatically
    Unexpected data → Goes to _rescued_data column (not lost!)

Remember: "Auto Loader = cloudFiles + Directory or Notification + Schema rescue"

AUTO LOADER vs COPY INTO:
    Auto Loader = streaming (always on, processes new files automatically)
    COPY INTO   = batch (run manually, idempotent — safe to re-run)

⚡ MUST KNOW DIRECT QUESTIONS

Q26: What is Auto Loader?

A Databricks feature that automatically ingests new files from cloud storage (ADLS Gen2) as they arrive. Uses cloudFiles as the Spark source format.

Q27: What are the two modes of Auto Loader?

  1. Directory listing — scans folder periodically (good for <10K files/day)
  2. File notification — Azure Event Grid sends notifications (good for >10K files/day, more efficient)

Q28: What is _rescued_data?

A special column where Auto Loader puts data that doesn't match the expected schema — instead of failing, unexpected fields are saved in this column for debugging.
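The rescue behavior can be mimicked in plain Python. Auto Loader does this internally; here EXPECTED is an assumed schema and the function just shows the idea of "never lose a field":

```python
import json

EXPECTED = {"booking_id", "amount"}   # assumed target schema for this sketch

def parse_with_rescue(raw: str) -> dict:
    """Keep expected fields as columns; stash anything else in _rescued_data."""
    record = json.loads(raw)
    row = {k: record[k] for k in EXPECTED if k in record}
    extras = {k: v for k, v in record.items() if k not in EXPECTED}
    row["_rescued_data"] = json.dumps(extras) if extras else None
    return row

row = parse_with_rescue('{"booking_id": "B1", "amount": 100, "new_col": "surprise"}')
```

Instead of failing on `new_col`, the unexpected field survives in `_rescued_data` for later inspection.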

Q29: How does Auto Loader handle schema evolution?

Set cloudFiles.schemaEvolutionMode = "addNewColumns" — when source adds new columns, Auto Loader auto-adds them to the target table. Pipeline restarts to pick up the new schema.

⚠️ Q30: Auto Loader vs COPY INTO — when to use which?
💡 Interview Tip
                   Auto Loader                             COPY INTO
Type               Streaming (always on)                   Batch (run manually)
Scalability        Millions of files                       Thousands of files
Schema evolution   Auto-detects                            No auto-detection
File tracking      Checkpoint (remembers processed files)  Idempotent (safe to re-run)
Use when           Continuous ingestion                    One-time or scheduled loads

SECTION 5: LAKEFLOW DECLARATIVE PIPELINES (formerly DLT)

🧠 Memory Map: Lakeflow

📐 Architecture Diagram
LAKEFLOW = "Declarative ETL — you say WHAT, it does HOW"

Old name: DLT (Delta Live Tables) → renamed to "Lakeflow Declarative Pipelines" (2025)

TWO TABLE TYPES:
    Streaming Table = append-only, processes new data only (like a river)
    Materialized View = recomputed from scratch when data changes (like a snapshot)

    ┌───────────────────────────────────────────────┐
    │ Append-only data (logs, events) → Streaming   │
    │ Aggregations, lookups → Materialized View     │
    └───────────────────────────────────────────────┘

DATA QUALITY = EXPECTATIONS (3 levels):
    EXPECT        → warn only (log bad rows, keep them)
    EXPECT OR DROP → silently drop bad rows
    EXPECT OR FAIL → stop pipeline on bad data

Remember: "WDF" = Warn, Drop, Fail
    W — expect (Warn)
    D — expect or DROP
    F — expect or FAIL

PIPELINE MODES:
    TRIGGERED   = run once, then stop (batch — for cost savings)
    CONTINUOUS  = always running (streaming — for low latency)

⚡ MUST KNOW DIRECT QUESTIONS

Q31: What is Lakeflow (DLT)?

A declarative framework for building ETL pipelines. You define WHAT your tables should look like (SQL/Python), and Lakeflow figures out HOW to execute, manage dependencies, handle retries, and enforce data quality.

Q32: What was DLT renamed to?

Lakeflow Declarative Pipelines (2025). The product is the same, just rebranded.

Q33: What is a Streaming Table?

An append-only table that processes only NEW data (incremental). Like a river — new water flows through, old water is already downstream. Best for: logs, events, CDC streams.

Q34: What is a Materialized View?

A table that is fully recomputed when underlying data changes. Like a fresh photo of your data. Best for: aggregations, star schema, Gold layer tables.

Q35: What are the 3 levels of Expectations?

  1. EXPECT — warn only, bad rows kept (log for monitoring)
  2. EXPECT OR DROP — silently remove bad rows
  3. EXPECT OR FAIL — stop the entire pipeline
Amadeus: "For booking amounts, we use EXPECT OR FAIL — a negative fare means corrupted data, stop immediately."
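The three semantics in one toy Python function. This sketches the behavior only, not the Lakeflow API:

```python
def run_expectation(rows: list, predicate, mode: str):
    """Evaluate one expectation; return (kept_rows, violation_count)."""
    good = [r for r in rows if predicate(r)]
    bad = [r for r in rows if not predicate(r)]
    if mode == "expect":
        return rows, len(bad)           # Warn: keep everything, log the count
    if mode == "expect_or_drop":
        return good, len(bad)           # Drop: silently remove bad rows
    if mode == "expect_or_fail":
        if bad:
            raise ValueError(f"{len(bad)} rows violated expectation")  # Stop pipeline
        return rows, 0
    raise ValueError(f"unknown mode: {mode}")

rows = [{"amount": 100}, {"amount": -5}]
valid = lambda r: r["amount"] > 0
kept_warn, violations = run_expectation(rows, valid, "expect")
kept_drop, _ = run_expectation(rows, valid, "expect_or_drop")
```

`expect_or_fail` on the same rows would raise, which is the "negative fare, stop immediately" case from the Amadeus example.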

Q36: What is apply_changes?

A Lakeflow function that automatically handles CDC/SCD logic. Give it a source stream, target table, keys, and SCD type — it does the MERGE internally.

Q37: TRIGGERED vs CONTINUOUS mode?

  • TRIGGERED: Runs once, processes all available data, stops. (cheaper, for batch)
  • CONTINUOUS: Runs forever, processes data as it arrives. (expensive, for real-time)

🔑 MID-LEVEL QUESTIONS

Q38: Streaming Table vs Materialized View — when to use which?

                          Streaming Table                 Materialized View
Processing                Only NEW data (incremental)     Recomputes ALL data
Good for                  High-volume append (CDC, logs)  Aggregations, joins, Gold layer
Can UPDATE existing rows? No (append-only)                Yes (full recompute)
Cost                      Low (reads only new data)       Higher (reprocesses everything)
Example                   raw_bookings Bronze             daily_revenue Gold
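The cost difference in miniature (plain Python, not Spark): the streaming table folds only each new batch into running totals, while the materialized view rescans every row on refresh. Both end at the same answer:

```python
class StreamingTable:
    """Incremental: folds only NEW rows into running totals (append-only)."""
    def __init__(self):
        self.totals = {}
    def process_new(self, new_rows: list) -> None:
        for r in new_rows:              # touches only the new batch
            self.totals[r["date"]] = self.totals.get(r["date"], 0) + r["amount"]

def materialized_view(all_rows: list) -> dict:
    """Full recompute: rebuilds the aggregate from ALL rows every refresh."""
    totals = {}
    for r in all_rows:
        totals[r["date"]] = totals.get(r["date"], 0) + r["amount"]
    return totals

day1 = [{"date": "d1", "amount": 10}]
day2 = [{"date": "d1", "amount": 5}, {"date": "d2", "amount": 7}]
st = StreamingTable()
st.process_new(day1)
st.process_new(day2)                    # incremental: only day2 rows read
mv = materialized_view(day1 + day2)     # recompute: all rows read again
```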

Q39: How do Expectations work with the Quarantine pattern?

Bad rows are not just dropped — they're sent to a quarantine table for investigation:

python
import dlt

@dlt.table
@dlt.expect_or_drop("valid_amount", "amount > 0")
def silver_bookings():
    return spark.readStream.table("bronze_bookings")

@dlt.table
def quarantine_bookings():  # Capture what was dropped
    return spark.readStream.table("bronze_bookings").filter("amount <= 0")

Q40: How do you build a complete Lakeflow pipeline?

python
import dlt
from pyspark.sql import functions as F

# Bronze: Auto Loader ingests raw files
@dlt.table
def bronze_bookings():
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/raw/bookings/"))

# Silver: Clean + validate
@dlt.table
@dlt.expect_or_drop("valid_id", "booking_id IS NOT NULL")
@dlt.expect_or_drop("valid_amount", "amount > 0")
def silver_bookings():
    return (dlt.read_stream("bronze_bookings")
            .select("booking_id", "passenger_id", "amount", "booking_date"))

# Gold: Aggregate
@dlt.table
def gold_daily_revenue():
    return (dlt.read("silver_bookings")
            .groupBy("booking_date")
            .agg(F.sum("amount").alias("total_revenue")))

SECTION 6: PIPELINE SCENARIOS

🔑 MID-LEVEL QUESTIONS

Q41: How would you design a booking pipeline for Amadeus?

🧠 Memory Map
Sources:
  Oracle (bookings) → Debezium → Kafka → ADLS landing zone
  Flight APIs (JSON) → Azure Event Hubs → ADLS landing zone
  Loyalty system (CSV) → SFTP → ADLS landing zone
Bronze (Auto Loader):
  cloudFiles reads from 3 landing zones → 3 Bronze tables
  Schema: raw + _ingested_at + _source_file
Silver (Lakeflow + MERGE):
  Deduplicate on booking_id
  Apply SCD Type 2 for passenger dimensions
  Tag PII columns (Unity Catalog)
  Quality: EXPECT OR FAIL for null booking_id
Gold (Materialized Views):
  daily_bookings_by_route (Power BI dashboard)
  passenger_360 (customer analytics)
  revenue_per_airline (partner reporting)
⚠️ Q42: MERGE takes 3 hours on a fact table. How do you fix it?

5-step fix:

  1. Liquid Clustering on merge key (biggest win — scans only relevant files)
  2. Filter source to only today's changes (don't MERGE entire history)
  3. OPTIMIZE target table first (compact small files)
  4. Enable low shuffle merge: spark.databricks.delta.merge.enableLowShuffle = true
  5. Check for data skew: one key has millions of rows → use salting
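Salting (step 5) in miniature, in plain Python: a stable hash of a secondary column spreads one hot key across a fixed number of buckets. The same idea applies to a Spark merge/join key; the column names are assumptions:

```python
from zlib import crc32

NUM_SALTS = 4   # number of buckets to spread the hot key across

def salt_key(hot_key: str, secondary: str) -> str:
    """Derive a deterministic salted key from a stable secondary column."""
    return f"{hot_key}#{crc32(secondary.encode()) % NUM_SALTS}"

a = salt_key("HOT_ROUTE", "booking-001")
b = salt_key("HOT_ROUTE", "booking-001")   # same inputs, same bucket
c = salt_key("HOT_ROUTE", "booking-002")   # may land in a different bucket
```

Deterministic salting (crc32 rather than random) matters for MERGE: the same row must always map to the same salted key, or re-runs would scatter updates.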

Q43: How do you build a Data Quality Framework?

4 layers:

  1. Schema validation (Bronze) — reject wrong formats at ingestion
  2. Row-level rules (Silver) — Lakeflow Expectations (NULL checks, range checks)
  3. Cross-table rules (Gold) — referential integrity (every booking has a valid passenger)
  4. Statistical monitoring — Lakehouse Monitoring detects drift (column mean shifted 20%)
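Layer 4 in miniature: a plain-Python mean-shift check. The 20% threshold is the example figure from above; in practice Lakehouse Monitoring computes this for you:

```python
def mean_drift(baseline: list, current: list, threshold: float = 0.20) -> bool:
    """Flag drift when the column mean shifts by more than `threshold` (relative)."""
    base = sum(baseline) / len(baseline)
    cur = sum(current) / len(current)
    return abs(cur - base) / abs(base) > threshold

drifted = mean_drift([100.0] * 10, [130.0] * 10)   # 30% shift: flag it
stable = mean_drift([100.0] * 10, [105.0] * 10)    # 5% shift: fine
```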

🧠 FINAL REVISION — Day 2 Summary Card

📐 Architecture Diagram
┌─────────────────────────────────────────────────────────────┐
│                    DAY 2: ETL PIPELINES                      │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  MEDALLION = Bronze(raw) → Silver(clean) → Gold(business)   │
│  Bronze: append-only + metadata (_ingested_at, _source)     │
│  Silver: MERGE + dedup + PII tagging                        │
│  Gold: aggregations for BI + ML                             │
│                                                             │
│  SCD Type 1 = Overwrite (no history)                        │
│  SCD Type 2 = Add new row (full history) ⭐                 │
│  merge_key trick: NULL key → NOT MATCHED → INSERT new row   │
│  Easy way: Lakeflow apply_changes(stored_as_scd_type=2)     │
│                                                             │
│  CDC = Oracle → Debezium → Kafka → Auto Loader → Delta      │
│  CDC ops: c=create, u=update, d=delete, r=read              │
│  CDF ≠ CDC! CDF = Delta's internal change tracking          │
│                                                             │
│  Auto Loader = cloudFiles source                            │
│  Two modes: Directory Listing (<10K) / Notification (>10K)  │
│  Schema evolution: addNewColumns + _rescued_data             │
│  Auto Loader (streaming) vs COPY INTO (batch)               │
│                                                             │
│  Lakeflow (was DLT):                                        │
│  Streaming Table = append-only, incremental                 │
│  Materialized View = full recompute                         │
│  Expectations: Warn / Drop / Fail ("WDF")                   │
│  apply_changes = auto SCD Type 2                            │
│  Modes: TRIGGERED (batch) vs CONTINUOUS (streaming)         │
│                                                             │
│  TOP 5 THINGS TO SAY IN INTERVIEW:                          │
│  1. "Medallion with Bronze append-only + Silver MERGE"      │
│  2. "SCD Type 2 with merge_key trick or apply_changes"      │
│  3. "CDC: Debezium reads redo logs, zero source impact"     │
│  4. "Auto Loader with notification mode for scale"          │
│  5. "Lakeflow Expectations: Warn/Drop/Fail for quality"     │
│                                                             │
└─────────────────────────────────────────────────────────────┘
Study tip: Read this file in three passes:
  1. First pass (30 min): Read only 🧠 Memory Maps + ⚡ Direct Questions
  2. Second pass (30 min): Read 🔑 Mid-Level Questions + ⚠️ Traps
  3. Before interview (15 min): Read ONLY the Final Revision Summary Card