Databricks · Section 16 of 17

Day 4: Production, CI/CD & Cost — Quick Recall Guide

🗺️ Memory Map
How to use this file:
  • ⚡ = Must remember (95% chance of being asked)
  • 🔑 = Key concept (core understanding needed)
  • ⚠️ = Common trap (interviewers love to test this)
  • 🧠 = Memory Map (mnemonic/acronym — memorize this!)
  • 📝 = One-liner (flash-card style — cover answer, test yourself)
Reading strategy: Read Memory Maps FIRST → then Direct Questions → then Mid-Level.

🧠 MASTER MEMORY MAP — Day 4

🧠 PRODUCTION DATABRICKS = "WCCDS"
W = Workflows (job scheduling + orchestration)
C = CI/CD (Declarative Automation Bundles + Azure DevOps)
C = Cost Management (5 areas to optimize)
D = Debugging (Spark UI + execution plans)
S = System Design (end-to-end platform architecture)

COST MANAGEMENT = "CCSQM" (think: Cost Control Saves Quite Much)
C = Compute (right cluster type, auto-terminate)
C = Cluster Policies (enforce limits on team spending)
S = Storage (OPTIMIZE + VACUUM + lifecycle rules)
Q = Query Optimization (Photon, caching, predicate pushdown)
M = Monitoring + Chargeback (tag everything, show cost per team)

CI/CD = "DAB → Git → Azure DevOps → Deploy"
DAB = Declarative Automation Bundles (YAML config files in Git)
Was called "Asset Bundles" — renamed 2025

SECTION 1: DATABRICKS WORKFLOWS

🧠 Memory Map: Workflows

🧠 WORKFLOW = "Scheduled pipeline = series of tasks with dependencies"
Example daily pipeline:
Task 1: Bronze ingestion (6:00 AM)
↓ (depends on Task 1)
Task 2: Silver transformation (after Task 1)
↓ (depends on Task 2)
Task 3: Gold aggregation (after Task 2)
↓ (depends on Task 3)
Task 4: Data quality check (after Task 3)
TASK TYPES: Notebook, Python script, SQL, Lakeflow pipeline, dbt, JAR
KEY FEATURES:
Repair Run = re-run ONLY failed tasks (don't restart everything!)
Task Values = pass data between tasks (dbutils.jobs.taskValues)
Table-Triggered = start job when a Delta table has new data
Backfill = reprocess historical data (GA 2025)
WORKFLOWS vs AIRFLOW:
Workflows = Databricks only, free, simple
Airflow = multi-platform, requires infra, more flexible
ADF = Azure native, good for cross-platform orchestration

⚡ MUST KNOW DIRECT QUESTIONS

Q1: What is Databricks Workflows?

Built-in job scheduler in Databricks. Define tasks, set dependencies, schedule runs. Free (included in Databricks).

Q2: What task types are supported?

Notebook, Python script, SQL query, Lakeflow pipeline, dbt task, JAR, Spark submit.
Q3: What is a Repair Run?

Re-runs ONLY failed tasks in a workflow — doesn't restart the whole pipeline. Saves time and compute.

Q4: How do tasks pass data to each other?

Using dbutils.jobs.taskValues:

python
# Task 1: Save a value
dbutils.jobs.taskValues.set(key="row_count", value=1500000)
# Task 2: Read the value
count = dbutils.jobs.taskValues.get(taskKey="task1", key="row_count")

Q5: What is a table-triggered job?

A job that starts automatically when new data arrives in a Delta table (not on a schedule). Uses CDF (Change Data Feed) to detect new rows.
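In a bundle, this can be declared as a table-update trigger on the job instead of a cron schedule. A rough sketch (the `table_update` trigger exists in the Jobs API, but treat the exact field names and the table name here as assumptions to verify against current docs):

```yaml
resources:
  jobs:
    silver_refresh:
      name: "Silver refresh on new Bronze data"
      trigger:
        table_update:
          table_names:
            - main.bookings.bronze_events   # hypothetical Delta table
      tasks:
        - task_key: silver_transform
          notebook_task:
            notebook_path: ./notebooks/silver_transform.py
```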

⚠️ Q6: Workflows vs Airflow — when to use which?
Pro Tip
  • Workflows: Pipeline is 100% inside Databricks
  • Airflow: Orchestrate across multiple platforms (Databricks + Oracle + APIs)
  • Azure Data Factory: Azure-centric alternative to Airflow
  • ⚠️ Don't say "Airflow is better" — say "it depends on the orchestration scope"

SECTION 2: CI/CD

🧠 Memory Map: CI/CD Pipeline

🗂️ CI/CD = "Code in Git → Test → Deploy to Databricks automatically"
DECLARATIVE AUTOMATION BUNDLES (DAB):
Old name: "Asset Bundles" → renamed 2025
= YAML configuration files that define your Databricks resources
databricks.yml = "Recipe book for your entire project"
workspace settings
job definitions (workflows)
cluster configs
pipeline definitions
permissions
DEPLOYMENT FLOW:
Developer writes code + databricks.yml
Git push → triggers Azure DevOps pipeline
Azure DevOps runs:
1. databricks bundle validate (check YAML syntax)
2. databricks bundle deploy -t staging (deploy to staging)
3. Run tests on staging
4. databricks bundle deploy -t production (deploy to prod)
ENVIRONMENTS:
databricks.yml defines targets:
dev → developer workspace
staging → QA/testing workspace
prod → production workspace
Remember: "DAB-VDP" = Declarative Automation Bundles → Validate → Deploy → Production

⚡ MUST KNOW DIRECT QUESTIONS

Q7: What are Declarative Automation Bundles?

YAML configuration files (in Git) that define ALL Databricks resources — jobs, clusters, pipelines, permissions. Deploy with databricks bundle deploy. Formerly called "Asset Bundles."

Q8: What is the main config file?

databricks.yml — defines workspace, targets (dev/staging/prod), jobs, clusters, and pipeline configurations.

Q9: What CLI commands are used?

bash
databricks bundle validate     # Check YAML is correct
databricks bundle deploy -t staging  # Deploy to staging
databricks bundle deploy -t prod     # Deploy to production
databricks bundle destroy -t staging # Remove from staging

Q10: How does CI/CD work with Azure DevOps?

  1. Developer pushes code to Git (Azure Repos)
  2. Azure DevOps pipeline triggers automatically
  3. Pipeline runs: validate → deploy to staging → test → deploy to prod
  4. Uses databricks bundle CLI commands in pipeline YAML

🔑 MID-LEVEL QUESTIONS

Q11: Show a basic databricks.yml example
📝 Note
yaml
bundle:
  name: amadeus_booking_pipeline    # Project name

targets:
  staging:                           # Staging environment
    workspace:
      host: https://adb-staging.azuredatabricks.net
    default: true                    # Default target
  production:                        # Production environment
    workspace:
      host: https://adb-prod.azuredatabricks.net
    run_as:
      service_principal_name: sp-booking-pipeline  # Not personal account!

resources:
  jobs:
    daily_booking_etl:               # Job definition
      name: "Daily Booking ETL"
      schedule:
        quartz_cron_expression: "0 0 6 * * ?"  # 6 AM daily
      tasks:
        - task_key: bronze_ingest
          notebook_task:
            notebook_path: ./notebooks/bronze_ingest.py
        - task_key: silver_transform
          depends_on:
            - task_key: bronze_ingest     # Runs AFTER bronze
          notebook_task:
            notebook_path: ./notebooks/silver_transform.py
Q12: What testing strategies exist for Databricks?

4 levels:

  1. Unit tests — Test Python functions (no Spark needed, use pytest)
  2. Integration tests — Test with real Spark session (use staging workspace)
  3. Data quality tests — Lakeflow Expectations on Silver/Gold tables
  4. End-to-end tests — Run full pipeline on sample data, validate output
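Level 1 can be made concrete with a pytest-style unit test on a pure Python helper. `dedupe_bookings` below is a hypothetical function invented for illustration — the point is that because it takes plain dicts, it needs no Spark session and runs in milliseconds in CI:

```python
# Hypothetical helper: keep only the latest record per booking_id.
# Pure Python, so it is unit-testable without a cluster.
def dedupe_bookings(rows):
    latest = {}
    for row in rows:
        key = row["booking_id"]
        if key not in latest or row["updated_at"] > latest[key]["updated_at"]:
            latest[key] = row
    return list(latest.values())

def test_dedupe_keeps_latest_version():
    rows = [
        {"booking_id": "B1", "updated_at": "2025-01-01", "status": "PENDING"},
        {"booking_id": "B1", "updated_at": "2025-01-02", "status": "CONFIRMED"},
        {"booking_id": "B2", "updated_at": "2025-01-01", "status": "PENDING"},
    ]
    result = {r["booking_id"]: r["status"] for r in dedupe_bookings(rows)}
    assert result == {"B1": "CONFIRMED", "B2": "PENDING"}
```

Levels 2-4 then exercise the same logic wrapped in DataFrame code on the staging workspace.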

Q13: Azure DevOps pipeline YAML example

yaml
trigger:
  branches:
    include: [main]              # Trigger on push to main

stages:
  - stage: Deploy_Staging
    jobs:
      - job: deploy
        steps:
          - script: pip install databricks-cli
          - script: databricks bundle validate
          - script: databricks bundle deploy -t staging

  - stage: Deploy_Production
    dependsOn: Deploy_Staging     # Only after staging succeeds
    condition: succeeded()
    jobs:
      - job: deploy
        environment: production    # Requires approval gate
        steps:
          - script: databricks bundle deploy -t production

SECTION 3: COST MANAGEMENT

🧠 Memory Map: Cost

5 COST AREAS = "CCSQM"
1. COMPUTE (biggest cost — 60-70% of Databricks bill)
→ Use Job Clusters for production (auto-terminate)
→ Use Spot VMs (60-90% cheaper, but can be reclaimed)
→ Auto-terminate idle clusters (set to 10-15 minutes)
→ Right-size: start small, scale up if needed
2. CLUSTER POLICIES (prevent overspending)
→ Admins set max nodes, allowed VM types, auto-terminate rules
→ Teams can't create 100-node clusters accidentally
3. STORAGE (usually 10-20% of cost)
→ OPTIMIZE (compact files → fewer API calls)
→ VACUUM (delete unused files → save storage)
→ Lifecycle rules on ADLS (move old Bronze to Cool/Archive tier)
4. QUERY OPTIMIZATION (save compute time = save money)
→ Photon (faster queries → less compute time)
→ Liquid Clustering (skip irrelevant files)
→ Caching (spark.databricks.io.cache.enabled = true)
5. MONITORING + CHARGEBACK
→ Tag clusters with team/project (cost allocation)
→ Databricks account console = cost by workspace/cluster
→ Set budget alerts (notify when 80% budget used)
SPOT VMs:
What: Azure VMs at 60-90% discount
Risk: Azure can reclaim them with 30-sec notice
Use for: Workers (not driver!) in non-critical jobs
⚠️ Never use Spot for the driver node — the job dies if it's reclaimed

⚡ MUST KNOW DIRECT QUESTIONS

Q14: What is the biggest cost in Databricks?

Compute (60-70%). Choose right cluster type: Job Cluster for prod, Serverless for SQL, Spot VMs for non-critical batch jobs.

Q15: What are Spot VMs?

Azure VMs at 60-90% discount. Azure can reclaim them anytime (30-sec notice). Use for worker nodes only, never for the driver.

Q16: What are Cluster Policies?

Admin-defined rules that restrict what clusters teams can create. Example: max 10 nodes, only Standard_D4s_v3 VMs, auto-terminate after 15 min.
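A cluster policy is a JSON document of attribute rules. A sketch of the example above might look like this (the attribute-rule types `fixed`, `range`, and `allowlist` follow the Databricks policy definition format, but verify the exact keys against current docs before relying on them):

```json
{
  "node_type_id": {
    "type": "allowlist",
    "values": ["Standard_D4s_v3"]
  },
  "autoscale.max_workers": {
    "type": "range",
    "maxValue": 10
  },
  "autotermination_minutes": {
    "type": "fixed",
    "value": 15
  },
  "custom_tags.team": {
    "type": "unlimited",
    "isOptional": false
  }
}
```

The last rule makes the `team` tag mandatory, which is how policies also enforce the tagging needed for chargeback.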

Q17: How do you implement chargeback?

Tag every cluster/job with team name and project. Use Databricks account console to see cost per tag. Set budget alerts.
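In a bundle, tags can be attached to a job cluster so every run is cost-attributable. A minimal sketch (the tag names and values here are hypothetical; `custom_tags` is a standard cluster attribute):

```yaml
resources:
  jobs:
    daily_booking_etl:
      job_clusters:
        - job_cluster_key: main
          new_cluster:
            spark_version: "15.4.x-scala2.12"   # assumed runtime version
            node_type_id: Standard_D4s_v3
            num_workers: 4
            custom_tags:
              team: data-platform               # hypothetical tag values
              project: booking-etl
              cost_center: cc-1234
```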

⚠️ Q18: What is the #1 cost mistake?

Using All-Purpose clusters for production jobs. They stay running 24/7 even when idle. Switch to Job Clusters (auto-terminate) — saves 60-80%.

🔑 MID-LEVEL QUESTIONS

Q19: Design a cost governance strategy for 200+ engineers
1. CLUSTER POLICIES (prevent)
→ Dev: max 4 nodes, auto-terminate 15 min
→ Prod: max 20 nodes, Spot workers, Job Cluster only
→ Data Science: GPU clusters with 2-hour timeout
2. TAGGING (track)
→ Mandatory tags: team, project, cost_center
→ Automated via cluster policies (users can't skip)
3. MONITORING (alert)
→ Weekly cost reports per team
→ Budget alerts at 80% threshold
→ Anomaly detection (team X cost jumped 300%)
4. OPTIMIZATION (reduce)
→ Predictive Optimization (auto OPTIMIZE + VACUUM)
→ Serverless SQL Warehouses (no idle cost)
→ Reserved capacity for stable workloads (1-year commitment = 30-50% off)

SECTION 4: SPARK DEBUGGING

🧠 Memory Map: Debugging

🧠 DEBUGGING = "Read the Spark UI like a doctor reads an X-ray"
SPARK EXECUTION PLAN
Read BOTTOM to TOP! (physical plan starts at the bottom)
== Physical Plan ==
*(5) HashAggregate ← Step 5: Final aggregation (TOP = last step)
+- *(4) Exchange ← Step 4: Shuffle (data moves between nodes)
+- *(3) HashAggregate ← Step 3: Partial aggregation
+- *(2) Project ← Step 2: Column selection
+- *(1) Scan ← Step 1: Read table (BOTTOM = first step)
RED FLAGS in execution plan:
Exchange = SHUFFLE (expensive! data crosses network)
BroadcastExchange = small table broadcast (GOOD — no shuffle)
SortMergeJoin = both tables are large (expensive)
BroadcastHashJoin = one table is small (cheap — GOOD)
CartesianProduct = CROSS JOIN (⚠️ usually a mistake!)
SPARK UI — 5 TABS TO CHECK:
1. Jobs = overall progress (how many stages/tasks)
2. Stages = where is time spent? (longest stage = bottleneck)
3. SQL = query plan visualization
4. Storage = cached data
5. Environment = Spark config values
Remember: "JSSEE" = Jobs, Stages, SQL, Storage, Environment
COMMON ISSUES = "SSOS"
S = Skew (one partition has 100x more data than others)
S = Spill (not enough memory → data written to disk → slow)
O = OOM (Out of Memory — increase executor memory)
S = Small files (too many tiny files → slow reads → OPTIMIZE)

⚡ MUST KNOW DIRECT QUESTIONS

Q20: How do you read a Spark execution plan?

Read bottom to top. Bottom = first step (scan table). Top = last step (output). Look for Exchange (shuffle — expensive) and BroadcastHashJoin (good — no shuffle).

Q21: What is a shuffle?

Data movement across network between Spark executors. Happens during GROUP BY, JOIN, DISTINCT, REPARTITION. Expensive — minimize shuffles for better performance.

Q22: What is data skew?

One partition has much more data than others. Example: 99% of bookings are for "Delhi" airport → one executor does all the work while others sit idle.

Q23: How do you fix data skew?

  1. Salting: Add random number to skewed key → spreads data across partitions
  2. Broadcast join: If one side is small (<10 MB), broadcast it
  3. Adaptive Query Execution (AQE): Spark auto-detects and fixes skew (enabled by default)
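The salting idea in (1) can be sketched without Spark: appending a small random suffix to the hot key spreads its rows over several partitions. This is a pure-Python simulation (bucket = stand-in for a shuffle partition; a deterministic toy partitioner replaces Spark's hash partitioner):

```python
import random
from collections import Counter

random.seed(0)       # reproducible demo
NUM_BUCKETS = 8      # stand-in for shuffle partitions
SALTS = 4            # spread each key over up to 4 sub-keys

def bucket_for(key, num_buckets=NUM_BUCKETS):
    # Toy deterministic partitioner standing in for Spark's hash partitioner.
    return sum(key.encode()) % num_buckets

# 1,000 bookings, 97% for the skewed key "DEL" (Delhi).
keys = ["DEL"] * 970 + ["BOM", "MAA", "BLR"] * 10

# Without salting: every "DEL" row lands in the same bucket.
plain = Counter(bucket_for(k) for k in keys)

# With salting: "DEL" becomes "DEL_0".."DEL_3", spread over several buckets.
salted = Counter(bucket_for(f"{k}_{random.randrange(SALTS)}") for k in keys)

print("max rows in one bucket, plain :", max(plain.values()))
print("max rows in one bucket, salted:", max(salted.values()))
```

In real PySpark you would add a salt column (e.g. with `F.rand()`), replicate the small side of the join once per salt value, and join on (key, salt) so salted keys still match.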

Q24: What is spill?

When Spark runs out of executor memory, it writes data to disk (SSD). Much slower than in-memory processing. Fix: increase spark.executor.memory or reduce partition size.

Q25: What causes OOM (Out of Memory)?

  1. Too much data in one partition (skew)
  2. Collecting large dataset to driver (df.collect() on millions of rows)
  3. Too many broadcast joins (broadcasting large tables)

🔑 MID-LEVEL QUESTIONS

Q26: Pipeline is suddenly 5x slower, no code change. What do you check?

Systematic debugging approach (in this order):

  1. Data volume — Did input data suddenly 10x? (check Bronze row counts)
  2. Data skew — One partition has disproportionate data? (check Spark UI → Stages → task duration variance)
  3. Cluster issues — Spot VMs reclaimed? Fewer nodes? (check cluster events log)
  4. Upstream changes — Source schema changed? New columns? (check Auto Loader _rescued_data)
  5. Concurrent jobs — Other jobs competing for resources? (check workspace activity)
  6. Storage — Too many small files? (check table metrics, run OPTIMIZE)
  7. Spark config — Someone changed spark configs? (check Environment tab)

Q27: How do you use the Spark UI to find bottlenecks?

Step 1: Jobs tab → find the slowest job
Step 2: Click the job → Stages tab → find the slowest stage
Step 3: Click the stage → task list → look at:
Task duration: all similar? (good) or one task 100x longer? (SKEW)
Shuffle read/write: very large? (shuffle problem)
Spill (Memory/Disk): any spill? (need more memory)
Step 4: SQL tab → execution plan → look for Exchange (shuffle)

SECTION 5: SYSTEM DESIGN

🔑 MID-LEVEL QUESTIONS

Q28: Design a data platform for Amadeus on Azure Databricks
```
┌─────────────────────────────────────────────────────────────┐
│ AMADEUS DATA PLATFORM │
├─────────────────────────────────────────────────────────────┤
│ │
│ SOURCES: │
│ Oracle (bookings) → Debezium → Kafka → ADLS landing zone │
│ Flight APIs (JSON) → Event Hubs → ADLS landing zone │
│ Partner files (CSV) → SFTP → ADLS landing zone │
│ │
│ INGESTION: │
│ Auto Loader (cloudFiles) → Bronze Delta tables │
│ Mode: File Notification (scale: billions/day) │
│ │
│ PROCESSING: │
│ Lakeflow Pipelines: │
│ Bronze → Silver (MERGE + dedup + SCD2 + PII tagging) │
│ Silver → Gold (aggregations + star schema) │
│ Quality: Expectations (Warn/Drop/Fail per layer) │
│ │
│ SERVING: │
│ Serverless SQL Warehouse → Power BI dashboards │
│ Delta Sharing → Partner airline data access │
│ Feature Store → ML model training │
│ │
│ GOVERNANCE: │
│ Unity Catalog (3-level namespace) │
│ Row-level security (airline isolation) │
│ Column masking (PII/GDPR) │
│ ABAC tags for automated policy enforcement │
│ │
│ OPERATIONS: │
│ CI/CD: Declarative Automation Bundles + Azure DevOps │
│ Monitoring: Databricks Workflows + budget alerts │
│ Compute: Job Clusters (prod) + Spot workers + auto-term │
│ Cost: Cluster policies + tagging + chargeback │
│ │
└─────────────────────────────────────────────────────────────┘
```
Q29: What makes a good answer to system design questions?

Follow the "RIGS" framework:

  1. Requirements — Clarify scale, latency, users, compliance
  2. Ingestion — How data enters (Auto Loader, Kafka, APIs)
  3. Governance — Security, PII, GDPR, access control
  4. Serving — How consumers access (SQL Warehouse, Delta Sharing, APIs)
Always mention: Medallion layers, Unity Catalog, cost strategy, monitoring

🧠 FINAL REVISION — Day 4 Summary Card

📐 Architecture Diagram
┌─────────────────────────────────────────────────────────────┐
│               DAY 4: PRODUCTION & CI/CD                      │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Workflows = built-in scheduler (free, Databricks-only)     │
│  Repair Run = re-run ONLY failed tasks                      │
│  Task Values = pass data between tasks (dbutils)            │
│  Table-triggered = start when new data arrives              │
│                                                             │
│  CI/CD = Declarative Automation Bundles (was Asset Bundles)  │
│  Config: databricks.yml (YAML in Git)                       │
│  Flow: validate → deploy staging → test → deploy prod       │
│  Azure DevOps pipeline triggers on Git push                 │
│  ⚠️ Always use Service Principal for prod (not personal!)   │
│                                                             │
│  Cost = "CCSQM"                                             │
│  Compute: Job Cluster + Spot VMs + auto-terminate           │
│  Cluster Policies: enforce limits per team                  │
│  Storage: OPTIMIZE + VACUUM + ADLS lifecycle                │
│  Query: Photon + Liquid Clustering + caching                │
│  Monitoring: tagging + budget alerts + chargeback           │
│  ⚠️ #1 mistake: All-Purpose clusters for production!       │
│                                                             │
│  Debugging = "SSOS" (Skew, Spill, OOM, Small files)         │
│  Read execution plan BOTTOM to TOP                          │
│  Exchange = shuffle (expensive!)                            │
│  BroadcastHashJoin = good (no shuffle)                      │
│  Spark UI: Jobs → Stages → Tasks (find bottleneck)          │
│                                                             │
│  System Design: "RIGS" framework                            │
│  Requirements → Ingestion → Governance → Serving            │
│                                                             │
│  TOP 5 THINGS TO SAY IN INTERVIEW:                          │
│  1. "Workflows with Repair Run for failed task recovery"    │
│  2. "DAB + Azure DevOps for automated CI/CD"                │
│  3. "Job Clusters + Spot VMs + auto-terminate for cost"     │
│  4. "Spark UI: check stages for skew + spill"               │
│  5. "Unity Catalog + ABAC + row security for governance"    │
│                                                             │
└─────────────────────────────────────────────────────────────┘
Study tip: Read this file TWICE:
  1. First pass (30 min): Read only 🧠 Memory Maps + ⚡ Direct Questions
  2. Second pass (30 min): Read 🔑 Mid-Level Questions + ⚠️ Traps
  3. Before interview (15 min): Read ONLY the Final Revision Summary Card

🧠 INTERVIEW DAY: ULTRA-QUICK CHEAT SHEET (Read 10 min before interview)

📐 Architecture Diagram
┌─────────────────────────────────────────────────────────────────┐
│              ALL 4 DAYS — LAST-MINUTE RECALL                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  DELTA = Parquet + _delta_log (ACID on data lake)               │
│  MERGE = Match→Update, NotMatch→Insert (deduplicate source!)    │
│  OPTIMIZE = compact files | VACUUM = delete old files (7 days)  │
│  Liquid Clustering > Z-ORDER > Partitioning (new→old)           │
│                                                                 │
│  MEDALLION = Bronze(raw) → Silver(clean) → Gold(business)       │
│  SCD2 = merge_key trick OR apply_changes(scd_type=2)            │
│  CDC = Oracle → Debezium → Kafka → Auto Loader → Delta          │
│  Auto Loader = cloudFiles (Directory or Notification mode)      │
│  Lakeflow = Declarative pipelines + Expectations (Warn/Drop/Fail)│
│                                                                 │
│  Unity Catalog = catalog.schema.table (3 levels)                │
│  Security: Row filter + Column mask + ABAC tags                 │
│  Photon = C++ SQL engine (not for Python UDFs/ML)               │
│  Compute: All-Purpose(dev) → Job Cluster(prod) → Serverless    │
│  GDPR: DELETE + VACUUM 0 HOURS                                  │
│                                                                 │
│  Workflows = built-in scheduler + Repair Run                    │
│  CI/CD = Declarative Automation Bundles (databricks.yml in Git)  │
│  Cost: Job Clusters + Spot VMs + Cluster Policies + Tagging     │
│  Debug: Spark UI bottom→top, look for Exchange(shuffle)         │
│                                                                 │
│  ALWAYS FRAME WITH AMADEUS:                                     │
│  "In our travel booking pipeline with billions of daily events..."│
│  "For GDPR with 200+ airline partners and passenger PII..."     │
│  "At Amadeus scale with Oracle legacy migration to Delta Lake..."│
│                                                                 │
└─────────────────────────────────────────────────────────────────┘