Production, CI/CD, Cost Management & Mock Interview
MEMORY MAP: PRODUCTION & CI/CD → DWCCM
How to use: Before answering ANY production/CI/CD question, mentally walk through DWCCM. Most questions touch 2-3 of these areas. Mentioning all five shows breadth.
SECTION 1: DATABRICKS WORKFLOWS & ORCHESTRATION (1 hour)
Q1: What is Databricks Workflows? How does it compare to Apache Airflow?
Simple Explanation: A workflow is a scheduled pipeline — a series of tasks that run in order. For example: "Every day at 6am, run bronze ingestion → then silver transformation → then gold aggregation."
Databricks Workflows is Databricks' built-in job scheduler. You define tasks (notebooks, SQL, Python scripts), set dependencies (task B runs after task A), and schedule them.
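To make "tasks, dependencies, schedule" concrete, here is a minimal sketch of what such a job definition could look like in Jobs API 2.1-style JSON. The job name, notebook paths, and task keys are hypothetical, and real definitions also specify cluster settings:

```json
{
  "name": "daily_medallion_pipeline",
  "schedule": {
    "quartz_cron_expression": "0 0 6 * * ?",
    "timezone_id": "UTC"
  },
  "tasks": [
    { "task_key": "bronze_ingestion",
      "notebook_task": { "notebook_path": "/pipelines/bronze" } },
    { "task_key": "silver_transformation",
      "depends_on": [ { "task_key": "bronze_ingestion" } ],
      "notebook_task": { "notebook_path": "/pipelines/silver" } },
    { "task_key": "gold_aggregation",
      "depends_on": [ { "task_key": "silver_transformation" } ],
      "notebook_task": { "notebook_path": "/pipelines/gold" } }
  ]
}
```

The `depends_on` field is what encodes "task B runs after task A" — the scheduler will not start silver until bronze succeeds.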
Apache Airflow is a separate open-source tool that does the same thing but works across ANY platform (not just Databricks). It's more flexible but requires more setup and maintenance.
Real-world analogy:
- Databricks Workflows = Your company's internal task management tool (simple, built-in, works only within your company)
- Apache Airflow = A universal project management tool (powerful, works everywhere, but you need to install and maintain it yourself)
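The core idea both tools share — run each task only after its upstream tasks finish — can be sketched in a few lines of plain Python using the standard library's topological sorter. The task names mirror the hypothetical bronze/silver/gold pipeline above; this is an illustration of dependency ordering, not either tool's actual engine:

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical medallion pipeline: silver depends on bronze, gold on silver.
dependencies = {
    "silver_transformation": {"bronze_ingestion"},
    "gold_aggregation": {"silver_transformation"},
}

def run_pipeline(deps):
    """Execute tasks in an order that respects every dependency."""
    order = list(TopologicalSorter(deps).static_order())
    for task in order:
        print(f"running {task}")  # a real scheduler would launch a notebook/job here
    return order

run_pipeline(dependencies)
# bronze_ingestion runs first, then silver_transformation, then gold_aggregation
```

Both Databricks Workflows and Airflow are, at heart, this loop plus scheduling, retries, alerting, and a UI.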
| Aspect | Databricks Workflows | Apache Airflow |
|---|---|---|
| Setup | Zero — already built into Databricks | Requires separate installation and maintenance |