
Azure Databricks — Question Bank (L1/L2/L3)


TOPIC 1: DELTA LAKE (Internals, Transaction Log, ACID, MERGE, OPTIMIZE, VACUUM, Z-ORDER, Liquid Clustering)

L1 — Direct / Simple Questions

  1. What is Delta Lake and why was it created?

    Open-source storage layer that brings ACID transactions, schema enforcement, and time travel to data lakes. Created to solve the reliability problems of raw data lakes (no transactions, no schema control, corrupt reads from concurrent writes).
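A minimal PySpark sketch of these features, assuming a Spark session with the Delta Lake extensions enabled and a hypothetical /tmp path (on Databricks, `spark` is predefined):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()   # already available on Databricks

path = "/tmp/events_delta"                   # hypothetical table path

# Version 0: initial write, committed atomically via the transaction log
spark.range(100).withColumnRenamed("id", "event_id") \
    .write.format("delta").save(path)

# Version 1: append; the incoming schema must match or the write is rejected
spark.range(100, 200).withColumnRenamed("id", "event_id") \
    .write.format("delta").mode("append").save(path)

# Time travel: read the table exactly as it was at version 0
v0 = spark.read.format("delta").option("versionAsOf", 0).load(path)
print(v0.count())  # 100
```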

  2. What file format does Delta Lake use under the hood?

    Apache Parquet files plus a JSON-based transaction log (_delta_log). Data is stored as Parquet; the log tracks which Parquet files are valid for each table version.

  3. What is the _delta_log directory and what does it contain?

    A directory inside every Delta table that stores the transaction log — a sequence of JSON files (one per commit) recording every add/remove of Parquet files. It is the single source of truth for the table's state.
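Because the log is just newline-delimited JSON, it can be inspected directly. A sketch, assuming the hypothetical table path from above is locally readable; each zero-padded commit file holds one action per line (commitInfo, metaData, protocol, add, remove):

```python
import json
import os

log_dir = "/tmp/events_delta/_delta_log"  # hypothetical path from the sketch above

# Commit files are zero-padded by version: 00000000000000000000.json, ...
for name in sorted(f for f in os.listdir(log_dir) if f.endswith(".json")):
    with open(os.path.join(log_dir, name)) as fh:
        for line in fh:                       # one JSON action per line
            action = json.loads(line)
            print(name, next(iter(action)))   # e.g. "...001.json add"
```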

  4. What are the four ACID properties and how does Delta Lake guarantee them?

    Atomicity (commits are all-or-nothing via the transaction log), Consistency (schema enforcement rejects bad writes), Isolation (optimistic concurrency control with conflict detection), Durability (data stored on cloud storage like ADLS Gen2).
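These guarantees are observable: every successful write appears as exactly one committed version in the table history, and failed or conflicting writes never appear at all. A sketch, reusing the hypothetical path above:

```python
# One row per atomic commit on the table
spark.sql("DESCRIBE HISTORY delta.`/tmp/events_delta`") \
    .select("version", "operation", "operationParameters") \
    .show(truncate=False)
```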

  5. What is a checkpoint file in the Delta transaction log?

A Parquet file written every 10 commits (by default) that snapshots the entire table state as of that version. It spares readers from replaying every JSON commit from scratch: they load the latest checkpoint and replay only the commits that came after it.
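The log also keeps a small _last_checkpoint pointer file so readers can locate the newest checkpoint without listing the whole directory. A sketch of inspecting it (hypothetical path; the file only exists once the first checkpoint has been written):

```python
import json

# _last_checkpoint is plain JSON recording the newest checkpoint's version
with open("/tmp/events_delta/_delta_log/_last_checkpoint") as fh:
    pointer = json.load(fh)
print(pointer)  # e.g. {"version": 10, "size": 25}
```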

  6. What is schema enforcement in Delta Lake?

Delta Lake rejects writes that don't match the table's schema (wrong column names, incompatible types, or missing required columns). It prevents silent data corruption by failing the write immediately, before any mismatched data is committed.
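A sketch of enforcement in action, continuing the hypothetical table above: appending a column with an incompatible type fails before anything is committed, while adding genuinely new columns must be opted into explicitly:

```python
from pyspark.sql.utils import AnalysisException

# event_id is a long in the table; this DataFrame supplies a string
bad = spark.createDataFrame([("not-a-number",)], ["event_id"])

try:
    bad.write.format("delta").mode("append").save("/tmp/events_delta")
except AnalysisException as err:
    print("Write rejected:", err)  # fails fast; no partial data lands

# Schema *evolution* (adding new columns) is a separate, explicit opt-in:
# df.write.option("mergeSchema", "true").format("delta").mode("append").save(path)
```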