Day 2: Pipelines & Performance — Quick Recall Guide
Tag legend: Must remember · Key concept · Common trap · Memory Map · One-liner
🧠 MASTER MEMORY MAP — Day 2
🧠 DATA LOADING = "CSP" (Cheap→Smart→Push)
C = COPY INTO (batch, cheapest, manual trigger)
S = Snowpipe (auto-ingest, ~1 min latency, event-driven)
P = Snowpipe Streaming (push via API, sub-second latency, Kafka-friendly)
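The "S" in CSP can be sketched as a pipe with auto-ingest. This is a minimal sketch only; the pipe, stage, and table names (raw_events_pipe, raw_stage, raw_events) and the JSON file format are assumptions, not from the guide:

```sql
-- Hypothetical auto-ingest pipe over an external stage.
CREATE PIPE raw_events_pipe
  AUTO_INGEST = TRUE  -- fires on cloud storage event notifications
AS
COPY INTO raw_events
FROM @raw_stage/events/
FILE_FORMAT = (TYPE = 'JSON');
```

With AUTO_INGEST = TRUE, the cloud provider's event notifications (e.g. S3 → SQS) trigger the load, which is what gives Snowpipe its ~1 minute latency without any manual COPY INTO.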
CDC INSIDE SNOWFLAKE: "Streams + Tasks" OR "Dynamic Tables"
Streams = track what CHANGED in a table (INSERT/UPDATE/DELETE)
Tasks = scheduled SQL (run stream processor every N min)
Dynamic Tables = newer, declarative, replaces most Streams+Tasks
STREAMS: "A camera watching your table"
3 metadata columns:
METADATA$ACTION = 'INSERT' or 'DELETE'
METADATA$ISUPDATE = TRUE if row is part of an UPDATE
METADATA$ROW_ID = unique row identifier
UPDATE = DELETE old + INSERT new (two rows in stream!)
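A minimal sketch of the camera in action; the table and stream names (orders, orders_stream) are assumptions for illustration:

```sql
-- Create a stream that records changes to the source table.
CREATE STREAM orders_stream ON TABLE orders;

-- An UPDATE in the source surfaces as TWO rows in the stream:
--   METADATA$ACTION = 'DELETE', METADATA$ISUPDATE = TRUE  (old values)
--   METADATA$ACTION = 'INSERT', METADATA$ISUPDATE = TRUE  (new values)
SELECT METADATA$ACTION, METADATA$ISUPDATE, METADATA$ROW_ID, t.*
FROM orders_stream t;
```

Note that consuming the stream in a DML statement (e.g. INSERT ... SELECT FROM orders_stream) advances its offset, so the same changes are not delivered twice.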
TASKS: "Cron jobs inside Snowflake"
Root task: has SCHEDULE
Child tasks: have AFTER parent_task
RESUME children first, root task LAST (or use SYSTEM$TASK_DEPENDENTS_ENABLE on the root) — suspended children will NOT run just because the root is resumed
Serverless tasks: no WAREHOUSE needed (10% cheaper)
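A sketch of a two-task DAG, assuming hypothetical names (load_root, transform_child, etl_wh, a staging table, and a transform_staging procedure):

```sql
-- Root task: the only one with a SCHEDULE.
CREATE TASK load_root
  WAREHOUSE = etl_wh
  SCHEDULE = '5 MINUTE'
AS
  INSERT INTO staging SELECT * FROM orders_stream;

-- Child task: no SCHEDULE, only an AFTER dependency.
CREATE TASK transform_child
  WAREHOUSE = etl_wh
  AFTER load_root
AS
  CALL transform_staging();

-- Resume children first, root last; a freshly created task is suspended.
ALTER TASK transform_child RESUME;
ALTER TASK load_root RESUME;
```

For the serverless variant, omit the WAREHOUSE clause (optionally setting USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE) and Snowflake manages the compute.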
DYNAMIC TABLES: "Declarative ELT — you say WHAT, Snowflake does HOW"
CREATE DYNAMIC TABLE ... TARGET_LAG = '5 minutes' AS SELECT ...
Min lag = 1 minute
INCREMENTAL refresh by default (only processes changes)
Cannot do SCD Type 2 (use Streams+Tasks for that)
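A fuller version of the one-line syntax above, with hypothetical table and column names (daily_revenue, orders, amount) and an assumed warehouse (etl_wh):

```sql
-- Declarative pipeline: you state the result; Snowflake schedules
-- the (incremental, by default) refreshes to meet TARGET_LAG.
CREATE DYNAMIC TABLE daily_revenue
  TARGET_LAG = '5 minutes'   -- minimum allowed lag is 1 minute
  WAREHOUSE  = etl_wh
AS
SELECT order_date, SUM(amount) AS revenue
FROM orders
GROUP BY order_date;
```

There is no stream and no task to manage here — that is the "declarative" part of the one-liner.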
PERFORMANCE: "CQMWS"
C = Clustering key (organize micro-partitions by common filter columns)
Q = Query Profile (find the bottleneck: pruning? memory spilling? compute?)
M = Materialized Views (pre-compute heavy aggregations)
W = Warehouse sizing (scale UP for slow queries, scale OUT for concurrency)
S = Search Optimization (point lookups: WHERE id = 'X')
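The C, M, W, and S levers above map to one statement each. A hedged sketch — table, column, view, and warehouse names (events, event_date, mv_daily, etl_wh) are illustrative assumptions, and materialized views plus search optimization require Enterprise edition:

```sql
-- C: clustering key on a common filter column (helps partition pruning)
ALTER TABLE events CLUSTER BY (event_date);

-- M: pre-computed aggregate that Snowflake keeps up to date
CREATE MATERIALIZED VIEW mv_daily AS
SELECT event_date, COUNT(*) AS n
FROM events
GROUP BY event_date;

-- W: scale UP (bigger warehouse) for a single slow query
ALTER WAREHOUSE etl_wh SET WAREHOUSE_SIZE = 'LARGE';

-- S: search optimization for selective point lookups (WHERE id = 'X')
ALTER TABLE events ADD SEARCH OPTIMIZATION;
```

Q has no DDL: open the Query Profile in Snowsight (or Query History) and look for poor pruning, spilling to local/remote storage, or exploding joins before reaching for any of the other four.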
SECTION 1: DATA LOADING
⚡ MUST KNOW DIRECT QUESTIONS
Q1: What is COPY INTO?
Snowflake's bulk loading command. It reads files from a stage and loads them into a table in parallel, much faster than row-by-row INSERT statements.
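A minimal sketch of the command; the stage path, table name, and file format here (my_stage, raw_events, CSV) are assumptions for illustration:

```sql
-- Bulk-load staged CSV files in parallel.
COPY INTO raw_events
FROM @my_stage/
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
ON_ERROR = 'CONTINUE';  -- skip bad rows instead of aborting the load
```

COPY INTO tracks load metadata per file (for ~64 days), so re-running the same command does not reload files it has already ingested.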