Day 2: Data Loading, Pipelines & Performance
Time: 5-6 hours | Priority: HIGH — Pipelines & performance are 30-35% of interviews
Context: Real-time booking ingestion, CDC from Oracle, BI dashboard performance
Approach: Simple explanation → analogy → technical details → code → interview tip
SECTION 1: DATA LOADING (1 hour)
Q1: What is COPY INTO? How is it different from INSERT?
Simple Explanation: COPY INTO is Snowflake's bulk loading command — it reads files from a stage (ADLS, S3, internal) and loads them into a table. It is MUCH faster than INSERT for large volumes because it runs in parallel across the warehouse.
Why do we need it? Each INSERT statement carries per-statement overhead and writes its own tiny micro-partitions: fine for 100 rows, terrible for 100 million rows. COPY INTO splits the staged files across all the warehouse's threads and loads them in parallel batches.
Real-world analogy: INSERT = one person typing data manually. COPY INTO = 100 workers simultaneously moving pallets from a loading dock into the warehouse.
```sql
-- Basic COPY INTO from external stage (Azure ADLS)
COPY INTO raw_bookings
FROM @azure_booking_stage/2026/03/25/        -- Read files from this ADLS path
FILE_FORMAT = (TYPE = 'PARQUET')             -- File type
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE      -- Needed to load Parquet fields into named columns
ON_ERROR = 'CONTINUE';                       -- Skip bad rows, continue loading
-- ON_ERROR options: ABORT_STATEMENT (default), CONTINUE, SKIP_FILE, SKIP_FILE_<n>, SKIP_FILE_<n>%
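-- Variant worth knowing: skip an entire file once too many of its rows fail to parse.
-- (Same illustrative stage path as above; SKIP_FILE_<n>% sets the tolerance threshold.)
COPY INTO raw_bookings
FROM @azure_booking_stage/2026/03/25/
FILE_FORMAT = (TYPE = 'PARQUET')
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
ON_ERROR = 'SKIP_FILE_10%';                  -- Skip any file where more than 10% of rows error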
-- COPY INTO with transformation (load + clean in one step)
COPY INTO silver_bookings (booking_id, booking_date, amount, airline_code)
FROM (
    SELECT
        $1:booking_id::VARCHAR,              -- Cast from VARIANT
        $1:booking_date::DATE,
        $1:fare.amount::NUMBER(10,2),
        UPPER($1:airline_code::VARCHAR)      -- Transform during load
    FROM @azure_booking_stage/bookings.json
)
FILE_FORMAT = (TYPE = 'JSON');
-- COPY INTO with error handling — see what failed
COPY INTO raw_bookings FROM @stage/
ON_ERROR = 'CONTINUE';
-- After loading, check what errors occurred:
SELECT * FROM TABLE(VALIDATE(raw_bookings, JOB_ID => '_last'));
```
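VALIDATE only reports errors from a single COPY run. For per-file load status over a time window, Snowflake also provides the INFORMATION_SCHEMA.COPY_HISTORY table function. A minimal sketch, reusing the raw_bookings table from the examples above:

```sql
-- Per-file load outcome for raw_bookings over the last 24 hours
SELECT file_name, status, row_count, row_parsed, first_error_message
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
    TABLE_NAME => 'RAW_BOOKINGS',
    START_TIME => DATEADD(hour, -24, CURRENT_TIMESTAMP())
));
```

This is handy in interviews when asked how to monitor ingestion: VALIDATE answers "which rows failed in that run", while COPY_HISTORY answers "which files loaded, when, and with how many errors".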