Day 3: Security, Data Sharing, Cost & New Features 2026
SECTION 1: RBAC — ROLE-BASED ACCESS CONTROL (1 hour)
Q1: What is RBAC in Snowflake? What are the system roles?
Simple Explanation: RBAC (Role-Based Access Control) is how Snowflake controls who can access what. Instead of giving permissions directly to users, you give permissions to roles, and then assign roles to users. This makes it easy to manage 1000 users — change the role once, all users inheriting it are updated.
Real-world analogy: In a hotel, instead of giving each employee their own set of physical keys, you give them a "key card type" (role). Reception key cards open lobby, rooms, storage. Manager cards open everything. If you change what a Reception card can open, ALL receptionists are automatically updated.
The System Roles (Snowflake's built-in roles):
| Role | Purpose |
|---|---|
| ORGADMIN | Organization-level operations across accounts |
| ACCOUNTADMIN | Top-level role; encapsulates SECURITYADMIN and SYSADMIN — use sparingly |
| SECURITYADMIN | Manages grants globally; can create and manage any role |
| USERADMIN | Creates and manages users and roles |
| SYSADMIN | Creates and owns warehouses, databases, and other objects |
| PUBLIC | Pseudo-role automatically granted to every user and role |
-- CREATING A CUSTOM ROLE HIERARCHY (best practice)
USE ROLE USERADMIN;
-- Create domain roles
CREATE ROLE bookings_read_role; -- Can only read booking data
CREATE ROLE bookings_write_role; -- Can read + write
CREATE ROLE analyst_role; -- Cross-domain read access
-- Grant privileges to roles (done by SECURITYADMIN)
USE ROLE SECURITYADMIN;
GRANT USAGE ON DATABASE analytics_db TO ROLE bookings_read_role;
GRANT USAGE ON SCHEMA analytics_db.bookings TO ROLE bookings_read_role;
GRANT SELECT ON ALL TABLES IN SCHEMA analytics_db.bookings TO ROLE bookings_read_role;
-- Future tables: automatically grant when new tables are created
GRANT SELECT ON FUTURE TABLES IN SCHEMA analytics_db.bookings TO ROLE bookings_read_role;
-- Hierarchy: write role inherits from read role
GRANT ROLE bookings_read_role TO ROLE bookings_write_role;
GRANT INSERT, UPDATE ON ALL TABLES IN SCHEMA analytics_db.bookings TO ROLE bookings_write_role;
-- Analyst role gets read across schemas
GRANT ROLE bookings_read_role TO ROLE analyst_role;
-- Best practice: assign custom roles UP to SYSADMIN (so SYSADMIN owns them)
GRANT ROLE bookings_read_role TO ROLE SYSADMIN;
GRANT ROLE bookings_write_role TO ROLE SYSADMIN;
-- Assign roles to users
GRANT ROLE analyst_role TO USER krishna_yadav;
GRANT ROLE bookings_write_role TO USER data_pipeline_sp; -- Service Principal
-- User switches roles
USE ROLE analyst_role;
SELECT * FROM analytics_db.bookings.fact_flights; -- Works!
Interview tip: "At Amadeus with 200+ engineers, I'd design 3 tiers of roles: functional roles (bookings_reader, flights_writer), environment roles (prod_access, staging_access), and team roles (data_engineering_role, analytics_role). Team roles inherit functional roles. Engineers are assigned team roles. Separation of duties: SECURITYADMIN manages roles, SYSADMIN manages objects."
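The three-tier design in the tip above can be sketched as follows — a minimal sketch, with role and user names being illustrative:

```sql
-- Tier 1: functional roles (created earlier: bookings_read_role, bookings_write_role)
-- Tier 2: team roles inherit the functional roles they need
CREATE ROLE data_engineering_role;
GRANT ROLE bookings_write_role TO ROLE data_engineering_role;

CREATE ROLE analytics_team_role;
GRANT ROLE bookings_read_role TO ROLE analytics_team_role;

-- Tier 3: environment access, granted only to teams that may touch it
GRANT USAGE ON DATABASE prod_db TO ROLE data_engineering_role;

-- Engineers receive ONE team role, not a pile of object grants
GRANT ROLE data_engineering_role TO USER some_engineer;

-- Roll everything up to SYSADMIN so ownership stays centralized
GRANT ROLE data_engineering_role TO ROLE SYSADMIN;
GRANT ROLE analytics_team_role TO ROLE SYSADMIN;
```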
Q2: What is Data Masking? How does it work?
Simple Explanation: Dynamic Data Masking hides sensitive column values from users who shouldn't see them — WITHOUT changing the actual stored data. The mask is applied at query time based on the user's role.
Real-world analogy: Imagine a table with passenger emails. You put a one-way mirror on the email column. Most people see a reflection (masked value). HR managers see through the mirror (real value). The data itself is unchanged — the mirror changes based on who's looking.
-- STEP 1: Create a masking policy
CREATE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
CASE
WHEN CURRENT_ROLE() IN ('HR_ROLE', 'PRIVACY_OFFICER_ROLE')
THEN val -- Show full email to privileged roles
WHEN CURRENT_ROLE() IN ('ANALYST_ROLE')
THEN REGEXP_REPLACE(val, '.+\@', '***@') -- Show ***@domain.com
ELSE
'**REDACTED**' -- Everyone else: fully redacted
END;
-- This policy is a SQL function — runs at query time for every row
-- STEP 2: Apply masking policy to a column
ALTER TABLE passengers
MODIFY COLUMN email
SET MASKING POLICY email_mask;
-- Now the email column is masked based on the querying role
-- Test: as ANALYST_ROLE
USE ROLE analyst_role;
SELECT email FROM passengers;
-- Returns: ***@gmail.com (masked)
-- Test: as HR_ROLE
USE ROLE hr_role;
SELECT email FROM passengers;
-- Returns: krishna@gmail.com (real value)
-- MULTIPLE PII COLUMNS:
ALTER TABLE passengers MODIFY COLUMN phone SET MASKING POLICY phone_mask;
ALTER TABLE passengers MODIFY COLUMN passport_no SET MASKING POLICY passport_mask;
-- VIEW: See all policies applied to a table (POLICY_REFERENCES is a table function,
-- not a plain view, in INFORMATION_SCHEMA; an account-wide view also exists in
-- SNOWFLAKE.ACCOUNT_USAGE.POLICY_REFERENCES)
SELECT * FROM TABLE(
  INFORMATION_SCHEMA.POLICY_REFERENCES(REF_ENTITY_NAME => 'passengers', REF_ENTITY_DOMAIN => 'TABLE')
)
WHERE POLICY_KIND = 'MASKING_POLICY';
Q3: What are Row Access Policies?
Simple Explanation: Row Access Policies control which ROWS a user can see — different users see different rows from the same table. Column masking = show/hide columns. Row Access = show/hide rows.
Real-world analogy: At Amadeus, 200+ airline partners all share the same bookings table. Lufthansa should ONLY see their own bookings, not Air India's. Row Access Policy = a security gate on every row that checks "is this your booking?" before letting you through.
-- STEP 1: Create a mapping table (user role → airline they can see)
CREATE TABLE airline_access_mapping (
role_name VARCHAR,
airline_code VARCHAR
);
INSERT INTO airline_access_mapping VALUES
('LUFTHANSA_ROLE', 'LH'),
('AIR_INDIA_ROLE', 'AI'),
('EMIRATES_ROLE', 'EK'),
('ADMIN_ROLE', NULL); -- NULL = see ALL airlines
-- STEP 2: Create Row Access Policy
-- Note: the policy argument is named row_airline_code (NOT airline_code) —
-- if it shared a name with the mapping table's column, the comparison below
-- would resolve to m.airline_code = m.airline_code and always be TRUE
CREATE ROW ACCESS POLICY airline_row_policy
AS (row_airline_code VARCHAR) RETURNS BOOLEAN ->
EXISTS (
SELECT 1
FROM airline_access_mapping m
WHERE m.role_name = CURRENT_ROLE()
AND (m.airline_code = row_airline_code -- Match the role's airline to the row's airline
OR m.airline_code IS NULL) -- NULL mapping = admin, sees all rows
);
-- STEP 3: Apply to table
ALTER TABLE bookings
ADD ROW ACCESS POLICY airline_row_policy ON (airline_code);
-- Test: Lufthansa user
USE ROLE LUFTHANSA_ROLE;
SELECT * FROM bookings; -- Returns ONLY LH bookings, even though table has all airlines
-- Test: Admin
USE ROLE ADMIN_ROLE;
SELECT * FROM bookings; -- Returns ALL bookings
SECTION 2: DATA SHARING (1 hour)
Q4: What is Secure Data Sharing? How does it work?
Simple Explanation: Secure Data Sharing lets you share your Snowflake data with another Snowflake account — WITHOUT copying data. The recipient queries your tables directly, always getting live data. They pay for their own compute; you pay for your storage.
Real-world analogy: Instead of emailing a spreadsheet copy to a partner airline (stale data, security risk, multiple copies), you give them a secure link to YOUR Google Sheet. They always see the latest version. You control what they can access. No copy exists.
-- PROVIDER SIDE (Amadeus shares booking summary with Lufthansa)
-- Step 1: Create a share object
CREATE SHARE lufthansa_share
COMMENT = 'Lufthansa booking analytics share';
-- Step 2: Grant database and schema to share
GRANT USAGE ON DATABASE analytics_db TO SHARE lufthansa_share;
GRANT USAGE ON SCHEMA analytics_db.bookings TO SHARE lufthansa_share;
-- Step 3: Grant specific secure views to share (NOT the entire database)
GRANT SELECT ON VIEW analytics_db.bookings.lufthansa_bookings_vw TO SHARE lufthansa_share;
-- Share a SECURE VIEW (not the raw table) so you control exactly what they see
-- Note: only SECURE views/UDFs can be added to a share (CREATE SECURE VIEW ...)
-- The view can filter: WHERE airline_code = 'LH' — Lufthansa sees only their data
-- Step 4: Add recipient (Lufthansa's Snowflake account)
ALTER SHARE lufthansa_share ADD ACCOUNTS = lufthansa_org.lufthansa_account;
-- Accounts are identified as organization_name.account_name (or account locator), not a URL
-- RECIPIENT SIDE (Lufthansa's Snowflake account)
-- Step 1: Create database from the share
CREATE DATABASE amadeus_share FROM SHARE amadeus.lufthansa_share;
-- Step 2: Query it like a normal database!
SELECT * FROM amadeus_share.bookings.lufthansa_bookings_vw;
-- Returns Amadeus's live data directly — no copy!
-- Lufthansa pays their own compute, Amadeus pays storage
Data Sharing types:
| Type | Use Case |
|---|---|
| Direct Share | Both parties have Snowflake accounts |
| Reader Account | Recipient doesn't have Snowflake — the provider creates and administers a managed reader account for them (provider pays its compute) |
| Data Marketplace | Publish datasets publicly for any Snowflake customer to discover |
| Data Clean Room | Share data for joint analysis WITHOUT revealing raw records (GDPR safe) |
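For the Reader Account case in the table above, provisioning looks roughly like this — a sketch with illustrative names and credentials:

```sql
-- Provider creates and administers the managed reader account
CREATE MANAGED ACCOUNT partner_reader
  ADMIN_NAME = 'partner_admin',
  ADMIN_PASSWORD = 'ChangeMe-Str0ng!',  -- illustrative; rotate immediately
  TYPE = READER;

-- Then add the reader account to a share like any other account:
-- ALTER SHARE lufthansa_share ADD ACCOUNTS = <reader_account_identifier>;
-- Remember: the PROVIDER pays for all compute the reader account consumes
```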
SECTION 3: COST MANAGEMENT
Q5: How do you manage and optimize Snowflake costs?
The COST CONTROL FRAMEWORK ("WSCRA"):
-- W: WAREHOUSE OPTIMIZATION
ALTER WAREHOUSE etl_wh SET
AUTO_SUSPEND = 120 -- Suspend after 2 minutes idle (default is 10 min)
AUTO_RESUME = TRUE; -- Auto-start on query (don't forget this!)
-- Create right-sized warehouses per workload
CREATE WAREHOUSE dev_wh SIZE = 'XSMALL' AUTO_SUSPEND = 60; -- Dev: tiny, fast suspend
CREATE WAREHOUSE bi_wh SIZE = 'MEDIUM' AUTO_SUSPEND = 300; -- BI: keep warm longer (cache!)
CREATE WAREHOUSE etl_wh SIZE = 'LARGE' AUTO_SUSPEND = 120; -- ETL: large but fast suspend
-- S: STORAGE OPTIMIZATION
-- Transient tables: NO fail-safe (7-day window removed), cheaper storage
-- Use for: staging tables, temp tables, data you can reload
CREATE TRANSIENT TABLE staging_bookings (...); -- No fail-safe period
CREATE TRANSIENT SCHEMA staging; -- All tables in schema are transient
-- Reduce time travel for non-critical tables (reduce storage cost)
ALTER TABLE staging_bookings SET DATA_RETENTION_TIME_IN_DAYS = 0;
ALTER TABLE audit_log SET DATA_RETENTION_TIME_IN_DAYS = 7; -- Only 7 days
ALTER TABLE fact_bookings SET DATA_RETENTION_TIME_IN_DAYS = 30; -- 30 days for prod
-- C & R: RESOURCE MONITORS (spend limits with alerts)
CREATE RESOURCE MONITOR monthly_cap
WITH CREDIT_QUOTA = 1000 -- 1000 credits per month max
FREQUENCY = MONTHLY
START_TIMESTAMP = IMMEDIATELY
TRIGGERS
ON 75 PERCENT DO NOTIFY -- Email alert at 75% usage
ON 90 PERCENT DO NOTIFY -- Email alert at 90%
ON 100 PERCENT DO SUSPEND; -- Auto-suspend all warehouses at 100%!
ALTER WAREHOUSE bi_wh SET RESOURCE_MONITOR = monthly_cap;
-- Now BI warehouse auto-suspends if team exceeds 1000 credits/month
Cost breakdown reference:
| Cost component | How it's billed |
|---|---|
| Compute (warehouses) | Credits per second while running, 60-second minimum per resume |
| Storage | Flat rate per compressed TB per month (includes Time Travel and Fail-safe copies) |
| Cloud services | Only charged for usage exceeding 10% of daily compute credit consumption |
| Serverless (Snowpipe, Tasks, etc.) | Credits per actual usage, no warehouse required |
SECTION 4: NEW FEATURES 2025-2026
Q6: What are the major Snowflake features from 2025-2026?
Simple Explanation: Snowflake had a massive product year in 2025-2026, moving hard into AI and open formats. Key themes: AI-native (Cortex AI everywhere), open format (Iceberg/Polaris), and OLTP (Hybrid Tables). Mentioning these in interviews shows you're current.
Gen 2 Warehouses (2025)
Next-generation standard warehouses: same sizes and usage model, but newer hardware plus engine optimizations, with the biggest gains on DML-heavy workloads (large MERGE, UPDATE, DELETE). No query changes are needed to benefit.
Cortex AI (GA 2025 → expanded 2026)
-- Use Cortex AI in a SQL query
SELECT
booking_id,
customer_feedback,
SNOWFLAKE.CORTEX.SENTIMENT(customer_feedback) AS sentiment_score,
-- Returns -1 (negative) to +1 (positive)
SNOWFLAKE.CORTEX.SUMMARIZE(customer_feedback) AS feedback_summary,
SNOWFLAKE.CORTEX.CLASSIFY_TEXT(customer_feedback,
['complaint', 'compliment', 'inquiry']) AS feedback_category
FROM customer_feedback_table
WHERE booking_date > '2026-01-01';
-- ALL processing happens inside Snowflake — no API calls, no data leaving
Apache Iceberg Tables + Polaris Open Catalog (GA 2025-2026)
-- Create an Iceberg table (data in YOUR ADLS, Snowflake manages catalog)
CREATE ICEBERG TABLE iceberg_bookings (
booking_id VARCHAR,
booking_date DATE,
airline_code VARCHAR,
amount NUMBER(10,2)
)
CATALOG = 'SNOWFLAKE' -- Snowflake is the catalog
EXTERNAL_VOLUME = 'my_adls_vol' -- Where files stored in ADLS
BASE_LOCATION = 'bookings/'; -- Path within volume
-- Now both Snowflake AND Databricks can read the same Parquet+Iceberg files!
Hybrid Tables / Unistore (GA 2025)
-- Create a Hybrid Table (OLTP+OLAP)
CREATE HYBRID TABLE live_bookings (
booking_id VARCHAR PRIMARY KEY, -- Primary key (enforced, indexed)
passenger_id VARCHAR,
booking_date DATE,
amount NUMBER(10,2),
status VARCHAR,
INDEX idx_passenger (passenger_id) -- Secondary index (fast lookup)
);
-- Fast single-row lookup (OLTP-style)
SELECT * FROM live_bookings WHERE booking_id = 'BK12345678';
-- Uses row store → milliseconds response
-- Analytics (OLAP-style)
SELECT DATE_TRUNC('month', booking_date), SUM(amount)
FROM live_bookings
GROUP BY 1;
-- Uses column store → fast scan
Snowflake Postgres (GA February 2026)
SECTION 5: MOCK INTERVIEW — TOP 10 QUESTIONS
Q7: 10 Most-Likely Snowflake Interview Questions
Mock Q1: "How is Snowflake different from Redshift or Synapse?"
"Three architectural differences. First, true separation of storage and compute — warehouses scale independently with per-second billing, versus Redshift's node-coupled clusters. Second, near-zero administration — no vacuuming, indexing, or distribution keys to tune. Third, built-in multi-cluster concurrency and cross-account Secure Data Sharing. And it runs identically on AWS, Azure, and GCP, while Redshift and Synapse are single-cloud."
Mock Q2: "How would you design a data sharing solution for 200 airline partners?"
"I'd use Snowflake Secure Data Sharing with a dedicated view per airline:
- Create filtered secure views (WHERE airline_code = 'LH') — each airline sees only their data
- Create a Share per airline: CREATE SHARE lh_share; GRANT SELECT ON VIEW lh_view TO SHARE lh_share;
- For airlines without Snowflake: create Reader Accounts (managed accounts the provider runs on their behalf)
- For cross-airline analysis (anonymized): use Data Clean Rooms (GDPR-safe joint analytics)
No data copies — always live. Revoke access instantly by removing an account from the Share."
Mock Q3: "How do you handle GDPR right to be forgotten?"
"Two approaches: hard delete or pseudonymization.
Hard delete: DELETE FROM table WHERE passenger_id = 'P123' on all tables, then wait for Time Travel to expire (or temporarily set DATA_RETENTION_TIME_IN_DAYS = 0).
⚠️ In Snowflake you CANNOT force an immediate purge of Time Travel data the way Databricks VACUUM does — you must wait for the retention period to expire. So set retention to 0-1 days on GDPR-sensitive tables.
Pseudonymization (preferred): Replace real PII with a hash/UUID at the Silver layer. Analytics still work, but the direct identity link is gone. Keep a locked-down mapping table only if re-identification must remain possible; delete the mapping row to make forgetting irreversible."
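A minimal pseudonymization sketch for the Silver layer (schema, table, and column names are illustrative); SHA2 produces a deterministic one-way token, so joins and distinct counts still work:

```sql
-- One-way pseudonymization: same input always hashes to the same token,
-- so joins and COUNT(DISTINCT ...) still work, but the raw ID/email is gone
CREATE OR REPLACE TABLE silver.passengers_pseudo AS
SELECT
    SHA2(passenger_id, 256) AS passenger_token,  -- deterministic surrogate key
    SHA2(email, 256)        AS email_token,
    nationality,                                 -- non-identifying columns pass through
    booking_count
FROM bronze.passengers;

-- If a (salted) mapping table was kept for lawful re-identification,
-- deleting its one row severs the link for that person:
-- DELETE FROM secure.pii_mapping WHERE passenger_token = SHA2('P123', 256);
```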
Mock Q4: "A query takes 10 minutes, how do you debug it?"
"Systematic approach using Snowflake's tools:
- Query Profile: identify the slowest operator — table scan? join? aggregation?
- Check partition pruning: partitions_scanned vs partitions_total — scanning most partitions means poor pruning, consider a clustering key
- Check for spilling to local/remote disk: the warehouse is too small → scale up
- Check bytes scanned from cache: a low cache hit rate means the warehouse suspended recently → consider a longer auto-suspend
- Result cache: if the same query runs often, it should hit the result cache (check why it doesn't — e.g., non-deterministic functions)"
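The pruning and spill checks above can be pulled straight from the ACCOUNT_USAGE.QUERY_HISTORY view (which has up to ~45 minutes of latency); thresholds here are illustrative:

```sql
-- Find recent slow queries with poor pruning or disk spilling
SELECT
    query_id,
    total_elapsed_time / 1000        AS seconds,
    partitions_scanned,
    partitions_total,                -- scanned close to total → pruning isn't helping
    bytes_spilled_to_local_storage,  -- > 0 → warehouse memory too small
    bytes_spilled_to_remote_storage  -- > 0 → badly undersized, scale up
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time > DATEADD('day', -1, CURRENT_TIMESTAMP())
  AND total_elapsed_time > 600000    -- longer than 10 minutes (milliseconds)
ORDER BY total_elapsed_time DESC;
```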
Mock Q5: "What's your experience with Streams and Tasks?"
"I've used Streams + Tasks for incremental ELT pipelines. A stream on Bronze table captures CDC events from our Oracle→Snowpipe pipeline. A Task runs every 5 minutes, reads the stream, and applies a MERGE to Silver table. For SCD Type 2 dimensions like passenger address history, I use Standard Streams (need both INSERT + DELETE for UPDATE operations to track old/new values). However, for simpler transformations, I now prefer Dynamic Tables — same result, one-tenth the code. I still use Streams+Tasks specifically for SCD Type 2 and sub-minute latency requirements."
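The pipeline described above, as a compact sketch (table, stream, and task names are illustrative; the MERGE columns are simplified):

```sql
-- Stream captures row-level changes on the Bronze table
CREATE OR REPLACE STREAM bronze_bookings_stream ON TABLE bronze.bookings;

-- Task wakes every 5 minutes, but runs only if the stream has data
CREATE OR REPLACE TASK merge_to_silver
  WAREHOUSE = etl_wh
  SCHEDULE  = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('bronze_bookings_stream')
AS
  MERGE INTO silver.bookings s
  USING bronze_bookings_stream c
    ON s.booking_id = c.booking_id
  WHEN MATCHED AND c.METADATA$ACTION = 'INSERT' THEN
    UPDATE SET s.status = c.status, s.amount = c.amount
  WHEN NOT MATCHED AND c.METADATA$ACTION = 'INSERT' THEN
    INSERT (booking_id, status, amount) VALUES (c.booking_id, c.status, c.amount);

ALTER TASK merge_to_silver RESUME;  -- tasks are created in a suspended state
```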
Mock Q6: "How do Snowflake Virtual Warehouses scale?"
"Two axes: Scale UP (bigger size: M→L→XL) for slow individual queries — doubles compute each step. Scale OUT (multi-cluster warehouses: MIN=1, MAX=5) for concurrency — handles many users queuing. For BI dashboards with 500 analysts, I'd use multi-cluster with ECONOMY scaling policy (adds clusters at sustained load, removes quickly). For overnight ETL, single XL warehouse is more cost-efficient. Key: don't mix ETL and BI on same warehouse — workload isolation with separate warehouses."
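The BI scenario above, expressed as DDL (warehouse names, sizes, and cluster counts are illustrative):

```sql
-- Multi-cluster warehouse for many concurrent BI users: scales OUT, not up
CREATE WAREHOUSE bi_mc_wh
  WAREHOUSE_SIZE    = 'MEDIUM'
  MIN_CLUSTER_COUNT = 1         -- one cluster when quiet
  MAX_CLUSTER_COUNT = 5         -- add clusters as queries start queuing
  SCALING_POLICY    = 'ECONOMY' -- favor cost: add clusters only under sustained load
  AUTO_SUSPEND      = 300
  AUTO_RESUME       = TRUE;

-- Overnight ETL: one large cluster, no concurrency requirement
CREATE WAREHOUSE etl_xl_wh
  WAREHOUSE_SIZE = 'XLARGE'
  AUTO_SUSPEND   = 120;
```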
Mock Q7: "What are Snowflake's new features you're excited about?"
"Three stand out for me in 2026:
- Gen 2 Warehouses — 2.1x performance at the same pricing. Especially for MERGE operations on large tables, this directly cuts my ETL runtime without a cost increase.
- Dynamic Tables — replace complex Streams+Tasks patterns with declarative SQL. Reduced our Silver layer pipeline from 150 lines of procedural code to 3 CREATE DYNAMIC TABLE statements.
- Iceberg + Polaris Open Catalog with ADLS Gen2 support — huge for Azure-first companies. Our data lives in ADLS; now both Snowflake AND Databricks Spark can read the same Iceberg files without ETL between platforms."
Mock Q8: "How would you implement column-level security for PII?"
-- Create masking policy
CREATE MASKING POLICY pii_email_mask AS (val STRING) RETURNS STRING ->
CASE
WHEN CURRENT_ROLE() IN ('PRIVACY_OFFICER', 'DATA_STEWARD') THEN val
WHEN CURRENT_ROLE() IN ('ANALYST_ROLE') THEN REGEXP_REPLACE(val, '.+@', '***@')
ELSE '**REDACTED**'
END;
-- Apply to column
ALTER TABLE passengers MODIFY COLUMN email SET MASKING POLICY pii_email_mask;
-- Combine with Row Access Policy (airlines see only their rows + masked PII)
Mock Q9: "COPY INTO vs INSERT — when to use which?"
"COPY INTO for bulk loading files (100K+ rows from a stage) — parallelized, fast, and idempotent (already-loaded files are skipped by default). INSERT for small transactional inserts (1-1000 rows), or for Hybrid Tables (OLTP operations). Never use row-by-row INSERT to load a CSV file — it's orders of magnitude slower than COPY INTO for large files."
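A minimal sketch of both sides of the answer (stage, table, and file-format details are illustrative):

```sql
-- Bulk load: parallel and idempotent (already-loaded files are skipped by default)
COPY INTO raw.bookings
FROM @my_stage/bookings/                 -- internal or external stage
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '"')
ON_ERROR = 'ABORT_STATEMENT';            -- fail fast; or 'CONTINUE' to skip bad rows

-- Small transactional write (or an OLTP insert into a Hybrid Table): plain INSERT
INSERT INTO live_bookings (booking_id, status) VALUES ('BK12345678', 'CONFIRMED');
```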
Mock Q10: "How do you reduce Snowflake costs for a team of 200 engineers?"
"Three immediate wins: First, auto-suspend all warehouses aggressively (60-120 seconds for dev, 5 minutes for BI). Second, create separate warehouses per workload and right-size each (dev=XS, BI=M, ETL=L) — don't let everyone share one large warehouse. Third, use transient tables and zero-day time travel for staging data — 20-30% storage savings. Longer term: resource monitors with hard credit limits per team, tagging all warehouses with team+project for chargeback, and reviewing Query History weekly for poorly-written queries that consume excessive credits."
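The tagging-for-chargeback idea above can be sketched like this (tag names, allowed values, and warehouse names are illustrative):

```sql
-- Tag warehouses so credit usage can be attributed per team
CREATE TAG IF NOT EXISTS cost_center ALLOWED_VALUES 'data_eng', 'analytics', 'ml';
ALTER WAREHOUSE etl_wh SET TAG cost_center = 'data_eng';
ALTER WAREHOUSE bi_wh  SET TAG cost_center = 'analytics';

-- Monthly chargeback: join warehouse metering history to tag references
SELECT t.tag_value AS team, SUM(m.credits_used) AS credits
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY m
JOIN SNOWFLAKE.ACCOUNT_USAGE.TAG_REFERENCES t
  ON t.object_name = m.warehouse_name
 AND t.domain = 'WAREHOUSE'
 AND t.tag_name = 'COST_CENTER'
WHERE m.start_time >= DATE_TRUNC('month', CURRENT_DATE())
GROUP BY 1
ORDER BY credits DESC;
```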