Day 3: Security, Sharing & New Features — Quick Recall Guide
🧠 MASTER MEMORY MAP — Day 3
SECTION 1: RBAC
🧠 Memory Map: Role Hierarchy
ACCOUNTADMIN              ← top of pyramid, reserved for 2-3 people
│
├── SECURITYADMIN         ← most important admin roles (with SYSADMIN)
│   └── USERADMIN
└── SYSADMIN              ← custom role hierarchy hangs off SYSADMIN:
    └── team_role (e.g., data_engineering)
        └── functional_role (e.g., bookings_reader)
            └── privileges on objects
RULE: Custom roles float UP to SYSADMIN
So SYSADMIN (and ACCOUNTADMIN above it) always has oversight
WHO DOES WHAT:
ACCOUNTADMIN → billing, account settings, emergency only
SECURITYADMIN → grant/revoke privileges on objects
USERADMIN → create users, create roles (no data privileges)
SYSADMIN → create databases, schemas, tables, warehouses
PUBLIC → minimum, given to all users automatically
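The custom-role chain above can be sketched in SQL. A minimal sketch, assuming an illustrative database `travel` and the example role names from the map:

```sql
-- USERADMIN creates roles; SECURITYADMIN manages grants
USE ROLE USERADMIN;
CREATE ROLE IF NOT EXISTS bookings_reader;   -- functional role
CREATE ROLE IF NOT EXISTS data_engineering;  -- team role

USE ROLE SECURITYADMIN;
-- Privileges attach to the functional role
GRANT USAGE ON DATABASE travel TO ROLE bookings_reader;
GRANT USAGE ON SCHEMA travel.bookings TO ROLE bookings_reader;
GRANT SELECT ON ALL TABLES IN SCHEMA travel.bookings TO ROLE bookings_reader;

-- Roll the hierarchy up: functional role → team role → SYSADMIN
GRANT ROLE bookings_reader TO ROLE data_engineering;
GRANT ROLE data_engineering TO ROLE SYSADMIN;
```

The last two grants are what "custom roles float UP to SYSADMIN" means in practice: SYSADMIN inherits everything the custom roles can do.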
⚡ MUST KNOW DIRECT QUESTIONS
Role-Based Access Control — grant permissions to ROLES (not individual users), then assign roles to users. One role change updates all users who have it.
ACCOUNTADMIN, SECURITYADMIN, USERADMIN, SYSADMIN, PUBLIC. ("ASUSP")
- ACCOUNTADMIN: see billing, all objects, account settings (use sparingly!)
- SECURITYADMIN: manage grants (GRANT/REVOKE on all objects)
- USERADMIN: create users and roles only (no data access)
- SYSADMIN: create databases, schemas, warehouses, tables
- PUBLIC: minimum role, auto-assigned to everyone
Common mistake: using ACCOUNTADMIN as the default role for data engineers. Best practice: grant ACCOUNTADMIN to 2-3 senior admins and set their DEFAULT_ROLE to SYSADMIN. ACCOUNTADMIN should only be used when explicitly needed.
GRANT SELECT ON FUTURE TABLES IN SCHEMA bookings TO ROLE analyst_role; — Automatically grants SELECT on any NEW tables created in that schema. Without this, every new table needs a manual grant.
USE ROLE analyst_role; — Switch to a different role in your session. Your access changes immediately.
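Putting the two statements together with a user assignment (the user name `jane_doe` is a placeholder):

```sql
-- Cover existing AND future tables in the schema
GRANT SELECT ON ALL TABLES    IN SCHEMA bookings TO ROLE analyst_role;
GRANT SELECT ON FUTURE TABLES IN SCHEMA bookings TO ROLE analyst_role;

-- Assign the role to a user and make it their default
GRANT ROLE analyst_role TO USER jane_doe;
ALTER USER jane_doe SET DEFAULT_ROLE = analyst_role;

-- In a session, switch roles; access changes immediately
USE ROLE analyst_role;
SELECT CURRENT_ROLE();
```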
SECTION 2: DATA MASKING & ROW ACCESS
⚡ MUST KNOW DIRECT QUESTIONS
Masking Policy: a column-level security feature that shows different values to different roles. HR sees the real email, analysts see ***@domain.com. Data is unchanged in storage — masking applies at query time.
Row Access Policy: controls which rows different roles can see. Lufthansa sees only their bookings, Air India sees only theirs, even from the same table. Applied via: ALTER TABLE t ADD ROW ACCESS POLICY p ON (airline_code).
- Column masking: controls WHAT VALUE you see in a specific column (hide PII)
- Row Access Policy: controls WHICH ROWS you see (multi-tenant data isolation)
CURRENT_ROLE() returns the active role of the user running the query. Used inside masking and row access policies to make decisions dynamically at query time.
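Both policy types can be combined on the same table. A sketch assuming illustrative object and role names (`passengers`, `bookings`, `HR_ROLE`, and per-airline roles such as `LH_ROLE`):

```sql
-- Column masking: HR sees the real email, everyone else sees ***@domain
CREATE MASKING POLICY mask_email AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() = 'HR_ROLE' THEN val
    ELSE '***@' || SPLIT_PART(val, '@', 2)
  END;

ALTER TABLE passengers MODIFY COLUMN email SET MASKING POLICY mask_email;

-- Row access: each airline role sees only its own rows
CREATE ROW ACCESS POLICY airline_rows AS (airline_code STRING)
RETURNS BOOLEAN ->
  CURRENT_ROLE() = airline_code || '_ROLE'   -- e.g. 'LH' maps to 'LH_ROLE'
  OR CURRENT_ROLE() = 'SYSADMIN';

ALTER TABLE bookings ADD ROW ACCESS POLICY airline_rows ON (airline_code);
```

Note how both policies call CURRENT_ROLE() in their body — that is the query-time decision point.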
SECTION 3: DATA SHARING
⚡ MUST KNOW DIRECT QUESTIONS
Secure Data Sharing: share live Snowflake data with another Snowflake account WITHOUT copying. The recipient queries your data directly using their own warehouse. No ETL, no copies, always fresh.
Reader Account: a Snowflake-managed account created for a data recipient who doesn't have their own Snowflake account. Snowflake creates and manages it; the recipient connects and queries the shared data.
Data Clean Room: a secure environment for joint data analysis without either party seeing the other's raw data. Example: Amadeus + an airline analyze booking patterns without Amadeus seeing the airline's customer list, or vice versa. GDPR-safe.
Provider pays for storage. Recipient pays for their own compute (virtual warehouse). Provider's compute is NOT used for recipient queries.
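The provider/recipient flow, sketched with placeholder account and object names:

```sql
-- PROVIDER side: create a share, grant read access, add the consumer
CREATE SHARE bookings_share;
GRANT USAGE ON DATABASE travel TO SHARE bookings_share;
GRANT USAGE ON SCHEMA travel.bookings TO SHARE bookings_share;
GRANT SELECT ON TABLE travel.bookings.daily_summary TO SHARE bookings_share;
ALTER SHARE bookings_share ADD ACCOUNTS = partner_org.partner_acct;

-- RECIPIENT side: mount the share as a read-only database,
-- then query it using the recipient's own warehouse
CREATE DATABASE bookings_live
  FROM SHARE provider_org.provider_acct.bookings_share;
SELECT * FROM bookings_live.bookings.daily_summary;
```

No data moves at any step; the recipient's database is a live view onto the provider's storage.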
SECTION 4: COST MANAGEMENT
⚡ MUST KNOW DIRECT QUESTIONS
Resource Monitor: a Snowflake object that sets credit limits on warehouses. Can trigger actions, e.g. notify at 75%, suspend at 100%. Prevents surprise bills. Apply per warehouse or account-wide.
Transient Table: a table with NO fail-safe period (saves 7 days × storage cost). Use for staging/temp data that can be reloaded. CREATE TRANSIENT TABLE staging_data (...).
Common cost mistake: not setting auto-suspend on warehouses. An idle warehouse running 24/7 consumes credits even with no queries. Set auto-suspend to 60-120 seconds for dev, 5-10 minutes for BI.
- Transient tables for staging (no fail-safe)
- Reduce time travel on non-critical tables (set to 0-1 day)
- Zero-copy clone for dev environments (no data duplication)
- Monitor storage with SNOWFLAKE.ACCOUNT_USAGE.TABLE_STORAGE_METRICS
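The cost levers above in SQL form (warehouse and monitor names are illustrative):

```sql
-- Resource monitor: notify at 75%, hard-suspend at 100% of 100 credits/month
CREATE RESOURCE MONITOR dev_monitor
  WITH CREDIT_QUOTA = 100
       FREQUENCY = MONTHLY
       START_TIMESTAMP = IMMEDIATELY
  TRIGGERS ON 75  PERCENT DO NOTIFY
           ON 100 PERCENT DO SUSPEND;

ALTER WAREHOUSE dev_wh SET RESOURCE_MONITOR = dev_monitor;

-- Aggressive auto-suspend for a dev warehouse (value is in seconds)
ALTER WAREHOUSE dev_wh SET AUTO_SUSPEND = 60;

-- Transient staging table with time travel reduced to zero
CREATE TRANSIENT TABLE staging_data (id NUMBER, payload VARIANT)
  DATA_RETENTION_TIME_IN_DAYS = 0;
```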
SECTION 5: NEW 2025-2026 FEATURES
⚡ MUST KNOW DIRECT QUESTIONS
Gen 2 Warehouses: next-generation virtual warehouses with 2.1x better performance for analytics and 1.9x faster Iceberg table scans. Same price as Gen 1. All new warehouses default to Gen 2.
Cortex AI: Snowflake's managed AI service. Run LLM functions directly in SQL: SNOWFLAKE.CORTEX.SENTIMENT(text), CORTEX.SUMMARIZE(text), CORTEX.COMPLETE('llama3', prompt). No data leaves Snowflake.
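For example (the table `guest_reviews` and the model name are placeholders; available models vary by region):

```sql
SELECT
  SNOWFLAKE.CORTEX.SENTIMENT(review_text)  AS sentiment_score,  -- float in [-1, 1]
  SNOWFLAKE.CORTEX.SUMMARIZE(review_text)  AS summary,
  SNOWFLAKE.CORTEX.COMPLETE(
    'llama3.1-8b',
    'Classify this airline review as positive or negative: ' || review_text
  ) AS classification
FROM guest_reviews
LIMIT 10;
```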
Iceberg Tables: tables where data lives in YOUR cloud storage (ADLS/S3) in the open Apache Iceberg format. Snowflake manages the catalog. Both Snowflake AND external engines (Databricks, Spark, Trino) read the same files.
Polaris Catalog: Snowflake's managed implementation of Apache Polaris — an open-source Iceberg REST catalog. Any Iceberg-compatible engine can connect to it. One catalog, many engines, no data copies.
Hybrid Tables: Snowflake tables that combine a row store (for fast OLTP point lookups) with a column store (for OLAP analytics). Support primary keys, indexes, and foreign keys. Use case: operational + analytics in one table.
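A hybrid-table sketch (the table name and columns are illustrative):

```sql
-- Hybrid table: enforced PRIMARY KEY plus a secondary index
CREATE HYBRID TABLE booking_status (
  booking_id NUMBER PRIMARY KEY,
  status     VARCHAR(20),
  updated_at TIMESTAMP_NTZ,
  INDEX idx_status (status)
);

-- OLTP-style point update...
UPDATE booking_status SET status = 'CHECKED_IN' WHERE booking_id = 42;

-- ...and OLAP-style aggregation on the same table
SELECT status, COUNT(*) AS n FROM booking_status GROUP BY status;
```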
Snowflake Postgres: managed PostgreSQL databases running inside Snowflake (GA February 2026). Each instance is a dedicated Postgres server VM managed by Snowflake. Application DB + Snowflake analytics = one unified platform.
🧠 FINAL REVISION — Day 3 Summary Card
┌──────────────────────────────────────────────────────────────────
│ DAY 3: SECURITY, SHARING & NEW FEATURES
├──────────────────────────────────────────────────────────────────
│ RBAC = roles, not users. 5 system roles: "ASUSP"
│ ACCOUNTADMIN = most powerful, use sparingly (⚠️ not daily use)
│ SECURITYADMIN = manage grants | SYSADMIN = create objects
│ Custom roles should inherit UP to SYSADMIN
│
│ MASKING POLICY = different roles see different values
│ ROW ACCESS POLICY = different roles see different rows
│ Combine both = full data isolation (GDPR + multi-tenant)
│
│ DATA SHARING = PSR (Provider, Share, Recipient)
│ No data copy — recipient reads live from provider storage
│ Provider pays storage, Recipient pays compute
│ Reader Account = for recipients without Snowflake
│ Data Clean Room = joint analytics without raw PII exposure
│
│ COST = "WSCRA": Warehouses, Storage, Credits, Resource, Arch
│ ⚠️ Biggest mistake: forgetting auto-suspend!
│ Transient tables = no fail-safe (cheaper storage)
│ Resource Monitors = hard credit limits per warehouse
│
│ NEW 2025-2026 = "GCIHP":
│ Gen2 (2.1x faster) | Cortex AI (LLM in SQL)
│ Iceberg+Polaris (open format, cross-engine)
│ Hybrid Tables (OLTP+OLAP) | Postgres (GA Feb 2026)
│ Iceberg on ADLS Gen2 (Preview March 2026 — great for Azure!)
│
│ TOP 5 THINGS TO SAY IN INTERVIEW:
│ 1. "RBAC: custom roles inherit up to SYSADMIN"
│ 2. "Masking = hide values; Row Access = hide rows (combine!)"
│ 3. "Secure Sharing: live data, no copy, provider pays storage"
│ 4. "Resource monitors with hard limits prevent surprise bills"
│ 5. "Gen2, Dynamic Tables, Iceberg — shows I'm current in 2026"
└──────────────────────────────────────────────────────────────────
🧠 SNOWFLAKE ULTRA CHEAT SHEET (Read 10 min before interview)
┌──────────────────────────────────────────────────────────────────
│ ALL 3 DAYS — SNOWFLAKE LAST-MINUTE RECALL
├──────────────────────────────────────────────────────────────────
│ ARCHITECTURE = 3 layers "BMS": Brain(Cloud Svcs) + Muscles(WH)
│   + Storage. Key: compute & storage SEPARATE → scale each alone
│ Micro-partitions: AUTO, 50-500 MB, min/max metadata → pruning
│ Virtual WH: scale UP(slow query) vs OUT(many users)
│ Cache "RLS": Result(24h,free) → Local(SSD,cleared on suspend)
│ Time Travel: 0-90 days | Fail-safe: 7 days (Snowflake only)
│ Clone: instant copy, no data duplicated, shared micro-parts
│ VARIANT: any JSON, access with col:key::VARCHAR, FLATTEN arrays
│
│ PIPELINES: "CSP" = COPY INTO → Snowpipe → Snowpipe Streaming
│ Streams: track INSERT/UPDATE/DELETE (UPDATE=2 rows!)
│ Tasks: cron jobs, chain with AFTER, RESUME only root task
│ Dynamic Tables: declarative ELT (replaces most Streams+Tasks)
│   TARGET_LAG min = 1 minute, INCREMENTAL by default
│ Snowpark: Python DataFrames run INSIDE Snowflake
│ Performance: "CQMWS" = Cluster, Query profile, MV, WH, Search
│
│ SECURITY: "RARM" = RBAC, ACCOUNTADMIN hierarchy, Row, Mask
│ 5 roles "ASUSP": ACCOUNTADMIN, SEC, USERADMIN, SYS, PUBLIC
│ Masking = hide column values | Row Access = hide rows
│ Sharing: PSR, no copy, provider=storage, recipient=compute
│ Cost: auto-suspend!, transient tables, resource monitors
│
│ NEW 2026 "GCIHP": Gen2, Cortex AI, Iceberg/Polaris,
│   Hybrid Tables (OLTP+OLAP), Postgres (GA Feb 2026)
│ Iceberg on ADLS Gen2 = open format for Azure-first teams
│
│ SNOWFLAKE vs DATABRICKS:
│ Snowflake = SQL warehouse, BI, sharing, minimal code
│ Databricks = ETL/ELT, Python, ML, streaming, open formats
│ BOTH TOGETHER: Databricks (ETL) → Snowflake (analytics/BI)
│
│ FRAME WITH TRAVEL DOMAIN:
│ "At Amadeus scale with 10B bookings/year, 200+ airlines..."
│ "For GDPR with passenger PII across 100+ countries..."
│ "For real-time fare pricing with sub-second latency..."
└──────────────────────────────────────────────────────────────────