❄️
Snowflake
Day 3: Security, Sharing & New Features — Quick Recall Guide
❄️
❄️
Snowflake · Section 6 of 7

Day 3: Security, Sharing & New Features — Quick Recall Guide

Day 3: Security, Sharing & New Features — Quick Recall Guide

Must remember🔑Key concept⚠️Common trap🧠Memory Map📝One-liner

🧠 MASTER MEMORY MAP — Day 3

🧠 SECURITY = "RARM" (Role, Access, Row, Mask)
SECURITY"RARM" (Role, Access, Row, Mask)
RRBAC (Role-Based Access Control — roles, not users)
AACCOUNTADMIN hierarchy (5 system roles)
RRow Access Policies (filter rows per role)
MMasking Policies (hide column values per role)
5 SYSTEM ROLES = "ASUSP" (think: A Snowflake User Should Pick roles)
AACCOUNTADMIN (most powerful, use RARELY)
SSECURITYADMIN (manage roles + grants)
UUSERADMIN (create users + roles only)
SSYSADMIN (create DB/schema/table/warehouse)
PPUBLIC (everyone, minimum access)
COST CONTROL"WSCRA"
WWarehouses (right-size + auto-suspend)
SStorage (transient tables, reduce time travel)
CCredit monitoring (resource monitors)
RResource Monitors (hard spend limits)
AArchitecture (serverless where possible)
NEW 2025-2026 = "GCIHP"
GGen 2 Warehouses (2.1x faster, same price)
CCortex AI (LLMs in SQL, Cortex Search GA March 2026)
IIceberg + Polaris (open format, cross-engine)
HHybrid Tables (OLTP + OLAP = Unistore)
PPostgres (managed Postgres inside Snowflake, GA Feb 2026)

SECTION 1: RBAC

🧠 Memory Map: Role Hierarchy

📐 Architecture Diagram
          ACCOUNTADMIN  ← Top of pyramid, reserved for 2-3 people
               │
        ┌──────┴──────┐
   SECURITYADMIN   SYSADMIN   ← Most important admin roles
        │               │
   USERADMIN        Custom Role Hierarchy:
                         SYSADMIN
                           └── team_role (e.g., data_engineering)
                                 └── functional_role (e.g., bookings_reader)
                                       └── privileges on objects

RULE: Custom roles float UP to SYSADMIN
      So SYSADMIN (and ACCOUNTADMIN above it) always has oversight

WHO DOES WHAT:
    ACCOUNTADMIN → billing, account settings, emergency only
    SECURITYADMIN → grant/revoke privileges on objects
    USERADMIN → create users, create roles (no data privileges)
    SYSADMIN → create databases, schemas, tables, warehouses
    PUBLIC → minimum, given to all users automatically

⚡ MUST KNOW DIRECT QUESTIONS

Q1What is RBAC in Snowflake?

Role-Based Access Control — grant permissions to ROLES (not individual users), then assign roles to users. One role change updates all users who have it.

Q2What are the 5 system-defined roles?

ACCOUNTADMIN, SECURITYADMIN, USERADMIN, SYSADMIN, PUBLIC. ("ASUSP")

Q3What does each system role do?

  • ACCOUNTADMIN: see billing, all objects, account settings (use sparingly!)
  • SECURITYADMIN: manage grants (GRANT/REVOKE on all objects)
  • USERADMIN: create users and roles only (no data access)
  • SYSADMIN: create databases, schemas, warehouses, tables
  • PUBLIC: minimum role, auto-assigned to everyone

⚠️ Q4What is the #1 RBAC mistake?

Using ACCOUNTADMIN as the default role for data engineers. Best practice: grant ACCOUNTADMIN to 2-3 senior admins, set their DEFAULT ROLE to SYSADMIN. ACCOUNTADMIN should only be used when explicitly needed.

Q5What is FUTURE GRANTS?

GRANT SELECT ON FUTURE TABLES IN SCHEMA bookings TO ROLE analyst_role; — Automatically grants SELECT on any NEW tables created in that schema. Without this, every new table needs a manual grant.

Q6How do you switch roles?

USE ROLE analyst_role; — Switch to a different role in your session. Your access changes immediately.

🔑 MID-LEVEL QUESTIONS

Q7Design a role hierarchy for 200 engineers at Amadeus
🗺️Memory Map
```
ACCOUNTADMIN (2-3 people: CTO, Lead Architect)
SYSADMIN
data_engineering_role
bookings_writer_role (read+write bookings schema)
flights_writer_role (read+write flights schema)
analytics_role
bookings_reader_role (read-only bookings)
flights_reader_role (read-only flights)
airline_partner_role
lufthansa_role (row-filtered to LH data only)
air_india_role (row-filtered to AI data only)
Engineers → assigned data_engineering_role
Analysts → assigned analytics_role
Airline partners → assigned their airline-specific role
```

SECTION 2: DATA MASKING & ROW ACCESS

🧠 Memory Map: Security Policies

🧠 MASKING POLICY = "Who sees WHAT value in a column"
MASKING POLICY"Who sees WHAT value in a column"
Different rolessee different values from SAME column
Example: email column →
HR_ROLE sees: krishna@gmail.com (real)
ANALYST_ROLE sees: ***@gmail.com (partial mask)
PUBLIC sees: **REDACTED** (fully hidden)
CREATE MASKING POLICY p AS (val STRING) RETURNS STRING ->
CASE WHEN CURRENT_ROLE() IN ('HR') THEN val ELSE '***' END;
ALTER TABLE t MODIFY COLUMN email SET MASKING POLICY p;
ROW ACCESS POLICY"Who sees WHICH ROWS in a table"
Different rolessee different rows from SAME table
Example: bookings table →
LH_ROLE sees: only rows WHERE airline_code = 'LH'
AI_ROLE sees: only rows WHERE airline_code = 'AI'
ADMIN sees: ALL rows
CREATE ROW ACCESS POLICY p AS (airline VARCHAR) RETURNS BOOLEAN ->
EXISTS (SELECT 1 FROM mapping WHERE role = CURRENT_ROLE() AND airline = airline);
ALTER TABLE bookings ADD ROW ACCESS POLICY p ON (airline_code);
COMBINED: Apply BOTHairline partners see only their rows + masked PII

⚡ MUST KNOW DIRECT QUESTIONS

Q8What is Dynamic Data Masking?

A column-level security feature that shows different values to different roles. HR sees real email, analysts see ***@domain.com. Data is unchanged in storage — masking applies at query time.

Q9What is a Row Access Policy?

Controls which rows different roles can see. Lufthansa sees only their bookings, Air India sees only theirs, even from the same table. Applied via: ALTER TABLE t ADD ROW ACCESS POLICY p ON (airline_code).

⚠️ Q10Column masking vs Row Access — what's the difference?

  • Column masking: controls WHAT VALUE you see in a specific column (hide PII)
  • Row Access Policy: controls WHICH ROWS you see (multi-tenant data isolation)

Q11How does CURRENT_ROLE() work in policies?

CURRENT_ROLE() returns the active role of the user running the query. Used inside masking and row access policies to make decisions dynamically at query time.

SECTION 3: DATA SHARING

🧠 Memory Map: Secure Data Sharing

🧠 DATA SHARING = "PSR model" (Provider, Share, Recipient)
DATA SHARING"PSR model" (Provider, Share, Recipient)
PProvider: owns the data (Amadeus)
SShare object: defines what to share (tables/views)
RRecipient: Snowflake account that reads the data
No data is COPIED — recipient reads live from provider's storage!
STEPS
1. CREATE SHARE my_share;
2. GRANT USAGE ON DATABASE/SCHEMA/VIEW TO SHARE my_share;
3. ALTER SHARE my_share ADD ACCOUNTS = 'recipient.snowflake.com';
4. Recipient: CREATE DATABASE from SHARE;
TYPES
Direct Shareboth have Snowflake accounts
Reader Accountrecipient has NO Snowflake, Snowflake creates one
Data Marketplacepublic listing any Snowflake user can discover
Data Clean Roomjoint analytics without sharing raw PII (GDPR-safe)
KEY FACT: Provider pays STORAGE. Recipient pays COMPUTE.

⚡ MUST KNOW DIRECT QUESTIONS

Q12What is Secure Data Sharing?

Share live Snowflake data with another Snowflake account WITHOUT copying. Recipient queries your data directly using their own warehouse. No ETL, no copies, always fresh.

Q13What is a Reader Account?

A Snowflake-managed account created for a data recipient who doesn't have their own Snowflake. Snowflake creates and manages it; the recipient connects and queries shared data.

Q14What is a Data Clean Room?

A secure environment for joint data analysis without either party seeing the other's raw data. Example: Amadeus + airline analyze booking patterns without Amadeus seeing airline's customer list or vice versa. GDPR-safe.

⚠️ Q15Who pays for what in Data Sharing?

Provider pays for storage. Recipient pays for their own compute (virtual warehouse). Provider's compute is NOT used for recipient queries.

SECTION 4: COST MANAGEMENT

⚡ MUST KNOW DIRECT QUESTIONS

Q16What is a Resource Monitor?

A Snowflake object that sets credit limits on warehouses. Can trigger: notify at 75%, suspend at 100%. Prevents surprise bills. Apply per warehouse or account-wide.

Q17What is a Transient Table?

A table with NO fail-safe period (saves 7 days × storage cost). Use for staging/temp data that can be reloaded. CREATE TRANSIENT TABLE staging_data (...).

⚠️ Q18What is the biggest Snowflake cost mistake?

Not setting auto-suspend on warehouses. Warehouses running idle 24/7 consume credits even with no queries. Set auto-suspend to 60-120 seconds for dev, 5-10 minutes for BI.

Q19How do you optimize storage cost?

  1. Transient tables for staging (no fail-safe)
  2. Reduce time travel on non-critical tables (set to 0-1 day)
  3. Zero-copy clone for dev environments (no data duplication)
  4. Monitor: SNOWFLAKE.ACCOUNT_USAGE.TABLE_STORAGE_METRICS

SECTION 5: NEW 2025-2026 FEATURES

🧠 Memory Map: New Features

🧠 NEW FEATURES = "GCIHP"
NEW FEATURES"GCIHP"
GGEN 2 WAREHOUSES (2025):
2.1x faster analytics, 1.9x faster Iceberg scans
Same price as Gen 1pure performance gain
New warehouses default to Gen 2 automatically
CCORTEX AI (GA 2025, expanded 2026):
SNOWFLAKE.CORTEX.COMPLETE/SUMMARIZE/SENTIMENT/TRANSLATE/CLASSIFY
Cortex Search GA March 12, 2026 (multi-column, custom embeddings)
Cortex Analyst: natural languageSQL → results
IICEBERG + POLARIS (2025 GA):
Iceberg Tables: data in YOUR cloud storage, Snowflake queries it
Polaris Open Catalog: ANY Iceberg engine can connect
March 2026: Iceberg on Azure ADLS Gen2 (Preview)
Result: Snowflake + Databricks read same files, no ETL!
HHYBRID TABLES (GA 2025):
Row store (OLTP speed) + Column store (OLAP speed) in ONE table
Supports: primary keys, secondary indexes, foreign keys
Use: operational apps + analytics on same data without ETL
PPOSTGRES (GA February 2026):
Managed PostgreSQL inside Snowflake account
Application database (Postgres) + Analytics (Snowflake) = one platform
Snowflake manages the VM, OS, backups

⚡ MUST KNOW DIRECT QUESTIONS

Q20What are Gen 2 Warehouses?

Next-generation virtual warehouses with 2.1x better performance for analytics, 1.9x faster for Iceberg table scans. Same price as Gen 1. All new warehouses default to Gen 2.

Q21What is Cortex AI?

Snowflake's managed AI service. Run LLM functions directly in SQL: SNOWFLAKE.CORTEX.SENTIMENT(text), CORTEX.SUMMARIZE(text), CORTEX.COMPLETE('llama3', prompt). No data leaves Snowflake.

Q22What is Cortex Search? What's new in 2026?
Pro Tip
Hybrid search service (keyword + semantic vector search) on Snowflake text columns. GA March 12, 2026: now supports multiple searchable columns per service + pre-computed custom embeddings.
Q23What are Iceberg Tables?

Tables where data lives in YOUR cloud storage (ADLS/S3) in open Apache Iceberg format. Snowflake manages the catalog. Both Snowflake AND external engines (Databricks, Spark, Trino) read the same files.

Q24What is Polaris Open Catalog?

Snowflake's managed implementation of Apache Polaris — an open-source Iceberg REST catalog. Any Iceberg-compatible engine connects to it. One catalog, many engines, no data copies.

Q25What are Hybrid Tables?

Snowflake tables that combine a row store (for fast OLTP point lookups) with a column store (for OLAP analytics). Supports primary keys, indexes, foreign keys. Use case: operational + analytics in one table.

Q26What is Snowflake Postgres?

Managed PostgreSQL databases running inside Snowflake (GA February 2026). Each instance is a dedicated Postgres server VM managed by Snowflake. Application DB + Snowflake analytics = one unified platform.

🧠 FINAL REVISION — Day 3 Summary Card

📐 Architecture Diagram
┌──────────────────────────────────────────────────────────────────┐
│               DAY 3: SECURITY, SHARING & NEW FEATURES             │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  RBAC = roles, not users. 5 system roles: "ASUSP"               │
│  ACCOUNTADMIN = most powerful, use sparingly (⚠️ not daily use)  │
│  SECURITYADMIN = manage grants | SYSADMIN = create objects       │
│  Custom roles should inherit UP to SYSADMIN                     │
│                                                                  │
│  MASKING POLICY = different roles see different values           │
│  ROW ACCESS POLICY = different roles see different rows          │
│  Combine both = full data isolation (GDPR + multi-tenant)       │
│                                                                  │
│  DATA SHARING = PSR (Provider, Share, Recipient)                 │
│  No data copy — recipient reads live from provider storage       │
│  Provider pays storage, Recipient pays compute                  │
│  Reader Account = for recipients without Snowflake              │
│  Data Clean Room = joint analytics without raw PII exposure     │
│                                                                  │
│  COST = "WSCRA": Warehouses, Storage, Credits, Resource, Arch   │
│  ⚠️ Biggest mistake: forgetting auto-suspend!                    │
│  Transient tables = no fail-safe (cheaper storage)              │
│  Resource Monitors = hard credit limits per warehouse           │
│                                                                  │
│  NEW 2025-2026 = "GCIHP":                                        │
│  Gen2 (2.1x faster) | Cortex AI (LLM in SQL)                   │
│  Iceberg+Polaris (open format, cross-engine)                    │
│  Hybrid Tables (OLTP+OLAP) | Postgres (GA Feb 2026)            │
│  Iceberg on ADLS Gen2 (Preview March 2026 — great for Azure!)  │
│                                                                  │
│  TOP 5 THINGS TO SAY IN INTERVIEW:                               │
│  1. "RBAC: custom roles inherit up to SYSADMIN"                 │
│  2. "Masking = hide values; Row Access = hide rows (combine!)"  │
│  3. "Secure Sharing: live data, no copy, provider pays storage" │
│  4. "Resource monitors with hard limits prevent surprise bills" │
│  5. "Gen2, Dynamic Tables, Iceberg — shows I'm current in 2026" │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘

🧠 SNOWFLAKE ULTRA CHEAT SHEET (Read 10 min before interview)

📐 Architecture Diagram
┌──────────────────────────────────────────────────────────────────┐
│             ALL 3 DAYS — SNOWFLAKE LAST-MINUTE RECALL             │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ARCHITECTURE = 3 layers "BMS": Brain(Cloud Svcs) + Muscles(WH) │
│  + Storage. Key: compute & storage SEPARATE → scale each alone  │
│  Micro-partitions: AUTO, 50-500 MB, min/max metadata → pruning   │
│  Virtual WH: scale UP(slow query) vs OUT(many users)            │
│  Cache "RLS": Result(24h,free) → Local(SSD,cleared on suspend)  │
│  Time Travel: 0-90 days | Fail-safe: 7 days (Snowflake only)   │
│  Clone: instant copy, no data duplicated, shared micro-parts    │
│  VARIANT: any JSON, access with col:key::VARCHAR, FLATTEN arrays │
│                                                                  │
│  PIPELINES: "CSP" = COPY INTO → Snowpipe → Snowpipe Streaming   │
│  Streams: track INSERT/UPDATE/DELETE (UPDATE=2 rows!)           │
│  Tasks: cron jobs, chain with AFTER, RESUME only root task      │
│  Dynamic Tables: declarative ELT (replaces most Streams+Tasks)  │
│  TARGET_LAG min = 1 minute, INCREMENTAL by default             │
│  Snowpark: Python DataFrames run INSIDE Snowflake               │
│  Performance: "CQMWS" = Cluster, Query profile, MV, WH, Search │
│                                                                  │
│  SECURITY: "RARM" = RBAC, ACCOUNTADMIN hierarchy, Row, Mask    │
│  5 roles "ASUSP": ACCOUNTADMIN, SEC, USERADMIN, SYS, PUBLIC    │
│  Masking = hide column values | Row Access = hide rows          │
│  Sharing: PSR, no copy, provider=storage, recipient=compute     │
│  Cost: auto-suspend!, transient tables, resource monitors       │
│                                                                  │
│  NEW 2026 "GCIHP": Gen2, Cortex AI, Iceberg/Polaris,           │
│  Hybrid Tables (OLTP+OLAP), Postgres (GA Feb 2026)              │
│  Iceberg on ADLS Gen2 = open format for Azure-first teams       │
│                                                                  │
│  SNOWFLAKE vs DATABRICKS:                                        │
│  Snowflake = SQL warehouse, BI, sharing, minimal code           │
│  Databricks = ETL/ELT, Python, ML, streaming, open formats     │
│  BOTH TOGETHER: Databricks (ETL) → Snowflake (analytics/BI)    │
│                                                                  │
│  FRAME WITH TRAVEL DOMAIN:                                       │
│  "At Amadeus scale with 10B bookings/year, 200+ airlines..."   │
│  "For GDPR with passenger PII across 100+ countries..."        │
│  "For real-time fare pricing with sub-second latency..."        │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘