🐘 Hadoop · Section 6 of 8

Day 3: Performance, Security + Cloud Migration — Deep Interview Guide

🧠 MASTER MEMORY MAP — Day 3

🧠 HADOOP SECURITY = "KRK" (Kerberos → Ranger → Knox):
K = Kerberos: AUTHENTICATION (who are you?)
R = Ranger: AUTHORIZATION (what can you do?)
K = Knox: GATEWAY (how do you get in? SSL + API proxy)
PERFORMANCE TUNING LAYERS = "JYH" (JVM → YARN → Hadoop):
J = JVM tuning (heap sizes, GC, JVM reuse)
Y = YARN tuning (container sizes, scheduler config)
H = Hadoop-level (block size, replication, compression)
CLOUD MIGRATION PATTERNS = "LRR" (Lift → Replatform → Refactor):
L = Lift-and-Shift: HDFS → S3/ADLS (same MapReduce/Hive, just different storage)
R = Replatform: MapReduce → Spark (same data in cloud, better processing)
R = Refactor: Hive → Delta Lake / Snowflake (rebuild for cloud-native)
HADOOP vs SPARK = "DISK vs RAM":
Hadoop MapReduce: disk-based, fault-tolerant, Java-only
Spark: in-memory (up to 100x faster), Python/SQL/Scala, streaming + batch
When to still use Hadoop: legacy code, no migration budget, on-prem HBase

SECTION 1: HADOOP SECURITY

Layer 1: Kerberos — Authentication

KERBEROS = MIT protocol for distributed authentication
WITHOUT KERBEROS (plain Hadoop)
User: "I am root"
Hadoop: "OK, here's all the data" ← takes your word for it!
Security: ZERO
WITH KERBEROS
User: must have valid Kerberos ticket (cryptographically signed by KDC)
Hadoop: verifies ticket with KDC before granting access
Can't fake identity → proper enterprise security
HOW KERBEROS WORKS (simplified)
KDC = Key Distribution Center (the central auth server)
Authentication Service (AS): verifies your password
Ticket Granting Service (TGS): gives service-specific tickets
STEP 1: User runs kinit (login):
$ kinit krishna@AMADEUS.COM
Enter password → KDC verifies → gives Ticket-Granting Ticket (TGT)
TGT stored locally (~8 hour validity)
STEP 2: Access Hadoop service:
User app: "I want to access HDFS, here's my TGT"
KDC/TGS: verifies TGT → issues Service Ticket for HDFS (specific to HDFS daemon)
User app presents Service Ticket to HDFS NameNode
NameNode: "This ticket is valid" → access granted
STEP 3: Each service has its own Kerberos principal:
hdfs/namenode-host@AMADEUS.COM (HDFS NameNode)
yarn/rm-host@AMADEUS.COM (YARN ResourceManager)
hive/hiveserver2-host@AMADEUS.COM (HiveServer2)
KEY TERMS
Principal: identity in Kerberos realm (user or service)
keytab: file with pre-stored credentials (for automated services like Oozie)
→ No password prompt, used in cron jobs and service accounts
realm: Kerberos domain (e.g., AMADEUS.COM)
kinit: manual login to get TGT (for humans)
klist: show current Kerberos tickets
kdestroy: destroy tickets (logout)
bash
# Production Hadoop with Kerberos:
kinit -kt /etc/security/keytabs/hive.service.keytab hive/hiveserver2@AMADEUS.COM
# Login using keytab file (automated, no password prompt)
# Used in Oozie jobs, cron jobs, service accounts

klist
# Credentials cache: FILE:/tmp/krb5cc_1000
# Default principal: hive/hiveserver2@AMADEUS.COM
# Expires: Thu Jan 16 08:00:00 2025

# Renew ticket before it expires (important for long-running jobs!):
kinit -R
# ⚠️ If ticket expires mid-job → job fails with "Authentication failure"
# Solution: use long-lived keytabs + auto-renewal in production

Layer 2: Apache Ranger — Authorization

🧠 RANGER = Role-Based Access Control for ALL Hadoop services
Without Ranger: access control via HDFS POSIX permissions only
→ hdfs dfs -chmod 750 /data/pii/ (coarse-grained, file-level)
→ Can't say "marketing can read column1 but not SSN column"
With Ranger: fine-grained policies for EVERY service:
→ Hive: table-level, column-level, row-level masking
→ HDFS: path-level access
→ HBase: table + column family level
→ Kafka: topic-level
→ YARN: queue-level
All in ONE central UI!
RANGER POLICY EXAMPLES
Policy 1: Hive table access
Name: "Marketing team can read bookings"
Resource: database=bookings_db, table=bookings, columns=*
Permissions: user=marketing_role → SELECT
Exclude: columns=credit_card_number (masked!)
Policy 2: Column masking (PII protection)
Name: "Mask SSN for non-PII team"
Resource: database=bookings_db, table=customers, column=ssn
Masking: MASK (show only last 4 digits: ***-**-1234)
Applied to: all users EXCEPT pii_approved_role
Policy 3: Row-level filter
Name: "Regional teams see only their region"
Resource: database=bookings_db, table=bookings
Row filter: region = current_user_region()
Applied to: regional_analyst role
RANGER AUDIT
Every access (success AND failure) is logged
"Who accessed what, when, from where"
Critical for compliance (GDPR, PCI-DSS, SOX)
Stored in Solr or HDFS for long-term retention
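The column-masking policy above can be pictured with a tiny sketch. This is not Ranger's code (Ranger rewrites the query inside Hive itself); `mask_ssn` is a hypothetical helper that just illustrates the "show last 4" transform:

```python
def mask_ssn(ssn: str) -> str:
    """Illustrate Ranger's 'show last 4' masking: 123-45-6789 -> ***-**-6789."""
    return f"***-**-{ssn[-4:]}"

print(mask_ssn("123-45-6789"))  # ***-**-6789
```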

Layer 3: Apache Knox — Gateway

🧠 KNOX = API Gateway for Hadoop cluster
Problems Knox solves:
1. Users need to access multiple services (HiveServer2, HDFS, YARN UI, Oozie)
Each service has its own host:port → complex firewall rules
2. No SSL on internal Hadoop services (encrypted in transit needed for compliance)
3. Direct service access exposes internal cluster topology
Knox solution:
Single entry point: https://knox-gateway:8443/gateway/
→ Users never connect directly to internal services
→ Knox proxies requests to appropriate backend service
→ SSL termination at Knox (external traffic encrypted)
→ Authentication at Knox (SSO with LDAP/Kerberos)
External: User → HTTPS → Knox:8443
Internal: Knox → HTTP → HiveServer2, HDFS, YARN UI, Oozie
KNOX PATHS
/gateway/default/webhdfs/ → HDFS WebHDFS API
/gateway/default/hive/ → HiveServer2
/gateway/default/yarn/ → YARN ResourceManager UI
/gateway/default/oozie/ → Oozie API

SECTION 2: PERFORMANCE TUNING

Level 1: JVM Tuning

bash
# hadoop-env.sh — Tune NameNode JVM heap (most critical — all metadata in RAM!)
export HADOOP_NAMENODE_OPTS="-Xmx16g -Xms16g -XX:+UseG1GC"
# -Xmx16g: max heap 16 GB
# -Xms16g: initial heap = max heap (prevents resize pauses)
# -XX:+UseG1GC: G1 garbage collector (low-pause, good for large heaps)
# Default: only 1 GB → too small for clusters with >10 million files!

# Rule for NameNode heap sizing:
# ~1 GB per 1 million files/blocks (rough estimate)
# 10M files → 10 GB NameNode heap minimum
# 100M files → 100 GB NameNode heap (need large server!)

# DataNode JVM: less critical (doesn't keep metadata)
export HADOOP_DATANODE_OPTS="-Xmx4g -XX:+UseG1GC"

# YARN NodeManager JVM:
export YARN_NODEMANAGER_OPTS="-Xmx4g -XX:+UseG1GC"
GC PAUSES — THE SILENT KILLER:
JVM Garbage Collection = JVM pauses ALL threads to reclaim memory
On NameNode: GC pause = HDFS appears DOWN during pause (RPCs timeout!)
On DataNode: GC pause = heartbeats delayed → NN marks DN as dead!
Fix:
1. Use G1GC (-XX:+UseG1GC) — concurrent collector, shorter pauses
2. Set -Xms = -Xmx (avoid heap resizing GC)
3. Don't make heap too large — longer full GC if it happens
NameNode: max ~32 GB is practical; larger = painful GC pauses
Signs of GC problems: "Long GC pause" in logs, DataNodes showing as dead
Check: hdfs dfsadmin -report | grep "Dead datanodes"
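The heap-sizing rule above (~1 GB per million files/blocks) can be wrapped in a small helper. A rough planning sketch, not an official formula: `namenode_opts` and its 4 GB floor are illustrative assumptions, not Hadoop defaults.

```python
def namenode_opts(files_millions: int) -> str:
    """Build HADOOP_NAMENODE_OPTS from the ~1 GB per 1M files/blocks rule.

    The 4 GB floor is an illustrative minimum, not a Hadoop default.
    """
    heap_gb = max(4, files_millions)
    # -Xms = -Xmx to avoid resize pauses, G1GC for shorter GC pauses
    return f"-Xmx{heap_gb}g -Xms{heap_gb}g -XX:+UseG1GC"

print(namenode_opts(10))  # -Xmx10g -Xms10g -XX:+UseG1GC
```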

Level 2: YARN Memory Tuning (Full Reference)

xml
<!-- yarn-site.xml — Complete memory configuration -->

<!-- Node-level memory available to YARN: -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>49152</value>   <!-- 48 GB on a 64 GB server -->
  <!-- Reserve: 8 GB for OS + 8 GB for other services (HBase, DN, NM JVMs) -->
</property>

<!-- Node-level CPU vcores for YARN: -->
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>16</value>   <!-- All cores on a 16-core server -->
</property>

<!-- Scheduler min/max container sizes: -->
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>   <!-- No container gets < 1 GB -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>16384</value>   <!-- No container gets > 16 GB -->
</property>

<!-- Memory increment: allocations happen in multiples of this: -->
<property>
  <name>yarn.scheduler.increment-allocation-mb</name>
  <value>512</value>
</property>
🧠 Memory Map
MEMORY CALCULATION EXAMPLE — 10 node cluster, 64 GB each:
PER NODE
Total RAM: 64 GB
OS overhead: -4 GB
HDFS DataNode: -2 GB (hdfs heap)
YARN NodeManager: -2 GB (nm heap)
Available for YARN containers: 56 GB → set to 49152 MB (leave buffer)
CONTAINER SIZING
Mapper: 2048 MB (map container) → Java heap: 1638 MB (80%)
Reducer: 4096 MB (reduce container) → Java heap: 3276 MB (80%)
Containers per node: 49152 / 2048 = 24 mappers OR 12 reducers
TOTAL CLUSTER
10 nodes × 24 containers = 240 concurrent mapper slots
10 nodes × 12 containers = 120 concurrent reducer slots
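The container math above as a quick sanity-check script (same numbers as the example: 80% heap fraction, 49152 MB usable per node; the helper names are made up for illustration):

```python
def heap_mb(container_mb: int, fraction: float = 0.80) -> int:
    """Java heap (-Xmx) as a fraction of the YARN container size."""
    return int(container_mb * fraction)

def cluster_slots(nodes: int, node_mem_mb: int, container_mb: int) -> int:
    """Concurrent containers across the cluster for a given container size."""
    return nodes * (node_mem_mb // container_mb)

assert heap_mb(2048) == 1638                   # mapper heap
assert heap_mb(4096) == 3276                   # reducer heap
assert cluster_slots(10, 49152, 2048) == 240   # concurrent mapper slots
assert cluster_slots(10, 49152, 4096) == 120   # concurrent reducer slots
```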

Level 3: Hadoop Configuration Performance Settings

xml
<!-- hdfs-site.xml performance settings -->

<!-- Block size: increase for large sequential files -->
<property>
  <name>dfs.blocksize</name>
  <value>268435456</value>   <!-- 256 MB for large files (default: 128 MB) -->
</property>

<!-- DataNode transfer thread pool (how many concurrent block transfers): -->
<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <value>4096</value>   <!-- Default: 4096, can increase for high-throughput nodes -->
</property>

<!-- NameNode handler count (parallel RPC threads for client requests): -->
<property>
  <name>dfs.namenode.handler.count</name>
  <value>100</value>
  <!-- Default: 10 → way too small for large clusters! -->
  <!-- Rule: 20 * log2(cluster_size) -->
  <!-- 100 nodes → 20 * log2(100) = 20 * 6.6 = 132 → set to 100-150 -->
</property>

<!-- Short-circuit reads: client reads directly from local DataNode (no network!) -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>   <!-- If client is on same node as DataNode → read locally -->
</property>
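The handler-count rule of thumb from the comment above (20 × log2(cluster size)) as a one-liner, truncating the same way the comment's "= 132" does:

```python
import math

def handler_count(num_nodes: int) -> int:
    """dfs.namenode.handler.count rule of thumb: 20 * log2(cluster_size)."""
    return int(20 * math.log2(num_nodes))

print(handler_count(100))  # 132 → set dfs.namenode.handler.count to 100-150
```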
xml
<!-- mapred-site.xml performance settings — FULL REFERENCE -->

<!-- MAP task memory -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1638m -XX:+UseG1GC</value>   <!-- 80% of container + G1GC -->
</property>

<!-- REDUCE task memory -->
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>4096</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx3276m -XX:+UseG1GC</value>
</property>

<!-- Sort buffer: MOST IMPORTANT for shuffle performance -->
<property>
  <name>mapreduce.task.io.sort.mb</name>
  <value>512</value>   <!-- Default: 100 MB → increase to 512 MB! -->
</property>

<!-- Sort spill threshold: spill to disk when X% of sort buffer used -->
<property>
  <name>mapreduce.map.sort.spill.percent</name>
  <value>0.80</value>
</property>

<!-- Parallel copies during shuffle (how many parallel transfers from mappers): -->
<property>
  <name>mapreduce.reduce.shuffle.parallelcopies</name>
  <value>50</value>   <!-- Default: 5 → very low! Increase for faster shuffle -->
</property>

<!-- In-memory merge threshold (before writing merged spills to disk): -->
<property>
  <name>mapreduce.reduce.shuffle.input.buffer.percent</name>
  <value>0.70</value>   <!-- 70% of reducer heap used for in-memory shuffled data -->
</property>

<!-- Output compression -->
<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>

<!-- JVM reuse for jobs with many small tasks (classic MR1-era setting): -->
<!-- Note: JVM reuse was dropped in YARN/MR2; the closest analog there is
     uber mode (mapreduce.job.ubertask.enable) for very small jobs. -->
<property>
  <name>mapreduce.job.jvm.numtasks</name>
  <value>10</value>   <!-- Reuse same JVM for 10 tasks before restart -->
</property>

<!-- Number of parallel reducers: -->
<property>
  <name>mapreduce.job.reduces</name>
  <value>10</value>   <!-- Default: 1 — always set this based on data volume! -->
</property>
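Two bits of arithmetic behind the settings above, sketched in Python: the ~80% heap-to-container ratio and the sort-buffer spill threshold (`xmx_opt` is a hypothetical helper name):

```python
SORT_MB = 512        # mapreduce.task.io.sort.mb
SPILL_PCT = 0.80     # mapreduce.map.sort.spill.percent

def xmx_opt(container_mb: int) -> str:
    """Heap flag at ~80% of the container (leaves room for JVM overhead)."""
    return f"-Xmx{int(container_mb * 0.8)}m"

assert xmx_opt(2048) == "-Xmx1638m"   # matches mapreduce.map.java.opts
assert xmx_opt(4096) == "-Xmx3276m"   # matches mapreduce.reduce.java.opts

spill_mb = SORT_MB * SPILL_PCT        # ≈ 409.6 MB: background spill starts here
```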

Level 4: Hive Performance Settings (Full Reference)

sql
-- HIVE PERFORMANCE SETTINGS — run at start of session or in hive-site.xml

-- MUST-SET EVERY TIME:
SET hive.execution.engine=tez;                        -- NOT mr!
SET hive.vectorized.execution.enabled=true;           -- batch row processing
SET hive.vectorized.execution.reduce.enabled=true;
SET hive.cbo.enable=true;                             -- Cost-Based Optimizer
SET hive.compute.query.using.stats=true;
SET hive.stats.fetch.column.stats=true;

-- JOIN OPTIMIZATION:
SET hive.auto.convert.join=true;                      -- auto map join for small tables
SET hive.mapjoin.smalltable.filesize=25000000;        -- 25 MB threshold
SET hive.auto.convert.join.noconditionaltask=true;
SET hive.auto.convert.join.noconditionaltask.size=20971520;  -- 20 MB

-- SKEW HANDLING:
SET hive.groupby.skewindata=true;                     -- 2-phase GROUP BY
SET hive.optimize.skewjoin=true;
SET hive.skewjoin.key=100000;

-- DYNAMIC PARTITIONING:
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.max.dynamic.partitions=2000;
SET hive.exec.max.dynamic.partitions.pernode=500;

-- SMALL FILE MERGING (prevents small files after INSERT):
SET hive.merge.mapfiles=true;
SET hive.merge.mapredfiles=true;
SET hive.merge.smallfiles.avgsize=134217728;          -- target 128 MB
SET hive.merge.size.per.task=268435456;               -- max 256 MB per output file

-- PARALLELISM:
SET hive.exec.parallel=true;                          -- run independent stages in parallel
SET hive.exec.parallel.thread.number=8;               -- max parallel threads

-- LLAP (if available):
SET hive.llap.execution.mode=auto;

Level 5: HDFS Balancer Tuning

bash
# HDFS Balancer: rebalance blocks across DataNodes
# When to run:
#   1. After adding new DataNodes (new nodes start empty!)
#   2. After replacing failed DataNodes
#   3. When some DataNodes are much more full than others

# Cap rebalancing bandwidth first (don't saturate the network):
hdfs dfsadmin -setBalancerBandwidth 104857600
# 104857600 bytes/s = max 100 MB/s per DataNode for balancing traffic

# Then run the balancer:
hdfs balancer -threshold 10
# -threshold 10: a node is "balanced" if within 10% of average utilization
# Without a bandwidth cap, the balancer can flood the production network → impacts jobs!

# Check balance:
hdfs dfsadmin -report
# Look for: "DFS Used%" column across DataNodes
# If one node is at 90% and another at 20% → run balancer

# Intra-DataNode balancer (Hadoop 3 — balance disks within one DataNode):
hdfs diskbalancer -plan datanode-hostname
# Writes a plan file (path is printed in the output); then execute it:
hdfs diskbalancer -execute datanode-hostname.plan.json
# Useful when one disk on a DataNode is much fuller than others
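What `-threshold 10` actually checks, sketched below: a node counts as balanced when its DFS Used% is within 10 percentage points of the cluster average. Illustrative logic only, not the balancer's actual source:

```python
def needs_balancing(used_pct, threshold=10.0):
    """True if any DataNode's DFS Used% is more than `threshold`
    percentage points away from the cluster average."""
    avg = sum(used_pct) / len(used_pct)
    return any(abs(p - avg) > threshold for p in used_pct)

assert needs_balancing([90.0, 20.0, 50.0]) is True    # 90% vs ~53% average
assert needs_balancing([48.0, 52.0, 50.0]) is False   # all within 10 points
```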

SECTION 3: DATA SKEW IN HADOOP — COMPLETE GUIDE

🧠 DATA SKEW = uneven data distribution across nodes/reducers
SYMPTOMS
MapReduce: 99 reducers done, 1 reducer running for hours
Hive: GROUP BY query takes much longer than expected
YARN: one container using 10x memory of others
ROOT CAUSES
Hot keys: "US" has 10M rows, "Fiji" has 100 rows → one reducer handles all US
Non-uniform partitioning: HashPartitioner with non-uniform key distribution
NULL values: all NULL keys go to same reducer by default
SOLUTIONS BY TYPE
1. MAPREDUCE SKEW → Custom Partitioner + Salting
Hot key "US" → spread as "US_0", "US_1", ..., "US_99" across 100 reducers
Then second step: combine partial aggregations back
2. HIVE GROUP BY SKEW → skewindata
SET hive.groupby.skewindata=true;
Hive automatically uses 2-phase aggregation
3. HIVE JOIN SKEW → skewjoin
SET hive.optimize.skewjoin=true;
SET hive.skewjoin.key=100000;
Hive splits the join: hot keys via separate broadcast, rest via shuffle join
4. NULL KEY SKEW → handle NULLs explicitly
-- Instead of: SELECT a.*, b.* FROM a JOIN b ON a.key = b.key
-- (all NULLs in a.key go to one reducer!)
-- Use:
SELECT a.*, b.*
FROM a JOIN b ON COALESCE(a.key, CONCAT('null_', RAND())) = b.key;
-- Random substitutes for NULL keys → distributed across reducers
5. SPARK (if migrated): repartition(), broadcast(), salting still apply
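The salting pattern from point 1, simulated end to end. The helper names are hypothetical, but the two-phase shape (salt the hot key → partial aggregate → strip the salt → merge) is the real technique:

```python
import random
from collections import Counter

def salt_key(key: str, hot_keys: set, buckets: int = 20) -> str:
    """Append a random suffix to hot keys so they spread across reducers."""
    return f"{key}_{random.randrange(buckets)}" if key in hot_keys else key

random.seed(42)
rows = ["US"] * 1000 + ["Fiji"] * 10          # heavily skewed toward "US"

# Phase 1: aggregate by salted key (what each reducer would see)
partial = Counter(salt_key(k, {"US"}) for k in rows)

# Phase 2: strip the salt and merge the partial counts
final = Counter()
for k, n in partial.items():
    final[k.split("_")[0]] += n

assert final["US"] == 1000 and final["Fiji"] == 10
assert sum(k.startswith("US_") for k in partial) > 1  # "US" load was spread
```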

SECTION 4: CLOUDERA CDP vs HDP vs APACHE HADOOP

APACHE HADOOP (bare): Open source, no support, you manage everything
Use for: learning, custom builds, research
HORTONWORKS HDP (Hortonworks Data Platform)
Was THE enterprise Hadoop distribution (2011-2019)
Merged with Cloudera in 2019
Legacy: many enterprises still run HDP 2.x or 3.x
Tools included: Ambari (management UI), Tez, ORC, Ranger, Atlas, Knox
EOL: HDP 2.x is end-of-life → companies migrating to CDP
CLOUDERA CDP (Cloudera Data Platform)
Combined Cloudera + Hortonworks (2019 merger)
On-premises: CDP Private Cloud
Cloud: CDP Public Cloud (AWS, Azure, GCP)
Management: Cloudera Manager (better than Ambari for large clusters)
Includes: CDW (Cloudera Data Warehouse), CML (ML), CDE (Data Engineering)
Key differentiator: SDX (Shared Data Experience) — unified security across all services
Atlas (data lineage + governance) is deeply integrated
INTERVIEW POSITIONING
"I've worked with HDP in production — Ambari for management, Ranger for security,
Atlas for lineage. With the Cloudera-Hortonworks merger, companies are moving to CDP.
I understand the migration path: HDP Ambari → Cloudera Manager, same security model."

SECTION 5: CLOUD MIGRATION PATTERNS

Pattern 1: Lift-and-Shift (HDFS → Cloud Storage)

🧠 Memory Map
WHAT: Move data from HDFS to cloud object storage (ADLS Gen2, S3, GCS)
Keep same processing (Hive/MapReduce) but on cloud compute
HOW
Tool: DistCp (Distributed Copy — uses MapReduce to copy files in parallel)
hadoop distcp \
  hdfs://namenode:8020/user/hive/warehouse/ \
  wasbs://container@storageaccount.blob.core.windows.net/hive/warehouse/
# wasbs = Windows Azure Storage Blob Secure
Or to S3:
hadoop distcp hdfs://namenode:8020/data/ s3a://my-bucket/data/
For large datasets — incremental copy:
hadoop distcp -update -skipcrccheck \
  hdfs://namenode/data/ \
  s3a://bucket/data/
# -update: skip files already copied (incremental)
# -skipcrccheck: S3 CRC differs from HDFS CRC (avoid false mismatches)
PROS
✓ Minimal code changes (Hive still works with ADLS/S3 as backend)
✓ Low risk, fast migration
✓ Can keep Hive metastore pointing to new paths
CONS
✗ Still paying for compute clusters (no serverless)
✗ MapReduce still disk-based and slow
✗ Misses opportunity to modernize processing
WHEN TO CHOOSE
→ Budget limited, need quick win
→ Large Hadoop investment, can't rewrite apps
→ As Phase 1 of a larger migration

Pattern 2: Replatform (MapReduce → Spark)

🧠 HIVE → SPARK SQL MIGRATION:
WHAT: Keep data in cloud storage, replace MapReduce processing with Spark
Data: HDFS → S3/ADLS
Processing: MapReduce Java → Spark (Python/Scala)
CHANGES
Old: MapReduce Job → YARN → HDFS
New: Spark Job → Databricks/EMR/HDInsight → S3/ADLS
MIGRATION STEPS
1. Copy HDFS data to S3/ADLS (DistCp)
2. Rewrite MapReduce jobs as Spark jobs (logic same, API different)
3. Migrate Hive tables: update LOCATION to cloud paths
4. Migrate Oozie workflows to Airflow/Databricks Workflows
5. Migrate Sqoop imports to ADF/Glue/Spark JDBC
HIVE → SPARK SQL MIGRATION:
Most Hive SQL is compatible with Spark SQL!
-- Old Hive:
SET hive.execution.engine=tez;
INSERT INTO bookings_silver PARTITION (booking_date)
SELECT ... FROM bookings_bronze WHERE booking_date >= '2024-01-01';
-- New Spark SQL (almost identical!):
spark.sql("""
INSERT INTO bookings_silver PARTITION (booking_date)
SELECT ... FROM bookings_bronze WHERE booking_date >= '2024-01-01'
""")
PROS
✓ 10-100x faster processing (Spark vs MapReduce)
✓ Python/SQL (no more Java)
✓ Unified streaming + batch
✓ Lower compute costs (Spark finishes faster = fewer cluster-hours)
CONS
✗ Rewriting all jobs takes time
✗ Spark expertise needed
✗ Still no ACID transactions without Delta Lake

Pattern 3: Refactor (Hive → Delta Lake / Databricks)

🧠 HIVE TABLE → DELTA LAKE TABLE:
WHAT: Full modernization to cloud-native lakehouse
Storage: HDFS → ADLS Gen2 or S3
Format: ORC/Parquet → Delta Lake (ACID transactions!)
Processing: Hive+Tez → Spark+Databricks
Orchestration: Oozie → Databricks Workflows / ADF / Airflow
Security: Ranger → Unity Catalog (Databricks)
HIVE TABLE → DELTA LAKE TABLE:
Old Hive (ORC):
CREATE EXTERNAL TABLE bookings (...)
STORED AS ORC
LOCATION 'hdfs://namenode/user/hive/warehouse/bookings/';
New Delta Lake:
CREATE TABLE bookings (...)
USING DELTA
LOCATION 'abfss://container@storage.dfs.core.windows.net/bookings/'
Convert existing ORC to Delta (in Databricks):
CONVERT TO DELTA parquet.`/mnt/data/bookings/` -- if Parquet
-- Or write new Delta table from Hive:
spark.table("hive_db.bookings").write.format("delta").save("/mnt/delta/bookings")
MIGRATION TOOL: Databricks Lakebridge (2025):
Official Databricks tool for Hive → Delta Lake migration
Auto-converts Hive DDL to Delta Lake CREATE TABLE statements
Migrates Hive Metastore → Unity Catalog
Translates HiveQL → Databricks SQL
Handles data copy from HDFS to cloud storage
Can assess compatibility: show what will break vs work automatically
PROS
✓ Full ACID (what Hive ACID never quite delivered)
✓ Time travel / data versioning
✓ Schema enforcement + evolution
✓ Unified batch + streaming on same tables
✓ Unity Catalog = better governance than Ranger + Atlas
✓ Serverless (no cluster management)
CONS
✗ Largest investment (rewrite + retrain team)
✗ Vendor lock-in (Databricks)
✗ Cost of migration project
WHEN TO CHOOSE
→ Greenfield or major modernization initiative
→ Current Hadoop cost is very high (license + hardware)
→ Team is growing and needs SQL-first tooling
→ Need ACID and time travel on large tables

Hadoop vs Spark — Complete Comparison

FEATURE            HADOOP MAPREDUCE            APACHE SPARK
─────────────────────────────────────────────────────────────────────
Storage            HDFS                        HDFS, S3, ADLS, local
Processing model   Disk-based                  In-memory (up to 100x faster)
API                Java only                   Python, Scala, Java, R, SQL
Fault tolerance    Re-run from HDFS            Re-compute from DAG lineage
Streaming          Kafka → HDFS → MR           Spark Streaming (micro-batch)
                   (latency: minutes)          (latency: seconds)
Machine Learning   Mahout (weak)               MLlib, TensorFlow (strong)
Interactive SQL    Hive via MR (slow)          Spark SQL (fast)
ACID               Hive ACID (complex)         Delta Lake (native)
Latency            Minutes to hours            Seconds to minutes
Code complexity    High (Java verbose)         Low (Python/SQL)
When still used    Legacy code,                New development,
                   HBase on-prem,              cloud-native
                   budget constraints          lakehouse workloads
10-YEAR ENGINEER'S ANSWER:
"I've built production MapReduce pipelines and I understand the disk-based model.
In practice, I moved our critical pipelines to Spark years ago — the speed difference
is real. MapReduce still runs in companies because of existing code investment.
But for new development, there's no reason to use MapReduce over Spark today."

SECTION 6: HADOOP 3 FEATURES — DEEP DIVE

Erasure Coding

🧠 ERASURE CODING = alternative to 3x replication for cold/archive data
OLD WAY (Replication)
Store 1 GB file → 3 copies → 3 GB used on HDFS
Storage overhead: 200% (3x replication = 200% overhead)
ERASURE CODING
Default EC policy: RS-6-3-1024k (Reed-Solomon, 6 data + 3 parity, 1 MB cells)
Store 1 GB file → 6 data blocks + 3 parity blocks = 9 blocks
9 blocks total, each 1/6 size → total = 9/6 = 1.5x storage
Storage overhead: 50% (vs 200% for 3x replication)
SAVINGS: 50% storage reduction!
MATH EXAMPLE
Without EC: 128 MB block × 3 copies = 384 MB used
With EC RS(6,3): 128 MB data → 6 × (~21.3 MB data blocks) + 3 × (~21.3 MB parity blocks)
= 9 × ~21.3 MB ≈ 192 MB used (1.5x overhead vs 3x overhead)
TRADEOFFS
✓ 50% storage savings (saves money!)
✗ Higher CPU overhead for reconstruction (when a node fails)
✗ Cannot reconstruct if more than 3 nodes fail (lose 4+ → data loss)
✗ Not good for hot data (reconstruction is slow on read)
USE ERASURE CODING FOR
✓ Cold/archive data (accessed rarely)
✓ Large files in archive directories
✓ /archive, /backup, /historical paths
KEEP 3x REPLICATION FOR:
✓ Hot production data (fast recovery needed)
✓ Small files (EC overhead not worth it for small files)
✓ Data accessed frequently by MapReduce (data locality doesn't work well with EC)
bash
# Set EC policy on a directory:
hdfs ec -setPolicy -policy RS-6-3-1024k -path /data/archive/2020/
# All new files in /data/archive/2020/ will use EC instead of 3x replication

# Check current EC policy:
hdfs ec -getPolicy -path /data/archive/2020/

# List available EC policies:
hdfs ec -listPolicies

# Disable EC (back to replication) on a path:
hdfs ec -unsetPolicy -path /data/archive/2020/
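The RS-6-3 storage math as a quick check (using the exact 128/6 ≈ 21.3 MB block size; `raw_factor` is an illustrative helper name):

```python
def raw_factor(data_units: int, parity_units: int) -> float:
    """Raw bytes stored per logical byte under Reed-Solomon erasure coding."""
    return (data_units + parity_units) / data_units

assert raw_factor(6, 3) == 1.5        # RS-6-3 → 1.5x raw storage

block_mb = 128
print(block_mb * raw_factor(6, 3))    # 192.0 MB raw with EC
print(block_mb * 3)                   # 384 MB raw with 3x replication
```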

SECTION 7: MOCK INTERVIEW — Top 10 Questions for 10-Year Engineers

Q1: "Walk me through the complete lifecycle of a Hive query"

"When I run SELECT country, COUNT(*) FROM bookings GROUP BY country in Hive:
1. Driver receives query, initiates session
2. Compiler sends query to Metastore: 'where is bookings table?'
Metastore returns: HDFS path, ORC format, partition scheme
3. Parser builds AST (Abstract Syntax Tree)
4. Semantic analyzer validates: table exists? columns exist? permissions OK?
5. Optimizer (CBO if stats exist): decides join type, partition pruning, predicate pushdown
6. Physical plan: break into stages (Stage 1: MAP + partial agg, Stage 2: reduce+sort)
7. With Tez: translate plan to Tez DAG (not MapReduce chain)
8. Submit to YARN: YARN allocates containers for Tez workers
9. Each Tez task reads ORC file from HDFS, uses vectorization (1024 rows at a time)
10. Results aggregated, written to temp HDFS path
11. Driver returns results to client
Key optimizations I'd verify: execution.engine=tez, vectorization=true,
ORC format, partition pruning active, CBO stats current."

Q2: "How did you handle data skew in a production pipeline?"

"I had a job grouping e-commerce transactions by country — US had 70% of all data.
One reducer was running for 6 hours while others finished in 20 minutes.
My fix: two-step approach.
Step 1: I set hive.groupby.skewindata=true in Hive — that fixed the Hive queries.
For MapReduce jobs: I implemented salting. I appended a random suffix to US keys:
'US_0', 'US_1' ... 'US_19' to distribute US data across 20 reducers.
Then a second MapReduce job merged the partial aggregations.
I also ran ANALYZE TABLE to let CBO make better decisions.
The job went from 6 hours to 45 minutes."

Q3: "HDFS NameNode HA failed. Active NameNode is dead, Standby is not taking over. Debug?"

"I'd debug in layers:
1. Check ZooKeeper: is ZK quorum healthy?
echo 'ruok' | nc zk-host 2181 → expects 'imok'
If ZK is down → ZKFC can't elect new Active → failover stuck!
2. Check ZKFC on both NameNode hosts:
systemctl status hadoop-hdfs-zkfc
Look for: 'ZKFC is connected to ZooKeeper' vs 'Connection timeout'
3. Check FENCING: Did fencing script work on old Active?
Check logs: /var/log/hadoop-hdfs/hadoop-hdfs-namenode.log on standby
Look for: 'Fencing successful' or 'Fencing FAILED'
If fencing failed: Standby refuses to promote (correctly! prevents split-brain)
4. Force failover manually (with caution):
hdfs haadmin -failover --forceactive nn1 nn2 ← fails over FROM nn1 TO nn2
Only after confirming old Active is truly dead!
5. If Standby is way behind on edit logs:
hdfs namenode -bootstrapStandby
Re-sync Standby from JournalNodes"

Q4: "How would you migrate our 500 TB Hadoop cluster to Databricks on Azure?"

🧠 Memory Map
"I'd phase it over 6 months:
Phase 1 (Month 1-2): Assessment
Inventory all tables, jobs, data volumes, SLAs
Identify: HiveQL compatibility with Spark SQL (most is compatible!)
Find blockers: Hive UDFs in Java → need to rewrite in Python
Use Databricks Lakebridge for automated assessment
Phase 2 (Month 2-3): Infrastructure + Data Migration
Set up ADLS Gen2 as new storage layer
Use DistCp to copy HDFS → ADLS (start with cold/historical data first)
Convert ORC tables to Delta Lake format
Set up Databricks workspace + Unity Catalog for governance
Phase 3 (Month 3-5): Job Migration (start with easy wins)
Migrate Hive SQL jobs first (SQL is most compatible)
Migrate Oozie → Databricks Workflows
Migrate Sqoop → Azure Data Factory + Databricks autoloader
Keep old Hadoop running for validation (dual-run)
Phase 4 (Month 5-6): Cutover + Decommission
Run old and new in parallel for 2-4 weeks, compare results
Cut over teams one by one (analytics first, ETL pipeline last)
Decommission Hadoop cluster after validation period
RISK MITIGATION
Never do big-bang migration (always parallel run)
Start with read workloads, then write workloads
Hive ACID tables → need careful migration to Delta Lake (preserve row-level changes)
Test SLAs: some Hive batch jobs may need Spark optimization to meet same SLAs"

Q5: "What are the biggest performance problems you've seen in Hadoop at scale?"

"Three big ones from production experience:
1. NameNode memory pressure:
50 million small files → NameNode OOM crash
Fix: HAR files for archive, Hive merge settings for new writes,
NameNode heap increased to 32 GB with G1GC
2. Shuffle bottleneck in MapReduce:
Sort buffer at default 100 MB → constant disk spill
Fix: increased mapreduce.task.io.sort.mb to 512 MB + enabled Snappy compression
Result: 60% job time reduction
3. Hive queries on Text format tables:
DBA created tables as TEXT (CSV) — every query scanned billions of rows fully
Fix: converted all production tables to ORC + enabled vectorization
Result: 15x query speed improvement for analytical queries
The Hadoop small files problem is the most insidious — it degrades NameNode
stability gradually and only becomes critical when it's already a crisis."

SECTION 8: CAPACITY PLANNING

🧠 Memory Map
HADOOP CLUSTER SIZING GUIDELINES (10-year engineer reference)
STORAGE SIZING
Raw data size × replication factor × 1.25 (growth headroom)
Example: 100 TB raw data × 3 replication × 1.25 = 375 TB raw disk needed
With Erasure Coding (for cold data): 100 TB × 1.5 × 1.25 = 187.5 TB
COMPUTE SIZING
Determine: daily data processed + concurrent user queries
Rule of thumb: 1 compute node per 5 TB of hot data
For Hive/YARN: plan for peak concurrency (how many jobs run simultaneously)
DataNode spec: 64-128 GB RAM, 12-24 cores, 10 × 4 TB drives (JBOD, not RAID)
NameNode spec: 32-128 GB RAM (depends on file count!), fast SSDs
NAMENODE MEMORY FORMULA
Each file/block = ~150 bytes in NameNode heap
10 million files → 10M × 150 bytes = 1.5 GB
100 million files → 100M × 150 bytes = 15 GB minimum
Add 50% safety buffer → 22.5 GB → round to 32 GB
NODE COUNT (rough guideline)
NameNode: 2 (Active + Standby)
ZooKeeper: 3 or 5
JournalNode: 3 (can co-locate with ZooKeeper)
DataNode: depends on storage needs
YARN ResourceManager: 2 (Active + Standby)
HiveServer2: 2-4 (multiple for load balancing)
HBase RegionServer: co-locate with DataNodes (data locality!)
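The sizing rules above bundled into one rough calculator. All of these are rules of thumb for planning conversations, not exact models, and the function names are made up for illustration:

```python
def raw_disk_tb(raw_data_tb: float, replication: float = 3.0,
                growth: float = 1.25) -> float:
    """Disk to provision: raw data x replication x growth headroom."""
    return raw_data_tb * replication * growth

def namenode_heap_gb(num_files: int, bytes_per_object: int = 150,
                     safety: float = 1.5) -> float:
    """NameNode heap: ~150 bytes per file/block object + 50% safety buffer."""
    return num_files * bytes_per_object * safety / 1e9

assert raw_disk_tb(100) == 375.0                    # 100 TB × 3 × 1.25
assert raw_disk_tb(100, replication=1.5) == 187.5   # erasure-coded cold data
assert namenode_heap_gb(100_000_000) == 22.5        # → round up to 32 GB
```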