Day 3: Performance, Security + Cloud Migration — Quick Recall Guide
🧠 MASTER MEMORY MAP — Day 3
SECTION 1: SECURITY — DIRECT QUESTIONS
Kerberos — authentication protocol. Users/services prove identity to the KDC (Key Distribution Center) and get cryptographic tickets. Without Kerberos, Hadoop accepts any claimed identity — zero security.
Keytab — file containing pre-stored Kerberos credentials for a service account. Used by automated processes (Oozie jobs, cron) to authenticate without a password prompt: kinit -kt /etc/keytabs/hive.keytab hive/host@REALM
Expired Kerberos ticket — job fails with "Authentication failure" mid-execution. Fix: use long-lived keytabs + run kinit -R (renewal) before jobs start, or rely on delegation tokens for long-running MapReduce/Spark jobs.
Apache Ranger — centralized, fine-grained authorization for ALL Hadoop services. Policies define who can do what on which resource. Supports table/column/row-level policies, column masking (e.g. show only the last 4 digits of an SSN), full audit logging.
Apache Knox — API gateway and SSL proxy for the Hadoop cluster. Single HTTPS entry point — users never connect directly to internal services. Simplifies firewall rules, provides SSO, terminates SSL at the edge.
Kerberos = AUTHENTICATION (verifies who you are). Ranger = AUTHORIZATION (decides what you can do). Both are needed: Kerberos proves identity, Ranger enforces permissions. Like: Kerberos = ID check at door, Ranger = bouncer with guest list.
SECTION 2: PERFORMANCE — FLASH CARDS
NameNode heap sizing — 1 GB per 1 million files/blocks. 10 million files → 10 GB minimum. Add a 50% buffer. Use G1GC for low-pause GC: -Xmx32g -Xms32g -XX:+UseG1GC. ⚠️ The default heap is only 1 GB — far too small for production!
dfs.namenode.handler.count — number of RPC handler threads in the NameNode. Default: 10 (too small for large clusters!). Rule: 20 × log2(cluster_nodes). 100 nodes → ~132 → set to 100-150. Low handler count → RPC queue builds up → timeouts.
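The handler-count heuristic above is just arithmetic; a minimal sketch in Python (the function name is illustrative, not a Hadoop API):

```python
import math

# Sketch of the NameNode RPC handler-count heuristic quoted above:
# dfs.namenode.handler.count ≈ 20 * log2(cluster_nodes).

def recommended_handler_count(cluster_nodes: int) -> int:
    """Rule of thumb for dfs.namenode.handler.count."""
    return int(20 * math.log2(cluster_nodes))

print(recommended_handler_count(100))   # → 132 for a 100-node cluster
print(recommended_handler_count(8))     # → 60 for a small 8-node cluster
```

In practice the result is a starting point — round it into the 100-150 range for a 100-node cluster and watch the RPC queue metrics.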
mapreduce.task.io.sort.mb — the in-memory sort buffer. Default 100 MB causes constant disk spills during shuffle. Increase to 512 MB → fewer spills → much faster shuffle. This one change often reduces job time 30-60%.
mapreduce.reduce.shuffle.parallelcopies — number of parallel threads each reducer uses to copy map outputs. Default: 5 (very low). Increase to 50 → reducers fetch shuffle data faster → the reduce phase starts sooner.
G1GC (Garbage First): concurrent collector with predictable short pauses (< 200 ms). CMS GC: can have multi-second "stop-the-world" pauses on large heaps → NameNode appears unresponsive → DataNodes time out → false "dead DataNode" alerts.
dfs.client.read.shortcircuit=true — if client runs on SAME node as DataNode, it reads the block file directly from local disk (bypasses network stack). Faster local reads for jobs with data locality.
SECTION 3: DATA SKEW — FLASH CARDS
Classic data-skew symptom — 99 reducers finish in 20 minutes, 1 reducer runs for 6 hours. One key (like "US") has millions of values → all sent to one reducer → bottleneck.
SET hive.groupby.skewindata=true → Hive runs a 2-phase GROUP BY. SET hive.optimize.skewjoin=true for skewed JOINs. ⚠️ NULL values all go to one reducer by default — on join keys, use COALESCE(key, CONCAT('null_', RAND())) (NULL keys never match in a join anyway, so scattering them is safe).
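The salting idea behind these fixes can be sketched as a toy Python simulation (not Hive internals; NUM_SALTS and the hot-key set are assumptions for illustration):

```python
import random
from collections import Counter

NUM_SALTS = 10          # spread each hot key across 10 reducer keys
HOT_KEYS = {"US"}       # keys known (or measured) to be skewed

def salt_key(key):
    """Append a random salt to hot keys so no single reducer receives
    every record for that key; a second aggregation pass then merges
    the partial counts per salted bucket."""
    if key in HOT_KEYS:
        return f"{key}_{random.randrange(NUM_SALTS)}"
    return key

records = [("US", 1)] * 1000 + [("DE", 1)] * 10

# Phase 1: aggregate on salted keys (US_0 ... US_9 instead of one US).
phase1 = Counter(salt_key(k) for k, _ in records)

# Phase 2: strip the salt and merge partial counts back together.
phase2 = Counter()
for k, v in phase1.items():
    phase2[k.split("_")[0]] += v

print(phase2)   # Counter({'US': 1000, 'DE': 10}) — same totals, no hot reducer
```

This is exactly the shape of Hive's 2-phase skewed GROUP BY: phase 1 spreads the load, phase 2 produces the final totals.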
SECTION 4: CLOUD MIGRATION — FLASH CARDS
Three strategies — Lift-and-Shift: copy HDFS → S3/ADLS with DistCp, keep the same code. Replatform: keep cloud storage, replace MapReduce with Spark. Refactor: full rebuild with Delta Lake + Databricks + Unity Catalog.
DistCp (Distributed Copy) — MapReduce-based tool to copy data between HDFS paths or from HDFS to the cloud: hadoop distcp hdfs://namenode/data/ s3a://bucket/data/. Use the -update flag for incremental copies (skip already-copied files).
Why -skipcrccheck with DistCp to S3/ADLS? HDFS uses CRC32 checksums; S3/ADLS use MD5-based ETags. They're incompatible — DistCp sees them as mismatches and re-copies everything on each run. -skipcrccheck (used together with -update) skips the checksum comparison → only genuinely new/changed files are copied.
Databricks Lakebridge — official Databricks migration tool (2025) for Hive → Delta Lake migration. Auto-converts Hive DDL to Delta Lake CREATE TABLE, migrates the Hive Metastore → Unity Catalog, translates HiveQL → Databricks SQL, assesses compatibility.
MapReduce: disk-based (reads HDFS, writes HDFS between every stage). Spark: in-memory (keeps data in RAM across stages). Result: Spark is 10-100x faster. Spark also supports Python/SQL, streaming, and ML. MapReduce: only Java, batch only.
SECTION 5: HADOOP 3 + ERASURE CODING — FLASH CARDS
Erasure Coding — alternative to 3x replication for cold data. RS-6-3: 6 data blocks + 3 parity blocks = 9 blocks total. Storage overhead: 1.5x vs 3x for replication. Saves ~50% storage on archive data. Cost: higher CPU for reconstruction.
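The overhead numbers above follow from simple arithmetic; a quick Python check (plain math, not a Hadoop API):

```python
# Raw bytes stored per byte of user data under each redundancy scheme.

def ec_overhead(data_blocks: int, parity_blocks: int) -> float:
    """Storage overhead of Reed-Solomon erasure coding."""
    return (data_blocks + parity_blocks) / data_blocks

rs_6_3 = ec_overhead(6, 3)     # 9 blocks stored for 6 blocks of data = 1.5x
replication_3x = 3.0           # every block stored three times = 3x

savings = 1 - rs_6_3 / replication_3x
print(rs_6_3, savings)         # 1.5 0.5 → EC uses half the raw storage of 3x
```

The same function gives RS-10-4 (another common Hadoop 3 policy) an overhead of 1.4x — slightly better storage efficiency, at the cost of reading more blocks during reconstruction.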
EC vs replication — EC: cold/archive data (accessed rarely), large files, /archive directories. Replication: hot production data, small files, data locality needed for MapReduce jobs. Never use EC for frequently accessed data (reconstruction is slow!).
SECTION 6: CAPACITY PLANNING — FLASH CARDS
NameNode metadata footprint — 150 bytes per file/block object in NameNode heap. 100 million files → 15 GB minimum. Add a 50% buffer → 22.5 GB → round up to 32 GB, plus OS overhead. Use a dedicated high-memory server for the NameNode.
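A minimal sketch of this sizing rule, assuming 150 bytes per namespace object, a 50% buffer, and rounding up to a power of two in GB (the power-of-two rounding mirrors the 22.5 GB → 32 GB step above and is a convention, not a Hadoop formula):

```python
import math

BYTES_PER_OBJECT = 150   # per-file/block metadata cost in NameNode heap

def namenode_heap_gb(num_objects: int, buffer: float = 0.5) -> int:
    """Heap estimate: raw footprint, +50% headroom, round up to 2^n GB."""
    raw_gb = num_objects * BYTES_PER_OBJECT / 1e9   # 100M objects → 15 GB
    with_buffer = raw_gb * (1 + buffer)             # 15 GB → 22.5 GB
    return 2 ** math.ceil(math.log2(with_buffer))   # 22.5 GB → 32 GB

print(namenode_heap_gb(100_000_000))   # → 32
```

This matches the -Xmx32g figure used in the JVM tuning card for a 100-million-object namespace.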
Typical DataNode spec — 64-128 GB RAM, 12-24 CPU cores, 10-12 × 4 TB SATA drives (JBOD — no RAID, HDFS handles redundancy), 10GbE network. Co-locate HBase RegionServers on DataNodes for data locality.
🧠 HADOOP ULTRA CHEAT SHEET — ALL 3 DAYS
╔══════════════════════════════════════════════════════════════════════╗
║                     HADOOP COMPLETE CHEAT SHEET                      ║
║                         3-Day Interview Prep                         ║
╠══════════════════════════════════════════════════════════════════════
║
║ ■ DAY 1: HDFS + YARN + MAPREDUCE
║
║   HDFS KEY NUMBERS: 128 MB blocks, 3x replication, 3s heartbeat
║   NameNode: metadata IN RAM (150 bytes per file)
║   DataNode: actual blocks, heartbeat every 3s, block report every 6h
║
║   NameNode HA = "JZ-FENCE":
║     JournalNodes (quorum log) + ZooKeeper (election) + FENCING
║     Fencing = kill old Active BEFORE promoting Standby (split-brain!)
║
║   YARN = RM + NM + AM:
║     ResourceManager: cluster boss (Scheduler + AppManager)
║     NodeManager: per-node (runs containers, monitors resources)
║     ApplicationMaster: per-JOB (negotiates resources, handles fails)
║     Schedulers: FIFO (dev) / Capacity (multi-tenant) / Fair (mixed)
║
║   MapReduce = "I-Map-CBS-Reduce-O":
║     Input → Map → Combiner → Buffer+Sort → Shuffle → Reduce → Output
║     Shuffle = BOTTLENECK (disk + network + sort)
║     Combiner: cut shuffle 60-80% (only commutative+associative!)
║     Speculative: duplicate slow tasks (DISABLE if side effects!)
║
║   Optimizations = "MCJ-COMP":
║     Memory: sort buffer 100 MB → 512 MB (biggest single win)
║     Combiner: pre-aggregate locally
║     JVM Reuse: jvm.numtasks=10 for many small tasks
║     Compression: Snappy shuffle, GZIP output
║     Output: LZO if output is input to next MR job (splittable!)
║     More Reducers: nodes × containers × 0.95 (not 1!)
║     Partitioner: custom for skewed keys
║
╠══════════════════════════════════════════════════════════════════════
║
║ ■ DAY 2: HIVE + ECOSYSTEM
║
║   Hive Architecture = "DMCTE":
║     Driver → Metastore → Compiler → Tez Engine → Execution
║     Metastore = MySQL backend, stores schema + HDFS paths
║     ⚠️ Metastore DOWN → ALL Hive queries FAIL
║
║   Internal vs External:
║     Internal: DROP TABLE = data DELETED from HDFS!
║     External: DROP TABLE = metadata only, data SAFE
║     Rule: always External for raw/shared/Sqoop-ingested data
║
║   Partitioning:
║     Static: PARTITION (date='..') — one partition at a time
║     Dynamic: nonstrict mode, Hive reads value from data
║     MSCK REPAIR TABLE → sync new partitions from HDFS to Metastore
║     Pruning: filter DIRECTLY on partition column (not YEAR(date)!)
║
║   Hive Optimization = "VECTOR-TOP":
║     Engine=Tez (5-10x vs MapReduce), Vectorization (1024-row batches)
║     CBO + ANALYZE TABLE, Map Join (< 25 MB → broadcast)
║     ORC format + predicate pushdown (skip stripes by min/max)
║     skewindata=true (2-phase GROUP BY for hot keys)
║     SORT BY not ORDER BY (unless global order needed)
║     Merge small files: hive.merge.mapredfiles=true
║
║   File Formats = "OPTA":
║     ORC: Hive-native, ACID, predicate pushdown (best for Hive)
║     Parquet: cross-tool Spark+Impala+Hive (best for multi-tool)
║     Text: never in production (no compress, full scan)
║     Avro: schema evolution, Kafka/Sqoop landing (row-based)
║
║   HBase: NoSQL random R/W on HDFS
║     Row key: NEVER monotonic (hotspot!) → use salt/reverse timestamp
║     Write: WAL → MemStore → HFile flush at 128 MB
║
║   Sqoop: RDBMS ↔ HDFS
║     Incremental: --incremental append (new rows) or lastmodified
║     --num-mappers: parallel DB connections (4-8 max!)
║     Export NOT atomic → use staging table!
║
║   Oozie: Workflow (run once) + Coordinator (time+data trigger)
║   ZooKeeper: leader election (NameNode/HMaster/YARN RM)
║     Always odd nodes (3/5/7), ephemeral znodes for leader detection
║   Flume: Source → Channel → Sink for log ingestion
║     File channel = durable; Memory channel = fast but lossy
║
╠══════════════════════════════════════════════════════════════════════
║
║ ■ DAY 3: PERFORMANCE, SECURITY, MIGRATION
║
║   Security = "KRK":
║     Kerberos: Authentication (kinit → TGT → Service Ticket)
║     Ranger: Authorization (table/column/row-level policies + audit)
║     Knox: Gateway (single HTTPS entry + SSL + SSO)
║
║   JVM Tuning:
║     NameNode: -Xmx32g -Xms32g -XX:+UseG1GC
║     NameNode heap: 1 GB per 1 million files (150 bytes/file)
║     G1GC: short predictable pauses vs CMS long stop-the-world
║
║   MapReduce Critical Settings:
║     mapreduce.task.io.sort.mb=512 (default 100 = too small!)
║     mapreduce.map.output.compress=true + Snappy codec
║     mapreduce.reduce.shuffle.parallelcopies=50 (default 5)
║     mapreduce.job.jvm.numtasks=10 (JVM reuse for small-task jobs)
║     mapreduce.job.reduces=N (never leave at default 1!)
║
║   Data Skew Solutions:
║     MapReduce: salting (US → US_0...US_99) + custom Partitioner
║     Hive GROUP BY: hive.groupby.skewindata=true (2-phase agg)
║     Hive JOIN: hive.optimize.skewjoin=true
║     NULL keys: COALESCE(key, CONCAT('null_', RAND()))
║
║   Erasure Coding (Hadoop 3):
║     RS-6-3: 6 data + 3 parity blocks = 1.5x overhead vs 3x
║     50% storage savings for cold/archive data
║     ⚠️ High CPU for reconstruction → cold data only!
║
║   Cloud Migration = "LRR":
║     Lift-and-Shift: DistCp HDFS → S3/ADLS, same code
║     Replatform: MapReduce → Spark (10-100x faster)
║     Refactor: Hive → Delta Lake + Databricks Lakebridge (2025)
║     Always: parallel run → compare → cutover (never big-bang!)
║
║   HADOOP vs SPARK:
║     MapReduce: disk-based, Java, batch, slow (minutes to hours)
║     Spark: in-memory, Python/SQL, streaming+batch (seconds-minutes)
║     New development: always Spark. Legacy: MapReduce still runs.
║
╠══════════════════════════════════════════════════════════════════════
║
║ TOP 10 THINGS TO SAY IN INTERVIEW:
║
║   1. "NameNode HA: JournalNodes + ZooKeeper + FENCING (split-brain)"
║   2. "Shuffle is bottleneck: sort buffer 100 MB → 512 MB is biggest win"
║   3. "External tables for all raw/shared data — DROP TABLE is safe"
║   4. "MSCK REPAIR TABLE when Sqoop adds partitions outside Hive"
║   5. "Tez + Vectorization + ORC = 10-50x faster than MR+Text"
║   6. "HBase row key: NEVER monotonic (hotspot) → use salting"
║   7. "Kerberos = authentication, Ranger = authorization, Knox = gateway"
║   8. "Erasure Coding saves 50% storage — use for cold/archive data"
║   9. "Migration: Lift-and-Shift → Replatform → Refactor (never big-bang)"
║  10. "skewindata=true for GROUP BY skew, custom partitioner for MR"
║
║ 10-YEAR ENGINEER FRAMING:
║   "I've not just run Hadoop queries — I've tuned NameNode heap,
║   fixed GC pauses causing DataNode timeouts, debugged ZKFC failover,
║   implemented salting for skewed keys, and migrated pipelines from
║   MapReduce to Spark and then to Databricks lakehouses."
║
╚══════════════════════════════════════════════════════════════════════