🐍

Python

Python Strings & Puzzles — Data Engineer Interview Prep

🐍

Python · Section 1 of 5

Python Strings & Puzzles — Data Engineer Interview Prep

💡 Interview Tip

Not testing you as a Python developer — testing if you THINK in Python. Data engineers use PySpark, but interviewers check: can you solve a quick string problem without Google?

Memory Map

🧠 STRING PUZZLES → DRALF

STRING PUZZLESDRALF

─────────────────────────

DDuplicates (find repeated chars)

RReverse (string, words, sentence)

AAnagram (same letters, diff order)

LLongest (substring without repeating)

FFrequency (char count, most common)

Q01 — Find Duplicate Characters in a String

Question: Given a string, find all characters that appear more than once.

Sample Input: "programming" Expected Output: ['g', 'm', 'r'] (sorted)

Solution 1: Brute Force — Nested Loop (Easy to Understand)

python — editable

def find_duplicates_brute(s):
    # Compare every character with every other character
    # If they match and we haven't recorded it yet, it's a duplicate
    duplicates = []
    for i in range(len(s)):
        for j in range(i + 1, len(s)):
            if s[i] == s[j] and s[i] not in duplicates:
                duplicates.append(s[i])
    return sorted(duplicates)

print(find_duplicates_brute("programming"))
# Output: ['g', 'm', 'r']

Solution 2: Using a Set to Track Seen Characters (Intermediate)

python — editable

def find_duplicates_set(s):
    # Walk through the string one character at a time
    # Keep a 'seen' set — if a char is already in seen, it's a duplicate
    seen = set()
    duplicates = set()
    for char in s:
        if char in seen:
            duplicates.add(char)
        seen.add(char)
    return sorted(duplicates)

print(find_duplicates_set("programming"))
# Output: ['g', 'm', 'r']

Solution 3: Counter — Optimal / Pythonic (Interview Answer)

python — editable

from collections import Counter

def find_duplicates(s):
    # Step 1: Counter scans the string once and builds a dict of {char: count}
    counts = Counter(s)
    print("Counter output:", counts)
    # Output: Counter output: Counter({'g': 2, 'r': 2, 'm': 2, 'p': 1, 'o': 1, 'a': 1, 'i': 1, 'n': 1})
    # ↑ Now we can SEE: 'g', 'r', 'm' each appear 2 times, rest appear 1 time

    # Step 2: Filter — keep only chars where count > 1
    dupes = [char for char, count in counts.items() if count > 1]
    print("Before sorting:", dupes)
    # Output: Before sorting: ['g', 'r', 'm']

    # Step 3: Sort for consistent output (dicts don't guarantee order)
    return sorted(dupes)

print("Result:", find_duplicates("programming"))
# Output: Result: ['g', 'm', 'r']

Interview Tip: "Start with Counter for clarity, then offer the set-based approach — shows you know both library tools AND manual logic. Mention that Counter is O(n) time and O(k) space where k is unique characters."

What NOT to Say: "I'll use a nested loop to compare each char with every other char" — That's O(n^2). Use a hash set or Counter for O(n).

Q02 — Check if Two Strings Are Anagrams

Question: Two strings are anagrams if they contain the exact same characters in the same frequency, just rearranged. Check if two given strings are anagrams.

Sample Input: s1 = "listen", s2 = "silent" Expected Output: True

Sample Input: s1 = "hello", s2 = "world" Expected Output: False

Solution 1: Sort and Compare (Easy to Understand)

python — editable

def is_anagram_sort(s1, s2):
    # If both strings, when sorted, produce the same sequence
    # then they must have the same characters in the same frequency
    return sorted(s1.lower()) == sorted(s2.lower())

print(is_anagram_sort("listen", "silent"))  # Output: True
print(is_anagram_sort("hello", "world"))    # Output: False

Solution 2: Manual Frequency Count with a Dictionary (Intermediate)

python — editable

def is_anagram_manual(s1, s2):
    # Build frequency maps for both strings manually
    # Then compare the two maps
    s1, s2 = s1.lower(), s2.lower()
    if len(s1) != len(s2):
        return False

    freq = {}
    for char in s1:
        freq[char] = freq.get(char, 0) + 1
    for char in s2:
        freq[char] = freq.get(char, 0) - 1

    # If all counts are zero, strings are anagrams
    return all(v == 0 for v in freq.values())

print(is_anagram_manual("listen", "silent"))  # Output: True
print(is_anagram_manual("hello", "world"))    # Output: False

Solution 3: Counter — Optimal / Pythonic (Interview Answer)

python — editable

from collections import Counter

def is_anagram(s1, s2):
    # Counter builds a frequency dict — comparing two Counters is O(n)
    c1 = Counter(s1.lower())
    c2 = Counter(s2.lower())
    print("Step 1 - Counter(s1):", c1)
    # Output: Step 1 - Counter(s1): Counter({'l': 1, 'i': 1, 's': 1, 't': 1, 'e': 1, 'n': 1})
    print("Step 2 - Counter(s2):", c2)
    # Output: Step 2 - Counter(s2): Counter({'s': 1, 'i': 1, 'l': 1, 'e': 1, 'n': 1, 't': 1})
    # ↑ Same keys, same counts — just different order. So they're equal!
    return c1 == c2

print(is_anagram("listen", "silent"))  # Output: True
print(is_anagram("hello", "world"))    # Output: False

Bonus: Group Anagrams (Common Follow-Up)

Question: Given a list of words, group them so that all anagrams are together.

Sample Input: ["eat", "tea", "tan", "ate", "nat", "bat"] Expected Output: [['ate', 'eat', 'tea'], ['bat'], ['nat', 'tan']]

Solution 1: Brute Force — Compare Every Pair (Easy to Understand)

python — editable

def group_anagrams_brute(words):
    # For each word, compare with every other word to see if they're anagrams
    # Two words are anagrams if sorting their characters gives the same result
    used = [False] * len(words)
    result = []

    for i in range(len(words)):
        if used[i]:
            continue
        group = [words[i]]
        used[i] = True

        for j in range(i + 1, len(words)):
            if not used[j] and sorted(words[i].lower()) == sorted(words[j].lower()):
                group.append(words[j])
                used[j] = True

        result.append(sorted(group))

    print(sorted(result))
    # Output: [['ate', 'eat', 'tea'], ['bat'], ['nat', 'tan']]
    return sorted(result)

group_anagrams_brute(["eat", "tea", "tan", "ate", "nat", "bat"])

Solution 2: Plain Dict — Manual Key Check (Intermediate)

python — editable

def group_anagrams_dict(words):
    # Same idea as brute force BUT smarter:
    # Sort each word → use it as a dictionary key → group words with same key
    groups = {}

    for word in words:
        key = ''.join(sorted(word.lower()))
        # Manual check: if key doesn't exist yet, create empty list
        if key not in groups:
            groups[key] = []
        groups[key].append(word)

    print("Groups dict:", groups)
    # Output: Groups dict: {'aet': ['eat', 'tea', 'ate'], 'ant': ['tan', 'nat'], 'abt': ['bat']}

    return sorted([sorted(g) for g in groups.values()])

print(group_anagrams_dict(["eat", "tea", "tan", "ate", "nat", "bat"]))
# Output: [['ate', 'eat', 'tea'], ['bat'], ['nat', 'tan']]

Solution 3: defaultdict — Optimal / Pythonic (Interview Answer)

python — editable

from collections import defaultdict

def group_anagrams(words):
    # defaultdict(list) auto-creates empty list for new keys — no if-check needed
    groups = defaultdict(list)

    for word in words:
        # Step 1: sorted() splits word into individual chars and sorts them
        sorted_chars = sorted(word.lower())
        print(f"  sorted('{word}') → {sorted_chars}")
        # Output:   sorted('eat') → ['a', 'e', 't']
        # Output:   sorted('tea') → ['a', 'e', 't']   ← same as 'eat'!
        # Output:   sorted('tan') → ['a', 'n', 't']
        # Output:   sorted('ate') → ['a', 'e', 't']   ← same as 'eat'!
        # Output:   sorted('nat') → ['a', 'n', 't']   ← same as 'tan'!
        # Output:   sorted('bat') → ['a', 'b', 't']

        # Step 2: join() glues the sorted chars back into one string — becomes dict KEY
        key = ''.join(sorted_chars)
        print(f"  ''.join({sorted_chars}) → '{key}'")
        # Output:   ''.join(['a', 'e', 't']) → 'aet'

        # Step 3: Words with the same key land in the same bucket
        groups[key].append(word)

    # Let's see what the groups dict looks like inside
    print("\nGroups dict:", dict(sorted(groups.items())))
    # Output: Groups dict: {'abt': ['bat'], 'aet': ['eat', 'tea', 'ate'], 'ant': ['tan', 'nat']}
    # ↑ 'eat', 'tea', 'ate' all sorted to 'aet' — so they grouped together!

    # Step 4: Extract the grouped lists, sort each group and the overall result
    result = sorted([sorted(g) for g in groups.values()])
    print("Final result:", result)
    # Output: Final result: [['ate', 'eat', 'tea'], ['bat'], ['nat', 'tan']]
    return result

group_anagrams(["eat", "tea", "tan", "ate", "nat", "bat"])

Interview Tip: "Start with the sorted-key insight, then show defaultdict for clean grouping. Sorting each word is O(k log k) where k is word length, total O(n × k log k). Mention that Counter as key works too but sorted string is simpler to explain."

What NOT to Say: "I'll check if every character in s1 exists in s2" — This doesn't handle frequency. "aab" and "abb" would wrongly match.

Q03 — Longest Substring Without Repeating Characters

Question: Given a string, find the length of the longest contiguous substring where no character appears more than once.

Sample Input: "abcabcbb" Expected Output: 3 (the substring is "abc")

Sample Input: "bbbbb" Expected Output: 1 (the substring is "b")

Sample Input: "pwwkew" Expected Output: 3 (the substring is "wke")

Solution 1: Brute Force — Check All Substrings (Easy to Understand)

python — editable

def longest_unique_brute(s):
    # Generate every possible substring
    # Check if it has all unique characters
    # Track the longest one
    max_len = 0
    for i in range(len(s)):
        for j in range(i + 1, len(s) + 1):
            substring = s[i:j]
            if len(substring) == len(set(substring)):
                max_len = max(max_len, len(substring))
    return max_len

print(longest_unique_brute("abcabcbb"))  # Output: 3
print(longest_unique_brute("bbbbb"))     # Output: 1
print(longest_unique_brute("pwwkew"))    # Output: 3

Solution 2: Sliding Window with a Set (Intermediate)

python — editable

def longest_unique_set(s):
    # Use two pointers (left, right) to define a window
    # Expand right — if duplicate found, shrink from left
    char_set = set()
    left = 0
    max_len = 0

    for right in range(len(s)):
        # If char already in window, remove from left until it's gone
        while s[right] in char_set:
            char_set.remove(s[left])
            left += 1
        char_set.add(s[right])
        max_len = max(max_len, right - left + 1)

    return max_len

print(longest_unique_set("abcabcbb"))  # Output: 3
print(longest_unique_set("bbbbb"))     # Output: 1
print(longest_unique_set("pwwkew"))    # Output: 3

Solution 3: Sliding Window with a HashMap — Optimal (Interview Answer)

python — editable

def longest_unique_substring(s):
    # Store the last index of each character in a dict
    # When we see a duplicate, jump left pointer past the previous occurrence
    # This avoids the inner while-loop of Solution 2 — true O(n)
    char_index = {}
    left = 0
    max_len = 0

    for right, char in enumerate(s):
        if char in char_index and char_index[char] >= left:
            left = char_index[char] + 1
        char_index[char] = right
        max_len = max(max_len, right - left + 1)

    print("Step 1 - char_index map:", char_index)
    # Output: Step 1 - char_index map: {'a': 3, 'b': 7, 'c': 5}
    # ↑ Each char stores its LAST seen index — this is the jump-to table
    print("Step 2 - max window length:", max_len)
    # Output: Step 2 - max window length: 3

    return max_len

print(longest_unique_substring("abcabcbb"))  # Output: 3
print(longest_unique_substring("bbbbb"))     # Output: 1
print(longest_unique_substring("pwwkew"))    # Output: 3

Interview Tip: "This is the classic sliding window pattern. O(n) time, O(min(n, alphabet_size)) space. Mention 'two pointer' — interviewers love that term."

What NOT to Say: "I'll check every possible substring" — That's O(n^3). Sliding window is O(n).

Q04 — Reverse a String / Reverse Words in a Sentence

Question: Given a string, reverse its characters. Also given a sentence, reverse the order of words while keeping each word's characters intact.

Sample Input (reverse characters): "hello" Expected Output: "olleh"

Sample Input (reverse words): "Data Engineering is fun" Expected Output: "fun is Engineering Data"

Solution 1: Two-Pointer Swap — Manual Approach (Easy to Understand)

python — editable

def reverse_string_manual(s):
    # Convert to list since strings are immutable in Python
    # Swap characters from both ends moving inward
    chars = list(s)
    left, right = 0, len(chars) - 1
    while left < right:
        chars[left], chars[right] = chars[right], chars[left]
        left += 1
        right -= 1
    return ''.join(chars)

print(reverse_string_manual("hello"))
# Output: olleh

def reverse_words_manual(sentence):
    # Split into words, reverse the list, join back
    words = sentence.split()
    reversed_words = []
    for i in range(len(words) - 1, -1, -1):
        reversed_words.append(words[i])
    return ' '.join(reversed_words)

print(reverse_words_manual("Data Engineering is fun"))
# Output: fun is Engineering Data

Solution 2: Using Built-in reversed() Function (Intermediate)

python — editable

def reverse_string_builtin(s):
    # reversed() returns an iterator — join it back into a string
    return ''.join(reversed(s))

print(reverse_string_builtin("hello"))
# Output: olleh

def reverse_words_builtin(sentence):
    # Split into words, reverse the list with reversed(), rejoin
    return ' '.join(reversed(sentence.split()))

print(reverse_words_builtin("Data Engineering is fun"))
# Output: fun is Engineering Data

Solution 3: Slice Notation — Pythonic (Interview Answer)

python — editable

# Reverse characters — [::-1] is the idiomatic Python way
s = "hello"
print("Step 1 - Original:", s)
# Output: Step 1 - Original: hello
print("Step 2 - Reversed:", s[::-1])
# Output: Step 2 - Reversed: olleh

# Reverse words in a sentence
sentence = "Data Engineering is fun"
words = sentence.split()
print("Step 1 - split():", words)
# Output: Step 1 - split(): ['Data', 'Engineering', 'is', 'fun']
print("Step 2 - Reversed:", words[::-1])
# Output: Step 2 - Reversed: ['fun', 'is', 'Engineering', 'Data']
print("Step 3 - Joined:", ' '.join(words[::-1]))
# Output: Step 3 - Joined: fun is Engineering Data

# Bonus: reverse each word individually (keep word order)
print(' '.join(word[::-1] for word in sentence.split()))
# Output: ataD gnireenignE si nuf

Interview Tip: "Show the Pythonic way first (slicing), then the two-pointer approach to prove you understand the underlying algorithm. Mention that strings are immutable in Python so you must convert to a list for in-place operations."

What NOT to Say: "Strings are mutable in Python so I'll swap in-place" — Strings are IMMUTABLE in Python. You must convert to a list first.

Q05 — Count Character Frequency / Most Common Character

Question: Given a string, count how many times each character appears and find the most frequent character(s).

Sample Input: "data engineering" Expected Output (frequency): {'a': 2, 'd': 1, 'e': 2, 'g': 2, 'i': 2, 'n': 2, 'r': 1, 't': 1} (spaces excluded, sorted) Expected Output (most common): 'a' (first non-repeating: 'd')

Solution 1: Manual Dictionary Count (Easy to Understand)

python — editable

def char_frequency_manual(s):
    # Walk through each character, tally in a dictionary
    # Skip spaces since we only care about letters
    freq = {}
    for char in s:
        if char != ' ':
            freq[char] = freq.get(char, 0) + 1
    return dict(sorted(freq.items()))

print(char_frequency_manual("data engineering"))
# Output: {'a': 2, 'd': 1, 'e': 2, 'g': 2, 'i': 2, 'n': 2, 'r': 1, 't': 1}

# Find most common character manually
def most_common_manual(s):
    freq = char_frequency_manual(s)
    return sorted(freq.items(), key=lambda x: (-x[1], x[0]))[0]

print(most_common_manual("data engineering"))
# Output: ('a', 2)

Solution 2: Using defaultdict (Intermediate)

python — editable

from collections import defaultdict

def char_frequency_dd(s):
    # defaultdict(int) initializes missing keys to 0
    # Cleaner than using .get() with a default
    freq = defaultdict(int)
    for char in s:
        if char != ' ':
            freq[char] += 1
    return dict(sorted(freq.items()))

print(char_frequency_dd("data engineering"))
# Output: {'a': 2, 'd': 1, 'e': 2, 'g': 2, 'i': 2, 'n': 2, 'r': 1, 't': 1}

Solution 3: Counter — Optimal / Pythonic (Interview Answer)

python — editable

from collections import Counter

s = "data engineering"
counts = Counter(s.replace(' ', ''))
print("Step 1 - Raw Counter:", counts)
# Output: Step 1 - Raw Counter: Counter({'a': 2, 'n': 2, 'e': 2, 'g': 2, 'i': 2, 'd': 1, 't': 1, 'r': 1})
# ↑ Now we can SEE which chars appear once vs multiple times

# Full frequency sorted by key
print("Step 2 - Sorted:", dict(sorted(counts.items())))
# Output: Step 2 - Sorted: {'a': 2, 'd': 1, 'e': 2, 'g': 2, 'i': 2, 'n': 2, 'r': 1, 't': 1}

# Top 3 most common characters
print("Step 3 - Top 3:", counts.most_common(3))
# Output: Step 3 - Top 3: [('a', 2), ('n', 2), ('e', 2)]

# First non-repeating character
# Iterate the original string (preserves order) but look up counts in Counter
first_unique = next((c for c in s.replace(' ', '') if counts[c] == 1), None)
print("Step 4 - First unique:", first_unique)
# Output: Step 4 - First unique: d

Interview Tip: "Counter.most_common() is the go-to. For 'first non-repeating character', iterate the original string to preserve insertion order, but look up counts in the Counter. This is O(n)."

What NOT to Say: "I'll use s.count(c) inside a loop" — That's O(n^2) because count() scans the entire string each time. Build a Counter first (O(n)), then query it.

Q06 — Check if a String Is a Palindrome

Question: A palindrome reads the same forwards and backwards. Check if a given string is a palindrome, ignoring case and non-alphanumeric characters.

Sample Input: "A man, a plan, a canal: Panama" Expected Output: True

Sample Input: "race a car" Expected Output: False

Solution 1: Clean and Reverse with a Loop (Easy to Understand)

python — editable

def is_palindrome_loop(s):
    # First strip out anything that isn't a letter or digit
    # Then compare character by character from both ends
    cleaned = ''
    for c in s:
        if c.isalnum():
            cleaned += c.lower()

    for i in range(len(cleaned) // 2):
        if cleaned[i] != cleaned[len(cleaned) - 1 - i]:
            return False
    return True

print(is_palindrome_loop("A man, a plan, a canal: Panama"))  # Output: True
print(is_palindrome_loop("race a car"))                      # Output: False

Solution 2: Two-Pointer Without Extra String (Intermediate)

python — editable

def is_palindrome_twoptr(s):
    # Use two pointers from both ends, skip non-alphanumeric chars
    # This avoids creating a cleaned copy of the string
    left, right = 0, len(s) - 1
    while left < right:
        while left < right and not s[left].isalnum():
            left += 1
        while left < right and not s[right].isalnum():
            right -= 1
        if s[left].lower() != s[right].lower():
            return False
        left += 1
        right -= 1
    return True

print(is_palindrome_twoptr("A man, a plan, a canal: Panama"))  # Output: True
print(is_palindrome_twoptr("race a car"))                      # Output: False

Solution 3: Slice Comparison — Pythonic (Interview Answer)

python — editable

def is_palindrome(s):
    # Clean: keep only alphanumeric, lowercase everything
    # Then compare string with its reverse using slicing
    cleaned = ''.join(c.lower() for c in s if c.isalnum())
    print("Step 1 - Cleaned:", cleaned)
    # Output: Step 1 - Cleaned: amanaplanacanalpanama
    print("Step 2 - Reversed:", cleaned[::-1])
    # Output: Step 2 - Reversed: amanaplanacanalpanama
    # ↑ Same forwards and backwards — it's a palindrome!
    return cleaned == cleaned[::-1]

print(is_palindrome("A man, a plan, a canal: Panama"))  # Output: True
print(is_palindrome("race a car"))                      # Output: False

Interview Tip: "Always clean the string first — remove non-alphanumeric characters and lowercase everything. Then compare with reversed. Mention the two-pointer approach uses O(1) extra space."

What NOT to Say: "madam is not a palindrome because M and m are different" — Normalize case first! Always clarify with the interviewer whether comparison is case-sensitive.

Q07 — String Compression (Run Length Encoding)

Question: Implement basic string compression using the counts of repeated characters. If the compressed string is not shorter than the original, return the original.

Sample Input: "aabcccccaaa" Expected Output: "a2b1c5a3"

Sample Input: "abcdef" Expected Output: "abcdef" (compressed would be "a1b1c1d1e1f1", which is longer)

Solution 1: Index-Based Comparison (Easy to Understand)

python — editable

def compress_basic(s):
    if not s:
        return s
    # Walk through string comparing each char with the next
    # When the character changes, record the char and its count
    result = ""
    count = 1
    for i in range(1, len(s)):
        if s[i] == s[i - 1]:
            count += 1
        else:
            result += s[i - 1] + str(count)
            count = 1
    # Don't forget the last group of characters
    result += s[-1] + str(count)
    return result if len(result) < len(s) else s

print(compress_basic("aabcccccaaa"))  # Output: a2b1c5a3
print(compress_basic("abcdef"))       # Output: abcdef

Solution 2: Using a List for Efficient Concatenation (Intermediate)

python — editable

def compress_list(s):
    if not s:
        return s
    # String concatenation in a loop creates a new string each time — O(n^2)
    # Using a list and joining at the end is O(n)
    result = []
    count = 1

    for i in range(1, len(s)):
        if s[i] == s[i - 1]:
            count += 1
        else:
            result.append(f"{s[i - 1]}{count}")
            count = 1
    result.append(f"{s[-1]}{count}")

    compressed = ''.join(result)
    return compressed if len(compressed) < len(s) else s

print(compress_list("aabcccccaaa"))  # Output: a2b1c5a3
print(compress_list("abcdef"))       # Output: abcdef

Solution 3: Using itertools.groupby — Pythonic (Interview Answer)

python — editable

from itertools import groupby

def compress(s):
    if not s:
        return s
    # groupby clusters consecutive identical characters together
    # Each group gives us the character and an iterator of occurrences
    groups = [(char, sum(1 for _ in group)) for char, group in groupby(s)]
    print("Step 1 - Groups:", groups)
    # Output: Step 1 - Groups: [('a', 2), ('b', 1), ('c', 5), ('a', 3)]
    # ↑ Each tuple = (character, consecutive count)

    compressed = ''.join(f"{char}{count}" for char, count in groups)
    print("Step 2 - Compressed:", compressed)
    # Output: Step 2 - Compressed: a2b1c5a3

    return compressed if len(compressed) < len(s) else s

print(compress("aabcccccaaa"))  # Output: a2b1c5a3
print(compress("abcdef"))       # Output: abcdef

Interview Tip: "Remember to handle the last group (after the loop ends in manual solutions). And always return the original if compressed isn't shorter. Mention that using a list instead of string concatenation is O(n) vs O(n^2)."

What NOT to Say: "I'll just concatenate strings in a loop" — String concatenation in Python creates a new string object each time, making it O(n^2). Use a list and join.

Q08 — Two Sum Problem

Question: Given a list of integers and a target sum, find two numbers that add up to the target. Return their indices.

Sample Input: nums = [2, 7, 11, 15], target = 9 Expected Output: [0, 1] (because nums[0] + nums[1] = 2 + 7 = 9)

Sample Input: nums = [3, 2, 4], target = 6 Expected Output: [1, 2]

Solution 1: Brute Force — Nested Loops (Easy to Understand)

python — editable

def two_sum_brute(nums, target):
    # Try every possible pair of numbers
    # If their sum equals the target, return both indices
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                return [i, j]
    return []

print(two_sum_brute([2, 7, 11, 15], 9))  # Output: [0, 1]
print(two_sum_brute([3, 2, 4], 6))        # Output: [1, 2]

Solution 2: Sort + Two Pointers (Intermediate)

python — editable

def two_sum_sorted(nums, target):
    # Create index-value pairs, sort by value
    # Use two pointers from both ends
    indexed = sorted(enumerate(nums), key=lambda x: x[1])
    left, right = 0, len(indexed) - 1

    while left < right:
        current_sum = indexed[left][1] + indexed[right][1]
        if current_sum == target:
            # Return original indices, sorted for consistency
            return sorted([indexed[left][0], indexed[right][0]])
        elif current_sum < target:
            left += 1
        else:
            right -= 1
    return []

print(two_sum_sorted([2, 7, 11, 15], 9))  # Output: [0, 1]
print(two_sum_sorted([3, 2, 4], 6))        # Output: [1, 2]

Solution 3: Hash Map — Optimal (Interview Answer)

python — editable

def two_sum(nums, target):
    # For each number, compute its complement (target - num)
    # If the complement is already in our hash map, we found the pair
    # Otherwise, store this number's index for future lookups
    seen = {}
    for i, num in enumerate(nums):
        complement = target - num
        print(f"Step {i+1} - num={num}, complement={complement}, seen={seen}")
        # Output (iteration 1): Step 1 - num=2, complement=7, seen={}
        # Output (iteration 2): Step 2 - num=7, complement=2, seen={2: 0}
        # ↑ complement 2 IS in seen — match found! Return [0, 1]
        if complement in seen:
            return [seen[complement], i]
        seen[num] = i
    return []

print(two_sum([2, 7, 11, 15], 9))  # Output: [0, 1]
print(two_sum([3, 2, 4], 6))        # Output: [1, 2]

Interview Tip: "One-pass hash map solution. O(n) time, O(n) space. Always mention the time-space tradeoff: brute force uses O(1) space but O(n^2) time, hash map uses O(n) space for O(n) time."

What NOT to Say: "I'll use two nested loops" — That's O(n^2). Hash map is O(n). The interviewer expects you to know the optimal approach.

Q09 — FizzBuzz

Question: Print numbers from 1 to n. For multiples of 3 print "Fizz", for multiples of 5 print "Buzz", for multiples of both print "FizzBuzz".

Sample Input: n = 15 Expected Output: ['1', '2', 'Fizz', '4', 'Buzz', 'Fizz', '7', '8', 'Fizz', 'Buzz', '11', 'Fizz', '13', '14', 'FizzBuzz']

Solution 1: If-Elif Chain (Easy to Understand)

python — editable

def fizzbuzz_basic(n):
    # Check divisibility in the right order:
    # 15 first (both 3 and 5), then 3, then 5, then plain number
    result = []
    for i in range(1, n + 1):
        if i % 15 == 0:
            result.append("FizzBuzz")
        elif i % 3 == 0:
            result.append("Fizz")
        elif i % 5 == 0:
            result.append("Buzz")
        else:
            result.append(str(i))
    return result

print(fizzbuzz_basic(15))
# Output: ['1', '2', 'Fizz', '4', 'Buzz', 'Fizz', '7', '8', 'Fizz', 'Buzz', '11', 'Fizz', '13', '14', 'FizzBuzz']

Solution 2: String Concatenation — Avoids Hardcoding 15 (Intermediate)

python — editable

def fizzbuzz_concat(n):
    # Build the output string by concatenating "Fizz" and "Buzz" separately
    # This scales better if new rules are added (e.g., "Jazz" for 7)
    result = []
    for i in range(1, n + 1):
        output = ""
        if i % 3 == 0:
            output += "Fizz"
        if i % 5 == 0:
            output += "Buzz"
        result.append(output if output else str(i))
    return result

print(fizzbuzz_concat(15))
# Output: ['1', '2', 'Fizz', '4', 'Buzz', 'Fizz', '7', '8', 'Fizz', 'Buzz', '11', 'Fizz', '13', '14', 'FizzBuzz']

Solution 3: List Comprehension — Pythonic (Interview Answer)

python — editable

def fizzbuzz(n):
    # One-liner using conditional expressions in a list comprehension
    # Compact but still readable — shows Python fluency
    result = [
        'FizzBuzz' if i % 15 == 0
        else 'Fizz' if i % 3 == 0
        else 'Buzz' if i % 5 == 0
        else str(i)
        for i in range(1, n + 1)
    ]
    print("Step 1 - First 5:", result[:5])
    # Output: Step 1 - First 5: ['1', '2', 'Fizz', '4', 'Buzz']
    print("Step 2 - Last 3:", result[-3:])
    # Output: Step 2 - Last 3: ['13', '14', 'FizzBuzz']
    # ↑ 15 hits both % 3 and % 5, so it becomes 'FizzBuzz'
    return result

print(fizzbuzz(15))
# Output: ['1', '2', 'Fizz', '4', 'Buzz', 'Fizz', '7', '8', 'Fizz', 'Buzz', '11', 'Fizz', '13', '14', 'FizzBuzz']

Interview Tip: "Check 15 FIRST (not 3 then 5 separately with elif). Show the clean version first, then mention the concatenation approach scales better for additional rules. The one-liner shows Python fluency."

What NOT to Say: "if i % 3 == 0 and i % 5 == 0" as the LAST check — Order matters! If you check 3 first with elif, 15 prints "Fizz" instead of "FizzBuzz".

Q10 — Remove Duplicates from a List (Preserve Order)

Question: Remove duplicate elements from a list while keeping the first occurrence's order.

Sample Input: [1, 3, 2, 3, 1, 5, 2] Expected Output: [1, 3, 2, 5]

Solution 1: Nested Check with a New List (Easy to Understand)

python — editable

def remove_dupes_basic(lst):
    # Walk through the list and only add items we haven't seen yet
    result = []
    for item in lst:
        if item not in result:
            result.append(item)
    return result

print(remove_dupes_basic([1, 3, 2, 3, 1, 5, 2]))
# Output: [1, 3, 2, 5]

Solution 2: Using a Set for O(1) Lookups (Intermediate)

python — editable

def remove_dupes_set(lst):
    # 'item not in result' in Solution 1 is O(n) — making overall O(n^2)
    # Using a set for lookups makes each check O(1) — overall O(n)
    seen = set()
    result = []
    for item in lst:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

print(remove_dupes_set([1, 3, 2, 3, 1, 5, 2]))
# Output: [1, 3, 2, 5]

Solution 3: dict.fromkeys — Pythonic (Interview Answer)

python — editable

def remove_dupes(lst):
    # dict.fromkeys preserves insertion order (Python 3.7+)
    # Keys are unique by definition, so duplicates are dropped
    as_dict = dict.fromkeys(lst)
    print("Step 1 - dict.fromkeys:", as_dict)
    # Output: Step 1 - dict.fromkeys: {1: None, 3: None, 2: None, 5: None}
    # ↑ Duplicates gone! Keys kept first-occurrence order. Values are None.
    print("Step 2 - Extract keys:", list(as_dict))
    # Output: Step 2 - Extract keys: [1, 3, 2, 5]
    return list(as_dict)

print(remove_dupes([1, 3, 2, 3, 1, 5, 2]))
# Output: [1, 3, 2, 5]

Interview Tip: "list(set(lst)) removes duplicates but DESTROYS order. Always mention you're preserving order — it shows attention to detail. The dict.fromkeys trick is clean and Pythonic."

What NOT to Say: "Just use set()" — Sets are unordered. The interviewer is testing if you know this distinction.

Q11 — Flatten a Nested List

Question: Convert a nested list of arbitrary depth into a flat one-dimensional list.

Sample Input: [1, [2, [3, 4], 5], 6] Expected Output: [1, 2, 3, 4, 5, 6]

Solution 1: Recursive Approach (Easy to Understand)

python — editable

def flatten_recursive(lst):
    # Walk through each element
    # If it's a list, recursively flatten it and add all elements
    # If it's not a list, just add it directly
    result = []
    for item in lst:
        if isinstance(item, list):
            result.extend(flatten_recursive(item))
        else:
            result.append(item)
    return result

print(flatten_recursive([1, [2, [3, 4], 5], 6]))
# Output: [1, 2, 3, 4, 5, 6]

Solution 2: Iterative with a Stack (Intermediate)

python — editable

def flatten_iterative(lst):
    # Use a stack to avoid recursion (avoids stack overflow on deep nesting)
    # Process from right to left so output is in correct order
    stack = list(lst)
    result = []
    while stack:
        item = stack.pop(0)
        if isinstance(item, list):
            # Insert sub-items at the front of the stack to preserve order
            stack = item + stack
        else:
            result.append(item)
    return result

print(flatten_iterative([1, [2, [3, 4], 5], 6]))
# Output: [1, 2, 3, 4, 5, 6]

Solution 3: Generator-Based — Pythonic (Interview Answer)

python — editable

def flatten(lst):
    # Generator function yields items one at a time
    # Uses 'yield from' for recursive flattening — clean and memory-efficient
    for item in lst:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item

data = [1, [2, [3, 4], 5], 6]
gen = flatten(data)
print("Step 1 - Generator object:", gen)
# Output: Step 1 - Generator object: <generator object flatten at 0x...>
# ↑ Nothing computed yet — it's lazy!
result = list(gen)
print("Step 2 - Consumed:", result)
# Output: Step 2 - Consumed: [1, 2, 3, 4, 5, 6]

# Bonus: one-level flatten with list comprehension (when depth = 1)
nested = [[1, 2], [3, 4], [5, 6]]
flat = [x for sublist in nested for x in sublist]
print(flat)
# Output: [1, 2, 3, 4, 5, 6]

Interview Tip: "Ask the interviewer: is it one level deep or arbitrarily nested? One-level = list comprehension. Arbitrary = recursion. Mentioning yield from shows you understand Python generators."

What NOT to Say: "I'll just use a list comprehension" — That only works for one-level nesting. Arbitrary depth requires recursion or a stack.

Q12 — Find Missing Number in a List (1 to N)

Question: Given a list of n-1 distinct numbers taken from the range 1 to n, find the one missing number.

Sample Input: nums = [1, 2, 4, 5, 6], n = 6 Expected Output: 3

Solution 1: Sort and Scan (Easy to Understand)

python — editable

def find_missing_sort(nums, n):
    # Sort the list, then walk through looking for a gap
    # When nums[i] doesn't match expected value (i+1), that's the missing one
    nums_sorted = sorted(nums)
    for i in range(len(nums_sorted)):
        if nums_sorted[i] != i + 1:
            return i + 1
    # If no gap found, the missing number is n (the last one)
    return n

print(find_missing_sort([1, 2, 4, 5, 6], 6))
# Output: 3

Solution 2: Math Formula — Sum Difference (Intermediate)

python — editable

def find_missing_math(nums, n):
    # The sum of 1 to n = n*(n+1)/2
    # Subtract the actual sum of the list — the difference is the missing number
    expected = n * (n + 1) // 2
    actual = sum(nums)
    return expected - actual

print(find_missing_math([1, 2, 4, 5, 6], 6))
# Output: 3

Solution 3: XOR — Optimal / No Overflow Risk (Interview Answer)

python — editable

def find_missing_xor(nums, n):
    # XOR has the property: a ^ a = 0 and a ^ 0 = a
    # XOR all numbers from 1 to n, then XOR with all numbers in the list
    # Every number that appears in both cancels out, leaving only the missing one
    xor = 0
    for i in range(1, n + 1):
        xor ^= i
    print("Step 1 - XOR of 1..n:", xor)
    # Output: Step 1 - XOR of 1..n: 7
    # ↑ 1^2^3^4^5^6 = 7

    for num in nums:
        xor ^= num
    print("Step 2 - After XOR with list:", xor)
    # Output: Step 2 - After XOR with list: 3
    # ↑ All paired numbers cancelled out, only 3 (the missing one) remains

    return xor

print(find_missing_xor([1, 2, 4, 5, 6], 6))
# Output: 3

Bonus: Set Difference

python — editable

def find_missing_set(nums, n):
    # Create the full set and subtract the given numbers
    return (set(range(1, n + 1)) - set(nums)).pop()

print(find_missing_set([1, 2, 4, 5, 6], 6))
# Output: 3

Interview Tip: "Math formula (n*(n+1)/2) is the classic answer. Mention XOR as a bonus — it avoids integer overflow in languages like Java/C++ where that matters. In Python, integers have arbitrary precision so overflow isn't an issue, but mentioning it shows cross-language awareness."

What NOT to Say: "I'll sort and scan" — That's O(n log n). The math and XOR approaches are both O(n) time and O(1) space.

Q13 — Matrix / 2D List Operations

Question: Perform common operations on a 2D list: transpose, flatten, and rotate 90 degrees clockwise.

Sample Input:

matrix = [

[1, 2, 3],

[4, 5, 6],

[7, 8, 9]

]

Expected Output (transpose): [[1, 4, 7], [2, 5, 8], [3, 6, 9]] Expected Output (flatten): [1, 2, 3, 4, 5, 6, 7, 8, 9] Expected Output (rotate 90 CW): [[7, 4, 1], [8, 5, 2], [9, 6, 3]]

Solution 1: Manual Loops (Easy to Understand)

python — editable

matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]

# Transpose: swap rows and columns using nested loops
def transpose_manual(m):
    rows, cols = len(m), len(m[0])
    # Create a new matrix where result[j][i] = m[i][j]
    result = []
    for j in range(cols):
        new_row = []
        for i in range(rows):
            new_row.append(m[i][j])
        result.append(new_row)
    return result

print(transpose_manual(matrix))
# Output: [[1, 4, 7], [2, 5, 8], [3, 6, 9]]

# Flatten: just iterate all rows and all elements
def flatten_manual(m):
    result = []
    for row in m:
        for val in row:
            result.append(val)
    return result

print(flatten_manual(matrix))
# Output: [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Rotate 90 CW: transpose then reverse each row
def rotate_manual(m):
    t = transpose_manual(m)
    return [row[::-1] for row in t]

print(rotate_manual(matrix))
# Output: [[7, 4, 1], [8, 5, 2], [9, 6, 3]]

Solution 2: List Comprehension (Intermediate)

python — editable

matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]

# Transpose using list comprehension
transposed = [[matrix[i][j] for i in range(len(matrix))]
              for j in range(len(matrix[0]))]
print(transposed)
# Output: [[1, 4, 7], [2, 5, 8], [3, 6, 9]]

# Flatten using list comprehension
flat = [x for row in matrix for x in row]
print(flat)
# Output: [1, 2, 3, 4, 5, 6, 7, 8, 9]

Solution 3: zip(*matrix) — Pythonic (Interview Answer)

python — editable

matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]

# Transpose: zip(*matrix) unpacks each row as an argument to zip
# zip groups the first elements, second elements, etc.
print("Step 1 - Raw zip tuples:", list(zip(*matrix)))
# Output: Step 1 - Raw zip tuples: [(1, 4, 7), (2, 5, 8), (3, 6, 9)]
# ↑ zip returns tuples — we convert each to a list
transposed = [list(row) for row in zip(*matrix)]
print("Step 2 - Transposed:", transposed)
# Output: Step 2 - Transposed: [[1, 4, 7], [2, 5, 8], [3, 6, 9]]

# Flatten
flat = [x for row in matrix for x in row]
print(flat)
# Output: [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Rotate 90 clockwise: reverse the rows first, then transpose
print("Step 1 - Reversed rows:", matrix[::-1])
# Output: Step 1 - Reversed rows: [[7, 8, 9], [4, 5, 6], [1, 2, 3]]
# matrix[::-1] reverses row order, then zip groups columns
rotated = [list(row) for row in zip(*matrix[::-1])]
print("Step 2 - Rotated 90 CW:", rotated)
# Output: Step 2 - Rotated 90 CW: [[7, 4, 1], [8, 5, 2], [9, 6, 3]]

Interview Tip: "zip(*matrix) for transpose is a classic Python trick. The * unpacks each row as a separate argument to zip. For rotation, remember: 90 CW = reverse rows then transpose."

What NOT to Say: "I'll use numpy for this" — Unless the interviewer allows external libraries, stick to pure Python. Show you understand the underlying logic.

Q14 — Dictionary Manipulation (Merge, Invert, Sort)

Question: Perform common dictionary operations: merge two dictionaries, invert a dictionary (swap keys and values), and sort by value.

Sample Input (merge): d1 = {'a': 1, 'b': 2}, d2 = {'b': 3, 'c': 4} Expected Output (merge): {'a': 1, 'b': 3, 'c': 4} (d2 overwrites d1 on conflict)

Sample Input (invert): {'a': 1, 'b': 2, 'c': 3} Expected Output (invert): {1: 'a', 2: 'b', 3: 'c'}

Sample Input (sort by value): {'alice': 85, 'bob': 92, 'charlie': 78} Expected Output (sort desc): {'bob': 92, 'alice': 85, 'charlie': 78}

Solution 1: Manual Loops (Easy to Understand)

python — editable

# Merge: copy d1, then update with d2
def merge_manual(d1, d2):
    # Start with a copy of d1 so we don't mutate the original
    # Then add/overwrite with all key-value pairs from d2
    result = {}
    for k, v in d1.items():
        result[k] = v
    for k, v in d2.items():
        result[k] = v
    return result

print(merge_manual({'a': 1, 'b': 2}, {'b': 3, 'c': 4}))
# Output: {'a': 1, 'b': 3, 'c': 4}

# Invert: swap keys and values
def invert_manual(d):
    result = {}
    for k, v in d.items():
        result[v] = k
    return result

print(invert_manual({'a': 1, 'b': 2, 'c': 3}))
# Output: {1: 'a', 2: 'b', 3: 'c'}

# Sort by value descending
def sort_by_value_manual(d):
    # Convert to list of tuples, sort by second element, rebuild dict
    pairs = list(d.items())
    pairs.sort(key=lambda x: x[1], reverse=True)
    return dict(pairs)

print(sort_by_value_manual({'alice': 85, 'bob': 92, 'charlie': 78}))
# Output: {'bob': 92, 'alice': 85, 'charlie': 78}

Solution 2: Using dict.update and comprehensions (Intermediate)

python — editable

# Merge with update()
def merge_update(d1, d2):
    # dict.update modifies in place, so copy first
    result = d1.copy()
    result.update(d2)
    return result

print(merge_update({'a': 1, 'b': 2}, {'b': 3, 'c': 4}))
# Output: {'a': 1, 'b': 3, 'c': 4}

# Invert with dict comprehension
original = {'a': 1, 'b': 2, 'c': 3}
inverted = {v: k for k, v in original.items()}
print(inverted)
# Output: {1: 'a', 2: 'b', 3: 'c'}

# Sort by value with sorted()
scores = {'alice': 85, 'bob': 92, 'charlie': 78}
sorted_scores = dict(sorted(scores.items(), key=lambda x: x[1], reverse=True))
print(sorted_scores)
# Output: {'bob': 92, 'alice': 85, 'charlie': 78}

Solution 3: Modern Python Operators — Pythonic (Interview Answer)

python — editable

# Merge with | operator (Python 3.9+)
d1 = {'a': 1, 'b': 2}
d2 = {'b': 3, 'c': 4}
print("Step 1 - d1:", d1, "| d2:", d2)
# Output: Step 1 - d1: {'a': 1, 'b': 2} | d2: {'b': 3, 'c': 4}
merged = d1 | d2  # d2 values overwrite d1 on key conflict
print("Step 2 - Merged:", merged)
# Output: Step 2 - Merged: {'a': 1, 'b': 3, 'c': 4}
# ↑ Key 'b' conflict: d2's value (3) wins over d1's value (2)

# For Python 3.5+, use unpacking
merged_compat = {**d1, **d2}
print(merged_compat)
# Output: {'a': 1, 'b': 3, 'c': 4}

# Invert with comprehension
original = {'a': 1, 'b': 2, 'c': 3}
print("Step 1 - original.items():", list(original.items()))
# Output: Step 1 - original.items(): [('a', 1), ('b', 2), ('c', 3)]
inverted = {v: k for k, v in original.items()}
print("Step 2 - Inverted:", inverted)
# Output: Step 2 - Inverted: {1: 'a', 2: 'b', 3: 'c'}

# Sort by value
scores = {'alice': 85, 'bob': 92, 'charlie': 78}
print("Step 1 - Unsorted items:", list(scores.items()))
# Output: Step 1 - Unsorted items: [('alice', 85), ('bob', 92), ('charlie', 78)]
sorted_scores = dict(sorted(scores.items(), key=lambda x: x[1], reverse=True))
print("Step 2 - Sorted desc:", sorted_scores)
# Output: Step 2 - Sorted desc: {'bob': 92, 'alice': 85, 'charlie': 78}

# Bonus: Group by value using defaultdict
from collections import defaultdict
data = [('a', 1), ('b', 2), ('a', 3), ('b', 4)]
grouped = defaultdict(list)
for key, val in data:
    grouped[key].append(val)
print(dict(sorted(grouped.items())))
# Output: {'a': [1, 3], 'b': [2, 4]}

Interview Tip: "Know d1 | d2 for Python 3.9+ and {**d1, **d2} for older versions. Both merge with right-side precedence. For inverting, warn that duplicate values will cause key collisions — only the last one survives."

What NOT to Say: "Dictionaries are unordered" — Since Python 3.7, dictionaries maintain insertion order. This is guaranteed by the language spec, not just an implementation detail.

Q15 — List Comprehension vs Generator Expression

Question: Explain and demonstrate the difference between list comprehension and generator expression. When should you use each?

Sample Input: Compute squares of numbers 0 through 999999. Expected Output: List comprehension uses ~8 MB RAM; generator uses ~200 bytes.

Solution 1: Side-by-Side Comparison (Easy to Understand)

python — editable

200 bytes

# Both produce the same values
small_list = [x ** 2 for x in range(5)]
small_gen = (x ** 2 for x in range(5))
print(small_list)
# Output: [0, 1, 4, 9, 16]
print(list(small_gen))
# Output: [0, 1, 4, 9, 16]">import sys

# List comprehension: uses square brackets []
# Creates the ENTIRE list in memory at once
squares_list = [x ** 2 for x in range(1000000)]
list_size = sys.getsizeof(squares_list)
print(f"List size: {list_size} bytes")
# Output: List size: 8448728 bytes

# Generator expression: uses parentheses ()
# Produces items ONE AT A TIME, on demand — lazy evaluation
squares_gen = (x ** 2 for x in range(1000000))
gen_size = sys.getsizeof(squares_gen)
print(f"Generator size: {gen_size} bytes")
# Output: Generator size: 200 bytes

# Both produce the same values
small_list = [x ** 2 for x in range(5)]
small_gen = (x ** 2 for x in range(5))
print(small_list)
# Output: [0, 1, 4, 9, 16]
print(list(small_gen))
# Output: [0, 1, 4, 9, 16]

Solution 2: Showing Lazy vs Eager Behavior (Intermediate)

python — editable

# Generators are LAZY — they compute values only when asked
def noisy_square(x):
    # This function prints when called, so we can see when computation happens
    print(f"  Computing {x}^2")
    return x ** 2

# List comprehension: ALL computations happen immediately
print("List comprehension (eager):")
result_list = [noisy_square(x) for x in range(3)]
# Output:
#   Computing 0^2
#   Computing 1^2
#   Computing 2^2
print(f"Result: {result_list}")
# Output: Result: [0, 1, 4]

print()

# Generator: computations happen only when you iterate
print("Generator expression (lazy):")
result_gen = (noisy_square(x) for x in range(3))
print("Generator created, nothing computed yet.")
# Output: Generator created, nothing computed yet.
print("Now iterating:")
# Output: Now iterating:
for val in result_gen:
    print(f"  Got: {val}")
# Output:
#   Computing 0^2
#   Got: 0
#   Computing 1^2
#   Got: 1
#   Computing 2^2
#   Got: 4

Solution 3: Real-World Data Engineering Use Case (Interview Answer)

python — editable

import sys

# WHEN TO USE LIST: need random access, len(), or iterate multiple times
data_list = [x ** 2 for x in range(10)]
print(data_list[5])     # Random access works
# Output: 25
print(len(data_list))   # len() works
# Output: 10

# WHEN TO USE GENERATOR: huge data, single pass, memory matters
# Real data engineering use: processing a large CSV without loading it all
def process_large_file(filepath):
    """Process a file line by line without loading it all into memory."""
    with open(filepath) as f:
        # Generator — yields one cleaned line at a time
        # Only 1 line in memory at any point
        valid_lines = (line.strip() for line in f
                       if not line.startswith('#'))
        for line in valid_lines:
            pass  # process each line here

# Key difference: generators are single-use
gen = (x for x in range(5))
first_pass = list(gen)
second_pass = list(gen)
print(f"First pass: {first_pass}")
# Output: First pass: [0, 1, 2, 3, 4]
print(f"Second pass: {second_pass}")
# Output: Second pass: []

# Chaining generators for ETL pipelines — each stage is lazy
raw_data = range(1, 11)
print("Step 1 - Raw data:", list(raw_data))
# Output: Step 1 - Raw data: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print("Step 2 - After double:", [x * 2 for x in raw_data])
# Output: Step 2 - After double: [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
print("Step 3 - After filter >10:", [x * 2 for x in raw_data if x * 2 > 10])
# Output: Step 3 - After filter >10: [12, 14, 16, 18, 20]
# ↑ These prints show each stage — but in production, use generators:

step1 = (x * 2 for x in raw_data)        # transform: double
step2 = (x for x in step1 if x > 10)     # filter: keep > 10
step3 = (f"val={x}" for x in step2)       # format: add label
# Nothing has been computed yet — only when we consume:
print(list(step3))
# Output: ['val=12', 'val=14', 'val=16', 'val=18', 'val=20']

Interview Tip: "In data engineering, generators are critical for ETL pipelines. Saying 'I process data lazily to avoid OOM errors' shows production awareness. Key rule: use a list when you need to access data multiple times; use a generator for single-pass processing of large datasets."

What NOT to Say: "They're the same thing" — List comprehension uses [] and stores everything in memory. Generator uses () and is lazy. Also, generators are single-use — once exhausted, they produce nothing.

Python Tricky Output & Gotchas — "What Will This Print?"Next →