Python Strings & Puzzles — Data Engineer Interview Prep
Memory Map
Q01 — Find Duplicate Characters in a String
Question: Given a string, find all characters that appear more than once.
Sample Input: "programming"
Expected Output: ['g', 'm', 'r'] (sorted)
Solution 1: Brute Force — Nested Loop (Easy to Understand)
def find_duplicates_brute(s):
# Compare every character with every other character
# If they match and we haven't recorded it yet, it's a duplicate
duplicates = []
for i in range(len(s)):
for j in range(i + 1, len(s)):
if s[i] == s[j] and s[i] not in duplicates:
duplicates.append(s[i])
return sorted(duplicates)
print(find_duplicates_brute("programming"))
# Output: ['g', 'm', 'r']
Solution 2: Using a Set to Track Seen Characters (Intermediate)
def find_duplicates_set(s):
# Walk through the string one character at a time
# Keep a 'seen' set — if a char is already in seen, it's a duplicate
seen = set()
duplicates = set()
for char in s:
if char in seen:
duplicates.add(char)
seen.add(char)
return sorted(duplicates)
print(find_duplicates_set("programming"))
# Output: ['g', 'm', 'r']
Solution 3: Counter — Optimal / Pythonic (Interview Answer)
from collections import Counter
def find_duplicates(s):
# Step 1: Counter scans the string once and builds a dict of {char: count}
counts = Counter(s)
print("Counter output:", counts)
# Output: Counter output: Counter({'g': 2, 'r': 2, 'm': 2, 'p': 1, 'o': 1, 'a': 1, 'i': 1, 'n': 1})
# ↑ Now we can SEE: 'g', 'r', 'm' each appear 2 times, rest appear 1 time
# Step 2: Filter — keep only chars where count > 1
dupes = [char for char, count in counts.items() if count > 1]
print("Before sorting:", dupes)
# Output: Before sorting: ['g', 'r', 'm']
# Step 3: Sort for consistent output (dicts don't guarantee order)
return sorted(dupes)
print("Result:", find_duplicates("programming"))
# Output: Result: ['g', 'm', 'r']
Interview Tip: "Start with Counter for clarity, then offer the set-based approach — shows you know both library tools AND manual logic. Mention that Counter is O(n) time and O(k) space where k is unique characters."
What NOT to Say: "I'll use a nested loop to compare each char with every other char" — That's O(n^2). Use a hash set or Counter for O(n).
Q02 — Check if Two Strings Are Anagrams
Question: Two strings are anagrams if they contain the exact same characters in the same frequency, just rearranged. Check if two given strings are anagrams.
Sample Input: s1 = "listen", s2 = "silent"
Expected Output: True
Sample Input: s1 = "hello", s2 = "world"
Expected Output: False
Solution 1: Sort and Compare (Easy to Understand)
def is_anagram_sort(s1, s2):
# If both strings, when sorted, produce the same sequence
# then they must have the same characters in the same frequency
return sorted(s1.lower()) == sorted(s2.lower())
print(is_anagram_sort("listen", "silent")) # Output: True
print(is_anagram_sort("hello", "world")) # Output: False
Solution 2: Manual Frequency Count with a Dictionary (Intermediate)
def is_anagram_manual(s1, s2):
# Build frequency maps for both strings manually
# Then compare the two maps
s1, s2 = s1.lower(), s2.lower()
if len(s1) != len(s2):
return False
freq = {}
for char in s1:
freq[char] = freq.get(char, 0) + 1
for char in s2:
freq[char] = freq.get(char, 0) - 1
# If all counts are zero, strings are anagrams
return all(v == 0 for v in freq.values())
print(is_anagram_manual("listen", "silent")) # Output: True
print(is_anagram_manual("hello", "world")) # Output: False
Solution 3: Counter — Optimal / Pythonic (Interview Answer)
from collections import Counter
def is_anagram(s1, s2):
# Counter builds a frequency dict — comparing two Counters is O(n)
c1 = Counter(s1.lower())
c2 = Counter(s2.lower())
print("Step 1 - Counter(s1):", c1)
# Output: Step 1 - Counter(s1): Counter({'l': 1, 'i': 1, 's': 1, 't': 1, 'e': 1, 'n': 1})
print("Step 2 - Counter(s2):", c2)
# Output: Step 2 - Counter(s2): Counter({'s': 1, 'i': 1, 'l': 1, 'e': 1, 'n': 1, 't': 1})
# ↑ Same keys, same counts — just different order. So they're equal!
return c1 == c2
print(is_anagram("listen", "silent")) # Output: True
print(is_anagram("hello", "world")) # Output: False
Bonus: Group Anagrams (Common Follow-Up)
Question: Given a list of words, group them so that all anagrams are together.
Sample Input: ["eat", "tea", "tan", "ate", "nat", "bat"]
Expected Output: [['ate', 'eat', 'tea'], ['bat'], ['nat', 'tan']]
Solution 1: Brute Force — Compare Every Pair (Easy to Understand)
def group_anagrams_brute(words):
# For each word, compare with every other word to see if they're anagrams
# Two words are anagrams if sorting their characters gives the same result
used = [False] * len(words)
result = []
for i in range(len(words)):
if used[i]:
continue
group = [words[i]]
used[i] = True
for j in range(i + 1, len(words)):
if not used[j] and sorted(words[i].lower()) == sorted(words[j].lower()):
group.append(words[j])
used[j] = True
result.append(sorted(group))
print(sorted(result))
# Output: [['ate', 'eat', 'tea'], ['bat'], ['nat', 'tan']]
return sorted(result)
group_anagrams_brute(["eat", "tea", "tan", "ate", "nat", "bat"])
Solution 2: Plain Dict — Manual Key Check (Intermediate)
def group_anagrams_dict(words):
# Same idea as brute force BUT smarter:
# Sort each word → use it as a dictionary key → group words with same key
groups = {}
for word in words:
key = ''.join(sorted(word.lower()))
# Manual check: if key doesn't exist yet, create empty list
if key not in groups:
groups[key] = []
groups[key].append(word)
print("Groups dict:", groups)
# Output: Groups dict: {'aet': ['eat', 'tea', 'ate'], 'ant': ['tan', 'nat'], 'abt': ['bat']}
return sorted([sorted(g) for g in groups.values()])
print(group_anagrams_dict(["eat", "tea", "tan", "ate", "nat", "bat"]))
# Output: [['ate', 'eat', 'tea'], ['bat'], ['nat', 'tan']]
Solution 3: defaultdict — Optimal / Pythonic (Interview Answer)
from collections import defaultdict
def group_anagrams(words):
# defaultdict(list) auto-creates empty list for new keys — no if-check needed
groups = defaultdict(list)
for word in words:
# Step 1: sorted() splits word into individual chars and sorts them
sorted_chars = sorted(word.lower())
print(f" sorted('{word}') → {sorted_chars}")
# Output: sorted('eat') → ['a', 'e', 't']
# Output: sorted('tea') → ['a', 'e', 't'] ← same as 'eat'!
# Output: sorted('tan') → ['a', 'n', 't']
# Output: sorted('ate') → ['a', 'e', 't'] ← same as 'eat'!
# Output: sorted('nat') → ['a', 'n', 't'] ← same as 'tan'!
# Output: sorted('bat') → ['a', 'b', 't']
# Step 2: join() glues the sorted chars back into one string — becomes dict KEY
key = ''.join(sorted_chars)
print(f" ''.join({sorted_chars}) → '{key}'")
# Output: ''.join(['a', 'e', 't']) → 'aet'
# Step 3: Words with the same key land in the same bucket
groups[key].append(word)
# Let's see what the groups dict looks like inside
print("\nGroups dict:", dict(sorted(groups.items())))
# Output: Groups dict: {'abt': ['bat'], 'aet': ['eat', 'tea', 'ate'], 'ant': ['tan', 'nat']}
# ↑ 'eat', 'tea', 'ate' all sorted to 'aet' — so they grouped together!
# Step 4: Extract the grouped lists, sort each group and the overall result
result = sorted([sorted(g) for g in groups.values()])
print("Final result:", result)
# Output: Final result: [['ate', 'eat', 'tea'], ['bat'], ['nat', 'tan']]
return result
group_anagrams(["eat", "tea", "tan", "ate", "nat", "bat"])
Interview Tip: "Start with the sorted-key insight, then show defaultdict for clean grouping. Sorting each word is O(k log k) where k is word length, total O(n × k log k). Mention that Counter as key works too but sorted string is simpler to explain."
What NOT to Say: "I'll check if every character in s1 exists in s2" — This doesn't handle frequency. "aab" and "abb" would wrongly match.
Q03 — Longest Substring Without Repeating Characters
Question: Given a string, find the length of the longest contiguous substring where no character appears more than once.
Sample Input: "abcabcbb"
Expected Output: 3 (the substring is "abc")
Sample Input: "bbbbb"
Expected Output: 1 (the substring is "b")
Sample Input: "pwwkew"
Expected Output: 3 (the substring is "wke")
Solution 1: Brute Force — Check All Substrings (Easy to Understand)
def longest_unique_brute(s):
# Generate every possible substring
# Check if it has all unique characters
# Track the longest one
max_len = 0
for i in range(len(s)):
for j in range(i + 1, len(s) + 1):
substring = s[i:j]
if len(substring) == len(set(substring)):
max_len = max(max_len, len(substring))
return max_len
print(longest_unique_brute("abcabcbb")) # Output: 3
print(longest_unique_brute("bbbbb")) # Output: 1
print(longest_unique_brute("pwwkew")) # Output: 3
Solution 2: Sliding Window with a Set (Intermediate)
def longest_unique_set(s):
# Use two pointers (left, right) to define a window
# Expand right — if duplicate found, shrink from left
char_set = set()
left = 0
max_len = 0
for right in range(len(s)):
# If char already in window, remove from left until it's gone
while s[right] in char_set:
char_set.remove(s[left])
left += 1
char_set.add(s[right])
max_len = max(max_len, right - left + 1)
return max_len
print(longest_unique_set("abcabcbb")) # Output: 3
print(longest_unique_set("bbbbb")) # Output: 1
print(longest_unique_set("pwwkew")) # Output: 3
Solution 3: Sliding Window with a HashMap — Optimal (Interview Answer)
def longest_unique_substring(s):
# Store the last index of each character in a dict
# When we see a duplicate, jump left pointer past the previous occurrence
# This avoids the inner while-loop of Solution 2 — true O(n)
char_index = {}
left = 0
max_len = 0
for right, char in enumerate(s):
if char in char_index and char_index[char] >= left:
left = char_index[char] + 1
char_index[char] = right
max_len = max(max_len, right - left + 1)
print("Step 1 - char_index map:", char_index)
# Output: Step 1 - char_index map: {'a': 3, 'b': 7, 'c': 5}
# ↑ Each char stores its LAST seen index — this is the jump-to table
print("Step 2 - max window length:", max_len)
# Output: Step 2 - max window length: 3
return max_len
print(longest_unique_substring("abcabcbb")) # Output: 3
print(longest_unique_substring("bbbbb")) # Output: 1
print(longest_unique_substring("pwwkew")) # Output: 3
Interview Tip: "This is the classic sliding window pattern. O(n) time, O(min(n, alphabet_size)) space. Mention 'two pointer' — interviewers love that term."
What NOT to Say: "I'll check every possible substring" — That's O(n^3). Sliding window is O(n).
Q04 — Reverse a String / Reverse Words in a Sentence
Question: Given a string, reverse its characters. Also given a sentence, reverse the order of words while keeping each word's characters intact.
Sample Input (reverse characters): "hello"
Expected Output: "olleh"
Sample Input (reverse words): "Data Engineering is fun"
Expected Output: "fun is Engineering Data"
Solution 1: Two-Pointer Swap — Manual Approach (Easy to Understand)
def reverse_string_manual(s):
# Convert to list since strings are immutable in Python
# Swap characters from both ends moving inward
chars = list(s)
left, right = 0, len(chars) - 1
while left < right:
chars[left], chars[right] = chars[right], chars[left]
left += 1
right -= 1
return ''.join(chars)
print(reverse_string_manual("hello"))
# Output: olleh
def reverse_words_manual(sentence):
# Split into words, reverse the list, join back
words = sentence.split()
reversed_words = []
for i in range(len(words) - 1, -1, -1):
reversed_words.append(words[i])
return ' '.join(reversed_words)
print(reverse_words_manual("Data Engineering is fun"))
# Output: fun is Engineering Data
Solution 2: Using Built-in reversed() Function (Intermediate)
def reverse_string_builtin(s):
# reversed() returns an iterator — join it back into a string
return ''.join(reversed(s))
print(reverse_string_builtin("hello"))
# Output: olleh
def reverse_words_builtin(sentence):
# Split into words, reverse the list with reversed(), rejoin
return ' '.join(reversed(sentence.split()))
print(reverse_words_builtin("Data Engineering is fun"))
# Output: fun is Engineering Data
Solution 3: Slice Notation — Pythonic (Interview Answer)
# Reverse characters — [::-1] is the idiomatic Python way
s = "hello"
print("Step 1 - Original:", s)
# Output: Step 1 - Original: hello
print("Step 2 - Reversed:", s[::-1])
# Output: Step 2 - Reversed: olleh
# Reverse words in a sentence
sentence = "Data Engineering is fun"
words = sentence.split()
print("Step 1 - split():", words)
# Output: Step 1 - split(): ['Data', 'Engineering', 'is', 'fun']
print("Step 2 - Reversed:", words[::-1])
# Output: Step 2 - Reversed: ['fun', 'is', 'Engineering', 'Data']
print("Step 3 - Joined:", ' '.join(words[::-1]))
# Output: Step 3 - Joined: fun is Engineering Data
# Bonus: reverse each word individually (keep word order)
print(' '.join(word[::-1] for word in sentence.split()))
# Output: ataD gnireenignE si nuf
Interview Tip: "Show the Pythonic way first (slicing), then the two-pointer approach to prove you understand the underlying algorithm. Mention that strings are immutable in Python so you must convert to a list for in-place operations."
What NOT to Say: "Strings are mutable in Python so I'll swap in-place" — Strings are IMMUTABLE in Python. You must convert to a list first.
Q05 — Count Character Frequency / Most Common Character
Question: Given a string, count how many times each character appears and find the most frequent character(s).
Sample Input: "data engineering"
Expected Output (frequency): {'a': 2, 'd': 1, 'e': 2, 'g': 2, 'i': 2, 'n': 2, 'r': 1, 't': 1} (spaces excluded, sorted)
Expected Output (most common): 'a' (first non-repeating: 'd')
Solution 1: Manual Dictionary Count (Easy to Understand)
def char_frequency_manual(s):
# Walk through each character, tally in a dictionary
# Skip spaces since we only care about letters
freq = {}
for char in s:
if char != ' ':
freq[char] = freq.get(char, 0) + 1
return dict(sorted(freq.items()))
print(char_frequency_manual("data engineering"))
# Output: {'a': 2, 'd': 1, 'e': 2, 'g': 2, 'i': 2, 'n': 2, 'r': 1, 't': 1}
# Find most common character manually
def most_common_manual(s):
freq = char_frequency_manual(s)
return sorted(freq.items(), key=lambda x: (-x[1], x[0]))[0]
print(most_common_manual("data engineering"))
# Output: ('a', 2)
Solution 2: Using defaultdict (Intermediate)
from collections import defaultdict
def char_frequency_dd(s):
# defaultdict(int) initializes missing keys to 0
# Cleaner than using .get() with a default
freq = defaultdict(int)
for char in s:
if char != ' ':
freq[char] += 1
return dict(sorted(freq.items()))
print(char_frequency_dd("data engineering"))
# Output: {'a': 2, 'd': 1, 'e': 2, 'g': 2, 'i': 2, 'n': 2, 'r': 1, 't': 1}
Solution 3: Counter — Optimal / Pythonic (Interview Answer)
from collections import Counter
s = "data engineering"
counts = Counter(s.replace(' ', ''))
print("Step 1 - Raw Counter:", counts)
# Output: Step 1 - Raw Counter: Counter({'a': 2, 'n': 2, 'e': 2, 'g': 2, 'i': 2, 'd': 1, 't': 1, 'r': 1})
# ↑ Now we can SEE which chars appear once vs multiple times
# Full frequency sorted by key
print("Step 2 - Sorted:", dict(sorted(counts.items())))
# Output: Step 2 - Sorted: {'a': 2, 'd': 1, 'e': 2, 'g': 2, 'i': 2, 'n': 2, 'r': 1, 't': 1}
# Top 3 most common characters
print("Step 3 - Top 3:", counts.most_common(3))
# Output: Step 3 - Top 3: [('a', 2), ('n', 2), ('e', 2)]
# First non-repeating character
# Iterate the original string (preserves order) but look up counts in Counter
first_unique = next((c for c in s.replace(' ', '') if counts[c] == 1), None)
print("Step 4 - First unique:", first_unique)
# Output: Step 4 - First unique: d
Interview Tip: "Counter.most_common() is the go-to. For 'first non-repeating character', iterate the original string to preserve insertion order, but look up counts in the Counter. This is O(n)."
What NOT to Say: "I'll use s.count(c) inside a loop" — That's O(n^2) because count() scans the entire string each time. Build a Counter first (O(n)), then query it.
Q06 — Check if a String Is a Palindrome
Question: A palindrome reads the same forwards and backwards. Check if a given string is a palindrome, ignoring case and non-alphanumeric characters.
Sample Input: "A man, a plan, a canal: Panama"
Expected Output: True
Sample Input: "race a car"
Expected Output: False
Solution 1: Clean and Reverse with a Loop (Easy to Understand)
def is_palindrome_loop(s):
# First strip out anything that isn't a letter or digit
# Then compare character by character from both ends
cleaned = ''
for c in s:
if c.isalnum():
cleaned += c.lower()
for i in range(len(cleaned) // 2):
if cleaned[i] != cleaned[len(cleaned) - 1 - i]:
return False
return True
print(is_palindrome_loop("A man, a plan, a canal: Panama")) # Output: True
print(is_palindrome_loop("race a car")) # Output: False
Solution 2: Two-Pointer Without Extra String (Intermediate)
def is_palindrome_twoptr(s):
# Use two pointers from both ends, skip non-alphanumeric chars
# This avoids creating a cleaned copy of the string
left, right = 0, len(s) - 1
while left < right:
while left < right and not s[left].isalnum():
left += 1
while left < right and not s[right].isalnum():
right -= 1
if s[left].lower() != s[right].lower():
return False
left += 1
right -= 1
return True
print(is_palindrome_twoptr("A man, a plan, a canal: Panama")) # Output: True
print(is_palindrome_twoptr("race a car")) # Output: False
Solution 3: Slice Comparison — Pythonic (Interview Answer)
def is_palindrome(s):
# Clean: keep only alphanumeric, lowercase everything
# Then compare string with its reverse using slicing
cleaned = ''.join(c.lower() for c in s if c.isalnum())
print("Step 1 - Cleaned:", cleaned)
# Output: Step 1 - Cleaned: amanaplanacanalpanama
print("Step 2 - Reversed:", cleaned[::-1])
# Output: Step 2 - Reversed: amanaplanacanalpanama
# ↑ Same forwards and backwards — it's a palindrome!
return cleaned == cleaned[::-1]
print(is_palindrome("A man, a plan, a canal: Panama")) # Output: True
print(is_palindrome("race a car")) # Output: False
Interview Tip: "Always clean the string first — remove non-alphanumeric characters and lowercase everything. Then compare with reversed. Mention the two-pointer approach uses O(1) extra space."
What NOT to Say: "madam is not a palindrome because M and m are different" — Normalize case first! Always clarify with the interviewer whether comparison is case-sensitive.
Q07 — String Compression (Run Length Encoding)
Question: Implement basic string compression using the counts of repeated characters. If the compressed string is not shorter than the original, return the original.
Sample Input: "aabcccccaaa"
Expected Output: "a2b1c5a3"
Sample Input: "abcdef"
Expected Output: "abcdef" (compressed would be "a1b1c1d1e1f1", which is longer)
Solution 1: Index-Based Comparison (Easy to Understand)
def compress_basic(s):
if not s:
return s
# Walk through string comparing each char with the next
# When the character changes, record the char and its count
result = ""
count = 1
for i in range(1, len(s)):
if s[i] == s[i - 1]:
count += 1
else:
result += s[i - 1] + str(count)
count = 1
# Don't forget the last group of characters
result += s[-1] + str(count)
return result if len(result) < len(s) else s
print(compress_basic("aabcccccaaa")) # Output: a2b1c5a3
print(compress_basic("abcdef")) # Output: abcdef
Solution 2: Using a List for Efficient Concatenation (Intermediate)
def compress_list(s):
if not s:
return s
# String concatenation in a loop creates a new string each time — O(n^2)
# Using a list and joining at the end is O(n)
result = []
count = 1
for i in range(1, len(s)):
if s[i] == s[i - 1]:
count += 1
else:
result.append(f"{s[i - 1]}{count}")
count = 1
result.append(f"{s[-1]}{count}")
compressed = ''.join(result)
return compressed if len(compressed) < len(s) else s
print(compress_list("aabcccccaaa")) # Output: a2b1c5a3
print(compress_list("abcdef")) # Output: abcdef
Solution 3: Using itertools.groupby — Pythonic (Interview Answer)
from itertools import groupby
def compress(s):
if not s:
return s
# groupby clusters consecutive identical characters together
# Each group gives us the character and an iterator of occurrences
groups = [(char, sum(1 for _ in group)) for char, group in groupby(s)]
print("Step 1 - Groups:", groups)
# Output: Step 1 - Groups: [('a', 2), ('b', 1), ('c', 5), ('a', 3)]
# ↑ Each tuple = (character, consecutive count)
compressed = ''.join(f"{char}{count}" for char, count in groups)
print("Step 2 - Compressed:", compressed)
# Output: Step 2 - Compressed: a2b1c5a3
return compressed if len(compressed) < len(s) else s
print(compress("aabcccccaaa")) # Output: a2b1c5a3
print(compress("abcdef")) # Output: abcdef
Interview Tip: "Remember to handle the last group (after the loop ends in manual solutions). And always return the original if compressed isn't shorter. Mention that using a list instead of string concatenation is O(n) vs O(n^2)."
What NOT to Say: "I'll just concatenate strings in a loop" — String concatenation in Python creates a new string object each time, making it O(n^2). Use a list and join.
Q08 — Two Sum Problem
Question: Given a list of integers and a target sum, find two numbers that add up to the target. Return their indices.
Sample Input: nums = [2, 7, 11, 15], target = 9
Expected Output: [0, 1] (because nums[0] + nums[1] = 2 + 7 = 9)
Sample Input: nums = [3, 2, 4], target = 6
Expected Output: [1, 2]
Solution 1: Brute Force — Nested Loops (Easy to Understand)
def two_sum_brute(nums, target):
# Try every possible pair of numbers
# If their sum equals the target, return both indices
for i in range(len(nums)):
for j in range(i + 1, len(nums)):
if nums[i] + nums[j] == target:
return [i, j]
return []
print(two_sum_brute([2, 7, 11, 15], 9)) # Output: [0, 1]
print(two_sum_brute([3, 2, 4], 6)) # Output: [1, 2]
Solution 2: Sort + Two Pointers (Intermediate)
def two_sum_sorted(nums, target):
# Create index-value pairs, sort by value
# Use two pointers from both ends
indexed = sorted(enumerate(nums), key=lambda x: x[1])
left, right = 0, len(indexed) - 1
while left < right:
current_sum = indexed[left][1] + indexed[right][1]
if current_sum == target:
# Return original indices, sorted for consistency
return sorted([indexed[left][0], indexed[right][0]])
elif current_sum < target:
left += 1
else:
right -= 1
return []
print(two_sum_sorted([2, 7, 11, 15], 9)) # Output: [0, 1]
print(two_sum_sorted([3, 2, 4], 6)) # Output: [1, 2]
Solution 3: Hash Map — Optimal (Interview Answer)
def two_sum(nums, target):
# For each number, compute its complement (target - num)
# If the complement is already in our hash map, we found the pair
# Otherwise, store this number's index for future lookups
seen = {}
for i, num in enumerate(nums):
complement = target - num
print(f"Step {i+1} - num={num}, complement={complement}, seen={seen}")
# Output (iteration 1): Step 1 - num=2, complement=7, seen={}
# Output (iteration 2): Step 2 - num=7, complement=2, seen={2: 0}
# ↑ complement 2 IS in seen — match found! Return [0, 1]
if complement in seen:
return [seen[complement], i]
seen[num] = i
return []
print(two_sum([2, 7, 11, 15], 9)) # Output: [0, 1]
print(two_sum([3, 2, 4], 6)) # Output: [1, 2]
Interview Tip: "One-pass hash map solution. O(n) time, O(n) space. Always mention the time-space tradeoff: brute force uses O(1) space but O(n^2) time, hash map uses O(n) space for O(n) time."
What NOT to Say: "I'll use two nested loops" — That's O(n^2). Hash map is O(n). The interviewer expects you to know the optimal approach.
Q09 — FizzBuzz
Question: Print numbers from 1 to n. For multiples of 3 print "Fizz", for multiples of 5 print "Buzz", for multiples of both print "FizzBuzz".
Sample Input: n = 15
Expected Output: ['1', '2', 'Fizz', '4', 'Buzz', 'Fizz', '7', '8', 'Fizz', 'Buzz', '11', 'Fizz', '13', '14', 'FizzBuzz']
Solution 1: If-Elif Chain (Easy to Understand)
def fizzbuzz_basic(n):
# Check divisibility in the right order:
# 15 first (both 3 and 5), then 3, then 5, then plain number
result = []
for i in range(1, n + 1):
if i % 15 == 0:
result.append("FizzBuzz")
elif i % 3 == 0:
result.append("Fizz")
elif i % 5 == 0:
result.append("Buzz")
else:
result.append(str(i))
return result
print(fizzbuzz_basic(15))
# Output: ['1', '2', 'Fizz', '4', 'Buzz', 'Fizz', '7', '8', 'Fizz', 'Buzz', '11', 'Fizz', '13', '14', 'FizzBuzz']
Solution 2: String Concatenation — Avoids Hardcoding 15 (Intermediate)
def fizzbuzz_concat(n):
# Build the output string by concatenating "Fizz" and "Buzz" separately
# This scales better if new rules are added (e.g., "Jazz" for 7)
result = []
for i in range(1, n + 1):
output = ""
if i % 3 == 0:
output += "Fizz"
if i % 5 == 0:
output += "Buzz"
result.append(output if output else str(i))
return result
print(fizzbuzz_concat(15))
# Output: ['1', '2', 'Fizz', '4', 'Buzz', 'Fizz', '7', '8', 'Fizz', 'Buzz', '11', 'Fizz', '13', '14', 'FizzBuzz']
Solution 3: List Comprehension — Pythonic (Interview Answer)
def fizzbuzz(n):
# One-liner using conditional expressions in a list comprehension
# Compact but still readable — shows Python fluency
result = [
'FizzBuzz' if i % 15 == 0
else 'Fizz' if i % 3 == 0
else 'Buzz' if i % 5 == 0
else str(i)
for i in range(1, n + 1)
]
print("Step 1 - First 5:", result[:5])
# Output: Step 1 - First 5: ['1', '2', 'Fizz', '4', 'Buzz']
print("Step 2 - Last 3:", result[-3:])
# Output: Step 2 - Last 3: ['13', '14', 'FizzBuzz']
# ↑ 15 hits both % 3 and % 5, so it becomes 'FizzBuzz'
return result
print(fizzbuzz(15))
# Output: ['1', '2', 'Fizz', '4', 'Buzz', 'Fizz', '7', '8', 'Fizz', 'Buzz', '11', 'Fizz', '13', '14', 'FizzBuzz']
Interview Tip: "Check 15 FIRST (not 3 then 5 separately with elif). Show the clean version first, then mention the concatenation approach scales better for additional rules. The one-liner shows Python fluency."
What NOT to Say: "if i % 3 == 0 and i % 5 == 0" as the LAST check — Order matters! If you check 3 first with elif, 15 prints "Fizz" instead of "FizzBuzz".
Q10 — Remove Duplicates from a List (Preserve Order)
Question: Remove duplicate elements from a list while keeping the first occurrence's order.
Sample Input: [1, 3, 2, 3, 1, 5, 2]
Expected Output: [1, 3, 2, 5]
Solution 1: Nested Check with a New List (Easy to Understand)
def remove_dupes_basic(lst):
# Walk through the list and only add items we haven't seen yet
result = []
for item in lst:
if item not in result:
result.append(item)
return result
print(remove_dupes_basic([1, 3, 2, 3, 1, 5, 2]))
# Output: [1, 3, 2, 5]
Solution 2: Using a Set for O(1) Lookups (Intermediate)
def remove_dupes_set(lst):
# 'item not in result' in Solution 1 is O(n) — making overall O(n^2)
# Using a set for lookups makes each check O(1) — overall O(n)
seen = set()
result = []
for item in lst:
if item not in seen:
seen.add(item)
result.append(item)
return result
print(remove_dupes_set([1, 3, 2, 3, 1, 5, 2]))
# Output: [1, 3, 2, 5]
Solution 3: dict.fromkeys — Pythonic (Interview Answer)
def remove_dupes(lst):
# dict.fromkeys preserves insertion order (Python 3.7+)
# Keys are unique by definition, so duplicates are dropped
as_dict = dict.fromkeys(lst)
print("Step 1 - dict.fromkeys:", as_dict)
# Output: Step 1 - dict.fromkeys: {1: None, 3: None, 2: None, 5: None}
# ↑ Duplicates gone! Keys kept first-occurrence order. Values are None.
print("Step 2 - Extract keys:", list(as_dict))
# Output: Step 2 - Extract keys: [1, 3, 2, 5]
return list(as_dict)
print(remove_dupes([1, 3, 2, 3, 1, 5, 2]))
# Output: [1, 3, 2, 5]
Interview Tip:
"list(set(lst)) removes duplicates but DESTROYS order. Always mention you're preserving order — it shows attention to detail. The dict.fromkeys trick is clean and Pythonic."
What NOT to Say: "Just use set()" — Sets are unordered. The interviewer is testing if you know this distinction.
Q11 — Flatten a Nested List
Question: Convert a nested list of arbitrary depth into a flat one-dimensional list.
Sample Input: [1, [2, [3, 4], 5], 6]
Expected Output: [1, 2, 3, 4, 5, 6]
Solution 1: Recursive Approach (Easy to Understand)
def flatten_recursive(lst):
# Walk through each element
# If it's a list, recursively flatten it and add all elements
# If it's not a list, just add it directly
result = []
for item in lst:
if isinstance(item, list):
result.extend(flatten_recursive(item))
else:
result.append(item)
return result
print(flatten_recursive([1, [2, [3, 4], 5], 6]))
# Output: [1, 2, 3, 4, 5, 6]
Solution 2: Iterative with a Stack (Intermediate)
def flatten_iterative(lst):
# Use a stack to avoid recursion (avoids stack overflow on deep nesting)
# Process from right to left so output is in correct order
stack = list(lst)
result = []
while stack:
item = stack.pop(0)
if isinstance(item, list):
# Insert sub-items at the front of the stack to preserve order
stack = item + stack
else:
result.append(item)
return result
print(flatten_iterative([1, [2, [3, 4], 5], 6]))
# Output: [1, 2, 3, 4, 5, 6]
Solution 3: Generator-Based — Pythonic (Interview Answer)
def flatten(lst):
# Generator function yields items one at a time
# Uses 'yield from' for recursive flattening — clean and memory-efficient
for item in lst:
if isinstance(item, list):
yield from flatten(item)
else:
yield item
data = [1, [2, [3, 4], 5], 6]
gen = flatten(data)
print("Step 1 - Generator object:", gen)
# Output: Step 1 - Generator object: <generator object flatten at 0x...>
# ↑ Nothing computed yet — it's lazy!
result = list(gen)
print("Step 2 - Consumed:", result)
# Output: Step 2 - Consumed: [1, 2, 3, 4, 5, 6]
# Bonus: one-level flatten with list comprehension (when depth = 1)
nested = [[1, 2], [3, 4], [5, 6]]
flat = [x for sublist in nested for x in sublist]
print(flat)
# Output: [1, 2, 3, 4, 5, 6]
Interview Tip:
"Ask the interviewer: is it one level deep or arbitrarily nested? One-level = list comprehension. Arbitrary = recursion. Mentioning yield from shows you understand Python generators."
What NOT to Say: "I'll just use a list comprehension" — That only works for one-level nesting. Arbitrary depth requires recursion or a stack.
Q12 — Find Missing Number in a List (1 to N)
Question: Given a list of n-1 distinct numbers taken from the range 1 to n, find the one missing number.
Sample Input: nums = [1, 2, 4, 5, 6], n = 6
Expected Output: 3
Solution 1: Sort and Scan (Easy to Understand)
def find_missing_sort(nums, n):
# Sort the list, then walk through looking for a gap
# When nums[i] doesn't match expected value (i+1), that's the missing one
nums_sorted = sorted(nums)
for i in range(len(nums_sorted)):
if nums_sorted[i] != i + 1:
return i + 1
# If no gap found, the missing number is n (the last one)
return n
print(find_missing_sort([1, 2, 4, 5, 6], 6))
# Output: 3
Solution 2: Math Formula — Sum Difference (Intermediate)
def find_missing_math(nums, n):
# The sum of 1 to n = n*(n+1)/2
# Subtract the actual sum of the list — the difference is the missing number
expected = n * (n + 1) // 2
actual = sum(nums)
return expected - actual
print(find_missing_math([1, 2, 4, 5, 6], 6))
# Output: 3
Solution 3: XOR — Optimal / No Overflow Risk (Interview Answer)
def find_missing_xor(nums, n):
# XOR has the property: a ^ a = 0 and a ^ 0 = a
# XOR all numbers from 1 to n, then XOR with all numbers in the list
# Every number that appears in both cancels out, leaving only the missing one
xor = 0
for i in range(1, n + 1):
xor ^= i
print("Step 1 - XOR of 1..n:", xor)
# Output: Step 1 - XOR of 1..n: 7
# ↑ 1^2^3^4^5^6 = 7
for num in nums:
xor ^= num
print("Step 2 - After XOR with list:", xor)
# Output: Step 2 - After XOR with list: 3
# ↑ All paired numbers cancelled out, only 3 (the missing one) remains
return xor
print(find_missing_xor([1, 2, 4, 5, 6], 6))
# Output: 3
Bonus: Set Difference
def find_missing_set(nums, n):
# Create the full set and subtract the given numbers
return (set(range(1, n + 1)) - set(nums)).pop()
print(find_missing_set([1, 2, 4, 5, 6], 6))
# Output: 3
Interview Tip: "Math formula (n*(n+1)/2) is the classic answer. Mention XOR as a bonus — it avoids integer overflow in languages like Java/C++ where that matters. In Python, integers have arbitrary precision so overflow isn't an issue, but mentioning it shows cross-language awareness."
What NOT to Say: "I'll sort and scan" — That's O(n log n). The math and XOR approaches are both O(n) time and O(1) space.
Q13 — Matrix / 2D List Operations
Question: Perform common operations on a 2D list: transpose, flatten, and rotate 90 degrees clockwise.
Sample Input:
Expected Output (transpose): [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
Expected Output (flatten): [1, 2, 3, 4, 5, 6, 7, 8, 9]
Expected Output (rotate 90 CW): [[7, 4, 1], [8, 5, 2], [9, 6, 3]]
Solution 1: Manual Loops (Easy to Understand)
matrix = [
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]
]
# Transpose: swap rows and columns using nested loops
def transpose_manual(m):
rows, cols = len(m), len(m[0])
# Create a new matrix where result[j][i] = m[i][j]
result = []
for j in range(cols):
new_row = []
for i in range(rows):
new_row.append(m[i][j])
result.append(new_row)
return result
print(transpose_manual(matrix))
# Output: [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
# Flatten: just iterate all rows and all elements
def flatten_manual(m):
result = []
for row in m:
for val in row:
result.append(val)
return result
print(flatten_manual(matrix))
# Output: [1, 2, 3, 4, 5, 6, 7, 8, 9]
# Rotate 90 CW: transpose then reverse each row
def rotate_manual(m):
t = transpose_manual(m)
return [row[::-1] for row in t]
print(rotate_manual(matrix))
# Output: [[7, 4, 1], [8, 5, 2], [9, 6, 3]]
Solution 2: List Comprehension (Intermediate)
matrix = [
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]
]
# Transpose using list comprehension
transposed = [[matrix[i][j] for i in range(len(matrix))]
for j in range(len(matrix[0]))]
print(transposed)
# Output: [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
# Flatten using list comprehension
flat = [x for row in matrix for x in row]
print(flat)
# Output: [1, 2, 3, 4, 5, 6, 7, 8, 9]
Solution 3: zip(*matrix) — Pythonic (Interview Answer)
matrix = [
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]
]
# Transpose: zip(*matrix) unpacks each row as an argument to zip
# zip groups the first elements, second elements, etc.
print("Step 1 - Raw zip tuples:", list(zip(*matrix)))
# Output: Step 1 - Raw zip tuples: [(1, 4, 7), (2, 5, 8), (3, 6, 9)]
# ↑ zip returns tuples — we convert each to a list
transposed = [list(row) for row in zip(*matrix)]
print("Step 2 - Transposed:", transposed)
# Output: Step 2 - Transposed: [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
# Flatten
flat = [x for row in matrix for x in row]
print(flat)
# Output: [1, 2, 3, 4, 5, 6, 7, 8, 9]
# Rotate 90 clockwise: reverse the rows first, then transpose
print("Step 1 - Reversed rows:", matrix[::-1])
# Output: Step 1 - Reversed rows: [[7, 8, 9], [4, 5, 6], [1, 2, 3]]
# matrix[::-1] reverses row order, then zip groups columns
rotated = [list(row) for row in zip(*matrix[::-1])]
print("Step 2 - Rotated 90 CW:", rotated)
# Output: Step 2 - Rotated 90 CW: [[7, 4, 1], [8, 5, 2], [9, 6, 3]]
Interview Tip:
"zip(*matrix) for transpose is a classic Python trick. The * unpacks each row as a separate argument to zip. For rotation, remember: 90 CW = reverse rows then transpose."
What NOT to Say: "I'll use numpy for this" — Unless the interviewer allows external libraries, stick to pure Python. Show you understand the underlying logic.
Q14 — Dictionary Manipulation (Merge, Invert, Sort)
Question: Perform common dictionary operations: merge two dictionaries, invert a dictionary (swap keys and values), and sort by value.
Sample Input (merge): d1 = {'a': 1, 'b': 2}, d2 = {'b': 3, 'c': 4}
Expected Output (merge): {'a': 1, 'b': 3, 'c': 4} (d2 overwrites d1 on conflict)
Sample Input (invert): {'a': 1, 'b': 2, 'c': 3}
Expected Output (invert): {1: 'a', 2: 'b', 3: 'c'}
Sample Input (sort by value): {'alice': 85, 'bob': 92, 'charlie': 78}
Expected Output (sort desc): {'bob': 92, 'alice': 85, 'charlie': 78}
Solution 1: Manual Loops (Easy to Understand)
# Merge: copy d1, then update with d2
def merge_manual(d1, d2):
# Start with a copy of d1 so we don't mutate the original
# Then add/overwrite with all key-value pairs from d2
result = {}
for k, v in d1.items():
result[k] = v
for k, v in d2.items():
result[k] = v
return result
print(merge_manual({'a': 1, 'b': 2}, {'b': 3, 'c': 4}))
# Output: {'a': 1, 'b': 3, 'c': 4}
# Invert: swap keys and values
def invert_manual(d):
result = {}
for k, v in d.items():
result[v] = k
return result
print(invert_manual({'a': 1, 'b': 2, 'c': 3}))
# Output: {1: 'a', 2: 'b', 3: 'c'}
# Sort by value descending
def sort_by_value_manual(d):
# Convert to list of tuples, sort by second element, rebuild dict
pairs = list(d.items())
pairs.sort(key=lambda x: x[1], reverse=True)
return dict(pairs)
print(sort_by_value_manual({'alice': 85, 'bob': 92, 'charlie': 78}))
# Output: {'bob': 92, 'alice': 85, 'charlie': 78}
Solution 2: Using dict.update and comprehensions (Intermediate)
# Merge with update()
def merge_update(d1, d2):
# dict.update modifies in place, so copy first
result = d1.copy()
result.update(d2)
return result
print(merge_update({'a': 1, 'b': 2}, {'b': 3, 'c': 4}))
# Output: {'a': 1, 'b': 3, 'c': 4}
# Invert with dict comprehension
original = {'a': 1, 'b': 2, 'c': 3}
inverted = {v: k for k, v in original.items()}
print(inverted)
# Output: {1: 'a', 2: 'b', 3: 'c'}
# Sort by value with sorted()
scores = {'alice': 85, 'bob': 92, 'charlie': 78}
sorted_scores = dict(sorted(scores.items(), key=lambda x: x[1], reverse=True))
print(sorted_scores)
# Output: {'bob': 92, 'alice': 85, 'charlie': 78}
Solution 3: Modern Python Operators — Pythonic (Interview Answer)
# Merge with | operator (Python 3.9+)
d1 = {'a': 1, 'b': 2}
d2 = {'b': 3, 'c': 4}
print("Step 1 - d1:", d1, "| d2:", d2)
# Output: Step 1 - d1: {'a': 1, 'b': 2} | d2: {'b': 3, 'c': 4}
merged = d1 | d2 # d2 values overwrite d1 on key conflict
print("Step 2 - Merged:", merged)
# Output: Step 2 - Merged: {'a': 1, 'b': 3, 'c': 4}
# ↑ Key 'b' conflict: d2's value (3) wins over d1's value (2)
# For Python 3.5+, use unpacking
merged_compat = {**d1, **d2}
print(merged_compat)
# Output: {'a': 1, 'b': 3, 'c': 4}
# Invert with comprehension
original = {'a': 1, 'b': 2, 'c': 3}
print("Step 1 - original.items():", list(original.items()))
# Output: Step 1 - original.items(): [('a', 1), ('b', 2), ('c', 3)]
inverted = {v: k for k, v in original.items()}
print("Step 2 - Inverted:", inverted)
# Output: Step 2 - Inverted: {1: 'a', 2: 'b', 3: 'c'}
# Sort by value
scores = {'alice': 85, 'bob': 92, 'charlie': 78}
print("Step 1 - Unsorted items:", list(scores.items()))
# Output: Step 1 - Unsorted items: [('alice', 85), ('bob', 92), ('charlie', 78)]
sorted_scores = dict(sorted(scores.items(), key=lambda x: x[1], reverse=True))
print("Step 2 - Sorted desc:", sorted_scores)
# Output: Step 2 - Sorted desc: {'bob': 92, 'alice': 85, 'charlie': 78}
# Bonus: Group by value using defaultdict
from collections import defaultdict
data = [('a', 1), ('b', 2), ('a', 3), ('b', 4)]
grouped = defaultdict(list)
for key, val in data:
grouped[key].append(val)
print(dict(sorted(grouped.items())))
# Output: {'a': [1, 3], 'b': [2, 4]}
Interview Tip:
"Know d1 | d2 for Python 3.9+ and {**d1, **d2} for older versions. Both merge with right-side precedence. For inverting, warn that duplicate values will cause key collisions — only the last one survives."
What NOT to Say: "Dictionaries are unordered" — Since Python 3.7, dictionaries maintain insertion order. This is guaranteed by the language spec, not just an implementation detail.
Q15 — List Comprehension vs Generator Expression
Question: Explain and demonstrate the difference between list comprehension and generator expression. When should you use each?
Sample Input: Compute squares of numbers 0 through 999999. Expected Output: List comprehension uses ~8 MB RAM; generator uses ~200 bytes.
Solution 1: Side-by-Side Comparison (Easy to Understand)
200 bytes
# Both produce the same values
small_list = [x ** 2 for x in range(5)]
small_gen = (x ** 2 for x in range(5))
print(small_list)
# Output: [0, 1, 4, 9, 16]
print(list(small_gen))
# Output: [0, 1, 4, 9, 16]">import sys
# List comprehension: uses square brackets []
# Creates the ENTIRE list in memory at once
squares_list = [x ** 2 for x in range(1000000)]
list_size = sys.getsizeof(squares_list)
print(f"List size: {list_size} bytes")
# Output: List size: 8448728 bytes
# Generator expression: uses parentheses ()
# Produces items ONE AT A TIME, on demand — lazy evaluation
squares_gen = (x ** 2 for x in range(1000000))
gen_size = sys.getsizeof(squares_gen)
print(f"Generator size: {gen_size} bytes")
# Output: Generator size: 200 bytes
# Both produce the same values
small_list = [x ** 2 for x in range(5)]
small_gen = (x ** 2 for x in range(5))
print(small_list)
# Output: [0, 1, 4, 9, 16]
print(list(small_gen))
# Output: [0, 1, 4, 9, 16]
Solution 2: Showing Lazy vs Eager Behavior (Intermediate)
# Generators are LAZY — they compute values only when asked
def noisy_square(x):
# This function prints when called, so we can see when computation happens
print(f" Computing {x}^2")
return x ** 2
# List comprehension: ALL computations happen immediately
print("List comprehension (eager):")
result_list = [noisy_square(x) for x in range(3)]
# Output:
# Computing 0^2
# Computing 1^2
# Computing 2^2
print(f"Result: {result_list}")
# Output: Result: [0, 1, 4]
print()
# Generator: computations happen only when you iterate
print("Generator expression (lazy):")
result_gen = (noisy_square(x) for x in range(3))
print("Generator created, nothing computed yet.")
# Output: Generator created, nothing computed yet.
print("Now iterating:")
# Output: Now iterating:
for val in result_gen:
print(f" Got: {val}")
# Output:
# Computing 0^2
# Got: 0
# Computing 1^2
# Got: 1
# Computing 2^2
# Got: 4
Solution 3: Real-World Data Engineering Use Case (Interview Answer)
import sys
# WHEN TO USE LIST: need random access, len(), or iterate multiple times
data_list = [x ** 2 for x in range(10)]
print(data_list[5]) # Random access works
# Output: 25
print(len(data_list)) # len() works
# Output: 10
# WHEN TO USE GENERATOR: huge data, single pass, memory matters
# Real data engineering use: processing a large CSV without loading it all
def process_large_file(filepath):
"""Process a file line by line without loading it all into memory."""
with open(filepath) as f:
# Generator — yields one cleaned line at a time
# Only 1 line in memory at any point
valid_lines = (line.strip() for line in f
if not line.startswith('#'))
for line in valid_lines:
pass # process each line here
# Key difference: generators are single-use
gen = (x for x in range(5))
first_pass = list(gen)
second_pass = list(gen)
print(f"First pass: {first_pass}")
# Output: First pass: [0, 1, 2, 3, 4]
print(f"Second pass: {second_pass}")
# Output: Second pass: []
# Chaining generators for ETL pipelines — each stage is lazy
raw_data = range(1, 11)
print("Step 1 - Raw data:", list(raw_data))
# Output: Step 1 - Raw data: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print("Step 2 - After double:", [x * 2 for x in raw_data])
# Output: Step 2 - After double: [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
print("Step 3 - After filter >10:", [x * 2 for x in raw_data if x * 2 > 10])
# Output: Step 3 - After filter >10: [12, 14, 16, 18, 20]
# ↑ These prints show each stage — but in production, use generators:
step1 = (x * 2 for x in raw_data) # transform: double
step2 = (x for x in step1 if x > 10) # filter: keep > 10
step3 = (f"val={x}" for x in step2) # format: add label
# Nothing has been computed yet — only when we consume:
print(list(step3))
# Output: ['val=12', 'val=14', 'val=16', 'val=18', 'val=20']
Interview Tip: "In data engineering, generators are critical for ETL pipelines. Saying 'I process data lazily to avoid OOM errors' shows production awareness. Key rule: use a list when you need to access data multiple times; use a generator for single-pass processing of large datasets."
What NOT to Say:
"They're the same thing" — List comprehension uses [] and stores everything in memory. Generator uses () and is lazy. Also, generators are single-use — once exhausted, they produce nothing.