Python Comprehensions

Master Python comprehensions: list, dict, set, generator expressions, nested comprehensions, and the walrus operator for efficient data processing.


List Comprehensions

List comprehensions create lists concisely: [expression for item in iterable if condition].

Python
# Basic: [expression for item in iterable]
squares = [x**2 for x in range(1, 6)]
print(squares)  # [1, 4, 9, 16, 25]

# With filter condition
evens = [x for x in range(20) if x % 2 == 0]
print(evens)    # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

# Both transform and filter
data  = [1, -2, 3, -4, 5, -6]
abs_evens = [abs(x) for x in data if x % 2 == 0]
print(abs_evens)    # [2, 4, 6]

# String transformations
words = ["hello", "WORLD", "Python", "CODE"]
lower = [w.lower() for w in words]
title = [w.title() for w in words if len(w) > 4]
print(lower)    # ['hello', 'world', 'python', 'code']
print(title)    # ['Hello', 'World', 'Python']

# With function call
from math import sqrt
roots = [round(sqrt(x), 2) for x in [4, 9, 16, 25]]
print(roots)    # [2.0, 3.0, 4.0, 5.0]

# With ternary expression
labels = ["even" if x % 2 == 0 else "odd" for x in range(6)]
print(labels)   # ['even', 'odd', 'even', 'odd', 'even', 'odd']

# Flatten a list of lists
nested = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
flat = [item for sublist in nested for item in sublist]
print(flat)     # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Compare to for loop (equivalent)
flat2 = []
for sublist in nested:
    for item in sublist:
        flat2.append(item)

# Comprehension with multiple conditions
nums = [x for x in range(100) if x % 3 == 0 if x % 5 == 0]
print(nums)     # [0, 15, 30, 45, 60, 75, 90]  (divisible by both 3 and 5)

Dict Comprehensions

Python
# Basic: {key: value for item in iterable}
squares = {x: x**2 for x in range(1, 6)}
print(squares)  # {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}

# From an existing dict
prices = {"apple": 0.5, "banana": 0.3, "cherry": 1.2}
with_tax = {k: round(v * 1.1, 2) for k, v in prices.items()}
print(with_tax)  # {'apple': 0.55, 'banana': 0.33, 'cherry': 1.32}

# Filter entries
expensive = {k: v for k, v in prices.items() if v >= 0.5}
print(expensive)  # {'apple': 0.5, 'cherry': 1.2}

# Invert a dict (values must be hashable; if values repeat, the last key wins)
original = {"a": 1, "b": 2, "c": 3}
inverted = {v: k for k, v in original.items()}
print(inverted)  # {1: 'a', 2: 'b', 3: 'c'}

# Build from two lists (dict(zip(keys, values)) is an equivalent shortcut)
keys   = ["name", "age", "city"]
values = ["Alice", 30, "London"]
person = {k: v for k, v in zip(keys, values)}
print(person)   # {'name': 'Alice', 'age': 30, 'city': 'London'}

# Normalize keys
raw = {"Name": "Alice", "AGE": 30, "City": "London"}
normalized = {k.lower(): v for k, v in raw.items()}
print(normalized)   # {'name': 'Alice', 'age': 30, 'city': 'London'}

# Group items by property
words = ["apple", "banana", "cherry", "avocado", "blueberry", "apricot"]
by_letter = {
    letter: [w for w in words if w[0] == letter]
    for letter in sorted({w[0] for w in words})   # sorted() makes the key order deterministic
}
print(by_letter)
# {'a': ['apple', 'avocado', 'apricot'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
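
The grouping comprehension above rescans `words` once per distinct first letter. For larger inputs, a single pass with `collections.defaultdict` is a common alternative. The following is a minimal sketch rather than part of the original example, and `group_by_first_letter` is a hypothetical helper name.

```python
from collections import defaultdict

def group_by_first_letter(words):
    """Group words by their first character in one pass (hypothetical helper)."""
    groups = defaultdict(list)
    for w in words:
        groups[w[0]].append(w)   # creates the list on first use of a letter
    return dict(groups)          # plain dict, keys in first-seen order

words = ["apple", "banana", "cherry", "avocado", "blueberry", "apricot"]
by_letter = group_by_first_letter(words)
print(by_letter)
# {'a': ['apple', 'avocado', 'apricot'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```

The comprehension stays readable for small inputs; the loop version trades one expression for linear running time.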

Set Comprehensions

Python
# Basic: {expression for item in iterable}
# Result is unordered and contains unique elements
squares = {x**2 for x in range(-3, 4)}
print(squares)  # {0, 1, 4, 9}  (unordered, -3 and 3 give same square)

# Unique first characters
words = ["apple", "banana", "cherry", "avocado"]
first_chars = {w[0] for w in words}
print(first_chars)   # {'a', 'b', 'c'}

# Deduplicate a processed result
data = ["Alice", "bob", "Alice", "BOB", "charlie"]
unique_lower = {name.lower() for name in data}
print(unique_lower)     # {'alice', 'bob', 'charlie'}

# Filter and deduplicate
numbers = [1, 2, 2, 3, 4, 4, 5, 6, 7, 8]
even_squares = {x**2 for x in numbers if x % 2 == 0}
print(even_squares)     # {4, 16, 36, 64}

# Set comprehension with membership test
valid_tags = {"python", "javascript", "typescript", "rust"}
user_tags  = ["Python", "Java", "Python", "Rust", "Go"]
valid_user_tags = {t.lower() for t in user_tags if t.lower() in valid_tags}
print(valid_user_tags)  # {'python', 'rust'}

Generator Expressions

Generator expressions use the same syntax as list comprehensions but with parentheses. They are lazy - values are produced one at a time.

Python
import sys

# List comprehension - stores all values in memory
squares_list = [x**2 for x in range(1000)]
print(sys.getsizeof(squares_list))  # ~8000+ bytes (the list's own storage, not the int objects)

# Generator expression - lazy, minimal memory
squares_gen = (x**2 for x in range(1000))
print(sys.getsizeof(squares_gen))   # ~100-200 bytes depending on Python version (just the generator object)

# Generator is an iterator - values are produced on demand
gen = (x**2 for x in range(5))
print(next(gen))    # 0
print(next(gen))    # 1
print(next(gen))    # 4
for x in gen:       # continues from where we left off
    print(x)        # 9, 16

# Best use: pass directly to functions that consume iterables
numbers = range(1000000)

# sum() with generator - never builds a list
total = sum(x**2 for x in numbers if x % 2 == 0)

# any() / all() with generator - short-circuit evaluation
has_negative = any(x < 0 for x in [1, 2, -3, 4])
print(has_negative)     # True (stops at -3)

all_positive = all(x > 0 for x in [1, 2, 3, 4])
print(all_positive)     # True

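# max / min over a generator expression
words = ["banana", "apple", "cherry"]
longest_len = max(len(w) for w in words)   # 6
# To get the word itself, pass key= instead of a generator
longest = max(words, key=len)              # 'banana' (first of the tied 6-letter words)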

# Generator vs comprehension decision
# Use list comprehension when:
# - You need to use the result multiple times
# - You need to index or slice the result
# - You need len() of the result

# Use generator expression when:
# - You iterate over the result once
# - You pass it to sum(), any(), all(), max(), min(), join()
# - Memory efficiency matters (processing large datasets)
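
The decision rules above follow from one property: a generator can be consumed only once and has no length or index. A short sketch of that behavior (the variable names are illustrative only):

```python
# A generator is exhausted after one full pass
squares_gen = (x**2 for x in range(5))
first_pass = list(squares_gen)    # consumes every value
second_pass = list(squares_gen)   # nothing left to produce
print(first_pass)    # [0, 1, 4, 9, 16]
print(second_pass)   # []

# A list can be reused, measured, and indexed
squares_list = [x**2 for x in range(5)]
print(sum(squares_list), len(squares_list), squares_list[-1])   # 30 5 16

# len() is not defined for generators
try:
    len(x**2 for x in range(5))
    len_failed = False
except TypeError:
    len_failed = True
print(len_failed)    # True
```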

Nested Comprehensions

Python
# Nested list comprehension: outer for runs first, inner for iterates per item
# [expr for outer in outer_iter for inner in inner_iter]

# Flatten a 2D list (most common use)
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat = [cell for row in matrix for cell in row]
print(flat)     # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# With filter
flat_positives = [n for row in [[-1, 2], [3, -4], [5, -6]] for n in row if n > 0]
print(flat_positives)   # [2, 3, 5]

# Create a 2D matrix (outer for creates rows, inner for creates cells)
grid = [[0 for _ in range(3)] for _ in range(3)]
print(grid)     # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]

# Identity matrix
n = 3
identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
print(identity)
# [[1, 0, 0],
#  [0, 1, 0],
#  [0, 0, 1]]

# Transpose (rows become columns)
matrix_t = [[row[i] for row in matrix] for i in range(len(matrix[0]))]
print(matrix_t)  # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
# Alternative: list(zip(*matrix)) gives the same values as a list of tuples

# Cartesian product (pairs from two lists)
colors = ["red", "blue"]
sizes  = ["S", "M", "L"]
variants = [(c, s) for c in colors for s in sizes]
print(variants)
# [('red', 'S'), ('red', 'M'), ('red', 'L'), ('blue', 'S'), ('blue', 'M'), ('blue', 'L')]

# WARNING: avoid 3+ levels of nesting - deep nesting is hard to read.

Limit comprehension nesting to 2 levels

Nested comprehensions are powerful but can quickly become unreadable. Limit nesting to 2 levels maximum. For deeper nesting (3+ levels), use a regular for loop, a helper function, or itertools.chain.from_iterable(). Readability is more important than conciseness.
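
As a sketch of the itertools route, `chain.from_iterable` can replace a flattening comprehension and `itertools.product` can replace a multi-level Cartesian product. The `fits` list is a hypothetical third dimension added for illustration.

```python
from itertools import chain, product

# Flatten one level without a nested comprehension
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat = list(chain.from_iterable(matrix))
print(flat)   # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Three-way Cartesian product without three for clauses
colors = ["red", "blue"]
sizes = ["S", "M", "L"]
fits = ["slim", "regular"]   # hypothetical third dimension
variants = list(product(colors, sizes, fits))
print(len(variants))   # 12
print(variants[0])     # ('red', 'S', 'slim')

# Same result as the 3-level comprehension it replaces
nested_version = [(c, s, f) for c in colors for s in sizes for f in fits]
print(variants == nested_version)   # True
```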

Walrus Operator in Comprehensions

The walrus operator := (an assignment expression, available since Python 3.8) lets a comprehension compute an expensive value once and reuse it in both the filter and the output.

Python
import math

# Without walrus - expensive() is called twice (once for filter, once for value)
def expensive(x):
    return math.sqrt(abs(x))   # simulating a slow operation

data = range(-10, 10)

# BAD: compute twice
result1 = [expensive(x) for x in data if expensive(x) > 2]

# GOOD: walrus computes once, assigns to y, uses y in both places
result2 = [y for x in data if (y := expensive(x)) > 2]

print(result1 == result2)   # True - same result, more efficient

# Walrus in filter condition to capture intermediate values
words = ["hello", "world", "Python", "is", "awesome"]

# Find words with more than 4 chars and return their uppercase version
# (the walrus keeps the uppercased string so it is not recomputed for the output)
long_upper = [u for w in words if len(u := w.upper()) > 4]
print(long_upper)   # ['HELLO', 'WORLD', 'PYTHON', 'AWESOME']

# Process and filter in one pass
import re
texts = ["Price: $9.99", "Item: Widget", "Price: $24.99", "Item: Gadget"]

prices = [
    float(m.group(1))
    for text in texts
    if (m := re.search(r"\$(\d+\.\d+)", text))
]
print(prices)   # [9.99, 24.99]

# The scoping rule: a walrus target inside a comprehension is bound in the
# enclosing scope (unlike the loop variable, which stays local to the comprehension)
values = [1, 2, 3, 4, 5]
filtered = [last := x for x in values if x > 2]
print(filtered) # [3, 4, 5]
print(last)     # 5  (leaked from the comprehension - use with care)
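
The scoping difference can be checked directly. The sketch below uses illustrative names (`elem`, `total`, `running_totals`): it shows that the loop variable is not visible afterwards, while a walrus target is, which also makes a running total possible.

```python
values = [1, 2, 3, 4, 5]

# The loop variable `elem` stays local to the comprehension
doubled = [elem * 2 for elem in values]
try:
    elem
    loop_var_visible = True
except NameError:
    loop_var_visible = False
print(loop_var_visible)   # False

# A walrus target is bound in the enclosing scope, so it can carry state
total = 0
running_totals = [total := total + n for n in values]
print(running_totals)   # [1, 3, 6, 10, 15]
print(total)            # 15
```

Because the target outlives the comprehension, a short, generic name such as `total` can silently overwrite an existing variable; a distinctive name reduces that risk.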

Frequently Asked Questions

What is the difference between a list comprehension and a generator expression?

A list comprehension [x for x in ...] evaluates immediately and stores all results in memory as a list. A generator expression (x for x in ...) is lazy - it produces values one at a time on demand, so its memory use stays small regardless of how many values it yields. Use generator expressions when you only need to iterate once (e.g., pass to sum(), max(), or a for loop). Use list comprehensions when you need a list you can index, slice, or iterate multiple times.

When should I use a list comprehension instead of a for loop?

Use a list comprehension when creating a list from a transformation or filter of another iterable - it is more concise and typically faster. Use a regular for loop when the body has side effects (printing, writing to a file), the logic requires multiple statements, you need early exit with break, or the transformation is complex enough that the comprehension becomes hard to read. Rule of thumb: if it fits on one readable line, a comprehension is probably right.
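
A small sketch of that rule of thumb, using hypothetical data in which -1 marks the end of valid input: the pure filter fits a comprehension, while the early exit reads better as a loop.

```python
readings = [12, 15, 9, -1, 20, 18]   # hypothetical data; -1 is an end marker

# Pure filter: a comprehension fits on one readable line
high = [r for r in readings if r > 10]
print(high)    # [12, 15, 20, 18]

# Early exit: a comprehension cannot break, so use a regular loop
valid = []
for r in readings:
    if r < 0:
        break              # stop at the end marker
    valid.append(r)
print(valid)   # [12, 15, 9]
```

For lazy early exit, `itertools.takewhile` is another option, at some cost in readability for newcomers.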

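Are list comprehensions faster than for loops?

Often, yes. For simple operations a comprehension is commonly measured at roughly 10-30% faster, though the exact figure depends on the Python version and the workload. The main reason is that a comprehension appends with a dedicated bytecode instruction (LIST_APPEND) instead of looking up and calling result.append on every iteration; the loop itself still runs as Python bytecode. For complex operations (multiple nested loops, expensive function calls), the per-item work dominates and the difference is minimal. Prefer comprehensions for readability and correctness first, performance second.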
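
Because the speedup varies with the Python version and the work done per item, it is worth measuring on your own machine. A minimal sketch with the standard `timeit` module follows; the function names, `n`, and the repetition count are illustrative assumptions, and no timings are shown because they differ between systems.

```python
import timeit

def with_loop(n):
    """Build the list with an explicit for loop and append()."""
    result = []
    for x in range(n):
        result.append(x * 2)
    return result

def with_comprehension(n):
    """Build the same list with a list comprehension."""
    return [x * 2 for x in range(n)]

n = 10_000
assert with_loop(n) == with_comprehension(n)   # same output either way

# Time 200 calls of each version
loop_time = timeit.timeit(lambda: with_loop(n), number=200)
comp_time = timeit.timeit(lambda: with_comprehension(n), number=200)
print(f"for loop:      {loop_time:.4f} s")
print(f"comprehension: {comp_time:.4f} s")
```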

Can I use the walrus operator in comprehensions?

Yes. The walrus operator := is particularly useful in comprehensions when you need the result of an expensive computation in both the filter condition and the value: [y for x in data if (y := expensive(x)) > threshold] computes expensive(x) once and uses it both in the condition and as the output value. Without the walrus, you would compute it twice or use a more verbose form.