Python is a feature iceberg. Standing atop it, you behold a vast ocean of libraries (often written in more performant languages like C), ready to do your bidding with a mere pip install and a few ugly lines of code. Many programmers learn enough Python to sail the high C’s, and leave it at that.
Or, one day, you ask yourself the question. “I wonder if there’s a way to —” and a crevasse opens up beneath you, plunging you into the ice. Some hours later, you emerge from the underside, disgusted that 1) there indeed was a way and 2) you did not like it. But if you come to accept what you have seen, your iceberg rises a little higher above the C…
They say higher ground is easier to defend.
4: truthiness considered harmful
Our example of the day is a function that simultaneously finds the largest element in a list, the first/last indices where it appears, and the number of times it appears.
def find_max(values: list[T]) -> tuple[T, int, int, int]:
assert values
max_value = values[0]
first_index = last_index
count = 1
...
return max_value, first_index, last_index, countThere is an annoying pet peeve of mine on the first line: truthiness.
assert valuesWhat are we asserting, exactly? To translate for people who no speak Pythonicano, the assertion is intended to check that the list is not empty:
assert len(values) > 0But this is not the only way to check that a list is not empty.
We could also use the not operator:
assert not valuesBut in other contexts it can also mean any or all of
assert values is not None
assert values != 0
assert values != ""Python is a language that is supposed to read like English, so it baffles me to no end when it is abbreviated like this. Using truthiness is lazy and can lead to confusion:
def maybe_length(values: list[T] | None) -> int | None:
"""
Get the length of the list, otherwise return None.
"""
return len(values) if values else None
>>> maybe_length([])
None # oopsReplacing a truthy value with an explicit boolean condition costs at most a dozen more characters. You can definitely afford to explicitly state the condition.
“What about boolean operations?”
>>> length = len(values or []) # smelly
>>> length = 0 if values is None else len(values) # betterNo excuses.
5: runtime immutable collections
Now that that’s out of the way, here’s a full implementation of the function.1
def find_max(values: list[T]) -> tuple[T, int, int, int]:
assert len(values) > 0 # fixed
max_value = values[0]
first_index = last_index = 0
count = 1
for i, x in enumerate(values[1:], start=1):
if x > max_value:
values[max_value] = x # HI I AM A BUG
first_index = last_index = i
count = 1
elif x == max_value:
last_index = i
count += 1
return max_value, first_index, last_index, countIf you are observant, you might spot the bug — we accidentally updated values[max_value] instead of max_value itself.
Not only is this wrong, but it also screws up the original list, which was passed by reference.
>>> values = [1, 0, 2, 2, 1]
>>> find_max(values)
1, 3, 4, 2
>>> values
[1, 2, 2, 2, 1] # oopsHowever, if we pass in a tuple instead:
>>> values = (1, 0, 2, 2, 1)
>>> find_max(values)
TypeError: 'tuple' object does not support item assignmentTuples are the immutable sibling of lists, so the tuple caught the bug the moment we tried to change its contents — offensive programming!
Similarly, you can use frozenset as an immutable set, or types.MappingProxyType as an immutable dict.2
As a bonus, immutable collections can also be hashed, allowing you to store them in sets or use them as keys in dictionaries.
6: immutable type annotations
In Python, list and tuple are distinct classes — so passing a tuple into a function that expects a list will cause your type checker to scream.
If you want your function to accept both, you will need
from collections.abc import Sequence # for Python 3.9+
from typing import Sequence # for Python <3.9
def find_max(values: Sequence[T]) -> tuple[T, int, int, int]:
...collections.abc.Sequence is a common parent type of list and tuple. I use it instead of list or tuple in my method signatures all the time, because
- The robustness principle: be liberal in what you accept.
- The
Sequenceinterface is read-only, so your type checker will catch bugs likevalues[max_value] = x.
Immutable type annotations offer a different kind of defense than offensive programming — they are statically checked before the code is ever run.
However, they aren’t checked at runtime, so you should still use immutable collection classes like tuple and frozenset if you want to defend them from code you can’t control, like third-party libraries.
There are also nicely-named parent types for dict, set, and their immutable counterparts.
| concrete classes you want to accept | immutable type annotation |
|---|---|
list[T] tuple[T, ...] | Sequence[T] |
dict[K, V] types.MappingProxyType[K, V] | Mapping[K, V] |
set[T] frozenset[T] | Set[T] |
list[T] tuple[T, ...] set[T] frozenset[T] | Collection[T]3 |
7: unpackable returns are a code smell
After fixing the bug, we publish our find_max function.
Someone installs it without reading the documentation, and calls it with the wrong variable names:
>>> values = (0, 1, 0, 2, 3, 1, 0, 3)
>>> first_index, last_index, max_value, count = find_max(values)
>>> max_value
7 # oopsThe bug was completely invisible to the poor user’s type checker. Sure, they got a bit unlucky there by calling it on a Sequence[int], but as a defensive programmer you need to have a sixth sense for bad luck.
The defense against this kind of confusion is to prevent it from even arising in the first place.
The cause of the confusion?
Tuple returns are smelly.
They’re not always bad — users are not going to mess up the order if the function returned only first_index and last_index.
But as the tuple gets longer, it literally gets exponentially smellier.
There are ways to unpack an iterable of things, but only permutation is ever right.
To prevent unpacking, you probably want to return a non-unpackable data structure from find_max instead, where the return values can be directly accessed by name (“max_value”, “first_index”, etc.).
Besides defining your own class from scratch, there are a few ways to implement such record data structures:
Seasoned Python developers should be well-versed in all of these solutions, since some programming frameworks expect functions to only return primitive record types like namedtuples or dicts.
But for the most part, dataclasses are the best vanilla Python technique to get the job done:
from dataclasses import dataclass
@dataclass(frozen=True)
class FoundMax(Generic[T]):
max_value: T
first_index: int
last_index: int
count: int
def __post_init__(self):
# Arbitrary validation!
assert self.first_index <= self.last_index
assert self.count > 0
assert self.count <= self.last_index - self.first_index + 1
@property
def is_unique(self) -> bool:
# An example convenience method on the return object!
return self.count == 1An added benefit of this single return object is that we can now handle failure cases more gracefully too.
def find_max(values: Sequence[T]) -> FoundMax[T] | None:
if len(values) == 0:
return None
...
return FoundMax(
max_value=max_value,
first_index=first_index,
last_index=last_index,
count=count,
)
>>> values = (0, 1, 0, 2, 3, 1, 0, 3)
>>> result = find_max(values)
>>> assert result is not None
>>> result
FoundMax(max_value=3, first_index=4, last_index=7, count=2)
>>> result.max_value
3People often ask why I hardly document my code, despite otherwise being a perfectionist. Besides laziness, the reason is actually evident in this example. Even with zero comments or documentation, a perfectly defensive function should already be self-explanatory to users just based on its name and signature. Such is the power of dataclasses…
“Wait — you said they are the best vanilla Python technique? Is there something even better?”
I guess you’ll just have to stay tuned!