Domain papers distilled from python-numbers-everyone-should-know: - async-overhead: 1,400x sync vs async overhead - collection-membership: 200x set vs list at 1000 items - json-serialization: 8x orjson vs stdlib - exception-flow: 6.5x exception overhead (try/except free) - string-formatting: f-strings > % > .format() - memory-slots: 69% memory reduction with __slots__ - import-optimization: 100ms+ for heavy packages - database-patterns: 98% commit overhead in SQLite RULEBOOK.md: ~200 token distillation for coding subagents 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2.9 KiB
2.9 KiB
Collection Membership: The 200x Performance Cliff
Domain Paper: Python Collection Selection for Membership Testing Date: 2026-01-03 Source: Python Numbers Everyone Should Know benchmarks
Executive Summary
Membership testing (x in collection) is one of the most common operations in Python code. The choice of collection type can result in a 200x performance difference at just 1,000 items.
Key Finding: At 1,000 items, checking if an item exists in a list takes 3.9 microseconds. The same check in a set takes 19 nanoseconds. That is a 206x difference.
The Core Numbers
Membership Testing Performance
| Operation | Time | Throughput |
|---|---|---|
item in set (existing) |
19.0 ns | 52.7M ops/sec |
key in dict (existing) |
20.8 ns | 48.1M ops/sec |
item in list (first) |
13.9 ns | 72.0M ops/sec |
item in list (middle, 500th) |
1,956 ns | 511k ops/sec |
item in list (last, 999th) |
3,852 ns | 260k ops/sec |
item in list (missing) |
3,915 ns | 255k ops/sec |
The 200x Cliff Explained
Set membership (any position): ~19 ns O(1)
List membership (worst case): ~3,915 ns O(n)
________
206x slower
Crossover Analysis
Crossover point: A list with ~15-20 items will match set performance for a full scan. Below that, lists may actually be faster due to lower overhead.
When to Use Each Type
Use Set when:
- Collection has more than ~20 items
- Checking membership more than once
- Order does not matter
Use Dict when:
- You need to associate values with keys
- Checking membership AND need to retrieve associated data
Use List when:
- Collection is very small (< 20 items)
- You iterate but rarely check membership
- Items might not be hashable
Practical Rules for Coding Agents
Rule 1: Default to Set for Membership
# PREFER
allowed_values = {'a', 'b', 'c'}
if value in allowed_values:
# AVOID
allowed_values = ['a', 'b', 'c']
if value in allowed_values:
Rule 2: Convert Lists Before Repeated Lookups
def process_items(items: list, valid_ids: list):
valid_set = set(valid_ids) # Convert once
return [item for item in items if item.id in valid_set]
Rule 3: Prefer dict.get() Over Check-then-Access
# AVOID (double lookup)
if key in config:
value = config[key]
# PREFER (single lookup)
value = config.get(key, default)
Summary Table
| Scenario | Best Choice | Why |
|---|---|---|
| Membership test on 1000+ items | Set | 200x faster than list |
| Key-value lookup | Dict | O(1) access with associated data |
| Ordered collection, rare membership | List | Lower memory, maintains order |
| Very small collection (< 20 items) | List or Set | Negligible difference |
Benchmark source: python-numbers-everyone-should-know