Files
python-performance-adrs/papers/collection-membership.md
dafit 7efd1368d1 feat: Add 8 domain papers and RULEBOOK.md
Domain papers distilled from python-numbers-everyone-should-know:
- async-overhead: 1,400x sync vs async overhead
- collection-membership: 200x set vs list at 1000 items
- json-serialization: 8x orjson vs stdlib
- exception-flow: 6.5x exception overhead (try/except free)
- string-formatting: f-strings > % > .format()
- memory-slots: 69% memory reduction with __slots__
- import-optimization: 100ms+ for heavy packages
- database-patterns: 98% commit overhead in SQLite

RULEBOOK.md: ~200 token distillation for coding subagents

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 14:31:40 +01:00

2.9 KiB

Collection Membership: The 200x Performance Cliff

Domain Paper: Python Collection Selection for Membership Testing Date: 2026-01-03 Source: Python Numbers Everyone Should Know benchmarks


Executive Summary

Membership testing (x in collection) is one of the most common operations in Python code. The choice of collection type can result in a 200x performance difference at just 1,000 items.

Key Finding: At 1,000 items, checking if an item exists in a list takes 3.9 microseconds. The same check in a set takes 19 nanoseconds. That is a 206x difference.


The Core Numbers

Membership Testing Performance

Operation Time Throughput
item in set (existing) 19.0 ns 52.7M ops/sec
key in dict (existing) 20.8 ns 48.1M ops/sec
item in list (first) 13.9 ns 72.0M ops/sec
item in list (middle, 500th) 1,956 ns 511k ops/sec
item in list (last, 999th) 3,852 ns 260k ops/sec
item in list (missing) 3,915 ns 255k ops/sec

The 200x Cliff Explained

Set membership (any position):     ~19 ns    O(1)
List membership (worst case):   ~3,915 ns    O(n)
                                ________
                                  206x slower

Crossover Analysis

Crossover point: A list with ~15-20 items will match set performance for a full scan. Below that, lists may actually be faster due to lower overhead.

When to Use Each Type

Use Set when:

  • Collection has more than ~20 items
  • Checking membership more than once
  • Order does not matter

Use Dict when:

  • You need to associate values with keys
  • Checking membership AND need to retrieve associated data

Use List when:

  • Collection is very small (< 20 items)
  • You iterate but rarely check membership
  • Items might not be hashable

Practical Rules for Coding Agents

Rule 1: Default to Set for Membership

# PREFER
allowed_values = {'a', 'b', 'c'}
if value in allowed_values:

# AVOID
allowed_values = ['a', 'b', 'c']
if value in allowed_values:

Rule 2: Convert Lists Before Repeated Lookups

def process_items(items: list, valid_ids: list):
    valid_set = set(valid_ids)  # Convert once
    return [item for item in items if item.id in valid_set]

Rule 3: Prefer dict.get() Over Check-then-Access

# AVOID (double lookup)
if key in config:
    value = config[key]

# PREFER (single lookup)
value = config.get(key, default)

Summary Table

Scenario Best Choice Why
Membership test on 1000+ items Set 200x faster than list
Key-value lookup Dict O(1) access with associated data
Ordered collection, rare membership List Lower memory, maintains order
Very small collection (< 20 items) List or Set Negligible difference

Benchmark source: python-numbers-everyone-should-know