
dafit 7efd1368d1 feat: Add 8 domain papers and RULEBOOK.md

Domain papers distilled from python-numbers-everyone-should-know:
- async-overhead: 1,400x sync vs async overhead
- collection-membership: 200x set vs list at 1000 items
- json-serialization: 8x orjson vs stdlib
- exception-flow: 6.5x exception overhead (try/except free)
- string-formatting: f-strings > % > .format()
- memory-slots: 69% memory reduction with __slots__
- import-optimization: 100ms+ for heavy packages
- database-patterns: 98% commit overhead in SQLite

RULEBOOK.md: ~200 token distillation for coding subagents

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-03 14:31:40 +01:00

2.9 KiB


Collection Membership: The 200x Performance Cliff

Domain Paper: Python Collection Selection for Membership Testing
Date: 2026-01-03
Source: Python Numbers Everyone Should Know benchmarks

Executive Summary

Membership testing (`x in collection`) is one of the most common operations in Python code. The choice of collection type can make a 200x performance difference at just 1,000 items.

Key Finding: At 1,000 items, a worst-case list membership check (item absent) takes 3.9 microseconds, while a set lookup takes 19 nanoseconds: a 206x difference.

The Core Numbers

Membership Testing Performance

| Operation | Time per check | Throughput |
|---|---|---|
| `item in set` (existing) | 19.0 ns | 52.7M ops/sec |
| `key in dict` (existing) | 20.8 ns | 48.1M ops/sec |
| `item in list` (first) | 13.9 ns | 72.0M ops/sec |
| `item in list` (middle, 500th) | 1,956 ns | 511k ops/sec |
| `item in list` (last, 999th) | 3,852 ns | 260k ops/sec |
| `item in list` (missing) | 3,915 ns | 255k ops/sec |
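To check these numbers on your own machine, the sketch below times the same six cases with the standard-library `timeit` module. It is not the original benchmark harness. The collections hold the integers 0..n-1 (an assumption), and absolute timings will vary with hardware and Python version. The ratio between the set and missing-item list cases is the figure to compare.

```python
import timeit


def membership_ns(n: int = 1_000, number: int = 20_000) -> dict[str, float]:
    """Approximate nanoseconds per `in` check on n-item collections of ints 0..n-1."""
    env = {
        "as_list": list(range(n)),
        "as_set": set(range(n)),
        "as_dict": dict.fromkeys(range(n)),
        "middle": n // 2,
        "last": n - 1,
        "missing": -1,  # never present, forces a full list scan
    }
    cases = {
        "item in set (existing)": "last in as_set",
        "key in dict (existing)": "last in as_dict",
        "item in list (first)": "0 in as_list",
        "item in list (middle)": "middle in as_list",
        "item in list (last)": "last in as_list",
        "item in list (missing)": "missing in as_list",
    }
    results = {}
    for label, stmt in cases.items():
        # Best of 5 runs reduces noise from other processes
        best = min(timeit.repeat(stmt, globals=env, number=number, repeat=5))
        results[label] = best / number * 1e9
    return results


if __name__ == "__main__":
    for label, ns in membership_ns().items():
        print(f"{label:<26} {ns:>9.1f} ns")
```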

The 200x Cliff Explained

```text
Set membership (any position):     ~19 ns    O(1)
List membership (worst case):   ~3,915 ns    O(n)
                                ________
                                  206x slower
```
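The O(1) and O(n) labels can be made concrete by counting equality comparisons instead of timing. The sketch uses a hypothetical `Counted` wrapper and relies on two CPython behaviors. First, list `in` calls `__eq__` once per element until it finds a match. Second, a set compares stored hashes first and only calls `__eq__` when the hashes are equal.

```python
class Counted:
    """Hypothetical int wrapper that counts __eq__ calls (class-wide counter)."""

    comparisons = 0

    def __init__(self, value: int) -> None:
        self.value = value

    def __eq__(self, other: object) -> bool:
        Counted.comparisons += 1
        return isinstance(other, Counted) and self.value == other.value

    def __hash__(self) -> int:
        return hash(self.value)


def count_comparisons(container, probe: Counted) -> int:
    """Run one membership test and return how many __eq__ calls it made."""
    Counted.comparisons = 0
    _ = probe in container
    return Counted.comparisons


items = [Counted(i) for i in range(1_000)]
missing = Counted(1_000)  # absent; its hash differs from every element's hash

print(count_comparisons(items, missing))       # prints 1000: one comparison per element
print(count_comparisons(set(items), missing))  # prints 0: no hash match, so no __eq__ call
```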

Crossover Analysis

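Crossover point: Each extra list element scanned adds roughly 3.8 ns ((3,852 - 13.9) / 999), so a full list scan already exceeds the ~19 ns set lookup within the first few elements. Only the tiniest lists can be marginally faster, since a list check skips hashing. Below ~20 items, even a full scan stays under ~100 ns, so the absolute difference is negligible.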
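The crossover can be estimated directly from the table. The sketch below assumes scan cost grows linearly with the position of the match, and fits that line through the first-item (13.9 ns) and last-item (3,852 ns) rows. It ignores element type, cache effects, and interpreter version, so treat it as a back-of-the-envelope estimate only.

```python
# Figures copied from the Membership Testing Performance table (1,000 items).
SET_NS = 19.0           # item in set (existing)
LIST_FIRST_NS = 13.9    # item in list, match at index 0
LIST_LAST_NS = 3_852.0  # item in list, match at index 999

# Assumption: each additional element scanned costs the same fixed amount.
PER_ELEMENT_NS = (LIST_LAST_NS - LIST_FIRST_NS) / 999


def estimated_scan_ns(n: int) -> float:
    """Estimated cost of scanning an n-item list through its last element."""
    return LIST_FIRST_NS + PER_ELEMENT_NS * (n - 1)


def crossover_size() -> int:
    """Smallest list size whose full scan is estimated to cost more than a set lookup."""
    n = 1
    while estimated_scan_ns(n) <= SET_NS:
        n += 1
    return n


print(f"per element: {PER_ELEMENT_NS:.2f} ns")  # per element: 3.84 ns
print(f"crossover:   {crossover_size()} items")  # crossover:   3 items
print(f"20 items:    {estimated_scan_ns(20):.0f} ns vs {SET_NS:.0f} ns for a set")
```

By this model, a 20-item full scan costs several times a set lookup, yet the absolute gap stays well under 100 ns.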

When to Use Each Type

Use Set when:

- Collection has more than ~20 items
- Membership is checked more than once (building the set costs one O(n) pass)
- Order does not matter

Use Dict when:

- You need to associate values with keys
- You check membership and also need to retrieve the associated data

Use List when:

- Collection is very small (< 20 items)
- You iterate over it but rarely check membership
- Items might not be hashable
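Unhashable items such as lists cannot go into a set as-is. If they can be converted to a hashable equivalent, the set approach still applies. For example, a list of hashable values can become a tuple. A minimal sketch with hypothetical row data:

```python
rows = [[1, 2], [3, 4], [5, 6]]

try:
    set(rows)  # lists are mutable, so they define no hash
except TypeError as exc:
    print(f"cannot build a set of lists: {exc}")

# Tuples are hashable when their contents are, so convert once and reuse the set.
row_set = {tuple(row) for row in rows}
print(tuple([3, 4]) in row_set)  # True
print((7, 8) in row_set)         # False
```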

Practical Rules for Coding Agents

Rule 1: Default to Set for Membership

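```python
value = "b"  # example input

# PREFER: set membership is a hash lookup, O(1) on average
allowed_values = {'a', 'b', 'c'}
if value in allowed_values:
    pass  # handle an allowed value

# AVOID: list membership scans element by element, O(n)
allowed_values = ['a', 'b', 'c']
if value in allowed_values:
    pass  # handle an allowed value
```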
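When the allowed values are fixed, one option is to build them once as a module-level `frozenset`. This is a suggestion, not something the benchmarks measured. A `frozenset` has the same hash-based membership as `set` but cannot be mutated by accident. `ALLOWED_VALUES` and `is_allowed` are hypothetical names.

```python
# Built once at import time; frozenset signals the allow-list is read-only.
ALLOWED_VALUES = frozenset({"a", "b", "c"})


def is_allowed(value: str) -> bool:
    """Hash-based membership check against the module-level allow-list."""
    return value in ALLOWED_VALUES


print(is_allowed("b"))  # True
print(is_allowed("z"))  # False
```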

Rule 2: Convert Lists Before Repeated Lookups

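```python
def process_items(items: list, valid_ids: list) -> list:
    valid_set = set(valid_ids)  # Convert once: one O(len(valid_ids)) pass
    # Each `in` below is now an O(1) average hash lookup, not a list scan
    return [item for item in items if item.id in valid_set]
```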
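Here is Rule 2 run end to end with a hypothetical `Item` dataclass; the rule itself only assumes items expose an `id`. The conversion costs one pass over `valid_ids`, after which each check is a hash lookup rather than a scan. Total work therefore drops from roughly len(items) × len(valid_ids) comparisons to about len(items) + len(valid_ids) operations.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical record type; Rule 2 only needs the `id` attribute."""
    id: int
    name: str


def process_items(items: list[Item], valid_ids: list[int]) -> list[Item]:
    valid_set = set(valid_ids)  # one pass to build the set
    return [item for item in items if item.id in valid_set]  # O(1) average per check


items = [Item(1, "alpha"), Item(2, "beta"), Item(3, "gamma")]
kept = process_items(items, valid_ids=[3, 1])
print([item.name for item in kept])  # ['alpha', 'gamma']
```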

Rule 3: Prefer `dict.get()` Over Check-then-Access

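```python
config = {"timeout": 30}  # example mapping
key, default = "timeout", 10

# AVOID (double lookup: once for `in`, again for `[]`)
if key in config:
    value = config[key]
else:
    value = default

# PREFER (single lookup, same result)
value = config.get(key, default)
```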
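One caveat: `config.get(key)` with no default returns `None` both for a missing key and for a key whose stored value is `None`. When that distinction matters, a private sentinel object keeps the single lookup. `_MISSING` and `lookup` are hypothetical names in this sketch.

```python
_MISSING = object()  # unique sentinel: cannot collide with any stored value

config = {"timeout": 30, "proxy": None}


def lookup(key: str) -> str:
    value = config.get(key, _MISSING)  # single hash lookup
    if value is _MISSING:
        return f"{key!r} not set"
    return f"{key!r} = {value!r}"


print(lookup("timeout"))  # 'timeout' = 30
print(lookup("proxy"))    # 'proxy' = None
print(lookup("retries"))  # 'retries' not set
```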

Summary Table

| Scenario | Best choice | Why |
|---|---|---|
| Membership test on 1,000+ items | Set | ~200x faster than a list at 1,000 items; the gap grows with size |
| Key-value lookup | Dict | O(1) average access with associated data |
| Ordered collection, rare membership tests | List | Lower memory, preserves order |
| Very small collection (< 20 items) | List or Set | Negligible difference |

Benchmark source: python-numbers-everyone-should-know
