# Collection Membership: The 200x Performance Cliff **Domain Paper: Python Collection Selection for Membership Testing** **Date:** 2026-01-03 **Source:** Python Numbers Everyone Should Know benchmarks --- ## Executive Summary Membership testing (`x in collection`) is one of the most common operations in Python code. The choice of collection type can result in a **200x performance difference** at just 1,000 items. **Key Finding**: At 1,000 items, checking if an item exists in a list takes 3.9 microseconds. The same check in a set takes 19 nanoseconds. That is a 206x difference. --- ## The Core Numbers ### Membership Testing Performance | Operation | Time | Throughput | |-----------|------|------------| | `item in set` (existing) | 19.0 ns | 52.7M ops/sec | | `key in dict` (existing) | 20.8 ns | 48.1M ops/sec | | `item in list` (first) | 13.9 ns | 72.0M ops/sec | | `item in list` (middle, 500th) | 1,956 ns | 511k ops/sec | | `item in list` (last, 999th) | 3,852 ns | 260k ops/sec | | `item in list` (missing) | 3,915 ns | 255k ops/sec | ### The 200x Cliff Explained ``` Set membership (any position): ~19 ns O(1) List membership (worst case): ~3,915 ns O(n) ________ 206x slower ``` --- ## Crossover Analysis **Crossover point**: A list with ~15-20 items will match set performance for a full scan. Below that, lists may actually be faster due to lower overhead. ### When to Use Each Type **Use Set when:** - Collection has more than ~20 items - Checking membership more than once - Order does not matter **Use Dict when:** - You need to associate values with keys - Checking membership AND need to retrieve associated data **Use List when:** - Collection is very small (< 20 items) - You iterate but rarely check membership - Items might not be hashable --- ## Practical Rules for Coding Agents ### Rule 1: Default to Set for Membership ```python # PREFER allowed_values = {'a', 'b', 'c'} if value in allowed_values: # AVOID allowed_values = ['a', 'b', 'c'] if value in allowed_values: ``` ### Rule 2: Convert Lists Before Repeated Lookups ```python def process_items(items: list, valid_ids: list): valid_set = set(valid_ids) # Convert once return [item for item in items if item.id in valid_set] ``` ### Rule 3: Prefer `dict.get()` Over Check-then-Access ```python # AVOID (double lookup) if key in config: value = config[key] # PREFER (single lookup) value = config.get(key, default) ``` --- ## Summary Table | Scenario | Best Choice | Why | |----------|-------------|-----| | Membership test on 1000+ items | Set | 200x faster than list | | Key-value lookup | Dict | O(1) access with associated data | | Ordered collection, rare membership | List | Lower memory, maintains order | | Very small collection (< 20 items) | List or Set | Negligible difference | --- *Benchmark source: python-numbers-everyone-should-know*