# Collection Membership: The 200x Performance Cliff

**Domain Paper: Python Collection Selection for Membership Testing**
**Date:** 2026-01-03
**Source:** Python Numbers Everyone Should Know benchmarks

---

## Executive Summary

Membership testing (`x in collection`) is one of the most common operations in Python code. The choice of collection type can result in a **200x performance difference** at just 1,000 items.

**Key Finding**: In a 1,000-item list, checking for an item that is not present takes about 3.9 microseconds (3,915 ns), because every element must be compared. The same check against a set takes about 19 nanoseconds, a 206x difference.

---

## The Core Numbers

### Membership Testing Performance

The list rows measure a 1,000-item list; positions are 0-based indexes.

| Operation | Time per check | Throughput |
|-----------|----------------|------------|
| `item in set` (existing) | 19.0 ns | 52.7M ops/sec |
| `key in dict` (existing) | 20.8 ns | 48.1M ops/sec |
| `item in list` (first, index 0) | 13.9 ns | 72.0M ops/sec |
| `item in list` (middle, index 500) | 1,956 ns | 511k ops/sec |
| `item in list` (last, index 999) | 3,852 ns | 260k ops/sec |
| `item in list` (missing) | 3,915 ns | 255k ops/sec |

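To reproduce numbers like these on your own hardware, a minimal `timeit` sketch follows. It assumes integer items (the benchmark's element types are not stated here), and the names `cases` and `time_per_op_ns` are illustrative, not taken from the original benchmark suite. Absolute times will differ by machine and Python version; the ratios between rows are the part worth comparing.

```python
import timeit

N = 1_000
items_list = list(range(N))
items_set = set(items_list)
items_dict = dict.fromkeys(items_list)

# label -> (statement, namespace). Keeping the collection and the probe value
# in the namespace means timeit does not time building the collection.
cases = {
    "item in set (existing)": ("x in s", {"s": items_set, "x": 500}),
    "key in dict (existing)": ("x in d", {"d": items_dict, "x": 500}),
    "item in list (first)": ("x in lst", {"lst": items_list, "x": 0}),
    "item in list (middle)": ("x in lst", {"lst": items_list, "x": 500}),
    "item in list (last)": ("x in lst", {"lst": items_list, "x": N - 1}),
    "item in list (missing)": ("x in lst", {"lst": items_list, "x": -1}),
}


def time_per_op_ns(stmt, namespace, number=10_000, repeat=5):
    """Best-of-`repeat` time for one execution of `stmt`, in nanoseconds."""
    runs = timeit.repeat(stmt, globals=namespace, number=number, repeat=repeat)
    return min(runs) / number * 1e9


if __name__ == "__main__":
    for label, (stmt, namespace) in cases.items():
        print(f"{label:24} {time_per_op_ns(stmt, namespace):10.1f} ns")
```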
### The 200x Cliff Explained

```
Set membership (any position):     ~19 ns    O(1)
List membership (worst case):   ~3,915 ns    O(n)
                                ________
                                  206x slower
```

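The Python language reference defines `x in y` for lists as equivalent to `any(x is e or x == e for e in y)`, while sets and dicts locate a candidate slot by hash before comparing anything. A small sketch makes the difference visible by counting equality comparisons. `Probe` and `count_eq_calls` are hypothetical helpers, and the counts in the comments are what CPython produces for a missing item whose hash matches no stored element.

```python
class Probe:
    """Wraps a value and counts how many times __eq__ is called."""

    eq_calls = 0  # class-level counter shared by all instances

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Probe.eq_calls += 1
        if not isinstance(other, Probe):
            return NotImplemented
        return self.value == other.value

    def __hash__(self):
        return hash(self.value)


def count_eq_calls(container, needle):
    """Evaluate `needle in container`; return (result, number of __eq__ calls)."""
    Probe.eq_calls = 0
    found = needle in container
    return found, Probe.eq_calls


items = [Probe(i) for i in range(1_000)]
missing = Probe(5_000)  # equal to nothing in `items`, and its hash is unused

list_result = count_eq_calls(items, missing)      # linear scan: one == per element
set_result = count_eq_calls(set(items), missing)  # hash probe: no matching hash, no ==
print("list:", list_result)  # list: (False, 1000)
print("set: ", set_result)   # set:  (False, 0)
```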
---

## Crossover Analysis

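**Crossover point**: From the table, a list scan costs roughly 3.9 ns per element (3,915 ns / 1,000), while a set lookup costs about 19 ns regardless of size. A worst-case scan therefore catches up with a set lookup after only a handful of elements, and a hit near the front of a list can beat it outright (13.9 ns for the first element). Below about 15-20 items the absolute gap stays under roughly 100 ns, so the choice is usually negligible there.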

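The crossover depends on element type, machine, and interpreter, so it is worth measuring rather than assuming. A rough sketch, assuming integer items and a worst-case (missing) probe; `scan_vs_set_ns` and `find_crossover` are hypothetical names:

```python
import timeit


def scan_vs_set_ns(n, number=20_000):
    """Time a missing-item lookup in an n-item list and an n-item set.

    Returns (list_ns, set_ns): nanoseconds per lookup, best of 5 runs.
    """
    data = list(range(n))
    namespace = {"lst": data, "st": set(data), "x": -1}  # -1 is never present

    def best(stmt):
        runs = timeit.repeat(stmt, globals=namespace, number=number, repeat=5)
        return min(runs) / number * 1e9

    return best("x in lst"), best("x in st")


def find_crossover(sizes=(1, 2, 4, 8, 16, 32, 64)):
    """Return the first size whose full list scan is slower than a set lookup, or None."""
    for n in sizes:
        list_ns, set_ns = scan_vs_set_ns(n)
        if list_ns > set_ns:
            return n
    return None


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16, 32, 64):
        list_ns, set_ns = scan_vs_set_ns(n)
        print(f"n={n:3}  list={list_ns:7.1f} ns  set={set_ns:7.1f} ns")
    print("first size where the list scan loses:", find_crossover())
```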
### When to Use Each Type

**Use Set when:**
- The collection has more than ~20 items
- You check membership more than once
- Order does not matter
- Items are hashable

**Use Dict when:**
- You need to associate values with keys
- You check membership and also need the associated value

**Use List when:**
- The collection is very small (< 20 items)
- You iterate over it but rarely check membership
- Items might not be hashable (e.g. lists or dicts)

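When items are unhashable, a set cannot hold them directly. One common workaround, sketched here under the assumption that items are nested lists and dicts of hashable values, is to convert each item to a hashable equivalent before building the set. `to_hashable` is a hypothetical helper; note that it maps `[1, 2]` and `(1, 2)` to the same key, which may or may not be acceptable for your data.

```python
def to_hashable(item):
    """Hypothetical helper: recursively turn lists and dicts into hashable forms."""
    if isinstance(item, list):
        return tuple(to_hashable(x) for x in item)
    if isinstance(item, dict):
        # dict keys are already hashable; only the values need converting
        return frozenset((k, to_hashable(v)) for k, v in item.items())
    return item  # assumed to be hashable already


rows = [[1, 2], [3, 4], [5, 6]]

try:
    set(rows)  # raises TypeError: lists are unhashable
except TypeError as exc:
    print("set(rows) failed:", exc)

# Convert once, then every membership check is an average-case O(1) hash lookup
row_keys = {to_hashable(row) for row in rows}
print(to_hashable([3, 4]) in row_keys)  # True
```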
---

## Practical Rules for Coding Agents

### Rule 1: Default to Set for Membership

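```python
value = 'b'

# PREFER: set, average-case O(1) hash lookup
allowed_values = {'a', 'b', 'c'}
if value in allowed_values:
    print("allowed")

# AVOID: list, O(n) scan comparing value to each element
allowed_values = ['a', 'b', 'c']
if value in allowed_values:
    print("allowed")
```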

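When the allowed values are fixed, a common pattern is to build the set once at module level, as a `frozenset` so it cannot be mutated by accident, instead of rebuilding it on every call. The names below (`ALLOWED_STATUSES`, `is_valid_status`, `is_valid_status_rebuilt`) are illustrative only:

```python
# Built once at import time; frozenset makes the constant immutable.
ALLOWED_STATUSES = frozenset({"active", "pending", "suspended"})


def is_valid_status(status: str) -> bool:
    """Average-case O(1) membership test against the prebuilt frozenset."""
    return status in ALLOWED_STATUSES


def is_valid_status_rebuilt(status: str) -> bool:
    # Same answer, but a new set object is built from the literal on every call
    allowed = {"active", "pending", "suspended"}
    return status in allowed


print(is_valid_status("active"))   # True
print(is_valid_status("deleted"))  # False
```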
### Rule 2: Convert Lists Before Repeated Lookups

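```python
def process_items(items: list, valid_ids: list) -> list:
    # Convert once: O(len(valid_ids)) up front
    valid_set = set(valid_ids)
    # Each check is now average-case O(1) instead of a scan of valid_ids
    return [item for item in items if item.id in valid_set]
```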

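The payoff of Rule 2 is asymptotic: with n items and m valid IDs, scanning the list inside the comprehension costs O(n * m) comparisons, while one conversion plus set lookups costs about O(n + m). A self-contained usage sketch, assuming a hypothetical `Item` dataclass with an `id` field (Rule 2 only requires that attribute):

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical record type; Rule 2 only needs the `id` attribute."""
    id: int
    name: str


def process_items(items: list, valid_ids: list) -> list:
    valid_set = set(valid_ids)  # one O(m) conversion
    return [item for item in items if item.id in valid_set]  # O(n) average


def process_items_slow(items: list, valid_ids: list) -> list:
    # Same result, but each `in` scans valid_ids: O(n * m) comparisons overall
    return [item for item in items if item.id in valid_ids]


items = [Item(i, f"item-{i}") for i in range(10)]
valid_ids = [2, 3, 5, 7]
kept = process_items(items, valid_ids)
print([item.id for item in kept])  # [2, 3, 5, 7]
```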
### Rule 3: Prefer `dict.get()` Over Check-then-Access

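```python
config = {'timeout': 30}
key, default = 'retries', 3

# AVOID (double lookup: one hash probe for `in`, another for `[]`)
if key in config:
    value = config[key]
else:
    value = default

# PREFER (single lookup)
value = config.get(key, default)
```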

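One caveat with `dict.get()`: with its default of `None`, it cannot distinguish a missing key from a key stored with the value `None`. A private sentinel object keeps the single lookup while preserving that distinction. `_MISSING` and `lookup` are hypothetical names in this sketch:

```python
_MISSING = object()  # unique sentinel: no stored value can be this exact object


def lookup(config: dict, key: str) -> tuple:
    """Return (found, value) using a single dict lookup."""
    value = config.get(key, _MISSING)
    if value is _MISSING:
        return False, None
    return True, value


config = {"timeout": 30, "proxy": None}
print(config.get("proxy"), config.get("retries"))  # None None: ambiguous
print(lookup(config, "proxy"))    # (True, None): present, stored value is None
print(lookup(config, "retries"))  # (False, None): key absent
```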
---

## Summary Table

| Scenario | Best Choice | Why |
|----------|-------------|-----|
| Membership test on 1,000+ items | Set | ~200x faster than a full list scan at 1,000 items |
| Key-value lookup | Dict | O(1) access with associated data |
| Ordered collection, rare membership | List | Lower memory, maintains order |
| Very small collection (< 20 items) | List or Set | Negligible difference |

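The "Lower memory" entry can be checked with `sys.getsizeof`, which reports only a container's own allocation (its pointer array or hash table), not the objects it references. A quick sketch with 1,000 integers; exact byte counts vary across Python versions and platforms, but in CPython the list is the smallest of the three:

```python
import sys

N = 1_000
as_list = list(range(N))
as_set = set(as_list)
as_dict = dict.fromkeys(as_list)

# getsizeof counts only each container's own storage; the int objects are
# shared by all three containers and are not included in these numbers.
sizes = {
    name: sys.getsizeof(container)
    for name, container in (("list", as_list), ("set", as_set), ("dict", as_dict))
}
for name, size in sizes.items():
    print(f"{name:4}: {size:>6} bytes")
```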
---

*Benchmark source: python-numbers-everyone-should-know*