feat: Add 8 domain papers and RULEBOOK.md

Domain papers distilled from python-numbers-everyone-should-know:
- async-overhead: 1,400x sync vs async overhead
- collection-membership: 200x set vs list at 1000 items
- json-serialization: 8x orjson vs stdlib
- exception-flow: 6.5x exception overhead (try/except free)
- string-formatting: f-strings > % > .format()
- memory-slots: 69% memory reduction with __slots__
- import-optimization: 100ms+ for heavy packages
- database-patterns: 98% commit overhead in SQLite

RULEBOOK.md: ~200 token distillation for coding subagents

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
dafit
2026-01-03 14:31:40 +01:00
parent 4def3b46c2
commit 7efd1368d1
9 changed files with 909 additions and 0 deletions

papers/async-overhead.md (new file, 126 lines)
# Async Overhead in Python: When the Cure is Worse Than the Disease
**Domain Paper: Python Performance ADRs**
**Date:** 2026-01-03
**Source:** Python Numbers Everyone Should Know benchmarks (Python 3.14.2, Apple Silicon)
---
## Executive Summary
Async Python introduces a **1,400x overhead** for simple operations compared to synchronous equivalents. This overhead is fixed regardless of what work the function does. The critical insight: async only makes sense when you're waiting on I/O that takes orders of magnitude longer than this overhead.
**The Core Numbers:**
- Sync function call: **20.3 ns**
- Async equivalent via `run_until_complete`: **28.2 us** (28,200 ns)
- **Ratio: 1,387x slower** (approximately 1,400x)
---
## What Was Benchmarked
### Methodology
The benchmarks measured pure async machinery overhead using CPython 3.14.2 on Apple Silicon. Each operation was run thousands of times with warmup periods, reporting median values.
### Test Functions
```python
# The async function being tested
async def return_value_coro():
    return 42

# The sync equivalent
def sync_function():
    return 42
```
---
## Key Findings
### Coroutine Creation (Cheap)
| Operation | Time |
|-----------|------|
| Create coroutine object | 47.0 ns |
**Key insight:** Creating a coroutine object is cheap (47 ns). The cost comes when you actually run it.
### Running Coroutines (Expensive)
| Operation | Time |
|-----------|------|
| `run_until_complete(empty)` | 27.6 us |
| `run_until_complete(return value)` | 26.6 us |
| Run nested await | 28.9 us |
**Key insight:** Every `run_until_complete` costs ~27 us regardless of coroutine complexity.
### The Critical Comparison
| Operation | Time | Ratio |
|-----------|------|-------|
| Sync function call | 20.3 ns | 1x |
| Async equivalent | 28.2 us | **1,387x** |
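The gap is easy to reproduce with a `timeit` sketch (a rough harness, not the benchmark suite's own code; absolute numbers vary by machine, the ratio is what matters):

```python
import asyncio
import timeit

def sync_function():
    return 42

async def return_value_coro():
    return 42

loop = asyncio.new_event_loop()

# Per-call cost in nanoseconds, best of 5 repeats
sync_ns = min(timeit.repeat(sync_function, number=100_000, repeat=5)) / 100_000 * 1e9
async_ns = min(timeit.repeat(
    lambda: loop.run_until_complete(return_value_coro()),
    number=1_000, repeat=5)) / 1_000 * 1e9
loop.close()

print(f"sync: {sync_ns:.0f} ns  async: {async_ns:.0f} ns  ratio: {async_ns / sync_ns:.0f}x")
```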
---
## When Async IS Appropriate
### Good Use Cases
1. **Web servers handling concurrent connections** - FastAPI/Starlette: 115-125k req/sec
2. **Concurrent network I/O** - Fetching data from multiple APIs simultaneously
3. **High-latency operations with parallelism** - `asyncio.gather()` for multiple slow API calls
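A sketch of case 3, with `asyncio.sleep` standing in for a slow network call (`fake_fetch` is illustrative, not from the benchmarks):

```python
import asyncio
import time

async def fake_fetch(i):
    await asyncio.sleep(0.05)  # stand-in for a ~50 ms network call
    return i

async def main():
    # Ten 50 ms waits overlap: total wall time stays near 50 ms, not 500 ms
    return await asyncio.gather(*[fake_fetch(i) for i in range(10)])

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(f"{len(results)} calls in {elapsed:.3f}s")
```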
### Bad Use Cases
1. **Wrapping synchronous database drivers** - Use native async drivers or stay sync
2. **CPU-bound computation** - Async doesn't parallelize CPU work (GIL)
3. **Simple scripts with sequential operations** - CLI tools, data processing pipelines
---
## Practical Rules for Coding Agents
### Rule 1: Default to Sync
Write synchronous code unless you have a specific, measurable need for async.
### Rule 2: The 1ms Threshold
Only consider async when individual I/O operations take **>1 millisecond**.
### Rule 3: Batch Over Broadcast
If you need async, gather operations together:
```python
# Good: one pass through the event loop; the fetches run concurrently
results = await asyncio.gather(*[fetch(url) for url in urls])

# Bad: the fetches run sequentially, each await waiting on the previous
for url in urls:
    result = await fetch(url)
```
### Rule 4: Stay in the Loop
Avoid `run_until_complete` inside an already-running loop.
### Rule 5: Match Your I/O Library
Use async libraries for async code, sync libraries for sync code.
---
## Summary Table
| Scenario | Recommendation | Reasoning |
|----------|----------------|-----------|
| Simple function returning data | Sync | Async adds 1,400x overhead |
| In-memory operations | Sync | No I/O to wait on |
| Single database query | Sync | Query time < async amortization |
| Multiple independent API calls | Async + gather | Parallelism benefit outweighs overhead |
| Web server (many connections) | Async framework | Concurrent handling essential |
| CLI tool | Sync | Sequential operations, no benefit |
---
*Benchmark source: python-numbers-everyone-should-know (2026-01-01, Python 3.14.2, Apple Silicon)*

(new file, 109 lines)
# Collection Membership: The 200x Performance Cliff
**Domain Paper: Python Collection Selection for Membership Testing**
**Date:** 2026-01-03
**Source:** Python Numbers Everyone Should Know benchmarks
---
## Executive Summary
Membership testing (`x in collection`) is one of the most common operations in Python code. The choice of collection type can result in a **200x performance difference** at just 1,000 items.
**Key Finding**: At 1,000 items, checking if an item exists in a list takes 3.9 microseconds. The same check in a set takes 19 nanoseconds. That is a 206x difference.
---
## The Core Numbers
### Membership Testing Performance
| Operation | Time | Throughput |
|-----------|------|------------|
| `item in set` (existing) | 19.0 ns | 52.7M ops/sec |
| `key in dict` (existing) | 20.8 ns | 48.1M ops/sec |
| `item in list` (first) | 13.9 ns | 72.0M ops/sec |
| `item in list` (middle, 500th) | 1,956 ns | 511k ops/sec |
| `item in list` (last, 999th) | 3,852 ns | 260k ops/sec |
| `item in list` (missing) | 3,915 ns | 255k ops/sec |
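The cliff is easy to reproduce with a rough `timeit` sketch (exact timings vary by machine):

```python
import timeit

n = 1_000
as_list = list(range(n))
as_set = set(as_list)
probe = -1  # missing item: worst case for the linear list scan

list_ns = timeit.timeit(lambda: probe in as_list, number=10_000) / 10_000 * 1e9
set_ns = timeit.timeit(lambda: probe in as_set, number=100_000) / 100_000 * 1e9
print(f"list: {list_ns:.0f} ns  set: {set_ns:.0f} ns  ratio: {list_ns / set_ns:.0f}x")
```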
### The 200x Cliff Explained
```
Set membership (any position):  ~19 ns     O(1)
List membership (worst case):   ~3,915 ns  O(n)
Ratio:                          ~206x slower
```
---
## Crossover Analysis
**Crossover point**: A list with ~15-20 items will match set performance for a full scan. Below that, lists may actually be faster due to lower overhead.
### When to Use Each Type
**Use Set when:**
- Collection has more than ~20 items
- Checking membership more than once
- Order does not matter
**Use Dict when:**
- You need to associate values with keys
- Checking membership AND need to retrieve associated data
**Use List when:**
- Collection is very small (< 20 items)
- You iterate but rarely check membership
- Items might not be hashable
---
## Practical Rules for Coding Agents
### Rule 1: Default to Set for Membership
```python
# PREFER
allowed_values = {'a', 'b', 'c'}
if value in allowed_values:
    ...

# AVOID
allowed_values = ['a', 'b', 'c']
if value in allowed_values:
    ...
```
### Rule 2: Convert Lists Before Repeated Lookups
```python
def process_items(items: list, valid_ids: list):
    valid_set = set(valid_ids)  # Convert once
    return [item for item in items if item.id in valid_set]
```
### Rule 3: Prefer `dict.get()` Over Check-then-Access
```python
# AVOID (double lookup)
if key in config:
    value = config[key]

# PREFER (single lookup)
value = config.get(key, default)
```
---
## Summary Table
| Scenario | Best Choice | Why |
|----------|-------------|-----|
| Membership test on 1000+ items | Set | 200x faster than list |
| Key-value lookup | Dict | O(1) access with associated data |
| Ordered collection, rare membership | List | Lower memory, maintains order |
| Very small collection (< 20 items) | List or Set | Negligible difference |
---
*Benchmark source: python-numbers-everyone-should-know*

papers/database-patterns.md (new file, 121 lines)
# Database Patterns: SQLite, DiskCache, and MongoDB
**Domain:** Persistence and data access patterns in Python
**Source:** python-numbers-everyone-should-know benchmarks
**Date:** 2026-01-03
---
## Executive Summary
**Reads are cheap, writes are expensive.** SQLite commits dominate write latency (192 microseconds with commit vs 3 microseconds without). For read-heavy workloads, SQLite achieves 280K ops/sec by primary key. For write-heavy workloads, consider diskcache (8x faster writes) or batch operations.
---
## Key Findings
### The Numbers
| Operation | SQLite | DiskCache | MongoDB |
|-----------|--------|-----------|---------|
| **Write one object** | 192 us (5.2k/s) | 24 us (42k/s) | 119 us (8.4k/s) |
| **Read by key/id** | 3.6 us (280k/s) | 4.3 us (236k/s) | 121 us (8.2k/s) |
### Finding 1: The Commit Tax
SQLite writes with commit: **192 microseconds**
SQLite writes without commit: **3 microseconds**
The commit operation accounts for **98.4% of write latency**.
### Finding 2: DiskCache Wins for Simple Key-Value
| Operation | SQLite Raw | DiskCache |
|-----------|------------|-----------|
| Write | 192 us | 24 us |
| Read | 3.6 us | 4.3 us |
DiskCache achieves **8x faster writes** with comparable read performance.
### Finding 3: Batching Provides 9x Throughput
SQLite `executemany()` 10 rows: **215 microseconds total** (21.5 us/row)
10 individual inserts: **1,920 microseconds** (192 us/row)
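A minimal sketch of the batching effect (file path and table are illustrative; exact timings depend on the disk and journal mode):

```python
import os
import sqlite3
import tempfile
import time

path = os.path.join(tempfile.mkdtemp(), "bench.db")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE items (data TEXT)")
rows = [("x",)] * 10

# One commit per row: one fsync each
t0 = time.perf_counter()
for row in rows:
    conn.execute("INSERT INTO items (data) VALUES (?)", row)
    conn.commit()
per_row = time.perf_counter() - t0

# One commit for the whole batch: a single fsync
t0 = time.perf_counter()
conn.executemany("INSERT INTO items (data) VALUES (?)", rows)
conn.commit()
batched = time.perf_counter() - t0
conn.close()

print(f"individual: {per_row * 1e6:.0f} us  batched: {batched * 1e6:.0f} us")
```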
---
## When to Use Each Storage Option
### SQLite
- Read-heavy workloads (100:1 read/write ratio)
- Need to query inside JSON with `json_extract()`
- ACID guarantees matter
### DiskCache
- Key-value storage with automatic serialization
- Cache patterns (TTL, LRU eviction)
- Agent state persistence
### MongoDB
- Distributed, multi-node deployments
- Complex aggregation pipelines
- Full-text search requirements
---
## Practical Rules for Coding Agents
### Rule 1: Default to DiskCache for Agent State
```python
from diskcache import Cache
cache = Cache('/tmp/agent-cache')
cache.set('conversation:123', messages) # 24 microseconds
```
### Rule 2: Batch SQLite Writes
```python
# GOOD: Batch with executemany
conn.executemany('INSERT INTO items (data) VALUES (?)', items_list)
conn.commit() # One commit for all items
```
### Rule 3: Use Transactions for Multi-Step Operations
```python
conn.execute('BEGIN')
conn.execute('INSERT INTO users ...')
conn.execute('INSERT INTO audit_log ...')
conn.commit() # One fsync
```
### Rule 4: Lazy Import for CLI Tools
```python
def save_to_db(data):
    import sqlite3  # 1.63ms only when needed
    conn = sqlite3.connect('app.db')
```
---
## Summary
| Metric | SQLite | DiskCache | MongoDB |
|--------|--------|-----------|---------|
| Write latency | 192 us | 24 us | 119 us |
| Read latency | 3.6 us | 4.3 us | 121 us |
| Writes/sec | 5.2k | 42k | 8.4k |
| Reads/sec | 280k | 236k | 8.2k |
---
## The Bottom Line
1. **Reads are cheap everywhere** - Optimize for write patterns
2. **SQLite commits dominate latency** - Batch or use transactions
3. **DiskCache for key-value** - 8x faster writes, automatic serialization
4. **MongoDB for distribution** - Not for local performance
*The tortoise way: Measure, understand the cost, choose deliberately.*
---
*Benchmark source: python-numbers-everyone-should-know*

papers/exception-flow.md (new file, 111 lines)
@@ -0,0 +1,111 @@
# Exception Flow: Performance Patterns
**Domain:** Exception handling overhead
**Source:** python-numbers-everyone-should-know benchmarks (Python 3.14.2, Apple Silicon)
---
## TL;DR
- **try/except with no exception**: Nearly free (1.1 ns overhead)
- **Raising an exception**: 6.5x slower than the happy path (139 ns vs 21.5 ns)
- **EAFP is fine when exceptions are rare** (<5% of calls)
- **Use LBYL for expected failures** (dict key lookup, file existence)
- **Never use exceptions for normal control flow**
---
## The Numbers
### Happy Path (No Exception Raised)
| Operation | Time | Overhead vs Baseline |
|-----------|------|---------------------|
| Function call (no try/except) | 20.4 ns | baseline |
| try/except (no exception raised) | 21.5 ns | +1.1 ns (+5%) |
| try/except ValueError (specific) | 22.9 ns | +2.5 ns (+12%) |
| try/except/finally | 22.1 ns | +1.7 ns (+8%) |
**Key insight:** The try block itself is essentially free.
### Sad Path (Exception Raised)
| Operation | Time | Slowdown vs Happy Path |
|-----------|------|----------------------|
| raise + catch ValueError | 139 ns | **6.5x slower** |
| raise + catch (base Exception) | 140 ns | 6.5x slower |
| raise + catch custom exception | 146 ns | 6.8x slower |
| raise + catch with `as e` | 148 ns | 6.9x slower |
**Key insight:** The 6.5x overhead comes from:
1. Creating the exception object (~40 ns)
2. Capturing the traceback (~70 ns)
3. Stack unwinding and handler lookup (~30 ns)
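A sketch that reproduces the happy/sad gap (a rough harness; function-call overhead is included in both paths, so the ratio comes out lower than the pure raise cost):

```python
import timeit

def happy_path():
    try:
        return 42
    except ValueError:
        return None

def sad_path():
    try:
        raise ValueError("boom")
    except ValueError:
        return None

happy_ns = min(timeit.repeat(happy_path, number=200_000, repeat=5)) / 200_000 * 1e9
sad_ns = min(timeit.repeat(sad_path, number=200_000, repeat=5)) / 200_000 * 1e9
print(f"happy: {happy_ns:.0f} ns  sad: {sad_ns:.0f} ns  ratio: {sad_ns / happy_ns:.1f}x")
```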
---
## EAFP vs LBYL: When to Use Which
### EAFP (Easier to Ask Forgiveness than Permission)
```python
try:
    value = data[key]
except KeyError:
    value = default
```
**Use when:** Exceptions are rare (<5% of calls)
### LBYL (Look Before You Leap)
```python
if key in data:
    value = data[key]
else:
    value = default
```
**Use when:** The failure case is common (>15% of calls)
### Crossover Point
**Rule of thumb:** If exceptions occur more than 15% of the time, use LBYL.
### dict.get() Beats Both
```python
# Best: Use .get() - 26.3 ns, no exception possible
config = settings.get('database', {})
```
---
## Practical Rules for Coding Agents
1. **try/except blocks are free** - don't avoid them for performance
2. **Raising exceptions costs 6.5x** - only raise for truly exceptional cases
3. **Use .get() for dicts** - beats both EAFP and LBYL
4. **Return Optional for expected missing** - not exceptions
5. **EAFP for file ops** - TOCTOU protection matters more than perf
6. **LBYL when failures are common** (>15% of calls)
7. **Never use exceptions for control flow**
---
## Summary
| Scenario | Recommendation |
|----------|----------------|
| Exception rate <5% | EAFP (try/except) |
| Exception rate >15% | LBYL (check first) |
| Dict key lookup | Use `.get()` |
| Optional return value | Return `None`, not exception |
| File operations | EAFP (TOCTOU protection) |
| Control flow | Never use exceptions |
**The core insight:** try/except is free; raising is not. Design APIs to minimize raises, not to avoid try blocks.
---
*Benchmark source: python-numbers-everyone-should-know*

(new file, 104 lines)
# Import Optimization
**Domain Paper: Python Performance ADRs**
**Date:** 2026-01-03
**Source:** python-numbers-everyone-should-know benchmarks
---
## Executive Summary
Import costs range from **sub-microsecond (cached) to 100+ milliseconds** (large frameworks). For CLI tools and short-lived scripts, import time can dominate total execution.
---
## Benchmark Data (First Import, Fresh Process)
### Built-in Modules
| Module | First Import |
|--------|-------------|
| `sys` | 0.2 us |
| `os` | 0.2 us |
| `math` | 24 us |
### Standard Library
| Module | First Import |
|--------|-------------|
| `datetime` | 72 us |
| `typing` | 2.0 ms |
| `json` | 2.9 ms |
| `dataclasses` | 6.0 ms |
| `logging` | 10.5 ms |
| `asyncio` | 17.7 ms |
### External Packages
| Package | First Import |
|---------|-------------|
| `pydantic` | 15.8 ms |
| `flask` | 47.3 ms |
| `fastapi` | 104.4 ms |
**Key insight:** FastAPI takes 100ms just to import. For a CLI tool that runs in 50ms, this is unacceptable overhead.
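The cold-import cost is paid once per process; a sketch contrasting a fresh-interpreter import with a cached one (numbers vary by machine):

```python
import subprocess
import sys
import time

# Cold import: a fresh interpreter starts with an empty sys.modules cache
out = subprocess.run(
    [sys.executable, "-c",
     "import time; t = time.perf_counter(); import json; "
     "print((time.perf_counter() - t) * 1000)"],
    capture_output=True, text=True, check=True,
)
cold_ms = float(out.stdout)

# Warm import: already in sys.modules, so this is just a cache lookup
import json  # noqa: F401  (prime the cache)
t = time.perf_counter()
import json  # noqa: F401
warm_ms = (time.perf_counter() - t) * 1000
print(f"cold: {cold_ms:.2f} ms  warm: {warm_ms:.4f} ms")
```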
---
## Lazy Import Patterns
### Pattern 1: Function-Level Import
```python
def process_data(data):
    import pandas as pd  # Only when needed
    return pd.DataFrame(data)
```
### Pattern 2: TYPE_CHECKING Guard
```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    import pandas as pd

def process(data: "pd.DataFrame"):
    import pandas as pd
    return pd.DataFrame(data)
```
---
## Practical Rules for Coding Agents
### MUST
1. **Use `TYPE_CHECKING` for type-only imports** when the type is from a heavy package
2. **Use function-level imports for rarely-used code paths**
3. **Never import heavy packages at module level in CLI tools**
### SHOULD
4. **Use `from __future__ import annotations`** for cleaner TYPE_CHECKING
5. **Profile import time for new dependencies:**
```bash
python -c "import time; s=time.perf_counter(); import PACKAGE; print(f'{(time.perf_counter()-s)*1000:.1f}ms')"
```
---
## Summary Table
| Scenario | Pattern | Example |
|----------|---------|---------|
| Type hints for heavy types | `TYPE_CHECKING` | pandas, numpy types |
| Rarely-used function | Function-level import | Error handling paths |
| CLI fast path | Defer until needed | `--version`, `--help` |
| Serverless cold start | Minimize top-level | Lambda/Cloud Functions |
---
*Import costs are hidden taxes. Pay them lazily.*
---
*Benchmark source: python-numbers-everyone-should-know*

(new file, 93 lines)
# JSON Serialization Performance in Python
**Domain Paper: Python Performance ADRs**
**Date:** 2026-01-03
**Source:** Python Numbers Everyone Should Know benchmarks
---
## Executive Summary
Alternative JSON libraries like `orjson` and `msgspec` deliver **8-12x faster serialization** and **2-7x faster deserialization** compared to stdlib `json`. The performance gap is consistent across payload sizes.
---
## Key Findings
### Serialization Performance (dumps)
| Library | Simple Object | Complex Object | Speedup vs stdlib |
|---------|--------------|----------------|-------------------|
| `json.dumps()` | 708 ns | 2.65 us | 1x (baseline) |
| `orjson.dumps()` | 61 ns | 310 ns | **11.6x / 8.5x** |
| `msgspec.encode()` | 92 ns | 445 ns | 7.7x / 6.0x |
| `ujson.dumps()` | 264 ns | 1.64 us | 2.7x / 1.6x |
### Deserialization Performance (loads)
| Library | Simple Object | Complex Object | Speedup vs stdlib |
|---------|--------------|----------------|-------------------|
| `json.loads()` | 714 ns | 2.22 us | 1x (baseline) |
| `orjson.loads()` | 106 ns | 839 ns | **6.7x / 2.6x** |
| `msgspec.decode()` | 101 ns | 850 ns | 7.1x / 2.6x |
---
## When to Use Each Library
### Use stdlib `json` when:
- Zero dependencies required
- Need custom JSONEncoder subclass
- Compatibility is paramount
### Use `orjson` when:
- Maximum performance needed
- You can accept bytes output
- You need datetime/UUID support
### Use `msgspec` when:
- You need typed decoding
- You want MessagePack too
- Memory efficiency matters
---
## Practical Rules for Coding Agents
### Rule 1: Default to orjson for new projects
```python
# Instead of:
import json
data = json.dumps(obj)
# Prefer:
import orjson
data = orjson.dumps(obj) # Returns bytes
```
### Rule 2: Use stdlib json only when explicitly needed
Acceptable reasons:
- Must avoid external dependencies
- Need custom JSONEncoder subclass
- Working in constrained environment
### Rule 3: Profile before optimizing JSON
At 2-3 microseconds per operation, JSON serialization is rarely the bottleneck unless you're doing thousands of operations per second.
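A stdlib-only harness for that profiling step (the payload is illustrative; swap `json` for `orjson` to compare, if it is installed):

```python
import json
import timeit

obj = {"id": 123, "name": "alice", "tags": ["a", "b", "c"], "active": True}
payload = json.dumps(obj)

# Per-operation cost in microseconds
dumps_us = timeit.timeit(lambda: json.dumps(obj), number=100_000) / 100_000 * 1e6
loads_us = timeit.timeit(lambda: json.loads(payload), number=100_000) / 100_000 * 1e6
print(f"dumps: {dumps_us:.2f} us  loads: {loads_us:.2f} us")
```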
---
## Summary Table
| Scenario | Recommendation | Expected Speedup |
|----------|----------------|------------------|
| General use | orjson | 8x serialization, 2.5x deserialization |
| Typed data | msgspec | 6x + type safety |
| Drop-in replacement | ujson | 1.5-2x |
| Zero dependencies | json (stdlib) | Baseline |
---
*Benchmark source: python-numbers-everyone-should-know*

papers/memory-slots.md (new file, 102 lines)
@@ -0,0 +1,102 @@
# Memory Optimization with __slots__ in Python
**Domain Paper: Python Performance ADRs**
**Date:** 2026-01-03
**Source:** python-numbers-everyone-should-know benchmarks
---
## Executive Summary
Python's `__slots__` mechanism provides **52-70% memory reduction** when creating many instances of the same class.
**Key Finding**: For a class with 5 attributes, `__slots__` reduces instance memory from 694 bytes to 212 bytes (69% reduction).
---
## Benchmark Results: Memory Footprint
### Single Instance Memory (5 Attributes)
| Type | Memory (bytes) | vs Regular Class |
|------|----------------|------------------|
| Regular class | 694 | baseline |
| `__slots__` class | 212 | -69% |
| dataclass | 694 | same as regular |
| `@dataclass(slots=True)` | 212 | -69% |
| namedtuple | 228 | -67% |
### At Scale (1,000 Instances)
| Type | Total Memory |
|------|--------------|
| List of 1,000 regular class | 165.2 KB |
| List of 1,000 `__slots__` class | 79.1 KB |
**Memory Savings**: 52% reduction at scale
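A rough way to see the per-instance `__dict__` cost with `sys.getsizeof` (this undercounts the regular class slightly, since dict key tables are shared, but the gap is visible; the classes are illustrative):

```python
import sys

class Regular:
    def __init__(self):
        self.a = self.b = self.c = self.d = self.e = 1

class Slotted:
    __slots__ = ("a", "b", "c", "d", "e")

    def __init__(self):
        self.a = self.b = self.c = self.d = self.e = 1

r, s = Regular(), Slotted()
regular_bytes = sys.getsizeof(r) + sys.getsizeof(r.__dict__)
slotted_bytes = sys.getsizeof(s)  # no per-instance __dict__ to add
print(f"regular: {regular_bytes} B  slotted: {slotted_bytes} B")
```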
---
## Attribute Access Speed (Virtually Identical)
| Operation | Regular | `__slots__` |
|-----------|---------|-------------|
| Read attr | 14.1 ns | 14.1 ns |
| Write attr | 15.7 ns | 16.4 ns |
---
## Trade-offs
### What __slots__ Prevents
1. **No dynamic attribute assignment**
2. **No `__dict__` access** (`vars()` doesn't work)
3. **Inheritance complications**
4. **No weak references by default**
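Trade-off 1 in action (`Point` is a made-up example class):

```python
class Point:
    __slots__ = ('x', 'y')

p = Point()
p.x, p.y = 1.0, 2.0

try:
    p.z = 3.0  # not declared in __slots__
except AttributeError as exc:
    print(f"AttributeError: {exc}")
```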
---
## Practical Rules for Coding Agents
### Rule 1: Instance Count Threshold
```
IF creating > 100 instances of the same class
AND attributes are fixed at design time
THEN consider __slots__ or @dataclass(slots=True)
```
### Rule 2: Prefer Slots Dataclass (Python 3.10+)
```python
from dataclasses import dataclass

@dataclass(slots=True)
class User:
    id: int
    name: str
    email: str
```
### Rule 3: Don't Optimize Prematurely
For < 100 instances, use regular classes for flexibility.
### Rule 4: Document the Trade-off
```python
# Using __slots__ for memory efficiency (1000+ instances expected)
__slots__ = ['x', 'y', 'z']
```
---
## Summary
| Aspect | Regular Class | `__slots__` Class |
|--------|---------------|-------------------|
| Memory (5 attrs) | 694 bytes | 212 bytes |
| Read speed | 14.1 ns | 14.1 ns |
| Dynamic attributes | Yes | No |
| Best for | Flexibility | Many instances |
**Bottom Line**: Use `__slots__` (or `@dataclass(slots=True)`) when creating many instances of fixed-attribute classes. For small numbers, stick with regular classes.
---
*Benchmark source: python-numbers-everyone-should-know*

papers/string-formatting.md (new file, 111 lines)
@@ -0,0 +1,111 @@
# String Formatting: Domain Exploration
**Date:** 2026-01-03
**Source:** python-numbers-everyone-should-know benchmarks
**Python Version:** 3.14.2 (CPython, ARM64 macOS)
---
## Executive Summary
String formatting performance: **simple concatenation is fastest for trivial joins**, while **f-strings offer the best balance of readability and performance** for interpolation use cases.
---
## Raw Benchmark Results
| Operation | Time (ns) | Throughput |
|-----------|-----------|------------|
| `concat_small` | 39.1 ns | 25.6M ops/sec |
| `f_string` | 64.9 ns | 15.4M ops/sec |
| `percent_formatting` | 89.8 ns | 11.1M ops/sec |
| `format_method` | 103 ns | 9.7M ops/sec |
### Relative Performance
| Method | Speed vs f-string |
|--------|-------------------|
| `concat_small` | 1.66x faster |
| `f_string` | 1.00x (reference) |
| `percent_formatting` | 0.72x (1.4x slower) |
| `format_method` | 0.63x (1.6x slower) |
---
## Why F-Strings Are Fast
F-strings are parsed at **compile time**, not runtime:
1. **No method lookup**: F-strings don't call `.format()` at runtime
2. **No tuple creation**: `%` formatting requires `(name,)` tuple
3. **Specialized bytecode**: `FORMAT_VALUE` and `BUILD_STRING` are optimized
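This is visible in the bytecode. Opcode names changed in CPython 3.12 (`FORMAT_VALUE` was split into `FORMAT_SIMPLE`/`CONVERT_VALUE`), but `BUILD_STRING` appears in both, and there is no method call:

```python
import dis

code = compile('f"hello {name}!"', '<demo>', 'eval')
ops = [ins.opname for ins in dis.get_instructions(code)]
print(ops)
# Expect a BUILD_STRING opcode plus a value-formatting opcode,
# and no CALL for a .format method
```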
---
## When to Use Each Method
### Concatenation Wins
For 2-3 literal strings with no formatting:
```python
path = base_dir + '/' + filename # Simpler, faster
```
### % Formatting for Logging
```python
# Deferred evaluation - string built only if debug enabled
logger.debug('Processing %s items', count)
# f-string - string ALWAYS built, then discarded
logger.debug(f'Processing {count} items') # Wasteful
```
### .format() for Dynamic Templates
```python
template = get_template_from_config() # Returns 'User: {name}'
result = template.format(name=user.name)
```
---
## Practical Rules for Coding Agents
### Rule 1: Default to F-Strings
```python
# Preferred
message = f'User {user.name} logged in at {timestamp}'
```
### Rule 2: Use Concatenation for Trivial Joins
```python
url = base_url + endpoint # Fine - simpler and faster
```
### Rule 3: Use join() for Multiple Parts
```python
# Correct - O(n) time
result = ''.join([part1, part2, part3, part4])
# Inefficient - O(n^2) time
result = part1 + part2 + part3 + part4
```
### Rule 4: Keep % for Logging
```python
logger.info('Processed %d records in %.2fs', count, elapsed)
```
---
## Summary
| Scenario | Best Choice | Reason |
|----------|-------------|--------|
| Variable interpolation | f-string | 1.6x faster than `.format()` |
| Simple 2-part join | Concatenation | 1.7x faster than f-string |
| Building from many parts | `''.join()` | O(n) vs O(n^2) |
| Logging statements | `%` style | Deferred evaluation |
| Dynamic templates | `.format()` | Template flexibility |
---
*Benchmark source: python-numbers-everyone-should-know*