feat: Add 8 domain papers and RULEBOOK.md
Domain papers distilled from python-numbers-everyone-should-know:
- async-overhead: 1,400x sync vs async overhead
- collection-membership: 200x set vs list at 1000 items
- json-serialization: 8x orjson vs stdlib
- exception-flow: 6.5x exception overhead (try/except free)
- string-formatting: f-strings > % > .format()
- memory-slots: 69% memory reduction with __slots__
- import-optimization: 100ms+ for heavy packages
- database-patterns: 98% commit overhead in SQLite

RULEBOOK.md: ~200 token distillation for coding subagents

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
32
RULEBOOK.md
Normal file
@@ -0,0 +1,32 @@
# RULEBOOK: Python Performance

**Load into coding subagents. ~200 tokens. See `/papers/*.md` for deep dives.**

## The Numbers

- Async overhead: **1,400x** (sync 20 ns, async 28 us)
- Set vs list membership: **200x** at 1000 items
- orjson vs stdlib json: **8x** faster
- Exception raise: **6.5x** (try/except is free)
- SQLite commit: **98%** of write latency

## Rules

**Collections**: Default to `set` for membership. Use `dict.get()`, not check-then-access.

**Async**: Default sync. Only async for I/O >1 ms. Use `gather()`, not loops.

**JSON**: Default `orjson`. stdlib only for a zero-deps requirement.

**Exceptions**: try/except is free. Use `.get()` for dicts. EAFP for <15% failure rate.

**Strings**: f-strings by default. `%` for logging (deferred eval). `join()` for many parts.

**Memory**: `@dataclass(slots=True)` for 100+ instances.

**Imports**: `TYPE_CHECKING` for heavy types. Lazy imports in CLI tools.

**Database**: DiskCache for key-value. Batch SQLite writes. Reads cheap, writes expensive.

---

*Source: python-numbers-everyone-should-know (Python 3.14.2, Apple Silicon)*
126
papers/async-overhead.md
Normal file
@@ -0,0 +1,126 @@
# Async Overhead in Python: When the Cure is Worse Than the Disease

**Domain Paper: Python Performance ADRs**
**Date:** 2026-01-03
**Source:** Python Numbers Everyone Should Know benchmarks (Python 3.14.2, Apple Silicon)

---

## Executive Summary

Async Python introduces a **1,400x overhead** for simple operations compared to synchronous equivalents. This overhead is fixed regardless of what work the function does. The critical insight: async only makes sense when you're waiting on I/O that takes orders of magnitude longer than this overhead.

**The Core Numbers:**
- Sync function call: **20.3 ns**
- Async equivalent via `run_until_complete`: **28.2 us** (28,200 ns)
- **Ratio: 1,387x slower** (approximately 1,400x)

---

## What Was Benchmarked

### Methodology

The benchmarks measured pure async machinery overhead using CPython 3.14.2 on Apple Silicon. Each operation was run thousands of times with warmup periods, reporting median values.

### Test Functions

```python
# The async function being tested
async def return_value_coro():
    return 42

# The sync equivalent
def sync_function():
    return 42
```

---

## Key Findings

### Coroutine Creation (Cheap)

| Operation | Time |
|-----------|------|
| Create coroutine object | 47.0 ns |

**Key insight:** Creating a coroutine object is cheap (47 ns). The cost comes when you actually run it.

### Running Coroutines (Expensive)

| Operation | Time |
|-----------|------|
| `run_until_complete(empty)` | 27.6 us |
| `run_until_complete(return value)` | 26.6 us |
| Run nested await | 28.9 us |

**Key insight:** Every `run_until_complete` costs ~27 us regardless of coroutine complexity.

### The Critical Comparison

| Operation | Time | Ratio |
|-----------|------|-------|
| Sync function call | 20.3 ns | 1x |
| Async equivalent | 28.2 us | **1,387x** |

---

## When Async IS Appropriate

### Good Use Cases

1. **Web servers handling concurrent connections** - FastAPI/Starlette: 115-125k req/sec
2. **Concurrent network I/O** - Fetching data from multiple APIs simultaneously
3. **High-latency operations with parallelism** - `asyncio.gather()` for multiple slow API calls

### Bad Use Cases

1. **Wrapping synchronous database drivers** - Use native async drivers or stay sync
2. **CPU-bound computation** - Async doesn't parallelize CPU work (GIL)
3. **Simple scripts with sequential operations** - CLI tools, data processing pipelines

---

## Practical Rules for Coding Agents

### Rule 1: Default to Sync
Write synchronous code unless you have a specific, measurable need for async.

### Rule 2: The 1ms Threshold
Only consider async when individual I/O operations take **>1 millisecond**.

### Rule 3: Batch Over Broadcast
If you need async, gather operations together:

```python
# Good: pay the event-loop overhead once, run the I/O concurrently
results = await asyncio.gather(*[fetch(url) for url in urls])

# Bad: sequential awaits -- no concurrency, latencies add up
for url in urls:
    result = await fetch(url)
```

### Rule 4: Stay in the Loop
Avoid `run_until_complete` inside an already-running loop.
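
A minimal sketch of this rule, using a hypothetical `fetch_value` coroutine: each `asyncio.run()` (or `run_until_complete`) call spins up and tears down a fresh event loop, while `await` inside async code re-enters nothing.

```python
import asyncio

async def fetch_value():
    return 42

# Bad: a new event loop per call pays the ~27 us entry cost each time,
# and asyncio.run() raises RuntimeError if a loop is already running.
def get_value_blocking():
    return asyncio.run(fetch_value())

# Good: once inside async code, just await -- no loop re-entry.
async def get_value():
    return await fetch_value()

print(get_value_blocking())
print(asyncio.run(get_value()))
```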

### Rule 5: Match Your I/O Library
Use async libraries for async code, sync libraries for sync code.

---

## Summary Table

| Scenario | Recommendation | Reasoning |
|----------|----------------|-----------|
| Simple function returning data | Sync | Async adds 1,400x overhead |
| In-memory operations | Sync | No I/O to wait on |
| Single database query | Sync | Query time < async amortization |
| Multiple independent API calls | Async + gather | Parallelism benefit outweighs overhead |
| Web server (many connections) | Async framework | Concurrent handling essential |
| CLI tool | Sync | Sequential operations, no benefit |

---

*Benchmark source: python-numbers-everyone-should-know (2026-01-01, Python 3.14.2, Apple Silicon)*
109
papers/collection-membership.md
Normal file
@@ -0,0 +1,109 @@
|
|||||||
|
# Collection Membership: The 200x Performance Cliff
|
||||||
|
|
||||||
|
**Domain Paper: Python Collection Selection for Membership Testing**
|
||||||
|
**Date:** 2026-01-03
|
||||||
|
**Source:** Python Numbers Everyone Should Know benchmarks
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Executive Summary
|
||||||
|
|
||||||
|
Membership testing (`x in collection`) is one of the most common operations in Python code. The choice of collection type can result in a **200x performance difference** at just 1,000 items.
|
||||||
|
|
||||||
|
**Key Finding**: At 1,000 items, checking if an item exists in a list takes 3.9 microseconds. The same check in a set takes 19 nanoseconds. That is a 206x difference.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## The Core Numbers
|
||||||
|
|
||||||
|
### Membership Testing Performance
|
||||||
|
|
||||||
|
| Operation | Time | Throughput |
|
||||||
|
|-----------|------|------------|
|
||||||
|
| `item in set` (existing) | 19.0 ns | 52.7M ops/sec |
|
||||||
|
| `key in dict` (existing) | 20.8 ns | 48.1M ops/sec |
|
||||||
|
| `item in list` (first) | 13.9 ns | 72.0M ops/sec |
|
||||||
|
| `item in list` (middle, 500th) | 1,956 ns | 511k ops/sec |
|
||||||
|
| `item in list` (last, 999th) | 3,852 ns | 260k ops/sec |
|
||||||
|
| `item in list` (missing) | 3,915 ns | 255k ops/sec |
|
||||||
|
|
||||||
|
### The 200x Cliff Explained
|
||||||
|
|
||||||
|
```
|
||||||
|
Set membership (any position): ~19 ns O(1)
|
||||||
|
List membership (worst case): ~3,915 ns O(n)
|
||||||
|
________
|
||||||
|
206x slower
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Crossover Analysis
|
||||||
|
|
||||||
|
**Crossover point**: A list with ~15-20 items will match set performance for a full scan. Below that, lists may actually be faster due to lower overhead.
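
The cliff is easy to reproduce with stdlib `timeit`; a quick sketch (absolute numbers vary by machine, but the ordering holds at 1,000 items):

```python
import timeit

items = list(range(1000))
as_list = items
as_set = set(items)

# Worst case for the list: the probed value sits at the end.
list_time = timeit.timeit(lambda: 999 in as_list, number=10_000)
set_time = timeit.timeit(lambda: 999 in as_set, number=10_000)

print(f'list: {list_time:.4f}s  set: {set_time:.4f}s  '
      f'ratio: {list_time / set_time:.0f}x')
```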

### When to Use Each Type

**Use Set when:**
- Collection has more than ~20 items
- Checking membership more than once
- Order does not matter

**Use Dict when:**
- You need to associate values with keys
- Checking membership AND need to retrieve associated data

**Use List when:**
- Collection is very small (< 20 items)
- You iterate but rarely check membership
- Items might not be hashable

---

## Practical Rules for Coding Agents

### Rule 1: Default to Set for Membership

```python
# PREFER
allowed_values = {'a', 'b', 'c'}
if value in allowed_values:
    ...

# AVOID
allowed_values = ['a', 'b', 'c']
if value in allowed_values:
    ...
```

### Rule 2: Convert Lists Before Repeated Lookups

```python
def process_items(items: list, valid_ids: list):
    valid_set = set(valid_ids)  # Convert once
    return [item for item in items if item.id in valid_set]
```

### Rule 3: Prefer `dict.get()` Over Check-then-Access

```python
# AVOID (double lookup)
if key in config:
    value = config[key]

# PREFER (single lookup)
value = config.get(key, default)
```

---

## Summary Table

| Scenario | Best Choice | Why |
|----------|-------------|-----|
| Membership test on 1000+ items | Set | 200x faster than list |
| Key-value lookup | Dict | O(1) access with associated data |
| Ordered collection, rare membership | List | Lower memory, maintains order |
| Very small collection (< 20 items) | List or Set | Negligible difference |

---

*Benchmark source: python-numbers-everyone-should-know*
121
papers/database-patterns.md
Normal file
@@ -0,0 +1,121 @@
|
|||||||
|
# Database Patterns: SQLite, DiskCache, and MongoDB
|
||||||
|
|
||||||
|
**Domain:** Persistence and data access patterns in Python
|
||||||
|
**Source:** python-numbers-everyone-should-know benchmarks
|
||||||
|
**Date:** 2026-01-03
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Executive Summary
|
||||||
|
|
||||||
|
**Reads are cheap, writes are expensive.** SQLite commits dominate write latency (192 microseconds with commit vs 3 microseconds without). For read-heavy workloads, SQLite achieves 280K ops/sec by primary key. For write-heavy workloads, consider diskcache (8x faster writes) or batch operations.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Key Findings
|
||||||
|
|
||||||
|
### The Numbers
|
||||||
|
|
||||||
|
| Operation | SQLite | DiskCache | MongoDB |
|
||||||
|
|-----------|--------|-----------|---------|
|
||||||
|
| **Write one object** | 192 us (5.2k/s) | 24 us (42k/s) | 119 us (8.4k/s) |
|
||||||
|
| **Read by key/id** | 3.6 us (280k/s) | 4.3 us (236k/s) | 121 us (8.2k/s) |
|
||||||
|
|
||||||
|
### Finding 1: The Commit Tax
|
||||||
|
|
||||||
|
SQLite writes with commit: **192 microseconds**
|
||||||
|
SQLite writes without commit: **3 microseconds**
|
||||||
|
|
||||||
|
The commit operation accounts for **98.4% of write latency**.
|
||||||
|
|
||||||
|
### Finding 2: DiskCache Wins for Simple Key-Value
|
||||||
|
|
||||||
|
| Operation | SQLite Raw | DiskCache |
|
||||||
|
|-----------|------------|-----------|
|
||||||
|
| Write | 192 us | 24 us |
|
||||||
|
| Read | 3.6 us | 4.3 us |
|
||||||
|
|
||||||
|
DiskCache achieves **8x faster writes** with comparable read performance.
|
||||||
|
|
||||||
|
### Finding 3: Batching Provides 9x Throughput
|
||||||
|
|
||||||
|
SQLite `executemany()` 10 rows: **215 microseconds total** (21.5 us/row)
|
||||||
|
10 individual inserts: **1,920 microseconds** (192 us/row)
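
The batching pattern can be exercised end-to-end with stdlib `sqlite3` (an in-memory database here, so absolute timings are smaller than the on-disk figures above):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE items (data TEXT)')

rows = [(f'item-{i}',) for i in range(10)]

# One executemany + one commit: the commit tax is paid once, not per row.
conn.executemany('INSERT INTO items (data) VALUES (?)', rows)
conn.commit()

count = conn.execute('SELECT COUNT(*) FROM items').fetchone()[0]
print(count)  # 10
```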

---

## When to Use Each Storage Option

### SQLite
- Read-heavy workloads (100:1 read/write ratio)
- Need to query inside JSON with `json_extract()`
- ACID guarantees matter

### DiskCache
- Key-value storage with automatic serialization
- Cache patterns (TTL, LRU eviction)
- Agent state persistence

### MongoDB
- Distributed, multi-node deployments
- Complex aggregation pipelines
- Full-text search requirements

---

## Practical Rules for Coding Agents

### Rule 1: Default to DiskCache for Agent State
```python
from diskcache import Cache

cache = Cache('/tmp/agent-cache')
cache.set('conversation:123', messages)  # 24 microseconds
```

### Rule 2: Batch SQLite Writes
```python
# GOOD: Batch with executemany
conn.executemany('INSERT INTO items (data) VALUES (?)', items_list)
conn.commit()  # One commit for all items
```

### Rule 3: Use Transactions for Multi-Step Operations
```python
conn.execute('BEGIN')
conn.execute('INSERT INTO users ...')
conn.execute('INSERT INTO audit_log ...')
conn.commit()  # One fsync
```

### Rule 4: Lazy Import for CLI Tools
```python
def save_to_db(data):
    import sqlite3  # 1.63 ms, paid only when needed
    conn = sqlite3.connect('app.db')
```

---

## Summary

| Metric | SQLite | DiskCache | MongoDB |
|--------|--------|-----------|---------|
| Write latency | 192 us | 24 us | 119 us |
| Read latency | 3.6 us | 4.3 us | 121 us |
| Writes/sec | 5.2k | 42k | 8.4k |
| Reads/sec | 280k | 236k | 8.2k |

---

## The Bottom Line

1. **Reads are cheap everywhere** - Optimize for write patterns
2. **SQLite commits dominate latency** - Batch or use transactions
3. **DiskCache for key-value** - 8x faster writes, automatic serialization
4. **MongoDB for distribution** - Not for local performance

*The tortoise way: Measure, understand the cost, choose deliberately.*

---

*Benchmark source: python-numbers-everyone-should-know*
111
papers/exception-flow.md
Normal file
@@ -0,0 +1,111 @@
|
|||||||
|
# Exception Flow: Performance Patterns
|
||||||
|
|
||||||
|
**Domain:** Exception handling overhead
|
||||||
|
**Source:** python-numbers-everyone-should-know benchmarks (Python 3.14.2, Apple Silicon)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## TL;DR
|
||||||
|
|
||||||
|
- **try/except with no exception**: Nearly free (1.1 ns overhead)
|
||||||
|
- **Raising an exception**: 6.5x slower than the happy path (139 ns vs 21.5 ns)
|
||||||
|
- **EAFP is fine when exceptions are rare** (<5% of calls)
|
||||||
|
- **Use LBYL for expected failures** (dict key lookup, file existence)
|
||||||
|
- **Never use exceptions for normal control flow**
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## The Numbers
|
||||||
|
|
||||||
|
### Happy Path (No Exception Raised)
|
||||||
|
|
||||||
|
| Operation | Time | Overhead vs Baseline |
|
||||||
|
|-----------|------|---------------------|
|
||||||
|
| Function call (no try/except) | 20.4 ns | baseline |
|
||||||
|
| try/except (no exception raised) | 21.5 ns | +1.1 ns (+5%) |
|
||||||
|
| try/except ValueError (specific) | 22.9 ns | +2.5 ns (+12%) |
|
||||||
|
| try/except/finally | 22.1 ns | +1.7 ns (+8%) |
|
||||||
|
|
||||||
|
**Key insight:** The try block itself is essentially free.
|
||||||
|
|
||||||
|
### Sad Path (Exception Raised)
|
||||||
|
|
||||||
|
| Operation | Time | Slowdown vs Happy Path |
|
||||||
|
|-----------|------|----------------------|
|
||||||
|
| raise + catch ValueError | 139 ns | **6.5x slower** |
|
||||||
|
| raise + catch (base Exception) | 140 ns | 6.5x slower |
|
||||||
|
| raise + catch custom exception | 146 ns | 6.8x slower |
|
||||||
|
| raise + catch with `as e` | 148 ns | 6.9x slower |
|
||||||
|
|
||||||
|
**Key insight:** The 6.5x overhead comes from:
|
||||||
|
1. Creating the exception object (~40 ns)
|
||||||
|
2. Capturing the traceback (~70 ns)
|
||||||
|
3. Stack unwinding and handler lookup (~30 ns)
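
The happy/sad gap can be reproduced with `timeit`; a sketch (absolute numbers vary by machine, the ordering does not):

```python
import timeit

data = {'present': 1}

def happy_path():
    try:
        return data['present']   # no exception raised
    except KeyError:
        return None

def sad_path():
    try:
        return data['missing']   # raises and catches KeyError
    except KeyError:
        return None

t_happy = timeit.timeit(happy_path, number=100_000)
t_sad = timeit.timeit(sad_path, number=100_000)
print(f'no raise: {t_happy:.4f}s  raise: {t_sad:.4f}s')
```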

---

## EAFP vs LBYL: When to Use Which

### EAFP (Easier to Ask Forgiveness than Permission)

```python
try:
    value = data[key]
except KeyError:
    value = default
```

**Use when:** Exceptions are rare (<5% of calls)

### LBYL (Look Before You Leap)

```python
if key in data:
    value = data[key]
else:
    value = default
```

**Use when:** The failure case is common (>15% of calls)

### Crossover Point

**Rule of thumb:** If exceptions occur more than 15% of the time, use LBYL.

### dict.get() Beats Both

```python
# Best: use .get() -- 26.3 ns, no exception possible
config = settings.get('database', {})
```

---

## Practical Rules for Coding Agents

1. **try/except blocks are free** - don't avoid them for performance
2. **Raising exceptions costs 6.5x** - only raise for truly exceptional cases
3. **Use .get() for dicts** - beats both EAFP and LBYL
4. **Return Optional for expected missing values** - not exceptions
5. **EAFP for file ops** - TOCTOU protection matters more than performance
6. **LBYL when failures are common** (>15% of calls)
7. **Never use exceptions for control flow**

---

## Summary

| Scenario | Recommendation |
|----------|----------------|
| Exception rate <5% | EAFP (try/except) |
| Exception rate >15% | LBYL (check first) |
| Dict key lookup | Use `.get()` |
| Optional return value | Return `None`, not an exception |
| File operations | EAFP (TOCTOU protection) |
| Control flow | Never use exceptions |

**The core insight:** try/except is free; raising is not. Design APIs to minimize raises, not to avoid try blocks.

---

*Benchmark source: python-numbers-everyone-should-know*
104
papers/import-optimization.md
Normal file
@@ -0,0 +1,104 @@
|
|||||||
|
# Import Optimization
|
||||||
|
|
||||||
|
**Domain Paper: Python Performance ADRs**
|
||||||
|
**Date:** 2026-01-03
|
||||||
|
**Source:** python-numbers-everyone-should-know benchmarks
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Executive Summary
|
||||||
|
|
||||||
|
Import costs range from **sub-microsecond (cached) to 100+ milliseconds** (large frameworks). For CLI tools and short-lived scripts, import time can dominate total execution.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Benchmark Data (First Import, Fresh Process)
|
||||||
|
|
||||||
|
### Built-in Modules
|
||||||
|
|
||||||
|
| Module | First Import |
|
||||||
|
|--------|-------------|
|
||||||
|
| `sys` | 0.2 us |
|
||||||
|
| `os` | 0.2 us |
|
||||||
|
| `math` | 24 us |
|
||||||
|
|
||||||
|
### Standard Library
|
||||||
|
|
||||||
|
| Module | First Import |
|
||||||
|
|--------|-------------|
|
||||||
|
| `datetime` | 72 us |
|
||||||
|
| `typing` | 2.0 ms |
|
||||||
|
| `json` | 2.9 ms |
|
||||||
|
| `dataclasses` | 6.0 ms |
|
||||||
|
| `logging` | 10.5 ms |
|
||||||
|
| `asyncio` | 17.7 ms |
|
||||||
|
|
||||||
|
### External Packages
|
||||||
|
|
||||||
|
| Package | First Import |
|
||||||
|
|---------|-------------|
|
||||||
|
| `pydantic` | 15.8 ms |
|
||||||
|
| `flask` | 47.3 ms |
|
||||||
|
| `fastapi` | 104.4 ms |
|
||||||
|
|
||||||
|
**Key insight:** FastAPI takes 100ms just to import. For a CLI tool that runs in 50ms, this is unacceptable overhead.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Lazy Import Patterns
|
||||||
|
|
||||||
|
### Pattern 1: Function-Level Import
|
||||||
|
```python
|
||||||
|
def process_data(data):
|
||||||
|
import pandas as pd # Only when needed
|
||||||
|
return pd.DataFrame(data)
|
||||||
|
```
|
||||||
|
|
||||||
|
### Pattern 2: TYPE_CHECKING Guard
|
||||||
|
```python
|
||||||
|
from typing import TYPE_CHECKING
|
||||||
|
|
||||||
|
if TYPE_CHECKING:
|
||||||
|
import pandas as pd
|
||||||
|
|
||||||
|
def process(data: "pd.DataFrame"):
|
||||||
|
import pandas as pd
|
||||||
|
return pd.DataFrame(data)
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Practical Rules for Coding Agents
|
||||||
|
|
||||||
|
### MUST
|
||||||
|
|
||||||
|
1. **Use `TYPE_CHECKING` for type-only imports** when the type is from a heavy package
|
||||||
|
2. **Use function-level imports for rarely-used code paths**
|
||||||
|
3. **Never import heavy packages at module level in CLI tools**
|
||||||
|
|
||||||
|
### SHOULD
|
||||||
|
|
||||||
|
4. **Use `from __future__ import annotations`** for cleaner TYPE_CHECKING
|
||||||
|
5. **Profile import time for new dependencies:**
|
||||||
|
```bash
|
||||||
|
python -c "import time; s=time.perf_counter(); import PACKAGE; print(f'{(time.perf_counter()-s)*1000:.1f}ms')"
|
||||||
|
```
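
The same measurement can live inside a script when needed; a sketch using `importlib` (`timed_import` is an illustrative helper, and note that repeat imports hit the `sys.modules` cache and return almost instantly):

```python
import importlib
import time

def timed_import(name: str):
    """Import a module and report how long the import took, in ms."""
    start = time.perf_counter()
    module = importlib.import_module(name)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return module, elapsed_ms

json_mod, ms = timed_import('json')
print(f'json: {ms:.2f} ms')
```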

---

## Summary Table

| Scenario | Pattern | Example |
|----------|---------|---------|
| Type hints for heavy types | `TYPE_CHECKING` | pandas, numpy types |
| Rarely-used function | Function-level import | Error handling paths |
| CLI fast path | Defer until needed | `--version`, `--help` |
| Serverless cold start | Minimize top-level | Lambda/Cloud Functions |

---

*Import costs are hidden taxes. Pay them lazily.*

---

*Benchmark source: python-numbers-everyone-should-know*
93
papers/json-serialization.md
Normal file
@@ -0,0 +1,93 @@
|
|||||||
|
# JSON Serialization Performance in Python
|
||||||
|
|
||||||
|
**Domain Paper: Python Performance ADRs**
|
||||||
|
**Date:** 2026-01-03
|
||||||
|
**Source:** Python Numbers Everyone Should Know benchmarks
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Executive Summary
|
||||||
|
|
||||||
|
Alternative JSON libraries like `orjson` and `msgspec` deliver **8-12x faster serialization** and **2-7x faster deserialization** compared to stdlib `json`. The performance gap is consistent across payload sizes.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Key Findings
|
||||||
|
|
||||||
|
### Serialization Performance (dumps)
|
||||||
|
|
||||||
|
| Library | Simple Object | Complex Object | Speedup vs stdlib |
|
||||||
|
|---------|--------------|----------------|-------------------|
|
||||||
|
| `json.dumps()` | 708 ns | 2.65 us | 1x (baseline) |
|
||||||
|
| `orjson.dumps()` | 61 ns | 310 ns | **11.6x / 8.5x** |
|
||||||
|
| `msgspec.encode()` | 92 ns | 445 ns | 7.7x / 6.0x |
|
||||||
|
| `ujson.dumps()` | 264 ns | 1.64 us | 2.7x / 1.6x |
|
||||||
|
|
||||||
|
### Deserialization Performance (loads)
|
||||||
|
|
||||||
|
| Library | Simple Object | Complex Object | Speedup vs stdlib |
|
||||||
|
|---------|--------------|----------------|-------------------|
|
||||||
|
| `json.loads()` | 714 ns | 2.22 us | 1x (baseline) |
|
||||||
|
| `orjson.loads()` | 106 ns | 839 ns | **6.7x / 2.6x** |
|
||||||
|
| `msgspec.decode()` | 101 ns | 850 ns | 7.1x / 2.6x |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## When to Use Each Library
|
||||||
|
|
||||||
|
### Use stdlib `json` when:
|
||||||
|
- Zero dependencies required
|
||||||
|
- Need custom JSONEncoder subclass
|
||||||
|
- Compatibility is paramount
|
||||||
|
|
||||||
|
### Use `orjson` when:
|
||||||
|
- Maximum performance needed
|
||||||
|
- You can accept bytes output
|
||||||
|
- You need datetime/UUID support
|
||||||
|
|
||||||
|
### Use `msgspec` when:
|
||||||
|
- You need typed decoding
|
||||||
|
- You want MessagePack too
|
||||||
|
- Memory efficiency matters
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Practical Rules for Coding Agents
|
||||||
|
|
||||||
|
### Rule 1: Default to orjson for new projects
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Instead of:
|
||||||
|
import json
|
||||||
|
data = json.dumps(obj)
|
||||||
|
|
||||||
|
# Prefer:
|
||||||
|
import orjson
|
||||||
|
data = orjson.dumps(obj) # Returns bytes
|
||||||
|
```
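
When orjson may or may not be installed, a small compatibility shim keeps call sites uniform. A sketch (note that `orjson.dumps` returns bytes, so the stdlib branch encodes to match):

```python
try:
    import orjson

    def dumps(obj) -> bytes:
        return orjson.dumps(obj)

    def loads(data):
        return orjson.loads(data)

except ImportError:
    import json

    def dumps(obj) -> bytes:
        return json.dumps(obj).encode()

    def loads(data):
        return json.loads(data)

round_tripped = loads(dumps({'a': 1}))
print(round_tripped)  # {'a': 1}
```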

### Rule 2: Use stdlib json only when explicitly needed

Acceptable reasons:
- Must avoid external dependencies
- Need a custom JSONEncoder subclass
- Working in a constrained environment

### Rule 3: Profile before optimizing JSON

At 2-3 microseconds per operation, JSON serialization is rarely the bottleneck unless you're doing thousands of operations per second.

---

## Summary Table

| Scenario | Recommendation | Expected Speedup |
|----------|----------------|------------------|
| General use | orjson | 8x serialization, 2.5x deserialization |
| Typed data | msgspec | 6x + type safety |
| Drop-in replacement | ujson | 1.5-2x |
| Zero dependencies | json (stdlib) | Baseline |

---

*Benchmark source: python-numbers-everyone-should-know*
102
papers/memory-slots.md
Normal file
@@ -0,0 +1,102 @@
|
|||||||
|
# Memory Optimization with __slots__ in Python
|
||||||
|
|
||||||
|
**Domain Paper: Python Performance ADRs**
|
||||||
|
**Date:** 2026-01-03
|
||||||
|
**Source:** python-numbers-everyone-should-know benchmarks
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Executive Summary
|
||||||
|
|
||||||
|
Python's `__slots__` mechanism provides **52-70% memory reduction** when creating many instances of the same class.
|
||||||
|
|
||||||
|
**Key Finding**: For a class with 5 attributes, `__slots__` reduces instance memory from 694 bytes to 212 bytes (69% reduction).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Benchmark Results: Memory Footprint
|
||||||
|
|
||||||
|
### Single Instance Memory (5 Attributes)
|
||||||
|
|
||||||
|
| Type | Memory (bytes) | vs Regular Class |
|
||||||
|
|------|----------------|------------------|
|
||||||
|
| Regular class | 694 | baseline |
|
||||||
|
| `__slots__` class | 212 | -69% |
|
||||||
|
| dataclass | 694 | same as regular |
|
||||||
|
| `@dataclass(slots=True)` | 212 | -69% |
|
||||||
|
| namedtuple | 228 | -67% |
|
||||||
|
|
||||||
|
### At Scale (1,000 Instances)
|
||||||
|
|
||||||
|
| Type | Total Memory |
|
||||||
|
|------|--------------|
|
||||||
|
| List of 1,000 regular class | 165.2 KB |
|
||||||
|
| List of 1,000 `__slots__` class | 79.1 KB |
|
||||||
|
|
||||||
|
**Memory Savings**: 52% reduction at scale
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Attribute Access Speed (Virtually Identical)
|
||||||
|
|
||||||
|
| Operation | Regular | `__slots__` |
|
||||||
|
|-----------|---------|-------------|
|
||||||
|
| Read attr | 14.1 ns | 14.1 ns |
|
||||||
|
| Write attr | 15.7 ns | 16.4 ns |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Trade-offs
|
||||||
|
|
||||||
|
### What __slots__ Prevents
|
||||||
|
|
||||||
|
1. **No dynamic attribute assignment**
|
||||||
|
2. **No `__dict__` access** (`vars()` doesn't work)
|
||||||
|
3. **Inheritance complications**
|
||||||
|
4. **No weak references by default**
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Practical Rules for Coding Agents
|
||||||
|
|
||||||
|
### Rule 1: Instance Count Threshold
|
||||||
|
```
|
||||||
|
IF creating > 100 instances of the same class
|
||||||
|
AND attributes are fixed at design time
|
||||||
|
THEN consider __slots__ or @dataclass(slots=True)
|
||||||
|
```
|
||||||
|
|
||||||
|
### Rule 2: Prefer Slots Dataclass (Python 3.10+)
|
||||||
|
```python
|
||||||
|
@dataclass(slots=True)
|
||||||
|
class User:
|
||||||
|
id: int
|
||||||
|
name: str
|
||||||
|
email: str
|
||||||
|
```

### Rule 3: Don't Optimize Prematurely
For < 100 instances, use regular classes for flexibility.

### Rule 4: Document the Trade-off
```python
# Using __slots__ for memory efficiency (1000+ instances expected)
__slots__ = ['x', 'y', 'z']
```

---

## Summary

| Aspect | Regular Class | `__slots__` Class |
|--------|---------------|-------------------|
| Memory (5 attrs) | 694 bytes | 212 bytes |
| Read speed | 14.1 ns | 14.1 ns |
| Dynamic attributes | Yes | No |
| Best for | Flexibility | Many instances |

**Bottom Line**: Use `__slots__` (or `@dataclass(slots=True)`) when creating many instances of fixed-attribute classes. For small numbers, stick with regular classes.

---

*Benchmark source: python-numbers-everyone-should-know*
111
papers/string-formatting.md
Normal file
@@ -0,0 +1,111 @@
|
|||||||
|
# String Formatting: Domain Exploration
|
||||||
|
|
||||||
|
**Date:** 2026-01-03
|
||||||
|
**Source:** python-numbers-everyone-should-know benchmarks
|
||||||
|
**Python Version:** 3.14.2 (CPython, ARM64 macOS)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Executive Summary
|
||||||
|
|
||||||
|
String formatting performance: **simple concatenation is fastest for trivial joins**, while **f-strings offer the best balance of readability and performance** for interpolation use cases.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Raw Benchmark Results
|
||||||
|
|
||||||
|
| Operation | Time (ns) | Throughput |
|
||||||
|
|-----------|-----------|------------|
|
||||||
|
| `concat_small` | 39.1 ns | 25.6M ops/sec |
|
||||||
|
| `f_string` | 64.9 ns | 15.4M ops/sec |
|
||||||
|
| `percent_formatting` | 89.8 ns | 11.1M ops/sec |
|
||||||
|
| `format_method` | 103 ns | 9.7M ops/sec |
|
||||||
|
|
||||||
|
### Relative Performance
|
||||||
|
|
||||||
|
| Method | vs f-string |
|
||||||
|
|--------|-------------|
|
||||||
|
| `concat_small` | 1.66x faster |
|
||||||
|
| `f_string` | 1.00x (reference) |
|
||||||
|
| `percent_formatting` | 0.72x slower |
|
||||||
|
| `format_method` | 0.63x slower |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Why F-Strings Are Fast
|
||||||
|
|
||||||
|
F-strings are parsed at **compile time**, not runtime:
|
||||||
|
|
||||||
|
1. **No method lookup**: F-strings don't call `.format()` at runtime
|
||||||
|
2. **No tuple creation**: `%` formatting requires `(name,)` tuple
|
||||||
|
3. **Specialized bytecode**: `FORMAT_VALUE` and `BUILD_STRING` are optimized
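
The gap is straightforward to confirm with `timeit` (numbers vary by machine; the ordering is stable):

```python
import timeit

setup = "name, count = 'ada', 3"
n = 1_000_000

t_fstring = timeit.timeit("f'user {name} has {count}'", setup=setup, number=n)
t_format = timeit.timeit("'user {} has {}'.format(name, count)",
                         setup=setup, number=n)

print(f'f-string: {t_fstring:.3f}s  .format(): {t_format:.3f}s')
```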

---

## When to Use Each Method

### Concatenation Wins
For 2-3 literal strings with no formatting:
```python
path = base_dir + '/' + filename  # Simpler, faster
```

### % Formatting for Logging
```python
# Deferred evaluation - string built only if debug is enabled
logger.debug('Processing %s items', count)

# f-string - string ALWAYS built, then discarded
logger.debug(f'Processing {count} items')  # Wasteful
```

### .format() for Dynamic Templates
```python
template = get_template_from_config()  # Returns 'User: {name}'
result = template.format(name=user.name)
```

---

## Practical Rules for Coding Agents

### Rule 1: Default to F-Strings
```python
# Preferred
message = f'User {user.name} logged in at {timestamp}'
```

### Rule 2: Use Concatenation for Trivial Joins
```python
url = base_url + endpoint  # Fine - simpler and faster
```

### Rule 3: Use join() for Multiple Parts
```python
# Correct - O(n) time
result = ''.join([part1, part2, part3, part4])

# Inefficient - O(n^2) time when repeated over many parts
result = part1 + part2 + part3 + part4
```

### Rule 4: Keep % for Logging
```python
logger.info('Processed %d records in %.2fs', count, elapsed)
```

---

## Summary

| Scenario | Best Choice | Reason |
|----------|-------------|--------|
| Variable interpolation | f-string | 1.6x faster than `.format()` |
| Simple 2-part join | Concatenation | 1.7x faster than f-string |
| Building from many parts | `''.join()` | O(n) vs O(n^2) |
| Logging statements | `%` style | Deferred evaluation |
| Dynamic templates | `.format()` | Template flexibility |

---

*Benchmark source: python-numbers-everyone-should-know*