feat: Add 8 domain papers and RULEBOOK.md
Domain papers distilled from python-numbers-everyone-should-know:

- async-overhead: 1,400x sync vs async overhead
- collection-membership: 200x set vs list at 1000 items
- json-serialization: 8x orjson vs stdlib
- exception-flow: 6.5x exception overhead (try/except free)
- string-formatting: f-strings > % > .format()
- memory-slots: 69% memory reduction with __slots__
- import-optimization: 100ms+ for heavy packages
- database-patterns: 98% commit overhead in SQLite

RULEBOOK.md: ~200 token distillation for coding subagents

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
papers/async-overhead.md · 126 lines · new file
# Async Overhead in Python: When the Cure is Worse Than the Disease

**Domain Paper: Python Performance ADRs**
**Date:** 2026-01-03
**Source:** Python Numbers Everyone Should Know benchmarks (Python 3.14.2, Apple Silicon)

---

## Executive Summary

Async Python introduces a **1,400x overhead** for simple operations compared to synchronous equivalents. This overhead is fixed regardless of what work the function does. The critical insight: async only makes sense when you're waiting on I/O that takes orders of magnitude longer than this overhead.

**The Core Numbers:**

- Sync function call: **20.3 ns**
- Async equivalent via `run_until_complete`: **28.2 us** (28,200 ns)
- **Ratio: 1,387x slower** (approximately 1,400x)

---
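The gap is easy to reproduce with a minimal `timeit` harness. This is a sketch, not the paper's actual benchmark code; absolute numbers depend on your machine and Python build, but the sync/async ratio should stay in the hundreds-to-thousands range:

```python
import asyncio
import timeit

def sync_function():
    return 42

async def return_value_coro():
    return 42

N = 10_000

# Time N plain synchronous calls.
sync_total = timeit.timeit(sync_function, number=N)

# Time N event-loop round trips on a single reused loop,
# matching the paper's run_until_complete measurement.
loop = asyncio.new_event_loop()
async_total = timeit.timeit(
    lambda: loop.run_until_complete(return_value_coro()), number=N
)
loop.close()

print(f"sync : {sync_total / N * 1e9:10.1f} ns/call")
print(f"async: {async_total / N * 1e9:10.1f} ns/call")
print(f"ratio: {async_total / sync_total:,.0f}x")
```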
## What Was Benchmarked

### Methodology

The benchmarks measured pure async machinery overhead using CPython 3.14.2 on Apple Silicon. Each operation was run thousands of times with warmup periods, reporting median values.

### Test Functions

```python
# The async function being tested
async def return_value_coro():
    return 42

# The sync equivalent
def sync_function():
    return 42
```

---
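The stated methodology (warmup runs, then a median over many samples) can be sketched as below. This is an illustrative harness, not the benchmark's actual code; note that per-call `perf_counter_ns` bookkeeping adds its own overhead at the ~20 ns scale, which is why real harnesses usually time batches of calls instead:

```python
import statistics
import time

def bench(fn, *, warmup=1_000, runs=10_000):
    """Return the median per-call time of fn, in nanoseconds."""
    for _ in range(warmup):  # warm up caches and interpreter specializations
        fn()
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter_ns()
        fn()
        samples.append(time.perf_counter_ns() - t0)
    return statistics.median(samples)

def sync_function():
    return 42

print(f"sync_function: {bench(sync_function)} ns (median)")
```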
## Key Findings

### Coroutine Creation (Cheap)

| Operation | Time |
|-----------|------|
| Create coroutine object | 47.0 ns |

**Key insight:** Creating a coroutine object is cheap (47 ns). The cost comes when you actually run it.

### Running Coroutines (Expensive)

| Operation | Time |
|-----------|------|
| `run_until_complete(empty)` | 27.6 us |
| `run_until_complete(return value)` | 26.6 us |
| Run nested await | 28.9 us |

**Key insight:** Every `run_until_complete` costs ~27 us regardless of coroutine complexity.

### The Critical Comparison

| Operation | Time | Ratio |
|-----------|------|-------|
| Sync function call | 20.3 ns | 1x |
| Async equivalent | 28.2 us | **1,387x** |

---
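The cheap-creation / expensive-run split is visible directly in the language semantics: calling an async function only allocates a coroutine object; the body does not execute until the event loop drives it. A minimal demonstration:

```python
import asyncio

async def return_value_coro():
    return 42

# Calling the async function does NOT run the body -- it just
# allocates a coroutine object (the cheap ~47 ns step).
coro = return_value_coro()
print(type(coro).__name__)  # coroutine

# The body only runs once the event loop drives the coroutine
# (the expensive ~27 us step).
result = asyncio.run(coro)
print(result)  # 42
```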
## When Async IS Appropriate

### Good Use Cases

1. **Web servers handling concurrent connections** - FastAPI/Starlette: 115-125k req/sec
2. **Concurrent network I/O** - Fetching data from multiple APIs simultaneously
3. **High-latency operations with parallelism** - `asyncio.gather()` for multiple slow API calls

### Bad Use Cases

1. **Wrapping synchronous database drivers** - Use native async drivers or stay sync
2. **CPU-bound computation** - Async doesn't parallelize CPU work (GIL)
3. **Simple scripts with sequential operations** - CLI tools, data processing pipelines

---
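When a sync driver genuinely must run inside an async application, one standard escape hatch (Python 3.9+) is `asyncio.to_thread`, which offloads the blocking call to a worker thread so the event loop stays responsive, rather than pretending the call is async. The `blocking_query` helper below is a hypothetical stand-in for a sync driver call:

```python
import asyncio
import time

def blocking_query():
    # Hypothetical stand-in for a synchronous database call.
    time.sleep(0.05)
    return "rows"

async def main():
    # to_thread runs the blocking call in a worker thread; the
    # event loop keeps making progress on other coroutines meanwhile.
    rows, _ = await asyncio.gather(
        asyncio.to_thread(blocking_query),
        asyncio.sleep(0.01),  # proceeds concurrently with the query
    )
    return rows

print(asyncio.run(main()))  # rows
```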
## Practical Rules for Coding Agents

### Rule 1: Default to Sync
Write synchronous code unless you have a specific, measurable need for async.

### Rule 2: The 1ms Threshold
Only consider async when individual I/O operations take **>1 millisecond** - long enough to dwarf the ~27 us machinery cost.

### Rule 3: Batch Over Broadcast
If you need async, gather operations together so they run concurrently:

```python
# Good: all fetches run concurrently; total latency ~ slowest call
results = await asyncio.gather(*[fetch(url) for url in urls])

# Bad: fetches run sequentially; latencies add up
for url in urls:
    result = await fetch(url)
```

### Rule 4: Stay in the Loop
Avoid `run_until_complete` inside an already-running loop.

### Rule 5: Match Your I/O Library
Use async libraries for async code, sync libraries for sync code.

---
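Rule 4 in code: inside a coroutine, await other coroutines directly; trying to re-enter the already-running loop with `run_until_complete` raises `RuntimeError`:

```python
import asyncio

async def inner():
    return 42

async def outer():
    # Correct: await directly -- no extra loop machinery.
    value = await inner()

    # Wrong: the running loop cannot be re-entered.
    coro = inner()
    try:
        asyncio.get_running_loop().run_until_complete(coro)
    except RuntimeError as exc:
        print(f"RuntimeError: {exc}")
    finally:
        coro.close()  # avoid a "never awaited" warning

    return value

print(asyncio.run(outer()))  # 42
```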
## Summary Table

| Scenario | Recommendation | Reasoning |
|----------|----------------|-----------|
| Simple function returning data | Sync | Async adds 1,400x overhead |
| In-memory operations | Sync | No I/O to wait on |
| Single database query | Sync | Query time < async amortization |
| Multiple independent API calls | Async + gather | Parallelism benefit outweighs overhead |
| Web server (many connections) | Async framework | Concurrent handling essential |
| CLI tool | Sync | Sequential operations, no benefit |

---
*Benchmark source: python-numbers-everyone-should-know (2026-01-01, Python 3.14.2, Apple Silicon)*