feat: Add 8 domain papers and RULEBOOK.md
Domain papers distilled from python-numbers-everyone-should-know:
- async-overhead: 1,400x sync vs async overhead
- collection-membership: 200x set vs list at 1000 items
- json-serialization: 8x orjson vs stdlib
- exception-flow: 6.5x exception overhead (try/except free)
- string-formatting: f-strings > % > .format()
- memory-slots: 69% memory reduction with __slots__
- import-optimization: 100ms+ for heavy packages
- database-patterns: 98% commit overhead in SQLite

RULEBOOK.md: ~200 token distillation for coding subagents

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
32
RULEBOOK.md
Normal file
@@ -0,0 +1,32 @@
# RULEBOOK: Python Performance

**Load into coding subagents. ~200 tokens. See `/papers/*.md` for deep dives.**

## The Numbers

- Async overhead: **1,400x** (sync 20 ns, async 28 us)
- Set vs list membership: **200x** at 1000 items
- orjson vs stdlib json: **8x** faster
- Exception raise: **6.5x** (try/except is free)
- SQLite commit: **98%** of write latency

## Rules

**Collections**: Default to `set` for membership. Use `dict.get()`, not check-then-access.

**Async**: Default sync. Only async for I/O >1 ms. Use `gather()`, not loops.

**JSON**: Default `orjson`. stdlib only for a zero-deps requirement.

**Exceptions**: try/except is free. Use `.get()` for dicts. EAFP for <15% failure rate.

**Strings**: f-strings by default. `%` for logging (deferred eval). `join()` for many parts.

**Memory**: `@dataclass(slots=True)` for 100+ instances.

**Imports**: `TYPE_CHECKING` for heavy types. Lazy imports in CLI tools.

**Database**: DiskCache for key-value. Batch SQLite writes. Reads cheap, writes expensive.

---

*Source: python-numbers-everyone-should-know (Python 3.14.2, Apple Silicon)*
126
papers/async-overhead.md
Normal file
@@ -0,0 +1,126 @@
# Async Overhead in Python: When the Cure is Worse Than the Disease

**Domain Paper: Python Performance ADRs**
**Date:** 2026-01-03
**Source:** Python Numbers Everyone Should Know benchmarks (Python 3.14.2, Apple Silicon)

---

## Executive Summary

Async Python introduces a **1,400x overhead** for simple operations compared to synchronous equivalents. This overhead is fixed regardless of what work the function does. The critical insight: async only makes sense when you're waiting on I/O that takes orders of magnitude longer than this overhead.

**The Core Numbers:**
- Sync function call: **20.3 ns**
- Async equivalent via `run_until_complete`: **28.2 us** (28,200 ns)
- **Ratio: 1,387x slower** (approximately 1,400x)

---

## What Was Benchmarked

### Methodology

The benchmarks measured pure async machinery overhead using CPython 3.14.2 on Apple Silicon. Each operation was run thousands of times with warmup periods, reporting median values.

### Test Functions

```python
# The async function being tested
async def return_value_coro():
    return 42

# The sync equivalent
def sync_function():
    return 42
```

---

## Key Findings

### Coroutine Creation (Cheap)

| Operation | Time |
|-----------|------|
| Create coroutine object | 47.0 ns |

**Key insight:** Creating a coroutine object is cheap (47 ns). The cost comes when you actually run it.

### Running Coroutines (Expensive)

| Operation | Time |
|-----------|------|
| `run_until_complete(empty)` | 27.6 us |
| `run_until_complete(return value)` | 26.6 us |
| Run nested await | 28.9 us |

**Key insight:** Every `run_until_complete` costs ~27 us regardless of coroutine complexity.

### The Critical Comparison

| Operation | Time | Ratio |
|-----------|------|-------|
| Sync function call | 20.3 ns | 1x |
| Async equivalent | 28.2 us | **1,387x** |

---

## When Async IS Appropriate

### Good Use Cases

1. **Web servers handling concurrent connections** - FastAPI/Starlette: 115-125k req/sec
2. **Concurrent network I/O** - Fetching data from multiple APIs simultaneously
3. **High-latency operations with parallelism** - `asyncio.gather()` for multiple slow API calls

### Bad Use Cases

1. **Wrapping synchronous database drivers** - Use native async drivers or stay sync
2. **CPU-bound computation** - Async doesn't parallelize CPU work (GIL)
3. **Simple scripts with sequential operations** - CLI tools, data processing pipelines

---

## Practical Rules for Coding Agents

### Rule 1: Default to Sync
Write synchronous code unless you have a specific, measurable need for async.

### Rule 2: The 1ms Threshold
Only consider async when individual I/O operations take **>1 millisecond**.

### Rule 3: Batch Over Broadcast
If you need async, gather operations together:

```python
# Good: pay the event-loop overhead once, run the I/O concurrently
results = await asyncio.gather(*[fetch(url) for url in urls])

# Bad: sequential awaits -- no concurrency, latencies add up
for url in urls:
    result = await fetch(url)
```

### Rule 4: Stay in the Loop
Avoid `run_until_complete` inside an already-running loop.
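
A minimal sketch of this rule, using a hypothetical `fetch_value` coroutine: each `asyncio.run()` (or `run_until_complete`) call spins up and tears down a fresh event loop, while `await` inside async code re-enters nothing.

```python
import asyncio

async def fetch_value():
    return 42

# Bad: a new event loop per call pays the ~27 us entry cost each time,
# and asyncio.run() raises RuntimeError if a loop is already running.
def get_value_blocking():
    return asyncio.run(fetch_value())

# Good: once inside async code, just await -- no loop re-entry.
async def get_value():
    return await fetch_value()

print(get_value_blocking())
print(asyncio.run(get_value()))
```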

### Rule 5: Match Your I/O Library
Use async libraries for async code, sync libraries for sync code.

---

## Summary Table

| Scenario | Recommendation | Reasoning |
|----------|----------------|-----------|
| Simple function returning data | Sync | Async adds 1,400x overhead |
| In-memory operations | Sync | No I/O to wait on |
| Single database query | Sync | Query time < async amortization |
| Multiple independent API calls | Async + gather | Parallelism benefit outweighs overhead |
| Web server (many connections) | Async framework | Concurrent handling essential |
| CLI tool | Sync | Sequential operations, no benefit |

---

*Benchmark source: python-numbers-everyone-should-know (2026-01-01, Python 3.14.2, Apple Silicon)*
109
papers/collection-membership.md
Normal file
@@ -0,0 +1,109 @@
|
|||||||
|
# Collection Membership: The 200x Performance Cliff
|
||||||
|
|
||||||
|
**Domain Paper: Python Collection Selection for Membership Testing**
|
||||||
|
**Date:** 2026-01-03
|
||||||
|
**Source:** Python Numbers Everyone Should Know benchmarks
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Executive Summary
|
||||||
|
|
||||||
|
Membership testing (`x in collection`) is one of the most common operations in Python code. The choice of collection type can result in a **200x performance difference** at just 1,000 items.
|
||||||
|
|
||||||
|
**Key Finding**: At 1,000 items, checking if an item exists in a list takes 3.9 microseconds. The same check in a set takes 19 nanoseconds. That is a 206x difference.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## The Core Numbers
|
||||||
|
|
||||||
|
### Membership Testing Performance
|
||||||
|
|
||||||
|
| Operation | Time | Throughput |
|
||||||
|
|-----------|------|------------|
|
||||||
|
| `item in set` (existing) | 19.0 ns | 52.7M ops/sec |
|
||||||
|
| `key in dict` (existing) | 20.8 ns | 48.1M ops/sec |
|
||||||
|
| `item in list` (first) | 13.9 ns | 72.0M ops/sec |
|
||||||
|
| `item in list` (middle, 500th) | 1,956 ns | 511k ops/sec |
|
||||||
|
| `item in list` (last, 999th) | 3,852 ns | 260k ops/sec |
|
||||||
|
| `item in list` (missing) | 3,915 ns | 255k ops/sec |
|
||||||
|
|
||||||
|
### The 200x Cliff Explained
|
||||||
|
|
||||||
|
```
|
||||||
|
Set membership (any position): ~19 ns O(1)
|
||||||
|
List membership (worst case): ~3,915 ns O(n)
|
||||||
|
________
|
||||||
|
206x slower
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Crossover Analysis
|
||||||
|
|
||||||
|
**Crossover point**: A list with ~15-20 items will match set performance for a full scan. Below that, lists may actually be faster due to lower overhead.
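
The cliff is easy to reproduce with stdlib `timeit`; a quick sketch (absolute numbers vary by machine, but the ordering holds at 1,000 items):

```python
import timeit

items = list(range(1000))
as_list = items
as_set = set(items)

# Worst case for the list: the probed value sits at the end.
list_time = timeit.timeit(lambda: 999 in as_list, number=10_000)
set_time = timeit.timeit(lambda: 999 in as_set, number=10_000)

print(f'list: {list_time:.4f}s  set: {set_time:.4f}s  '
      f'ratio: {list_time / set_time:.0f}x')
```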

### When to Use Each Type

**Use Set when:**
- Collection has more than ~20 items
- Checking membership more than once
- Order does not matter

**Use Dict when:**
- You need to associate values with keys
- Checking membership AND need to retrieve associated data

**Use List when:**
- Collection is very small (< 20 items)
- You iterate but rarely check membership
- Items might not be hashable

---

## Practical Rules for Coding Agents

### Rule 1: Default to Set for Membership

```python
# PREFER
allowed_values = {'a', 'b', 'c'}
if value in allowed_values:
    ...

# AVOID
allowed_values = ['a', 'b', 'c']
if value in allowed_values:
    ...
```

### Rule 2: Convert Lists Before Repeated Lookups

```python
def process_items(items: list, valid_ids: list):
    valid_set = set(valid_ids)  # Convert once
    return [item for item in items if item.id in valid_set]
```

### Rule 3: Prefer `dict.get()` Over Check-then-Access

```python
# AVOID (double lookup)
if key in config:
    value = config[key]

# PREFER (single lookup)
value = config.get(key, default)
```

---

## Summary Table

| Scenario | Best Choice | Why |
|----------|-------------|-----|
| Membership test on 1000+ items | Set | 200x faster than list |
| Key-value lookup | Dict | O(1) access with associated data |
| Ordered collection, rare membership | List | Lower memory, maintains order |
| Very small collection (< 20 items) | List or Set | Negligible difference |

---

*Benchmark source: python-numbers-everyone-should-know*
121
papers/database-patterns.md
Normal file
@@ -0,0 +1,121 @@
|
|||||||
|
# Database Patterns: SQLite, DiskCache, and MongoDB
|
||||||
|
|
||||||
|
**Domain:** Persistence and data access patterns in Python
|
||||||
|
**Source:** python-numbers-everyone-should-know benchmarks
|
||||||
|
**Date:** 2026-01-03
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Executive Summary
|
||||||
|
|
||||||
|
**Reads are cheap, writes are expensive.** SQLite commits dominate write latency (192 microseconds with commit vs 3 microseconds without). For read-heavy workloads, SQLite achieves 280K ops/sec by primary key. For write-heavy workloads, consider diskcache (8x faster writes) or batch operations.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Key Findings
|
||||||
|
|
||||||
|
### The Numbers
|
||||||
|
|
||||||
|
| Operation | SQLite | DiskCache | MongoDB |
|
||||||
|
|-----------|--------|-----------|---------|
|
||||||
|
| **Write one object** | 192 us (5.2k/s) | 24 us (42k/s) | 119 us (8.4k/s) |
|
||||||
|
| **Read by key/id** | 3.6 us (280k/s) | 4.3 us (236k/s) | 121 us (8.2k/s) |
|
||||||
|
|
||||||
|
### Finding 1: The Commit Tax
|
||||||
|
|
||||||
|
SQLite writes with commit: **192 microseconds**
|
||||||
|
SQLite writes without commit: **3 microseconds**
|
||||||
|
|
||||||
|
The commit operation accounts for **98.4% of write latency**.
|
||||||
|
|
||||||
|
### Finding 2: DiskCache Wins for Simple Key-Value
|
||||||
|
|
||||||
|
| Operation | SQLite Raw | DiskCache |
|
||||||
|
|-----------|------------|-----------|
|
||||||
|
| Write | 192 us | 24 us |
|
||||||
|
| Read | 3.6 us | 4.3 us |
|
||||||
|
|
||||||
|
DiskCache achieves **8x faster writes** with comparable read performance.
|
||||||
|
|
||||||
|
### Finding 3: Batching Provides 9x Throughput
|
||||||
|
|
||||||
|
SQLite `executemany()` 10 rows: **215 microseconds total** (21.5 us/row)
|
||||||
|
10 individual inserts: **1,920 microseconds** (192 us/row)
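
The batching pattern can be exercised end-to-end with stdlib `sqlite3` (an in-memory database here, so absolute timings are smaller than the on-disk figures above):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE items (data TEXT)')

rows = [(f'item-{i}',) for i in range(10)]

# One executemany + one commit: the commit tax is paid once, not per row.
conn.executemany('INSERT INTO items (data) VALUES (?)', rows)
conn.commit()

count = conn.execute('SELECT COUNT(*) FROM items').fetchone()[0]
print(count)  # 10
```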

---

## When to Use Each Storage Option

### SQLite
- Read-heavy workloads (100:1 read/write ratio)
- Need to query inside JSON with `json_extract()`
- ACID guarantees matter

### DiskCache
- Key-value storage with automatic serialization
- Cache patterns (TTL, LRU eviction)
- Agent state persistence

### MongoDB
- Distributed, multi-node deployments
- Complex aggregation pipelines
- Full-text search requirements

---

## Practical Rules for Coding Agents

### Rule 1: Default to DiskCache for Agent State
```python
from diskcache import Cache

cache = Cache('/tmp/agent-cache')
cache.set('conversation:123', messages)  # 24 microseconds
```

### Rule 2: Batch SQLite Writes
```python
# GOOD: Batch with executemany
conn.executemany('INSERT INTO items (data) VALUES (?)', items_list)
conn.commit()  # One commit for all items
```

### Rule 3: Use Transactions for Multi-Step Operations
```python
conn.execute('BEGIN')
conn.execute('INSERT INTO users ...')
conn.execute('INSERT INTO audit_log ...')
conn.commit()  # One fsync
```

### Rule 4: Lazy Import for CLI Tools
```python
def save_to_db(data):
    import sqlite3  # 1.63 ms, paid only when needed
    conn = sqlite3.connect('app.db')
```

---

## Summary

| Metric | SQLite | DiskCache | MongoDB |
|--------|--------|-----------|---------|
| Write latency | 192 us | 24 us | 119 us |
| Read latency | 3.6 us | 4.3 us | 121 us |
| Writes/sec | 5.2k | 42k | 8.4k |
| Reads/sec | 280k | 236k | 8.2k |

---

## The Bottom Line

1. **Reads are cheap everywhere** - Optimize for write patterns
2. **SQLite commits dominate latency** - Batch or use transactions
3. **DiskCache for key-value** - 8x faster writes, automatic serialization
4. **MongoDB for distribution** - Not for local performance

*The tortoise way: Measure, understand the cost, choose deliberately.*

---

*Benchmark source: python-numbers-everyone-should-know*
111
papers/exception-flow.md
Normal file
@@ -0,0 +1,111 @@
|
|||||||
|
# Exception Flow: Performance Patterns
|
||||||
|
|
||||||
|
**Domain:** Exception handling overhead
|
||||||
|
**Source:** python-numbers-everyone-should-know benchmarks (Python 3.14.2, Apple Silicon)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## TL;DR
|
||||||
|
|
||||||
|
- **try/except with no exception**: Nearly free (1.1 ns overhead)
|
||||||
|
- **Raising an exception**: 6.5x slower than the happy path (139 ns vs 21.5 ns)
|
||||||
|
- **EAFP is fine when exceptions are rare** (<5% of calls)
|
||||||
|
- **Use LBYL for expected failures** (dict key lookup, file existence)
|
||||||
|
- **Never use exceptions for normal control flow**
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## The Numbers
|
||||||
|
|
||||||
|
### Happy Path (No Exception Raised)
|
||||||
|
|
||||||
|
| Operation | Time | Overhead vs Baseline |
|
||||||
|
|-----------|------|---------------------|
|
||||||
|
| Function call (no try/except) | 20.4 ns | baseline |
|
||||||
|
| try/except (no exception raised) | 21.5 ns | +1.1 ns (+5%) |
|
||||||
|
| try/except ValueError (specific) | 22.9 ns | +2.5 ns (+12%) |
|
||||||
|
| try/except/finally | 22.1 ns | +1.7 ns (+8%) |
|
||||||
|
|
||||||
|
**Key insight:** The try block itself is essentially free.
|
||||||
|
|
||||||
|
### Sad Path (Exception Raised)
|
||||||
|
|
||||||
|
| Operation | Time | Slowdown vs Happy Path |
|
||||||
|
|-----------|------|----------------------|
|
||||||
|
| raise + catch ValueError | 139 ns | **6.5x slower** |
|
||||||
|
| raise + catch (base Exception) | 140 ns | 6.5x slower |
|
||||||
|
| raise + catch custom exception | 146 ns | 6.8x slower |
|
||||||
|
| raise + catch with `as e` | 148 ns | 6.9x slower |
|
||||||
|
|
||||||
|
**Key insight:** The 6.5x overhead comes from:
|
||||||
|
1. Creating the exception object (~40 ns)
|
||||||
|
2. Capturing the traceback (~70 ns)
|
||||||
|
3. Stack unwinding and handler lookup (~30 ns)
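
The happy/sad gap can be reproduced with `timeit`; a sketch (absolute numbers vary by machine, the ordering does not):

```python
import timeit

data = {'present': 1}

def happy_path():
    try:
        return data['present']   # no exception raised
    except KeyError:
        return None

def sad_path():
    try:
        return data['missing']   # raises and catches KeyError
    except KeyError:
        return None

t_happy = timeit.timeit(happy_path, number=100_000)
t_sad = timeit.timeit(sad_path, number=100_000)
print(f'no raise: {t_happy:.4f}s  raise: {t_sad:.4f}s')
```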

---

## EAFP vs LBYL: When to Use Which

### EAFP (Easier to Ask Forgiveness than Permission)

```python
try:
    value = data[key]
except KeyError:
    value = default
```

**Use when:** Exceptions are rare (<5% of calls)

### LBYL (Look Before You Leap)

```python
if key in data:
    value = data[key]
else:
    value = default
```

**Use when:** The failure case is common (>15% of calls)

### Crossover Point

**Rule of thumb:** If exceptions occur more than 15% of the time, use LBYL.

### dict.get() Beats Both

```python
# Best: use .get() -- 26.3 ns, no exception possible
config = settings.get('database', {})
```

---

## Practical Rules for Coding Agents

1. **try/except blocks are free** - don't avoid them for performance
2. **Raising exceptions costs 6.5x** - only raise for truly exceptional cases
3. **Use .get() for dicts** - beats both EAFP and LBYL
4. **Return Optional for expected missing values** - not exceptions
5. **EAFP for file ops** - TOCTOU protection matters more than performance
6. **LBYL when failures are common** (>15% of calls)
7. **Never use exceptions for control flow**

---

## Summary

| Scenario | Recommendation |
|----------|----------------|
| Exception rate <5% | EAFP (try/except) |
| Exception rate >15% | LBYL (check first) |
| Dict key lookup | Use `.get()` |
| Optional return value | Return `None`, not an exception |
| File operations | EAFP (TOCTOU protection) |
| Control flow | Never use exceptions |

**The core insight:** try/except is free; raising is not. Design APIs to minimize raises, not to avoid try blocks.

---

*Benchmark source: python-numbers-everyone-should-know*
104
papers/import-optimization.md
Normal file
@@ -0,0 +1,104 @@
|
|||||||
|
# Import Optimization
|
||||||
|
|
||||||
|
**Domain Paper: Python Performance ADRs**
|
||||||
|
**Date:** 2026-01-03
|
||||||
|
**Source:** python-numbers-everyone-should-know benchmarks
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Executive Summary
|
||||||
|
|
||||||
|
Import costs range from **sub-microsecond (cached) to 100+ milliseconds** (large frameworks). For CLI tools and short-lived scripts, import time can dominate total execution.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Benchmark Data (First Import, Fresh Process)
|
||||||
|
|
||||||
|
### Built-in Modules
|
||||||
|
|
||||||
|
| Module | First Import |
|
||||||
|
|--------|-------------|
|
||||||
|
| `sys` | 0.2 us |
|
||||||
|
| `os` | 0.2 us |
|
||||||
|
| `math` | 24 us |
|
||||||
|
|
||||||
|
### Standard Library
|
||||||
|
|
||||||
|
| Module | First Import |
|
||||||
|
|--------|-------------|
|
||||||
|
| `datetime` | 72 us |
|
||||||
|
| `typing` | 2.0 ms |
|
||||||
|
| `json` | 2.9 ms |
|
||||||
|
| `dataclasses` | 6.0 ms |
|
||||||
|
| `logging` | 10.5 ms |
|
||||||
|
| `asyncio` | 17.7 ms |
|
||||||
|
|
||||||
|
### External Packages
|
||||||
|
|
||||||
|
| Package | First Import |
|
||||||
|
|---------|-------------|
|
||||||
|
| `pydantic` | 15.8 ms |
|
||||||
|
| `flask` | 47.3 ms |
|
||||||
|
| `fastapi` | 104.4 ms |
|
||||||
|
|
||||||
|
**Key insight:** FastAPI takes 100ms just to import. For a CLI tool that runs in 50ms, this is unacceptable overhead.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Lazy Import Patterns
|
||||||
|
|
||||||
|
### Pattern 1: Function-Level Import
|
||||||
|
```python
|
||||||
|
def process_data(data):
|
||||||
|
import pandas as pd # Only when needed
|
||||||
|
return pd.DataFrame(data)
|
||||||
|
```
|
||||||
|
|
||||||
|
### Pattern 2: TYPE_CHECKING Guard
|
||||||
|
```python
|
||||||
|
from typing import TYPE_CHECKING
|
||||||
|
|
||||||
|
if TYPE_CHECKING:
|
||||||
|
import pandas as pd
|
||||||
|
|
||||||
|
def process(data: "pd.DataFrame"):
|
||||||
|
import pandas as pd
|
||||||
|
return pd.DataFrame(data)
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Practical Rules for Coding Agents
|
||||||
|
|
||||||
|
### MUST
|
||||||
|
|
||||||
|
1. **Use `TYPE_CHECKING` for type-only imports** when the type is from a heavy package
|
||||||
|
2. **Use function-level imports for rarely-used code paths**
|
||||||
|
3. **Never import heavy packages at module level in CLI tools**
|
||||||
|
|
||||||
|
### SHOULD
|
||||||
|
|
||||||
|
4. **Use `from __future__ import annotations`** for cleaner TYPE_CHECKING
|
||||||
|
5. **Profile import time for new dependencies:**
|
||||||
|
```bash
|
||||||
|
python -c "import time; s=time.perf_counter(); import PACKAGE; print(f'{(time.perf_counter()-s)*1000:.1f}ms')"
|
||||||
|
```
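
The same measurement can live inside a script when needed; a sketch using `importlib` (`timed_import` is an illustrative helper, and note that repeat imports hit the `sys.modules` cache and return almost instantly):

```python
import importlib
import time

def timed_import(name: str):
    """Import a module and report how long the import took, in ms."""
    start = time.perf_counter()
    module = importlib.import_module(name)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return module, elapsed_ms

json_mod, ms = timed_import('json')
print(f'json: {ms:.2f} ms')
```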

---

## Summary Table

| Scenario | Pattern | Example |
|----------|---------|---------|
| Type hints for heavy types | `TYPE_CHECKING` | pandas, numpy types |
| Rarely-used function | Function-level import | Error handling paths |
| CLI fast path | Defer until needed | `--version`, `--help` |
| Serverless cold start | Minimize top-level | Lambda/Cloud Functions |

---

*Import costs are hidden taxes. Pay them lazily.*

---

*Benchmark source: python-numbers-everyone-should-know*
93
papers/json-serialization.md
Normal file
@@ -0,0 +1,93 @@
|
|||||||
|
# JSON Serialization Performance in Python
|
||||||
|
|
||||||
|
**Domain Paper: Python Performance ADRs**
|
||||||
|
**Date:** 2026-01-03
|
||||||
|
**Source:** Python Numbers Everyone Should Know benchmarks
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Executive Summary
|
||||||
|
|
||||||
|
Alternative JSON libraries like `orjson` and `msgspec` deliver **8-12x faster serialization** and **2-7x faster deserialization** compared to stdlib `json`. The performance gap is consistent across payload sizes.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Key Findings
|
||||||
|
|
||||||
|
### Serialization Performance (dumps)
|
||||||
|
|
||||||
|
| Library | Simple Object | Complex Object | Speedup vs stdlib |
|
||||||
|
|---------|--------------|----------------|-------------------|
|
||||||
|
| `json.dumps()` | 708 ns | 2.65 us | 1x (baseline) |
|
||||||
|
| `orjson.dumps()` | 61 ns | 310 ns | **11.6x / 8.5x** |
|
||||||
|
| `msgspec.encode()` | 92 ns | 445 ns | 7.7x / 6.0x |
|
||||||
|
| `ujson.dumps()` | 264 ns | 1.64 us | 2.7x / 1.6x |
|
||||||
|
|
||||||
|
### Deserialization Performance (loads)
|
||||||
|
|
||||||
|
| Library | Simple Object | Complex Object | Speedup vs stdlib |
|
||||||
|
|---------|--------------|----------------|-------------------|
|
||||||
|
| `json.loads()` | 714 ns | 2.22 us | 1x (baseline) |
|
||||||
|
| `orjson.loads()` | 106 ns | 839 ns | **6.7x / 2.6x** |
|
||||||
|
| `msgspec.decode()` | 101 ns | 850 ns | 7.1x / 2.6x |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## When to Use Each Library
|
||||||
|
|
||||||
|
### Use stdlib `json` when:
|
||||||
|
- Zero dependencies required
|
||||||
|
- Need custom JSONEncoder subclass
|
||||||
|
- Compatibility is paramount
|
||||||
|
|
||||||
|
### Use `orjson` when:
|
||||||
|
- Maximum performance needed
|
||||||
|
- You can accept bytes output
|
||||||
|
- You need datetime/UUID support
|
||||||
|
|
||||||
|
### Use `msgspec` when:
|
||||||
|
- You need typed decoding
|
||||||
|
- You want MessagePack too
|
||||||
|
- Memory efficiency matters
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Practical Rules for Coding Agents
|
||||||
|
|
||||||
|
### Rule 1: Default to orjson for new projects
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Instead of:
|
||||||
|
import json
|
||||||
|
data = json.dumps(obj)
|
||||||
|
|
||||||
|
# Prefer:
|
||||||
|
import orjson
|
||||||
|
data = orjson.dumps(obj) # Returns bytes
|
||||||
|
```
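
When orjson may or may not be installed, a small compatibility shim keeps call sites uniform. A sketch (note that `orjson.dumps` returns bytes, so the stdlib branch encodes to match):

```python
try:
    import orjson

    def dumps(obj) -> bytes:
        return orjson.dumps(obj)

    def loads(data):
        return orjson.loads(data)

except ImportError:
    import json

    def dumps(obj) -> bytes:
        return json.dumps(obj).encode()

    def loads(data):
        return json.loads(data)

round_tripped = loads(dumps({'a': 1}))
print(round_tripped)  # {'a': 1}
```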

### Rule 2: Use stdlib json only when explicitly needed

Acceptable reasons:
- Must avoid external dependencies
- Need a custom JSONEncoder subclass
- Working in a constrained environment

### Rule 3: Profile before optimizing JSON

At 2-3 microseconds per operation, JSON serialization is rarely the bottleneck unless you're doing thousands of operations per second.

---

## Summary Table

| Scenario | Recommendation | Expected Speedup |
|----------|----------------|------------------|
| General use | orjson | 8x serialization, 2.5x deserialization |
| Typed data | msgspec | 6x + type safety |
| Drop-in replacement | ujson | 1.5-2x |
| Zero dependencies | json (stdlib) | Baseline |

---

*Benchmark source: python-numbers-everyone-should-know*
102
papers/memory-slots.md
Normal file
@@ -0,0 +1,102 @@
|
|||||||
|
# Memory Optimization with __slots__ in Python
|
||||||
|
|
||||||
|
**Domain Paper: Python Performance ADRs**
|
||||||
|
**Date:** 2026-01-03
|
||||||
|
**Source:** python-numbers-everyone-should-know benchmarks
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Executive Summary
|
||||||
|
|
||||||
|
Python's `__slots__` mechanism provides **52-70% memory reduction** when creating many instances of the same class.
|
||||||
|
|
||||||
|
**Key Finding**: For a class with 5 attributes, `__slots__` reduces instance memory from 694 bytes to 212 bytes (69% reduction).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Benchmark Results: Memory Footprint
|
||||||
|
|
||||||
|
### Single Instance Memory (5 Attributes)
|
||||||
|
|
||||||
|
| Type | Memory (bytes) | vs Regular Class |
|
||||||
|
|------|----------------|------------------|
|
||||||
|
| Regular class | 694 | baseline |
|
||||||
|
| `__slots__` class | 212 | -69% |
|
||||||
|
| dataclass | 694 | same as regular |
|
||||||
|
| `@dataclass(slots=True)` | 212 | -69% |
|
||||||
|
| namedtuple | 228 | -67% |
|
||||||
|
|
||||||
|
### At Scale (1,000 Instances)
|
||||||
|
|
||||||
|
| Type | Total Memory |
|
||||||
|
|------|--------------|
|
||||||
|
| List of 1,000 regular class | 165.2 KB |
|
||||||
|
| List of 1,000 `__slots__` class | 79.1 KB |
|
||||||
|
|
||||||
|
**Memory Savings**: 52% reduction at scale
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Attribute Access Speed (Virtually Identical)
|
||||||
|
|
||||||
|
| Operation | Regular | `__slots__` |
|
||||||
|
|-----------|---------|-------------|
|
||||||
|
| Read attr | 14.1 ns | 14.1 ns |
|
||||||
|
| Write attr | 15.7 ns | 16.4 ns |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Trade-offs
|
||||||
|
|
||||||
|
### What __slots__ Prevents
|
||||||
|
|
||||||
|
1. **No dynamic attribute assignment**
|
||||||
|
2. **No `__dict__` access** (`vars()` doesn't work)
|
||||||
|
3. **Inheritance complications**
|
||||||
|
4. **No weak references by default**
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Practical Rules for Coding Agents
|
||||||
|
|
||||||
|
### Rule 1: Instance Count Threshold
|
||||||
|
```
|
||||||
|
IF creating > 100 instances of the same class
|
||||||
|
AND attributes are fixed at design time
|
||||||
|
THEN consider __slots__ or @dataclass(slots=True)
|
||||||
|
```
|
||||||
|
|
||||||
|
### Rule 2: Prefer Slots Dataclass (Python 3.10+)
|
||||||
|
```python
|
||||||
|
@dataclass(slots=True)
|
||||||
|
class User:
|
||||||
|
id: int
|
||||||
|
name: str
|
||||||
|
email: str
|
||||||
|
```

### Rule 3: Don't Optimize Prematurely
For < 100 instances, use regular classes for flexibility.

### Rule 4: Document the Trade-off
```python
# Using __slots__ for memory efficiency (1000+ instances expected)
__slots__ = ['x', 'y', 'z']
```

---

## Summary

| Aspect | Regular Class | `__slots__` Class |
|--------|---------------|-------------------|
| Memory (5 attrs) | 694 bytes | 212 bytes |
| Read speed | 14.1 ns | 14.1 ns |
| Dynamic attributes | Yes | No |
| Best for | Flexibility | Many instances |

**Bottom Line**: Use `__slots__` (or `@dataclass(slots=True)`) when creating many instances of fixed-attribute classes. For small numbers, stick with regular classes.

---

*Benchmark source: python-numbers-everyone-should-know*
111
papers/string-formatting.md
Normal file
@@ -0,0 +1,111 @@
|
|||||||
|
# String Formatting: Domain Exploration
|
||||||
|
|
||||||
|
**Date:** 2026-01-03
|
||||||
|
**Source:** python-numbers-everyone-should-know benchmarks
|
||||||
|
**Python Version:** 3.14.2 (CPython, ARM64 macOS)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Executive Summary
|
||||||
|
|
||||||
|
String formatting performance: **simple concatenation is fastest for trivial joins**, while **f-strings offer the best balance of readability and performance** for interpolation use cases.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Raw Benchmark Results
|
||||||
|
|
||||||
|
| Operation | Time (ns) | Throughput |
|
||||||
|
|-----------|-----------|------------|
|
||||||
|
| `concat_small` | 39.1 ns | 25.6M ops/sec |
|
||||||
|
| `f_string` | 64.9 ns | 15.4M ops/sec |
|
||||||
|
| `percent_formatting` | 89.8 ns | 11.1M ops/sec |
|
||||||
|
| `format_method` | 103 ns | 9.7M ops/sec |
|
||||||
|
|
||||||
|
### Relative Performance
|
||||||
|
|
||||||
|
| Method | vs f-string |
|
||||||
|
|--------|-------------|
|
||||||
|
| `concat_small` | 1.66x faster |
|
||||||
|
| `f_string` | 1.00x (reference) |
|
||||||
|
| `percent_formatting` | 0.72x slower |
|
||||||
|
| `format_method` | 0.63x slower |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Why F-Strings Are Fast
|
||||||
|
|
||||||
|
F-strings are parsed at **compile time**, not runtime:
|
||||||
|
|
||||||
|
1. **No method lookup**: F-strings don't call `.format()` at runtime
|
||||||
|
2. **No tuple creation**: `%` formatting requires `(name,)` tuple
|
||||||
|
3. **Specialized bytecode**: `FORMAT_VALUE` and `BUILD_STRING` are optimized
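
The gap is straightforward to confirm with `timeit` (numbers vary by machine; the ordering is stable):

```python
import timeit

setup = "name, count = 'ada', 3"
n = 1_000_000

t_fstring = timeit.timeit("f'user {name} has {count}'", setup=setup, number=n)
t_format = timeit.timeit("'user {} has {}'.format(name, count)",
                         setup=setup, number=n)

print(f'f-string: {t_fstring:.3f}s  .format(): {t_format:.3f}s')
```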

---

## When to Use Each Method

### Concatenation Wins
For 2-3 literal strings with no formatting:
```python
path = base_dir + '/' + filename  # Simpler, faster
```

### % Formatting for Logging
```python
# Deferred evaluation - string built only if debug is enabled
logger.debug('Processing %s items', count)

# f-string - string ALWAYS built, then discarded
logger.debug(f'Processing {count} items')  # Wasteful
```

### .format() for Dynamic Templates
```python
template = get_template_from_config()  # Returns 'User: {name}'
result = template.format(name=user.name)
```

---

## Practical Rules for Coding Agents

### Rule 1: Default to F-Strings
```python
# Preferred
message = f'User {user.name} logged in at {timestamp}'
```

### Rule 2: Use Concatenation for Trivial Joins
```python
url = base_url + endpoint  # Fine - simpler and faster
```

### Rule 3: Use join() for Multiple Parts
```python
# Correct - O(n) time
result = ''.join([part1, part2, part3, part4])

# Inefficient - O(n^2) time when repeated over many parts
result = part1 + part2 + part3 + part4
```

### Rule 4: Keep % for Logging
```python
logger.info('Processed %d records in %.2fs', count, elapsed)
```

---

## Summary

| Scenario | Best Choice | Reason |
|----------|-------------|--------|
| Variable interpolation | f-string | 1.6x faster than `.format()` |
| Simple 2-part join | Concatenation | 1.7x faster than f-string |
| Building from many parts | `''.join()` | O(n) vs O(n^2) |
| Logging statements | `%` style | Deferred evaluation |
| Dynamic templates | `.format()` | Template flexibility |

---

*Benchmark source: python-numbers-everyone-should-know*