feat: Add 8 domain papers and RULEBOOK.md
Domain papers distilled from python-numbers-everyone-should-know:

- async-overhead: 1,400x sync vs async overhead
- collection-membership: 200x set vs list at 1000 items
- json-serialization: 8x orjson vs stdlib
- exception-flow: 6.5x exception overhead (try/except free)
- string-formatting: f-strings > % > .format()
- memory-slots: 69% memory reduction with __slots__
- import-optimization: 100ms+ for heavy packages
- database-patterns: 98% commit overhead in SQLite

RULEBOOK.md: ~200 token distillation for coding subagents

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
**RULEBOOK.md** (new file, 32 lines)

# RULEBOOK: Python Performance

**Load into coding subagents. ~200 tokens. See `/papers/*.md` for deep dives.**

## The Numbers

- Async overhead: **1,400x** (sync 20ns, async 28us)
- Set vs list membership: **200x** at 1000 items
- orjson vs stdlib json: **8x** faster
- Exception raise: **6.5x** (try/except is free)
- SQLite commit: **98%** of write latency

## Rules

**Collections**: Default to `set` for membership. Use `dict.get()`, not check-then-access.

**Async**: Default sync. Only async for I/O >1ms. Use `gather()`, not loops.

**JSON**: Default `orjson`. stdlib only for a zero-deps requirement.

**Exceptions**: try/except is free. Use `.get()` for dicts. EAFP for <15% failure rate.

**Strings**: f-strings default. `%` for logging (deferred eval). `join()` for many parts.

**Memory**: `@dataclass(slots=True)` for 100+ instances.

**Imports**: `TYPE_CHECKING` for heavy types. Lazy imports in CLI tools.

**Database**: DiskCache for key-value. Batch SQLite writes. Reads cheap, writes expensive.

---

*Source: python-numbers-everyone-should-know (Python 3.14.2, Apple Silicon)*

**papers/async-overhead.md** (new file, 126 lines)

# Async Overhead in Python: When the Cure is Worse Than the Disease

**Domain Paper: Python Performance ADRs**
**Date:** 2026-01-03
**Source:** Python Numbers Everyone Should Know benchmarks (Python 3.14.2, Apple Silicon)

---

## Executive Summary

Async Python introduces a **1,400x overhead** for simple operations compared to synchronous equivalents. This overhead is fixed regardless of what work the function does. The critical insight: async only makes sense when you're waiting on I/O that takes orders of magnitude longer than this overhead.

**The Core Numbers:**

- Sync function call: **20.3 ns**
- Async equivalent via `run_until_complete`: **28.2 us** (28,200 ns)
- **Ratio: 1,387x slower** (approximately 1,400x)

---

## What Was Benchmarked

### Methodology

The benchmarks measured pure async machinery overhead using CPython 3.14.2 on Apple Silicon. Each operation was run thousands of times with warmup periods, reporting median values.

### Test Functions

```python
# The async function being tested
async def return_value_coro():
    return 42

# The sync equivalent
def sync_function():
    return 42
```

---

## Key Findings

### Coroutine Creation (Cheap)

| Operation | Time |
|-----------|------|
| Create coroutine object | 47.0 ns |

**Key insight:** Creating a coroutine object is cheap (47 ns). The cost comes when you actually run it.

### Running Coroutines (Expensive)

| Operation | Time |
|-----------|------|
| `run_until_complete(empty)` | 27.6 us |
| `run_until_complete(return value)` | 26.6 us |
| Run nested await | 28.9 us |

**Key insight:** Every `run_until_complete` costs ~27 us regardless of coroutine complexity.

### The Critical Comparison

| Operation | Time | Ratio |
|-----------|------|-------|
| Sync function call | 20.3 ns | 1x |
| Async equivalent | 28.2 us | **1,387x** |

---

## When Async IS Appropriate

### Good Use Cases

1. **Web servers handling concurrent connections** - FastAPI/Starlette: 115-125k req/sec
2. **Concurrent network I/O** - Fetching data from multiple APIs simultaneously
3. **High-latency operations with parallelism** - `asyncio.gather()` for multiple slow API calls

### Bad Use Cases

1. **Wrapping synchronous database drivers** - Use native async drivers or stay sync
2. **CPU-bound computation** - Async doesn't parallelize CPU work (GIL)
3. **Simple scripts with sequential operations** - CLI tools, data processing pipelines

---

## Practical Rules for Coding Agents

### Rule 1: Default to Sync

Write synchronous code unless you have a specific, measurable need for async.

### Rule 2: The 1ms Threshold

Only consider async when individual I/O operations take **>1 millisecond**.
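A quick way to apply this threshold is to compare the fixed machinery cost to the expected I/O time. A back-of-envelope sketch using the ~27 us `run_until_complete` cost from the tables above:

```python
# Fixed async machinery cost from the benchmarks above (approximate).
ASYNC_OVERHEAD_US = 27.0

def overhead_fraction(io_us: float) -> float:
    """Fraction of total time spent on async machinery, not real work."""
    return ASYNC_OVERHEAD_US / (ASYNC_OVERHEAD_US + io_us)

# A 1 ms (1000 us) network call: overhead is under 3% -- async can pay off.
network_call = overhead_fraction(1_000)

# A 50 us in-memory operation: overhead is ~35% -- stay sync.
memory_op = overhead_fraction(50)
```

The exact numbers depend on your machine; the shape of the curve does not.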
### Rule 3: Batch Over Broadcast

If you need async, gather operations together:

```python
# Good: one event-loop entry, requests run concurrently
results = await asyncio.gather(*[fetch(url) for url in urls])

# Bad: awaits run one at a time -- no concurrency benefit, and each
# separate run_until_complete entry would pay the ~27 us again
for url in urls:
    result = await fetch(url)
```

### Rule 4: Stay in the Loop

Avoid `run_until_complete` inside an already-running loop.
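A minimal sketch of the recommended shape: enter the event loop once with `asyncio.run()` at the top level, then use plain `await`/`gather` everywhere inside it (the `work` coroutine here is a stand-in for real async I/O):

```python
import asyncio

async def work(n: int) -> int:
    await asyncio.sleep(0)  # stand-in for real async I/O
    return n * 2

async def main() -> list[int]:
    # Inside the running loop: plain await/gather, never run_until_complete
    return await asyncio.gather(*(work(i) for i in range(3)))

results = asyncio.run(main())  # the one and only loop entry
```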
### Rule 5: Match Your I/O Library

Use async libraries for async code, sync libraries for sync code.

---

## Summary Table

| Scenario | Recommendation | Reasoning |
|----------|----------------|-----------|
| Simple function returning data | Sync | Async adds 1,400x overhead |
| In-memory operations | Sync | No I/O to wait on |
| Single database query | Sync | Query time < async amortization |
| Multiple independent API calls | Async + gather | Parallelism benefit outweighs overhead |
| Web server (many connections) | Async framework | Concurrent handling essential |
| CLI tool | Sync | Sequential operations, no benefit |

---

*Benchmark source: python-numbers-everyone-should-know (2026-01-01, Python 3.14.2, Apple Silicon)*

**papers/collection-membership.md** (new file, 109 lines)

# Collection Membership: The 200x Performance Cliff

**Domain Paper: Python Collection Selection for Membership Testing**
**Date:** 2026-01-03
**Source:** Python Numbers Everyone Should Know benchmarks

---

## Executive Summary

Membership testing (`x in collection`) is one of the most common operations in Python code. The choice of collection type can result in a **200x performance difference** at just 1,000 items.

**Key Finding**: At 1,000 items, checking whether an item exists in a list takes 3.9 microseconds. The same check in a set takes 19 nanoseconds. That is a 206x difference.

---

## The Core Numbers

### Membership Testing Performance

| Operation | Time | Throughput |
|-----------|------|------------|
| `item in set` (existing) | 19.0 ns | 52.7M ops/sec |
| `key in dict` (existing) | 20.8 ns | 48.1M ops/sec |
| `item in list` (first) | 13.9 ns | 72.0M ops/sec |
| `item in list` (middle, 500th) | 1,956 ns | 511k ops/sec |
| `item in list` (last, 999th) | 3,852 ns | 260k ops/sec |
| `item in list` (missing) | 3,915 ns | 255k ops/sec |

### The 200x Cliff Explained

```
Set membership (any position):  ~19 ns     O(1)
List membership (worst case):   ~3,915 ns  O(n)
                                __________
                                206x slower
```

---

## Crossover Analysis

**Crossover point**: A list with ~15-20 items will match set performance for a full scan. Below that, lists may actually be faster due to lower overhead.
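The cliff is easy to reproduce on your own hardware with `timeit`; a sketch at the 1,000-item scale (absolute times vary by machine, and lambda-call overhead narrows the measured ratio, but the ordering should not change):

```python
import timeit

n = 1_000
data_list = list(range(n))
data_set = set(data_list)
missing = -1  # worst case for the list: a full O(n) scan

list_time = timeit.timeit(lambda: missing in data_list, number=10_000)
set_time = timeit.timeit(lambda: missing in data_set, number=10_000)

ratio = list_time / set_time  # the list loses badly, and worse as n grows
```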
### When to Use Each Type

**Use Set when:**
- Collection has more than ~20 items
- Checking membership more than once
- Order does not matter

**Use Dict when:**
- You need to associate values with keys
- Checking membership AND need to retrieve associated data

**Use List when:**
- Collection is very small (< 20 items)
- You iterate but rarely check membership
- Items might not be hashable

---

## Practical Rules for Coding Agents

### Rule 1: Default to Set for Membership

```python
# PREFER
allowed_values = {'a', 'b', 'c'}
if value in allowed_values:
    ...

# AVOID
allowed_values = ['a', 'b', 'c']
if value in allowed_values:
    ...
```

### Rule 2: Convert Lists Before Repeated Lookups

```python
def process_items(items: list, valid_ids: list):
    valid_set = set(valid_ids)  # Convert once
    return [item for item in items if item.id in valid_set]
```

### Rule 3: Prefer `dict.get()` Over Check-then-Access

```python
# AVOID (double lookup)
if key in config:
    value = config[key]

# PREFER (single lookup)
value = config.get(key, default)
```

---

## Summary Table

| Scenario | Best Choice | Why |
|----------|-------------|-----|
| Membership test on 1000+ items | Set | 200x faster than list |
| Key-value lookup | Dict | O(1) access with associated data |
| Ordered collection, rare membership | List | Lower memory, maintains order |
| Very small collection (< 20 items) | List or Set | Negligible difference |

---

*Benchmark source: python-numbers-everyone-should-know*

**papers/database-patterns.md** (new file, 121 lines)

# Database Patterns: SQLite, DiskCache, and MongoDB

**Domain:** Persistence and data access patterns in Python
**Source:** python-numbers-everyone-should-know benchmarks
**Date:** 2026-01-03

---

## Executive Summary

**Reads are cheap, writes are expensive.** SQLite commits dominate write latency (192 microseconds with commit vs 3 microseconds without). For read-heavy workloads, SQLite achieves 280k ops/sec by primary key. For write-heavy workloads, consider diskcache (8x faster writes) or batch operations.

---

## Key Findings

### The Numbers

| Operation | SQLite | DiskCache | MongoDB |
|-----------|--------|-----------|---------|
| **Write one object** | 192 us (5.2k/s) | 24 us (42k/s) | 119 us (8.4k/s) |
| **Read by key/id** | 3.6 us (280k/s) | 4.3 us (236k/s) | 121 us (8.2k/s) |

### Finding 1: The Commit Tax

SQLite writes with commit: **192 microseconds**
SQLite writes without commit: **3 microseconds**

The commit operation accounts for **98.4% of write latency**.

### Finding 2: DiskCache Wins for Simple Key-Value

| Operation | SQLite Raw | DiskCache |
|-----------|------------|-----------|
| Write | 192 us | 24 us |
| Read | 3.6 us | 4.3 us |

DiskCache achieves **8x faster writes** with comparable read performance.

### Finding 3: Batching Provides 9x Throughput

SQLite `executemany()` with 10 rows: **215 microseconds total** (21.5 us/row)
10 individual inserts: **1,920 microseconds** (192 us/row)
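The batching pattern behind these numbers, as a minimal runnable sketch against an in-memory database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (data TEXT)")

rows = [("alpha",), ("beta",), ("gamma",)]

# One statement, one commit: the whole batch pays the commit cost once.
conn.executemany("INSERT INTO items (data) VALUES (?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

With a file-backed database the gap is even larger, since each commit implies an fsync.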
---

## When to Use Each Storage Option

### SQLite
- Read-heavy workloads (100:1 read/write ratio)
- Need to query inside JSON with `json_extract()`
- ACID guarantees matter

### DiskCache
- Key-value storage with automatic serialization
- Cache patterns (TTL, LRU eviction)
- Agent state persistence

### MongoDB
- Distributed, multi-node deployments
- Complex aggregation pipelines
- Full-text search requirements

---

## Practical Rules for Coding Agents

### Rule 1: Default to DiskCache for Agent State

```python
from diskcache import Cache

cache = Cache('/tmp/agent-cache')
cache.set('conversation:123', messages)  # 24 microseconds
```

### Rule 2: Batch SQLite Writes

```python
# GOOD: Batch with executemany
conn.executemany('INSERT INTO items (data) VALUES (?)', items_list)
conn.commit()  # One commit for all items
```

### Rule 3: Use Transactions for Multi-Step Operations

```python
conn.execute('BEGIN')
conn.execute('INSERT INTO users ...')
conn.execute('INSERT INTO audit_log ...')
conn.commit()  # One fsync
```

### Rule 4: Lazy Import for CLI Tools

```python
def save_to_db(data):
    import sqlite3  # 1.63ms, paid only when needed
    conn = sqlite3.connect('app.db')
```

---

## Summary

| Metric | SQLite | DiskCache | MongoDB |
|--------|--------|-----------|---------|
| Write latency | 192 us | 24 us | 119 us |
| Read latency | 3.6 us | 4.3 us | 121 us |
| Writes/sec | 5.2k | 42k | 8.4k |
| Reads/sec | 280k | 236k | 8.2k |

---

## The Bottom Line

1. **Reads are cheap everywhere** - Optimize for write patterns
2. **SQLite commits dominate latency** - Batch or use transactions
3. **DiskCache for key-value** - 8x faster writes, automatic serialization
4. **MongoDB for distribution** - Not for local performance

*The tortoise way: Measure, understand the cost, choose deliberately.*

---

*Benchmark source: python-numbers-everyone-should-know*

**papers/exception-flow.md** (new file, 111 lines)

# Exception Flow: Performance Patterns

**Domain:** Exception handling overhead
**Source:** python-numbers-everyone-should-know benchmarks (Python 3.14.2, Apple Silicon)

---

## TL;DR

- **try/except with no exception**: Nearly free (1.1 ns overhead)
- **Raising an exception**: 6.5x slower than the happy path (139 ns vs 21.5 ns)
- **EAFP is fine when exceptions are rare** (<5% of calls)
- **Use LBYL for expected failures** (dict key lookup, file existence)
- **Never use exceptions for normal control flow**

---

## The Numbers

### Happy Path (No Exception Raised)

| Operation | Time | Overhead vs Baseline |
|-----------|------|---------------------|
| Function call (no try/except) | 20.4 ns | baseline |
| try/except (no exception raised) | 21.5 ns | +1.1 ns (+5%) |
| try/except ValueError (specific) | 22.9 ns | +2.5 ns (+12%) |
| try/except/finally | 22.1 ns | +1.7 ns (+8%) |

**Key insight:** The try block itself is essentially free.

### Sad Path (Exception Raised)

| Operation | Time | Slowdown vs Happy Path |
|-----------|------|----------------------|
| raise + catch ValueError | 139 ns | **6.5x slower** |
| raise + catch (base Exception) | 140 ns | 6.5x slower |
| raise + catch custom exception | 146 ns | 6.8x slower |
| raise + catch with `as e` | 148 ns | 6.9x slower |

**Key insight:** The 6.5x overhead comes from:
1. Creating the exception object (~40 ns)
2. Capturing the traceback (~70 ns)
3. Stack unwinding and handler lookup (~30 ns)

---

## EAFP vs LBYL: When to Use Which

### EAFP (Easier to Ask Forgiveness than Permission)

```python
try:
    value = data[key]
except KeyError:
    value = default
```

**Use when:** Exceptions are rare (<5% of calls)

### LBYL (Look Before You Leap)

```python
if key in data:
    value = data[key]
else:
    value = default
```

**Use when:** The failure case is common (>15% of calls)

### Crossover Point

**Rule of thumb:** If exceptions occur more than 15% of the time, use LBYL.

### dict.get() Beats Both

```python
# Best: use .get() - 26.3 ns, no exception possible
config = settings.get('database', {})
```

---

## Practical Rules for Coding Agents

1. **try/except blocks are free** - don't avoid them for performance
2. **Raising exceptions costs 6.5x** - only raise for truly exceptional cases
3. **Use .get() for dicts** - beats both EAFP and LBYL
4. **Return Optional for expected missing values** - not exceptions
5. **EAFP for file ops** - TOCTOU protection matters more than perf
6. **LBYL when failures are common** (>15% of calls)
7. **Never use exceptions for control flow**
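Rule 4 in code: for an expected miss, return `None` instead of raising, so no exception object or traceback is ever built (the `find_user` function and its data are illustrative):

```python
from typing import Optional

USERS = {1: "ada", 2: "grace"}  # illustrative lookup table

def find_user(user_id: int) -> Optional[str]:
    """An expected miss is a normal result, not an exception."""
    return USERS.get(user_id)

hit = find_user(1)
miss = find_user(99)  # no raise, no traceback capture
```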
---

## Summary

| Scenario | Recommendation |
|----------|----------------|
| Exception rate <5% | EAFP (try/except) |
| Exception rate >15% | LBYL (check first) |
| Dict key lookup | Use `.get()` |
| Optional return value | Return `None`, not exception |
| File operations | EAFP (TOCTOU protection) |
| Control flow | Never use exceptions |

**The core insight:** try/except is free; raising is not. Design APIs to minimize raises, not to avoid try blocks.

---

*Benchmark source: python-numbers-everyone-should-know*

**papers/import-optimization.md** (new file, 104 lines)

# Import Optimization

**Domain Paper: Python Performance ADRs**
**Date:** 2026-01-03
**Source:** python-numbers-everyone-should-know benchmarks

---

## Executive Summary

Import costs range from **sub-microsecond (cached) to 100+ milliseconds** (large frameworks). For CLI tools and short-lived scripts, import time can dominate total execution.

---

## Benchmark Data (First Import, Fresh Process)

### Built-in Modules

| Module | First Import |
|--------|-------------|
| `sys` | 0.2 us |
| `os` | 0.2 us |
| `math` | 24 us |

### Standard Library

| Module | First Import |
|--------|-------------|
| `datetime` | 72 us |
| `typing` | 2.0 ms |
| `json` | 2.9 ms |
| `dataclasses` | 6.0 ms |
| `logging` | 10.5 ms |
| `asyncio` | 17.7 ms |

### External Packages

| Package | First Import |
|---------|-------------|
| `pydantic` | 15.8 ms |
| `flask` | 47.3 ms |
| `fastapi` | 104.4 ms |

**Key insight:** FastAPI takes 100ms just to import. For a CLI tool that runs in 50ms, this is unacceptable overhead.

---

## Lazy Import Patterns

### Pattern 1: Function-Level Import

```python
def process_data(data):
    import pandas as pd  # Only when needed
    return pd.DataFrame(data)
```

### Pattern 2: TYPE_CHECKING Guard

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    import pandas as pd

def process(data: "pd.DataFrame"):
    import pandas as pd
    return pd.DataFrame(data)
```

---

## Practical Rules for Coding Agents

### MUST

1. **Use `TYPE_CHECKING` for type-only imports** when the type is from a heavy package
2. **Use function-level imports for rarely-used code paths**
3. **Never import heavy packages at module level in CLI tools**

### SHOULD

4. **Use `from __future__ import annotations`** for cleaner TYPE_CHECKING usage
5. **Profile import time for new dependencies:**

```bash
python -c "import time; s=time.perf_counter(); import PACKAGE; print(f'{(time.perf_counter()-s)*1000:.1f}ms')"
```
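The same measurement can be scripted in Python when a shell one-liner is awkward, for example inside a test suite. A sketch, using `json` as a stand-in for the package under test (already-cached submodules make this a lower bound on a true cold import):

```python
import importlib
import sys
import time

module_name = "json"  # stand-in for the package you want to profile

# Drop any cached copy so the import below does real work.
sys.modules.pop(module_name, None)

start = time.perf_counter()
module = importlib.import_module(module_name)
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"{module_name}: {elapsed_ms:.1f} ms")
```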
---

## Summary Table

| Scenario | Pattern | Example |
|----------|---------|---------|
| Type hints for heavy types | `TYPE_CHECKING` | pandas, numpy types |
| Rarely-used function | Function-level import | Error handling paths |
| CLI fast path | Defer until needed | `--version`, `--help` |
| Serverless cold start | Minimize top-level | Lambda/Cloud Functions |

---

*Import costs are hidden taxes. Pay them lazily.*

---

*Benchmark source: python-numbers-everyone-should-know*

**papers/json-serialization.md** (new file, 93 lines)

# JSON Serialization Performance in Python

**Domain Paper: Python Performance ADRs**
**Date:** 2026-01-03
**Source:** Python Numbers Everyone Should Know benchmarks

---

## Executive Summary

Alternative JSON libraries like `orjson` and `msgspec` deliver **8-12x faster serialization** and **2-7x faster deserialization** compared to stdlib `json`. The performance gap is consistent across payload sizes.

---

## Key Findings

### Serialization Performance (dumps)

| Library | Simple Object | Complex Object | Speedup vs stdlib |
|---------|--------------|----------------|-------------------|
| `json.dumps()` | 708 ns | 2.65 us | 1x (baseline) |
| `orjson.dumps()` | 61 ns | 310 ns | **11.6x / 8.5x** |
| `msgspec.encode()` | 92 ns | 445 ns | 7.7x / 6.0x |
| `ujson.dumps()` | 264 ns | 1.64 us | 2.7x / 1.6x |

### Deserialization Performance (loads)

| Library | Simple Object | Complex Object | Speedup vs stdlib |
|---------|--------------|----------------|-------------------|
| `json.loads()` | 714 ns | 2.22 us | 1x (baseline) |
| `orjson.loads()` | 106 ns | 839 ns | **6.7x / 2.6x** |
| `msgspec.decode()` | 101 ns | 850 ns | 7.1x / 2.6x |

---

## When to Use Each Library

### Use stdlib `json` when:
- Zero dependencies required
- Need custom JSONEncoder subclass
- Compatibility is paramount

### Use `orjson` when:
- Maximum performance needed
- You can accept bytes output
- You need datetime/UUID support

### Use `msgspec` when:
- You need typed decoding
- You want MessagePack too
- Memory efficiency matters

---

## Practical Rules for Coding Agents

### Rule 1: Default to orjson for new projects

```python
# Instead of:
import json
data = json.dumps(obj)

# Prefer:
import orjson
data = orjson.dumps(obj)  # Returns bytes
```

### Rule 2: Use stdlib json only when explicitly needed

Acceptable reasons:
- Must avoid external dependencies
- Need custom JSONEncoder subclass
- Working in constrained environment
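The custom-encoder case from the list above, as a minimal sketch (datetime handling is the classic reason to subclass `json.JSONEncoder`):

```python
import json
from datetime import datetime

class ISOEncoder(json.JSONEncoder):
    """Serialize datetimes as ISO 8601 strings; defer everything else."""
    def default(self, o):
        if isinstance(o, datetime):
            return o.isoformat()
        return super().default(o)

payload = json.dumps({"ts": datetime(2026, 1, 3)}, cls=ISOEncoder)
```

(`orjson` handles datetimes natively, so this hook is specifically a stdlib escape hatch.)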
### Rule 3: Profile before optimizing JSON

At 2-3 microseconds per operation, JSON serialization is rarely the bottleneck unless you're doing thousands of operations per second.

---

## Summary Table

| Scenario | Recommendation | Expected Speedup |
|----------|----------------|------------------|
| General use | orjson | 8x serialization, 2.5x deserialization |
| Typed data | msgspec | 6x + type safety |
| Drop-in replacement | ujson | 1.5-2x |
| Zero dependencies | json (stdlib) | Baseline |

---

*Benchmark source: python-numbers-everyone-should-know*

**papers/memory-slots.md** (new file, 102 lines)

# Memory Optimization with __slots__ in Python

**Domain Paper: Python Performance ADRs**
**Date:** 2026-01-03
**Source:** python-numbers-everyone-should-know benchmarks

---

## Executive Summary

Python's `__slots__` mechanism provides a **52-70% memory reduction** when creating many instances of the same class.

**Key Finding**: For a class with 5 attributes, `__slots__` reduces instance memory from 694 bytes to 212 bytes (a 69% reduction).

---

## Benchmark Results: Memory Footprint

### Single Instance Memory (5 Attributes)

| Type | Memory (bytes) | vs Regular Class |
|------|----------------|------------------|
| Regular class | 694 | baseline |
| `__slots__` class | 212 | -69% |
| dataclass | 694 | same as regular |
| `@dataclass(slots=True)` | 212 | -69% |
| namedtuple | 228 | -67% |

### At Scale (1,000 Instances)

| Type | Total Memory |
|------|--------------|
| List of 1,000 regular-class instances | 165.2 KB |
| List of 1,000 `__slots__`-class instances | 79.1 KB |

**Memory savings**: 52% reduction at scale

---

## Attribute Access Speed (Virtually Identical)

| Operation | Regular | `__slots__` |
|-----------|---------|-------------|
| Read attr | 14.1 ns | 14.1 ns |
| Write attr | 15.7 ns | 16.4 ns |

---

## Trade-offs

### What __slots__ Prevents

1. **No dynamic attribute assignment**
2. **No `__dict__` access** (`vars()` doesn't work)
3. **Inheritance complications**
4. **No weak references by default**
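The first two restrictions, demonstrated with a minimal sketch:

```python
class Point:
    __slots__ = ("x", "y")

    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y

p = Point(1, 2)

# 1. Dynamic attribute assignment raises AttributeError.
try:
    p.z = 3
    dynamic_ok = True
except AttributeError:
    dynamic_ok = False

# 2. No per-instance __dict__, so vars(p) would raise TypeError.
has_dict = hasattr(p, "__dict__")
```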
---

## Practical Rules for Coding Agents

### Rule 1: Instance Count Threshold

```
IF   creating > 100 instances of the same class
AND  attributes are fixed at design time
THEN consider __slots__ or @dataclass(slots=True)
```

### Rule 2: Prefer Slots Dataclass (Python 3.10+)

```python
from dataclasses import dataclass

@dataclass(slots=True)
class User:
    id: int
    name: str
    email: str
```

### Rule 3: Don't Optimize Prematurely

For < 100 instances, use regular classes for flexibility.

### Rule 4: Document the Trade-off

```python
# Using __slots__ for memory efficiency (1000+ instances expected)
__slots__ = ['x', 'y', 'z']
```

---

## Summary

| Aspect | Regular Class | `__slots__` Class |
|--------|---------------|-------------------|
| Memory (5 attrs) | 694 bytes | 212 bytes |
| Read speed | 14.1 ns | 14.1 ns |
| Dynamic attributes | Yes | No |
| Best for | Flexibility | Many instances |

**Bottom Line**: Use `__slots__` (or `@dataclass(slots=True)`) when creating many instances of fixed-attribute classes. For small numbers, stick with regular classes.

---

*Benchmark source: python-numbers-everyone-should-know*

**papers/string-formatting.md** (new file, 111 lines)

# String Formatting: Domain Exploration

**Date:** 2026-01-03
**Source:** python-numbers-everyone-should-know benchmarks
**Python Version:** 3.14.2 (CPython, ARM64 macOS)

---

## Executive Summary

String formatting performance: **simple concatenation is fastest for trivial joins**, while **f-strings offer the best balance of readability and performance** for interpolation use cases.

---

## Raw Benchmark Results

| Operation | Time (ns) | Throughput |
|-----------|-----------|------------|
| `concat_small` | 39.1 ns | 25.6M ops/sec |
| `f_string` | 64.9 ns | 15.4M ops/sec |
| `percent_formatting` | 89.8 ns | 11.1M ops/sec |
| `format_method` | 103 ns | 9.7M ops/sec |

### Relative Performance

| Method | Speed vs f-string |
|--------|-------------------|
| `concat_small` | 1.66x (faster) |
| `f_string` | 1.00x (reference) |
| `percent_formatting` | 0.72x (slower) |
| `format_method` | 0.63x (slower) |

---

## Why F-Strings Are Fast

F-strings are parsed at **compile time**, not runtime:

1. **No method lookup**: F-strings don't call `.format()` at runtime
2. **No tuple creation**: `%` formatting requires building a `(name,)` tuple
3. **Specialized bytecode**: `FORMAT_VALUE` and `BUILD_STRING` are optimized
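This is visible in the disassembly (a sketch; exact opcode names vary across CPython versions, but no `.format()` method call appears anywhere in the output):

```python
import dis
import io

def greet(name: str, n: int) -> str:
    return f"{name}: {n}"

# Capture the compiled bytecode as text: the f-string becomes
# formatting opcodes plus a string-building opcode, with no
# runtime .format() method lookup.
buf = io.StringIO()
dis.dis(greet, file=buf)
bytecode = buf.getvalue()

result = greet("jobs", 3)
```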
---

## When to Use Each Method

### Concatenation Wins

For 2-3 literal strings with no formatting:

```python
path = base_dir + '/' + filename  # Simpler, faster
```

### % Formatting for Logging

```python
# Deferred evaluation - string built only if debug enabled
logger.debug('Processing %s items', count)

# f-string - string ALWAYS built, then discarded
logger.debug(f'Processing {count} items')  # Wasteful
```

### .format() for Dynamic Templates

```python
template = get_template_from_config()  # Returns 'User: {name}'
result = template.format(name=user.name)
```

---

## Practical Rules for Coding Agents

### Rule 1: Default to F-Strings

```python
# Preferred
message = f'User {user.name} logged in at {timestamp}'
```

### Rule 2: Use Concatenation for Trivial Joins

```python
url = base_url + endpoint  # Fine - simpler and faster
```

### Rule 3: Use join() for Multiple Parts

```python
# Correct - O(n) time
result = ''.join([part1, part2, part3, part4])

# Inefficient - O(n^2) time
result = part1 + part2 + part3 + part4
```

### Rule 4: Keep % for Logging

```python
logger.info('Processed %d records in %.2fs', count, elapsed)
```

---

## Summary

| Scenario | Best Choice | Reason |
|----------|-------------|--------|
| Variable interpolation | f-string | 1.6x faster than `.format()` |
| Simple 2-part join | Concatenation | 1.7x faster than f-string |
| Building from many parts | `''.join()` | O(n) vs O(n^2) |
| Logging statements | `%` style | Deferred evaluation |
| Dynamic templates | `.format()` | Template flexibility |

---

*Benchmark source: python-numbers-everyone-should-know*
