feat: Add 8 domain papers and RULEBOOK.md

Domain papers distilled from python-numbers-everyone-should-know:
- async-overhead: 1,400x sync vs async overhead
- collection-membership: 200x set vs list at 1000 items
- json-serialization: 8x orjson vs stdlib
- exception-flow: 6.5x exception overhead (try/except free)
- string-formatting: f-strings > % > .format()
- memory-slots: 69% memory reduction with __slots__
- import-optimization: 100ms+ for heavy packages
- database-patterns: 98% commit overhead in SQLite

RULEBOOK.md: ~200 token distillation for coding subagents

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Author: dafit
Date: 2026-01-03 14:31:40 +01:00
Commit: 7efd1368d1 (parent: 4def3b46c2)
9 changed files with 909 additions and 0 deletions

papers/database-patterns.md (new file, 121 lines):
# Database Patterns: SQLite, DiskCache, and MongoDB
**Domain:** Persistence and data access patterns in Python
**Source:** python-numbers-everyone-should-know benchmarks
**Date:** 2026-01-03
---
## Executive Summary
**Reads are cheap, writes are expensive.** SQLite commits dominate write latency: 192 microseconds with a commit versus 3 microseconds without. For read-heavy workloads, SQLite sustains 280K reads/sec on primary-key lookups. For write-heavy workloads, consider diskcache (8x faster writes) or batch operations.
---
## Key Findings
### The Numbers
| Operation | SQLite | DiskCache | MongoDB |
|-----------|--------|-----------|---------|
| **Write one object** | 192 us (5.2k/s) | 24 us (42k/s) | 119 us (8.4k/s) |
| **Read by key/id** | 3.6 us (280k/s) | 4.3 us (236k/s) | 121 us (8.2k/s) |
### Finding 1: The Commit Tax
SQLite writes with commit: **192 microseconds**
SQLite writes without commit: **3 microseconds**
The commit operation accounts for **98.4% of write latency**.
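The commit tax can be observed directly. A minimal sketch, assuming a local disk-backed database (the path, table name, and row count are illustrative; absolute timings will vary by disk and filesystem):

```python
import os
import sqlite3
import tempfile
import time

# Same 100 inserts, two ways: one commit per row vs one commit at the end.
path = os.path.join(tempfile.mkdtemp(), "bench.db")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE items (data TEXT)")
conn.commit()

t0 = time.perf_counter()
for i in range(100):
    conn.execute("INSERT INTO items (data) VALUES (?)", (f"row-{i}",))
    conn.commit()  # fsync on every row: the commit tax
per_row = time.perf_counter() - t0

t0 = time.perf_counter()
for i in range(100):
    conn.execute("INSERT INTO items (data) VALUES (?)", (f"row-{i}",))
conn.commit()  # one fsync for all 100 rows
batched = time.perf_counter() - t0

print(f"commit per row: {per_row * 1000:.1f} ms, single commit: {batched * 1000:.1f} ms")
conn.close()
```

The per-row variant pays one fsync per insert; the batched variant amortizes a single fsync across all rows, which is where the 98% figure comes from.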
### Finding 2: DiskCache Wins for Simple Key-Value
| Operation | SQLite Raw | DiskCache |
|-----------|------------|-----------|
| Write | 192 us | 24 us |
| Read | 3.6 us | 4.3 us |
DiskCache achieves **8x faster writes** with comparable read performance.
### Finding 3: Batching Provides 9x Throughput
SQLite `executemany()` 10 rows: **215 microseconds total** (21.5 us/row)
10 individual inserts: **1,920 microseconds** (192 us/row)
---
## When to Use Each Storage Option
### SQLite
- Read-heavy workloads (100:1 read/write ratio)
- Need to query inside JSON with `json_extract()`
- ACID guarantees matter
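The `json_extract()` point can be sketched as follows (table and column names are illustrative; the function requires SQLite's JSON1 extension, which is built in by default in modern builds):

```python
import json
import sqlite3

# Store JSON as TEXT, then query inside it with json_extract().
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (payload TEXT)")
conn.execute(
    "INSERT INTO events (payload) VALUES (?)",
    (json.dumps({"user": "alice", "action": "login"}),),
)
conn.commit()

user = conn.execute(
    "SELECT json_extract(payload, '$.user') FROM events"
).fetchone()[0]
print(user)  # alice
```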
### DiskCache
- Key-value storage with automatic serialization
- Cache patterns (TTL, LRU eviction)
- Agent state persistence
### MongoDB
- Distributed, multi-node deployments
- Complex aggregation pipelines
- Full-text search requirements
---
## Practical Rules for Coding Agents
### Rule 1: Default to DiskCache for Agent State
```python
from diskcache import Cache
cache = Cache('/tmp/agent-cache')
cache.set('conversation:123', messages) # 24 microseconds
```
### Rule 2: Batch SQLite Writes
```python
# GOOD: Batch with executemany
conn.executemany('INSERT INTO items (data) VALUES (?)', items_list)
conn.commit() # One commit for all items
```
### Rule 3: Use Transactions for Multi-Step Operations
```python
conn.execute('BEGIN')                     # open one explicit transaction
conn.execute('INSERT INTO users ...')
conn.execute('INSERT INTO audit_log ...')
conn.commit()                             # One fsync for both inserts
```
### Rule 4: Lazy Import for CLI Tools
```python
def save_to_db(data):
    import sqlite3  # 1.63ms only when needed
    conn = sqlite3.connect('app.db')
```
---
## Summary
| Metric | SQLite | DiskCache | MongoDB |
|--------|--------|-----------|---------|
| Write latency | 192 us | 24 us | 119 us |
| Read latency | 3.6 us | 4.3 us | 121 us |
| Writes/sec | 5.2k | 42k | 8.4k |
| Reads/sec | 280k | 236k | 8.2k |
---
## The Bottom Line
1. **Reads are cheap everywhere** - Optimize for write patterns
2. **SQLite commits dominate latency** - Batch or use transactions
3. **DiskCache for key-value** - 8x faster writes, automatic serialization
4. **MongoDB for distribution** - Not for local performance
*The tortoise way: Measure, understand the cost, choose deliberately.*
---
*Benchmark source: python-numbers-everyone-should-know*