feat: Add 8 domain papers and RULEBOOK.md
Domain papers distilled from python-numbers-everyone-should-know: - async-overhead: 1,400x sync vs async overhead - collection-membership: 200x set vs list at 1000 items - json-serialization: 8x orjson vs stdlib - exception-flow: 6.5x exception overhead (try/except free) - string-formatting: f-strings > % > .format() - memory-slots: 69% memory reduction with __slots__ - import-optimization: 100ms+ for heavy packages - database-patterns: 98% commit overhead in SQLite RULEBOOK.md: ~200 token distillation for coding subagents 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
121
papers/database-patterns.md
Normal file
121
papers/database-patterns.md
Normal file
@@ -0,0 +1,121 @@
|
||||
# Database Patterns: SQLite, DiskCache, and MongoDB
|
||||
|
||||
**Domain:** Persistence and data access patterns in Python
|
||||
**Source:** python-numbers-everyone-should-know benchmarks
|
||||
**Date:** 2026-01-03
|
||||
|
||||
---
|
||||
|
||||
## Executive Summary
|
||||
|
||||
**Reads are cheap, writes are expensive.** SQLite commits dominate write latency (192 microseconds with commit vs 3 microseconds without). For read-heavy workloads, SQLite achieves 280K ops/sec by primary key. For write-heavy workloads, consider diskcache (8x faster writes) or batch operations.
|
||||
|
||||
---
|
||||
|
||||
## Key Findings
|
||||
|
||||
### The Numbers
|
||||
|
||||
| Operation | SQLite | DiskCache | MongoDB |
|
||||
|-----------|--------|-----------|---------|
|
||||
| **Write one object** | 192 us (5.2k/s) | 24 us (42k/s) | 119 us (8.4k/s) |
|
||||
| **Read by key/id** | 3.6 us (280k/s) | 4.3 us (236k/s) | 121 us (8.2k/s) |
|
||||
|
||||
### Finding 1: The Commit Tax
|
||||
|
||||
SQLite writes with commit: **192 microseconds**
|
||||
SQLite writes without commit: **3 microseconds**
|
||||
|
||||
The commit operation accounts for **98.4% of write latency**.
|
||||
|
||||
### Finding 2: DiskCache Wins for Simple Key-Value
|
||||
|
||||
| Operation | SQLite Raw | DiskCache |
|
||||
|-----------|------------|-----------|
|
||||
| Write | 192 us | 24 us |
|
||||
| Read | 3.6 us | 4.3 us |
|
||||
|
||||
DiskCache achieves **8x faster writes** with comparable read performance.
|
||||
|
||||
### Finding 3: Batching Provides 9x Throughput
|
||||
|
||||
SQLite `executemany()` 10 rows: **215 microseconds total** (21.5 us/row)
|
||||
10 individual inserts: **1,920 microseconds** (192 us/row)
|
||||
|
||||
---
|
||||
|
||||
## When to Use Each Storage Option
|
||||
|
||||
### SQLite
|
||||
- Read-heavy workloads (100:1 read/write ratio)
|
||||
- Need to query inside JSON with `json_extract()`
|
||||
- ACID guarantees matter
|
||||
|
||||
### DiskCache
|
||||
- Key-value storage with automatic serialization
|
||||
- Cache patterns (TTL, LRU eviction)
|
||||
- Agent state persistence
|
||||
|
||||
### MongoDB
|
||||
- Distributed, multi-node deployments
|
||||
- Complex aggregation pipelines
|
||||
- Full-text search requirements
|
||||
|
||||
---
|
||||
|
||||
## Practical Rules for Coding Agents
|
||||
|
||||
### Rule 1: Default to DiskCache for Agent State
|
||||
```python
|
||||
from diskcache import Cache
|
||||
cache = Cache('/tmp/agent-cache')
|
||||
cache.set('conversation:123', messages) # 24 microseconds
|
||||
```
|
||||
|
||||
### Rule 2: Batch SQLite Writes
|
||||
```python
|
||||
# GOOD: Batch with executemany
|
||||
conn.executemany('INSERT INTO items (data) VALUES (?)', items_list)
|
||||
conn.commit() # One commit for all items
|
||||
```
|
||||
|
||||
### Rule 3: Use Transactions for Multi-Step Operations
|
||||
```python
|
||||
conn.execute('BEGIN')
|
||||
conn.execute('INSERT INTO users ...')
|
||||
conn.execute('INSERT INTO audit_log ...')
|
||||
conn.commit() # One fsync
|
||||
```
|
||||
|
||||
### Rule 4: Lazy Import for CLI Tools
|
||||
```python
|
||||
def save_to_db(data):
|
||||
import sqlite3 # 1.63ms only when needed
|
||||
conn = sqlite3.connect('app.db')
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Summary
|
||||
|
||||
| Metric | SQLite | DiskCache | MongoDB |
|
||||
|--------|--------|-----------|---------|
|
||||
| Write latency | 192 us | 24 us | 119 us |
|
||||
| Read latency | 3.6 us | 4.3 us | 121 us |
|
||||
| Writes/sec | 5.2k | 42k | 8.4k |
|
||||
| Reads/sec | 280k | 236k | 8.2k |
|
||||
|
||||
---
|
||||
|
||||
## The Bottom Line
|
||||
|
||||
1. **Reads are cheap everywhere** - Optimize for write patterns
|
||||
2. **SQLite commits dominate latency** - Batch or use transactions
|
||||
3. **DiskCache for key-value** - 8x faster writes, automatic serialization
|
||||
4. **MongoDB for distribution** - Not for local performance
|
||||
|
||||
*The tortoise way: Measure, understand the cost, choose deliberately.*
|
||||
|
||||
---
|
||||
|
||||
*Benchmark source: python-numbers-everyone-should-know*
|
||||
Reference in New Issue
Block a user