Domain papers distilled from python-numbers-everyone-should-know: - async-overhead: 1,400x sync vs async overhead - collection-membership: 200x set vs list at 1000 items - json-serialization: 8x orjson vs stdlib - exception-flow: 6.5x exception overhead (try/except free) - string-formatting: f-strings > % > .format() - memory-slots: 69% memory reduction with __slots__ - import-optimization: 100ms+ for heavy packages - database-patterns: 98% commit overhead in SQLite RULEBOOK.md: ~200 token distillation for coding subagents 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
122 lines
3.2 KiB
Markdown
122 lines
3.2 KiB
Markdown
# Database Patterns: SQLite, DiskCache, and MongoDB
|
|
|
|
**Domain:** Persistence and data access patterns in Python
|
|
**Source:** python-numbers-everyone-should-know benchmarks
|
|
**Date:** 2026-01-03
|
|
|
|
---
|
|
|
|
## Executive Summary
|
|
|
|
**Reads are cheap, writes are expensive.** SQLite commits dominate write latency (192 microseconds with commit vs 3 microseconds without). For read-heavy workloads, SQLite achieves 280K ops/sec by primary key. For write-heavy workloads, consider diskcache (8x faster writes) or batch operations.
|
|
|
|
---
|
|
|
|
## Key Findings
|
|
|
|
### The Numbers
|
|
|
|
| Operation | SQLite | DiskCache | MongoDB |
|
|
|-----------|--------|-----------|---------|
|
|
| **Write one object** | 192 us (5.2k/s) | 24 us (42k/s) | 119 us (8.4k/s) |
|
|
| **Read by key/id** | 3.6 us (280k/s) | 4.3 us (236k/s) | 121 us (8.2k/s) |
|
|
|
|
### Finding 1: The Commit Tax
|
|
|
|
SQLite writes with commit: **192 microseconds**
|
|
SQLite writes without commit: **3 microseconds**
|
|
|
|
The commit operation accounts for **98.4% of write latency**.
|
|
|
|
### Finding 2: DiskCache Wins for Simple Key-Value
|
|
|
|
| Operation | SQLite Raw | DiskCache |
|
|
|-----------|------------|-----------|
|
|
| Write | 192 us | 24 us |
|
|
| Read | 3.6 us | 4.3 us |
|
|
|
|
DiskCache achieves **8x faster writes** with comparable read performance.
|
|
|
|
### Finding 3: Batching Provides 9x Throughput
|
|
|
|
SQLite `executemany()` 10 rows: **215 microseconds total** (21.5 us/row)
|
|
10 individual inserts: **1,920 microseconds** (192 us/row)
|
|
|
|
---
|
|
|
|
## When to Use Each Storage Option
|
|
|
|
### SQLite
|
|
- Read-heavy workloads (100:1 read/write ratio)
|
|
- Need to query inside JSON with `json_extract()`
|
|
- ACID guarantees matter
|
|
|
|
### DiskCache
|
|
- Key-value storage with automatic serialization
|
|
- Cache patterns (TTL, LRU eviction)
|
|
- Agent state persistence
|
|
|
|
### MongoDB
|
|
- Distributed, multi-node deployments
|
|
- Complex aggregation pipelines
|
|
- Full-text search requirements
|
|
|
|
---
|
|
|
|
## Practical Rules for Coding Agents
|
|
|
|
### Rule 1: Default to DiskCache for Agent State
|
|
```python
|
|
from diskcache import Cache
|
|
cache = Cache('/tmp/agent-cache')
|
|
cache.set('conversation:123', messages) # 24 microseconds
|
|
```
|
|
|
|
### Rule 2: Batch SQLite Writes
|
|
```python
|
|
# GOOD: Batch with executemany
|
|
conn.executemany('INSERT INTO items (data) VALUES (?)', items_list)
|
|
conn.commit() # One commit for all items
|
|
```
|
|
|
|
### Rule 3: Use Transactions for Multi-Step Operations
|
|
```python
|
|
conn.execute('BEGIN')
|
|
conn.execute('INSERT INTO users ...')
|
|
conn.execute('INSERT INTO audit_log ...')
|
|
conn.commit() # One fsync
|
|
```
|
|
|
|
### Rule 4: Lazy Import for CLI Tools
|
|
```python
|
|
def save_to_db(data):
|
|
import sqlite3 # 1.63ms only when needed
|
|
conn = sqlite3.connect('app.db')
|
|
```
|
|
|
|
---
|
|
|
|
## Summary
|
|
|
|
| Metric | SQLite | DiskCache | MongoDB |
|
|
|--------|--------|-----------|---------|
|
|
| Write latency | 192 us | 24 us | 119 us |
|
|
| Read latency | 3.6 us | 4.3 us | 121 us |
|
|
| Writes/sec | 5.2k | 42k | 8.4k |
|
|
| Reads/sec | 280k | 236k | 8.2k |
|
|
|
|
---
|
|
|
|
## The Bottom Line
|
|
|
|
1. **Reads are cheap everywhere** - Optimize for write patterns
|
|
2. **SQLite commits dominate latency** - Batch or use transactions
|
|
3. **DiskCache for key-value** - 8x faster writes, automatic serialization
|
|
4. **MongoDB for distribution** - Not for local performance
|
|
|
|
*The tortoise way: Measure, understand the cost, choose deliberately.*
|
|
|
|
---
|
|
|
|
*Benchmark source: python-numbers-everyone-should-know*
|