Database Patterns: SQLite, DiskCache, and MongoDB

Domain: Persistence and data access patterns in Python
Source: python-numbers-everyone-should-know benchmarks
Date: 2026-01-03


Executive Summary

Reads are cheap, writes are expensive. SQLite commits dominate write latency (192 microseconds with commit vs 3 microseconds without). For read-heavy workloads, SQLite achieves 280K ops/sec by primary key. For write-heavy workloads, consider diskcache (8x faster writes) or batch operations.


Key Findings

The Numbers

| Operation        | SQLite          | DiskCache       | MongoDB         |
|------------------|-----------------|-----------------|-----------------|
| Write one object | 192 us (5.2k/s) | 24 us (42k/s)   | 119 us (8.4k/s) |
| Read by key/id   | 3.6 us (280k/s) | 4.3 us (236k/s) | 121 us (8.2k/s) |

Finding 1: The Commit Tax

SQLite write with commit: 192 microseconds
SQLite write without commit: 3 microseconds

The commit operation accounts for 98.4% of write latency.
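A minimal way to observe the commit tax locally (a sketch: the temp-file path, table name, and row count are illustrative, and absolute timings depend on the disk and filesystem):

```python
import os
import sqlite3
import tempfile
import time

# Use an on-disk database: the commit tax is largely fsync cost,
# which an in-memory database would hide.
path = os.path.join(tempfile.mkdtemp(), 'bench.db')
conn = sqlite3.connect(path)
conn.execute('CREATE TABLE items (data TEXT)')

# One commit per insert: pays the fsync on every row.
start = time.perf_counter()
for i in range(100):
    conn.execute('INSERT INTO items (data) VALUES (?)', (f'row-{i}',))
    conn.commit()
per_commit = (time.perf_counter() - start) / 100

# Same inserts, one commit at the end.
start = time.perf_counter()
for i in range(100):
    conn.execute('INSERT INTO items (data) VALUES (?)', (f'row-{i}',))
conn.commit()
one_commit = (time.perf_counter() - start) / 100

print(f'{per_commit * 1e6:.0f} us/row committing each row, '
      f'{one_commit * 1e6:.0f} us/row committing once')
```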

Finding 2: DiskCache Wins for Simple Key-Value

| Operation | SQLite Raw | DiskCache |
|-----------|------------|-----------|
| Write     | 192 us     | 24 us     |
| Read      | 3.6 us     | 4.3 us    |

DiskCache achieves 8x faster writes with comparable read performance.

Finding 3: Batching Provides 9x Throughput

SQLite executemany(), 10 rows: 215 microseconds total (21.5 us/row)
10 individual inserts (one commit each): 1,920 microseconds total (192 us/row)


When to Use Each Storage Option

SQLite

  • Read-heavy workloads (100:1 read/write ratio)
  • Need to query inside JSON with json_extract()
  • ACID guarantees matter
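For the json_extract() case, a sketch (table and field names are illustrative; the JSON functions ship with the SQLite builds bundled in recent CPython releases):

```python
import json
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE events (payload TEXT)')
conn.execute('INSERT INTO events VALUES (?)',
             (json.dumps({'user': 'ada', 'action': 'login'}),))
conn.commit()

# Filter and project inside the stored JSON, no Python-side parsing
rows = conn.execute(
    "SELECT json_extract(payload, '$.user') FROM events "
    "WHERE json_extract(payload, '$.action') = 'login'"
).fetchall()
print(rows)  # [('ada',)]
```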

DiskCache

  • Key-value storage with automatic serialization
  • Cache patterns (TTL, LRU eviction)
  • Agent state persistence

MongoDB

  • Distributed, multi-node deployments
  • Complex aggregation pipelines
  • Full-text search requirements
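An aggregation pipeline is itself plain Python data, so its structure can be sketched without a server (the collection and field names here are hypothetical):

```python
# Count logins per user, most active first.
pipeline = [
    {'$match': {'action': 'login'}},
    {'$group': {'_id': '$user', 'count': {'$sum': 1}}},
    {'$sort': {'count': -1}},
]

# Against a live deployment this would run as, e.g.:
#   db.events.aggregate(pipeline)
```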

Practical Rules for Coding Agents

Rule 1: Default to DiskCache for Agent State

```python
from diskcache import Cache

cache = Cache('/tmp/agent-cache')
cache.set('conversation:123', messages)  # ~24 microseconds
```

Rule 2: Batch SQLite Writes

```python
import sqlite3

conn = sqlite3.connect('app.db')

# GOOD: batch with executemany, then commit once
conn.executemany('INSERT INTO items (data) VALUES (?)', items_list)
conn.commit()  # one commit (one fsync) for all items
```

Rule 3: Use Transactions for Multi-Step Operations

```python
conn.execute('BEGIN')
conn.execute('INSERT INTO users ...')
conn.execute('INSERT INTO audit_log ...')
conn.commit()  # one fsync for both inserts
```
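In Python's sqlite3 module, the connection also works as a context manager that commits on success and rolls back on an exception, grouping the statements into one fsync the same way (table names illustrative):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE users (name TEXT)')
conn.execute('CREATE TABLE audit_log (entry TEXT)')

# Both inserts commit together on exit; an exception rolls both back.
with conn:
    conn.execute('INSERT INTO users VALUES (?)', ('ada',))
    conn.execute('INSERT INTO audit_log VALUES (?)', ('created user ada',))
```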

Rule 4: Lazy Import for CLI Tools

```python
def save_to_db(data):
    import sqlite3  # ~1.63 ms, paid only when the function runs
    conn = sqlite3.connect('app.db')
```

Summary

| Metric        | SQLite | DiskCache | MongoDB |
|---------------|--------|-----------|---------|
| Write latency | 192 us | 24 us     | 119 us  |
| Read latency  | 3.6 us | 4.3 us    | 121 us  |
| Writes/sec    | 5.2k   | 42k       | 8.4k    |
| Reads/sec     | 280k   | 236k      | 8.2k    |

The Bottom Line

  1. Reads are cheap everywhere - Optimize for write patterns
  2. SQLite commits dominate latency - Batch or use transactions
  3. DiskCache for key-value - 8x faster writes, automatic serialization
  4. MongoDB for distribution - Not for local performance

The tortoise way: Measure, understand the cost, choose deliberately.


Benchmark source: python-numbers-everyone-should-know