Async Overhead in Python: When the Cure is Worse Than the Disease
Domain Paper: Python Performance ADRs
Date: 2026-01-03
Source: Python Numbers Everyone Should Know benchmarks (Python 3.14.2, Apple Silicon)
Executive Summary
Async Python introduces a 1,400x overhead for simple operations compared to synchronous equivalents. This overhead is fixed regardless of what work the function does. The critical insight: async only makes sense when you're waiting on I/O that takes orders of magnitude longer than this overhead.
The Core Numbers:
- Sync function call: 20.3 ns
- Async equivalent via `run_until_complete`: 28.2 us (28,200 ns)
- Ratio: 1,387x slower (approximately 1,400x)
What Was Benchmarked
Methodology
The benchmarks measured pure async machinery overhead using CPython 3.14.2 on Apple Silicon. Each operation was run thousands of times with warmup periods, reporting median values.
Test Functions
```python
# The async function being tested
async def return_value_coro():
    return 42

# The sync equivalent
def sync_function():
    return 42
```
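A minimal way to reproduce the comparison yourself. This is a sketch, not the original harness: it uses `timeit` with no warmup, and `asyncio.run()` in place of a cached loop, which adds loop setup and teardown cost on top of the bare entry cost, so treat its result as an upper bound.

```python
import asyncio
import timeit

async def return_value_coro():
    return 42

def sync_function():
    return 42

# Time 1,000 plain calls against 1,000 full event-loop round trips.
sync_s = timeit.timeit(sync_function, number=1000)
async_s = timeit.timeit(lambda: asyncio.run(return_value_coro()), number=1000)
print(f"sync:  {sync_s / 1000 * 1e9:.1f} ns/call")
print(f"async: {async_s / 1000 * 1e9:.1f} ns/call")
```

Absolute numbers vary by hardware and Python version; the ratio is what matters.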
Key Findings
Coroutine Creation (Cheap)
| Operation | Time |
|---|---|
| Create coroutine object | 47.0 ns |
Key insight: Creating a coroutine object is cheap (47 ns). The cost comes when you actually run it.
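A sketch that makes the "cheap to create, expensive to run" split visible: instantiating the coroutine object executes none of its body, so the side effect only appears once the event loop drives it.

```python
import asyncio

ran = False

async def return_value_coro():
    global ran
    ran = True
    return 42

# Creating the coroutine object does no work: the body has not run yet.
coro = return_value_coro()
print(ran)  # False

# The body (and the ~27 us loop-entry cost) happens only when the loop runs it.
result = asyncio.run(coro)
print(ran, result)  # True 42
```

If you create a coroutine and never run it, CPython emits a "coroutine was never awaited" RuntimeWarning.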
Running Coroutines (Expensive)
| Operation | Time |
|---|---|
| `run_until_complete(empty)` | 27.6 us |
| `run_until_complete(return value)` | 26.6 us |
| Run nested await | 28.9 us |
Key insight: Every run_until_complete costs ~27 us regardless of coroutine complexity.
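The fixed per-entry cost can be checked by reusing a single loop and timing two coroutine shapes of different complexity (a sketch; absolute figures depend on hardware, but the two timings should land in the same ballpark):

```python
import asyncio
import timeit

async def empty():
    pass

async def nested():
    await empty()

# Reuse one loop so only the run_until_complete entry cost is measured,
# not loop creation/teardown.
loop = asyncio.new_event_loop()
try:
    t_empty = timeit.timeit(lambda: loop.run_until_complete(empty()), number=1000)
    t_nested = timeit.timeit(lambda: loop.run_until_complete(nested()), number=1000)
    print(f"empty:  {t_empty * 1e6 / 1000:.1f} us/entry")
    print(f"nested: {t_nested * 1e6 / 1000:.1f} us/entry")
finally:
    loop.close()
```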
The Critical Comparison
| Operation | Time | Ratio |
|---|---|---|
| Sync function call | 20.3 ns | 1x |
| Async equivalent | 28.2 us | 1,387x |
When Async IS Appropriate
Good Use Cases
- Web servers handling concurrent connections - FastAPI/Starlette: 115-125k req/sec
- Concurrent network I/O - Fetching data from multiple APIs simultaneously
- High-latency operations with parallelism - `asyncio.gather()` for multiple slow API calls
Bad Use Cases
- Wrapping synchronous database drivers - Use native async drivers or stay sync
- CPU-bound computation - Async doesn't parallelize CPU work (GIL)
- Simple scripts with sequential operations - CLI tools, data processing pipelines
Practical Rules for Coding Agents
Rule 1: Default to Sync
Write synchronous code unless you have a specific, measurable need for async.
Rule 2: The 1ms Threshold
Only consider async when individual I/O operations take >1 millisecond.
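The threshold follows from simple arithmetic on the fixed ~27 us entry cost (illustrative figures derived from the tables above, not benchmark output):

```python
# Fixed event-loop entry cost per run_until_complete, in microseconds.
overhead_us = 27.0

# Share of total time the overhead consumes for various I/O durations.
for io_us in (10, 100, 1_000, 10_000):
    pct = overhead_us / io_us * 100
    print(f"I/O of {io_us:>6} us -> overhead is {pct:6.1f}% of the total")
```

At 1 ms of I/O the entry cost is only 2.7% of the total; below ~100 us of I/O it starts to dominate.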
Rule 3: Batch Over Broadcast
If you need async, gather operations together:
```python
# Good: 27 us overhead ONCE
results = await asyncio.gather(*[fetch(url) for url in urls])

# Bad: 27 us overhead PER call
for url in urls:
    result = await fetch(url)
```
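A runnable version of the pattern, with a hypothetical `fetch()` in which `asyncio.sleep` stands in for a 10 ms network round trip:

```python
import asyncio
import time

# Hypothetical fetch(): the sleep simulates a 10 ms network call.
async def fetch(url: str) -> str:
    await asyncio.sleep(0.01)
    return f"body of {url}"

async def main() -> list:
    urls = [f"https://example.com/{i}" for i in range(10)]
    # One loop entry; the ten 10 ms waits overlap instead of summing.
    return await asyncio.gather(*[fetch(u) for u in urls])

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(len(results), f"{elapsed * 1000:.0f} ms")  # ~10 ms, not ~100 ms
```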
Rule 4: Stay in the Loop
Avoid calling `run_until_complete()` or `asyncio.run()` from inside an already-running event loop; CPython raises RuntimeError. Enter the loop once at the top level and use `await` everywhere else.
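A minimal sketch of the single-entry pattern: all nested work is reached with `await`, and the event loop is entered exactly once.

```python
import asyncio

async def inner():
    return 42

async def outer():
    # Inside a running loop, just await; calling asyncio.run() or
    # loop.run_until_complete() here would raise RuntimeError.
    return await inner()

# Enter the event loop exactly once, at the top level.
print(asyncio.run(outer()))  # 42
```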
Rule 5: Match Your I/O Library
Use async libraries for async code, sync libraries for sync code.
Summary Table
| Scenario | Recommendation | Reasoning |
|---|---|---|
| Simple function returning data | Sync | Async adds 1,400x overhead |
| In-memory operations | Sync | No I/O to wait on |
| Single database query | Sync | Query time < async amortization |
| Multiple independent API calls | Async + gather | Parallelism benefit outweighs overhead |
| Web server (many connections) | Async framework | Concurrent handling essential |
| CLI tool | Sync | Sequential operations, no benefit |
Benchmark source: python-numbers-everyone-should-know (2026-01-01, Python 3.14.2, Apple Silicon)