# Async Overhead in Python: When the Cure is Worse Than the Disease

**Domain Paper: Python Performance ADRs**

**Date:** 2026-01-03

**Source:** Python Numbers Everyone Should Know benchmarks (Python 3.14.2, Apple Silicon)

---

## Executive Summary

Async Python introduces a **1,400x overhead** for simple operations compared to synchronous equivalents. This overhead is fixed regardless of what work the function does. The critical insight: async only makes sense when you're waiting on I/O that takes orders of magnitude longer than this overhead.

**The Core Numbers:**

- Sync function call: **20.3 ns**
- Async equivalent via `run_until_complete`: **28.2 us** (28,200 ns)
- **Ratio: 1,387x slower** (approximately 1,400x)

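The ratio can be reproduced with a rough `timeit` sketch. This is a simplified stand-in for the original benchmark harness: absolute numbers vary by machine, and `asyncio.run` also creates and tears down a fresh event loop per call, so it measures an upper bound on the `run_until_complete` cost.

```python
import asyncio
import timeit

def sync_function():
    return 42

async def return_value_coro():
    return 42

# Per-call cost of a plain function call vs. driving a coroutine to
# completion through an event loop.
sync_ns = timeit.timeit(sync_function, number=100_000) / 100_000 * 1e9
async_ns = timeit.timeit(lambda: asyncio.run(return_value_coro()),
                         number=1_000) / 1_000 * 1e9

print(f"sync: {sync_ns:.0f} ns/call, async: {async_ns:.0f} ns/call, "
      f"ratio: {async_ns / sync_ns:.0f}x")
```

The exact ratio will differ from the published 1,387x, but the gap of several orders of magnitude is stable across hardware.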
---

## What Was Benchmarked

### Methodology

The benchmarks measured pure async machinery overhead using CPython 3.14.2 on Apple Silicon. Each operation was run thousands of times with warmup periods, reporting median values.

### Test Functions

```python
# The async function being tested
async def return_value_coro():
    return 42

# The sync equivalent
def sync_function():
    return 42
```

---

## Key Findings

### Coroutine Creation (Cheap)

| Operation | Time |
|-----------|------|
| Create coroutine object | 47.0 ns |

**Key insight:** Creating a coroutine object is cheap (47 ns). The cost comes when you actually run it.
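A minimal demonstration of that split (the coroutine name mirrors the benchmark's test function):

```python
import asyncio

async def return_value_coro():
    return 42

# Calling an async function does NOT run it -- it only builds a cheap
# coroutine object.
coro = return_value_coro()
print(type(coro).__name__)  # coroutine

# The expensive part is driving it through an event loop.
print(asyncio.run(coro))  # 42
```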

### Running Coroutines (Expensive)

| Operation | Time |
|-----------|------|
| `run_until_complete(empty)` | 27.6 us |
| `run_until_complete(return value)` | 26.6 us |
| Run nested await | 28.9 us |

**Key insight:** Every `run_until_complete` costs ~27 us regardless of coroutine complexity.

### The Critical Comparison

| Operation | Time | Ratio |
|-----------|------|-------|
| Sync function call | 20.3 ns | 1x |
| Async equivalent | 28.2 us | **1,387x** |

---

## When Async IS Appropriate

### Good Use Cases

1. **Web servers handling concurrent connections** - FastAPI/Starlette: 115-125k req/sec
2. **Concurrent network I/O** - Fetching data from multiple APIs simultaneously
3. **High-latency operations with parallelism** - `asyncio.gather()` for multiple slow API calls

### Bad Use Cases

1. **Wrapping synchronous database drivers** - Use native async drivers or stay sync
2. **CPU-bound computation** - Async doesn't parallelize CPU work (the GIL still applies)
3. **Simple scripts with sequential operations** - CLI tools, data processing pipelines

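The parallelism benefit behind the good cases is easy to see with `asyncio.sleep` standing in for network latency. This is a sketch: `slow_io` is a hypothetical placeholder for a real API call.

```python
import asyncio
import time

async def slow_io(delay: float) -> float:
    # Stand-in for a network call: sleeping yields control to the loop,
    # so other coroutines run while this one waits.
    await asyncio.sleep(delay)
    return delay

async def main() -> float:
    start = time.perf_counter()
    await asyncio.gather(slow_io(0.05), slow_io(0.05), slow_io(0.05))
    return time.perf_counter() - start

elapsed = asyncio.run(main())
print(f"three 50 ms waits took {elapsed * 1000:.0f} ms total")
```

The three waits overlap, so the total is roughly 50 ms rather than 150 ms; the ~27 us loop-entry cost is paid once and is invisible next to the I/O time.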
---

## Practical Rules for Coding Agents

### Rule 1: Default to Sync

Write synchronous code unless you have a specific, measurable need for async.

### Rule 2: The 1ms Threshold

Only consider async when individual I/O operations take **>1 millisecond**; below that, the ~27 us event-loop overhead becomes a significant fraction of each call.

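The threshold follows from simple amortization arithmetic, using the ~27 us `run_until_complete` cost measured above:

```python
# Fraction of total time spent on async machinery for a single driven
# operation, using the ~27 us loop-entry cost from the benchmark.
LOOP_OVERHEAD_US = 27.0

def overhead_fraction(io_us: float) -> float:
    return LOOP_OVERHEAD_US / (LOOP_OVERHEAD_US + io_us)

print(f"{overhead_fraction(10):.0%}")      # 10 us in-memory op -> 73%
print(f"{overhead_fraction(1_000):.1%}")   # 1 ms I/O call      -> 2.6%
print(f"{overhead_fraction(50_000):.2%}")  # 50 ms API call     -> 0.05%
```

At 1 ms of real I/O the machinery is already down to a few percent; at typical API latencies it rounds to zero.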
### Rule 3: Batch Over Broadcast

If you need async, gather operations together:

```python
# Good: pay the event-loop entry cost once; fetches run concurrently
results = await asyncio.gather(*[fetch(url) for url in urls])

# Bad: each await completes before the next fetch starts -- no concurrency
for url in urls:
    result = await fetch(url)
```

### Rule 4: Stay in the Loop

Avoid calling `run_until_complete` from code that is already executing inside a running event loop; the call raises `RuntimeError`. Await the coroutine directly instead.

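A sketch of the pattern to prefer (`fetch_answer` is a hypothetical coroutine):

```python
import asyncio

async def fetch_answer() -> int:
    return 42

async def main() -> int:
    # Inside a running loop, just await. Calling
    # loop.run_until_complete(fetch_answer()) here would raise
    # RuntimeError ("This event loop is already running").
    return await fetch_answer()

print(asyncio.run(main()))  # 42
```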
### Rule 5: Match Your I/O Library

Use async libraries for async code and sync libraries for sync code; a blocking sync call inside a coroutine stalls the entire event loop.

---

## Summary Table

| Scenario | Recommendation | Reasoning |
|----------|----------------|-----------|
| Simple function returning data | Sync | Async adds 1,400x overhead |
| In-memory operations | Sync | No I/O to wait on |
| Single database query | Sync | Query time < async amortization |
| Multiple independent API calls | Async + gather | Parallelism benefit outweighs overhead |
| Web server (many connections) | Async framework | Concurrent handling essential |
| CLI tool | Sync | Sequential operations, no benefit |


---

*Benchmark source: python-numbers-everyone-should-know (2026-01-01, Python 3.14.2, Apple Silicon)*