diff --git a/RULEBOOK.md b/RULEBOOK.md new file mode 100644 index 0000000..09feb47 --- /dev/null +++ b/RULEBOOK.md @@ -0,0 +1,32 @@ +# RULEBOOK: Python Performance + +**Load into coding subagents. ~200 tokens. See `/papers/*.md` for deep dives.** + +## The Numbers +- Async overhead: **1,400x** (sync 20ns, async 28us) +- Set vs list membership: **200x** at 1000 items +- orjson vs stdlib json: **8x** faster +- Exception raise: **6.5x** (try/except is free) +- SQLite commit: **98%** of write latency + +## Rules + +**Collections**: Default to `set` for membership. Use `dict.get()` not check-then-access. + +**Async**: Default sync. Only async for I/O >1ms. Use `gather()` not loops. + +**JSON**: Default `orjson`. stdlib only for zero-deps requirement. + +**Exceptions**: try/except is free. Use `.get()` for dicts. EAFP for <15% failure rate. + +**Strings**: f-strings default. `%` for logging (deferred eval). `join()` for many parts. + +**Memory**: `@dataclass(slots=True)` for 100+ instances. + +**Imports**: `TYPE_CHECKING` for heavy types. Lazy imports in CLI tools. + +**Database**: DiskCache for key-value. Batch SQLite writes. Reads cheap, writes expensive. + +--- + +*Source: python-numbers-everyone-should-know (Python 3.14.2, Apple Silicon)* diff --git a/papers/async-overhead.md b/papers/async-overhead.md new file mode 100644 index 0000000..4aca9cf --- /dev/null +++ b/papers/async-overhead.md @@ -0,0 +1,126 @@ +# Async Overhead in Python: When the Cure is Worse Than the Disease + +**Domain Paper: Python Performance ADRs** +**Date:** 2026-01-03 +**Source:** Python Numbers Everyone Should Know benchmarks (Python 3.14.2, Apple Silicon) + +--- + +## Executive Summary + +Async Python introduces a **1,400x overhead** for simple operations compared to synchronous equivalents. This overhead is fixed regardless of what work the function does. The critical insight: async only makes sense when you're waiting on I/O that takes orders of magnitude longer than this overhead. 
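The gap is straightforward to reproduce. A minimal sketch, not the original harness: it uses `asyncio.run`, which builds and tears down a fresh event loop on every call, so the per-call cost lands even higher than the `run_until_complete` figures below.

```python
import asyncio
import time

def sync_value():
    return 42

async def async_value():
    return 42

def per_call_ns(fn, n):
    # A mean over n calls is enough to show the orders-of-magnitude gap.
    start = time.perf_counter()
    for _ in range(n):
        fn()
    return (time.perf_counter() - start) / n * 1e9

sync_ns = per_call_ns(sync_value, 100_000)
# Every call pays full event-loop setup and teardown.
async_ns = per_call_ns(lambda: asyncio.run(async_value()), 500)
print(f"sync {sync_ns:.0f} ns/call, async {async_ns:.0f} ns/call, "
      f"ratio {async_ns / sync_ns:.0f}x")
```

Absolute numbers shift with hardware; the ratio stays in the hundreds to thousands.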
+ +**The Core Numbers:** +- Sync function call: **20.3 ns** +- Async equivalent via `run_until_complete`: **28.2 us** (28,200 ns) +- **Ratio: 1,387x slower** (approximately 1,400x) + +--- + +## What Was Benchmarked + +### Methodology + +The benchmarks measured pure async machinery overhead using CPython 3.14.2 on Apple Silicon. Each operation was run thousands of times with warmup periods, reporting median values. + +### Test Functions + +```python +# The async function being tested +async def return_value_coro(): + return 42 + +# The sync equivalent +def sync_function(): + return 42 +``` + +--- + +## Key Findings + +### Coroutine Creation (Cheap) + +| Operation | Time | +|-----------|------| +| Create coroutine object | 47.0 ns | + +**Key insight:** Creating a coroutine object is cheap (47 ns). The cost comes when you actually run it. + +### Running Coroutines (Expensive) + +| Operation | Time | +|-----------|------| +| `run_until_complete(empty)` | 27.6 us | +| `run_until_complete(return value)` | 26.6 us | +| Run nested await | 28.9 us | + +**Key insight:** Every `run_until_complete` costs ~27 us regardless of coroutine complexity. + +### The Critical Comparison + +| Operation | Time | Ratio | +|-----------|------|-------| +| Sync function call | 20.3 ns | 1x | +| Async equivalent | 28.2 us | **1,387x** | + +--- + +## When Async IS Appropriate + +### Good Use Cases + +1. **Web servers handling concurrent connections** - FastAPI/Starlette: 115-125k req/sec +2. **Concurrent network I/O** - Fetching data from multiple APIs simultaneously +3. **High-latency operations with parallelism** - `asyncio.gather()` for multiple slow API calls + +### Bad Use Cases + +1. **Wrapping synchronous database drivers** - Use native async drivers or stay sync +2. **CPU-bound computation** - Async doesn't parallelize CPU work (GIL) +3. 
**Simple scripts with sequential operations** - CLI tools, data processing pipelines + +--- + +## Practical Rules for Coding Agents + +### Rule 1: Default to Sync +Write synchronous code unless you have a specific, measurable need for async. + +### Rule 2: The 1ms Threshold +Only consider async when individual I/O operations take **>1 millisecond**. + +### Rule 3: Batch Over Broadcast +If you need async, gather operations together: + +```python +# Good: fetches run concurrently; total latency ~ the slowest call +results = await asyncio.gather(*[fetch(url) for url in urls]) + +# Bad: fetches run sequentially; total latency is the sum of all calls +for url in urls: + result = await fetch(url) +``` + +### Rule 4: Stay in the Loop +Never call `run_until_complete` from inside an already-running loop; it raises `RuntimeError`. Use `await` instead. + +### Rule 5: Match Your I/O Library +Use async libraries for async code, sync libraries for sync code. + +--- + +## Summary Table + +| Scenario | Recommendation | Reasoning | +|----------|----------------|-----------| +| Simple function returning data | Sync | Async adds 1,400x overhead | +| In-memory operations | Sync | No I/O to wait on | +| Single database query | Sync | Query time < async amortization | +| Multiple independent API calls | Async + gather | Parallelism benefit outweighs overhead | +| Web server (many connections) | Async framework | Concurrent handling essential | +| CLI tool | Sync | Sequential operations, no benefit | + +--- + +*Benchmark source: python-numbers-everyone-should-know (2026-01-01, Python 3.14.2, Apple Silicon)* diff --git a/papers/collection-membership.md b/papers/collection-membership.md new file mode 100644 index 0000000..c3e39ac --- /dev/null +++ b/papers/collection-membership.md @@ -0,0 +1,109 @@ +# Collection Membership: The 200x Performance Cliff + +**Domain Paper: Python Collection Selection for Membership Testing** +**Date:** 2026-01-03 +**Source:** Python Numbers Everyone Should Know benchmarks + +--- + +## Executive Summary + +Membership testing (`x in collection`) is one of the most common operations in Python
code. The choice of collection type can result in a **200x performance difference** at just 1,000 items. + +**Key Finding**: At 1,000 items, checking if an item exists in a list takes 3.9 microseconds. The same check in a set takes 19 nanoseconds. That is a 206x difference. + +--- + +## The Core Numbers + +### Membership Testing Performance + +| Operation | Time | Throughput | +|-----------|------|------------| +| `item in set` (existing) | 19.0 ns | 52.7M ops/sec | +| `key in dict` (existing) | 20.8 ns | 48.1M ops/sec | +| `item in list` (first) | 13.9 ns | 72.0M ops/sec | +| `item in list` (middle, 500th) | 1,956 ns | 511k ops/sec | +| `item in list` (last, 999th) | 3,852 ns | 260k ops/sec | +| `item in list` (missing) | 3,915 ns | 255k ops/sec | + +### The 200x Cliff Explained + +``` +Set membership (any position): ~19 ns O(1) +List membership (worst case): ~3,915 ns O(n) + ________ + 206x slower +``` + +--- + +## Crossover Analysis + +**Crossover point**: A list with ~15-20 items will match set performance for a full scan. Below that, lists may actually be faster due to lower overhead. 
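Both the cliff and the crossover can be checked directly with `timeit`. A sketch with illustrative sizes; absolute numbers vary by machine, and the lambda indirection adds a small fixed cost to both sides:

```python
import timeit

def membership_ns(n):
    # Worst case for the list: probe for the last element.
    items = list(range(n))
    as_set, probe = set(items), n - 1
    reps = 20_000
    list_ns = timeit.timeit(lambda: probe in items, number=reps) / reps * 1e9
    set_ns = timeit.timeit(lambda: probe in as_set, number=reps) / reps * 1e9
    return list_ns, set_ns

for n in (10, 100, 1_000):
    list_ns, set_ns = membership_ns(n)
    print(f"n={n:5d}  list {list_ns:7.0f} ns  set {set_ns:5.0f} ns  "
          f"{list_ns / set_ns:6.1f}x")
```

At n=10 the two are close; by n=1,000 the gap is two orders of magnitude.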
+ +### When to Use Each Type + +**Use Set when:** +- Collection has more than ~20 items +- Checking membership more than once +- Order does not matter + +**Use Dict when:** +- You need to associate values with keys +- Checking membership AND need to retrieve associated data + +**Use List when:** +- Collection is very small (< 20 items) +- You iterate but rarely check membership +- Items might not be hashable + +--- + +## Practical Rules for Coding Agents + +### Rule 1: Default to Set for Membership + +```python +# PREFER +allowed_values = {'a', 'b', 'c'} +if value in allowed_values: + +# AVOID +allowed_values = ['a', 'b', 'c'] +if value in allowed_values: +``` + +### Rule 2: Convert Lists Before Repeated Lookups + +```python +def process_items(items: list, valid_ids: list): + valid_set = set(valid_ids) # Convert once + return [item for item in items if item.id in valid_set] +``` + +### Rule 3: Prefer `dict.get()` Over Check-then-Access + +```python +# AVOID (double lookup) +if key in config: + value = config[key] + +# PREFER (single lookup) +value = config.get(key, default) +``` + +--- + +## Summary Table + +| Scenario | Best Choice | Why | +|----------|-------------|-----| +| Membership test on 1000+ items | Set | 200x faster than list | +| Key-value lookup | Dict | O(1) access with associated data | +| Ordered collection, rare membership | List | Lower memory, maintains order | +| Very small collection (< 20 items) | List or Set | Negligible difference | + +--- + +*Benchmark source: python-numbers-everyone-should-know* diff --git a/papers/database-patterns.md b/papers/database-patterns.md new file mode 100644 index 0000000..2a18849 --- /dev/null +++ b/papers/database-patterns.md @@ -0,0 +1,121 @@ +# Database Patterns: SQLite, DiskCache, and MongoDB + +**Domain:** Persistence and data access patterns in Python +**Source:** python-numbers-everyone-should-know benchmarks +**Date:** 2026-01-03 + +--- + +## Executive Summary + +**Reads are cheap, writes are 
expensive.** SQLite commits dominate write latency (192 microseconds with commit vs 3 microseconds without). For read-heavy workloads, SQLite achieves 280K ops/sec by primary key. For write-heavy workloads, consider diskcache (8x faster writes) or batch operations. + +--- + +## Key Findings + +### The Numbers + +| Operation | SQLite | DiskCache | MongoDB | +|-----------|--------|-----------|---------| +| **Write one object** | 192 us (5.2k/s) | 24 us (42k/s) | 119 us (8.4k/s) | +| **Read by key/id** | 3.6 us (280k/s) | 4.3 us (236k/s) | 121 us (8.2k/s) | + +### Finding 1: The Commit Tax + +SQLite writes with commit: **192 microseconds** +SQLite writes without commit: **3 microseconds** + +The commit operation accounts for **98.4% of write latency**. + +### Finding 2: DiskCache Wins for Simple Key-Value + +| Operation | SQLite Raw | DiskCache | +|-----------|------------|-----------| +| Write | 192 us | 24 us | +| Read | 3.6 us | 4.3 us | + +DiskCache achieves **8x faster writes** with comparable read performance. 
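The write-path gap is dominated by the commit, which stdlib `sqlite3` can demonstrate on its own. A sketch; absolute numbers depend on how fast the filesystem can fsync:

```python
import os
import sqlite3
import tempfile
import time

db_path = os.path.join(tempfile.mkdtemp(), "bench.db")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE kv (k INTEGER PRIMARY KEY, v TEXT)")

def per_write_us(keys, commit_each):
    start = time.perf_counter()
    for k in keys:
        conn.execute("INSERT INTO kv VALUES (?, ?)", (k, "payload"))
        if commit_each:
            conn.commit()      # fsync on every row
    if not commit_each:
        conn.commit()          # one fsync for the whole batch
    return (time.perf_counter() - start) / len(keys) * 1e6

with_commit = per_write_us(range(100), commit_each=True)
deferred = per_write_us(range(10_000, 10_100), commit_each=False)
print(f"commit per row: {with_commit:.0f} us/row, one commit: {deferred:.1f} us/row")
conn.close()
```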
+ +### Finding 3: Batching Provides 9x Throughput + +SQLite `executemany()` 10 rows: **215 microseconds total** (21.5 us/row) +10 individual inserts: **1,920 microseconds** (192 us/row) + +--- + +## When to Use Each Storage Option + +### SQLite +- Read-heavy workloads (100:1 read/write ratio) +- Need to query inside JSON with `json_extract()` +- ACID guarantees matter + +### DiskCache +- Key-value storage with automatic serialization +- Cache patterns (TTL, LRU eviction) +- Agent state persistence + +### MongoDB +- Distributed, multi-node deployments +- Complex aggregation pipelines +- Full-text search requirements + +--- + +## Practical Rules for Coding Agents + +### Rule 1: Default to DiskCache for Agent State +```python +from diskcache import Cache +cache = Cache('/tmp/agent-cache') +cache.set('conversation:123', messages) # 24 microseconds +``` + +### Rule 2: Batch SQLite Writes +```python +# GOOD: Batch with executemany +conn.executemany('INSERT INTO items (data) VALUES (?)', items_list) +conn.commit() # One commit for all items +``` + +### Rule 3: Use Transactions for Multi-Step Operations +```python +conn.execute('BEGIN') +conn.execute('INSERT INTO users ...') +conn.execute('INSERT INTO audit_log ...') +conn.commit() # One fsync +``` + +### Rule 4: Lazy Import for CLI Tools +```python +def save_to_db(data): + import sqlite3 # 1.63ms only when needed + conn = sqlite3.connect('app.db') +``` + +--- + +## Summary + +| Metric | SQLite | DiskCache | MongoDB | +|--------|--------|-----------|---------| +| Write latency | 192 us | 24 us | 119 us | +| Read latency | 3.6 us | 4.3 us | 121 us | +| Writes/sec | 5.2k | 42k | 8.4k | +| Reads/sec | 280k | 236k | 8.2k | + +--- + +## The Bottom Line + +1. **Reads are cheap everywhere** - Optimize for write patterns +2. **SQLite commits dominate latency** - Batch or use transactions +3. **DiskCache for key-value** - 8x faster writes, automatic serialization +4. 
**MongoDB for distribution** - Not for local performance + +*The tortoise way: Measure, understand the cost, choose deliberately.* + +--- + +*Benchmark source: python-numbers-everyone-should-know* diff --git a/papers/exception-flow.md b/papers/exception-flow.md new file mode 100644 index 0000000..2dd0678 --- /dev/null +++ b/papers/exception-flow.md @@ -0,0 +1,111 @@ +# Exception Flow: Performance Patterns + +**Domain:** Exception handling overhead +**Source:** python-numbers-everyone-should-know benchmarks (Python 3.14.2, Apple Silicon) + +--- + +## TL;DR + +- **try/except with no exception**: Nearly free (1.1 ns overhead) +- **Raising an exception**: 6.5x slower than the happy path (139 ns vs 21.5 ns) +- **EAFP is fine when exceptions are rare** (<5% of calls) +- **Use LBYL for expected failures** (dict key lookup, file existence) +- **Never use exceptions for normal control flow** + +--- + +## The Numbers + +### Happy Path (No Exception Raised) + +| Operation | Time | Overhead vs Baseline | +|-----------|------|---------------------| +| Function call (no try/except) | 20.4 ns | baseline | +| try/except (no exception raised) | 21.5 ns | +1.1 ns (+5%) | +| try/except ValueError (specific) | 22.9 ns | +2.5 ns (+12%) | +| try/except/finally | 22.1 ns | +1.7 ns (+8%) | + +**Key insight:** The try block itself is essentially free. + +### Sad Path (Exception Raised) + +| Operation | Time | Slowdown vs Happy Path | +|-----------|------|----------------------| +| raise + catch ValueError | 139 ns | **6.5x slower** | +| raise + catch (base Exception) | 140 ns | 6.5x slower | +| raise + catch custom exception | 146 ns | 6.8x slower | +| raise + catch with `as e` | 148 ns | 6.9x slower | + +**Key insight:** The 6.5x overhead comes from: +1. Creating the exception object (~40 ns) +2. Capturing the traceback (~70 ns) +3. 
Stack unwinding and handler lookup (~30 ns) + +--- + +## EAFP vs LBYL: When to Use Which + +### EAFP (Easier to Ask Forgiveness than Permission) + +```python +try: + value = data[key] +except KeyError: + value = default +``` + +**Use when:** Exceptions are rare (<5% of calls) + +### LBYL (Look Before You Leap) + +```python +if key in data: + value = data[key] +else: + value = default +``` + +**Use when:** The failure case is common (>15% of calls) + +### Crossover Point + +**Rule of thumb:** If exceptions occur more than 15% of the time, use LBYL. + +### dict.get() Beats Both + +```python +# Best: Use .get() - 26.3 ns, no exception possible +config = settings.get('database', {}) +``` + +--- + +## Practical Rules for Coding Agents + +1. **try/except blocks are free** - don't avoid them for performance +2. **Raising exceptions costs 6.5x** - only raise for truly exceptional cases +3. **Use .get() for dicts** - beats both EAFP and LBYL +4. **Return Optional for expected missing** - not exceptions +5. **EAFP for file ops** - TOCTOU protection matters more than perf +6. **LBYL when failures are common** (>15% of calls) +7. **Never use exceptions for control flow** + +--- + +## Summary + +| Scenario | Recommendation | +|----------|----------------| +| Exception rate <5% | EAFP (try/except) | +| Exception rate >15% | LBYL (check first) | +| Dict key lookup | Use `.get()` | +| Optional return value | Return `None`, not exception | +| File operations | EAFP (TOCTOU protection) | +| Control flow | Never use exceptions | + +**The core insight:** try/except is free; raising is not. Design APIs to minimize raises, not to avoid try blocks. 
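That asymmetry takes only a few lines to verify. A sketch comparing a try block that never raises, one that raises on every call, and `.get()`; the absolute timings vary by machine, the ordering does not:

```python
import time

data = {i: i for i in range(100)}

def per_call_ns(fn, n=200_000):
    start = time.perf_counter()
    for i in range(n):
        fn(i)
    return (time.perf_counter() - start) / n * 1e9

def eafp_hit(i):
    try:
        return data[i % 100]    # key always present: the try block is ~free
    except KeyError:
        return None

def eafp_miss(i):
    try:
        return data[i + 1_000]  # key never present: raise + catch every call
    except KeyError:
        return None

hit = per_call_ns(eafp_hit)
miss = per_call_ns(eafp_miss)
get = per_call_ns(lambda i: data.get(i + 1_000))  # no exception machinery
print(f"hit {hit:.0f} ns, raise {miss:.0f} ns, .get() miss {get:.0f} ns")
```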
+ +--- + +*Benchmark source: python-numbers-everyone-should-know* diff --git a/papers/import-optimization.md b/papers/import-optimization.md new file mode 100644 index 0000000..e769478 --- /dev/null +++ b/papers/import-optimization.md @@ -0,0 +1,104 @@ +# Import Optimization + +**Domain Paper: Python Performance ADRs** +**Date:** 2026-01-03 +**Source:** python-numbers-everyone-should-know benchmarks + +--- + +## Executive Summary + +Import costs range from **sub-microsecond (cached) to 100+ milliseconds** (large frameworks). For CLI tools and short-lived scripts, import time can dominate total execution. + +--- + +## Benchmark Data (First Import, Fresh Process) + +### Built-in Modules + +| Module | First Import | +|--------|-------------| +| `sys` | 0.2 us | +| `os` | 0.2 us | +| `math` | 24 us | + +### Standard Library + +| Module | First Import | +|--------|-------------| +| `datetime` | 72 us | +| `typing` | 2.0 ms | +| `json` | 2.9 ms | +| `dataclasses` | 6.0 ms | +| `logging` | 10.5 ms | +| `asyncio` | 17.7 ms | + +### External Packages + +| Package | First Import | +|---------|-------------| +| `pydantic` | 15.8 ms | +| `flask` | 47.3 ms | +| `fastapi` | 104.4 ms | + +**Key insight:** FastAPI takes 100ms just to import. For a CLI tool that runs in 50ms, this is unacceptable overhead. + +--- + +## Lazy Import Patterns + +### Pattern 1: Function-Level Import +```python +def process_data(data): + import pandas as pd # Only when needed + return pd.DataFrame(data) +``` + +### Pattern 2: TYPE_CHECKING Guard +```python +from typing import TYPE_CHECKING + +if TYPE_CHECKING: + import pandas as pd + +def process(data: "pd.DataFrame"): + import pandas as pd + return pd.DataFrame(data) +``` + +--- + +## Practical Rules for Coding Agents + +### MUST + +1. **Use `TYPE_CHECKING` for type-only imports** when the type is from a heavy package +2. **Use function-level imports for rarely-used code paths** +3. 
**Never import heavy packages at module level in CLI tools** + +### SHOULD + +4. **Use `from __future__ import annotations`** for cleaner TYPE_CHECKING +5. **Profile import time for new dependencies:** + ```bash + python -c "import time; s=time.perf_counter(); import PACKAGE; print(f'{(time.perf_counter()-s)*1000:.1f}ms')" + ``` + +--- + +## Summary Table + +| Scenario | Pattern | Example | +|----------|---------|---------| +| Type hints for heavy types | `TYPE_CHECKING` | pandas, numpy types | +| Rarely-used function | Function-level import | Error handling paths | +| CLI fast path | Defer until needed | `--version`, `--help` | +| Serverless cold start | Minimize top-level | Lambda/Cloud Functions | + +--- + +*Import costs are hidden taxes. Pay them lazily.* + +--- + +*Benchmark source: python-numbers-everyone-should-know* diff --git a/papers/json-serialization.md b/papers/json-serialization.md new file mode 100644 index 0000000..8b81a7b --- /dev/null +++ b/papers/json-serialization.md @@ -0,0 +1,93 @@ +# JSON Serialization Performance in Python + +**Domain Paper: Python Performance ADRs** +**Date:** 2026-01-03 +**Source:** Python Numbers Everyone Should Know benchmarks + +--- + +## Executive Summary + +Alternative JSON libraries like `orjson` and `msgspec` deliver **8-12x faster serialization** and **2-7x faster deserialization** compared to stdlib `json`. The performance gap is consistent across payload sizes. 
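One way to act on this without hard-wiring the dependency is a thin shim that prefers `orjson` and falls back to stdlib. A sketch: the `dumps`/`loads` wrapper names are our own, and the fallback standardizes on orjson's bytes-out contract:

```python
import json

try:
    import orjson  # optional third-party dependency

    def dumps(obj) -> bytes:
        return orjson.dumps(obj)           # orjson returns bytes, not str

    def loads(data):
        return orjson.loads(data)

except ImportError:                        # zero-deps environments use stdlib
    def dumps(obj) -> bytes:
        return json.dumps(obj).encode()    # match the bytes contract

    def loads(data):
        return json.loads(data)

payload = {"id": 7, "tags": ["a", "b"], "active": True}
assert loads(dumps(payload)) == payload
```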
+ +--- + +## Key Findings + +### Serialization Performance (dumps) + +| Library | Simple Object | Complex Object | Speedup vs stdlib | +|---------|--------------|----------------|-------------------| +| `json.dumps()` | 708 ns | 2.65 us | 1x (baseline) | +| `orjson.dumps()` | 61 ns | 310 ns | **11.6x / 8.5x** | +| `msgspec.encode()` | 92 ns | 445 ns | 7.7x / 6.0x | +| `ujson.dumps()` | 264 ns | 1.64 us | 2.7x / 1.6x | + +### Deserialization Performance (loads) + +| Library | Simple Object | Complex Object | Speedup vs stdlib | +|---------|--------------|----------------|-------------------| +| `json.loads()` | 714 ns | 2.22 us | 1x (baseline) | +| `orjson.loads()` | 106 ns | 839 ns | **6.7x / 2.6x** | +| `msgspec.decode()` | 101 ns | 850 ns | 7.1x / 2.6x | + +--- + +## When to Use Each Library + +### Use stdlib `json` when: +- Zero dependencies required +- Need custom JSONEncoder subclass +- Compatibility is paramount + +### Use `orjson` when: +- Maximum performance needed +- You can accept bytes output +- You need datetime/UUID support + +### Use `msgspec` when: +- You need typed decoding +- You want MessagePack too +- Memory efficiency matters + +--- + +## Practical Rules for Coding Agents + +### Rule 1: Default to orjson for new projects + +```python +# Instead of: +import json +data = json.dumps(obj) + +# Prefer: +import orjson +data = orjson.dumps(obj) # Returns bytes +``` + +### Rule 2: Use stdlib json only when explicitly needed + +Acceptable reasons: +- Must avoid external dependencies +- Need custom JSONEncoder subclass +- Working in constrained environment + +### Rule 3: Profile before optimizing JSON + +At 2-3 microseconds per operation, JSON serialization is rarely the bottleneck unless you're doing thousands of operations per second. 
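A stdlib-only check takes a minute to run before reaching for a new dependency. A sketch with an illustrative payload:

```python
import json
import timeit

obj = {"user": "ada", "scores": list(range(50)), "meta": {"active": True}}
encoded = json.dumps(obj)

reps = 20_000
dumps_us = timeit.timeit(lambda: json.dumps(obj), number=reps) / reps * 1e6
loads_us = timeit.timeit(lambda: json.loads(encoded), number=reps) / reps * 1e6

# If these are microseconds and the request budget is milliseconds,
# switching JSON libraries will not move the needle.
print(f"dumps {dumps_us:.1f} us/op, loads {loads_us:.1f} us/op")
```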
+ +--- + +## Summary Table + +| Scenario | Recommendation | Expected Speedup | +|----------|----------------|------------------| +| General use | orjson | 8x serialization, 2.5x deserialization | +| Typed data | msgspec | 6x + type safety | +| Drop-in replacement | ujson | 1.5-2x | +| Zero dependencies | json (stdlib) | Baseline | + +--- + +*Benchmark source: python-numbers-everyone-should-know* diff --git a/papers/memory-slots.md b/papers/memory-slots.md new file mode 100644 index 0000000..c8c47a8 --- /dev/null +++ b/papers/memory-slots.md @@ -0,0 +1,102 @@ +# Memory Optimization with __slots__ in Python + +**Domain Paper: Python Performance ADRs** +**Date:** 2026-01-03 +**Source:** python-numbers-everyone-should-know benchmarks + +--- + +## Executive Summary + +Python's `__slots__` mechanism provides **52-70% memory reduction** when creating many instances of the same class. + +**Key Finding**: For a class with 5 attributes, `__slots__` reduces instance memory from 694 bytes to 212 bytes (69% reduction). + +--- + +## Benchmark Results: Memory Footprint + +### Single Instance Memory (5 Attributes) + +| Type | Memory (bytes) | vs Regular Class | +|------|----------------|------------------| +| Regular class | 694 | baseline | +| `__slots__` class | 212 | -69% | +| dataclass | 694 | same as regular | +| `@dataclass(slots=True)` | 212 | -69% | +| namedtuple | 228 | -67% | + +### At Scale (1,000 Instances) + +| Type | Total Memory | +|------|--------------| +| List of 1,000 regular class | 165.2 KB | +| List of 1,000 `__slots__` class | 79.1 KB | + +**Memory Savings**: 52% reduction at scale + +--- + +## Attribute Access Speed (Virtually Identical) + +| Operation | Regular | `__slots__` | +|-----------|---------|-------------| +| Read attr | 14.1 ns | 14.1 ns | +| Write attr | 15.7 ns | 16.4 ns | + +--- + +## Trade-offs + +### What __slots__ Prevents + +1. **No dynamic attribute assignment** +2. **No `__dict__` access** (`vars()` doesn't work) +3. 
**Inheritance complications** +4. **No weak references by default** + +--- + +## Practical Rules for Coding Agents + +### Rule 1: Instance Count Threshold +``` +IF creating > 100 instances of the same class +AND attributes are fixed at design time +THEN consider __slots__ or @dataclass(slots=True) +``` + +### Rule 2: Prefer Slots Dataclass (Python 3.10+) +```python +@dataclass(slots=True) +class User: + id: int + name: str + email: str +``` + +### Rule 3: Don't Optimize Prematurely +For < 100 instances, use regular classes for flexibility. + +### Rule 4: Document the Trade-off +```python +# Using __slots__ for memory efficiency (1000+ instances expected) +__slots__ = ['x', 'y', 'z'] +``` + +--- + +## Summary + +| Aspect | Regular Class | `__slots__` Class | +|--------|---------------|-------------------| +| Memory (5 attrs) | 694 bytes | 212 bytes | +| Read speed | 14.1 ns | 14.1 ns | +| Dynamic attributes | Yes | No | +| Best for | Flexibility | Many instances | + +**Bottom Line**: Use `__slots__` (or `@dataclass(slots=True)`) when creating many instances of fixed-attribute classes. For small numbers, stick with regular classes. + +--- + +*Benchmark source: python-numbers-everyone-should-know* diff --git a/papers/string-formatting.md b/papers/string-formatting.md new file mode 100644 index 0000000..03c1391 --- /dev/null +++ b/papers/string-formatting.md @@ -0,0 +1,111 @@ +# String Formatting: Domain Exploration + +**Date:** 2026-01-03 +**Source:** python-numbers-everyone-should-know benchmarks +**Python Version:** 3.14.2 (CPython, ARM64 macOS) + +--- + +## Executive Summary + +String formatting performance: **simple concatenation is fastest for trivial joins**, while **f-strings offer the best balance of readability and performance** for interpolation use cases. 
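The rankings below are easy to re-derive locally. A sketch with illustrative strings; absolute numbers vary by machine:

```python
import timeit

name, count = "ada", 42

cases = {
    "concat":   lambda: "user " + name,
    "f-string": lambda: f"user {name} has {count}",
    "percent":  lambda: "user %s has %d" % (name, count),
    ".format":  lambda: "user {} has {}".format(name, count),
}

reps = 100_000
for label, fn in cases.items():
    ns = timeit.timeit(fn, number=reps) / reps * 1e9
    print(f"{label:8s} {ns:6.1f} ns")
```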
+ +--- + +## Raw Benchmark Results + +| Operation | Time (ns) | Throughput | +|-----------|-----------|------------| +| `concat_small` | 39.1 ns | 25.6M ops/sec | +| `f_string` | 64.9 ns | 15.4M ops/sec | +| `percent_formatting` | 89.8 ns | 11.1M ops/sec | +| `format_method` | 103 ns | 9.7M ops/sec | + +### Relative Performance + +| Method | vs f-string | +|--------|-------------| +| `concat_small` | 1.66x faster | +| `f_string` | 1.00x (reference) | +| `percent_formatting` | 1.38x slower | +| `format_method` | 1.59x slower | + +--- + +## Why F-Strings Are Fast + +F-strings are parsed at **compile time**, not runtime: + +1. **No method lookup**: F-strings don't call `.format()` at runtime +2. **No tuple creation**: `%` formatting requires `(name,)` tuple +3. **Specialized bytecode**: `FORMAT_VALUE` and `BUILD_STRING` are optimized + +--- + +## When to Use Each Method + +### Concatenation Wins +For joining 2-3 short strings with no format specs: +```python +path = base_dir + '/' + filename # Simpler, faster +``` + +### % Formatting for Logging +```python +# Deferred evaluation - string built only if debug enabled +logger.debug('Processing %s items', count) + +# f-string - string ALWAYS built, then discarded +logger.debug(f'Processing {count} items') # Wasteful +``` + +### .format() for Dynamic Templates +```python +template = get_template_from_config() # Returns 'User: {name}' +result = template.format(name=user.name) +``` + +--- + +## Practical Rules for Coding Agents + +### Rule 1: Default to F-Strings +```python +# Preferred +message = f'User {user.name} logged in at {timestamp}' +``` + +### Rule 2: Use Concatenation for Trivial Joins +```python +url = base_url + endpoint # Fine - simpler and faster +``` + +### Rule 3: Use join() for Multiple Parts +```python +# Correct - O(n) time +result = ''.join([part1, part2, part3, part4]) + +# Inefficient at scale - repeated concatenation is O(n^2) +result = part1 + part2 + part3 + part4 +``` + +### Rule 4: Keep % for Logging +```python +logger.info('Processed %d records in %.2fs', count, elapsed) +``` + +--- + +## Summary + +| Scenario | Best Choice | Reason | +|----------|-------------|--------| +| Variable interpolation | f-string | 1.6x faster than `.format()` | +| Simple 2-part join | Concatenation | 1.7x faster than f-string | +| Building from many parts | `''.join()` | O(n) vs O(n^2) | +| Logging statements | `%` style | Deferred evaluation | +| Dynamic templates | `.format()` | Template flexibility | + +--- + +*Benchmark source: python-numbers-everyone-should-know*