split nimmerworld architecture: phase 1 of 3 — vertical-by-domain directories Create 9 domain directories under nimmerworld/, each owning an architecture.md that will eventually sit alongside server/client/schema/test code for that domain (vertical-by-domain rather than horizontal-by-language).
scale-and-transport/architecture.md

# Scale and Transport

> *Running the architecture at MMO size: compute-allocation budgets across model tiers, horizontal-scale primitives (UID-keyed routing, stateless Compositors, ephemeral Director routines, sharded GMs, pruning at every layer), pgnats-native transport with the JetStream republish + replay refinement, and the district-distribution fallback.*
>
> *Companion to: `architecture-broad.md` (executive summary + global meta-lists), `narrative-composition/architecture.md` (the Compositor is the load-bearing horizontal-scale actor), `inference-and-memory/architecture.md` (the local-first memory architecture sits on this transport substrate). Sections in this file were split from the monolithic architecture-broad.md v0.7 on 2026-04-26.*

## Compute allocation

- Active zones in a 100-NPC city: ~5–15 with LLM-dialog slots
- **Theia-tier (deep model)** — deep slots + tier-1 moments (few concurrent)
- **Driver-tier (small model with trait-LoRAs)** — majority of dialog slots
- **Saturn (small classifiers)** — voice-selection, trait-salience, audit-overseer classification, ternary-gate dynamics
- **Director / overseer logic** — deterministic + small classifiers; no LLM for orchestration
- **Claude-as-API (future)** — hivemind/imperium broadcast tier
- **Outer rails** — graph-pathfinding, cheap, LOD-trivial
- **Pipe / off-shift NPCs** — sparse simulation, event-driven scale-up
- **Interior navmesh** — only currently-occupied interiors active
- **Liminal / imperial-net rendering** — shader-preset swap, no geometry duplication

## Horizontal scale architecture

The architecture must scale to MMO size — many concurrent players across many districts, many concurrent events, many local LLMs firing at axis-rate. Vertically-scaled monolithic AI-NPC systems break under this load. **Nimmerworld is built horizontally-scalable from the ground up**, with the primitives that make horizontal scale work: UID-keyed routing, stateless workers, ephemeral actors per scope, sharded service mesh, pruning-on-completion at every layer.

### UID-keyed routing as the load-bearing primitive

Hierarchical UIDs (`gm_event_uid > district_uid > scene_sub_uid > slot_id`) carry enough information for *any* worker to pick up *any* unit of work without shared in-memory state. UID-as-routing-key is what lets every layer scale independently.

This is the same primitive Cassandra, DynamoDB, and similar stores are built around (partition keys); we re-derive it for narrative composition because the underlying constraint is the same: many concurrent actors operating on private state, with cross-actor coordination via typed events at known boundaries.

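The routing property can be sketched in a few lines. This is illustrative Python, not the actual implementation: the subject shape follows the hierarchy above, while `partition_for` is a hypothetical helper showing how an uncoordinated worker fleet agrees on ownership from the key alone.

```python
from hashlib import sha256

def subject_for(gm_event_uid: str, district_uid: str, scene_sub_uid: str) -> str:
    """Derive the NATS-style subject purely from the hierarchical UID.

    No shared in-memory state is consulted: the key alone carries the
    full routing scope. (Subject shape follows this document; names
    are illustrative.)
    """
    return f"nimmerverse.events.{gm_event_uid}.district.{district_uid}.scene.{scene_sub_uid}"

def partition_for(gm_event_uid: str, n_workers: int) -> int:
    """Stable partition assignment: every worker in a fleet of size
    n_workers computes the same owner without coordination (the
    Cassandra/DynamoDB partition-key move)."""
    digest = sha256(gm_event_uid.encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_workers

subj = subject_for("7af2", "9c1", "3e8")
assert subj == "nimmerverse.events.7af2.district.9c1.scene.3e8"
# Deterministic: repeated calls agree, so ownership needs no negotiation.
assert partition_for("7af2", 8) == partition_for("7af2", 8)
```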
### Compositors on demand

Compositor instances are **stateless workers**. Any compositor can pick up any `event_uid` from the transient-waiting-flag — state lives in phoebe + per-player SQLites; workers are pure functions of `(event_uid → composition)`. Spin up more on queue-depth; spin down on drain. This is the autoscaling-worker pattern (Celery / Lambda / Sidekiq) applied to narrative.

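A minimal sketch of the stateless-worker shape (Python stand-in; the real Compositor would read phoebe and the per-player SQLites rather than formatting a string, and worker count would be driven by queue depth):

```python
import queue
import threading

def compose(event_uid: str) -> str:
    """Pure function of the event_uid. Placeholder for the real
    composition, which pulls all state from phoebe + SQLites."""
    return f"composition:{event_uid}"

def worker(pending: "queue.Queue[str]", done: dict, lock: threading.Lock) -> None:
    # Any worker can take any event_uid: no worker-local state survives a job.
    while True:
        try:
            uid = pending.get_nowait()
        except queue.Empty:
            return  # queue drained: this worker can simply spin down
        result = compose(uid)
        with lock:
            done[uid] = result

pending: "queue.Queue[str]" = queue.Queue()
for uid in ("ev1", "ev2", "ev3", "ev4"):
    pending.put(uid)

done: dict = {}
lock = threading.Lock()
# "Spin up more on queue-depth": the worker count is a free parameter.
threads = [threading.Thread(target=worker, args=(pending, done, lock)) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert done == {u: f"composition:{u}" for u in ("ev1", "ev2", "ev3", "ev4")}
```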
### Directors as ephemeral routines per UID

Each event / event-chain spawns a **director-routine** scoped to that UID. The routine lives only as long as the event lives in the active-register; on completion it prunes. This is the actor model (Erlang / Akka / Orleans) — supervised lightweight processes, thousands concurrent, failure-recovery via supervisor respawn from register-state.

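The spawn / live / prune lifecycle can be modeled with asyncio tasks keyed by UID (illustrative Python; a real supervisor respawning from register-state is sketched away here):

```python
import asyncio

directors = {}  # active-register analogue: event_uid -> live routine

async def director_routine(event_uid: str, lifetime: float) -> None:
    # Lives exactly as long as its event is in the active-register.
    await asyncio.sleep(lifetime)

async def main() -> None:
    for uid in ("ev_a", "ev_b"):
        task = asyncio.create_task(director_routine(uid, 0.01))
        # Prune-on-completion: the routine removes itself from the register.
        task.add_done_callback(lambda t, uid=uid: directors.pop(uid, None))
        directors[uid] = task
    assert len(directors) == 2          # both routines alive
    await asyncio.gather(*list(directors.values()))
    await asyncio.sleep(0)              # let done-callbacks run
    assert directors == {}              # register pruned, nothing accumulates

asyncio.run(main())
```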
### Sharded GMs

A single GM is a scalability bottleneck. Multi-GM sharding splits across:

- **Geography** — GM-per-region/continent/world-zone
- **Theme** — political-narratives / economic-narratives / personal-narratives
- **Tier** — major-arcs vs everyday-events
- **Faction** — each major faction has its own GM-shard

GM-shards share the catalogue, the trait-axis vocabulary, the verifier-flag system. They communicate via NATS pub/sub at GM-tier. *Equilibrium-seeking becomes a distributed consensus across GM-shards* — the same epistemological shift that distributed databases went through (CAP theorem, eventual consistency, partition tolerance).

This is the most architecturally aggressive move and is *not* a free lunch. Multi-GM consensus on equilibrium is real engineering: Paxos/Raft for strong consistency, CRDT-style merging for eventual consistency, faction-sharding so shards own different equilibria. Well-trodden ground, but the cost is not zero.

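One concrete shape for the eventual-consistency branch: an illustrative last-writer-wins register merged across shard views (Python sketch; the real equilibrium state would be richer than a single value, and LWW is only the simplest CRDT that demonstrates convergence without a coordinator).

```python
from functools import reduce

def lww_merge(a: tuple, b: tuple) -> tuple:
    """Merge two (timestamp, value) views; the newer write wins.
    Commutative, associative, idempotent -> shards converge by gossip,
    no coordinator required."""
    return a if a[0] >= b[0] else b

# Each GM-shard's local view of some shared equilibrium variable:
shard_views = {
    "gm-politics": (3, "tension:high"),
    "gm-economy":  (5, "tension:medium"),
    "gm-factions": (4, "tension:high"),
}
converged = reduce(lww_merge, shard_views.values())
assert converged == (5, "tension:medium")  # all shards converge to the newest write
```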
### Pruning as cleanup

Each layer prunes on completion: directors prune at event-end, register entries prune on Compositor pickup, transient-waiting-flag drains at cycle, even player memory prunes by class. **Garbage collection is a first-class structural concern**, not an afterthought. Most "smart NPC" prototypes accumulate state forever; at MMO scale that becomes unrunnable in months.

### Transport — pgnats native, with district-distribution fallback

The Compositor's forward-prop and back-write traffic — and the canon distribution to participants — is the highest-volume path in the system. Two transport options exist.

**Option A — pgnats native serialization (preferred):**

```
COMPOSITOR (writes SQL into phoebe)
  │  pgnats: SQL INSERT/UPDATE → NATS subject publish, automatic, transactional
  │  subject hierarchy mirrors UID hierarchy:
  │    nimmerverse.events.<gm_event_uid>.district.<district_uid>.scene.<scene_uid>
  ▼
NATS JetStream (durable, replay-capable, at-least-once)
  │  subscribers route by subject pattern (wildcards supported)
  ▼
PLAYER CLIENT (NATS subscriber)
  │  receives row-shaped messages, INSERT into local SQLite under matching event_uid
```

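The client-side apply step at the bottom of that path can be sketched with stdlib `sqlite3` (illustrative Python; table shape and wire format are assumptions, and `INSERT OR IGNORE` is one way to make at-least-once delivery idempotent on the client):

```python
import json
import sqlite3

# Player-local store; the real client would use an on-disk SQLite file.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE canon ("
    "  event_uid TEXT, seq INTEGER, payload TEXT,"
    "  PRIMARY KEY (event_uid, seq))"
)

def apply_message(wire: bytes) -> None:
    """Apply one row-shaped NATS message to the local SQLite."""
    row = json.loads(wire)
    # INSERT OR IGNORE: a redelivered message (at-least-once) is a no-op.
    db.execute(
        "INSERT OR IGNORE INTO canon VALUES (?, ?, ?)",
        (row["event_uid"], row["seq"], row["payload"]),
    )

wire = json.dumps(
    {"event_uid": "7af2.9c1.3e8", "seq": 1, "payload": "canon text"}
).encode()
apply_message(wire)
apply_message(wire)  # duplicate delivery changes nothing
assert db.execute("SELECT COUNT(*) FROM canon").fetchone()[0] == 1
```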
**Why preferred:** transactional-outbox pattern *native to the database* — no separate publisher service, no schema-duplication between wire format and storage format, single source of truth. Subject-as-routing using UID hierarchy means players who participated in `gm_event` 7af2 → `district` 9c1 → `scene` 3e8 subscribe to `nimmerverse.events.7af2.>` and receive *only* relevant events. *No application-level routing logic.*
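To make the routing claim concrete, here is a minimal re-implementation of NATS-style subject matching (illustrative Python; in production the matching happens inside NATS, which is precisely why no application-level routing logic is needed):

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Minimal NATS-style subject matching: '*' matches exactly one
    token, '>' matches one or more trailing tokens."""
    p, s = pattern.split("."), subject.split(".")
    for i, tok in enumerate(p):
        if tok == ">":
            return len(s) > i          # '>' must cover at least one token
        if i >= len(s) or (tok != "*" and tok != s[i]):
            return False
    return len(p) == len(s)

# A participant of gm_event 7af2 subscribes once and receives only its events:
sub = "nimmerverse.events.7af2.>"
assert subject_matches(sub, "nimmerverse.events.7af2.district.9c1.scene.3e8")
assert not subject_matches(sub, "nimmerverse.events.b011.district.9c1.scene.3e8")
```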
**Option B — district-as-distribution-coordinator (fallback):**

```
COMPOSITOR writes to phoebe (canon authored)
  ▼
DISTRICT GAMESERVER pulls / receives back-write package
  │  runs distribution checks: who participated, who's online, who needs queue-on-login
  │  retries; peer-shares with neighboring districts if needed
  ▼
PARTICIPANTS receive from their district (the authoritative local hub)
```

**Why fallback:** if pgnats can't carry the load (functional bug, scale ceiling, durability gap), district-as-distribution is the natural retreat — district is authoritative within its scope, simpler than full P2P, more flexible than central-push.
**The asymmetry of the bet:**

| | pgnats works | pgnats fails |
|---|---|---|
| Code volume | Few hundred lines of SQL + subject patterns | ~10–20K lines of broker/outbox/subscriber logic |
| Services to operate | phoebe + NATS (already in stack) | + outbox-reader + custom-publisher + custom-applier |
| Schema management | Single source (Postgres DDL) | Postgres + Protobuf/msgpack + version-skew handling |
| Latency to client | Sub-millisecond NATS hop + apply | + serialize step + queue drain + apply |
| Time to ship | Weeks | Months |

**This is the most leveraged engineering decision currently open in the architecture.** The pgnats evaluation task in `nimmerverse_tasks` (under `nimmerverse-core`) is therefore load-bearing; its outcome decides whether transport is 200 lines of SQL or 20,000 lines of Go.
### NATS republish + replay — the pull-from-checkpoint refinement

JetStream's `republish` feature combined with `replay` semantics is a **promising refinement on top of Option A** that turns back-write delivery from push-to-listeners into pull-from-checkpoint. Worth nailing down concretely in the pgnats evaluation; if it carries our delivery patterns under load, it is a significant win on both performance and complexity.

**How it works:**

- Compositor publishes canon ONCE to the event-keyed subject (e.g., `nimmerverse.events.<gm_event_uid>.scene.<sub_uid>`)
- A `republish` rule on the JetStream stream fans the message out to derived subjects automatically (per-participant, per-faction, per-district — any derived stream-shape we configure)
- Each derived subject has durable JetStream consumers per recipient; recipients replay from their own checkpoint (last-delivered-sequence) when they reconnect

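The three steps above can be modeled in a few lines of illustrative Python (an in-memory stand-in for JetStream; the rule shape and subject names are assumptions, not the real NATS configuration syntax):

```python
class Stream:
    """In-memory stand-in for a JetStream stream with republish rules."""

    def __init__(self, republish_rules):
        self.log = {}                  # subject -> ordered message list
        self.rules = republish_rules   # fn(subject, msg) -> derived subjects

    def publish(self, subject, msg):
        # Publisher writes ONCE; the rules do the fan-out (config, not code).
        self.log.setdefault(subject, []).append(msg)
        for rule in self.rules:
            for derived in rule(subject, msg):
                self.log.setdefault(derived, []).append(msg)

    def replay(self, subject, checkpoint):
        """Consumer pulls everything after its last-delivered sequence."""
        return self.log.get(subject, [])[checkpoint:]

# Rule: fan event canon out to each participant's personal subject.
per_player = lambda subject, msg: [
    f"nimmerverse.player.{p}.canon" for p in msg["participants"]
]

stream = Stream([per_player])
stream.publish(
    "nimmerverse.events.7af2.scene.3e8",
    {"seq": 1, "participants": ["p1", "p2"], "canon": "the gate falls"},
)

# p2 was offline; on reconnect it replays from checkpoint 0 and catches up.
assert stream.replay("nimmerverse.player.p2.canon", 0)[0]["canon"] == "the gate falls"
# p1 already consumed seq 1; replay from its checkpoint yields nothing new.
assert stream.replay("nimmerverse.player.p1.canon", 1) == []
```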
**Why it's faster and simpler:**

- **Compositor never tracks recipients.** Publishes once; NATS republishes to N derived subjects per the configured rule. No application-level fan-out logic; no "for each participant, if online deliver else queue" code-path.
- **Player connectivity is decoupled from delivery.** Offline players don't slow down the publish path; their queue accumulates in JetStream. On reconnect, they replay from checkpoint.
- **Late joiners catch up for free.** New consumer subscribes with replay-from-sequence; gets full history relevant to their subject. No special reconcile code-path.
- **Backpressure is built in.** Slow consumer's queue grows in JetStream — doesn't affect publishers or other consumers. Flow-control, ack-policy, max-bytes/max-msgs limits all native to JetStream.
- **Re-keying is a config change.** Add a new republish rule for a new derived stream-shape (e.g., per-faction canon-feed); no application code changes for the new consumer-tier.
- **Replay = audit-trail for free.** Replay from sequence 0 reconstructs entire canon history. Disaster recovery, debugging, time-travel queries are all free side-effects.

**Architectural effect:**

```
Compositor → publishes canon to event_uid subject (ONCE)
        │
        ▼  (JetStream republish-rule fan-out — config, not code)
Per-player subjects:   nimmerverse.player.<id>.canon
Per-faction subjects:  nimmerverse.faction.<name>.canon
Per-district subjects: nimmerverse.district.<id>.canon
        │
        ▼
Each consumer replays from its own checkpoint when ready
```

The Compositor's mental model collapses to *"I publish to the canonical event-stream and forget"*; the recipient's collapses to *"I replay my own consumer-subject from my last sequence"*. **The delivery problem disappears into NATS.**
**Specific evaluation criteria** (to add to the pgnats evaluation task):

- Republish rule expressiveness — can we route by UID hierarchy (`events.<gm_uid>.>` → `player.<participant_id>.canon`)?
- Replay performance — what's the cost of a consumer replaying N hours of missed canon on reconnect?
- Durability under broker failure — does the republish-derived subject survive primary-broker loss?
- Schema-evolution behavior — can we add new derived subjects without disrupting existing consumers?
- Cost at scale — disk, memory, file-handles for many durable consumers (one-per-active-player)?

If `republish + replay` carries the load, the back-write transport is *substantially* simpler than the original Option A description and a much harder competitor to beat with Option B.
### What this retires

- Vertically-scaled monolithic AI-NPC backend → horizontally-scaled stateless-workers + ephemeral-actors + sharded-services
- Centralized in-memory event-state → UID-keyed registers + transient-waiting-flag buffer
- Single global GM as bottleneck → sharded GMs with cross-shard equilibrium-consensus
- Manual outbox-reader services → pgnats native serialization (preferred); district-distribution fallback if pgnats can't carry
- "AI-NPC scale" framed as inference budget alone → AI-NPC scale framed as transport + state + composition + sharding, with inference as one of many concurrently-scaling axes

---

**Version:** 0.7.0 | **Created:** 2026-04-26 | **Updated:** 2026-04-26 | **Origin:** Split from architecture-broad.md v0.7 (2026-04-26)