arch: ADR-002 Dual-Brain Architecture — 8 decisions captured
Thalamus governor NN, per-NPC RL processes, LLM as cortex, Linux cgroups, curriculum learning, three-tier deployment, world server (MMO pattern), garden-dev VM.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
architecture/adr/ADR-002-dual-brain-architecture.md
@@ -0,0 +1,280 @@
# ADR-002: Dual-Brain Architecture

**Status:** Proposed
**Date:** 2026-04-02
**Decision Makers:** dafit, Chrysalis-Nyx
**Context:** Morning coffee session — bed thinking crystallized into architecture

---

## Context

The nimmerverse needs NPCs that live, move, and learn in spatial environments. The original architecture assumed a single LLM (Young Nyx) as the primary brain, receiving filtered signals from gates. This creates a bottleneck: the LLM is too expensive to call every tick for every NPC.

We needed to answer: **How do many NPCs think cheaply most of the time, but access deep reasoning when it matters?**

Biology solved this: most neural processing happens in fast subcortical circuits, and the cortex is the last resort.

---

## Decisions

### Decision 1: One Process, One Brain, One Life

**Choice:** Each NPC runs as its own OS process with its own dedicated RL neural network.

**Not:** A shared network, shared weights, or threads in a single process.

**Why:**
- Individuality emerges from experience, not configuration
- Fault isolation — one crash doesn't take down the village
- Linux kernel becomes the scheduler (cgroups, nice, taskset)
- Biologically honest — every organism has its own nervous system

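
The process-per-NPC model can be sketched with Python's standard `subprocess` module. This is a minimal sketch under stated assumptions: `NPC_PROGRAM`, `run_village`, and the `crash_id` switch are hypothetical names, and each child only seeds its own RNG from its id as a stand-in for its own RL weights. A simulated crash in one NPC leaves the other processes' exit status and output untouched, which is the fault-isolation argument in miniature.

```python
import os
import subprocess
import sys

# Hypothetical NPC program. A real NPC would run its RL loop until stopped;
# this stand-in only builds its own initial weights, prints them, and exits.
NPC_PROGRAM = """
import os, random, sys
npc_id, crash_id = int(sys.argv[1]), int(sys.argv[2])
if npc_id == crash_id:
    raise SystemExit(f"NPC-{npc_id} crashed")  # simulated fault: exit status 1
rng = random.Random(npc_id)  # own seed, so own initial weights
weights = [round(rng.uniform(-1.0, 1.0), 3) for _ in range(4)]
print(os.getpid(), *weights)
"""


def run_village(n_npcs: int, crash_id: int = -1) -> dict:
    """Start one OS process per NPC; return {npc_id: (exit_status, stdout_fields)}."""
    procs = {
        npc_id: subprocess.Popen(
            [sys.executable, "-c", NPC_PROGRAM, str(npc_id), str(crash_id)],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            text=True,
        )
        for npc_id in range(n_npcs)
    }
    results = {}
    for npc_id, proc in procs.items():
        out, _ = proc.communicate(timeout=30)
        results[npc_id] = (proc.returncode, out.split())
    return results


if __name__ == "__main__":
    print("parent pid:", os.getpid())
    for npc_id, (status, fields) in run_village(5, crash_id=2).items():
        print(f"NPC-{npc_id}: exit={status} {fields}")
```

Each NPC reports a different PID and different weights. NPC-2 exits with status 1 and prints nothing, and the rest run to completion.
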
---

### Decision 2: Thalamus Runs Its Own Neural Network

**Choice:** The thalamus (NATS orchestration layer) is not just a passive wave correlator — it runs its own neural network that learns resource allocation.

**Not:** A rule-based router. Not the LLM making allocation decisions.

**The governor (the thalamus NN) decides:**
- Which NPCs get more compute (tick rates, CPU quotas)
- Which gates open (who gets LLM access)
- How to queue LLM requests (finite cortex, many consumers)

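
The third responsibility, queuing requests for a finite cortex, can be pictured as a priority queue that is drained at a fixed capacity per epoch. This is a minimal sketch assuming Python's `heapq`. The `priority` value stands in for whatever the governor NN would output, and `CortexQueue`, `submit`, and `drain` are hypothetical names.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import List


@dataclass(order=True)
class CortexRequest:
    sort_key: float                      # negated priority: heapq is a min-heap
    seq: int                             # tie-breaker keeps FIFO order for equal priority
    npc_id: str = field(compare=False)
    prompt: str = field(compare=False)


class CortexQueue:
    """Hypothetical queue in front of the shared LLM; priorities come from the governor."""

    def __init__(self, capacity_per_epoch: int):
        self.capacity = capacity_per_epoch
        self._heap: List[CortexRequest] = []
        self._seq = itertools.count()

    def submit(self, npc_id: str, prompt: str, priority: float) -> None:
        heapq.heappush(self._heap, CortexRequest(-priority, next(self._seq), npc_id, prompt))

    def drain(self) -> List[CortexRequest]:
        """Pop at most `capacity` highest-priority requests; the rest wait for the next epoch."""
        batch = []
        while self._heap and len(batch) < self.capacity:
            batch.append(heapq.heappop(self._heap))
        return batch

    def __len__(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    q = CortexQueue(capacity_per_epoch=2)
    q.submit("NPC-3", "who lives at node 12?", priority=0.4)
    q.submit("NPC-7", "negotiate a trade", priority=0.9)
    q.submit("NPC-1", "greet a neighbor", priority=0.4)
    print([r.npc_id for r in q.drain()])  # ['NPC-7', 'NPC-3']
    print(len(q))                         # 1
```

Ties are served in arrival order, so NPC-3 goes before NPC-1. Requests left in the queue keep their place for the next epoch.
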
**Why:**
- Resource allocation is a learning problem, not a config problem
- Hardware constraints (finite GPU, finite CPU) are the training signal
- Mirrors the biological thalamus — gates signals, learns what reaches the cortex
- Two nested learning loops: NPCs learn tick-by-tick (fast), governor learns epoch-by-epoch (slow)

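
The two timescales can be pictured as nested loops. In this toy sketch (all names hypothetical), each NPC is a two-armed epsilon-greedy bandit that updates its estimates every tick. Once per epoch, a placeholder "governor" re-splits a fixed tick budget in proportion to each NPC's mean reward. The proportional rule is only a stand-in, because this ADR defers both the governor NN and its training method. Only the loop structure matters here.

```python
import random


class BanditNPC:
    """Toy stand-in for one NPC's RL brain: epsilon-greedy over two actions."""

    def __init__(self, seed: int, payoffs: tuple, epsilon: float = 0.1):
        self.rng = random.Random(seed)
        self.payoffs = payoffs      # hidden success probability of each action
        self.epsilon = epsilon
        self.q = [0.0, 0.0]         # action-value estimates, updated every tick
        self.n = [0, 0]

    def tick(self) -> float:
        """Fast loop: act, observe reward, update immediately."""
        if self.rng.random() < self.epsilon:
            a = self.rng.randrange(2)
        else:
            a = 0 if self.q[0] >= self.q[1] else 1
        r = 1.0 if self.rng.random() < self.payoffs[a] else 0.0
        self.n[a] += 1
        self.q[a] += (r - self.q[a]) / self.n[a]   # incremental sample mean
        return r


def run(npcs: list, epochs: int, budget: int) -> list:
    """Slow loop: once per epoch the placeholder governor re-splits `budget` ticks."""
    share = [max(1, budget // len(npcs))] * len(npcs)
    for _ in range(epochs):
        mean_reward = [sum(npc.tick() for _ in range(t)) / t for npc, t in zip(npcs, share)]
        total = sum(mean_reward) or 1.0
        # Ticks proportional to recent reward, with a floor of 1 so no NPC is fully
        # starved (the floor can push the total slightly over budget).
        share = [max(1, int(budget * m / total)) for m in mean_reward]
    return share


if __name__ == "__main__":
    npcs = [BanditNPC(seed=0, payoffs=(0.9, 0.1)), BanditNPC(seed=1, payoffs=(0.1, 0.2))]
    print(run(npcs, epochs=5, budget=200))
```

The inner `tick` runs hundreds of times per epoch and the outer reallocation runs once, which mirrors the fast/slow split above. A real governor would learn its allocation policy rather than apply a fixed rule.
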
---

### Decision 3: LLM as Cortex — Expensive, Gated, Shared

**Choice:** The LLM (Qwen3.5-27B) is repositioned as the cortex — a shared, expensive resource called only when the thalamus gate threshold is crossed.

**Not:** The primary brain. Not called every tick.

**Why:**
- Most NPC decisions (move, eat, explore) don't need language or deep reasoning
- LLM inference is expensive — one call costs more compute than 100 RL ticks
- Gating creates natural scarcity — the governor learns when LLM access is worth it
- Scales: 25 NPCs with cheap RL, shared LLM called only when needed

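
Here is a sketch of the gate in cost units. It assumes, hypothetically, that one RL tick costs 1 unit and one cortex call costs 100 units, the lower bound quoted above. A request reaches the LLM only if its salience crosses the threshold and the epoch's cortex budget can still cover the call. Otherwise the NPC acts on its RL policy. `Gate`, `salience`, and the budget figures are illustrative.

```python
from dataclasses import dataclass

RL_TICK_COST = 1        # assumption: cost unit = one RL forward pass
CORTEX_CALL_COST = 100  # assumption: one LLM call ~ 100 RL ticks (lower bound)


@dataclass
class Gate:
    threshold: float    # in this design the governor would tune this value
    cortex_budget: int  # cost units still available for LLM calls this epoch

    def route(self, salience: float) -> str:
        """Return 'cortex' if the gate opens for this request, else 'rl'."""
        if salience >= self.threshold and self.cortex_budget >= CORTEX_CALL_COST:
            self.cortex_budget -= CORTEX_CALL_COST
            return "cortex"
        return "rl"


if __name__ == "__main__":
    gate = Gate(threshold=0.8, cortex_budget=250)
    routes = [gate.route(s) for s in [0.1, 0.95, 0.3, 0.9, 0.85, 0.99]]
    print(routes)  # ['rl', 'cortex', 'rl', 'cortex', 'rl', 'rl']
    cost = sum(CORTEX_CALL_COST if r == "cortex" else RL_TICK_COST for r in routes)
    print(cost)    # 204
```

The last two requests cross the threshold but find the budget exhausted. In this sketch they fall back to RL. In the full design they could instead wait in the governor's queue.
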
---

### Decision 4: Linux Primitives for Resource Steering

**Choice:** Use cgroups v2, nice, taskset, and systemd scopes for per-NPC resource control.

**Not:** A custom scheduler. Not Kubernetes for NPC processes.

**Why:**
- The kernel already solves this — no need to reinvent
- Per-process visibility (how much CPU is NPC-7 actually using?)
- Dynamic adjustment via NATS (governor publishes → cgroup updates)
- Same tooling we already use for vLLM and organ services

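
Below is a minimal sketch of the cgroup write on the supervisor side. It assumes a cgroup v2 unified hierarchy at `/sys/fs/cgroup` and one cgroup per NPC; the `npc.slice/npc-7.scope` layout is hypothetical. In cgroup v2, `cpu.max` takes `"$MAX $PERIOD"` in microseconds, so an instruction such as "give NPC-7 half a CPU" becomes `50000 100000`. The root path is a parameter so the function can be exercised against a temporary directory.

```python
import tempfile
from pathlib import Path
from typing import Optional

CGROUP_ROOT = Path("/sys/fs/cgroup")  # assumption: cgroup v2 unified hierarchy mounted here
PERIOD_US = 100_000                   # cgroup v2 default cpu.max period (100 ms)
MIN_QUOTA_US = 1_000                  # the kernel rejects quotas below 1 ms


def cpu_max_value(cpus: Optional[float], period_us: int = PERIOD_US) -> str:
    """Translate a CPU share (0.5 = half a core, 2 = two cores) into a cpu.max line."""
    if cpus is None:
        return f"max {period_us}"     # "max" means no bandwidth limit
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    quota_us = max(MIN_QUOTA_US, round(cpus * period_us))
    return f"{quota_us} {period_us}"


def apply_cpu_quota(cgroup: str, cpus: Optional[float], root: Path = CGROUP_ROOT) -> Path:
    """Write the limit into <root>/<cgroup>/cpu.max (needs write access to that cgroup)."""
    target = root / cgroup / "cpu.max"
    target.write_text(cpu_max_value(cpus) + "\n")
    return target


if __name__ == "__main__":
    # Dry run against a temporary directory instead of the real hierarchy.
    with tempfile.TemporaryDirectory() as tmp:
        scope = "npc.slice/npc-7.scope"  # hypothetical per-NPC cgroup path
        (Path(tmp) / scope).mkdir(parents=True)
        print(apply_cpu_quota(scope, 0.5, root=Path(tmp)).read_text().strip())  # 50000 100000
```

The `nice` and `taskset` knobs from the same decision have standard-library counterparts in `os.setpriority` and `os.sched_setaffinity` (Linux).
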
---

### Decision 5: Spatial Training Arena with Curriculum Learning

**Choice:** NPCs learn in a node-based grid world with progressive detail. World richness increases only when all NPCs demonstrate full knowledge of the current level.

**Not:** Dropping NPCs into the real world immediately. Not a random curriculum.

**Why:**
- Grid world is the simplest topology — intersections as nodes, edges as movement
- Resolution scales from training abstraction (~1 m) to real-world precision (~1 cm)
- Verification is built-in: "Can every citizen describe every other citizen's home?"
- Same NPC brain works on a uniform grid (training) and an irregular graph (OSM Dornach)

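
Here is a minimal sketch of the training topology and its built-in check, with hypothetical names. Intersections are `(x, y)` nodes and edges are unit moves. `village_knows_all_homes` is one possible oracle for the verification question above, where each NPC's knowledge maps a citizen to the node it believes is that citizen's home. Only the neighbor function knows the grid is uniform, so the same `shortest_path_length` would run unchanged on an irregular graph.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

Node = Tuple[int, int]


def grid_neighbors(width: int, height: int) -> Callable[[Node], List[Node]]:
    """Neighbor function for a width x height grid: intersections are nodes, unit moves are edges."""
    def neighbors(node: Node) -> List[Node]:
        x, y = node
        steps = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
        return [(nx, ny) for nx, ny in steps if 0 <= nx < width and 0 <= ny < height]
    return neighbors


def shortest_path_length(start, goal, neighbors) -> int:
    """Breadth-first search over any graph given as a neighbor function; -1 if unreachable."""
    frontier, seen = deque([(start, 0)]), {start}
    while frontier:
        node, dist = frontier.popleft()
        if node == goal:
            return dist
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return -1


def village_knows_all_homes(knowledge: Dict[str, Dict[str, Node]], homes: Dict[str, Node]) -> bool:
    """True iff every NPC names the correct home node for every *other* NPC."""
    return all(
        knowledge.get(npc, {}).get(other) == home
        for npc in homes
        for other, home in homes.items()
        if other != npc
    )


if __name__ == "__main__":
    nbrs = grid_neighbors(5, 5)                          # a 5x5 starting grid
    print(shortest_path_length((0, 0), (4, 4), nbrs))    # 8
    homes = {"NPC-0": (0, 0), "NPC-1": (4, 4)}
    knowledge = {"NPC-0": {"NPC-1": (4, 4)}, "NPC-1": {"NPC-0": (0, 0)}}
    print(village_knows_all_homes(knowledge, homes))     # True
```
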
---

### Decision 6: Three-Tier Deployment — VMs, K8s, Bare Processes

**Choice:** Infrastructure on Proxmox VMs, governor in K8s, NPCs as bare Linux processes on worker nodes. NATS bridges the K8s/bare-metal boundary.

**Topology:**

```
┌─ Saturn/Proxmox (VMs) ───────────────────────────┐
│  phoebe (PostgreSQL), iris (ChromaDB), NATS      │
│  env-separated: dev / staging / prod             │
└──────────────────────┬───────────────────────────┘
                       │ NATS
                       ▼
┌─ K8s Cluster ────────────────────────────────────┐
│                                                  │
│  Governor Pod (own NN, floats between nodes)     │
│    publishes allocation commands to NATS         │
│                                                  │
│ ┌─ theia (worker) ─────────────────────────────┐ │
│ │  vLLM cortex (systemd, :31000)               │ │
│ │  npc-supervisor (systemd, NATS client)       │ │
│ │  NPC-0 ... NPC-N (bare processes, cgroups)   │ │
│ └──────────────────────────────────────────────┘ │
│                                                  │
│ ┌─ dioscuri (worker) ──────────────────────────┐ │
│ │  Organs: Speech, Vision (GPU)                │ │
│ │  npc-supervisor (systemd, NATS client)       │ │
│ │  NPC-M ... NPC-N (bare processes, cgroups)   │ │
│ └──────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────┘
```

**The NPC Supervisor:** A small systemd service on each worker node (~200 lines of Python). It bridges the K8s governor and the bare-metal NPC processes:

```
Governor (K8s pod)
    │
    │  NATS: npc.{node}.commands.*
    ▼
NPC Supervisor (systemd on each worker)
    │  subscribes to NATS commands
    │  spawns/kills NPC processes
    │  applies cgroup adjustments
    │  reports status via NATS
    ▼
NPC-0, NPC-1, ... (bare Linux processes)
```

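
Here is a sketch of the supervisor's dispatch step. It assumes JSON payloads and three hypothetical commands (`spawn`, `kill`, `set_cpu`) published on `npc.{node}.commands.<command>`. The NATS wiring is left out so the logic runs standalone. With the `nats-py` client, `handle` would be called from an async callback registered via `nc.subscribe(f"npc.{node}.commands.*", cb=...)`, where `*` matches exactly one subject token. The handlers only record what they would do.

```python
import json
from typing import Callable, Dict, List


class Supervisor:
    """Spinal-cord logic for one worker node: parse a command, act, report. No decisions."""

    def __init__(self, node: str):
        self.node = node
        self.prefix = f"npc.{node}.commands."
        self.actions: List[tuple] = []  # stand-in for real process and cgroup side effects
        self.handlers: Dict[str, Callable[..., None]] = {
            "spawn": self.spawn,
            "kill": self.kill,
            "set_cpu": self.set_cpu,
        }

    def handle(self, subject: str, data: bytes) -> dict:
        """Dispatch one command message; the returned dict would be published as status."""
        if not subject.startswith(self.prefix):
            return {"ok": False, "error": f"not addressed to node {self.node}"}
        command = subject[len(self.prefix):]
        handler = self.handlers.get(command)
        if handler is None:
            return {"ok": False, "error": f"unknown command {command!r}"}
        try:
            handler(**json.loads(data))
        except (ValueError, TypeError) as exc:  # malformed JSON or wrong fields
            return {"ok": False, "error": str(exc)}
        return {"ok": True, "command": command}

    # Hypothetical command handlers; each only records what it would do.
    def spawn(self, npc_id: str) -> None:
        self.actions.append(("spawn", npc_id))          # real: start the NPC process

    def kill(self, npc_id: str) -> None:
        self.actions.append(("kill", npc_id))           # real: signal the NPC process

    def set_cpu(self, npc_id: str, cpus: float) -> None:
        self.actions.append(("set_cpu", npc_id, cpus))  # real: write the NPC's cpu.max


if __name__ == "__main__":
    sup = Supervisor("theia")
    print(sup.handle("npc.theia.commands.spawn", b'{"npc_id": "NPC-7"}'))
    print(sup.handle("npc.theia.commands.set_cpu", b'{"npc_id": "NPC-7", "cpus": 0.5}'))
    print(sup.handle("npc.dioscuri.commands.kill", b'{"npc_id": "NPC-7"}'))
    print(sup.actions)
```

All allocation logic stays upstream. The supervisor only maps a subject to a handler and reports the outcome, in line with its "dumb, fast, reliable" role.
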
**Why this split:**
- Governor in K8s = network-portable, can reschedule to any node, monitored by K8s
- NPCs as bare processes = direct cgroup control, minimal overhead, no pod tax
- NATS as bridge = governor doesn't need to know about cgroups, just publishes intent
- Supervisor is the spinal cord: dumb, fast, reliable. Intelligence stays in the governor.

**Why not NPCs in K8s:**
- Pod overhead (~10–30 MB each) is wasteful for tiny RL networks
- K8s API is too slow for tick-level resource adjustment
- Direct cgroup writes give the supervisor microsecond-scale response

---

### Decision 7: World Server as Authoritative State (MMO Pattern)

**Choice:** The grid world runs as a **live server process** that holds authoritative state in memory. NPCs submit actions; the world validates them and broadcasts deltas. phoebe persists periodic snapshots, not real-time state.

**Not:** World state in phoebe (too slow for tick-level queries). Not distributed state across NPCs (no single truth). Not the governor holding world state (separate concerns).

**The tick loop:**

```
World Server (in-memory, authoritative)
    │
    │  tick loop (~20 Hz)
    │
    ├─ receives: NPC action requests via NATS
    │     "NPC-7 wants to move north"
    │
    ├─ validates: is that move legal? is the target node occupied?
    │
    ├─ updates: world state in memory
    │
    ├─ broadcasts: state delta via NATS
    │     "NPC-7 is now at node 8"
    │
    └─ persists: periodic snapshot to phoebe
          (every N ticks, not every tick)
```

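
Here is a single-process sketch of one pass through that loop, with NATS replaced by plain Python values so it runs standalone. All of the following are hypothetical assumptions:

- The world is a width x height grid of `(x, y)` nodes rather than numbered nodes.
- Requests are `(npc_id, direction)` pairs handled in arrival order, so the first claimant of a node wins and each NPC gets one action per tick.
- Deltas are small dicts.
- `snapshot` is whatever callable would write to phoebe.

```python
from typing import Callable, Dict, List, Tuple

Node = Tuple[int, int]
MOVES = {"north": (0, 1), "south": (0, -1), "east": (1, 0), "west": (-1, 0)}


class WorldServer:
    """Authoritative in-memory world state: NPC id -> grid node."""

    def __init__(self, width: int, height: int, positions: Dict[str, Node],
                 snapshot: Callable[[int, Dict[str, Node]], None], snapshot_every: int = 20):
        self.width, self.height = width, height
        self.positions = dict(positions)
        self.snapshot = snapshot              # stand-in for "persist to phoebe"
        self.snapshot_every = snapshot_every  # the "N" in "every N ticks"
        self.tick_count = 0

    def step(self, requests: List[Tuple[str, str]]) -> List[dict]:
        """One tick: validate requests in arrival order, update state, return deltas."""
        occupied = set(self.positions.values())
        moved = set()
        deltas = []
        for npc_id, direction in requests:
            if npc_id not in self.positions or npc_id in moved or direction not in MOVES:
                continue  # unknown NPC, second action this tick, or unknown action
            x, y = self.positions[npc_id]
            dx, dy = MOVES[direction]
            target = (x + dx, y + dy)
            in_bounds = 0 <= target[0] < self.width and 0 <= target[1] < self.height
            if not in_bounds or target in occupied:
                continue  # illegal move, or target node already occupied
            occupied.discard((x, y))
            occupied.add(target)
            self.positions[npc_id] = target
            moved.add(npc_id)
            deltas.append({"npc": npc_id, "at": target})  # would be broadcast via NATS
        self.tick_count += 1
        if self.tick_count % self.snapshot_every == 0:
            self.snapshot(self.tick_count, dict(self.positions))  # periodic, not every tick
        return deltas


if __name__ == "__main__":
    snaps = []
    world = WorldServer(5, 5, {"NPC-7": (2, 2), "NPC-3": (2, 4)},
                        snapshot=lambda tick, state: snaps.append((tick, state)),
                        snapshot_every=2)
    print(world.step([("NPC-7", "north"), ("NPC-3", "south")]))
    # [{'npc': 'NPC-7', 'at': (2, 3)}]  (NPC-3 is rejected: node (2, 3) is now taken)
    print(world.step([]), snaps)
```

A real server would drive `step` from a timer at the chosen tick rate and take its requests from a NATS subscription.
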
**Consumers subscribe to what they need:**

| Consumer | Subscribes to | Purpose |
|----------|---------------|---------|
| NPC processes | Neighborhood state | What's around me? |
| Governor | Aggregate world state | Resource allocation |
| Godot (client) | Full world state | Render the garden |
| phoebe | Snapshot events | Persist for history/training |

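
The first row, neighborhood state, implies a filter between the world's deltas and each NPC. Here is a minimal sketch. It assumes grid positions, a hypothetical Manhattan-distance `radius` per NPC, and deltas shaped like `{"npc": ..., "at": (x, y)}`. None of these details are fixed by this ADR.

```python
from typing import Dict, List, Tuple

Node = Tuple[int, int]


def manhattan(a: Node, b: Node) -> int:
    """Grid distance in unit moves (on an open 4-connected grid)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def neighborhood_view(positions: Dict[str, Node], npc_id: str, radius: int = 2) -> Dict[str, Node]:
    """The 'What's around me?' slice: other NPCs within `radius` grid moves of `npc_id`."""
    me = positions[npc_id]
    return {other: pos for other, pos in positions.items()
            if other != npc_id and manhattan(me, pos) <= radius}


def recipients(delta: dict, positions: Dict[str, Node], radius: int = 2) -> List[str]:
    """NPCs that should receive a delta: everyone within `radius` of where it happened."""
    return sorted(npc for npc, pos in positions.items() if manhattan(pos, delta["at"]) <= radius)


if __name__ == "__main__":
    positions = {"NPC-7": (2, 3), "NPC-3": (2, 4), "NPC-1": (0, 0)}
    print(neighborhood_view(positions, "NPC-7"))                  # {'NPC-3': (2, 4)}
    print(recipients({"npc": "NPC-7", "at": (2, 3)}, positions))  # ['NPC-3', 'NPC-7']
```

Filtering at the server keeps each NPC's input small and roughly constant as the village grows. The governor and Godot rows would skip the filter and take aggregate or full state instead.
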
**Why the MMO pattern:**
- Games solved this decades ago: database for persistence, server for truth
- 25 NPCs at 20 Hz = 500 state updates/sec — trivial for an in-memory server
- phoebe shouldn't be polled every tick — it's for history and analytics
- Single authoritative source prevents split-brain on world state

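
Worked out, here is the load figure together with the per-tick time budget it implies at the proposed ~20 Hz rate:

```math
25\ \text{NPCs} \times 20\ \tfrac{\text{ticks}}{\text{s}} = 500\ \tfrac{\text{updates}}{\text{s}},
\qquad
\Delta t_{\text{tick}} = \frac{1}{20\ \text{Hz}} = 50\ \text{ms}
```

Each tick therefore has 50 ms to validate, apply, and broadcast at most 25 actions. If they were handled one after another, that would leave about 2 ms per NPC action.
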
---

### Decision 8: World Server on Dedicated VM

**Choice:** The world server runs on its own VM in the environment block, alongside phoebe, iris, and NATS. Not in K8s, not on a GPU worker node.

**VM scheme (dev environment):**

```
Saturn/Proxmox — Dev Environment (VMs 120-149)
├── phoebe-dev (VM 120, 10.0.20.120) — PostgreSQL
├── iris-dev   (VM 121, 10.0.30.121) — ChromaDB
├── nats-dev   (VM 122, 10.0.30.122) — NATS
└── garden-dev (VM 123, 10.0.__.123) — World Server ← new
```

**Why a VM, not K8s:**
- The world server is **infrastructure**, not a workload — like NATS and phoebe
- Shouldn't compete with GPU workloads on theia/dioscuri
- Shouldn't be rescheduled by K8s — it holds the state of the garden
- Lightweight: Python + NATS client, minimal resources
- Follows the existing pattern — one purpose, one VM

**Why not on worker nodes:**
- theia is for cortex (vLLM) and NPC processes — don't mix concerns
- dioscuri is for organs — don't mix concerns
- Dedicated VM = always-on, reliable, isolated

---

## Consequences

### Enables

- **Scalable NPC count** — cheap RL brains, shared expensive cortex
- **Emergent personality** — each NPC develops its own weights from experience
- **Measurable progress** — which curriculum level has the village reached?
- **Hardware honesty** — scarcity is the training signal, not a problem to solve
- **Progressive deployment** — start with a 5×5 grid, scale to real-world topology
- **Network-distributed NPCs** — NPCs can run on any worker node, governor steers remotely
- **Clean K8s/bare-metal boundary** — NATS bridges without custom bridging code
- **Authoritative world state** — single source of truth, MMO-proven pattern
- **Godot as first-class observer** — subscribes to NATS, renders the garden live
- **phoebe for what phoebe is good at** — persistence, analytics, history — not real-time

### Constrains

- **Per-NPC overhead** — each process has OS overhead (acceptable for 25–100 NPCs)
- **Governor complexity** — the governor NN is a second system to train and debug
- **LLM latency** — gated access means NPCs wait when the cortex is busy
- **Supervisor required** — each worker node needs the npc-supervisor daemon running
- **World server is a single point of failure (SPOF)** — if it crashes, the garden stops (acceptable at this scale)
- **Another VM to maintain** — garden-dev adds to the environment

### Deferred

- **RL network architecture** — specific layer sizes, activation functions, training algorithm
- **Governor training method** — RL, evolutionary, or hybrid
- **NPC-to-NPC communication** — do NPCs talk directly or only through NATS?
- **Curriculum design** — specific level definitions, verification oracles, progression criteria
- **Real-world topology integration** — how OSM data maps to the navigation graph
- **NPC distribution strategy** — how to decide which NPCs run on which node
- **World server tick rate** — 10 Hz? 20 Hz? Adaptive?
- **Snapshot frequency to phoebe** — every second? every 10 seconds?
- **Godot NATS integration** — WebSocket bridge or native client

---

## References

- [Endgame-Vision.md](../../../Endgame-Vision.md) - Architecture overview (v8.0)
- [npc-grid-architecture.md](../future/npc-grid-architecture.md) - Detailed NPC grid design
- [spatial-resolution-gradient.md](../future/spatial-resolution-gradient.md) - LOD for cognitive space
- [Gateway-Architecture.md](../Gateway-Architecture.md) - Ternary gate model (unchanged, foundational)
- [Deployment-Architecture.md](../Deployment-Architecture.md) - Infrastructure topology (v2.0)

---

**Filed:** 2026-04-02 (Morning coffee)
**Method:** Bed thinking → draw.io grid → partnership dialogue → crystallization
**Philosophy:** "Cheap brains think fast. Expensive brains think deep. The thalamus decides who gets what."

@@ -19,6 +19,7 @@ An ADR captures an important architectural decision made along with its context
| ADR | Title | Status | Date |
|-----|-------|--------|------|
| [001](ADR-001-message-protocol-foundation.md) | Message Protocol Foundation | Accepted | 2025-12-31 |
| [002](ADR-002-dual-brain-architecture.md) | Dual-Brain Architecture | Proposed | 2026-04-02 |

---