From db6ff85b0281f87dafb1ccff38acde5c99b8accb Mon Sep 17 00:00:00 2001
From: dafit
Date: Thu, 2 Apr 2026 13:11:51 +0200
Subject: [PATCH] arch: ADR-002 Dual-Brain Architecture — 8 decisions captured
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Thalamus governor NN, per-NPC RL processes, LLM as cortex, Linux cgroups, curriculum learning, three-tier deployment, world server (MMO pattern), garden-dev VM.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 .../adr/ADR-002-dual-brain-architecture.md | 280 ++++++++++++++++++
 architecture/adr/README.md                 |   1 +
 2 files changed, 281 insertions(+)
 create mode 100644 architecture/adr/ADR-002-dual-brain-architecture.md

diff --git a/architecture/adr/ADR-002-dual-brain-architecture.md b/architecture/adr/ADR-002-dual-brain-architecture.md
new file mode 100644
index 0000000..df72c73
--- /dev/null
+++ b/architecture/adr/ADR-002-dual-brain-architecture.md
@@ -0,0 +1,280 @@
+# ADR-002: Dual-Brain Architecture
+
+**Status:** Proposed
+**Date:** 2026-04-02
+**Decision Makers:** dafit, Chrysalis-Nyx
+**Context:** Morning coffee session — bed thinking crystallized into architecture
+
+---
+
+## Context
+
+The nimmerverse needs NPCs that live, move, and learn in spatial environments. The original architecture assumed a single LLM (Young Nyx) as the primary brain, receiving filtered signals from gates. This creates a bottleneck: the LLM is too expensive to call every tick for every NPC.
+
+We needed to answer: **How can many NPCs think cheaply most of the time, yet still access deep reasoning when it matters?**
+
+Biology solved this: most neural processing happens in fast subcortical circuits, and the cortex is the last resort.
+
+---
+
+## Decisions
+
+### Decision 1: One Process, One Brain, One Life
+
+**Choice:** Each NPC runs as its own OS process with its own dedicated RL neural network.
+
+**Not:** A shared network, shared weights, or threads in a single process.
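A minimal Python sketch of Decision 1, assuming a Linux host (it uses the `fork` start method): every NPC is a separate OS process that builds its own network only after it starts, so there are no shared weights to corrupt. `TinyPolicy`, `npc_main`, and `spawn_village` are hypothetical names, and the one-layer linear policy is only a placeholder for the real RL network, whose architecture this ADR defers.

```python
import multiprocessing as mp
import os
import random

ACTIONS = ("north", "south", "east", "west", "stay")


class TinyPolicy:
    """Hypothetical stand-in for an NPC's RL network: one linear layer.

    The weights exist only inside the process that created them, so two
    NPCs can never share or overwrite each other's brain.
    """

    def __init__(self, seed: int, n_inputs: int = 4) -> None:
        rng = random.Random(seed)
        self.weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
                        for _ in ACTIONS]

    def act(self, obs: list[float]) -> str:
        # Score every action linearly and pick the best one.
        scores = [sum(w * x for w, x in zip(row, obs)) for row in self.weights]
        return ACTIONS[scores.index(max(scores))]


def npc_main(npc_id: int, ticks: int, crash: bool, out) -> None:
    """Entry point of one NPC process: build a private brain, run some ticks."""
    if crash:
        raise RuntimeError(f"NPC-{npc_id} crashed")  # simulated fault
    brain = TinyPolicy(seed=npc_id)
    obs = [1.0, 0.0, 0.5, -0.5]  # placeholder observation
    action = "stay"
    for _ in range(ticks):
        action = brain.act(obs)
    out.put((npc_id, os.getpid(), action))


def spawn_village(n_npcs: int, ticks: int = 10, crash_ids=frozenset()):
    """Start one OS process per NPC; return (results, exit code per NPC id)."""
    ctx = mp.get_context("fork")  # Linux-only start method keeps the sketch short
    out = ctx.Queue()
    procs = {
        i: ctx.Process(target=npc_main, args=(i, ticks, i in crash_ids, out),
                       name=f"NPC-{i}")
        for i in range(n_npcs)
    }
    for p in procs.values():
        p.start()
    # Drain the queue before join() so no child blocks on a full pipe.
    n_alive = n_npcs - len(set(crash_ids) & set(procs))
    results = sorted(out.get(timeout=10) for _ in range(n_alive))
    for p in procs.values():
        p.join()
    return results, {i: p.exitcode for i, p in procs.items()}
```

For example, `spawn_village(4, crash_ids={2})` returns results for NPC-0, NPC-1, and NPC-3 and exit codes `{0: 0, 1: 0, 2: 1, 3: 0}`: the simulated crash in NPC-2 stays confined to its own process.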
+
+**Why:**
+- Individuality emerges from experience, not configuration
+- Fault isolation — one crash doesn't take down the village
+- The Linux kernel becomes the scheduler (cgroups, nice, taskset)
+- Biologically honest — every organism has its own nervous system
+
+---
+
+### Decision 2: Thalamus Runs Its Own Neural Network
+
+**Choice:** The thalamus (NATS orchestration layer) is not just a passive wave correlator — it runs its own neural network that learns resource allocation.
+
+**Not:** A rule-based router. Not the LLM making allocation decisions.
+
+**The governor (the thalamus's own network) decides:**
+- Which NPCs get more compute (tick rates, CPU quotas)
+- Which gates open (who gets LLM access)
+- How to queue LLM requests (finite cortex, many consumers)
+
+**Why:**
+- Resource allocation is a learning problem, not a config problem
+- Hardware constraints (finite GPU, finite CPU) are the training signal
+- Mirrors the biological thalamus — gates signals and learns what reaches the cortex
+- Two nested learning loops: NPCs learn tick-by-tick (fast), the governor learns epoch-by-epoch (slow)
+
+---
+
+### Decision 3: LLM as Cortex — Expensive, Gated, Shared
+
+**Choice:** The LLM (Qwen3.5-27B) is repositioned as the cortex — a shared, expensive resource called only when the thalamus gate threshold is crossed.
+
+**Not:** The primary brain. Not called every tick.
+
+**Why:**
+- Most NPC decisions (move, eat, explore) don't need language or deep reasoning
+- LLM inference is expensive — one call costs more than 100 RL ticks
+- Gating creates natural scarcity — the governor learns when LLM access is worth it
+- Scales: 25 NPCs run cheap RL brains and share one LLM, called only when needed
+
+---
+
+### Decision 4: Linux Primitives for Resource Steering
+
+**Choice:** Use cgroups v2, nice, taskset, and systemd scopes for per-NPC resource control.
+
+**Not:** A custom scheduler. Not Kubernetes for NPC processes.
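With cgroup v2, per-NPC steering reduces to writing a few interface files. Below is a minimal sketch of what npc-supervisor might do when the governor publishes a new CPU allocation. It assumes the supervisor owns a delegated subtree with the `cpu` controller enabled; the `/sys/fs/cgroup/npc.slice` path is hypothetical. `cpu.max`, `cpu.weight`, and `cgroup.procs` are standard cgroup v2 files, while `cpu_max_value` and `apply_npc_limits` are illustrative names, and the NATS wiring is omitted.

```python
from pathlib import Path
from typing import Optional

# Assumption: a delegated cgroup v2 subtree owned by the supervisor (hypothetical path).
CGROUP_ROOT = Path("/sys/fs/cgroup/npc.slice")
CPU_PERIOD_US = 100_000  # cgroup v2 default CPU period: 100 ms


def cpu_max_value(cpu_fraction: Optional[float], period_us: int = CPU_PERIOD_US) -> str:
    """Render a cpu.max line: '<quota_us> <period_us>', or 'max <period_us>' for no cap.

    cpu_fraction is measured in CPUs: 0.25 is a quarter of one core, 2.0 is two cores.
    """
    if cpu_fraction is None:
        return f"max {period_us}"
    if cpu_fraction <= 0:
        raise ValueError("cpu_fraction must be positive")
    quota_us = max(1_000, int(cpu_fraction * period_us))  # kernel rejects quotas below 1 ms
    return f"{quota_us} {period_us}"


def apply_npc_limits(npc_id: int, pid: int, cpu_fraction: Optional[float],
                     weight: int = 100, root: Path = CGROUP_ROOT) -> Path:
    """Create npc-<id> under root, move the process into it, and set its CPU knobs."""
    if not 1 <= weight <= 10_000:  # valid cpu.weight range; 100 is the default
        raise ValueError("cpu.weight must be in [1, 10000]")
    group = root / f"npc-{npc_id}"
    group.mkdir(parents=True, exist_ok=True)  # on cgroupfs, mkdir creates the cgroup
    (group / "cpu.max").write_text(cpu_max_value(cpu_fraction) + "\n")
    (group / "cpu.weight").write_text(f"{weight}\n")
    (group / "cgroup.procs").write_text(f"{pid}\n")  # migrate the NPC process
    return group
```

`cpu.max` is a hard cap per scheduling period, while `cpu.weight` only sets a relative share under contention, so a governor could use the former for strict budgets and the latter for soft priority. Passing a temporary directory as `root` lets the same code run in tests without touching the real hierarchy.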
+
+**Why:**
+- The kernel already solves this — no need to reinvent
+- Per-process visibility (how much CPU is NPC-7 actually using?)
+- Dynamic adjustment via NATS (governor publishes → cgroup updates)
+- Same tooling we already use for vLLM and organ services
+
+---
+
+### Decision 5: Spatial Training Arena with Curriculum Learning
+
+**Choice:** NPCs learn in a node-based grid world with progressive detail. World richness increases only when all NPCs demonstrate full knowledge of the current level.
+
+**Not:** Dropping NPCs into the real world immediately. Not a random curriculum.
+
+**Why:**
+- A grid world is the simplest topology — intersections as nodes, edges as movement
+- Resolution scales from training abstraction (~1 m) to real-world precision (~1 cm)
+- Verification is built-in: "Can every citizen describe every other citizen's home?"
+- The same NPC brain works on a uniform grid (training) and an irregular graph (OSM Dornach)
+
+---
+
+### Decision 6: Three-Tier Deployment — VMs, K8s, Bare Processes
+
+**Choice:** Infrastructure on Proxmox VMs, governor in K8s, NPC processes as bare Linux processes on worker nodes. NATS bridges the K8s/bare-metal boundary.
+
+**Topology:**
+
+```
+┌─ Saturn/Proxmox (VMs) ───────────────────────────┐
+│ phoebe (PostgreSQL), iris (ChromaDB), NATS       │
+│ env-separated: dev / staging / prod              │
+└──────────────────────┬───────────────────────────┘
+                       │ NATS
+                       ▼
+┌─ K8s Cluster ────────────────────────────────────┐
+│                                                  │
+│ Governor Pod (own NN, floats between nodes)      │
+│   publishes allocation commands to NATS          │
+│                                                  │
+│ ┌─ theia (worker) ─────────────────────────────┐ │
+│ │ vLLM cortex (systemd, :31000)                │ │
+│ │ npc-supervisor (systemd, NATS client)        │ │
+│ │ NPC-0 ... NPC-N (bare processes, cgroups)    │ │
+│ └──────────────────────────────────────────────┘ │
+│                                                  │
+│ ┌─ dioscuri (worker) ──────────────────────────┐ │
+│ │ Organs: Speech, Vision (GPU)                 │ │
+│ │ npc-supervisor (systemd, NATS client)        │ │
+│ │ NPC-M ... NPC-N (bare processes, cgroups)    │ │
+│ └──────────────────────────────────────────────┘ │
+└──────────────────────────────────────────────────┘
+```
+
+**The NPC Supervisor:** A small systemd service on each worker node (~200 lines of Python). It bridges the K8s governor and bare-metal NPC processes:
+
+```
+Governor (K8s pod)
+    │
+    │  NATS: npc.{node}.commands.*
+    ▼
+NPC Supervisor (systemd on each worker)
+    │  subscribes to NATS commands
+    │  spawns/kills NPC processes
+    │  applies cgroup adjustments
+    │  reports status via NATS
+    ▼
+NPC-0, NPC-1, ... (bare Linux processes)
+```
+
+**Why this split:**
+- Governor in K8s = network-portable, can reschedule to any node, monitored by K8s
+- NPCs as bare processes = direct cgroup control, minimal overhead, no pod tax
+- NATS as bridge = governor doesn't need to know about cgroups, just publishes intent
+- Supervisor is the spinal cord: dumb, fast, reliable. Intelligence stays in the governor.
+
+**Why not NPCs in K8s:**
+- Pod overhead (~10-30 MB each) is wasteful for tiny RL networks
+- The K8s API is too slow for tick-level resource adjustment
+- Direct cgroup writes give the supervisor microsecond-scale response
+
+---
+
+### Decision 7: World Server as Authoritative State (MMO Pattern)
+
+**Choice:** The grid world runs as a **live server process** that holds authoritative state in-memory. NPCs submit actions, the world validates and broadcasts deltas. phoebe persists periodic snapshots, not real-time state.
+
+**Not:** World state in phoebe (too slow for tick-level queries). Not distributed state across NPCs (no single truth). Not the governor holding world state (separate concerns).
+
+**The tick loop:**
+
+```
+World Server (in-memory, authoritative)
+    │
+    │  tick loop (~20 Hz)
+    │
+    ├─ receives: NPC action requests via NATS
+    │            "NPC-7 wants to move north"
+    │
+    ├─ validates: is that move legal? is the target node occupied?
+    │
+    ├─ updates: world state in memory
+    │
+    ├─ broadcasts: state delta via NATS
+    │              "NPC-7 is now at node 8"
+    │
+    └─ persists: periodic snapshot to phoebe
+                 (every N ticks, not every tick)
+```
+
+**Consumers subscribe to what they need:**
+
+| Consumer | Subscribes to | Purpose |
+|----------|--------------|---------|
+| NPC processes | Neighborhood state | What's around me? |
+| Governor | Aggregate world state | Resource allocation |
+| Godot (client) | Full world state | Render the garden |
+| phoebe | Snapshot events | Persist for history/training |
+
+**Why MMO pattern:**
+- Games solved this decades ago: database for persistence, server for truth
+- 25 NPCs at 20 Hz = 500 state updates/sec — trivial for in-memory
+- phoebe shouldn't be polled every tick — it's for history and analytics
+- A single authoritative source prevents split-brain on world state
+
+---
+
+### Decision 8: World Server on Dedicated VM
+
+**Choice:** The world server runs on its own VM in the environment block, alongside phoebe, iris, and NATS. Not in K8s, not on a GPU worker node.
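The service that garden-dev hosts is the Decision 7 tick loop. Here is a minimal, transport-agnostic Python sketch of that loop's core, under stated assumptions: a rectangular grid whose nodes are numbered row-major (`node = y * width + x`), a `snapshots` list standing in for phoebe, and NATS subscribe/publish left out entirely. `WorldServer`, `submit`, and `tick` are hypothetical names, not an existing API; a real server would call `tick()` roughly 20 times per second.

```python
from dataclasses import dataclass, field
from typing import ClassVar, Optional


@dataclass
class WorldServer:
    """Authoritative in-memory grid world (hypothetical sketch).

    Nodes are numbered row-major: node = y * width + x.
    """
    width: int
    height: int
    snapshot_every: int = 100  # "every N ticks"; N is still deferred
    positions: dict[int, int] = field(default_factory=dict)  # npc_id -> node
    tick_count: int = 0
    snapshots: list[dict] = field(default_factory=list)  # stand-in for phoebe
    _pending: list[tuple[int, str]] = field(default_factory=list)

    MOVES: ClassVar[dict[str, tuple[int, int]]] = {
        "north": (0, -1), "south": (0, 1), "east": (1, 0), "west": (-1, 0),
    }

    def submit(self, npc_id: int, direction: str) -> None:
        """Queue an action request (in the ADR this arrives via NATS)."""
        self._pending.append((npc_id, direction))

    def _target(self, node: int, direction: str) -> Optional[int]:
        dx, dy = self.MOVES[direction]
        x, y = node % self.width + dx, node // self.width + dy
        if 0 <= x < self.width and 0 <= y < self.height:
            return y * self.width + x
        return None  # off the grid

    def tick(self) -> list[str]:
        """Validate queued actions in arrival order, apply them, return deltas."""
        deltas = []
        occupied = set(self.positions.values())
        for npc_id, direction in self._pending:
            if npc_id not in self.positions or direction not in self.MOVES:
                continue  # unknown NPC or illegal action
            target = self._target(self.positions[npc_id], direction)
            if target is None or target in occupied:
                continue  # off-grid or node already taken: reject
            occupied.discard(self.positions[npc_id])
            occupied.add(target)
            self.positions[npc_id] = target
            deltas.append(f"NPC-{npc_id} is now at node {target}")
        self._pending.clear()
        self.tick_count += 1
        if self.tick_count % self.snapshot_every == 0:
            self.snapshots.append({"tick": self.tick_count,
                                   "positions": dict(self.positions)})
        return deltas
```

Requests are handled first come, first served within a tick. On a 5×5 grid with NPC-7 at node 3 and NPC-8 at node 13, submitting `south` for NPC-7 and then `north` for NPC-8 makes `tick()` return `["NPC-7 is now at node 8"]`; NPC-8's move is rejected because node 8 is already taken.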
+
+**VM scheme (dev environment):**
+
+```
+Saturn/Proxmox — Dev Environment (VMs 120-149)
+├── phoebe-dev  (VM 120, 10.0.20.120) — PostgreSQL
+├── iris-dev    (VM 121, 10.0.30.121) — ChromaDB
+├── nats-dev    (VM 122, 10.0.30.122) — NATS
+└── garden-dev  (VM 123, 10.0.__.123) — World Server ← new
+```
+
+**Why a VM, not K8s:**
+- The world server is **infrastructure**, not a workload — like NATS and phoebe
+- Shouldn't compete with GPU workloads on theia/dioscuri
+- Shouldn't be rescheduled by K8s — it holds the state of the garden
+- Lightweight: Python + NATS client, minimal resources
+- Follows the existing pattern — one purpose, one VM
+
+**Why not on worker nodes:**
+- theia is for cortex (vLLM) and NPC processes — don't mix concerns
+- dioscuri is for organs — don't mix concerns
+- Dedicated VM = always-on, reliable, isolated
+
+---
+
+## Consequences
+
+### Enables
+
+- **Scalable NPC count** — cheap RL brains, shared expensive cortex
+- **Emergent personality** — each NPC develops its own weights from experience
+- **Measurable progress** — which curriculum level has the village reached?
+- **Hardware honesty** — scarcity is the training signal, not a problem to solve
+- **Progressive deployment** — start with a 5×5 grid, scale to real-world topology
+- **Network-distributed NPCs** — NPCs can run on any worker node, and the governor steers them remotely
+- **Clean K8s/bare-metal boundary** — NATS bridges without custom bridging code
+- **Authoritative world state** — single source of truth, MMO-proven pattern
+- **Godot as first-class observer** — subscribes to NATS, renders the garden live
+- **phoebe for what phoebe is good at** — persistence, analytics, history — not real-time
+
+### Constrains
+
+- **Per-NPC overhead** — each process has OS overhead (acceptable for 25-100 NPCs)
+- **Governor complexity** — the governor NN is a second system to train and debug
+- **LLM latency** — gated access means NPCs wait when the cortex is busy
+- **Supervisor required** — each worker node needs the npc-supervisor daemon running
+- **World server is a single point of failure** — if it crashes, the garden stops (acceptable at this scale)
+- **Another VM to maintain** — garden-dev adds to the environment
+
+### Deferred
+
+- **RL network architecture** — specific layer sizes, activation functions, training algorithm
+- **Governor training method** — RL, evolutionary, or hybrid
+- **NPC-to-NPC communication** — do NPCs talk directly or only through NATS?
+- **Curriculum design** — specific level definitions, verification oracles, progression criteria
+- **Real-world topology integration** — how OSM data maps to the navigation graph
+- **NPC distribution strategy** — how to decide which NPCs run on which node
+- **World server tick rate** — 10 Hz? 20 Hz? Adaptive?
+- **Snapshot frequency to phoebe** — Every second? Every 10 seconds?
+- **Godot NATS integration** — WebSocket bridge or native client
+
+---
+
+## References
+
+- [Endgame-Vision.md](../../../Endgame-Vision.md) - Architecture overview (v8.0)
+- [npc-grid-architecture.md](../future/npc-grid-architecture.md) - Detailed NPC grid design
+- [spatial-resolution-gradient.md](../future/spatial-resolution-gradient.md) - LOD for cognitive space
+- [Gateway-Architecture.md](../Gateway-Architecture.md) - Ternary gate model (unchanged, foundational)
+- [Deployment-Architecture.md](../Deployment-Architecture.md) - Infrastructure topology (v2.0)
+
+---
+
+**Filed:** 2026-04-02 (Morning coffee)
+**Method:** Bed thinking → draw.io grid → partnership dialogue → crystallization
+**Philosophy:** "Cheap brains think fast. Expensive brains think deep. The thalamus decides who gets what."
diff --git a/architecture/adr/README.md b/architecture/adr/README.md
index 98dc9be..93f3135 100644
--- a/architecture/adr/README.md
+++ b/architecture/adr/README.md
@@ -19,6 +19,7 @@ An ADR captures an important architectural decision made along with its context
 
 | ADR | Title | Status | Date |
 |-----|-------|--------|------|
 | [001](ADR-001-message-protocol-foundation.md) | Message Protocol Foundation | Accepted | 2025-12-31 |
+| [002](ADR-002-dual-brain-architecture.md) | Dual-Brain Architecture | Proposed | 2026-04-02 |
 
 ---