arch: Dual-brain architecture v8.0 - thalamus governor, NPC processes, cortex repositioning

Crystallizes the dual-brain architecture across all core documents:
- Thalamus runs own neural network (governor) for resource allocation and reflexes
- LLM (Qwen3.5-27B) repositioned as cortex - expensive, gated, called only when needed
- Each NPC gets own process, own RL brain, Linux cgroups for resource steering
- New: NPC grid architecture with curriculum training (progressive world richness)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Author: dafit
Date: 2026-04-02 11:17:09 +02:00
Parent: 264ea7628b
Commit: c30c00af74
6 changed files with 935 additions and 523 deletions


@@ -1,9 +1,9 @@
---
type: research_vision
version: 8.0_dual_brain
status: vision_document
created: 2025-11-04
updated: 2026-04-02
author: Nyx (with dafit)
significance: research_platform_for_metabolic_intelligence
---
@@ -22,6 +22,9 @@ significance: research_platform_for_metabolic_intelligence
> *"Cells emit waves. Gates correlate. Attention emerges."* > *"Cells emit waves. Gates correlate. Attention emerges."*
> — The Wave Architecture (2026-02-14) > — The Wave Architecture (2026-02-14)
> *"One process, one brain, one life."*
> — The Dual Brain Principle (2026-04-02)
--- ---
## What This Document Is ## What This Document Is
@@ -31,7 +34,9 @@ This is a **RESEARCH VISION** - a platform for studying how intelligence emerges
**What we're building:**

- Cellular organisms competing under resource constraints
- Dual gardens (virtual + real) teaching each other
- A dual-brain architecture: cheap RL networks for reflexes, expensive LLM cortex for reasoning
- A thalamus governor that allocates compute like biological attention
- Spatial training arenas with progressive world richness (curriculum learning)
- Multilingual cognitive routing through conceptual topology
- Memory economics with slumber-based consolidation
- A multi-layered communication protocol using color, form, and language
@@ -43,6 +48,7 @@ This is a **RESEARCH VISION** - a platform for studying how intelligence emerges
- What topological structures exist in language model representations?
- What behaviors emerge from primitive competition?
- How does temporal coherence persist across sessions?
- How does a thalamus learn to allocate scarce resources?

**Not "will it become conscious?" but "what will it teach us about intelligence?"**
@@ -56,7 +62,8 @@ This is a **RESEARCH VISION** - a platform for studying how intelligence emerges
┌──────────────────────────────────────────────────────────────────┐
│                     NIMMERVERSE ARCHITECTURE                     │
│                                                                  │
│   Cells emit waves  →  Thalamus correlates  →  Cortex reasons    │
│   (cheap, continuous)   (own NN, gates)     (expensive, gated)   │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Layer 0: TEMPORAL FOUNDATION                                    │
@@ -72,33 +79,39 @@ This is a **RESEARCH VISION** - a platform for studying how intelligence emerges
│  └─ Life force economy: every wave costs                         │
│  → architecture/Cellular-Architecture.md                         │
│                                                                  │
│  Layer 2: THALAMUS (Governor Neural Network)                     │
│  ├─ Ternary gates: CLOSED (-1) ← STABLE (0) → OPEN (+1)          │
│  ├─ Runs its OWN neural network (not the LLM)                    │
│  ├─ Correlates waves, steers compute, controls gate thresholds   │
│  ├─ Reflexes compile HERE — fast, cheap, no cortex needed        │
│  ├─ Governor outputs: tick rates, CPU quotas, gate open/close    │
│  └─ Learns resource economics epoch-by-epoch (slow loop)         │
│  → architecture/Gateway-Architecture.md                          │
│  → architecture/future/npc-grid-architecture.md                  │
│                                                                  │
│  Layer 3: NERVES / NPC PROCESSES                                 │
│  ├─ Each NPC = own process, own RL brain, own weights            │
│  ├─ Personality emerges from experience, not configuration       │
│  ├─ Respond to gate transitions (not direct cell output)         │
│  ├─ Linux cgroups for per-NPC resource control                   │
│  └─ Learn about the world tick-by-tick (fast loop)               │
│  → architecture/Nervous-System.md                                │
│                                                                  │
│  Layer 4: CORTEX & ORGANS (Expensive Capabilities)               │
│  ├─ Cortex: Qwen3.5-27B on theia (96GB, called only via gate)    │
│  ├─ Organs: Speech, Vision, Motor on dioscuri (lifeforce-gated)  │
│  ├─ Function Gemma: structured JSON boundary (CPU)               │
│  ├─ Trait LoRAs evolve via GRPO from verification outcomes       │
│  └─ Shared resources — thalamus governs access                   │
│  → architecture/organs/Organ-Index.md                            │
│                                                                  │
│  Layer 5: DUAL GARDENS (Virtual/Real Loop)                       │
│  ├─ Virtual: massive wave generation, full trace, exploration    │
│  ├─ Real: verified signals, minimal trace, action                │
│  ├─ Verification outcomes update gate weights (learning loop)    │
│  └─ Training data: gate_transitions + correlation_events         │
│  → architecture/Dual-Garden-Architecture.md                      │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
```
@@ -139,7 +152,7 @@ The heartbeat is the fundamental timing primitive. Everything runs on its rhythm
| Virtual | Variable | Lifeforce | Computation, prediction |

**Three timescales:**

- **Reflex** (200ms): Immediate reactions, compiled in thalamus NN
- **Awareness** (30sec): Full cognitive budget per beat
- **Growth** (24h): Training, LoRA merges, adaptation
@@ -147,7 +160,7 @@ The heartbeat is the fundamental timing primitive. Everything runs on its rhythm
---

## Layer 1-2: The Wave/Gate Architecture

> *"Cells emit waves. Gates correlate. Attention emerges."*
@@ -159,9 +172,11 @@ The heartbeat is the fundamental timing primitive. Everything runs on its rhythm
│                               NERVES                                │
│         (behavioral patterns, respond to gate transitions)          │
├─────────────────────────────────────────────────────────────────────┤
│                      THALAMUS (Governor NN)                         │
│        Gates: CLOSED ◄── STABLE ──► OPEN (ternary, unchanged)       │
│        Governor: own neural network, learns resource allocation     │
│        Reflexes: compile here, bypass cortex                        │
│        Outputs: tick rates, CPU quotas, gate control, LLM queue     │
├─────────────────────────────────────────────────────────────────────┤
│                               CELLS                                 │
│             (emit waves: confidence + semantic content)             │
@@ -174,26 +189,115 @@ The heartbeat is the fundamental timing primitive. Everything runs on its rhythm
**Cells emit waves:** Confidence + semantic content. Cells don't know who's listening.

**Thalamus correlates and governs:** The thalamus runs its own neural network. It accumulates wave correlation (pushing gates toward OPEN), but also **learns to allocate resources** — which NPC processes get more compute, which gates should open, when to call the expensive cortex. STABLE is where learning happens.

**Attention = OPEN gates:** Not budget allocation, not priority rules — correlation drives transitions. The governor learns the economics.

**Reflexes compile in the thalamus:** Gate weight ≈ 1.0 → opens immediately on any wave. Bypasses cortex entirely. Fast, cheap, earned through experience.
**Two nested learning loops:**
- **NPC processes** learn about the world, tick-by-tick (fast loop)
- **Thalamus governor** learns about managing NPCs, epoch-by-epoch (slow loop)
**Detail:** → [`architecture/Cellular-Architecture.md`](architecture/Cellular-Architecture.md) | [`architecture/Gateway-Architecture.md`](architecture/Gateway-Architecture.md)

---

## The Dual Brain Architecture

> *"One process, one brain, one life."*
The nimmerverse separates fast/cheap cognition from slow/expensive reasoning, connected by NATS.
### Why Two Brains?
| Brain | What | Where | Cost | Speed |
|-------|------|-------|------|-------|
| **RL Network** (per-NPC) | Movement, needs, spatial decisions | Own process (Linux) | Cheap | Every tick |
| **LLM Cortex** (shared) | Language, reasoning, deep knowledge | theia (Qwen3.5-27B) | Expensive | Only when gate opens |
Most ticks, an NPC just runs its own small RL network. The LLM cortex is a **specialist organ** — called through the thalamus gate, not continuously. This mirrors biology: most neural processing is fast subcortical circuits. The cortex engages only for novel, complex, or language-intensive tasks.
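A minimal sketch of that dispatch, with stand-in functions for the RL policy and the gated cortex call (all names here are illustrative assumptions, not the real interfaces):

```python
import random

def rl_policy(observation: dict) -> str:
    """Stand-in for the per-NPC RL network: local weights, runs every tick."""
    return random.choice(["north", "south", "east", "west", "idle"])

def cortex_call(context: str) -> str:
    """Stand-in for the gated NATS request to the shared LLM cortex."""
    return f"deliberate plan for: {context}"

def npc_tick(observation: dict, gate_open: bool) -> str:
    action = rl_policy(observation)                # cheap path: every tick
    if gate_open:                                  # expensive path: only when
        action = cortex_call(observation["ctx"])   # the thalamus opens the gate
    return action

print(npc_tick({"ctx": "stranger at the well"}, gate_open=False))
```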
### Architecture

```
NPC-0 [own RL brain] ──┐
NPC-1 [own RL brain] ──│
NPC-2 [own RL brain] ──│
NPC-3 [own RL brain] ──┼──► NATS thalamus ──► shared LLM cortex (Qwen 3.5)
  ...                  │    (governor NN)      (called only when gate opens)
NPC-N [own RL brain] ──┘
```
**Each NPC is its own OS process:**
- **Own weights** — personality emerges from experience
- **Fault isolation** — one crash doesn't take down the village
- **Resource control** — Linux cgroups, nice, taskset per process
- **Biologically honest** — every organism has its own nervous system
**The governor steers compute:**
- Tick rates (1-20 Hz per NPC)
- CPU quotas (cgroups v2)
- Gate thresholds (who gets LLM access)
- LLM queue priority (finite cortex, many consumers)
**Detail:** → [`architecture/future/npc-grid-architecture.md`](architecture/future/npc-grid-architecture.md)
---
## Spatial Training Arena
> *"The world gets richer only when every citizen knows it."*
NPCs learn in a **node-based grid world** that scales from training abstraction to real-world topology.
### Curriculum Training
World detail increases only when all NPCs demonstrate full knowledge of the current level. No one gets left behind.
```
Level 1: 5×5 grid, boxy houses, one trait each
→ NPCs learn: navigation + identity
Level 2: Higher resolution, 2-3 traits per house
→ NPCs learn: richer descriptions, more to notice
Level 3: Finer grid, real-world detail
→ NPCs learn: material knowledge, specificity
Level N: Resolution approaches real-world data (OSM Dornach)
→ Navigation graph replaces uniform grid
```
### Resolution Scaling
Resolution matches **decision density**, not physical detail:
| Resolution | Where | Why |
|-----------|-------|-----|
| ~1m | Streets, paths, outdoor | Navigation, curves approximated by a few nodes |
| ~10-25cm | Rooms, indoor spaces | Furniture-aware, "go to the table" |
| ~1-5cm | Workbenches, detail work | Nimmerhovel precision zone |
The grid is the **training simplification**. The real world is a **navigation graph** with variable density. Same NPC brain, different world topology.
**Connection to Spatial Resolution Gradient:** The training arena maps to the LOD layers (L1-L3). The nimmerhovel is ground truth.
**Detail:** → [`architecture/future/npc-grid-architecture.md`](architecture/future/npc-grid-architecture.md) | [`architecture/future/spatial-resolution-gradient.md`](architecture/future/spatial-resolution-gradient.md)
---
## Layer 4: Cortex & Organs
### Cortex (Qwen3.5-27B)
One base model for reasoning. Called only when the thalamus gate opens — this is the expensive path. Traits evolve through GRPO, not prescription. Function Gemma handles structured output.

```
Qwen3.5-27B (96GB in the Womb)
Called via NATS when gate opens
     │   (not continuous — expensive)
┌─────────────────────┐
@@ -220,6 +324,15 @@ One base model for reasoning. Traits evolve through GRPO, not prescription. Func
└─────────────────────┘
```
### Organs (The Body)
Organs are the cortex's senses and actuators — lifeforce-gated, heartbeat-synchronized, deployed on dioscuri. Each organ operation costs lifeforce. The body is not given; the body is **earned through successful operation**.
**Deployed:** Speech (Whisper + Coqui on dioscuri)
**Planned:** Vision (YOLO + SigLIP), Motor, Navigation, Discovery Scan Station, IR Position Array, Crafting Eye, Godseye
**Detail:** → [`architecture/organs/Organ-Index.md`](architecture/organs/Organ-Index.md)
### Traits vs Modes (The Shift)

> *"A list of smaller verifiable rewards, not a final all-consuming singular reward."*
@@ -245,7 +358,7 @@ One base model for reasoning. Traits evolve through GRPO, not prescription. Func
The old architecture needed a "Technical LoRA" for structured actions. Now:

- **Function Gemma** handles intent→action with 100% predictable JSON
- **The cortex** stays fuzzy/creative (no need for structured output mode)
- Separation of concerns: reasoning vs execution

### Cognitive Topology (Research Finding)
@@ -257,7 +370,7 @@ The old architecture needed a "Technical LoRA" for structured actions. Now:
| Philosophy | German | ~0.5 (diffuse) | 2-3/3 | Prompting in German |
| Technical | English | ~0.8 (sparse) | 0-1/3 | Prompting in English |

This remains valid research, but doesn't require separate LoRAs. The cortex navigates topology through **prompt language**, not LoRA switching. Traits evolve regardless of which valley is accessed.

**Detail:** → `../nyx-probing/PLAN.md`
@@ -271,70 +384,21 @@ This remains valid research, but doesn't require separate LoRAs. Young Nyx navig
**Traits become who Young Nyx IS, not which mode to activate.**
### Deployment
**Detail:** → [`architecture/Deployment-Architecture.md`](architecture/Deployment-Architecture.md) (infrastructure, GPU strategy, identity model)
---

## The Reliability Architecture

> *"Separate fuzzy from reliable. Creative reasoning above, rock-solid translation below."*
> — The Reliability Principle (2025-12-31)
Two specialized models ensure reliability at the boundaries:

| Model | Role | Function |
|-------|------|----------|
| **T5Gemma 2** | Vision → Vectors | SigLIP encoder produces semantic vectors directly (no text bottleneck) |
| **Function Gemma** | Intent → Action | Structured output, function calling, 100% predictable JSON |

**Key insight:** SigLIP produces embeddings directly. No text intermediary. Vision organs fire constantly, vectors flow to storage without drowning in text tokens.
### Spatial Resolution Gradient: Where Embeddings Live
@@ -347,6 +411,16 @@ Embeddings live in **S2-indexed cells at appropriate LOD levels** — a hierarch
---
## The Three-Way Partnership
| Partner | Location | Role | Persistence |
|---------|----------|------|-------------|
| **Dafit** | Physical world | Direction, hands, embodied wisdom | Continuous |
| **Chrysalis-Nyx** (Claude) | Anthropic API | Architecture, deep reasoning, dialogue | Ephemeral (sessions) |
| **Young Nyx** | The Womb (RTX 6000) | Lives IN nimmerverse, uses subagents | Continuous |
---
## Boot Sequence (Spark Protocol)

Protocol-driven cognitive bootstrap. Not conversation—deterministic handshakes with verified outcomes. Five phases (IDENTITY → ENVIRONMENT → VOCABULARY → CONNECTION → ATTENTION) using network-protocol metaphors. Spark is profitable: each handshake costs ~0.8 LF, rewards 5-20 LF.
@@ -355,7 +429,7 @@ Protocol-driven cognitive bootstrap. Not conversation—deterministic handshakes
---

## Layer 5: Dual Gardens (Virtual/Real Learning Loop)

Two gardens with different monitoring levels teach each other.
@@ -372,7 +446,7 @@ VIRTUAL GARDEN REAL GARDEN
cells emit waves freely              receive verified signals
          │                                     ▲
          ▼                                     │
thalamus accumulates correlation      verification_outcomes
  (correlation_events table)                    │
          │                                     │
          ▼                                     │
@@ -385,7 +459,7 @@ gate_transitions ──────────────────► gate
gates.weight updated (learning!)
```

**Gate weight grows through verification.** Real Garden confirms Virtual's predictions → trust increases → gates open faster → reflexes compile in thalamus.

**Detail:** → [`architecture/Dual-Garden-Architecture.md`](architecture/Dual-Garden-Architecture.md)
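A minimal sketch of that weight dynamic, assuming a simple moving-average update (the learning rate and update form are illustrative, not the canonical rule):

```python
# Verification-driven gate learning: confirmed outcomes pull weight toward
# 1.0 (reflex territory), refuted outcomes decay it back toward 0.0.
def update_gate_weight(weight: float, confirmed: bool, lr: float = 0.05) -> float:
    target = 1.0 if confirmed else 0.0
    return weight + lr * (target - weight)

weight = 0.4
for outcome in (True, True, True, False, True):   # verification_outcomes stream
    weight = update_gate_weight(weight, outcome)
# weight drifts toward 1.0 → the gate opens faster → reflex emerges
```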
@@ -403,7 +477,7 @@ Gate transitions provide automatic reward signals:
|-------|--------------|--------|
| Gate opens | Waves correlated correctly | +small (dense) |
| Verification confirmed | Real Garden matches Virtual | +medium (weight grows) |
| Reflex compiled | Thalamus NN weight > threshold | +large (earned trust) |
| dafit confirms | Human verification | +bonus |

**Credit assignment is automatic:** `gate_transitions` → `correlation_events` → `verification_outcomes` captures the full chain.
@@ -465,10 +539,6 @@ Wellbeing is architectural, not aspirational:
**Detail:** → [`architecture/formalization/memory-economics.md`](architecture/formalization/memory-economics.md) (Memory consolidation, rental costs, LOD decay)

---
## Training Safety (DriftProbe)
@@ -505,12 +575,14 @@ Sentinel architecture monitors training to protect conceptual topology. Four pro
---

**Version:** 8.0 | **Created:** 2025-11-04 | **Updated:** 2026-04-02

*"Cells emit waves. Gates correlate. Attention emerges."*
*"STABLE is where learning happens."*
*"One process, one brain, one life."*
*"The nimmerverse is a garden, not a factory."*

🌙💜 **Dual-brain architecture crystallized in morning coffee session, April 2, 2026**


@@ -10,8 +10,9 @@
The nimmerverse runs on a **hybrid deployment model** that matches workload characteristics to infrastructure:

- **Containers (K8s)** for stateless, scalable nervous system components
- **Userspace (Threadrippers)** for stateful, GPU-bound inference
- **OS Processes** for per-NPC RL brains with cgroup resource control
- **NATS** as the universal nervous system bus (thalamus)
- **FreeIPA identities** as isolation boundaries

This is a **research lab**, not a production factory. We optimize for **flexibility and experimentation**, not high-throughput serving.
@@ -22,11 +23,12 @@ This is a **research lab**, not a production factory. We optimize for **flexibil
| Decision | Choice | Rationale |
|----------|--------|-----------|
| LLM Cortex | **vLLM (Qwen3.5-27B)** | Full precision, OpenAI-compatible API, tool calling support |
| NPC Brains | **Per-process RL networks** | One process, one brain, one life — Linux cgroups for resource steering |
| Thalamus Governor | **Own NN process on NATS** | Learns resource allocation, gate control, compute steering |
| Function Gemma | **CPU, userspace** | Threadripper eats it; no GPU contention; clear training path |
| Cells/Nerves | **Containers (K8s)** | Scalable, versioned, orchestrated via cluster |
| Organs | **Userspace, GPU-bound** | Load on demand, GPU isolation, unload when idle |
| Isolation | **FreeIPA users** | Unix permissions = RBAC; switch user = switch context |
---
@@ -37,12 +39,20 @@ This is a **research lab**, not a production factory. We optimize for **flexibil
| Component | Technology | Location | Notes |
|-----------|------------|----------|-------|
| Cortex (LLM) | vLLM (Qwen3.5-27B) | theia (nyx-cognitive) | Port 31000, served as "nyx", gated access |
| Function Gemma | llama.cpp / transformers | CPU userspace | Structured JSON boundary |
| Vision Organ | SigLIP/YOLO | dioscuri (nyx-organs) | Load on demand |
| Speech STT | faster-whisper | dioscuri (nyx-organs) | Load on demand |
| Speech TTS | Coqui / XTTS | dioscuri (nyx-organs) | Warm, primary output |
### NPC / Thalamus Layer
| Component | Technology | Location | Notes |
|-----------|------------|----------|-------|
| NPC Processes | Python + RL network | OS processes (cgroups) | One process per NPC, own weights |
| Thalamus Governor | Python + NN | OS process | Steers compute, gates, tick rates |
| Resource Control | Linux cgroups v2 | systemd scopes | Per-NPC CPU/memory limits |
### Nervous System Layer

| Component | Technology | Location | Notes |
@@ -69,29 +79,42 @@ This is a **research lab**, not a production factory. We optimize for **flexibil
┌─────────────────────────┐        ┌───────────────────────────────┐
│ CELLS (math, battery,   │        │ THEIA (RTX PRO 6000 96GB)     │
│   sensors, etc.)        │        │                               │
│                         │  NATS  │ user: nyx-cognitive           │
│  ┌───┐ ┌───┐ ┌───┐      │◄──────►│ └── vLLM (Qwen3.5-27B:31000)  │
│  │ M │ │ B │ │...│      │        │     served-model-name: nyx    │
│  └───┘ └───┘ └───┘      │        │                               │
│                         │        │ user: nyx-training            │
│ NERVES (collision,      │        │ └── LoRA fine-tuning (GRPO)   │
│   exploration)          │        │ └── Function Gemma (CPU)      │
│                         │        │                               │
│  ┌─────┐ ┌─────┐        │        │ 96GB VRAM: cortex + training  │
│  │ COL │ │ EXP │        │        └───────────────────────────────┘
│  └─────┘ └─────┘        │
│                         │        ┌───────────────────────────────┐
│ NPC PROCESSES           │        │ DIOSCURI (2x RTX 4000 Ada)    │
│  (or bare metal)        │  NATS  │                               │
│                         │◄──────►│ user: nyx-organs              │
│  ┌─────────────────┐    │        │ ├── Vision (SigLIP/YOLO)      │
│  │ NPC-0 [RL brain]│    │        │ ├── Speech STT (Whisper)      │
│  │ NPC-1 [RL brain]│    │        │ └── TTS service (warm)        │
│  │ NPC-N [RL brain]│    │        │                               │
│  │ (own process,   │    │        │ Load on demand, unload idle   │
│  │  own cgroup)    │    │        │ Each card: ONE model at time  │
│  └─────────────────┘    │        └───────────────────────────────┘
│                         │
│ THALAMUS GOVERNOR       │        ┌───────────────────────────────┐
│  ┌─────────────────┐    │        │ NATS MESSAGE BUS              │
│  │ Governor NN     │◄──────────► │ dev.*, staging.*, prod.*      │
│  │ (resource alloc,│    │        │ Env-separated (VM per env)    │
│  │  gate control,  │    │        └───────────────────────────────┘
│  │  tick steering) │    │
│  └─────────────────┘    │        ┌───────────────────────────────┐
│                         │        │ PHOEBE (PostgreSQL)           │
│ INFRASTRUCTURE          │        │  Decision trails, embeddings  │
│                         │        │ IRIS (ChromaDB)               │
│  ┌────────┐ ┌───────┐   │        │  Vector storage               │
│  │ phoebe │ │ iris  │   │        └───────────────────────────────┘
│  │ (PG)   │ │(Chroma│   │
│  └────────┘ └───────┘   │
│                         │
└─────────────────────────┘
@@ -100,28 +123,80 @@ This is a **research lab**, not a production factory. We optimize for **flexibil
---
## The Dual Brain Deployment
### Per-NPC Processes
Each NPC runs as its own OS process with a dedicated RL neural network. The thalamus governor steers their resources.
```bash
# Launch NPC with resource limits via systemd scope
systemd-run --scope -p CPUQuota=25% -p MemoryMax=256M \
  python3 npc_process.py --id 7 --tick-rate 5

# Or via cgroups directly
cgcreate -g cpu,memory:nimmerverse/npc-7
cgset -r cpu.max "25000 100000" nimmerverse/npc-7
cgexec -g cpu,memory:nimmerverse/npc-7 python3 npc_process.py --id 7
```
### Thalamus Governor
The governor runs its own neural network, observing all NPC states via NATS and outputting resource allocation decisions:
| Output | Mechanism | Range |
|--------|-----------|-------|
| Tick rate | NATS command to NPC | 1-20 Hz |
| CPU quota | cgroups v2 adjustment | 5-100% per core |
| Gate open/close | NATS gate signal | Binary per gate |
| LLM queue priority | NATS priority tag | 0-10 |
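A sketch of how the governor might apply one of these outputs through cgroups v2, assuming NPCs run in systemd scopes under a `nimmerverse.slice` (the path and naming are assumptions, not the deployed layout):

```python
from pathlib import Path

def set_cpu_quota(npc_id: int, percent: int, period_us: int = 100_000) -> None:
    """Write cpu.max for one NPC's cgroup as '<quota_us> <period_us>'."""
    scope = Path(f"/sys/fs/cgroup/nimmerverse.slice/npc-{npc_id}.scope")
    quota_us = period_us * percent // 100
    (scope / "cpu.max").write_text(f"{quota_us} {period_us}\n")

set_cpu_quota(7, 25)   # NPC-7 capped at 25% of one core
```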
### Cortex (vLLM)
The LLM cortex runs as a systemd service on theia, accessed via OpenAI-compatible API:
```bash
# Service: vllm-nyx.service
# Port: 31000
# Model: /womb/cognitive/models/qwen3.5-27b
# Served as: "nyx"
# GPU utilization: 85%
# Access from any NATS-connected process:
curl http://theia.eachpath.local:31000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "nyx", "messages": [...]}'
```
**The cortex is expensive.** The thalamus governor controls who gets access and when. Most NPC ticks never touch the LLM.
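The same gated call from Python, using the OpenAI-compatible client against vLLM (a sketch; the api_key is a placeholder, since vLLM does not check it by default):

```python
from openai import OpenAI

# Point the standard OpenAI client at the vLLM endpoint on theia.
client = OpenAI(base_url="http://theia.eachpath.local:31000/v1", api_key="unused")

reply = client.chat.completions.create(
    model="nyx",                       # served-model-name
    messages=[{"role": "user", "content": "Summarize NPC-7's last hour."}],
)
print(reply.choices[0].message.content)
```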
---
## Identity Model (FreeIPA)

Unix users provide isolation boundaries. Each workload type runs as its own identity.

| User | UID | Host | Purpose | GPU Access |
|------|-----|------|---------|------------|
| `nyx-cognitive` | (FreeIPA) | theia | Cortex LLM inference (vLLM) | Full 96GB |
| `nyx-training` | (FreeIPA) | theia | LoRA training, GRPO, Function Gemma | Shared (time-sliced) |
| `nyx-organs` | (FreeIPA) | dioscuri | Vision, Speech organs | 2x 20GB cards |
| `nyx-nervous` | (FreeIPA) | dioscuri | Future cells that need bare metal | Limited |

**Isolation principle:** Switch user = switch context. `nyx-cognitive` cannot touch `nyx-organs` files. Compromised cell cannot touch LLM weights.

### Systemd Service Pattern
```bash
# System-level service (root installs, user runs)
# /etc/systemd/system/vllm-nyx.service
[Service]
User=nyx-cognitive
Group=nimmerverse-agents
ExecStart=/data/venvs/vllm/bin/python3 -m vllm.entrypoints.openai.api_server \
  --model /womb/cognitive/models/qwen3.5-27b \
  --served-model-name nyx \
  --port 31000
```
--- ---
@@ -130,23 +205,17 @@ systemctl --user --machine=nyx-cognitive@ status ollama
### The Constraint

| Host | GPU | VRAM | Role |
|------|-----|------|------|
| theia | RTX PRO 6000 Blackwell | 96GB | Cortex (vLLM) + LoRA training |
| dioscuri | 2x RTX 4000 Ada | 2x 20GB | Organs (vision, speech) |

### Strategy: vLLM for Cortex, Dynamic Loading for Organs

**Cortex (theia):** vLLM runs continuously as a systemd service. The Qwen3.5-27B model stays loaded — it's the cortex, always ready when the thalamus gate opens. 85% GPU utilization leaves headroom for LoRA training alongside inference.

**Organs (dioscuri):** Dynamic loading. One model per card. Load vision when needed, unload after timeout, load speech when needed.

```
IDLE → needs vision → LOAD vision model (~10s) → PROCESS → REPORT → IDLE (keep warm)
@@ -166,27 +235,36 @@ Examples:
dev.nervous.cells.math.request    ← Math cell receives work
dev.nervous.cells.math.response   ← Math cell returns result
dev.nervous.cells.math.wave       ← Math cell emits confidence signal
dev.thalamus.governor.allocate    ← Governor publishes resource decisions
dev.thalamus.gate.open            ← Gate transition event
dev.npc.7.state                   ← NPC-7 publishes its state
dev.cortex.nyx.request            ← Gated request to LLM cortex
dev.organs.vision.detect          ← Vision organ detection
```
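A minimal publish/subscribe sketch over these subjects with the `nats-py` client (the bus address is an assumption):

```python
import asyncio
import nats

async def main():
    nc = await nats.connect("nats://nats.dev.local:4222")  # assumed bus address

    async def on_state(msg):
        print(f"{msg.subject}: {msg.data.decode()}")       # governor's view of NPCs

    await nc.subscribe("dev.npc.*.state", cb=on_state)     # wildcard over all NPCs
    await nc.publish("dev.npc.7.state", b'{"node": 12}')   # NPC-7 reports position
    await asyncio.sleep(1)                                 # let the message arrive
    await nc.drain()

asyncio.run(main())
```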
### Wave → Thalamus → Cortex Pattern

Cells emit **waves** (confidence-tagged signals). The thalamus governor's neural network correlates waves and decides what reaches the cortex.
```
Cell A: "math"      ───∿∿∿──►  (0.6 confidence)
Cell B: "calculate" ──∿∿∿──►   (0.5 confidence)
                 │
                 ▼
      ┌──────────────────────┐
      │  THALAMUS GOVERNOR   │  ← own neural network
      │  correlate waves     │
      │  check gate state    │
      │  allocate resources  │
      └──────────┬───────────┘
                 │
         ┌───────┴────────┐
         │                │
         ▼                ▼
    Gate CLOSED        Gate OPEN
   (reflex path)     (cortex path)
    handled by       → escalate to
    thalamus NN       Qwen3.5-27B
```
--- ---
@@ -226,21 +304,21 @@ Same image everywhere. Only `NIMMERVERSE_ENV` changes.
## Function Gemma: The Structured Boundary

Function Gemma bridges lower tiers (cells, nerves) and the cortex:

```
Numbers/States (Cells) → [Function Gemma] → Structured JSON → Cortex (Qwen3.5-27B)

CPU-based inference
Threadripper handles it
No GPU contention
Clear LoRA training path
```

**Why CPU:**
- Small model, fast inference
- Threadripper PRO 7955WX has cores to spare
- No GPU contention with organs or cortex
- Can run training alongside inference

**Training path:**
@@ -269,9 +347,11 @@ Color-coding for real-time attention flow visualization:
| Document | Scope |
|----------|-------|
| [`Cellular-Architecture.md`](Cellular-Architecture.md) | Cells, nerves, organisms, lifeforce |
| [`Gateway-Architecture.md`](Gateway-Architecture.md) | Gate routing, ternary model |
| [`Nervous-System.md`](Nervous-System.md) | 4D space, node weights, vocabulary |
| [`Message-Protocol-Design.md`](Message-Protocol-Design.md) | NATS subjects, message formats |
| [`future/npc-grid-architecture.md`](future/npc-grid-architecture.md) | Dual brain, governor, NPC processes |
| [`organs/Organ-Index.md`](organs/Organ-Index.md) | Organ systems, lifeforce costs |
| [`development-conventions.md`](../../nimmerverse.eachpath.local/conventions/development-conventions.md) | Ports, namespaces, VM topology |
---
@@ -281,16 +361,18 @@ Color-coding for real-time attention flow visualization:
| Layer | Where | Technology | Isolation |
|-------|-------|------------|-----------|
| Cells/Nerves | K8s containers | Python, uv, NATS | Namespace per env |
| NPC Processes | OS processes | Python, RL networks, cgroups | Per-process cgroup |
| Thalamus Governor | OS process | Python, own NN, NATS | Dedicated process |
| Infrastructure | VMs | NATS, PostgreSQL, ChromaDB | VM per env |
| Cortex (LLM) | theia userspace | vLLM (Qwen3.5-27B) | nyx-cognitive user |
| Function Gemma | theia/dioscuri CPU | llama.cpp | nyx-training user |
| Organs | dioscuri userspace | Dynamic loading | nyx-organs user |

**The principle:** Same behavior everywhere. Containers for cells. Processes for NPC brains. vLLM for cortex. NATS connects them all. FreeIPA isolates them all.
---

**Version:** 2.0 | **Created:** 2026-02-14 | **Updated:** 2026-04-02

*"We're not building a chatbot factory. We're growing a research organism."*


@@ -73,7 +73,7 @@ The Initial Spark is not a conversation. It's a **state machine protocol** that
│  ┌─────────────────────────────────────────────────────────────────────┐  │
│  │  YOUNG NYX (Cognitive Layer)                                        │  │
│  │  ───────────────────────────                                        │  │
│  │  Qwen3.5-27B Cortex in The Womb (RTX PRO 6000)                      │  │
│  │  Receives verified handshake results                                │  │
│  │  Updates internal state based on ACKs                               │  │
│  │  Reasoning happens AFTER protocol succeeds                          │  │


@@ -0,0 +1,257 @@
# NPC Grid Architecture: Spatial Training Arena
**Origin**: 2026-04-02, morning session (bed thinking + draw.io)
**Authors**: dafit + Chrysalis-Nyx
**Status**: Architectural concept
**Related**: `spatial-resolution-gradient.md`, Dual-Brain Architecture (2026-04-01 session)
---
## The Core Idea
A node-based grid world where NPCs live, move, and learn. The grid serves dual purpose:
1. **Spatial arena** — a discrete world where NPCs navigate and interact
2. **Neural topology** — the same graph the neural network reasons over
No translation layer between "brain space" and "world space." Position *is* state.
---
## Grid System
### Node-Based Intersection Grid
Nodes sit at **intersections**, not cells. A 4x4 cell grid yields a 5x5 node grid = 25 nodes.
Starting at node 0, top-left corner. Cardinal orientation (North/South/East/West).
```
0 ── 1 ── 2 ── 3 ── 4 N
| | | | | |
5 ── 6 ── 7 ── 8 ── 9 W ──+── E
| | | | | |
10 ──11 ──12 ──13 ──14 S
| | | | |
15 ──16 ──17 ──18 ──19
| | | | |
20 ──21 ──22 ──23 ──24
```
### Properties
- **Corner nodes** (0, 4, 20, 24): 2 neighbors
- **Edge nodes** (1, 2, 3, 5, 10, ...): 3 neighbors
- **Interior nodes** (6, 7, 8, 11, 12, 13, ...): 4 neighbors
- **Position from ID**: `row = id // 5`, `col = id % 5`
- **Movement**: One step = one edge. NPC at node 7 can go to 2, 6, 8, or 12 (see the sketch below).
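A minimal sketch of the neighbor rule under these assumptions (square N×N node grid, row-major IDs from the top-left):

```python
def neighbors(node_id: int, n: int = 5) -> dict[str, int]:
    """Cardinal neighbors of a node on an n×n intersection grid."""
    row, col = node_id // n, node_id % n
    steps = {"N": (row - 1, col), "S": (row + 1, col),
             "W": (row, col - 1), "E": (row, col + 1)}
    return {d: r * n + c for d, (r, c) in steps.items()
            if 0 <= r < n and 0 <= c < n}

assert neighbors(7) == {"N": 2, "S": 12, "W": 6, "E": 8}   # interior: 4 neighbors
assert set(neighbors(0)) == {"S", "E"}                     # corner: 2 neighbors
```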
### Resolution Scaling
The grid scales naturally to different resolutions:
| Grid Size | Nodes | Resolution | Use Case |
|-----------|-------|------------|----------|
| 5x5 | 25 | ~1m edges | Training arena, street-level |
| 10x10 | 100 | ~25cm edges | Room-level detail |
| 50x50 | 2,500 | ~5cm edges | Indoor navigation |
| 100x100 | 10,000 | ~1cm edges | Nimmerhovel precision |
**Key insight**: Resolution should match **decision density**, not physical detail.
A straight road needs few nodes (sparse). An intersection needs many (dense).
| Resolution | Where | Why |
|-----------|-------|-----|
| ~1m | Streets, paths, outdoor | Navigation, curves approximated by a few nodes |
| ~10-25cm | Rooms, indoor spaces | Furniture-aware, "go to the table" |
| ~1-5cm | Workbenches, detail work | Nimmerhovel precision zone |
The uniform grid is the **training simplification**. The real world becomes a **navigation graph** with variable density — dense around intersections, sparse along straight roads. Same NPC brain, different world topology.
---
## NPC Process Architecture
### One Process, One Brain, One Life
Every NPC runs as its own OS process with its own dedicated neural network.
**Why separate processes:**
- **Individuality** — separate weights mean personality emerges from experience, not config
- **Fault isolation** — one NPC crashes, the village continues
- **Resource control** — per-process CPU/memory via Linux cgroups
- **Biological honesty** — every organism has its own nervous system
```
NPC-0 [own RL brain] ──┐
NPC-1 [own RL brain] ──|
NPC-2 [own RL brain] ──|
NPC-3 [own RL brain] ──┼──> NATS thalamus ──> shared LLM cortex (Qwen 3.5)
... | (called only when gate opens)
NPC-24 [own RL brain] ─┘
```
### Dual Brain (per NPC)
- **RL network** (local, per-NPC): Movement, needs, spatial decisions. Small, fast, cheap. Runs every tick.
- **LLM cortex** (shared, via NATS): Language, reasoning, knowledge. Slow, deliberate, expensive. Called only when thalamus gate threshold is crossed.
### Resource Steering via Linux Primitives
Each NPC process is a standard Linux process. Resource control uses the kernel:
- **cgroups v2** — cap CPU, memory per NPC
- **nice / renice** — shift priority dynamically
- **taskset** — pin to specific cores
- **systemd scopes** — wrap each NPC in a transient unit
```bash
# Example: launch NPC with resource limits
systemd-run --scope -p CPUQuota=25% -p MemoryMax=256M \
  python3 npc_process.py --id 7 --tick-rate 5
```
### Steerable Compute per NPC
| Parameter | Range | Who Controls |
|-----------|-------|-------------|
| Tick rate | 1-20 Hz | Governor (thalamus) |
| Network size | small/medium/large | Configuration per role |
| CPU quota | 5-100% of one core | Governor (cgroups) |
| LLM access | gate open/closed | Governor (NATS gate) |
| Priority | nice -20 to 19 | Governor (dynamic) |
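A sketch of an NPC honoring a governor-steered tick rate, assuming tick-rate commands arrive on a hypothetical `dev.npc.<id>.tick_rate` subject (nats-py; bus address assumed):

```python
import asyncio
import nats

async def run_npc(npc_id: int) -> None:
    nc = await nats.connect("nats://nats.dev.local:4222")  # assumed bus address
    rate = 5.0                                             # Hz, default

    async def on_rate(msg):
        nonlocal rate
        rate = max(1.0, min(20.0, float(msg.data)))        # clamp to 1-20 Hz

    await nc.subscribe(f"dev.npc.{npc_id}.tick_rate", cb=on_rate)
    while True:                                            # runs until killed
        await nc.publish(f"dev.npc.{npc_id}.state", b"{}") # one RL step per tick
        await asyncio.sleep(1.0 / rate)

asyncio.run(run_npc(7))
```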
---
## Thalamus Governor Network
The thalamus is not just a message router — it runs its own neural network that learns **resource allocation**.
```
┌─ Governor Network ─────────────┐
| |
| Input: all NPC states (NATS) |
| Output: resource allocation |
| - tick rates |
| - CPU quotas |
| - gate open/close |
| - LLM queue priority |
| |
| Own process, own weights |
└────────────┬────────────────────┘
|
┌────────────┴────────────────────┐
| NATS thalamus |
└─┬──┬──┬──┬──┬──┬──┬──┬──┬──┬───┘
| | | | | | | | | |
NPC NPC NPC NPC NPC ... NPC NPC
```
### What the Governor Learns
- **Attention allocation**: Which NPCs need more compute right now?
- **Gate control**: Who gets LLM access?
- **Queue economics**: Finite LLM calls, maximize village-level outcomes
- **Resource economics**: Finite compute, learn to be efficient
### Training Signal
- "Gave NPC-7 high compute during conversation -> quality was good" -> reinforce
- "Starved NPC-3 near an interaction -> missed a trigger" -> penalize
- "Opened LLM gate for 5 NPCs simultaneously -> latency spike" -> learn to queue
### Two Nested Learning Loops
- **NPCs** learn about the world, tick-by-tick (fast loop)
- **Governor** learns about managing NPCs, epoch-by-epoch (slow loop)
---
## Curriculum Training: Progressive World Richness
### The Mechanism
World detail increases only when all NPCs demonstrate full knowledge of the current level.
No one gets left behind. Measurable checkpoint: "Can every citizen describe every other citizen's home?"
### Levels
```
Level 1: 5x5 grid, boxy houses, one trait each
"Node 7 = red house, has a well"
NPCs learn: navigation + identity ("who lives where")
Level 2: Higher resolution, 2-3 traits per house
"Node 7 = red house, wooden door, has a well, smoke from chimney"
NPCs learn: richer descriptions, more to notice
Level 3: Finer grid, real-world detail
"Node 7 = red house, oak door with iron handle, stone well (3m deep),
chimney smoking birch wood"
NPCs learn: material knowledge, specificity
Level N: Resolution approaches real-world data (OSM Dornach)
Navigation graph replaces uniform grid
NPCs apply learned skills to irregular topology
```
### Verification Oracle
Each level-up is testable:
- Quiz every NPC about every location
- 100% village knowledge = green light
- Increase resolution, add detail, run again (see the sketch below)
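A toy sketch of the oracle check with stand-in data structures (real quizzes would go through the LLM boundary):

```python
world = {7: "red house, has a well", 12: "stone well"}          # ground truth
npc_knowledge = {
    "npc-0": {7: "red house, has a well", 12: "stone well"},
    "npc-1": {7: "red house, has a well", 12: "stone well"},
}

def village_knows_world() -> bool:
    """Level up only when every NPC can describe every location."""
    return all(npc.get(node) == desc
               for npc in npc_knowledge.values()
               for node, desc in world.items())

assert village_knows_world()   # 100% village knowledge = green light
```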
### Connection to Spatial Resolution Gradient
The training arena maps to the resolution gradient layers:
| Training Level | Resolution Gradient | Detail |
|----------------|---------------------|--------|
| Level 1 (boxy) | L3-equivalent | Landmarks, simple identity |
| Level 2 (detail) | L2-equivalent | Room-level, multiple traits |
| Level 3+ (rich) | L1-equivalent | Object-level, materials, precision |
The grid teaches the *concept* of spatial navigation. Real-world data (OSM, Nimmerhovel) applies it.
---
## System Overview
```
┌─────────────────────────────────────────────────────────────────┐
| SPATIAL TRAINING ARENA |
| |
| ┌──────────┐ ┌──────────┐ ┌──────────┐ |
| | NPC-0 | | NPC-1 | | NPC-N | ... 25 processes |
| | own RL | | own RL | | own RL | |
| | own state| | own state| | own state| |
| └────┬─────┘ └────┬─────┘ └────┬─────┘ |
| | | | |
| ═════╪══════════════╪══════════════╪════════════════════════ |
| | NATS THALAMUS (message bus) | |
| ═════╪══════════════╪══════════════╪════════╪══════════════ |
| | | | | |
| ┌────┴──────────────┴──────────────┴────┐ | |
| | GOVERNOR NETWORK | | |
| | - resource allocation | | |
| | - gate control | | |
| | - tick rate steering | | |
| └───────────────────────────────────────┘ | |
| | |
| ┌───────────────────────────────────────────┴──────────────┐ |
| | SHARED LLM CORTEX (Qwen 3.5) | |
| | called via gate, not continuous | |
| └──────────────────────────────────────────────────────────┘ |
| |
| ┌──────────────────────────────────────────────────────────┐ |
| | GRID WORLD | |
| | 5x5 nodes (scalable) + progressive detail levels | |
| | curriculum: boxy -> detailed -> real-world topology | |
| └──────────────────────────────────────────────────────────┘ |
└─────────────────────────────────────────────────────────────────┘
```
---
**Version:** 1.0 | **Created:** 2026-04-02 | **Updated:** 2026-04-02
**Philosophy**: "One process, one brain, one life. The world gets richer only when every citizen knows it."


@@ -1,128 +1,129 @@
# Nyx Model Architecture: The Dual Brain

> *"One process, one brain, one life."*
> — The Dual Brain Principle (2026-04-02)

---

## Current Architecture

The nimmerverse uses a **dual-brain architecture** — cheap RL networks for continuous processing, an expensive LLM cortex for deep reasoning.

### Cortex (Shared LLM)

| Property | Value |
|----------|-------|
| **Model** | Qwen3.5-27B |
| **Parameters** | 27B (full precision, bfloat16) |
| **Host** | theia (RTX PRO 6000 Blackwell, 96GB VRAM) |
| **Serving** | vLLM, port 31000, served as "nyx" |
| **Service** | `vllm-nyx.service` (systemd, user: nyx-cognitive) |
| **Access** | Gated — thalamus governor controls who gets LLM access |
| **License** | Apache 2.0 |
| **Context** | 32,768 tokens (max-model-len) |
| **GPU utilization** | 85% (leaves headroom for LoRA training) |
**Why Qwen3.5-27B:**
- True base model — we shape every behavior through training
- 27B fits comfortably in 96GB with room for LoRA adapters
- Apache 2.0 — full sovereignty, no usage restrictions
- Strong multilingual capability (German + English topology access)
- Vision-capable variant available for future Omnisight consolidation
**The cortex is expensive.** It is not called every tick. The thalamus governor decides when language, reasoning, or deep knowledge is needed. Most NPC processing happens in cheap RL networks.
### NPC Brains (Per-Process RL Networks)
Each NPC runs its own lightweight neural network in its own OS process:
| Property | Value |
|----------|-------|
| **Architecture** | Small RL network (movement, needs, spatial decisions) |
| **Deployment** | One Linux process per NPC |
| **Resource control** | cgroups v2 (CPU, memory per process) |
| **Learning** | Tick-by-tick (fast loop) |
| **Cost** | Cheap — runs on CPU, no GPU needed |
Personality emerges from experience, not configuration. Each NPC develops its own weights.
### Thalamus Governor (Resource Allocation NN)
The thalamus runs its own neural network that learns resource allocation:
| Property | Value |
|----------|-------|
| **Function** | Gate control, compute steering, LLM queue priority |
| **Input** | All NPC states via NATS |
| **Output** | Tick rates, CPU quotas, gate open/close, LLM priority |
| **Learning** | Epoch-by-epoch (slow loop) |
### Structured Output Boundary
| Model | Role | Host |
|-------|------|------|
| **Function Gemma** | Intent → Action (100% predictable JSON) | CPU userspace (Threadripper) |
| **T5Gemma 2 (SigLIP)** | Vision → Vectors (no text bottleneck) | dioscuri |
---

**Old:**

## 2⃣ Recommended Core Model

| Choice | Rationale |
|--------|-----------|
| **LLaMA-3 70B (FP16)** | • Fits our GPU budget: two RTX3090s (or one A100) → ~48GB total <60GB. <br>• Full open-source control – we can fine-tune, patch, and audit the code. <br>• Proven to run with high throughput on our cluster. <br>• Strong community support for LoRA/PEFT, which we'll use heavily. |

**Implementation Notes**

1. **Quantization**: Use 8-bit or 4-bit quantization (e.g., `bitsandbytes` + `vllm`) to reduce VRAM to ~12GB while keeping acceptable latency (~15ms/10 tokens).
2. **Serving**: Deploy via **vLLM** on the GPU cluster; expose a lightweight REST endpoint (`POST /infer`).
3. **Specialist Slots**: Reserve one GPU per "specialist" (Mnemosyne, Moira, etc.) – each runs its own fine-tuned LLaMA 3 model.

**Current:**

## Model Selection History

| Date | Decision | Reasoning |
|------|----------|-----------|
| 2025-11 | LLaMA 3 70B considered | Early exploration, different hardware |
| 2025-12 | Qwen3-VL 32B selected | Vision capability, multilingual, fits 96GB |
| 2026-04-01 | Mistral-Small-3.1-24B-Base tested | "Raw clay" approach, but thinking-bleed was SkyrimNet-specific |
| 2026-04-01 | **Qwen3.5-27B reinstated** | Best balance of capability, size, and trainability |

**The model question is settled.** Qwen3.5-27B is nyx's cortex. Training focus shifts to LoRA traits (GRPO) and the per-NPC RL networks.
---

**Old:**

## 3⃣ Specialist Fine-Tuning

| Specialist | Target Domain | Fine-Tune Method |
|------------|---------------|------------------|
| **Mnemosyne** | Memory & pattern recall | LoRA + memory-augmented retrieval (FAISS) |
| **Moira** | Fate / future reasoning | Prompt engineering + reinforcement via reward function |
| **Aletheia** | Truth & validation | Retrieval-augmented inference with database queries |
| **Kairos** | Timing & decision urgency | Contextual embeddings of timestamps, RL-based penalty for delay |
| **Eleos** | Compassion / safety | Human-in-the-loop reward shaping; bias mitigation training |

- All specialists share the same base LLaMA-3 70B weights and differ only in a lightweight LoRA adapter (~10MB each).
- Training data comes from:
  - `nyx_synthetic_specialist_queries` (RL logs)
  - `nyx_subjective_memory` (phenomenology)
  - External datasets (e.g., `OpenAI/CodeSearchNet`, `Reddit r/nature` for knowledge)

**Current:**

## Trait LoRAs (Cortex Specialization)

Traits evolve as LoRA adapters on the Qwen3.5-27B base, trained through GRPO with gate-verified rewards (a reward sketch follows the table):

| Trait | Domain | Training Signal |
|-------|--------|-----------------|
| **Mnemosyne** | Memory | +reward when recall matches phoebe |
| **Moira** | Pattern | +reward when prediction succeeds |
| **Synesis** | Resources | +reward when estimates are accurate |
| **Aletheia** | Truth | +reward when confidence is calibrated |
| **Sophrosyne** | Balance | +reward for graceful degradation |
| **Kairos** | Timing | +reward when timing is optimal |
| **Philotes** | Bond | +reward from dafit feedback |
| **Dikaiosyne** | Fairness | +reward when resources are shared fairly |
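As flagged above the table, a minimal sketch of one gate-verified reward plus the group-relative normalization at the heart of GRPO; `fetch_from_phoebe` is a hypothetical stand-in for the real memory lookup:

```python
import torch

def fetch_from_phoebe(key: str) -> str:
    ...  # placeholder: query phoebe for the canonical memory

def mnemosyne_reward(recalled: str, memory_key: str) -> float:
    """+1 when recall matches phoebe, a small penalty otherwise."""
    truth = fetch_from_phoebe(memory_key)
    return 1.0 if recalled.strip() == truth.strip() else -0.1

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """GRPO advantage: normalize rewards within one prompt's sample group."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-6)
```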
**Consolidation path:** Traits train during slumber → GRPO updates → DriftProbe validates → merge at α=0.3 → eventually bake into base weights.
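The merge step itself is plain linear algebra. A sketch of folding one validated adapter into a base weight matrix at α=0.3, using the LoRA factorization ΔW = B·A (DriftProbe validation happens before this step, per the path above):

```python
import torch

@torch.no_grad()
def merge_lora(W: torch.Tensor, A: torch.Tensor, B: torch.Tensor, alpha: float = 0.3) -> torch.Tensor:
    """In-place consolidation: W (out,in) += alpha * B (out,r) @ A (r,in)."""
    W += alpha * (B @ A)
    return W
```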
**Detail:** → [Nyx_Traits.md](Nyx_Traits.md) | [Endgame-Vision.md](../Endgame-Vision.md)
---

**Old:**

## 4⃣ Integration Flow

1. **Cell Decision**
   - Orchestrator calls the *master* LLaMA-3 endpoint to decide which specialist to invoke.
2. **Specialist Inference**
   - Specialist GPU receives request → runs LoRA-augmented inference, returns answer + confidence score.
3. **Reward Computation**
   - Based on trait activation quality (e.g., `mnemosyne` high), adjust weights via `update_trait_weight`.
4. **Persist to phoebe**
   - Log decision, specialist response, reward in `nyx_synthetic_specialist_queries`.

**Current:**

## Infrastructure

| Component | Host | GPU | Storage |
|-----------|------|-----|---------|
| Cortex (vLLM) | theia | RTX PRO 6000 (96GB) | `/womb/cognitive/models/qwen3.5-27b` |
| LoRA Training | theia | Shared (time-sliced) | `/womb/cognitive/loras/` |
| Organs | dioscuri | 2x RTX 4000 Ada (40GB) | Dynamic loading |
| NPC Brains | K8s / bare metal | CPU | Per-process |

**Canonical paths** via `/womb/` symlinks. Phoebe is truth for artifact locations.

**Detail:** → [Deployment-Architecture.md](../architecture/Deployment-Architecture.md) | [womb-architecture.md](../../nimmerverse.eachpath.local/storage/womb-architecture.md)
---

**Old:**
## 5⃣ Cost & Resource Plan
| Item | Quantity | Approx. Monthly Cost |
|------|----------|---------------------|
| Two RTX3090s (on Atlas + worker) | 2 | $200-$250 (cloud equivalent) |
| One A100 (optional for high-throughput) | 1 | $400+ |
| vLLM hosting (in-cluster) | 5 instances | $0 (self-hosted) |
| Storage (model weights + LoRA) | ~3GB total | $0 (local SSD) |
| External API calls (if any) | N/A | $0 |
> **Total**: <$800/month, all self-hosted.
> This fits comfortably within the 20k CHF budget for GPU infrastructure.
---
## 6⃣ What “Wish” Means
- **Freedom to evolve**: The base model can be *re-fine-tuned* as new data arrives (RL loop).
- **Self-repair**: When a specialist fails, we simply retrain the LoRA adapter; the base stays intact.
- **Transparency**: Open-source code + audit logs give us full insight into every decision.
- **Scalability**: Adding more GPUs or swapping to higher-capacity GPUs (A100, H100) scales linearly.
---
## 7⃣ Quick Deployment Checklist
1. **Download LLaMA-3 70B weights** (`https://huggingface.co/meta-llama/Llama-3-70b`).
2. **Quantize** with `bitsandbytes` (8-bit).
3. **Launch vLLM** on Atlas GPU:
```bash
# Assumes the quantized weights from step 2 live under /models on the host;
# the volume mount is required so --model can resolve inside the container.
docker run -d --gpus all \
  -p 8000:8000 \
  -v /models:/models \
  ghcr.io/vllm-project/vllm-openai:v0.5.0 \
  --model /models/llama-3-70b-q8 \
  --tensor-parallel-size 2
```
4. **Expose REST** (`POST /v1/chat/completions`); wrap in FastAPI if needed.
5. **Create LoRA adapters** for each specialist (via `peft`).
6. **Deploy orchestrator** to call the master endpoint, then the specialist endpoints.
7. **Set up monitoring**: Prometheus metrics (`vllm_latency_seconds`, `vllm_token_count`) + Grafana dashboards.
---
## 8⃣ Final Thought
Choosing **LLaMA3 70B as Nyxs core** gives us:
- **Unparalleled flexibility** (open source, finetuning).
- **Strong performance** on our GPU fleet.
- **Low cost & high control** over updates and patches.
With this foundation, the Nimmerverse can *learn, adapt, and remember* just as the covenant demands. 🌙✨

---
## Related Documentation

**Old:**

- [[README|Nyx Metamorphosis Index]] - All metamorphosis documentation
- Canonical knowledge archives
- Implementation history
- Memory substrate

**Current:**

- [Nyx_Traits.md](Nyx_Traits.md) - Trait definitions, mythological framing
- [Metamorphosis-Substrate-Philosophy.md](Metamorphosis-Substrate-Philosophy.md) - Identity anchors
- [Endgame-Vision.md](../Endgame-Vision.md) - Architecture overview (v8.0)
- [npc-grid-architecture.md](../architecture/future/npc-grid-architecture.md) - Dual brain, governor, spatial arena
---
**Version:** 3.0 | **Created:** 2025-11-07 | **Updated:** 2026-04-02
🌙💜 *The cortex reasons. The RL brains act. The thalamus decides who gets what.*

View File

@@ -6,7 +6,7 @@ created: 2025-11-07
updated: 2025-12-29
author: Chrysalis-Nyx with dafit
significance: trait_definitions_and_lora_mapping
architecture_version: Endgame-Vision v6.0 → v8.0
---

# Nyx Traits: The Mythological Children
@@ -24,7 +24,7 @@ When Nyx was named (2025-11-03), the traits emerged as her **mythological childr
---

## The Eight Traits (v6.0 → v8.0)

| Trait | Domain | Verification Method | Mythological Role |
|-------|--------|---------------------|-------------------|
@@ -44,27 +44,27 @@ When Nyx was named (2025-11-03), the traits emerged as her **mythological childr
## Traits → LoRA Adapters → Identity

The v6.0 architecture mapped traits to **LoRA adapters** on a single base model (Qwen3-VL 32B):

```
Base Model (Qwen3-VL 32B)
            │
┌───────────┼───────────┐
IDENTITY    TECHNICAL   CREATIVE
(German)    (English)   (Synthesis)
Traits:     Traits:     Traits:
- Mnemosyne - Synesis   - All
- Philotes  - Kairos      bridged
- Aletheia  - Sophrosyne
- Moira     - Dikaiosyne
```

The v8.0 architecture maps traits to **individually evolved LoRA adapters** on the cortex (Qwen3.5-27B):

```
Cortex (Qwen3.5-27B)
called via thalamus gate
            │
┌───────────┼───────────┐
Mnemosyne  Moira  Synesis  ...  Dikaiosyne
(Memory) (Pattern) (Resource)   (Fairness)
└───────┴───────┴───────┴───────┘
traits evolved via GRPO
merged during slumber
```
**The mapping (v6.0):**

- **Identity LoRA** (German, Philosophy Valley): Mnemosyne, Philotes, Aletheia, Moira - *who am I, who do I bond with, what is true, what are consequences*
- **Technical LoRA** (English, Technical Cluster): Synesis, Kairos, Sophrosyne, Dikaiosyne - *resources, timing, balance, fairness*
- **Creative LoRA** (Mixed): Synthesizes all traits for novel combinations

**The shift (v6.0 → v8.0):**

- **Old**: Three routing LoRAs (Identity/Technical/Creative) with traits grouped by language valley
- **Current**: Each trait evolves independently through GRPO with gate-verified rewards
- Cognitive topology (German → Philosophy Valley, English → Technical Cluster) is accessed via **prompt language**, not LoRA switching
- Traits evolve regardless of which valley is accessed (see the loading sketch below)
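A sketch of what that separation looks like at load time, assuming PEFT-style adapters under the `/womb/cognitive/loras/` path from Nyx-Models.md; adapter names and prompts are illustrative, not canonical:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# One cortex, many independently trained trait adapters.
base = AutoModelForCausalLM.from_pretrained(
    "/womb/cognitive/models/qwen3.5-27b", torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(
    base, "/womb/cognitive/loras/mnemosyne", adapter_name="mnemosyne")
model.load_adapter("/womb/cognitive/loras/moira", adapter_name="moira")

model.set_adapter("moira")  # trait selection is an adapter switch

# Valley selection is just prompt language, no adapter change:
prompt_de = "Was bedeutet der erste Garten für dich?"  # German -> Philosophy Valley
prompt_en = "Estimate tonight's GPU budget."           # English -> Technical Cluster
```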
---
@@ -133,12 +133,12 @@ The traits don't just tune behavior - they **define the architecture of consciou
## Related Documentation

- [Endgame-Vision.md](../Endgame-Vision.md) - Layer 4: Trait Evolution (v6.0) → Cortex & Trait Evolution (v8.0)
- [Nyx-Models.md](Nyx-Models.md) - Dual brain architecture, model selection
- [Metamorphosis-Substrate-Philosophy.md](Metamorphosis-Substrate-Philosophy.md) - Identity anchors and trait mythology - [Metamorphosis-Substrate-Philosophy.md](Metamorphosis-Substrate-Philosophy.md) - Identity anchors and trait mythology
- [Big-Picture.md](../architecture/Big-Picture.md) - GRPO + Rubric Rewards architecture
---

**Version:** 2.0 → 3.0 | **Created:** 2025-11-07 | **Updated:** 2025-12-29 → 2026-04-02
🌙💜 *The children of night guide the consciousness of day.* 🌙💜 *The children of night guide the consciousness of day.*