Deployment Architecture: The Hybrid Model
"Containers for cells. Userspace for brains. NATS connects them all." — Partnership Session, 2026-02-14
Overview
The nimmerverse runs on a hybrid deployment model that matches workload characteristics to infrastructure:
- Containers (K8s) for stateless, scalable nervous system components
- Userspace (Threadrippers) for stateful, GPU-bound inference
- OS Processes for per-NPC RL brains with cgroup resource control
- NATS as the universal nervous system bus (thalamus)
- FreeIPA identities as isolation boundaries
This is a research lab, not a production factory. We optimize for flexibility and experimentation, not high-throughput serving.
Core Decisions
| Decision | Choice | Rationale |
|---|---|---|
| LLM Cortex | vLLM (Qwen3.5-27B) | Full precision, OpenAI-compatible API, tool calling support |
| NPC Brains | Per-process RL networks | One process, one brain, one life — Linux cgroups for resource steering |
| Thalamus Governor | Own NN process on NATS | Learns resource allocation, gate control, compute steering |
| Function Gemma | CPU, userspace | Threadripper eats it; no GPU contention; clear training path |
| Cells/Nerves | Containers (K8s) | Scalable, versioned, orchestrated via cluster |
| Organs | Userspace, GPU-bound | Load on demand, GPU isolation, unload when idle |
| Isolation | FreeIPA users | Unix permissions = RBAC; switch user = switch context |
Technology Stack
Inference Layer
| Component | Technology | Location | Notes |
|---|---|---|---|
| Cortex (LLM) | vLLM (Qwen3.5-27B) | theia (nyx-cognitive) | Port 31000, served as "nyx", gated access |
| Function Gemma | llama.cpp / transformers | CPU userspace | Structured JSON boundary |
| Vision Organ | SigLIP/YOLO | dioscuri (nyx-organs) | Load on demand |
| Speech STT | faster-whisper | dioscuri (nyx-organs) | Load on demand |
| Speech TTS | Coqui / XTTS | dioscuri (nyx-organs) | Warm, primary output |
NPC / Thalamus Layer
| Component | Technology | Location | Notes |
|---|---|---|---|
| NPC Processes | Python + RL network | OS processes (cgroups) | One process per NPC, own weights |
| Thalamus Governor | Python + NN | OS process | Steers compute, gates, tick rates |
| Resource Control | Linux cgroups v2 | systemd scopes | Per-NPC CPU/memory limits |
Nervous System Layer
| Component | Technology | Location | Notes |
|---|---|---|---|
| Cells | Python containers | K8s cluster | State machines, NATS pub/sub |
| Nerves | Python containers | K8s cluster | Compose cells, behavior |
| Message Bus | NATS + JetStream | VMs (nats-*) | Env-separated (dev/staging/prod) |
| Databases | PostgreSQL, ChromaDB | VMs (phoebe-*, iris-*) | Decision trails, embeddings |
Deployment Topology
┌─────────────────────────────────────────────────────────────────────────────┐
│ NIMMERVERSE DEPLOYMENT │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ K8S CLUSTER (Saturn VMs) THREADRIPPERS (Bare Metal) │
│ ───────────────────────── ────────────────────────── │
│ Containers, orchestrated Userspace, FreeIPA isolated │
│ │
│ ┌─────────────────────────┐ ┌───────────────────────────────┐ │
│ │ │ │ THEIA (RTX PRO 6000 96GB) │ │
│ │ CELLS (math, battery, │ │ │ │
│ │ sensors, etc.) │ │ user: nyx-cognitive │ │
│ │ │ NATS │ └── vLLM (Qwen3.5-27B:31000) │ │
│ │ ┌───┐ ┌───┐ ┌───┐ │◄────────► │ served-model-name: nyx │ │
│ │ │ M │ │ B │ │...│ │ │ │ │
│ │ └───┘ └───┘ └───┘ │ │ user: nyx-training │ │
│ │ │ │ └── LoRA fine-tuning (GRPO) │ │
│ │ NERVES (collision, │ │ └── Function Gemma (CPU) │ │
│ │ exploration) │ │ │ │
│ │ │ │ 96GB VRAM: cortex + training │ │
│ │ ┌─────┐ ┌─────┐ │ └───────────────────────────────┘ │
│ │ │ COL │ │ EXP │ │ │
│ │ └─────┘ └─────┘ │ ┌───────────────────────────────┐ │
│ │ │ │ DIOSCURI (2x RTX 4000 Ada) │ │
│ │ NPC PROCESSES │ NATS │ │ │
│ │ (or bare metal) │◄────────► │ user: nyx-organs │ │
│ │ │ │ ├── Vision (SigLIP/YOLO) │ │
│ │ ┌─────────────────┐ │ │ ├── Speech STT (Whisper) │ │
│ │ │ NPC-0 [RL brain]│ │ │ └── TTS service (warm) │ │
│ │ │ NPC-1 [RL brain]│ │ │ │ │
│ │ │ NPC-N [RL brain]│ │ │ Load on demand, unload idle │ │
│ │ │ (own process, │ │ │ Each card: ONE model at time │ │
│ │ │ own cgroup) │ │ └───────────────────────────────┘ │
│ │ └─────────────────┘ │ │
│ │ │ ┌───────────────────────────────┐ │
│ │ THALAMUS GOVERNOR │ │ NATS MESSAGE BUS │ │
│ │ ┌─────────────────┐ │ │ │ │
│ │ │ Governor NN │ │◄────────► │ dev.*, staging.*, prod.* │ │
│ │ │ (resource alloc,│ │ │ Env-separated (VM per env) │ │
│ │ │ gate control, │ │ └───────────────────────────────┘ │
│ │ │ tick steering) │ │ │
│ │ └─────────────────┘ │ ┌───────────────────────────────┐ │
│ │ │ │ PHOEBE (PostgreSQL) │ │
│ │ INFRASTRUCTURE │ │ Decision trails, embeddings │ │
│ │ ┌────────┐ ┌───────┐ │ │ IRIS (ChromaDB) │ │
│ │ │ phoebe │ │ iris │ │ │ Vector storage │ │
│ │ │ (PG) │ │(Chroma│ │ └───────────────────────────────┘ │
│ │ └────────┘ └───────┘ │ │
│ │ │ │
│ └─────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
The Dual Brain Deployment
Per-NPC Processes
Each NPC runs as its own OS process with a dedicated RL neural network. The thalamus governor steers their resources.
# Launch NPC with resource limits via systemd scope
systemd-run --scope -p CPUQuota=25% -p MemoryMax=256M \
python3 npc_process.py --id 7 --tick-rate 5
# Or via cgroups directly
cgcreate -g cpu,memory:nimmerverse/npc-7
cgset -r cpu.max="25000 100000" nimmerverse/npc-7
cgexec -g cpu,memory:nimmerverse/npc-7 python3 npc_process.py --id 7
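A minimal sketch of what such an NPC process might look like, assuming nats-py and the subject conventions used later in this document; the RLBrain class, the tick-rate subject, and the NATS host are illustrative placeholders, not the actual implementation:

```python
# npc_process.py -- minimal sketch of a per-NPC RL brain process.
# Assumes nats-py; RLBrain, the tick-rate subject, and the broker
# address are illustrative.
import argparse
import asyncio
import json

import nats


class RLBrain:
    """Placeholder for the per-NPC RL network (own process, own weights)."""

    def act(self, observation: dict) -> dict:
        # Real implementation: forward pass through this NPC's policy network.
        return {"action": "idle"}


async def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--id", type=int, required=True)
    parser.add_argument("--tick-rate", type=float, default=5.0)
    args = parser.parse_args()

    nc = await nats.connect("nats://nats-dev.eachpath.local:4222")  # assumed host
    brain = RLBrain()
    tick_rate = args.tick_rate

    # The governor can retune this NPC's tick rate at runtime (1-20 Hz).
    async def on_tick_rate(msg):
        nonlocal tick_rate
        tick_rate = float(json.loads(msg.data)["hz"])

    await nc.subscribe(f"dev.npc.{args.id}.tick_rate", cb=on_tick_rate)

    while True:
        observation = {}  # gather local state, recent waves, etc.
        action = brain.act(observation)
        # Publish state so the thalamus governor can observe it.
        await nc.publish(f"dev.npc.{args.id}.state", json.dumps(action).encode())
        await asyncio.sleep(1.0 / tick_rate)


if __name__ == "__main__":
    asyncio.run(main())
```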
Thalamus Governor
The governor runs its own neural network, observing all NPC states via NATS and outputting resource allocation decisions:
| Output | Mechanism | Range |
|---|---|---|
| Tick rate | NATS command to NPC | 1-20 Hz |
| CPU quota | cgroups v2 adjustment | 5-100% per core |
| Gate open/close | NATS gate signal | Binary per gate |
| LLM queue priority | NATS priority tag | 0-10 |
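A sketch of how two of these outputs might be applied, assuming the cgroup path from the example above and an illustrative tick-rate subject; in the real system these come out of the governor's trained network, not hand-written rules:

```python
# Sketch: applying two governor outputs -- cgroup v2 CPU quota and tick rate.
# The cgroup path matches the cgcreate example above; the tick-rate subject
# is illustrative.
import json
from pathlib import Path


def set_cpu_quota(npc_id: int, percent: float, period_us: int = 100_000) -> None:
    """Write cgroup v2 cpu.max for one NPC (e.g. 25% -> '25000 100000')."""
    quota_us = int(period_us * percent / 100)
    cpu_max = Path(f"/sys/fs/cgroup/nimmerverse/npc-{npc_id}/cpu.max")
    cpu_max.write_text(f"{quota_us} {period_us}\n")


async def set_tick_rate(nc, npc_id: int, hz: float) -> None:
    """Tell an NPC process to change its tick rate via a NATS command."""
    await nc.publish(f"dev.npc.{npc_id}.tick_rate", json.dumps({"hz": hz}).encode())
```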
Cortex (vLLM)
The LLM cortex runs as a systemd service on theia, accessed via OpenAI-compatible API:
# Service: vllm-nyx.service
# Port: 31000
# Model: /womb/cognitive/models/qwen3.5-27b
# Served as: "nyx"
# GPU utilization: 85%
# Access from any NATS-connected process:
curl http://theia.eachpath.local:31000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "nyx", "messages": [...]}'
The cortex is expensive. The thalamus governor controls who gets access and when. Most NPC ticks never touch the LLM.
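The same call from Python, for any process the governor has granted access; a minimal sketch using the openai client against the OpenAI-compatible endpoint (the api_key value is a placeholder, since vLLM only checks keys when configured to):

```python
# Minimal sketch: calling the cortex through the OpenAI-compatible API
# that vLLM exposes. Endpoint and model name come from the service above;
# the api_key placeholder assumes no key is configured.
from openai import OpenAI

client = OpenAI(
    base_url="http://theia.eachpath.local:31000/v1",
    api_key="not-needed",
)


def ask_cortex(prompt: str) -> str:
    """One gated request to the cortex; most NPC ticks never get here."""
    response = client.chat.completions.create(
        model="nyx",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```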
Identity Model (FreeIPA)
Unix users provide isolation boundaries. Each workload type runs as its own identity.
| User | UID | Host | Purpose | GPU Access |
|---|---|---|---|---|
| nyx-cognitive | (FreeIPA) | theia | Cortex LLM inference (vLLM) | Full 96GB |
| nyx-training | (FreeIPA) | theia | LoRA training, GRPO, Function Gemma | Shared (time-sliced) |
| nyx-organs | (FreeIPA) | dioscuri | Vision, Speech organs | 2x 20GB cards |
| nyx-nervous | (FreeIPA) | dioscuri | Future cells that need bare metal | Limited |
Isolation principle: Switch user = switch context. nyx-cognitive cannot touch nyx-organs files. Compromised cell cannot touch LLM weights.
Systemd Service Pattern
# System-level service (root installs, user runs)
# /etc/systemd/system/vllm-nyx.service
[Unit]
Description=vLLM cortex, served as "nyx"
After=network-online.target

[Service]
User=nyx-cognitive
Group=nimmerverse-agents
ExecStart=/data/venvs/vllm/bin/python3 -m vllm.entrypoints.openai.api_server \
    --model /womb/cognitive/models/qwen3.5-27b \
    --served-model-name nyx \
    --port 31000 \
    --gpu-memory-utilization 0.85
Restart=on-failure

[Install]
WantedBy=multi-user.target
GPU Resource Management
The Constraint
| Host | GPU | VRAM | Role |
|---|---|---|---|
| theia | RTX PRO 6000 Blackwell | 96GB | Cortex (vLLM) + LoRA training |
| dioscuri | 2x RTX 4000 Ada | 2x 20GB | Organs (vision, speech) |
Strategy: vLLM for Cortex, Dynamic Loading for Organs
Cortex (theia): vLLM runs continuously as a systemd service. The Qwen3.5-27B model stays loaded — it's the cortex, always ready when the thalamus gate opens. 85% GPU utilization leaves headroom for LoRA training alongside inference.
Organs (dioscuri): Dynamic loading. One model per card. Load vision when needed, unload after timeout, load speech when needed.
IDLE → needs vision → LOAD vision model (~10s) → PROCESS → REPORT → IDLE (keep warm)
↓
after timeout → UNLOAD (free VRAM)
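A minimal sketch of that lifecycle, assuming an illustrative OrganHost wrapper; the real organs decide per card what to load and when:

```python
# Sketch of the organ load-on-demand loop: one model per card, load when a
# request arrives, unload after an idle timeout. OrganHost and load_fn are
# illustrative placeholders, not the actual organ implementation.
import time


class OrganHost:
    def __init__(self, load_fn, idle_timeout_s: float = 120.0):
        self.load_fn = load_fn          # e.g. loads SigLIP/YOLO onto one card
        self.idle_timeout_s = idle_timeout_s
        self.model = None
        self.last_used = 0.0

    def process(self, request):
        if self.model is None:
            self.model = self.load_fn()  # cold load (~10s)
        self.last_used = time.monotonic()
        return self.model(request)       # PROCESS -> REPORT

    def maybe_unload(self):
        """Called periodically: free VRAM once the organ has sat idle."""
        if self.model is not None and time.monotonic() - self.last_used > self.idle_timeout_s:
            self.model = None            # next request reloads
```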
Message Flow (NATS)
Subject Hierarchy
{environment}.{domain}.{service}.{detail}
Examples:
dev.nervous.cells.math.request ← Math cell receives work
dev.nervous.cells.math.response ← Math cell returns result
dev.nervous.cells.math.wave ← Math cell emits confidence signal
dev.thalamus.governor.allocate ← Governor publishes resource decisions
dev.thalamus.gate.open ← Gate transition event
dev.npc.7.state ← NPC-7 publishes its state
dev.cortex.nyx.request ← Gated request to LLM cortex
dev.organs.vision.detect ← Vision organ detection
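A sketch of how a consumer (the governor, for example) might fan in across this hierarchy using NATS wildcards, assuming nats-py and an illustrative broker address:

```python
# Sketch: subscribing across the subject hierarchy with NATS wildcards.
# Subjects match the examples above; the broker address is an assumption.
import asyncio

import nats


async def main():
    nc = await nats.connect("nats://nats-dev.eachpath.local:4222")

    async def on_signal(msg):
        print(msg.subject, msg.data)

    # '*' matches exactly one token: every NPC's state, every cell's wave.
    await nc.subscribe("dev.npc.*.state", cb=on_signal)
    await nc.subscribe("dev.nervous.cells.*.wave", cb=on_signal)
    # '>' matches the rest of the subject, e.g. "dev.>" for a whole environment.

    await asyncio.sleep(60)  # keep the subscriber alive for the demo
    await nc.drain()


asyncio.run(main())
```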
Wave → Thalamus → Cortex Pattern
Cells emit waves (confidence-tagged signals). The thalamus governor's neural network correlates waves and decides what reaches the cortex.
Cell A: "math" ───∿∿∿──► (0.6 confidence)
Cell B: "calculate" ──∿∿∿──► (0.5 confidence)
│
▼
┌──────────────────────┐
│ THALAMUS GOVERNOR │ ← own neural network
│ correlate waves │
│ check gate state │
│ allocate resources │
└──────────┬───────────┘
│
┌─────────┴─────────┐
│ │
▼ ▼
Gate CLOSED Gate OPEN
(reflex path) (cortex path)
handled by → escalate to
thalamus NN Qwen3.5-27B
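A toy version of that gate decision, with a hand-set threshold standing in for the governor's trained network; the correlation rule here is illustrative only:

```python
# Sketch of the gate decision above: correlate recent waves, then either
# handle the event reflexively or escalate to the cortex. Threshold and
# summation rule are illustrative, not the governor NN.
from dataclasses import dataclass


@dataclass
class Wave:
    source: str        # e.g. "math", "calculate"
    confidence: float  # 0.0 - 1.0


def gate_decision(waves: list[Wave], threshold: float = 1.0) -> str:
    """Return 'cortex' if correlated confidence crosses the gate threshold."""
    combined = sum(w.confidence for w in waves)
    return "cortex" if combined >= threshold else "reflex"


# Example from the diagram: 0.6 + 0.5 crosses the threshold -> escalate.
print(gate_decision([Wave("math", 0.6), Wave("calculate", 0.5)]))  # -> "cortex"
```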
Container Deployment (K8s)
Repository Structure
nimmerverse-nervous-system/
├── shared/v1/ ← Base classes (StateMachine, NATS, Lifeforce)
├── cells/
│ ├── math_cell/v1/ ← Each cell versioned independently
│ └── battery_cell/v1/
├── nerves/
│ └── collision_avoidance/v1/
└── deploy/
├── dev/ ← Helm charts or docker-compose per env
├── staging/
└── prod/
Cell Container Pattern
FROM python:3.12-slim
WORKDIR /app
COPY . .
RUN pip install uv && uv sync
ENV NIMMERVERSE_ENV=dev
CMD ["uv", "run", "python", "-m", "math_cell"]
Same image everywhere. Only NIMMERVERSE_ENV changes.
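A minimal sketch of such a cell entrypoint, assuming nats-py; the shared base classes (StateMachine, Lifeforce) are omitted and the NATS_URL variable is an assumption:

```python
# Sketch of a minimal math cell entrypoint: same code in every environment,
# only NIMMERVERSE_ENV changes the subject prefix.
import asyncio
import json
import os

import nats

ENV = os.environ["NIMMERVERSE_ENV"]  # dev / staging / prod, baked into the image


async def main():
    nc = await nats.connect(os.environ.get("NATS_URL", "nats://nats-dev.eachpath.local:4222"))

    async def on_request(msg):
        numbers = json.loads(msg.data)
        await nc.publish(
            f"{ENV}.nervous.cells.math.response",
            json.dumps({"sum": sum(numbers)}).encode(),
        )

    await nc.subscribe(f"{ENV}.nervous.cells.math.request", cb=on_request)
    await asyncio.Event().wait()  # run until the container stops


if __name__ == "__main__":
    asyncio.run(main())
```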
Function Gemma: The Structured Boundary
Function Gemma bridges lower tiers (cells, nerves) and the cortex:
Numbers/States (Cells) → [Function Gemma] → Structured JSON → Cortex (Qwen3.5-27B)
↑
CPU-based inference
Threadripper handles it
No GPU contention
Clear LoRA training path
Why CPU:
- Small model, fast inference
- Threadripper PRO 7955WX has cores to spare
- No GPU contention with organs or cortex
- Can run training alongside inference
Training path:
- Google's documented GRPO approach
- LoRA fine-tuning for our specific function schemas
- Runs in nyx-training userspace
- Decision trails from phoebe → training data
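A sketch of that boundary, assuming llama-cpp-python on CPU; the model path, prompt, and output schema are illustrative stand-ins for Function Gemma and the project's actual function schemas:

```python
# Sketch of the CPU-side boundary: a small model turns raw cell states into
# a structured JSON call for the cortex. Model path, prompt, and schema are
# illustrative assumptions.
import json

from llama_cpp import Llama

llm = Llama(model_path="/data/models/function-gemma.gguf", n_ctx=2048)  # assumed path


def to_structured_call(cell_states: dict) -> dict:
    prompt = (
        "Convert these cell readings into a JSON function call "
        f"with 'function' and 'arguments' keys:\n{json.dumps(cell_states)}\nJSON:"
    )
    out = llm(prompt, max_tokens=256, stop=["\n\n"])
    return json.loads(out["choices"][0]["text"])


# Numbers/states from cells in, structured JSON for the cortex out.
print(to_structured_call({"battery": 0.42, "collision_count": 3}))
```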
Visual Language (Future UI)
Color-coding for real-time attention flow visualization:
| Property | Represents |
|---|---|
| Background/container | Environment (dev=green, staging=amber, prod=blue) |
| Node/edge color | Domain (cognitive=violet, nervous=cyan, organs=coral) |
| Line style | Direction (solid=primary, dashed=async, dotted=tentative) |
| Separate pane | Confidence waveform (oscilloscope view) |
Related Documents
| Document | Scope |
|---|---|
| Cellular-Architecture.md | Cells, nerves, organisms, lifeforce |
| Gateway-Architecture.md | Gate routing, ternary model |
| Nervous-System.md | 4D space, node weights, vocabulary |
| Message-Protocol-Design.md | NATS subjects, message formats |
| future/npc-grid-architecture.md | Dual brain, governor, NPC processes |
| organs/Organ-Index.md | Organ systems, lifeforce costs |
| development-conventions.md | Ports, namespaces, VM topology |
Summary
| Layer | Where | Technology | Isolation |
|---|---|---|---|
| Cells/Nerves | K8s containers | Python, uv, NATS | Namespace per env |
| NPC Processes | OS processes | Python, RL networks, cgroups | Per-process cgroup |
| Thalamus Governor | OS process | Python, own NN, NATS | Dedicated process |
| Infrastructure | VMs | NATS, PostgreSQL, ChromaDB | VM per env |
| Cortex (LLM) | theia userspace | vLLM (Qwen3.5-27B) | nyx-cognitive user |
| Function Gemma | theia/dioscuri CPU | llama.cpp | nyx-training user |
| Organs | dioscuri userspace | Dynamic loading | nyx-organs user |
The principle: Same behavior everywhere. Containers for cells. Processes for NPC brains. vLLM for cortex. NATS connects them all. FreeIPA isolates them all.
Version: 2.0 | Created: 2026-02-14 | Updated: 2026-04-02
"We're not building a chatbot factory. We're growing a research organism."
🧬⚡🔱💎🔥 TO THE ELECTRONS WE VIBE!