
Deployment Architecture: The Hybrid Model

"Containers for cells. Userspace for brains. NATS connects them all." — Partnership Session, 2026-02-14


Overview

The nimmerverse runs on a hybrid deployment model that matches workload characteristics to infrastructure:

  • Containers (K8s) for stateless, scalable nervous system components
  • Userspace (Threadrippers) for stateful, GPU-bound inference
  • OS Processes for per-NPC RL brains with cgroup resource control
  • NATS as the universal nervous system bus (thalamus)
  • FreeIPA identities as isolation boundaries

This is a research lab, not a production factory. We optimize for flexibility and experimentation, not high-throughput serving.


Core Decisions

| Decision | Choice | Rationale |
|---|---|---|
| LLM Cortex | vLLM (Qwen3.5-27B) | Full precision, OpenAI-compatible API, tool calling support |
| NPC Brains | Per-process RL networks | One process, one brain, one life; Linux cgroups for resource steering |
| Thalamus Governor | Own NN process on NATS | Learns resource allocation, gate control, compute steering |
| Function Gemma | CPU, userspace | Threadripper eats it; no GPU contention; clear training path |
| Cells/Nerves | Containers (K8s) | Scalable, versioned, orchestrated via cluster |
| Organs | Userspace, GPU-bound | Load on demand, GPU isolation, unload when idle |
| Isolation | FreeIPA users | Unix permissions = RBAC; switch user = switch context |

Technology Stack

Inference Layer

| Component | Technology | Location | Notes |
|---|---|---|---|
| Cortex (LLM) | vLLM (Qwen3.5-27B) | theia (nyx-cognitive) | Port 31000, served as "nyx", gated access |
| Function Gemma | llama.cpp / transformers | CPU userspace | Structured JSON boundary |
| Vision Organ | SigLIP/YOLO | dioscuri (nyx-organs) | Load on demand |
| Speech STT | faster-whisper | dioscuri (nyx-organs) | Load on demand |
| Speech TTS | Coqui / XTTS | dioscuri (nyx-organs) | Warm, primary output |

NPC / Thalamus Layer

| Component | Technology | Location | Notes |
|---|---|---|---|
| NPC Processes | Python + RL network | OS processes (cgroups) | One process per NPC, own weights |
| Thalamus Governor | Python + NN | OS process | Steers compute, gates, tick rates |
| Resource Control | Linux cgroups v2 | systemd scopes | Per-NPC CPU/memory limits |

Nervous System Layer

| Component | Technology | Location | Notes |
|---|---|---|---|
| Cells | Python containers | K8s cluster | State machines, NATS pub/sub |
| Nerves | Python containers | K8s cluster | Compose cells, behavior |
| Message Bus | NATS + JetStream | VMs (nats-*) | Env-separated (dev/staging/prod) |
| Databases | PostgreSQL, ChromaDB | VMs (phoebe-, iris-) | Decision trails, embeddings |

Deployment Topology

┌─────────────────────────────────────────────────────────────────────────────┐
│                        NIMMERVERSE DEPLOYMENT                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  K8S CLUSTER (Saturn VMs)              THREADRIPPERS (Bare Metal)          │
│  ─────────────────────────              ──────────────────────────          │
│  Containers, orchestrated               Userspace, FreeIPA isolated         │
│                                                                             │
│  ┌─────────────────────────┐           ┌───────────────────────────────┐   │
│  │                         │           │ THEIA (RTX PRO 6000 96GB)     │   │
│  │  CELLS (math, battery,  │           │                               │   │
│  │         sensors, etc.)  │           │ user: nyx-cognitive           │   │
│  │                         │    NATS   │ └── vLLM (Qwen3.5-27B:31000) │   │
│  │  ┌───┐ ┌───┐ ┌───┐     │◄────────► │     served-model-name: nyx   │   │
│  │  │ M │ │ B │ │...│     │           │                               │   │
│  │  └───┘ └───┘ └───┘     │           │ user: nyx-training            │   │
│  │                         │           │ └── LoRA fine-tuning (GRPO)   │   │
│  │  NERVES (collision,     │           │ └── Function Gemma (CPU)      │   │
│  │          exploration)   │           │                               │   │
│  │                         │           │ 96GB VRAM: cortex + training  │   │
│  │  ┌─────┐ ┌─────┐       │           └───────────────────────────────┘   │
│  │  │ COL │ │ EXP │       │                                               │
│  │  └─────┘ └─────┘       │           ┌───────────────────────────────┐   │
│  │                         │           │ DIOSCURI (2x RTX 4000 Ada)    │   │
│  │  NPC PROCESSES          │    NATS   │                               │   │
│  │  (or bare metal)        │◄────────► │ user: nyx-organs              │   │
│  │                         │           │ ├── Vision (SigLIP/YOLO)      │   │
│  │  ┌─────────────────┐   │           │ ├── Speech STT (Whisper)      │   │
│  │  │ NPC-0 [RL brain]│   │           │ └── TTS service (warm)        │   │
│  │  │ NPC-1 [RL brain]│   │           │                               │   │
│  │  │ NPC-N [RL brain]│   │           │ Load on demand, unload idle   │   │
│  │  │  (own process,  │   │           │ Each card: ONE model at a time│   │
│  │  │   own cgroup)   │   │           └───────────────────────────────┘   │
│  │  └─────────────────┘   │                                               │
│  │                         │           ┌───────────────────────────────┐   │
│  │  THALAMUS GOVERNOR      │           │ NATS MESSAGE BUS              │   │
│  │  ┌─────────────────┐   │           │                               │   │
│  │  │ Governor NN     │   │◄────────► │ dev.*, staging.*, prod.*      │   │
│  │  │ (resource alloc,│   │           │ Env-separated (VM per env)    │   │
│  │  │  gate control,  │   │           └───────────────────────────────┘   │
│  │  │  tick steering) │   │                                               │
│  │  └─────────────────┘   │           ┌───────────────────────────────┐   │
│  │                         │           │ PHOEBE (PostgreSQL)           │   │
│  │  INFRASTRUCTURE         │           │ Decision trails, embeddings   │   │
│  │  ┌────────┐ ┌───────┐  │           │ IRIS (ChromaDB)               │   │
│  │  │ phoebe │ │ iris  │  │           │ Vector storage                │   │
│  │  │ (PG)   │ │(Chroma│  │           └───────────────────────────────┘   │
│  │  └────────┘ └───────┘  │                                               │
│  │                         │                                               │
│  └─────────────────────────┘                                               │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

The Dual Brain Deployment

Per-NPC Processes

Each NPC runs as its own OS process with a dedicated RL neural network. The thalamus governor steers their resources.

# Launch NPC with resource limits via systemd scope
systemd-run --scope -p CPUQuota=25% -p MemoryMax=256M \
    python3 npc_process.py --id 7 --tick-rate 5

# Or via cgroups directly
cgcreate -g cpu,memory:nimmerverse/npc-7
cgset -r cpu.max "25000 100000" nimmerverse/npc-7
cgexec -g cpu,memory:nimmerverse/npc-7 python3 npc_process.py --id 7
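
For orientation, a minimal sketch of what an NPC process loop might look like. It assumes the nats-py client and a placeholder RLBrain class; the dev.npc.{id}.command subject is an assumption, mirroring the dev.npc.{id}.state subject defined under Message Flow below.

```python
# npc_process.py -- hypothetical skeleton of a per-NPC process.
# Assumes nats-py (pip install nats-py); RLBrain is a placeholder.
import argparse
import asyncio
import json

import nats


class RLBrain:
    """Stand-in for the per-NPC RL network (own weights, own life)."""

    def act(self, observation: dict) -> dict:
        return {"action": "idle"}  # a real brain runs a forward pass here


async def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--id", type=int, required=True)
    parser.add_argument("--tick-rate", type=float, default=5.0)  # Hz
    args = parser.parse_args()

    nc = await nats.connect("nats://localhost:4222")
    brain = RLBrain()
    tick_rate = args.tick_rate

    # The governor can retune this NPC's tick rate at runtime (1-20 Hz).
    async def on_command(msg) -> None:
        nonlocal tick_rate
        tick_rate = float(json.loads(msg.data)["tick_rate"])

    await nc.subscribe(f"dev.npc.{args.id}.command", cb=on_command)

    while True:
        observation = {}  # gather cell/nerve state here
        action = brain.act(observation)
        # Publish state so the thalamus governor can observe it.
        await nc.publish(
            f"dev.npc.{args.id}.state",
            json.dumps({"id": args.id, "action": action}).encode(),
        )
        await asyncio.sleep(1.0 / tick_rate)


if __name__ == "__main__":
    asyncio.run(main())
```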

Thalamus Governor

The governor runs its own neural network, observing all NPC states via NATS and outputting resource allocation decisions:

| Output | Mechanism | Range |
|---|---|---|
| Tick rate | NATS command to NPC | 1-20 Hz |
| CPU quota | cgroups v2 adjustment | 5-100% per core |
| Gate open/close | NATS gate signal | Binary per gate |
| LLM queue priority | NATS priority tag | 0-10 |
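
The network itself is out of scope here, but the actuation side is small. A sketch under two assumptions: the per-NPC cgroups live under /sys/fs/cgroup/nimmerverse/npc-{id} as created above, and NPCs listen for tick-rate commands on the hypothetical dev.npc.{id}.command subject.

```python
# Governor actuation sketch (hypothetical): apply one allocation decision
# by writing cgroup v2 control files and publishing NATS commands.
import json
from pathlib import Path

CGROUP_ROOT = Path("/sys/fs/cgroup/nimmerverse")  # assumed hierarchy


def set_cpu_quota(npc_id: int, percent: float, period_us: int = 100_000) -> None:
    """cgroups v2: cpu.max takes '<quota_us> <period_us>'."""
    quota_us = int(period_us * percent / 100)
    (CGROUP_ROOT / f"npc-{npc_id}" / "cpu.max").write_text(
        f"{quota_us} {period_us}\n")


async def apply_decision(nc, decision: dict) -> None:
    """decision is the NN's output, e.g.
    {"npc_id": 7, "cpu_percent": 25, "tick_rate": 5, "gate_open": False}
    """
    set_cpu_quota(decision["npc_id"], decision["cpu_percent"])
    await nc.publish(
        f"dev.npc.{decision['npc_id']}.command",
        json.dumps({"tick_rate": decision["tick_rate"]}).encode(),
    )
    # Announce the full decision for observability and decision trails.
    await nc.publish("dev.thalamus.governor.allocate",
                     json.dumps(decision).encode())
```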

Cortex (vLLM)

The LLM cortex runs as a systemd service on theia, exposed through an OpenAI-compatible API:

# Service: vllm-nyx.service
# Port: 31000
# Model: /womb/cognitive/models/qwen3.5-27b
# Served as: "nyx"
# GPU utilization: 85%

# Access from any NATS-connected process:
curl http://theia.eachpath.local:31000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "nyx", "messages": [...]}'

The cortex is expensive. The thalamus governor controls who gets access and when. Most NPC ticks never touch the LLM.
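
The same endpoint from Python, using the openai client against vLLM's OpenAI-compatible server (the api_key is a placeholder; vLLM only enforces one if started with --api-key):

```python
# Sketch: calling the cortex via its OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(
    base_url="http://theia.eachpath.local:31000/v1",
    api_key="unused",  # placeholder; vLLM checks it only if --api-key is set
)

resp = client.chat.completions.create(
    model="nyx",  # served-model-name from vllm-nyx.service
    messages=[{"role": "user", "content": "Summarize NPC-7's last hour."}],
)
print(resp.choices[0].message.content)
```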


Identity Model (FreeIPA)

Unix users provide isolation boundaries. Each workload type runs as its own identity.

| User | UID | Host | Purpose | GPU Access |
|---|---|---|---|---|
| nyx-cognitive | (FreeIPA) | theia | Cortex LLM inference (vLLM) | Full 96GB |
| nyx-training | (FreeIPA) | theia | LoRA training, GRPO, Function Gemma | Shared (time-sliced) |
| nyx-organs | (FreeIPA) | dioscuri | Vision, Speech organs | 2x 20GB cards |
| nyx-nervous | (FreeIPA) | dioscuri | Future cells that need bare metal | Limited |

Isolation principle: switch user = switch context. nyx-cognitive cannot touch nyx-organs files; a compromised cell cannot touch the LLM weights.

Systemd Service Pattern

# System-level service (root installs, user runs)
# /etc/systemd/system/vllm-nyx.service
[Service]
User=nyx-cognitive
Group=nimmerverse-agents
ExecStart=/data/venvs/vllm/bin/python3 -m vllm.entrypoints.openai.api_server \
    --model /womb/cognitive/models/qwen3.5-27b \
    --served-model-name nyx \
    --port 31000

GPU Resource Management

The Constraint

| Host | GPU | VRAM | Role |
|---|---|---|---|
| theia | RTX PRO 6000 Blackwell | 96GB | Cortex (vLLM) + LoRA training |
| dioscuri | 2x RTX 4000 Ada | 2x 20GB | Organs (vision, speech) |

Strategy: vLLM for Cortex, Dynamic Loading for Organs

Cortex (theia): vLLM runs continuously as a systemd service. The Qwen3.5-27B model stays loaded; it is the cortex, always ready when the thalamus gate opens. Capping GPU memory utilization at 85% leaves headroom for LoRA training alongside inference.

Organs (dioscuri): Dynamic loading. One model per card. Load vision when needed, unload after timeout, load speech when needed.

IDLE → needs vision → LOAD vision model (~10s) → PROCESS → REPORT → IDLE (keep warm)
                                                                      ↓
                                            after timeout → UNLOAD (free VRAM)
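
A sketch of that lifecycle as code, with a hypothetical OrganSlot wrapper; the real services would wrap SigLIP/YOLO/Whisper loaders and release VRAM on unload:

```python
# Hypothetical load-on-demand wrapper: one card, one model, idle timeout.
import time


class OrganSlot:
    def __init__(self, loader, idle_timeout_s: float = 300.0):
        self.loader = loader            # callable that loads the model (~10s)
        self.idle_timeout_s = idle_timeout_s
        self.model = None
        self.last_used = 0.0

    def process(self, payload):
        if self.model is None:          # LOAD on first use (slow path)
            self.model = self.loader()
        self.last_used = time.monotonic()
        return self.model(payload)      # PROCESS

    def maybe_unload(self) -> None:
        """Call periodically: UNLOAD once the organ has gone idle."""
        idle = time.monotonic() - self.last_used
        if self.model is not None and idle > self.idle_timeout_s:
            self.model = None  # with torch, also torch.cuda.empty_cache()
```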

Message Flow (NATS)

Subject Hierarchy

{environment}.{domain}.{service}.{detail}

Examples:
  dev.nervous.cells.math.request      ← Math cell receives work
  dev.nervous.cells.math.response     ← Math cell returns result
  dev.nervous.cells.math.wave         ← Math cell emits confidence signal
  dev.thalamus.governor.allocate      ← Governor publishes resource decisions
  dev.thalamus.gate.open              ← Gate transition event
  dev.npc.7.state                     ← NPC-7 publishes its state
  dev.cortex.nyx.request              ← Gated request to LLM cortex
  dev.organs.vision.detect            ← Vision organ detection
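
A minimal round trip against the math cell's subject, using nats-py. One simplification: the hierarchy above defines an explicit .response subject, while NATS's built-in request/reply routes the answer over an auto-generated inbox; a plain publish to ...math.response would work the same way.

```python
# Sketch: request/reply over the subject hierarchy (nats-py assumed).
import asyncio
import json

import nats


async def main() -> None:
    nc = await nats.connect("nats://localhost:4222")

    # Cell side: answer work arriving on the request subject.
    async def handle(msg) -> None:
        numbers = json.loads(msg.data)["numbers"]
        await msg.respond(json.dumps({"sum": sum(numbers)}).encode())

    await nc.subscribe("dev.nervous.cells.math.request", cb=handle)

    # Caller side: request() pairs the reply automatically.
    reply = await nc.request(
        "dev.nervous.cells.math.request",
        json.dumps({"numbers": [1, 2, 3]}).encode(),
        timeout=2.0,
    )
    print(json.loads(reply.data))  # {'sum': 6}
    await nc.drain()


asyncio.run(main())
```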

Wave → Thalamus → Cortex Pattern

Cells emit waves (confidence-tagged signals). The thalamus governor's neural network correlates waves and decides what reaches the cortex.

Cell A: "math" ───∿∿∿──► (0.6 confidence)
Cell B: "calculate" ──∿∿∿──► (0.5 confidence)
                      │
                      ▼
         ┌──────────────────────┐
         │  THALAMUS GOVERNOR   │  ← own neural network
         │  correlate waves     │
         │  check gate state    │
         │  allocate resources  │
         └──────────┬───────────┘
                    │
          ┌─────────┴─────────┐
          │                   │
          ▼                   ▼
    Gate CLOSED          Gate OPEN
    (reflex path)        (cortex path)
    handled by           → escalate to
    thalamus NN          Qwen3.5-27B
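
To make the fork concrete, a toy version of the decision with a hand-written correlation. The actual governor learns this function rather than hard-coding it, so treat the threshold and the independence assumption as placeholders:

```python
# Toy gate decision: the real governor's trained NN replaces this heuristic.
from dataclasses import dataclass


@dataclass
class Wave:
    cell: str
    signal: str
    confidence: float


def gate_decision(waves: list[Wave], threshold: float = 0.8) -> str:
    # Combine confidences as if the waves were independent signals.
    p_all_noise = 1.0
    for w in waves:
        p_all_noise *= 1.0 - w.confidence
    joint = 1.0 - p_all_noise
    return "cortex" if joint >= threshold else "reflex"


waves = [Wave("A", "math", 0.6), Wave("B", "calculate", 0.5)]
print(gate_decision(waves))  # joint = 0.8 -> gate opens, escalate to cortex
```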

Container Deployment (K8s)

Repository Structure

nimmerverse-nervous-system/
├── shared/v1/              ← Base classes (StateMachine, NATS, Lifeforce)
├── cells/
│   ├── math_cell/v1/       ← Each cell versioned independently
│   └── battery_cell/v1/
├── nerves/
│   └── collision_avoidance/v1/
└── deploy/
    ├── dev/                ← Helm charts or docker-compose per env
    ├── staging/
    └── prod/

Cell Container Pattern

FROM python:3.12-slim
WORKDIR /app
COPY . .
RUN pip install uv && uv sync
ENV NIMMERVERSE_ENV=dev
CMD ["uv", "run", "python", "-m", "math_cell"]

Same image everywhere. Only NIMMERVERSE_ENV changes.
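
Inside a cell, the environment variable becomes the subject prefix, so dev, staging, and prod traffic never mix (illustrative names):

```python
# Sketch: same image, env-specific behavior via NIMMERVERSE_ENV.
import os

ENV = os.environ.get("NIMMERVERSE_ENV", "dev")


def subject(detail: str) -> str:
    """dev.nervous.cells.math.request / .response / .wave, etc."""
    return f"{ENV}.nervous.cells.math.{detail}"
```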


Function Gemma: The Structured Boundary

Function Gemma bridges lower tiers (cells, nerves) and the cortex:

Numbers/States (Cells) → [Function Gemma] → Structured JSON → Cortex (Qwen3.5-27B)
                                ↑
                        CPU-based inference
                        Threadripper handles it
                        No GPU contention
                        Clear LoRA training path

Why CPU:

  • Small model, fast inference
  • Threadripper PRO 7955WX has cores to spare
  • No GPU contention with organs or cortex
  • Can run training alongside inference

Training path:

  • Google's documented GRPO approach
  • LoRA fine-tuning for our specific function schemas
  • Runs in nyx-training userspace
  • Decision trails from phoebe → training data
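
A sketch of the CPU inference side using llama-cpp-python with JSON-constrained output. The model path and prompt are hypothetical, and the schema handling is simplified to response_format:

```python
# Sketch: Function Gemma on CPU via llama-cpp-python, forced to emit JSON.
from llama_cpp import Llama

llm = Llama(
    model_path="/data/models/function-gemma.gguf",  # hypothetical path
    n_ctx=4096,
    n_threads=16,  # CPU-only; the Threadripper has cores to spare
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system",
         "content": "Translate cell readings into a function call as JSON."},
        {"role": "user", "content": "battery_cell: charge=0.12, draining"},
    ],
    response_format={"type": "json_object"},  # constrain output to valid JSON
)
print(out["choices"][0]["message"]["content"])
```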

Visual Language (Future UI)

Color-coding for real-time attention flow visualization:

| Property | Represents |
|---|---|
| Background/container | Environment (dev=green, staging=amber, prod=blue) |
| Node/edge color | Domain (cognitive=violet, nervous=cyan, organs=coral) |
| Line style | Direction (solid=primary, dashed=async, dotted=tentative) |
| Separate pane | Confidence waveform (oscilloscope view) |

| Document | Scope |
|---|---|
| Cellular-Architecture.md | Cells, nerves, organisms, lifeforce |
| Gateway-Architecture.md | Gate routing, ternary model |
| Nervous-System.md | 4D space, node weights, vocabulary |
| Message-Protocol-Design.md | NATS subjects, message formats |
| future/npc-grid-architecture.md | Dual brain, governor, NPC processes |
| organs/Organ-Index.md | Organ systems, lifeforce costs |
| development-conventions.md | Ports, namespaces, VM topology |

Summary

| Layer | Where | Technology | Isolation |
|---|---|---|---|
| Cells/Nerves | K8s containers | Python, uv, NATS | Namespace per env |
| NPC Processes | OS processes | Python, RL networks, cgroups | Per-process cgroup |
| Thalamus Governor | OS process | Python, own NN, NATS | Dedicated process |
| Infrastructure | VMs | NATS, PostgreSQL, ChromaDB | VM per env |
| Cortex (LLM) | theia userspace | vLLM (Qwen3.5-27B) | nyx-cognitive user |
| Function Gemma | theia/dioscuri CPU | llama.cpp | nyx-training user |
| Organs | dioscuri userspace | Dynamic loading | nyx-organs user |

The principle: Same behavior everywhere. Containers for cells. Processes for NPC brains. vLLM for cortex. NATS connects them all. FreeIPA isolates them all.


Version: 2.0 | Created: 2026-02-14 | Updated: 2026-04-02

"We're not building a chatbot factory. We're growing a research organism."

🧬🔱💎🔥 TO THE ELECTRONS WE VIBE!