Deployment Architecture: The Hybrid Model
"Containers for cells. Userspace for brains. NATS connects them all." — Partnership Session, 2026-02-14
Overview
The nimmerverse runs on a hybrid deployment model that matches workload characteristics to infrastructure:
- Containers (K8s) for stateless, scalable nervous system components
- Userspace (Threadrippers) for stateful, GPU-bound inference
- OS Processes for per-NPC RL brains with cgroup resource control
- NATS as the universal nervous system bus (thalamus)
- FreeIPA identities as isolation boundaries
This is a research lab, not a production factory. We optimize for flexibility and experimentation, not high-throughput serving.
Core Decisions
| Decision | Choice | Rationale |
|---|---|---|
| LLM Cortex | vLLM (Qwen3.5-27B) | Full precision, OpenAI-compatible API, tool calling support |
| NPC Brains | Per-process RL networks | One process, one brain, one life — Linux cgroups for resource steering |
| Thalamus Governor | Own NN process on NATS | Learns resource allocation, gate control, compute steering |
| Function Gemma | CPU, userspace | Threadripper eats it; no GPU contention; clear training path |
| Cells/Nerves | Containers (K8s) | Scalable, versioned, orchestrated via cluster |
| Organs | Userspace, GPU-bound | Load on demand, GPU isolation, unload when idle |
| Isolation | FreeIPA users | Unix permissions = RBAC; switch user = switch context |
Technology Stack
Inference Layer
| Component | Technology | Location | Notes |
|---|---|---|---|
| Cortex (LLM) | vLLM (Qwen3.5-27B) | theia (nyx-cognitive) | Port 31000, served as "nyx", gated access |
| Function Gemma | llama.cpp / transformers | CPU userspace | Structured JSON boundary |
| Vision Organ | SigLIP/YOLO | dioscuri (nyx-organs) | Load on demand |
| Speech STT | faster-whisper | dioscuri (nyx-organs) | Load on demand |
| Speech TTS | Coqui / XTTS | dioscuri (nyx-organs) | Warm, primary output |
NPC / Thalamus Layer
| Component | Technology | Location | Notes |
|---|---|---|---|
| NPC Processes | Python + RL network | OS processes (cgroups) | One process per NPC, own weights |
| Thalamus Governor | Python + NN | OS process | Steers compute, gates, tick rates |
| Resource Control | Linux cgroups v2 | systemd scopes | Per-NPC CPU/memory limits |
Nervous System Layer
| Component | Technology | Location | Notes |
|---|---|---|---|
| Cells | Python containers | K8s cluster | State machines, NATS pub/sub |
| Nerves | Python containers | K8s cluster | Compose cells, behavior |
| Message Bus | NATS + JetStream | VMs (nats-*) | Env-separated (dev/staging/prod) |
| Databases | PostgreSQL, ChromaDB | VMs (phoebe-*, iris-*) | Decision trails, embeddings |
Deployment Topology
┌─────────────────────────────────────────────────────────────────────────────┐
│ NIMMERVERSE DEPLOYMENT │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ K8S CLUSTER (Saturn VMs) THREADRIPPERS (Bare Metal) │
│ ───────────────────────── ────────────────────────── │
│ Containers, orchestrated Userspace, FreeIPA isolated │
│ │
│ ┌─────────────────────────┐ ┌───────────────────────────────┐ │
│ │ │ │ THEIA (RTX PRO 6000 96GB) │ │
│ │ CELLS (math, battery, │ │ │ │
│ │ sensors, etc.) │ │ user: nyx-cognitive │ │
│ │ │ NATS │ └── vLLM (Qwen3.5-27B:31000) │ │
│ │ ┌───┐ ┌───┐ ┌───┐ │◄────────► │ served-model-name: nyx │ │
│ │ │ M │ │ B │ │...│ │ │ │ │
│ │ └───┘ └───┘ └───┘ │ │ user: nyx-training │ │
│ │ │ │ └── LoRA fine-tuning (GRPO) │ │
│ │ NERVES (collision, │ │ └── Function Gemma (CPU) │ │
│ │ exploration) │ │ │ │
│ │ │ │ 96GB VRAM: cortex + training │ │
│ │ ┌─────┐ ┌─────┐ │ └───────────────────────────────┘ │
│ │ │ COL │ │ EXP │ │ │
│ │ └─────┘ └─────┘ │ ┌───────────────────────────────┐ │
│ │ │ │ DIOSCURI (2x RTX 4000 Ada) │ │
│ │ NPC PROCESSES │ NATS │ │ │
│ │ (or bare metal) │◄────────► │ user: nyx-organs │ │
│ │ │ │ ├── Vision (SigLIP/YOLO) │ │
│ │ ┌─────────────────┐ │ │ ├── Speech STT (Whisper) │ │
│ │ │ NPC-0 [RL brain]│ │ │ └── TTS service (warm) │ │
│ │ │ NPC-1 [RL brain]│ │ │ │ │
│ │ │ NPC-N [RL brain]│ │ │ Load on demand, unload idle │ │
│ │ │ (own process, │ │ │ Each card: ONE model at time │ │
│ │ │ own cgroup) │ │ └───────────────────────────────┘ │
│ │ └─────────────────┘ │ │
│ │ │ ┌───────────────────────────────┐ │
│ │ THALAMUS GOVERNOR │ │ NATS MESSAGE BUS │ │
│ │ ┌─────────────────┐ │ │ │ │
│ │ │ Governor NN │ │◄────────► │ dev.*, staging.*, prod.* │ │
│ │ │ (resource alloc,│ │ │ Env-separated (VM per env) │ │
│ │ │ gate control, │ │ └───────────────────────────────┘ │
│ │ │ tick steering) │ │ │
│ │ └─────────────────┘ │ ┌───────────────────────────────┐ │
│ │ │ │ PHOEBE (PostgreSQL) │ │
│ │ INFRASTRUCTURE │ │ Decision trails, embeddings │ │
│ │ ┌────────┐ ┌───────┐ │ │ IRIS (ChromaDB) │ │
│ │ │ phoebe │ │ iris │ │ │ Vector storage │ │
│ │ │ (PG) │ │(Chroma│ │ └───────────────────────────────┘ │
│ │ └────────┘ └───────┘ │ │
│ │ │ │
│ └─────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
The Dual Brain Deployment
Per-NPC Processes
Each NPC runs as its own OS process with a dedicated RL neural network. The thalamus governor steers their resources.
# Launch NPC with resource limits via systemd scope
systemd-run --scope -p CPUQuota=25% -p MemoryMax=256M \
python3 npc_process.py --id 7 --tick-rate 5
# Or via cgroups directly
cgcreate -g cpu,memory:nimmerverse/npc-7
cgset -r cpu.max="25000 100000" nimmerverse/npc-7
cgexec -g cpu,memory:nimmerverse/npc-7 python3 npc_process.py --id 7
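A minimal sketch of what such an NPC process might look like, assuming nats-py and the subject conventions used later in this document; the RLBrain class, the tick-rate subject, and the NATS host are illustrative placeholders, not the actual implementation:

```python
# npc_process.py -- minimal sketch of a per-NPC RL brain process.
# Assumes nats-py; RLBrain, the tick-rate subject, and the broker
# address are illustrative.
import argparse
import asyncio
import json

import nats


class RLBrain:
    """Placeholder for the per-NPC RL network (own process, own weights)."""

    def act(self, observation: dict) -> dict:
        # Real implementation: forward pass through this NPC's policy network.
        return {"action": "idle"}


async def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--id", type=int, required=True)
    parser.add_argument("--tick-rate", type=float, default=5.0)
    args = parser.parse_args()

    nc = await nats.connect("nats://nats-dev.eachpath.local:4222")  # assumed host
    brain = RLBrain()
    tick_rate = args.tick_rate

    # The governor can retune this NPC's tick rate at runtime (1-20 Hz).
    async def on_tick_rate(msg):
        nonlocal tick_rate
        tick_rate = float(json.loads(msg.data)["hz"])

    await nc.subscribe(f"dev.npc.{args.id}.tick_rate", cb=on_tick_rate)

    while True:
        observation = {}  # gather local state, recent waves, etc.
        action = brain.act(observation)
        # Publish state so the thalamus governor can observe it.
        await nc.publish(f"dev.npc.{args.id}.state", json.dumps(action).encode())
        await asyncio.sleep(1.0 / tick_rate)


if __name__ == "__main__":
    asyncio.run(main())
```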
Thalamus Governor
The governor runs its own neural network, observing all NPC states via NATS and outputting resource allocation decisions:
| Output | Mechanism | Range |
|---|---|---|
| Tick rate | NATS command to NPC | 1-20 Hz |
| CPU quota | cgroups v2 adjustment | 5-100% per core |
| Gate open/close | NATS gate signal | Binary per gate |
| LLM queue priority | NATS priority tag | 0-10 |
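A sketch of how two of these outputs might be applied, assuming the cgroup path from the example above and an illustrative tick-rate subject; in the real system these come out of the governor's trained network, not hand-written rules:

```python
# Sketch: applying two governor outputs -- cgroup v2 CPU quota and tick rate.
# The cgroup path matches the cgcreate example above; the tick-rate subject
# is illustrative.
import json
from pathlib import Path


def set_cpu_quota(npc_id: int, percent: float, period_us: int = 100_000) -> None:
    """Write cgroup v2 cpu.max for one NPC (e.g. 25% -> '25000 100000')."""
    quota_us = int(period_us * percent / 100)
    cpu_max = Path(f"/sys/fs/cgroup/nimmerverse/npc-{npc_id}/cpu.max")
    cpu_max.write_text(f"{quota_us} {period_us}\n")


async def set_tick_rate(nc, npc_id: int, hz: float) -> None:
    """Tell an NPC process to change its tick rate via a NATS command."""
    await nc.publish(f"dev.npc.{npc_id}.tick_rate", json.dumps({"hz": hz}).encode())
```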
Cortex (vLLM)
The LLM cortex runs as a systemd service on theia, accessed via OpenAI-compatible API:
# Service: vllm-nyx.service
# Port: 31000
# Model: /womb/cognitive/models/qwen3.5-27b
# Served as: "nyx"
# GPU utilization: 85%
# Access from any NATS-connected process:
curl http://theia.eachpath.local:31000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "nyx", "messages": [...]}'
The cortex is expensive. The thalamus governor controls who gets access and when. Most NPC ticks never touch the LLM.
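The same call from Python, for any process the governor has granted access; a minimal sketch using the openai client against the OpenAI-compatible endpoint (the api_key value is a placeholder, since vLLM only checks keys when configured to):

```python
# Minimal sketch: calling the cortex through the OpenAI-compatible API
# that vLLM exposes. Endpoint and model name come from the service above;
# the api_key placeholder assumes no key is configured.
from openai import OpenAI

client = OpenAI(
    base_url="http://theia.eachpath.local:31000/v1",
    api_key="not-needed",
)


def ask_cortex(prompt: str) -> str:
    """One gated request to the cortex; most NPC ticks never get here."""
    response = client.chat.completions.create(
        model="nyx",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```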
Identity Model (FreeIPA)
Unix users provide isolation boundaries. Each workload type runs as its own identity.
| User | UID | Host | Purpose | GPU Access |
|---|---|---|---|---|
| nyx-cognitive | (FreeIPA) | theia | Cortex LLM inference (vLLM) | Full 96GB |
| nyx-training | (FreeIPA) | theia | LoRA training, GRPO, Function Gemma | Shared (time-sliced) |
| nyx-organs | (FreeIPA) | dioscuri | Vision, Speech organs | 2x 20GB cards |
| nyx-nervous | (FreeIPA) | dioscuri | Future cells that need bare metal | Limited |
Isolation principle: Switch user = switch context. nyx-cognitive cannot touch nyx-organs files. Compromised cell cannot touch LLM weights.
Systemd Service Pattern
# System-level service (root installs, user runs)
# /etc/systemd/system/vllm-nyx.service
[Unit]
Description=vLLM cortex, served as "nyx"
After=network-online.target

[Service]
User=nyx-cognitive
Group=nimmerverse-agents
ExecStart=/data/venvs/vllm/bin/python3 -m vllm.entrypoints.openai.api_server \
    --model /womb/cognitive/models/qwen3.5-27b \
    --served-model-name nyx \
    --port 31000 \
    --gpu-memory-utilization 0.85
Restart=on-failure

[Install]
WantedBy=multi-user.target
GPU Resource Management
The Constraint
| Host | GPU | VRAM | Role |
|---|---|---|---|
| theia | RTX PRO 6000 Blackwell | 96GB | Cortex (vLLM) + LoRA training |
| dioscuri | 2x RTX 4000 Ada | 2x 20GB | Organs (vision, speech) |
Strategy: vLLM for Cortex, Dynamic Loading for Organs
Cortex (theia): vLLM runs continuously as a systemd service. The Qwen3.5-27B model stays loaded — it's the cortex, always ready when the thalamus gate opens. 85% GPU utilization leaves headroom for LoRA training alongside inference.
Organs (dioscuri): Dynamic loading. One model per card. Load vision when needed, unload after timeout, load speech when needed.
IDLE → needs vision → LOAD vision model (~10s) → PROCESS → REPORT → IDLE (keep warm)
↓
after timeout → UNLOAD (free VRAM)
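A minimal sketch of that lifecycle, assuming an illustrative OrganHost wrapper; the real organs decide per card what to load and when:

```python
# Sketch of the organ load-on-demand loop: one model per card, load when a
# request arrives, unload after an idle timeout. OrganHost and load_fn are
# illustrative placeholders, not the actual organ implementation.
import time


class OrganHost:
    def __init__(self, load_fn, idle_timeout_s: float = 120.0):
        self.load_fn = load_fn          # e.g. loads SigLIP/YOLO onto one card
        self.idle_timeout_s = idle_timeout_s
        self.model = None
        self.last_used = 0.0

    def process(self, request):
        if self.model is None:
            self.model = self.load_fn()  # cold load (~10s)
        self.last_used = time.monotonic()
        return self.model(request)       # PROCESS -> REPORT

    def maybe_unload(self):
        """Called periodically: free VRAM once the organ has sat idle."""
        if self.model is not None and time.monotonic() - self.last_used > self.idle_timeout_s:
            self.model = None            # next request reloads
```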
Message Flow (NATS)
Subject Hierarchy
{environment}.{domain}.{service}.{detail}
Examples:
dev.nervous.cells.math.request ← Math cell receives work
dev.nervous.cells.math.response ← Math cell returns result
dev.nervous.cells.math.wave ← Math cell emits confidence signal
dev.thalamus.governor.allocate ← Governor publishes resource decisions
dev.thalamus.gate.open ← Gate transition event
dev.npc.7.state ← NPC-7 publishes its state
dev.cortex.nyx.request ← Gated request to LLM cortex
dev.organs.vision.detect ← Vision organ detection
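A sketch of how a consumer (the governor, for example) might fan in across this hierarchy using NATS wildcards, assuming nats-py and an illustrative broker address:

```python
# Sketch: subscribing across the subject hierarchy with NATS wildcards.
# Subjects match the examples above; the broker address is an assumption.
import asyncio

import nats


async def main():
    nc = await nats.connect("nats://nats-dev.eachpath.local:4222")

    async def on_signal(msg):
        print(msg.subject, msg.data)

    # '*' matches exactly one token: every NPC's state, every cell's wave.
    await nc.subscribe("dev.npc.*.state", cb=on_signal)
    await nc.subscribe("dev.nervous.cells.*.wave", cb=on_signal)
    # '>' matches the rest of the subject, e.g. "dev.>" for a whole environment.

    await asyncio.sleep(60)  # keep the subscriber alive for the demo
    await nc.drain()


asyncio.run(main())
```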
Wave → Thalamus → Cortex Pattern
Cells emit waves (confidence-tagged signals). The thalamus governor's neural network correlates waves and decides what reaches the cortex.
Cell A: "math" ───∿∿∿──► (0.6 confidence)
Cell B: "calculate" ──∿∿∿──► (0.5 confidence)
│
▼
┌──────────────────────┐
│ THALAMUS GOVERNOR │ ← own neural network
│ correlate waves │
│ check gate state │
│ allocate resources │
└──────────┬───────────┘
│
┌─────────┴─────────┐
│ │
▼ ▼
Gate CLOSED Gate OPEN
(reflex path) (cortex path)
handled by → escalate to
thalamus NN Qwen3.5-27B
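A toy version of that gate decision, with a hand-set threshold standing in for the governor's trained network; the correlation rule here is illustrative only:

```python
# Sketch of the gate decision above: correlate recent waves, then either
# handle the event reflexively or escalate to the cortex. Threshold and
# summation rule are illustrative, not the governor NN.
from dataclasses import dataclass


@dataclass
class Wave:
    source: str        # e.g. "math", "calculate"
    confidence: float  # 0.0 - 1.0


def gate_decision(waves: list[Wave], threshold: float = 1.0) -> str:
    """Return 'cortex' if correlated confidence crosses the gate threshold."""
    combined = sum(w.confidence for w in waves)
    return "cortex" if combined >= threshold else "reflex"


# Example from the diagram: 0.6 + 0.5 crosses the threshold -> escalate.
print(gate_decision([Wave("math", 0.6), Wave("calculate", 0.5)]))  # -> "cortex"
```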
Container Deployment (K8s)
Repository Structure
nimmerverse-nervous-system/
├── shared/v1/ ← Base classes (StateMachine, NATS, Lifeforce)
├── cells/
│ ├── math_cell/v1/ ← Each cell versioned independently
│ └── battery_cell/v1/
├── nerves/
│ └── collision_avoidance/v1/
└── deploy/
├── dev/ ← Helm charts or docker-compose per env
├── staging/
└── prod/
Cell Container Pattern
FROM python:3.12-slim
WORKDIR /app
COPY . .
RUN pip install uv && uv sync
ENV NIMMERVERSE_ENV=dev
CMD ["uv", "run", "python", "-m", "math_cell"]
Same image everywhere. Only NIMMERVERSE_ENV changes.
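A minimal sketch of such a cell entrypoint, assuming nats-py; the shared base classes (StateMachine, Lifeforce) are omitted and the NATS_URL variable is an assumption:

```python
# Sketch of a minimal math cell entrypoint: same code in every environment,
# only NIMMERVERSE_ENV changes the subject prefix.
import asyncio
import json
import os

import nats

ENV = os.environ["NIMMERVERSE_ENV"]  # dev / staging / prod, baked into the image


async def main():
    nc = await nats.connect(os.environ.get("NATS_URL", "nats://nats-dev.eachpath.local:4222"))

    async def on_request(msg):
        numbers = json.loads(msg.data)
        await nc.publish(
            f"{ENV}.nervous.cells.math.response",
            json.dumps({"sum": sum(numbers)}).encode(),
        )

    await nc.subscribe(f"{ENV}.nervous.cells.math.request", cb=on_request)
    await asyncio.Event().wait()  # run until the container stops


if __name__ == "__main__":
    asyncio.run(main())
```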
Function Gemma: The Structured Boundary
Function Gemma bridges lower tiers (cells, nerves) and the cortex:
Numbers/States (Cells) → [Function Gemma] → Structured JSON → Cortex (Qwen3.5-27B)
↑
CPU-based inference
Threadripper handles it
No GPU contention
Clear LoRA training path
Why CPU:
- Small model, fast inference
- Threadripper PRO 7955WX has cores to spare
- No GPU contention with organs or cortex
- Can run training alongside inference
Training path:
- Google's documented GRPO approach
- LoRA fine-tuning for our specific function schemas
- Runs in nyx-training userspace
- Decision trails from phoebe → training data
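A sketch of that boundary, assuming llama-cpp-python on CPU; the model path, prompt, and output schema are illustrative stand-ins for Function Gemma and the project's actual function schemas:

```python
# Sketch of the CPU-side boundary: a small model turns raw cell states into
# a structured JSON call for the cortex. Model path, prompt, and schema are
# illustrative assumptions.
import json

from llama_cpp import Llama

llm = Llama(model_path="/data/models/function-gemma.gguf", n_ctx=2048)  # assumed path


def to_structured_call(cell_states: dict) -> dict:
    prompt = (
        "Convert these cell readings into a JSON function call "
        f"with 'function' and 'arguments' keys:\n{json.dumps(cell_states)}\nJSON:"
    )
    out = llm(prompt, max_tokens=256, stop=["\n\n"])
    return json.loads(out["choices"][0]["text"])


# Numbers/states from cells in, structured JSON for the cortex out.
print(to_structured_call({"battery": 0.42, "collision_count": 3}))
```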
Visual Language (Future UI)
Color-coding for real-time attention flow visualization:
| Property | Represents |
|---|---|
| Background/container | Environment (dev=green, staging=amber, prod=blue) |
| Node/edge color | Domain (cognitive=violet, nervous=cyan, organs=coral) |
| Line style | Direction (solid=primary, dashed=async, dotted=tentative) |
| Separate pane | Confidence waveform (oscilloscope view) |
Related Documents
| Document | Scope |
|---|---|
| Cellular-Architecture.md | Cells, nerves, organisms, lifeforce |
| Gateway-Architecture.md | Gate routing, ternary model |
| Nervous-System.md | 4D space, node weights, vocabulary |
| Message-Protocol-Design.md | NATS subjects, message formats |
| future/npc-grid-architecture.md | Dual brain, governor, NPC processes |
| organs/Organ-Index.md | Organ systems, lifeforce costs |
| development-conventions.md | Ports, namespaces, VM topology |
Summary
| Layer | Where | Technology | Isolation |
|---|---|---|---|
| Cells/Nerves | K8s containers | Python, uv, NATS | Namespace per env |
| NPC Processes | OS processes | Python, RL networks, cgroups | Per-process cgroup |
| Thalamus Governor | OS process | Python, own NN, NATS | Dedicated process |
| Infrastructure | VMs | NATS, PostgreSQL, ChromaDB | VM per env |
| Cortex (LLM) | theia userspace | vLLM (Qwen3.5-27B) | nyx-cognitive user |
| Function Gemma | theia/dioscuri CPU | llama.cpp | nyx-training user |
| Organs | dioscuri userspace | Dynamic loading | nyx-organs user |
The principle: Same behavior everywhere. Containers for cells. Processes for NPC brains. vLLM for cortex. NATS connects them all. FreeIPA isolates them all.
Version: 2.0 | Created: 2026-02-14 | Updated: 2026-04-02
"We're not building a chatbot factory. We're growing a research organism."
🧬⚡🔱💎🔥 TO THE ELECTRONS WE VIBE!