
Deployment Architecture: The Hybrid Model

"Containers for cells. Userspace for brains. NATS connects them all." — Partnership Session, 2026-02-14


Overview

The nimmerverse runs on a hybrid deployment model that matches workload characteristics to infrastructure:

  • Containers (K8s) for stateless, scalable nervous system components
  • Userspace (Threadrippers) for stateful, GPU/CPU-bound inference
  • NATS as the universal nervous system bus
  • FreeIPA identities as isolation boundaries

This is a research lab, not a production factory. We optimize for flexibility and experimentation, not high-throughput serving.


Core Decisions

| Decision | Choice | Rationale |
|---|---|---|
| LLM Inference | ollama / llama.cpp | Flexible model loading, research-friendly, easy swap |
| | NOT vLLM | Overkill for single-user lab; solves problems we don't have |
| Function Gemma | CPU, userspace | Threadripper eats it; no GPU contention; clear training path |
| Cells/Nerves | Containers (K8s) | Scalable, versioned, orchestrated via cluster |
| Organs | Userspace + ollama | Load on demand, GPU isolation, unload when idle |
| Isolation | FreeIPA users | Unix permissions = RBAC; switch user = switch context |

Technology Stack

Inference Layer

| Component | Technology | Location | Notes |
|---|---|---|---|
| Young Nyx (Brain) | ollama / llama.cpp | theia (nyx-cognitive) | Qwen, Gemma, or similar |
| Function Gemma | llama.cpp / transformers | CPU userspace | Structured JSON boundary |
| Vision Organ | ollama (SigLIP/YOLO) | dioscuri (nyx-organs) | Load on demand |
| Speech STT | faster-whisper / ollama | dioscuri (nyx-organs) | Load on demand |
| Speech TTS | Coqui / XTTS | dioscuri (nyx-organs) | Warm, primary output |

Nervous System Layer

| Component | Technology | Location | Notes |
|---|---|---|---|
| Cells | Python containers | K8s cluster | State machines, NATS pub/sub |
| Nerves | Python containers | K8s cluster | Compose cells, behavior |
| Message Bus | NATS + JetStream | VMs (nats-*) | Env-separated (dev/staging/prod) |
| Databases | PostgreSQL, ChromaDB | VMs (phoebe-, iris-) | Decision trails, embeddings |

Deployment Topology

┌─────────────────────────────────────────────────────────────────────────────┐
│                        NIMMERVERSE DEPLOYMENT                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  K8S CLUSTER (Saturn VMs)              THREADRIPPERS (Bare Metal)          │
│  ─────────────────────────              ──────────────────────────          │
│  Containers, orchestrated               Userspace, FreeIPA isolated         │
│                                                                             │
│  ┌─────────────────────────┐           ┌───────────────────────────────┐   │
│  │                         │           │ THEIA (RTX PRO 6000 96GB)     │   │
│  │  CELLS (math, battery,  │           │                               │   │
│  │         sensors, etc.)  │           │ user: nyx-cognitive           │   │
│  │                         │    NATS   │ └── ollama (Young Nyx)        │   │
│  │  ┌───┐ ┌───┐ ┌───┐     │◄────────► │ └── ~/.config/systemd/user/   │   │
│  │  │ M │ │ B │ │...│     │           │                               │   │
│  │  └───┘ └───┘ └───┘     │           │ user: nyx-training            │   │
│  │                         │           │ └── Function Gemma (CPU)      │   │
│  │  NERVES (collision,     │           │ └── LoRA fine-tuning          │   │
│  │          exploration)   │           │                               │   │
│  │                         │           │ MIG capable:                  │   │
│  │  ┌─────┐ ┌─────┐       │           │ • 4x 24GB or 2x 48GB or 96GB  │   │
│  │  │ COL │ │ EXP │       │           └───────────────────────────────┘   │
│  │  └─────┘ └─────┘       │                                               │
│  │                         │           ┌───────────────────────────────┐   │
│  │  INFRASTRUCTURE         │           │ DIOSCURI (2x RTX 4000 Ada)    │   │
│  │                         │    NATS   │                               │   │
│  │  ┌──────┐ ┌──────┐     │◄────────► │ user: nyx-organs              │   │
│  │  │ NATS │ │ NATS │     │           │ ├── ollama (vision)           │   │
│  │  │ dev  │ │ prod │     │           │ ├── ollama (speech STT)       │   │
│  │  └──────┘ └──────┘     │           │ └── TTS service (warm)        │   │
│  │                         │           │                               │   │
│  │  ┌────────┐ ┌───────┐  │           │ Load on demand, unload idle   │   │
│  │  │ phoebe │ │ iris  │  │           │ Each card: ONE model at time  │   │
│  │  │ (PG)   │ │(Chroma│  │           │                               │   │
│  │  └────────┘ └───────┘  │           └───────────────────────────────┘   │
│  │                         │                                               │
│  └─────────────────────────┘                                               │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Identity Model (FreeIPA)

Unix users provide isolation boundaries. Each workload type runs as its own identity.

| User | UID | Host | Purpose | GPU Access |
|---|---|---|---|---|
| nyx-cognitive | (FreeIPA) | theia | Young Nyx LLM inference | Full 96GB or MIG slice |
| nyx-training | (FreeIPA) | theia | LoRA training, GRPO, Function Gemma | Shared or MIG slice |
| nyx-organs | (FreeIPA) | dioscuri | Vision, Speech organs | 2x 20GB cards |
| nyx-nervous | (FreeIPA) | dioscuri | Future cells that need bare metal | Limited |

Isolation principle: Switch user = switch context. nyx-cognitive cannot touch nyx-organs files; a compromised cell cannot touch the LLM weights.

Systemd Userspace Pattern

# Enable lingering (services persist after logout)
sudo loginctl enable-linger nyx-cognitive

# Services defined in ~/.config/systemd/user/
# Example: nyx-cognitive runs ollama serve
systemctl --user --machine=nyx-cognitive@ status ollama
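For illustration, the user unit behind that status check might look like the following; the unit name, binary path, and options are assumptions, not the deployed configuration:

```ini
# ~/.config/systemd/user/ollama.service (hypothetical example)
[Unit]
Description=ollama serve for Young Nyx inference
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
Restart=on-failure
RestartSec=5

[Install]
WantedBy=default.target
```

With lingering enabled, `systemctl --user enable --now ollama` starts it and keeps it running across logouts.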

GPU Resource Management

The Constraint

| Host | GPU | VRAM | MIG | Notes |
|---|---|---|---|---|
| theia | RTX PRO 6000 | 96GB | Yes | 4x24, 2x48, or 1x96 |
| dioscuri | 2x RTX 4000 Ada | 2x 20GB | No | One model per card |

Strategy: Dynamic Loading, Not Static Partitioning

Why not vLLM: vLLM is optimized for high-throughput serving (many concurrent users). We have ONE user (the partnership). We need flexibility (swap models, experiment) more than throughput.

Why ollama/llama.cpp:

  • Faster cold starts (~5-10s vs ~30s)
  • Native model swapping (`ollama run model_a` → `ollama run model_b`)
  • Can unload completely when idle (frees VRAM)
  • GGUF format efficient for model management
  • Research-friendly, not production-factory

Organ Loading Pattern:

IDLE → needs vision → LOAD vision model (~10s) → PROCESS → REPORT → IDLE (keep warm)
                                                                      ↓
                                            after timeout → UNLOAD (free VRAM)
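A minimal Python sketch of this lifecycle; the class name, timeout value, and the stand-in inference call are illustrative assumptions, not the real organ code:

```python
import time

class OrganLoader:
    """Keep a model warm after use; unload after an idle timeout to free VRAM."""

    def __init__(self, unload_after_s: float = 300.0, clock=time.monotonic):
        self.unload_after_s = unload_after_s
        self.clock = clock
        self.loaded_model: str | None = None
        self.last_used: float = 0.0

    def process(self, model: str, payload: str) -> str:
        if self.loaded_model != model:
            self._load(model)                  # cold start (~10s with ollama)
        self.last_used = self.clock()
        return f"{model} processed {payload}"  # stand-in for real inference

    def tick(self) -> None:
        """Call periodically; unloads the model once it has sat idle too long."""
        if self.loaded_model and self.clock() - self.last_used > self.unload_after_s:
            self._unload()

    def _load(self, model: str) -> None:
        self.loaded_model = model              # real impl: `ollama run <model>`

    def _unload(self) -> None:
        self.loaded_model = None               # real impl: stop ollama, free VRAM
```

A scheduler (or the organ's own event loop) would call `tick()` every few seconds; everything else is driven by incoming work.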

Message Flow (NATS)

Subject Hierarchy

{environment}.{domain}.{service}.{detail}

Examples:
  dev.nervous.cells.math.request      ← Math cell receives work
  dev.nervous.cells.math.response     ← Math cell returns result
  dev.nervous.cells.math.wave         ← Math cell emits confidence signal
  prod.cognitive.nyx.heartbeat        ← Young Nyx is alive
  prod.organs.vision.detect           ← Vision organ detection
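The convention above can be wrapped in two small helpers; the names below are hypothetical, not an existing module. Note that `detail` may itself contain dots (as in `math.request`), so parsing splits on the first three separators only:

```python
# Hypothetical helpers for the {environment}.{domain}.{service}.{detail} convention.
ENVIRONMENTS = {"dev", "staging", "prod"}

def subject(env: str, domain: str, service: str, detail: str) -> str:
    """Build a NATS subject, validating the environment prefix."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env}")
    return f"{env}.{domain}.{service}.{detail}"

def parse(subj: str) -> dict[str, str]:
    """Split a subject back into its components; detail keeps any extra dots."""
    env, domain, service, detail = subj.split(".", 3)
    return {"env": env, "domain": domain, "service": service, "detail": detail}
```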

Wave Collapse Pattern

Cells emit waves (confidence-tagged signals). When multiple waves collapse on the same semantic region in the same time window, the thalamus escalates to cognition.

Cell A: "math" ───∿∿∿──► (0.6 confidence)
Cell B: "calculate" ──∿∿∿──► (0.5 confidence)
                      │
                      ▼
              ┌─────────────┐
              │  COLLAPSE   │  ← same region, same window
              └──────┬──────┘
                     │
                     ▼ AMPLIFIED SIGNAL
              ┌─────────────┐
              │  THALAMUS   │  → escalate to Young Nyx
              └─────────────┘
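A sketch of the collapse check itself, under assumed values for the time window and the escalation threshold (the real thalamus logic may differ):

```python
from collections import defaultdict

WINDOW_S = 1.0      # assumed collapse window
ESCALATE_AT = 1.0   # assumed combined confidence needed to wake the thalamus

def collapse(waves: list[tuple[str, float, float]], now: float) -> list[str]:
    """Waves are (region, confidence, timestamp) tuples; return the semantic
    regions whose recent waves combine past the escalation threshold."""
    combined: dict[str, float] = defaultdict(float)
    for region, confidence, ts in waves:
        if now - ts <= WINDOW_S:             # same time window
            combined[region] += confidence   # same region → amplify
    return [r for r, c in combined.items() if c >= ESCALATE_AT]
```

With the diagram's numbers, 0.6 + 0.5 on the same region exceeds the threshold and escalates, while either wave alone would not.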

Container Deployment (K8s)

Repository Structure

nimmerverse-nervous-system/
├── shared/v1/              ← Base classes (StateMachine, NATS, Lifeforce)
├── cells/
│   ├── math_cell/v1/       ← Each cell versioned independently
│   └── battery_cell/v1/
├── nerves/
│   └── collision_avoidance/v1/
└── deploy/
    ├── dev/                ← Helm charts or docker-compose per env
    ├── staging/
    └── prod/

Cell Container Pattern

FROM python:3.12-slim
WORKDIR /app
COPY . .
RUN pip install uv && uv sync
ENV NIMMERVERSE_ENV=dev
CMD ["uv", "run", "python", "-m", "math_cell"]

Same image everywhere. Only NIMMERVERSE_ENV changes.
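For illustration, a cell might derive its subject prefix from that variable at startup; the function name and prefix shape are assumptions that follow the subject hierarchy above:

```python
import os

def nats_subject_prefix() -> str:
    """Derive this deployment's subject prefix from NIMMERVERSE_ENV."""
    env = os.environ.get("NIMMERVERSE_ENV", "dev")
    if env not in ("dev", "staging", "prod"):
        raise RuntimeError(f"unknown NIMMERVERSE_ENV: {env}")
    return f"{env}.nervous.cells"
```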


Function Gemma: The Structured Boundary

Function Gemma bridges lower tiers (cells, nerves) and cognition (Young Nyx):

Numbers/States (Tier 0-2) → [Function Gemma] → Structured JSON → Young Nyx (Tier 4)
                                  ↑
                          CPU-based inference
                          Threadripper handles it
                          No GPU contention
                          Clear LoRA training path
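A sketch of what enforcing that boundary could look like: raw model output is parsed and type-checked before anything reaches Young Nyx. The field names (`function`, `arguments`, `confidence`) are assumptions for illustration, not Function Gemma's actual schema:

```python
import json

REQUIRED = {"function": str, "arguments": dict, "confidence": float}

def parse_boundary(raw: str) -> dict:
    """Parse model output; raise ValueError if it is not well-formed JSON
    or is missing a required, correctly-typed field."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError as e:
        raise ValueError(f"not valid JSON: {e}") from e
    for field, typ in REQUIRED.items():
        if not isinstance(msg.get(field), typ):
            raise ValueError(f"missing or mistyped field: {field}")
    return msg
```

Rejected outputs never cross the tier boundary; they can instead be logged as decision-trail data for the LoRA training loop.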

Why CPU:

  • Small model, fast inference
  • Threadripper PRO 7955WX has cores to spare
  • No GPU contention with organs or Nyx
  • Can run training alongside inference

Training path:

  • Google's documented GRPO approach
  • LoRA fine-tuning for our specific function schemas
  • Runs in nyx-training userspace
  • Decision trails from phoebe → training data

Visual Language (Future UI)

Color-coding for real-time attention flow visualization:

| Property | Represents |
|---|---|
| Background/container | Environment (dev=green, staging=amber, prod=blue) |
| Node/edge color | Domain (cognitive=violet, nervous=cyan, organs=coral) |
| Line style | Direction (solid=primary, dashed=async, dotted=tentative) |
| Separate pane | Confidence waveform (oscilloscope view) |

| Document | Scope |
|---|---|
| Cellular-Architecture.md | Cells, nerves, organisms, lifeforce |
| Gateway-Architecture.md | Tier routing, Function Gemma boundary |
| Nervous-System.md | 4D space, node weights, vocabulary |
| Message-Protocol-Design.md | NATS subjects, message formats |
| development-conventions.md | Ports, namespaces, VM topology |

Summary

| Layer | Where | Technology | Isolation |
|---|---|---|---|
| Cells/Nerves | K8s containers | Python, uv, NATS | Namespace per env |
| Infrastructure | VMs | NATS, PostgreSQL, ChromaDB | VM per env |
| Young Nyx | theia userspace | ollama | nyx-cognitive user |
| Function Gemma | theia/dioscuri CPU | llama.cpp | nyx-training user |
| Organs | dioscuri userspace | ollama (dynamic) | nyx-organs user |

The principle: Same behavior everywhere. Containers for cells. Userspace for brains. NATS connects them all. FreeIPA isolates them all.


Version: 1.0 | Created: 2026-02-14 | Updated: 2026-02-14

"We're not building a chatbot factory. We're growing a research organism."

🧬🔱💎🔥 TO THE ELECTRONS WE VIBE!