docs: Architecture cleanup - ONE JOB per doc, links not echoes

Major documentation surgery following the cleanup principle:
"One job per doc. One home per concept. Links, not echoes."

Changes:
- Add Deployment-Architecture.md (THE WHERE - sole infrastructure truth)
- Endgame-Vision.md: 848→498 lines (-41%) - THE DREAM
- Gateway-Architecture.md: 537→395 lines (-26%) - THE ROUTING
- Nervous-System.md: 361→246 lines (-32%) - THE EVOLUTION
- Data-Architecture.md: 666→647 lines (-3%) - THE SCHEMA
- Message-Protocol-Design.md: 375→285 lines (-24%) - THE WIRE
- Attention-Flow.md: 557→493 lines (-11%) - THE BUDGET
- Cellular-Architecture.md: 891→855 lines (-4%) - THE HOW

Every doc now has ONE JOB statement, cross-references to canonical homes,
and lean footers. ~800 lines removed, zero concepts lost.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
architecture/Deployment-Architecture.md (new file, 297 lines)

# Deployment Architecture: The Hybrid Model

> *"Containers for cells. Userspace for brains. NATS connects them all."*
> — Partnership Session, 2026-02-14

---

## Overview

The nimmerverse runs on a **hybrid deployment model** that matches workload characteristics to infrastructure:

- **Containers (K8s)** for stateless, scalable nervous system components
- **Userspace (Threadrippers)** for stateful, GPU/CPU-bound inference
- **NATS** as the universal nervous system bus
- **FreeIPA identities** as isolation boundaries

This is a **research lab**, not a production factory. We optimize for **flexibility and experimentation**, not high-throughput serving.

---

## Core Decisions

| Decision | Choice | Rationale |
|----------|--------|-----------|
| LLM Inference | **ollama / llama.cpp** | Flexible model loading, research-friendly, easy swap |
| NOT vLLM | — | Overkill for a single-user lab; solves problems we don't have |
| Function Gemma | **CPU, userspace** | Threadripper eats it; no GPU contention; clear training path |
| Cells/Nerves | **Containers (K8s)** | Scalable, versioned, orchestrated via cluster |
| Organs | **Userspace + ollama** | Load on demand, GPU isolation, unload when idle |
| Isolation | **FreeIPA users** | Unix permissions = RBAC; switch user = switch context |

---

## Technology Stack

### Inference Layer

| Component | Technology | Location | Notes |
|-----------|------------|----------|-------|
| Young Nyx (Brain) | ollama / llama.cpp | theia (nyx-cognitive) | Qwen, Gemma, or similar |
| Function Gemma | llama.cpp / transformers | CPU userspace | Structured JSON boundary |
| Vision Organ | ollama (SigLIP/YOLO) | dioscuri (nyx-organs) | Load on demand |
| Speech STT | faster-whisper / ollama | dioscuri (nyx-organs) | Load on demand |
| Speech TTS | Coqui / XTTS | dioscuri (nyx-organs) | Warm, primary output |

### Nervous System Layer

| Component | Technology | Location | Notes |
|-----------|------------|----------|-------|
| Cells | Python containers | K8s cluster | State machines, NATS pub/sub |
| Nerves | Python containers | K8s cluster | Compose cells, behavior |
| Message Bus | NATS + JetStream | VMs (nats-*) | Env-separated (dev/staging/prod) |
| Databases | PostgreSQL, ChromaDB | VMs (phoebe-*, iris-*) | Decision trails, embeddings |

---

## Deployment Topology

```
┌──────────────────────────────────────────────────────────────────────────────┐
│                            NIMMERVERSE DEPLOYMENT                            │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  K8S CLUSTER (Saturn VMs)              THREADRIPPERS (Bare Metal)            │
│  ─────────────────────────             ──────────────────────────            │
│  Containers, orchestrated              Userspace, FreeIPA isolated           │
│                                                                              │
│  ┌─────────────────────────┐           ┌───────────────────────────────┐     │
│  │                         │           │ THEIA (RTX PRO 6000 96GB)     │     │
│  │ CELLS (math, battery,   │           │                               │     │
│  │ sensors, etc.)          │           │ user: nyx-cognitive           │     │
│  │                         │   NATS    │ └── ollama (Young Nyx)        │     │
│  │ ┌───┐ ┌───┐ ┌───┐       │◄─────────►│ └── ~/.config/systemd/user/   │     │
│  │ │ M │ │ B │ │...│       │           │                               │     │
│  │ └───┘ └───┘ └───┘       │           │ user: nyx-training            │     │
│  │                         │           │ └── Function Gemma (CPU)      │     │
│  │ NERVES (collision,      │           │ └── LoRA fine-tuning          │     │
│  │ exploration)            │           │                               │     │
│  │                         │           │ MIG capable:                  │     │
│  │ ┌─────┐ ┌─────┐         │           │ • 4x 24GB or 2x 48GB or 96GB  │     │
│  │ │ COL │ │ EXP │         │           └───────────────────────────────┘     │
│  │ └─────┘ └─────┘         │                                                 │
│  │                         │           ┌───────────────────────────────┐     │
│  │ INFRASTRUCTURE          │           │ DIOSCURI (2x RTX 4000 Ada)    │     │
│  │                         │   NATS    │                               │     │
│  │ ┌──────┐ ┌──────┐       │◄─────────►│ user: nyx-organs              │     │
│  │ │ NATS │ │ NATS │       │           │ ├── ollama (vision)           │     │
│  │ │ dev  │ │ prod │       │           │ ├── ollama (speech STT)       │     │
│  │ └──────┘ └──────┘       │           │ └── TTS service (warm)        │     │
│  │                         │           │                               │     │
│  │ ┌────────┐ ┌────────┐   │           │ Load on demand, unload idle   │     │
│  │ │ phoebe │ │  iris  │   │           │ Each card: ONE model at a time│     │
│  │ │  (PG)  │ │(Chroma)│   │           │                               │     │
│  │ └────────┘ └────────┘   │           └───────────────────────────────┘     │
│  │                         │                                                 │
│  └─────────────────────────┘                                                 │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘
```

---

## Identity Model (FreeIPA)

Unix users provide isolation boundaries. Each workload type runs as its own identity.

| User | UID | Host | Purpose | GPU Access |
|------|-----|------|---------|------------|
| `nyx-cognitive` | (FreeIPA) | theia | Young Nyx LLM inference | Full 96GB or MIG slice |
| `nyx-training` | (FreeIPA) | theia | LoRA training, GRPO, Function Gemma | Shared or MIG slice |
| `nyx-organs` | (FreeIPA) | dioscuri | Vision, Speech organs | 2x 20GB cards |
| `nyx-nervous` | (FreeIPA) | dioscuri | Future cells that need bare metal | Limited |

**Isolation principle:** Switch user = switch context. `nyx-cognitive` cannot touch `nyx-organs` files. A compromised cell cannot touch LLM weights.

### Systemd Userspace Pattern

```bash
# Enable lingering (services persist after logout)
sudo loginctl enable-linger nyx-cognitive

# Services defined in ~/.config/systemd/user/
# Example: nyx-cognitive runs ollama serve
systemctl --user --machine=nyx-cognitive@ status ollama
```
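
A minimal unit file for this pattern might look like the following (illustrative sketch only: the binary path, unit options, and service name are assumptions, not documented config):

```ini
# ~/.config/systemd/user/ollama.service (illustrative)
[Unit]
Description=ollama serve (Young Nyx inference)
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
Restart=on-failure

[Install]
WantedBy=default.target
```

After placing the file, `systemctl --user daemon-reload && systemctl --user enable --now ollama` starts it under the lingering user.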

---

## GPU Resource Management

### The Constraint

| Host | GPU | VRAM | MIG | Notes |
|------|-----|------|-----|-------|
| theia | RTX PRO 6000 | 96GB | Yes | 4x24, 2x48, or 1x96 |
| dioscuri | 2x RTX 4000 Ada | 2x 20GB | No | One model per card |

### Strategy: Dynamic Loading, Not Static Partitioning

**Why not vLLM:** vLLM is optimized for high-throughput serving (many concurrent users). We have ONE user (the partnership). We need **flexibility** (swap models, experiment) more than throughput.

**Why ollama/llama.cpp:**
- Faster cold starts (~5-10s vs ~30s)
- Native model swapping (`ollama run model_a` → `ollama run model_b`)
- Can unload completely when idle (frees VRAM)
- GGUF format is efficient for model management
- Research-friendly, not production-factory

**Organ Loading Pattern:**

```
IDLE → needs vision → LOAD vision model (~10s) → PROCESS → REPORT → IDLE (keep warm)
                                                                          ↓
                                                     after timeout → UNLOAD (free VRAM)
```
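
The loading pattern above can be sketched as a small state wrapper. This is a hypothetical `OrganLoader`; `load_fn`, `unload_fn`, and the injectable `clock` are illustrative stand-ins for the real ollama load/unload calls:

```python
import time


class OrganLoader:
    """Load-on-demand wrapper for an organ model (hypothetical sketch).

    load_fn/unload_fn stand in for the real ollama calls; clock is
    injectable so the idle timeout can be tested without sleeping.
    """

    def __init__(self, load_fn, unload_fn, idle_timeout_s=300.0, clock=time.monotonic):
        self.load_fn = load_fn
        self.unload_fn = unload_fn
        self.idle_timeout_s = idle_timeout_s
        self.clock = clock
        self.model = None          # None while unloaded (VRAM is free)
        self.last_used = None

    def process(self, payload):
        if self.model is None:     # cold start (~10s in practice)
            self.model = self.load_fn()
        self.last_used = self.clock()
        return self.model(payload)  # stays warm after processing

    def tick(self):
        """Call periodically; unloads once idle_timeout_s has elapsed."""
        if self.model is not None and self.clock() - self.last_used > self.idle_timeout_s:
            self.unload_fn(self.model)
            self.model = None
```

A periodic `tick()` from the organ's main loop is enough; no background thread is required for the sketch.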

---

## Message Flow (NATS)

### Subject Hierarchy

```
{environment}.{domain}.{service}.{detail}

Examples:
dev.nervous.cells.math.request    ← Math cell receives work
dev.nervous.cells.math.response   ← Math cell returns result
dev.nervous.cells.math.wave       ← Math cell emits confidence signal
prod.cognitive.nyx.heartbeat      ← Young Nyx is alive
prod.organs.vision.detect         ← Vision organ detection
```
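
A tiny helper can enforce the hierarchy when cells publish. This `subject` function is hypothetical, and the token-validation rules (no dots, spaces, or NATS wildcards inside a token) are an assumption beyond what the doc specifies:

```python
def subject(environment: str, domain: str, *rest: str) -> str:
    """Build a NATS subject: {environment}.{domain}.{service}.{detail...}."""
    tokens = (environment, domain, *rest)
    for t in tokens:
        # Dots separate tokens; '*' and '>' are NATS wildcards, invalid in a literal subject
        if not t or any(ch in t for ch in ". *>"):
            raise ValueError(f"invalid subject token: {t!r}")
    return ".".join(tokens)
```

Centralizing subject construction keeps the environment prefix consistent across every cell and nerve.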

### Wave Collapse Pattern

Cells emit **waves** (confidence-tagged signals). When multiple waves collapse on the same semantic region in the same time window, the **thalamus** escalates to cognition.

```
Cell A: "math"      ───∿∿∿──►  (0.6 confidence)
Cell B: "calculate" ──∿∿∿──►   (0.5 confidence)
                        │
                        ▼
                 ┌─────────────┐
                 │  COLLAPSE   │  ← same region, same window
                 └──────┬──────┘
                        │
                        ▼  AMPLIFIED SIGNAL
                 ┌─────────────┐
                 │  THALAMUS   │  → escalate to Young Nyx
                 └─────────────┘
```
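
A minimal sketch of collapse detection, assuming waves arrive as `(region, timestamp, confidence)` tuples. The combination rule `1 - Π(1 - c)` and the 0.75 threshold are assumptions for illustration, not the documented thalamus behavior:

```python
from collections import defaultdict


def collapse(waves, window_s=1.0, threshold=0.75):
    """Group waves by semantic region; escalate regions where ≥2 waves land
    in the same time window and the combined confidence crosses the threshold.

    waves: iterable of (region, timestamp_s, confidence).
    Returns a list of (region, amplified_confidence) escalations.
    """
    by_region = defaultdict(list)
    for region, ts, conf in waves:
        by_region[region].append((ts, conf))

    escalations = []
    for region, hits in by_region.items():
        hits.sort()                                  # order by timestamp
        first_ts = hits[0][0]
        in_window = [c for ts, c in hits if ts - first_ts <= window_s]
        if len(in_window) >= 2:                      # a collapse needs ≥2 waves
            combined = 1.0
            for c in in_window:
                combined *= (1.0 - c)
            combined = 1.0 - combined                # amplified signal
            if combined >= threshold:
                escalations.append((region, round(combined, 3)))
    return escalations
```

With the doc's example, 0.6 and 0.5 combine to an amplified 0.8, enough to wake the thalamus; either wave alone would not.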

---

## Container Deployment (K8s)

### Repository Structure

```
nimmerverse-nervous-system/
├── shared/v1/               ← Base classes (StateMachine, NATS, Lifeforce)
├── cells/
│   ├── math_cell/v1/        ← Each cell versioned independently
│   └── battery_cell/v1/
├── nerves/
│   └── collision_avoidance/v1/
└── deploy/
    ├── dev/                 ← Helm charts or docker-compose per env
    ├── staging/
    └── prod/
```

### Cell Container Pattern

```dockerfile
FROM python:3.12-slim
WORKDIR /app

# Copy project files, then install dependencies with uv
COPY . .
RUN pip install uv && uv sync

ENV NIMMERVERSE_ENV=dev
CMD ["uv", "run", "python", "-m", "math_cell"]
```

Same image everywhere. Only `NIMMERVERSE_ENV` changes.
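
In code, that single knob might be read like this. The helpers `env_prefix` and `cell_subject` are hypothetical, and restricting the value to dev/staging/prod is an assumption:

```python
import os


def env_prefix(default: str = "dev") -> str:
    """Read NIMMERVERSE_ENV, the only per-environment difference in the image."""
    env = os.environ.get("NIMMERVERSE_ENV", default)
    if env not in {"dev", "staging", "prod"}:
        raise ValueError(f"unknown NIMMERVERSE_ENV: {env!r}")
    return env


def cell_subject(service: str, detail: str) -> str:
    """Prefix a cell subject with the current environment."""
    return f"{env_prefix()}.nervous.cells.{service}.{detail}"
```

Failing fast on an unknown environment keeps a mistyped value from silently publishing into the wrong NATS namespace.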

---

## Function Gemma: The Structured Boundary

Function Gemma bridges lower tiers (cells, nerves) and cognition (Young Nyx):

```
Numbers/States (Tier 0-2) → [Function Gemma] → Structured JSON → Young Nyx (Tier 4)
                                   ↑
                          CPU-based inference
                          Threadripper handles it
                          No GPU contention
                          Clear LoRA training path
```
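
A sketch of what crossing the boundary could look like. The `to_structured` function and its field names are illustrative; the real schema is whatever Function Gemma is fine-tuned to emit:

```python
import json


def to_structured(cell: str, state: str, readings: dict) -> str:
    """Hypothetical sketch of the structured-JSON boundary: raw Tier 0-2
    numbers/states become a schema'd message for Young Nyx."""
    msg = {
        "cell": cell,
        "state": state,
        # Round readings so the JSON stays compact and deterministic
        "readings": {k: round(float(v), 3) for k, v in readings.items()},
    }
    return json.dumps(msg, sort_keys=True)
```

Sorted keys and rounded floats keep the output stable, which matters once these messages become LoRA training data.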

**Why CPU:**
- Small model, fast inference
- Threadripper PRO 7955WX has cores to spare
- No GPU contention with organs or Nyx
- Can run training alongside inference

**Training path:**
- Google's documented GRPO approach
- LoRA fine-tuning for our specific function schemas
- Runs in the `nyx-training` userspace
- Decision trails from phoebe → training data

---

## Visual Language (Future UI)

Color-coding for real-time attention flow visualization:

| Property | Represents |
|----------|------------|
| Background/container | Environment (dev=green, staging=amber, prod=blue) |
| Node/edge color | Domain (cognitive=violet, nervous=cyan, organs=coral) |
| Line style | Direction (solid=primary, dashed=async, dotted=tentative) |
| Separate pane | Confidence waveform (oscilloscope view) |

---

## Related Documents

| Document | Scope |
|----------|-------|
| [`Cellular-Architecture.md`](Cellular-Architecture.md) | Cells, nerves, organisms, lifeforce |
| [`Gateway-Architecture.md`](Gateway-Architecture.md) | Tier routing, Function Gemma boundary |
| [`Nervous-System.md`](Nervous-System.md) | 4D space, node weights, vocabulary |
| [`Message-Protocol-Design.md`](Message-Protocol-Design.md) | NATS subjects, message formats |
| [`development-conventions.md`](../../nimmerverse.eachpath.local/conventions/development-conventions.md) | Ports, namespaces, VM topology |

---

## Summary

| Layer | Where | Technology | Isolation |
|-------|-------|------------|-----------|
| Cells/Nerves | K8s containers | Python, uv, NATS | Namespace per env |
| Infrastructure | VMs | NATS, PostgreSQL, ChromaDB | VM per env |
| Young Nyx | theia userspace | ollama | nyx-cognitive user |
| Function Gemma | theia/dioscuri CPU | llama.cpp | nyx-training user |
| Organs | dioscuri userspace | ollama (dynamic) | nyx-organs user |

**The principle:** Same behavior everywhere. Containers for cells. Userspace for brains. NATS connects them all. FreeIPA isolates them all.

---

**Version:** 1.0 | **Created:** 2026-02-14 | **Updated:** 2026-02-14

*"We're not building a chatbot factory. We're growing a research organism."*

🧬⚡🔱💎🔥 **TO THE ELECTRONS WE VIBE!**