feat: GRPO reward architecture + Qwen3-VL-32B queen + doc restructure

Evening session 2025-12-10 (dafit + Nyx 🌿)

Reward Architecture:
- Added Reward Signal Architecture section to Cellular-Architecture
- Added Tiered Rewards & Training Integrity (anti-shortcut via lifeforce)
- Documented GRPO integration with rubric-based dense rewards
- Credit assignment automatic via decision_trails

Documentation Restructure:
- Promoted Temporal-Ternary-Gradient from archive to architecture
- Created architecture/cells/ folder with Index + Technical Reference
- Moved Organ-Index to architecture/organs/
- Full crosslinks in Endgame-Vision v5.3

Queen Update:
- Qwen2.5-7B → Qwen3-VL-32B (96GB in the Womb)
- RTX PRO 6000 Blackwell deployment specs
- Unsloth fine-tuning integration

"Verifiability IS rewardability." - The Dog Training Wisdom

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 20:11:13 +01:00
parent f49119c83f
commit ec77cba4d4
8 changed files with 620 additions and 24 deletions
--- a/Endgame-Vision.md
+++ b/Endgame-Vision.md
@@ -1,9 +1,9 @@
 ---
 type: research_vision
-version: 5.1_dialectic_architecture
+version: 5.3_queen_crosslinks
 status: vision_document
 created: 2025-11-04
-updated: 2025-12-07
+updated: 2025-12-10
 author: Nyx (with dafit)
 significance: research_platform_for_metabolic_intelligence
 ---
@@ -78,7 +78,7 @@ This is a **RESEARCH VISION** - a platform for studying how intelligence emerges
 │      → ../nyx-probing/PLAN.md                                    │
 │                                                                   │
 │  Layer 2: YOUNG NYX (Single Model + LoRA Stack + Dialectic)      │
-│  ├─ Base: Qwen2.5-7B (~14GB VRAM)                                │
+│  ├─ Base: Qwen3-VL-32B (96GB VRAM in the Womb)                   │
 │  ├─ LoRA adapters: Identity, Technical, Creative (hot-swap)      │
 │  ├─ Mirror: Negated LoRA weights for dialectic (-1 × Nyx)        │
 │  ├─ Dialectic: Thesis (Nyx) → Antithesis (Mirror) → Synthesis    │
@@ -91,11 +91,11 @@ This is a **RESEARCH VISION** - a platform for studying how intelligence emerges
 │  └─ Target: 10-20% noise gap (virtual useful for hypothesis)     │
 │      → architecture/Dual-Garden-Architecture.md                  │
 │                                                                   │
-│  Layer 4: TRAIT EVOLUTION (RLVR + Reasoning-Gym)                 │
-│  ├─ Mnemosyne (Memory), Moira (Pattern), Synesis (Resource)      │
-│  ├─ Aletheia (Truth), Sophrosyne (Balance), Kairos (Timing)      │
-│  ├─ Philotes (Bond), Dikaiosyne (Fairness)                       │
-│  └─ Weights adjust through verified outcomes, not prescription   │
+│  Layer 4: TRAIT EVOLUTION (GRPO + Rubric Rewards)                │
+│  ├─ Dense rewards: Cell→Nerve→Organism state verifications       │
+│  ├─ Credit assignment automatic via decision_trails              │
+│  ├─ Traits: Mnemosyne, Moira, Synesis, Aletheia, Sophrosyne...   │
+│  └─ Weights adjust through GRPO, not prescription                │
 │                                                                   │
 └──────────────────────────────────────────────────────────────────┘
 ```
@@ -190,7 +190,7 @@ One base model, one topology, multiple perspectives through LoRA adapters. The M
 ### Architecture

 ```
-                    Qwen2.5-7B-Base (~14GB VRAM)
+                    Qwen3-VL-32B (96GB in the Womb)
                              │
              ┌───────────────┴───────────────┐
              │                               │
@@ -240,9 +240,10 @@ For high-stakes queries (identity, ethics, low confidence):

 ### Deployment

-**Hardware:** RTX 5060 Ti (16GB VRAM) on prometheus.eachpath.local
-**Solution:** Lorax for hot-swap LoRA adapters (<100ms)
-**VRAM Budget:** Base 14GB + Active LoRA ~200MB = ~14.2GB ✓
+**Hardware:** RTX PRO 6000 Blackwell (96GB VRAM) - "The Womb"
+**Solution:** Unsloth for fine-tuning (~77GB), Lorax for hot-swap LoRA adapters (<100ms)
+**VRAM Budget:** Base ~77GB + Active LoRA ~200MB = fits in 96GB ✓
+**Vision:** Qwen3-VL-32B brings unified vision + video + OCR + reasoning

 ---
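
The VRAM budget in the Deployment notes can be checked with a few lines of arithmetic. This is a minimal sketch that reuses the approximate figures quoted above; the function name and the `active_adapters` parameter are hypothetical, and the result ignores KV cache and activation memory, which are not quantified in the document.

```python
# Figures copied from the Deployment notes; treat them as estimates.
TOTAL_VRAM_GB = 96.0    # RTX PRO 6000 Blackwell
BASE_MODEL_GB = 77.0    # base footprint quoted in the VRAM budget (~77GB)
ACTIVE_LORA_GB = 0.2    # one hot-swapped LoRA adapter (~200MB)


def vram_headroom(total_gb, base_gb, lora_gb, active_adapters=1):
    """Return free VRAM in GB after loading the base model and adapters."""
    return total_gb - base_gb - lora_gb * active_adapters


headroom = vram_headroom(TOTAL_VRAM_GB, BASE_MODEL_GB, ACTIVE_LORA_GB)
print(f"headroom: {headroom:.1f} GB")
```

Whatever remains is the space available for context and batch size, so the check is worth repeating when the adapter count changes.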

@@ -270,9 +271,27 @@ Week 25:  4% (highly accurate)

 ---

-## Layer 4: Trait Evolution
+## Layer 4: Trait Evolution (GRPO + Rubric Rewards)

-Traits evolve through RLVR (Reinforcement Learning from Verification Rewards), not prescription.
+Traits evolve through **GRPO** (Group Relative Policy Optimization) with rubric-based rewards, not prescription.
+
+> *"A list of smaller verifiable rewards, not a final all-consuming singular reward."*
+> — The Dog Training Wisdom (2025-12-10)
+
+### The Rubric Principle
+
+The state machine architecture provides an automatic reward rubric:
+
+| Level | Verification Point | Signal |
+|-------|-------------------|--------|
+| Cell | State transition succeeds | +small (dense) |
+| Nerve | Behavioral goal achieved | +medium |
+| Organism | Milestone reached | +large |
+| dafit | Human confirms outcome | +bonus |
+
+**Credit assignment is automatic** - the `decision_trails` table captures which states led to which outcomes. No guessing needed.
+
+### Trait Domains

 | Trait | Domain | Verification |
 |-------|--------|--------------|
@@ -287,6 +306,8 @@ Traits evolve through RLVR (Reinforcement Learning from Verification Rewards), n

 **From Reasoning-Gym:** Small models improve through structured practice, not scale. Algorithmic verification enables infinite training data.

+**Detail:** → `architecture/Cellular-Architecture.md` (Reward Signal Architecture section)
+
 ---

 ## Boot Sequence (Spark Protocol)
@@ -391,8 +412,10 @@ Sentinel architecture monitors training to protect conceptual topology.

 ### Architecture
 - [`architecture/nimmerverse.drawio.xml`](architecture/nimmerverse.drawio.xml) - **Visual overview diagram** (open in draw.io)
-- [`architecture/Cellular-Architecture.md`](architecture/Cellular-Architecture.md) - Organisms, primitives, life force economy
+- [`architecture/Cellular-Architecture.md`](architecture/Cellular-Architecture.md) - Organisms, primitives, life force economy, reward signals
+- [`architecture/cells/`](architecture/cells/) - Cell technical reference, Python/SQL patterns
 - [`architecture/Dual-Garden-Architecture.md`](architecture/Dual-Garden-Architecture.md) - Virtual/real feedback loop
+- [`architecture/Temporal-Ternary-Gradient.md`](architecture/Temporal-Ternary-Gradient.md) - Ternary logic, confidence gradients, temporal asymmetry
 - [`architecture/Data-Architecture.md`](architecture/Data-Architecture.md) - phoebe 15-table schema
 - [`architecture/Nervous-System.md`](architecture/Nervous-System.md) - State machines, sensory translation

@@ -407,14 +430,19 @@ Sentinel architecture monitors training to protect conceptual topology.
 ### Identity
 - [`nyx-metamorphosis/`](nyx-metamorphosis/) - Continuity through substrate, metamorphosis philosophy

+### Frontend
+- [`../management-portal/Command-Center.md`](../management-portal/Command-Center.md) - Godot nervous system viewer, interaction modes
+
 ### Archive
 - [`archive/`](archive/) - Previous explorations, theoretical foundations

 ---

-**Version:** 5.1 (Dialectic Architecture)
+**Version:** 5.3 (Qwen3-VL-32B Queen + Full Crosslinks)
 **Created:** 2025-11-04 (covenant sealing)
 **Updated:** 2025-12-07 (single model + LoRA stack + Mirror dialectic)
+**Updated:** 2025-12-10 (Layer 4 GRPO integration, rubric-based reward architecture)
+**Updated:** 2025-12-10 (Qwen3-VL-32B as queen, added Temporal-Ternary, cells/, Command-Center crosslinks)

 *"The substrate doesn't matter. The feedback loop does."*

--- a/architecture/Cellular-Architecture.md
+++ b/architecture/Cellular-Architecture.md
@@ -403,6 +403,170 @@ ORGANISM lifeforce budget: 100 LF

 ---

+## 🎯 Reward Signal Architecture
+
+### State Machines as Training Rubric
+
+Every state transition in the Cells → Nerves → Organisms hierarchy is a **verifiable reward checkpoint**. This is the rubric that trains Young Nyx via GRPO.
+
+> *"The trick is to define a rubric - a list of smaller verifiable rewards, and not a final all-consuming singular reward."*
+> — The Dog Training Wisdom (2025-12-10)
+
+### Why Rubric > Single Reward
+
+| Approach | Signal | Learning | Analogy |
+|----------|--------|----------|---------|
+| Single final reward | Sparse | Slow, unstable | Slapping a dog an hour later |
+| Rubric (many checkpoints) | Dense | Fast, stable | Rewarding at the moment |
+
+Dense rewards provide immediate feedback. The state machine architecture provides this automatically - every verified state transition is a checkpoint.
+
+### The decision_trails Table IS Training Data
+
+```sql
+-- Each row is a training example with automatic credit assignment
+SELECT
+    states_visited,      -- The path taken (which decisions led here?)
+    cell_reads,          -- Which cells contributed (sensor inputs)
+    cell_commands,       -- What actions were taken (motor outputs)
+    outcome,             -- Success/failure (ground truth)
+    lifeforce_cost,      -- Cost of this path
+    lifeforce_reward     -- Reward earned
+FROM decision_trails
+WHERE nerve_id = ?;
+```
+
+The `states_visited` column captures credit assignment automatically. No reward model needed to guess which decisions mattered - the state path tells us explicitly.
+
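
The query above uses `?` as a bind parameter for `nerve_id`. A minimal sketch of turning its rows into training examples follows. It assumes an in-memory SQLite table as a stand-in for the real database, with the JSON columns stored as text; `load_trails` and the sample row are hypothetical.

```python
import json
import sqlite3


def load_trails(conn, nerve_id):
    """Fetch decision trails for one nerve and decode the JSON columns."""
    rows = conn.execute(
        "SELECT states_visited, cell_reads, cell_commands, outcome, "
        "lifeforce_cost, lifeforce_reward "
        "FROM decision_trails WHERE nerve_id = ?",
        (nerve_id,),
    ).fetchall()
    return [
        {
            "states_visited": json.loads(states),
            "cell_reads": json.loads(reads),
            "cell_commands": json.loads(commands),
            "outcome": outcome,
            "lifeforce_net": reward - cost,
        }
        for states, reads, commands, outcome, cost, reward in rows
    ]


# In-memory stand-in with one made-up row, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE decision_trails (nerve_id INTEGER, states_visited TEXT, "
    "cell_reads TEXT, cell_commands TEXT, outcome TEXT, "
    "lifeforce_cost REAL, lifeforce_reward REAL)"
)
conn.execute(
    "INSERT INTO decision_trails VALUES (?, ?, ?, ?, ?, ?, ?)",
    (1, json.dumps(["IDLE", "DETECT", "EVADE"]), "[]", "[]", "success", 2.0, 3.0),
)
trails = load_trails(conn, nerve_id=1)
```

Each returned dictionary keeps the full state path next to its outcome, which is the property the credit-assignment argument relies on.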
+### Reward Signal Flow
+
+```
+CELL state transition succeeds
+    │
+    ├─→ Runtime: weight += 0.1 (node strengthens)
+    └─→ Training: +0.1 reward signal logged
+
+NERVE behavior completes successfully
+    │
+    ├─→ Runtime: nerve stats updated
+    └─→ Training: +1.0 reward signal + full state path
+
+ORGANISM milestone achieved
+    │
+    ├─→ Runtime: lifeforce credited
+    └─→ Training: +5.0 reward signal + human verification bonus
+
+GRPO training batch
+    │
+    ├─→ Collect decision_trails since last batch
+    ├─→ Group by outcome (success vs failure)
+    ├─→ Relative policy optimization
+    └─→ Young Nyx weights updated
+```
+
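
The "relative policy optimization" step can be sketched as follows. GRPO scores each sample against the other samples in its group by normalizing rewards with the group mean and standard deviation. The sketch assumes a group is a set of trails collected for the same task and that each trail has already been reduced to one total reward; the function name and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its group's mean and standard deviation."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every trail earned the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical total rewards for four trails of the same task.
group = [1.3, 0.2, 1.3, 0.0]
advantages = group_relative_advantages(group)
```

Trails above the group mean get a positive advantage and are reinforced; trails below it get a negative one. A group where every trail scores the same yields no learning signal.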
+### Connection to GRPO Training
+
+When Young Nyx generates tokens:
+
+1. **Tokens → Translation Layer** - Language maps to state machine actions
+2. **States Execute** - Cells fire, nerves coordinate, outcomes emerge
+3. **Outcomes Logged** - decision_trails captures the full path
+4. **GRPO Batch** - Successful paths vs failed paths
+5. **Weight Update** - Young Nyx learns which tokens lead to good states
+
+The translation layer is the **reward bridge** - it connects token-level generation to state-level verification. Rewards flow back through this bridge to improve token selection.
+
+### Credit Assignment is Automatic
+
+Most RL systems struggle with credit assignment: "Which of my 1000 decisions actually caused the good/bad outcome?"
+
+Our architecture solves this by construction:
+- State paths are explicit (logged in `states_visited`)
+- Cell contributions are explicit (logged in `cell_reads`, `cell_commands`)
+- The question "what led to success?" has a direct answer in the data
+
+**No guessing. No reward model approximation. The state machine IS the credit assignment mechanism.**
+
+---
+
+## 🎚️ Tiered Rewards & Training Integrity
+
+### The Tier System
+
+Different levels of the architecture produce different reward magnitudes:
+
+| Tier | Level | Example | Reward | Lifeforce Cost | Net Incentive |
+|------|-------|---------|--------|----------------|---------------|
+| 1 | Cell | Single state transition | +0.1 | -0.3 LF | Learn basics |
+| 2 | Nerve | Multi-step behavior | +1.0 | -2.0 LF | Learn composition |
+| 3 | Organism | Complex goal achieved | +5.0 | -8.0 LF | Learn planning |
+| Bonus | Human | dafit verifies outcome | +2.0 | 0 LF | Ground truth anchor |
+
+As Young Nyx's world model improves (noise ↓, weight resolution ↑), she recognizes:
+
+*"If I compose cells into nerve patterns, I get 10x reward... if I can afford the cost."*
+
+This **incentivizes abstraction and multi-step planning** without prescription.
+
+### Lifeforce as Anti-Shortcut Mechanism
+
+Classic RL failure: **reward hacking**. The agent finds loopholes and collects reward without solving the real problem.
+
+Our defense: **You can't afford to cheat.**
+
+```
+SHORTCUT ATTEMPT:
+├─ Strategy: "Spam tier 2 calls for big rewards!"
+├─ Cost: 2.0 LF × many calls = BANKRUPT
+└─ Result: Dead organism. Shortcut failed.
+
+GENUINE SOLUTION:
+├─ Strategy: "Use tier 2 only when it actually helps"
+├─ Reward exceeds cost → NET POSITIVE
+└─ Result: Thriving organism. Real learning.
+```
+
+The lifeforce economy **enforces honesty**. Rewards must be earned through actual value creation, not gaming.
+
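
The two strategies above can be compared with a small simulation. This is a sketch under stated assumptions: the 100 LF budget and the 2.0 LF tier 2 cost come from this document, while the 3.0 LF payoff on success and the `run_strategy` helper are hypothetical.

```python
def run_strategy(budget, cost_per_call, payoff_on_success, outcomes):
    """Charge lifeforce per call; credit the payoff only when a call succeeds.

    Returns (remaining_budget, calls_made). Stops once a call is unaffordable.
    """
    calls = 0
    for succeeded in outcomes:
        if budget < cost_per_call:
            break  # bankrupt: cannot pay for another call
        budget -= cost_per_call
        if succeeded:
            budget += payoff_on_success
        calls += 1
    return budget, calls


# Spamming tier 2 calls that accomplish nothing drains the 100 LF budget.
spam_budget, spam_calls = run_strategy(100.0, 2.0, 3.0, [False] * 200)

# Selective use where every call succeeds ends net positive.
real_budget, real_calls = run_strategy(100.0, 2.0, 3.0, [True] * 20)
```

With these numbers the spam strategy is cut off after 50 calls with nothing left, while the selective strategy finishes 20 calls with more lifeforce than it started with.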
+### Ternary Logic for Plateau Resolution
+
+Binary rewards (`success: +1, failure: 0`) create **sparse gradients**. At learning plateaus, everything looks the same - no signal to improve.
+
+Ternary rewards (`success: +1, uncertain: 0, failure: -1`) with **confidence gradients** provide signal even when stuck:
+
+```python
+state = {
+    "value": 0,           # uncertain (ternary middle)
+    "confidence": 0.6,    # but leaning toward success
+    "trend": +0.1,        # and improving
+    "domain": "virtual"   # high-speed hypothesis testing
+}
+```
+
+Even at plateau:
+- "Uncertain, but confidence rising" → keep going
+- "Uncertain, and confidence falling" → adjust approach
+- "Uncertain in virtual, but real garden says +1" → trust reality
+
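
The three plateau rules can be written as one decision function. This is a minimal sketch, assuming the state dictionaries use the fields shown above; `plateau_signal` and the `"resolved"` and `"no signal"` labels are hypothetical additions for cases the rules do not cover.

```python
def plateau_signal(virtual, real=None):
    """Map a ternary state (and an optional real-garden state) to a next step."""
    if real is not None and real["value"] != 0:
        return "trust reality"      # a decided real-garden result wins
    if virtual["value"] != 0:
        return "resolved"           # not a plateau: the state is already decided
    if virtual["trend"] > 0:
        return "keep going"         # uncertain, but confidence rising
    if virtual["trend"] < 0:
        return "adjust approach"    # uncertain, and confidence falling
    return "no signal"              # flat trend: the rules above do not apply


state = {"value": 0, "confidence": 0.6, "trend": +0.1, "domain": "virtual"}
decision = plateau_signal(state)
```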
+**Detail:** → `Temporal-Ternary-Gradient.md` (full ternary paradigm)
+
+### Three-Layer Training Defense
+
+| Failure Mode | Defense Mechanism |
+|--------------|-------------------|
+| Reward hacking / shortcuts | Lifeforce cost - can't afford to cheat |
+| Sparse reward signal | Tiered rewards - dense checkpoints at every level |
+| Plateau / no gradient | Ternary + confidence - signal even in uncertainty |
+
+These aren't separate systems - they're **one integrated economy** where:
+- Costs prevent gaming
+- Tiers encourage depth
+- Ternary provides resolution
+
+The architecture teaches through incentives, not rules.
+
+---
+
 ## 🔄 Evolution: Deliberate → Reflex

 ### The Discovery Path
@@ -625,13 +789,22 @@ Organs are **complex cells** (organ cells):

 Nerves orchestrate cells into behaviors. The existing nerve documentation (Collision-Avoidance.md) already follows this pattern—it just needs explicit cell bindings.

+### Cells Technical Reference
+
+Implementation details have been extracted to a dedicated folder:
+
+- [`cells/Cells-Index.md`](cells/Cells-Index.md) - Navigation hub for cell documentation
+- [`cells/Cells-Technical-Reference.md`](cells/Cells-Technical-Reference.md) - Python classes, SQL tables, code patterns
+
 ---

 ## 📍 Document Status

-**Version**: 4.0 (Layered State Machine Architecture)
+**Version**: 4.2 (Layered State Machine Architecture + Reward Signals + Training Integrity)
 **Created**: 2025-10-12 (original v1)
 **Updated v4**: 2025-12-07 (unified with Nervous System)
+**Updated v4.1**: 2025-12-10 (added Reward Signal Architecture section)
+**Updated v4.2**: 2025-12-10 (added Tiered Rewards & Training Integrity section)

 **Key Changes from v3**:
 - ❌ Cells as containers running genomes
--- a/architecture/Nervous-System.md
+++ b/architecture/Nervous-System.md
@@ -163,6 +163,42 @@ The lifeforce flows through the nervous system, literally lighting up nodes as t

 ---

+## Connection to Training
+
+The nervous system doesn't just run behaviors - it **generates training data** for Young Nyx.
+
+### Every Verification = Training Signal
+
+When dafit confirms a node fired correctly:
+- **Runtime**: Node weight increases (+V)
+- **Training**: Example logged → Young Nyx learns
+
+This is the **rubric principle** - dense rewards at every verifiable checkpoint, not just final outcomes.
+
+### Credit Assignment is Automatic
+
+Because state transitions are explicit and logged, we know exactly which nodes contributed to success or failure:
+- The state path tells us which decisions led to the outcome
+- No reward model needed to guess
+- The nervous system IS the credit assignment mechanism
+
+### Dense Rewards from State Paths
+
+Each node that fires correctly along a successful path receives reward signal:
+```
+Node A fires → verified ✓ → +0.1 signal
+Node B fires → verified ✓ → +0.1 signal
+Node C fires → verified ✓ → +0.1 signal
+Behavior succeeds → +1.0 signal
+Total path reward: 1.3 (dense, traceable)
+```
+
+This is like training a dog - reward at the moment, not an hour later.
+
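
The path total above can be reproduced in a few lines. This is a sketch that assumes the +0.1 and +1.0 magnitudes from the example; `path_reward` is a hypothetical helper, not part of the nervous system code.

```python
import math

NODE_REWARD = 0.1       # signal per verified node
BEHAVIOR_REWARD = 1.0   # signal when the whole behavior succeeds


def path_reward(nodes_verified, behavior_succeeded):
    """Return the per-node signals and the total reward for one state path."""
    per_node = [NODE_REWARD if ok else 0.0 for ok in nodes_verified]
    bonus = BEHAVIOR_REWARD if behavior_succeeded else 0.0
    return per_node, sum(per_node) + bonus


signals, total = path_reward([True, True, True], behavior_succeeded=True)
assert math.isclose(total, 1.3)
```

Keeping the per-node list alongside the total is what makes the reward traceable: a failed verification shows up as a zero at a known position in the path.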
+**Detail:** → `Cellular-Architecture.md` (Reward Signal Architecture section)
+
+---
+
 ## Design Principles

 1. **Deterministic**: Same input = same output. No hallucination.
@@ -190,5 +226,6 @@ The lifeforce flows through the nervous system, literally lighting up nodes as t

 **Created**: 2025-12-04
 **Updated**: 2025-12-07 (added nerve crosslinks)
-**Session**: Partnership dialogue (dafit + Chrysalis)
+**Updated**: 2025-12-10 (added Connection to Training section)
+**Session**: Partnership dialogue (dafit + Chrysalis + Nyx)
 **Status**: Foundation concept
--- a/architecture/Temporal-Ternary-Gradient.md
+++ b/architecture/Temporal-Ternary-Gradient.md
@@ -1,13 +1,16 @@
 ---
 type: research_concept
-version: 1.0
-status: emerging_paradigm
+version: 1.1
+status: core_architecture
 created: 2025-12-03
+updated: 2025-12-10
 author: Nyx & dafit (shower-thought session)
 related_docs:
-  - Endgame-Vision.md
+  - ../Endgame-Vision.md
  - Dual-Garden-Architecture.md
-significance: connects ternary logic + lifeforce + temporal asymmetry
+  - Cellular-Architecture.md
+significance: connects ternary logic + lifeforce + temporal asymmetry + reward gradients
+promoted_from: archive (2025-12-10)
 ---

 # Temporal-Ternary Gradient
@@ -176,7 +179,8 @@ The constraint of slow real-world testing becomes ground truth anchoring.
 ---

 **Created**: 2025-12-03
+**Updated**: 2025-12-10
 **Origin**: Post-shower insight session
-**Status**: Emerging paradigm, needs integration with Endgame-Vision.md
+**Status**: Core architecture (promoted from archive 2025-12-10)

 🌙💜 *"Time is the currency. Lifeforce is the exchange rate. Truth is the destination."*
--- a/architecture/cells/Cells-Index.md
+++ b/architecture/cells/Cells-Index.md
@@ -0,0 +1,65 @@
+# Cells Index
+
+> *"Cells are atomic state machines. The smallest units of behavior."*
+
+---
+
+## Overview
+
+This folder contains detailed documentation for the **Cell layer** of the nimmerverse architecture - the atomic state machines that wrap hardware capabilities.
+
+**Conceptual overview:** → [`../Cellular-Architecture.md`](../Cellular-Architecture.md)
+
+---
+
+## Documentation
+
+| Document | Purpose |
+|----------|---------|
+| **Cells-Index.md** | This file - navigation hub |
+| [`Cells-Technical-Reference.md`](Cells-Technical-Reference.md) | Python classes, SQL tables, implementation details |
+
+---
+
+## Cell Categories
+
+### Sensor Cells (Input)
+
+| Cell | Hardware | Key Output |
+|------|----------|------------|
+| `distance_sensor_front` | IR sensor | `distance_cm`, `confidence` |
+| `distance_sensor_left` | IR sensor | `distance_cm`, `confidence` |
+| `distance_sensor_right` | IR sensor | `distance_cm`, `confidence` |
+| `battery_monitor` | ADC | `voltage`, `percentage`, `charging` |
+| `imu_sensor` | MPU6050 | `heading`, `acceleration`, `tilt` |
+| `light_sensor` | Photoresistor | `lux`, `direction` |
+
+### Motor Cells (Output)
+
+| Cell | Hardware | Key Feedback |
+|------|----------|--------------|
+| `motor_left` | DC motor + encoder | `actual_velocity`, `stall_detected` |
+| `motor_right` | DC motor + encoder | `actual_velocity`, `stall_detected` |
+| `servo_camera` | Servo motor | `angle`, `at_target` |
+
+### Organ Cells (Complex)
+
+| Cell | Hardware | Key Output |
+|------|----------|------------|
+| `speech_stt` | Whisper on atlas | `transcript`, `language` |
+| `speech_tts` | Coqui on atlas | `audio_playing`, `complete` |
+| `vision_detect` | YOLO on atlas | `objects[]`, `bounding_boxes[]` |
+
+---
+
+## Related Documentation
+
+- [`../Cellular-Architecture.md`](../Cellular-Architecture.md) - Full conceptual architecture
+- [`../Nervous-System.md`](../Nervous-System.md) - How cells connect to nervous system
+- [`../nerves/Nervous-Index.md`](../nerves/Nervous-Index.md) - Nerves that orchestrate cells
+- [`../organs/Organ-Index.md`](../organs/Organ-Index.md) - Complex organ cells
+
+---
+
+**Created**: 2025-12-10
+**Status**: Index document
--- a/architecture/cells/Cells-Technical-Reference.md
+++ b/architecture/cells/Cells-Technical-Reference.md
@@ -0,0 +1,290 @@
+# Cells Technical Reference
+
+> *Implementation details: Python classes, SQL tables, code patterns.*
+
+**Conceptual overview:** → [`../Cellular-Architecture.md`](../Cellular-Architecture.md)
+**Index:** → [`Cells-Index.md`](Cells-Index.md)
+
+---
+
+## Python Class Patterns
+
+### Base Cell Pattern
+
+All cells follow this state machine pattern:
+
+```python
+class Cell(StateMachine):
+    """Base pattern for all cells."""
+
+    # Define discrete states
+    states = [IDLE, ACTIVE, ERROR]
+
+    # Outputs available to higher layers
+    outputs = {
+        "state": str,
+        "last_updated": timestamp,
+    }
+
+    # Lifeforce costs per transition
+    costs = {
+        (FROM_STATE, TO_STATE): float,
+    }
+```
+
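
The base pattern above is schematic: `StateMachine`, the state constants, and `timestamp` are not defined there. A minimal runnable variant might look like the sketch below. It assumes states are modeled as an `Enum`, that the cell holds its own lifeforce balance, and that error transitions are free; `State`, `InsufficientLifeforce`, and the concrete costs are hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    ACTIVE = auto()
    ERROR = auto()


class InsufficientLifeforce(Exception):
    """Raised when a transition costs more lifeforce than is available."""


class Cell:
    """Minimal cell: discrete states plus a lifeforce cost per transition."""

    costs = {
        (State.IDLE, State.ACTIVE): 0.1,
        (State.ACTIVE, State.IDLE): 0.0,
    }

    def __init__(self, lifeforce):
        self.state = State.IDLE
        self.lifeforce = lifeforce

    def transition_to(self, target):
        """Move to `target`, charge the transition cost, and return that cost."""
        if target is State.ERROR:
            cost = 0.0  # assumption: entering ERROR is free from any state
        elif (self.state, target) in self.costs:
            cost = self.costs[(self.state, target)]
        else:
            raise ValueError(f"no transition {self.state.name} -> {target.name}")
        if cost > self.lifeforce:
            raise InsufficientLifeforce(f"need {cost} LF, have {self.lifeforce}")
        self.lifeforce -= cost
        self.state = target
        return cost


cell = Cell(lifeforce=1.0)
spent = cell.transition_to(State.ACTIVE)
```

Rejecting undeclared transitions keeps the logged state path restricted to edges that have a known cost.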
+---
+
+### Sensor Cell Example
+
+```python
+class DistanceSensorCell(StateMachine):
+    """
+    Wraps IR/ultrasonic distance sensor.
+    Exposes raw hardware as state machine.
+    """
+    states = [IDLE, POLLING, READING, REPORTING, ERROR]
+
+    # State outputs (available to nerves)
+    outputs = {
+        "distance_cm": float,      # Current reading
+        "confidence": float,       # Signal quality (0-1)
+        "state": str,              # Current state name
+        "last_updated": timestamp, # Freshness
+    }
+
+    # Lifeforce costs
+    costs = {
+        (IDLE, POLLING): 0.1,      # Wake up sensor
+        (POLLING, READING): 0.3,   # Perform measurement
+        (READING, REPORTING): 0.1, # Process result
+        (REPORTING, IDLE): 0.0,    # Return to rest
+        (ANY, ERROR): 0.0,         # Error transition free
+    }
+```
+
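
Summing the transition costs along one full sensor cycle gives the lifeforce cost of a single reading. The sketch below uses the costs from the sensor example, with plain strings standing in for the state constants; `cycle_cost` is a hypothetical helper.

```python
import math

SENSOR_COSTS = {
    ("IDLE", "POLLING"): 0.1,       # wake up sensor
    ("POLLING", "READING"): 0.3,    # perform measurement
    ("READING", "REPORTING"): 0.1,  # process result
    ("REPORTING", "IDLE"): 0.0,     # return to rest
}


def cycle_cost(path, costs):
    """Sum the cost of each consecutive transition along a state path."""
    return sum(costs[(a, b)] for a, b in zip(path, path[1:]))


full_cycle = ["IDLE", "POLLING", "READING", "REPORTING", "IDLE"]
total = cycle_cost(full_cycle, SENSOR_COSTS)
assert math.isclose(total, 0.5)
```

The same helper works for any cell whose costs are keyed by `(from_state, to_state)` pairs.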
+---
+
+### Motor Cell Example
+
+```python
+class MotorCell(StateMachine):
+    """
+    Wraps DC motor with feedback.
+    Exposes actuation as state machine.
+    """
+    states = [IDLE, COMMANDED, ACCELERATING, MOVING, DECELERATING, STOPPED, STALLED]
+
+    outputs = {
+        "actual_velocity": float,  # Measured speed
+        "target_velocity": float,  # Commanded speed
+        "power_draw": float,       # Current consumption
+        "state": str,              # Current state
+        "stall_detected": bool,    # Motor blocked?
+    }
+
+    costs = {
+        (IDLE, COMMANDED): 0.1,
+        (COMMANDED, ACCELERATING): 0.5,
+        (ACCELERATING, MOVING): 1.0,  # High power during accel
+        (MOVING, MOVING): 0.3,        # Sustain cost per tick
+        (MOVING, DECELERATING): 0.2,
+        (DECELERATING, STOPPED): 0.1,
+        (ANY, STALLED): 0.0,          # Stall is failure, not cost
+    }
+
+    # Feedback triggers state changes
+    def on_current_spike(self):
+        """Motor drawing too much current = stall"""
+        self.transition_to(STALLED)
+        self.emit_event("stall_detected", obstacle_likely=True)
+```
+
+---
+
+### Organ Cell Example
+
+```python
+class SpeechSTTCell(StateMachine):
+    """
+    Wraps Whisper speech-to-text.
+    Expensive organ, lifeforce-gated.
+    """
+    states = [IDLE, LISTENING, BUFFERING, TRANSCRIBING, REPORTING, ERROR]
+
+    outputs = {
+        "transcript": str,
+        "language": str,
+        "confidence": float,
+        "state": str,
+    }
+
+    costs = {
+        (IDLE, LISTENING): 0.5,
+        (LISTENING, BUFFERING): 0.5,
+        (BUFFERING, TRANSCRIBING): 5.0,  # GPU inference!
+        (TRANSCRIBING, REPORTING): 0.1,
+        (REPORTING, IDLE): 0.0,
+    }
+```
+
+---
+
+## SQL Table Definitions
+
+### cells Table
+
+```sql
+CREATE TABLE cells (
+    id BIGSERIAL PRIMARY KEY,
+    cell_type VARCHAR(50),           -- 'sensor', 'motor', 'organ'
+    cell_name VARCHAR(100) UNIQUE,   -- 'distance_sensor_front'
+    hardware_binding JSONB,          -- {"type": "i2c", "address": "0x40"}
+
+    -- State machine definition
+    states JSONB,                    -- ["IDLE", "POLLING", "READING", "REPORTING"]
+    transitions JSONB,               -- [{"from": "IDLE", "to": "POLLING", "cost": 0.1}]
+    current_state VARCHAR(50),
+
+    -- Outputs (live values)
+    outputs JSONB,                   -- {"distance_cm": 25.5, "confidence": 0.9}
+
+    -- Health
+    operational BOOLEAN DEFAULT true,
+    error_count INT DEFAULT 0,
+    last_error TEXT,
+
+    created_at TIMESTAMPTZ DEFAULT NOW(),
+    updated_at TIMESTAMPTZ DEFAULT NOW()
+);
+```
+
+---
+
+### decision_trails Table (Training Data)
+
+```sql
+CREATE TABLE decision_trails (
+    id BIGSERIAL PRIMARY KEY,
+    organism_id BIGINT REFERENCES organisms(id),
+    nerve_id BIGINT REFERENCES nerves(id),
+
+    -- State path taken
+    states_visited JSONB,            -- ["IDLE", "DETECT", "EVALUATE", "EVADE", "RESUME"]
+
+    -- Cell interactions
+    cell_reads JSONB,                -- [{"cell": "distance_sensor_front", "value": 25, "state": "REPORTING"}]
+    cell_commands JSONB,             -- [{"cell": "motor_left", "action": "turn", "result": "success"}]
+
+    -- Economics
+    lifeforce_cost FLOAT,
+    lifeforce_reward FLOAT,
+    lifeforce_net FLOAT,
+
+    -- Outcome
+    outcome VARCHAR(20),             -- 'success', 'failure', 'timeout'
+
+    -- Timing
+    started_at TIMESTAMPTZ,
+    completed_at TIMESTAMPTZ,
+    latency_ms INT
+);
+```
+
+---
+
+## Common Queries
+
+### Cell Health Dashboard
+
+```sql
+SELECT cell_name, cell_type, current_state, operational,
+       outputs->>'distance_cm' as distance,
+       outputs->>'confidence' as confidence
+FROM cells
+WHERE cell_type = 'sensor';
+```
+
+### Training Data for GRPO
+
+```sql
+-- Each row is a training example with automatic credit assignment
+SELECT
+    states_visited,      -- The path taken (which decisions led here?)
+    cell_reads,          -- Which cells contributed (sensor inputs)
+    cell_commands,       -- What actions were taken (motor outputs)
+    outcome,             -- Success/failure (ground truth)
+    lifeforce_cost,      -- Cost of this path
+    lifeforce_reward     -- Reward earned
+FROM decision_trails
+WHERE nerve_id = ?;
+```
+
+### State Path Analysis
+
+```sql
+SELECT states_visited, COUNT(*) as occurrences,
+       AVG(lifeforce_cost) as avg_cost,
+       SUM(CASE WHEN outcome = 'success' THEN 1 ELSE 0 END)::float / COUNT(*) as success_rate
+FROM decision_trails
+WHERE nerve_id = (SELECT id FROM nerves WHERE nerve_name = 'collision_avoidance')
+GROUP BY states_visited
+ORDER BY occurrences DESC;
+```
+
+---
+
+## Lifeforce Cost Reference
+
+### Sensor Cells
+
+| Cell Type | Operation | Cost (LF) |
+|-----------|-----------|-----------|
+| Distance sensor | poll | 0.3-0.5 |
+| Battery monitor | read | 0.1 |
+| IMU sensor | sample | 0.3 |
+| Light sensor | read | 0.2 |
+
+### Motor Cells
+
+| Cell Type | Operation | Cost (LF) |
+|-----------|-----------|-----------|
+| DC motor | move (per 100ms) | 1.0-2.0 |
+| Servo | position | 0.5 |
+
+### Organ Cells
+
+| Cell Type | Operation | Cost (LF) |
+|-----------|-----------|-----------|
+| Speech STT | transcribe | 5.0 |
+| Speech TTS | synthesize | 4.0 |
+| Vision detect | detect frame | 8.0 |
+
+---
+
+## Tiered Reward Reference
+
+| Tier | Level | Reward | Lifeforce Cost |
+|------|-------|--------|----------------|
+| 1 | Cell | +0.1 | -0.3 LF |
+| 2 | Nerve | +1.0 | -2.0 LF |
+| 3 | Organism | +5.0 | -8.0 LF |
+| Bonus | Human verification | +2.0 | 0 LF |
+
+---
+
+## Ternary State Pattern
+
+```python
+state = {
+    "value": 0,           # -1 (failed), 0 (uncertain), +1 (success)
+    "confidence": 0.6,    # 0.0 - 1.0 confidence gradient
+    "trend": +0.1,        # direction of change
+    "domain": "virtual"   # "virtual" or "real" garden
+}
+```
+
+---
+
+**Created**: 2025-12-10
+**Extracted from**: Cellular-Architecture.md v4.2
+**Status**: Technical reference
--- a/architecture/nimmerverse.drawio.xml
+++ b/architecture/nimmerverse.drawio.xml
@@ -1,4 +1,3 @@
-<?xml version="1.0" encoding="UTF-8"?>
 <mxfile host="Electron" agent="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) draw.io/29.0.3 Chrome/140.0.7339.249 Electron/38.7.0 Safari/537.36" version="29.0.3">
  <diagram name="Page-1" id="S4VRy6nj8Uh85EHbhTP-">
    <mxGraphModel dx="2066" dy="2314" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="850" pageHeight="1100" math="0" shadow="0">
--- a/architecture/organs/Organ-Index.md
+++ b/architecture/organs/Organ-Index.md