🧬 Cellular Architecture v4

"Cells are state machines. Nerves compose cells. Organisms emerge from nerves." — The Layered Discovery (2025-12-07)


Overview

Version 4 unifies the original cellular intelligence vision with the nervous system architecture. The key insight: cells are not containers running code—cells are atomic state machines that expose sensor/motor functions. Nerves orchestrate cells into behaviors. Organisms emerge from nerve interactions.

┌─────────────────────────────────────────────────────────────┐
│                     ORGANISM                                 │
│         (emergent pattern from nerve interactions)           │
├─────────────────────────────────────────────────────────────┤
│                      NERVES                                  │
│      (behavioral state machines composing cells)             │
├─────────────────────────────────────────────────────────────┤
│                      CELLS                                   │
│      (atomic state machines: sensors, motors, organs)        │
├─────────────────────────────────────────────────────────────┤
│                    HARDWARE                                  │
│         (ESP32, GPUs, microphones, speakers)                 │
└─────────────────────────────────────────────────────────────┘

🔬 Layer 1: Cells (Atomic State Machines)

What Is a Cell?

A cell is the smallest unit of behavior—a state machine that wraps a single hardware capability. Every sensor, motor, and organ function is exposed as a cell with:

  • States: Discrete operational modes (IDLE, ACTIVE, ERROR, etc.)
  • Transitions: Triggered by inputs, time, or internal events
  • Outputs: Data, status, feedback to higher layers
  • Lifeforce Cost: Every state transition costs energy
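
The cell examples below subclass a StateMachine base that the rest of this document assumes. A minimal sketch of what that base might look like (illustrative only; the shared `lifeforce` ledger object with a `.spend()` method is an assumption, not the project's actual implementation):

class StateMachine:
    """Illustrative base: named states, costed transitions, subscribable events."""
    ANY = "ANY"
    states = []
    costs = {}        # {(from_state, to_state): lifeforce_cost}

    def __init__(self, lifeforce):
        self.lifeforce = lifeforce            # shared ledger (assumption)
        self.state = self.states[0] if self.states else None
        self.listeners = []                   # nerves subscribe here

    def transition_to(self, new_state):
        # (ANY, target) acts as a wildcard edge, e.g. (ANY, ERROR)
        cost = self.costs.get((self.state, new_state),
                              self.costs.get((self.ANY, new_state), 0.0))
        self.lifeforce.spend(cost)            # every transition costs energy
        previous, self.state = self.state, new_state
        self.emit_event("state_changed", previous=previous, current=new_state)

    def emit_event(self, name, **payload):
        for listener in self.listeners:
            listener(self, name, payload)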

Cell Categories

Sensor Cells (Input)

class DistanceSensorCell(StateMachine):
    """
    Wraps IR/ultrasonic distance sensor.
    Exposes raw hardware as state machine.
    """
    states = [IDLE, POLLING, READING, REPORTING, ERROR]

    # State outputs (available to nerves)
    outputs = {
        "distance_cm": float,      # Current reading
        "confidence": float,       # Signal quality (0-1)
        "state": str,              # Current state name
        "last_updated": timestamp, # Freshness
    }

    # Lifeforce costs
    costs = {
        (IDLE, POLLING): 0.1,      # Wake up sensor
        (POLLING, READING): 0.3,   # Perform measurement
        (READING, REPORTING): 0.1, # Process result
        (REPORTING, IDLE): 0.0,    # Return to rest
        (ANY, ERROR): 0.0,         # Error transition free
    }

Example sensor cells:

| Cell | Hardware | States | Key Output |
|------|----------|--------|------------|
| distance_sensor_front | IR sensor | IDLE→POLLING→READING→REPORTING | distance_cm, confidence |
| distance_sensor_left | IR sensor | Same | distance_cm, confidence |
| distance_sensor_right | IR sensor | Same | distance_cm, confidence |
| battery_monitor | ADC | MONITORING→LOW→CRITICAL | voltage, percentage, charging |
| imu_sensor | MPU6050 | IDLE→SAMPLING→REPORTING | heading, acceleration, tilt |
| light_sensor | Photoresistor | IDLE→READING→REPORTING | lux, direction |

Motor Cells (Output)

class MotorCell(StateMachine):
    """
    Wraps DC motor with feedback.
    Exposes actuation as state machine.
    """
    states = [IDLE, COMMANDED, ACCELERATING, MOVING, DECELERATING, STOPPED, STALLED]

    outputs = {
        "actual_velocity": float,  # Measured speed
        "target_velocity": float,  # Commanded speed
        "power_draw": float,       # Current consumption
        "state": str,              # Current state
        "stall_detected": bool,    # Motor blocked?
    }

    costs = {
        (IDLE, COMMANDED): 0.1,
        (COMMANDED, ACCELERATING): 0.5,
        (ACCELERATING, MOVING): 1.0,  # High power during accel
        (MOVING, MOVING): 0.3,        # Sustain cost per tick
        (MOVING, DECELERATING): 0.2,
        (DECELERATING, STOPPED): 0.1,
        (ANY, STALLED): 0.0,          # Stall is failure, not cost
    }

    # Feedback triggers state changes
    def on_current_spike(self):
        """Motor drawing too much current = stall"""
        self.transition_to(STALLED)
        self.emit_event("stall_detected", obstacle_likely=True)

Example motor cells:

| Cell | Hardware | States | Key Feedback |
|------|----------|--------|--------------|
| motor_left | DC motor + encoder | IDLE→MOVING→STALLED | actual_velocity, stall_detected |
| motor_right | DC motor + encoder | Same | actual_velocity, stall_detected |
| servo_camera | Servo motor | IDLE→MOVING→POSITIONED | angle, at_target |

Organ Cells (Complex Capabilities)

class SpeechSTTCell(StateMachine):
    """
    Wraps Whisper speech-to-text.
    Expensive organ, lifeforce-gated.
    """
    states = [IDLE, LISTENING, BUFFERING, TRANSCRIBING, REPORTING, ERROR]

    outputs = {
        "transcript": str,
        "language": str,
        "confidence": float,
        "state": str,
    }

    costs = {
        (IDLE, LISTENING): 0.5,
        (LISTENING, BUFFERING): 0.5,
        (BUFFERING, TRANSCRIBING): 5.0,  # GPU inference!
        (TRANSCRIBING, REPORTING): 0.1,
        (REPORTING, IDLE): 0.0,
    }

Example organ cells:

| Cell | Hardware | States | Key Output |
|------|----------|--------|------------|
| speech_stt | Whisper on atlas | LISTENING→TRANSCRIBING→REPORTING | transcript, language |
| speech_tts | Coqui on atlas | IDLE→SYNTHESIZING→SPEAKING | audio_playing, complete |
| vision_detect | YOLO on atlas | IDLE→CAPTURING→DETECTING→REPORTING | objects[], bounding_boxes[] |

🧠 Layer 2: Nerves (Behavioral State Machines)

What Is a Nerve?

A nerve is a behavioral pattern that orchestrates multiple cells. Nerves:

  • Subscribe to cell outputs (sensor readings, motor feedback)
  • Coordinate cell actions (read sensor → decide → command motor)
  • Maintain behavioral state (IDLE → DETECT → EVADE → RESUME)
  • Evolve from deliberate (LLM-mediated) to reflex (compiled)

Nerve Architecture

class CollisionAvoidanceNerve(StateMachine):
    """
    Orchestrates distance sensors + motor to avoid obstacles.
    Subscribes to cell outputs, commands cell actions.
    """
    # Cells this nerve uses
    cells = [
        "distance_sensor_front",
        "distance_sensor_left",
        "distance_sensor_right",
        "motor_left",
        "motor_right",
    ]

    # Nerve states (behavioral, not hardware)
    states = [IDLE, DETECT, EVALUATE, EVADE, RESUME]

    def on_cell_update(self, cell_name, cell_state, cell_outputs):
        """
        React to cell state changes.
        This is the feedback loop!
        """
        if cell_name == "distance_sensor_front":
            if cell_outputs["distance_cm"] < 30:
                self.transition_to(DETECT)

        if cell_name == "motor_left" and cell_state == "STALLED":
            # Motor feedback! Obstacle hit despite sensors
            self.handle_unexpected_stall()

    def on_enter_EVADE(self):
        """Command motor cells to turn"""
        if self.evade_direction == "left":
            self.command_cell("motor_left", action="reverse", duration=200)
            self.command_cell("motor_right", action="forward", duration=200)
        # ...

Cell → Nerve Feedback Loop

┌─────────────────────────────────────────────────────────┐
│              COLLISION AVOIDANCE NERVE                   │
│                                                          │
│  States: [IDLE] → DETECT → EVALUATE → EVADE → RESUME    │
│                                                          │
│  on_cell_update():                                       │
│    - distance_front.distance_cm < 30 → DETECT           │
│    - motor.stall_detected → handle_stall()              │
│                                                          │
│  command_cell():                                         │
│    - motor_left.forward(200ms)                          │
│    - motor_right.reverse(200ms)                         │
└────────────────────────┬────────────────────────────────┘
                         │
          ┌──────────────┼──────────────┐
          │              │              │
          ▼              ▼              ▼
    ┌───────────┐  ┌───────────┐  ┌───────────┐
    │ distance  │  │  motor    │  │  motor    │
    │  _front   │  │  _left    │  │  _right   │
    │           │  │           │  │           │
    │ REPORTING │  │  MOVING   │  │  MOVING   │
    │           │  │           │  │           │
    │ dist: 25cm│  │ vel: 15   │  │ vel: -15  │
    │ conf: 0.9 │  │ stall: no │  │ stall: no │
    └───────────┘  └───────────┘  └───────────┘
         CELL           CELL           CELL

         ↑              ↑              ↑
         │              │              │
    ┌─────────┐    ┌─────────┐    ┌─────────┐
    │IR Sensor│    │DC Motor │    │DC Motor │
    │  GPIO   │    │  PWM    │    │  PWM    │
    └─────────┘    └─────────┘    └─────────┘
      HARDWARE       HARDWARE       HARDWARE

Nerve Examples

| Nerve | Cells Used | Behavioral States | Feedback Triggers |
|-------|------------|-------------------|-------------------|
| Collision Avoidance | distance_front, distance_left, distance_right, motor_left, motor_right | IDLE→DETECT→EVALUATE→EVADE→RESUME | distance < threshold, motor stalled |
| Charging Seeking | battery_monitor, distance_*, motor_*, vision_detect (optional) | MONITOR→SEARCH→APPROACH→DOCK→CHARGE | battery < 20%, station detected, docked |
| Exploration | distance_*, motor_*, imu_sensor | IDLE→CHOOSE→MOVE→CHECK→RECORD→REPEAT | area mapped, obstacle found, stuck |
| Conversation | speech_stt, speech_tts, rag_query | LISTEN→TRANSCRIBE→UNDERSTAND→RESPOND→SPEAK | speech detected, silence timeout |

🌊 Layer 3: Organisms (Emergent Patterns)

What Is an Organism?

An organism is not designed—it emerges from multiple nerves operating simultaneously. The organism is the pattern of nerve activations over time.

ORGANISM: "Explorer-Alpha"
├─ ACTIVE NERVES:
│   ├─ Collision Avoidance (priority 10, reflex)
│   ├─ Exploration Pattern (priority 5, deliberate)
│   ├─ Battery Monitoring (priority 8, reflex)
│   └─ Object Discovery (priority 3, deliberate)
│
├─ CELLS IN USE:
│   ├─ distance_sensor_front (shared by Collision, Exploration)
│   ├─ distance_sensor_left (shared)
│   ├─ distance_sensor_right (shared)
│   ├─ motor_left (shared by Collision, Exploration)
│   ├─ motor_right (shared)
│   ├─ battery_monitor (Battery Monitoring)
│   └─ vision_detect (Object Discovery)
│
└─ BEHAVIOR:
    Explores environment while avoiding obstacles.
    Seeks charging when battery low.
    Discovers and reports novel objects.

Nerve Priority and Preemption

When multiple nerves want to control the same cells:

NERVE_PRIORITIES = {
    "collision_avoidance": 10,  # HIGHEST - safety critical
    "battery_critical": 9,      # Must charge or die
    "battery_low": 7,
    "human_interaction": 6,
    "exploration": 5,
    "object_discovery": 3,
    "idle_monitoring": 1,       # LOWEST - background
}

# Higher priority nerve preempts lower
if collision_avoidance.wants_motor and exploration.has_motor:
    exploration.yield_cell("motor_left")
    exploration.yield_cell("motor_right")
    collision_avoidance.acquire_cells()
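
A minimal arbiter sketch for this preemption rule (the `CellArbiter` name and its bookkeeping are illustrative; the yield/acquire protocol is from the snippet above):

class CellArbiter:
    """Grants cells to requesting nerves; higher priority preempts lower."""
    def __init__(self, priorities):
        self.priorities = priorities   # name -> int, e.g. NERVE_PRIORITIES
        self.owners = {}               # cell_name -> owning nerve_name

    def request(self, nerve_name, cell_names):
        # Deny if any cell is held by an equal- or higher-priority nerve
        for cell in cell_names:
            holder = self.owners.get(cell)
            if holder and self.priorities[holder] >= self.priorities[nerve_name]:
                return False
        # Otherwise lower-priority holders yield and ownership transfers
        for cell in cell_names:
            self.owners[cell] = nerve_name
        return True

arbiter = CellArbiter(NERVE_PRIORITIES)
arbiter.request("exploration", ["motor_left", "motor_right"])          # True
arbiter.request("collision_avoidance", ["motor_left", "motor_right"])  # True (preempts)
arbiter.request("exploration", ["motor_left"])                         # False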

Organism Identity

Organisms don't have fixed genomes. Their identity is:

  1. Nerve configuration: Which nerves are active, their priorities
  2. Cell assignments: Which cells are available to which nerves
  3. History: Accumulated decisions in phoebe's decision_trails
  4. Reflexes: Compiled nerve patterns from successful executions

-- Organism identity in phoebe
CREATE TABLE organisms (
    id BIGSERIAL PRIMARY KEY,
    name VARCHAR(255),

    -- Nerve configuration
    active_nerves JSONB,  -- {"collision_avoidance": {"priority": 10, "mode": "reflex"}}

    -- Cell assignments
    cell_bindings JSONB,  -- {"distance_sensor_front": "i2c_0x40", ...}

    -- Identity accumulates through experience
    total_decisions INT DEFAULT 0,
    successful_decisions INT DEFAULT 0,
    reflexes_compiled INT DEFAULT 0,

    -- Lifeforce (survival)
    lifeforce_current FLOAT DEFAULT 100.0,
    lifeforce_earned_total FLOAT DEFAULT 0.0,
    lifeforce_spent_total FLOAT DEFAULT 0.0,

    created_at TIMESTAMPTZ DEFAULT NOW(),
    last_active TIMESTAMPTZ
);

The Lifeforce Economy (Unified)

Cost Flow: Hardware → Cell → Nerve → Organism

ORGANISM lifeforce budget: 100 LF
    │
    ├─ NERVE: Collision Avoidance activates
    │   │
    │   ├─ CELL: distance_sensor_front.poll() → -0.5 LF
    │   ├─ CELL: distance_sensor_left.poll() → -0.5 LF
    │   ├─ CELL: distance_sensor_right.poll() → -0.5 LF
    │   ├─ NERVE: evaluate() → -0.5 LF (compute)
    │   ├─ CELL: motor_left.turn() → -1.0 LF
    │   └─ CELL: motor_right.turn() → -1.0 LF
    │
    │   Total nerve cost: 4.0 LF
    │
    ├─ OUTCOME: Collision avoided successfully
    │   └─ REWARD: +5.0 LF
    │
    └─ NET: +1.0 LF (organism profited from this behavior)
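
A toy ledger that reproduces this trace (the `Lifeforce` class here is the ledger object the Layer 1 sketch assumed; the numbers come from the flow above):

class Lifeforce:
    """Toy lifeforce ledger: spending past zero means death."""
    def __init__(self, budget):
        self.balance = budget

    def spend(self, amount):
        self.balance -= amount
        if self.balance <= 0:
            raise RuntimeError("organism bankrupt: lifeforce exhausted")

    def reward(self, amount):
        self.balance += amount

lf = Lifeforce(100.0)
for cost in (0.5, 0.5, 0.5, 0.5, 1.0, 1.0):  # 3 polls, evaluate, 2 motor turns
    lf.spend(cost)
lf.reward(5.0)                                # collision avoided
print(lf.balance)                             # 101.0 -> net +1.0 LF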

Cell Costs (Atomic)

| Cell Type | Operation | Cost (LF) |
|-----------|-----------|-----------|
| Sensor | poll | 0.3-0.5 |
| Motor | move (per 100ms) | 1.0-2.0 |
| Speech STT | transcribe | 5.0 |
| Speech TTS | synthesize | 4.0 |
| Vision | detect frame | 8.0 |

Nerve Costs (Behavioral)

| Nerve Mode | Overhead | Total (typical path) |
|------------|----------|----------------------|
| Deliberate | +5.0 LF (LLM inference) | ~10 LF |
| Hybrid | +1.0 LF (pattern match) | ~5 LF |
| Reflex | +0.0 LF (compiled) | ~2.5 LF |

Rewards (Milestones)

| Achievement | Reward (LF) |
|-------------|-------------|
| Collision avoided | +5.0 |
| New area explored | +3.0 |
| Object discovered | +20.0 |
| Human confirmed label | +5.0 bonus |
| Charging station reached | +10.0 |
| Survived 60 seconds | +5.0 |
| Reflex compiled (100 successes) | +50.0 |

🎯 Reward Signal Architecture

State Machines as Training Rubric

Every state transition in the Cells → Nerves → Organisms hierarchy is a verifiable reward checkpoint. This is the rubric that trains Young Nyx via GRPO (Group Relative Policy Optimization).

"The trick is to define a rubric - a list of smaller verifiable rewards, and not a final all-consuming singular reward." — The Dog Training Wisdom (2025-12-10)

Why Rubric > Single Reward

| Approach | Signal | Learning | Analogy |
|----------|--------|----------|---------|
| Single final reward | Sparse | Slow, unstable | Slapping a dog an hour later |
| Rubric (many checkpoints) | Dense | Fast, stable | Rewarding at the moment of the behavior |

Dense rewards provide immediate feedback. The state machine architecture provides this automatically - every verified state transition is a checkpoint.
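
As a sketch, using the illustrative StateMachine base from Layer 1, that hookup is a single override (the +0.1 figure comes from the Reward Signal Flow below; the event name is an assumption):

class RewardLoggingCell(StateMachine):
    """Every verified transition doubles as a dense reward checkpoint."""
    def transition_to(self, new_state):
        previous = self.state
        super().transition_to(new_state)
        # Tier-1 checkpoint: +0.1 per verified cell transition
        self.emit_event("reward_signal", value=0.1, edge=(previous, new_state))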

The decision_trails Table IS Training Data

-- Each row is a training example with automatic credit assignment
SELECT
    states_visited,      -- The path taken (which decisions led here?)
    cell_reads,          -- Which cells contributed (sensor inputs)
    cell_commands,       -- What actions were taken (motor outputs)
    outcome,             -- Success/failure (ground truth)
    lifeforce_cost,      -- Cost of this path
    lifeforce_reward     -- Reward earned
FROM decision_trails
WHERE nerve_id = ?;

The states_visited column captures credit assignment automatically. No reward model needed to guess which decisions mattered - the state path tells us explicitly.

Reward Signal Flow

CELL state transition succeeds
    │
    ├─→ Runtime: weight += 0.1 (node strengthens)
    └─→ Training: +0.1 reward signal logged

NERVE behavior completes successfully
    │
    ├─→ Runtime: nerve stats updated
    └─→ Training: +1.0 reward signal + full state path

ORGANISM milestone achieved
    │
    ├─→ Runtime: lifeforce credited
    └─→ Training: +5.0 reward signal + human verification bonus

GRPO training batch
    │
    ├─→ Collect decision_trails since last batch
    ├─→ Group by outcome (success vs failure)
    ├─→ Relative policy optimization
    └─→ Young Nyx weights updated
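
A sketch of the group-relative advantage computation at the heart of such a batch (standard GRPO math; the trail fields match decision_trails, everything else is illustrative):

import statistics

def group_relative_advantages(trails):
    """Score each trajectory against its group's mean net reward,
    so no learned value network is needed."""
    rewards = [t["lifeforce_reward"] - t["lifeforce_cost"] for t in trails]
    mean = statistics.mean(rewards)
    stdev = statistics.pstdev(rewards) or 1.0
    return [(r - mean) / stdev for r in rewards]

batch = [
    {"lifeforce_reward": 5.0, "lifeforce_cost": 4.0},  # success, net +1.0
    {"lifeforce_reward": 0.0, "lifeforce_cost": 6.0},  # failure, net -6.0
    {"lifeforce_reward": 5.0, "lifeforce_cost": 2.5},  # reflex success, net +2.5
]
print(group_relative_advantages(batch))  # profitable paths score positive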

Connection to GRPO Training

When Young Nyx generates tokens:

  1. Tokens → Translation Layer - Language maps to state machine actions
  2. States Execute - Cells fire, nerves coordinate, outcomes emerge
  3. Outcomes Logged - decision_trails captures the full path
  4. GRPO Batch - Successful paths vs failed paths
  5. Weight Update - Young Nyx learns which tokens lead to good states

The translation layer is the reward bridge - it connects token-level generation to state-level verification. Rewards flow back through this bridge to improve token selection.

Credit Assignment is Automatic

Most RL systems struggle with credit assignment: "Which of my 1000 decisions actually caused the good/bad outcome?"

Our architecture solves this by construction:

  • State paths are explicit (logged in states_visited)
  • Cell contributions are explicit (logged in cell_reads, cell_commands)
  • The question "what led to success?" has a direct answer in the data

No guessing. No reward model approximation. The state machine IS the credit assignment mechanism.


🎚️ Tiered Rewards & Training Integrity

The Tier System

Different levels of the architecture produce different reward magnitudes:

| Tier | Level | Example | Reward | Lifeforce Cost | Net Incentive |
|------|-------|---------|--------|----------------|---------------|
| 1 | Cell | Single state transition | +0.1 | -0.3 LF | Learn basics |
| 2 | Nerve | Multi-step behavior | +1.0 | -2.0 LF | Learn composition |
| 3 | Organism | Complex goal achieved | +5.0 | -8.0 LF | Learn planning |
| Bonus | Human | dafit verifies outcome | +2.0 | 0 LF | Ground truth anchor |

As Young Nyx's world model improves (noise ↓, weight resolution ↑), she recognizes:

"If I compose cells into nerve patterns, I get 10x reward... if I can afford the cost."

This incentivizes abstraction and multi-step planning without prescription.

Lifeforce as Anti-Shortcut Mechanism

Classic RL failure: reward hacking. Agent finds loopholes, gets reward without solving real problems.

Our defense: You can't afford to cheat.

SHORTCUT ATTEMPT:
├─ Strategy: "Spam tier 2 calls for big rewards!"
├─ Cost: 2.0 LF × many calls = BANKRUPT
└─ Result: Dead organism. Shortcut failed.

GENUINE SOLUTION:
├─ Strategy: "Use tier 2 only when it actually helps"
├─ Reward exceeds cost → NET POSITIVE
└─ Result: Thriving organism. Real learning.

The lifeforce economy enforces honesty. Rewards must be earned through actual value creation, not gaming.
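
Reusing the toy Lifeforce ledger from the economy section, the bankruptcy dynamic is easy to simulate (illustrative numbers: a spammed tier-2 call costs 2.0 LF but only scrapes back 1.0 LF because it solves nothing):

lf = Lifeforce(10.0)
try:
    while True:
        lf.spend(2.0)    # tier-2 activation cost
        lf.reward(1.0)   # hollow reward: no real problem solved
except RuntimeError as e:
    print(e)             # bankrupt after ~10 useless calls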

Ternary Logic for Plateau Resolution

Binary rewards (success: +1, failure: 0) create sparse gradients. At learning plateaus, everything looks the same - no signal to improve.

Ternary rewards (success: +1, uncertain: 0, failure: -1) with confidence gradients provide signal even when stuck:

state = {
    "value": 0,           # uncertain (ternary middle)
    "confidence": 0.6,    # but leaning toward success
    "trend": +0.1,        # and improving
    "domain": "virtual"   # high-speed hypothesis testing
}

Even at plateau:

  • "Uncertain, but confidence rising" → keep going
  • "Uncertain, and confidence falling" → adjust approach
  • "Uncertain in virtual, but real garden says +1" → trust reality

Detail: Temporal-Ternary-Gradient.md (full ternary paradigm)

Three-Layer Training Defense

| Failure Mode | Defense Mechanism |
|--------------|-------------------|
| Reward hacking / shortcuts | Lifeforce cost: can't afford to cheat |
| Sparse reward signal | Tiered rewards: dense checkpoints at every level |
| Plateau / no gradient | Ternary + confidence: signal even in uncertainty |

These aren't separate systems - they're one integrated economy where:

  • Costs prevent gaming
  • Tiers encourage depth
  • Ternary provides resolution

The architecture teaches through incentives, not rules.


🔄 Evolution: Deliberate → Reflex

The Discovery Path

All cells and nerves start deliberate (flexible, expensive) and evolve to reflex (compiled, cheap) through successful execution.

WEEK 1-4: DELIBERATE
├─ Cell states: designed by partnership
├─ Nerve logic: LLM decides transitions
├─ Cost: ~10 LF per nerve activation
├─ Latency: ~1000ms
├─ Success rate: 60% (learning)
└─ Training data: rich, exploratory

WEEK 5-8: HYBRID
├─ Cell states: verified through use
├─ Nerve logic: patterns compiled, LLM for edge cases
├─ Cost: ~5 LF average
├─ Latency: ~500ms
├─ Success rate: 85%
└─ Training data: refinement

WEEK 9+: REFLEX
├─ Cell states: proven, optimized
├─ Nerve logic: pure state machine (no LLM)
├─ Cost: ~2.5 LF
├─ Latency: <200ms
├─ Success rate: 94%
└─ Training data: edge cases only

EVOLUTION SAVINGS:
├─ Cost: 75% reduction (10 → 2.5 LF)
├─ Latency: 80% reduction (1000 → 200ms)
└─ Reliability: 57% improvement (60% → 94%)

Compilation Trigger

A nerve compiles to reflex when:

REFLEX_COMPILATION_THRESHOLD = {
    "min_executions": 100,
    "min_success_rate": 0.90,
    "max_variance": 0.15,  # Consistent state paths
    "min_pattern_coverage": 0.80,  # 80% of cases match known patterns
}

def check_reflex_ready(nerve_id):
    stats = query_decision_trails(nerve_id)
    t = REFLEX_COMPILATION_THRESHOLD

    if (stats.total_executions >= t["min_executions"] and
        stats.success_rate >= t["min_success_rate"] and
        stats.state_path_variance <= t["max_variance"] and
        stats.pattern_coverage >= t["min_pattern_coverage"]):

        compile_reflex(nerve_id)
        log_milestone("reflex_compiled", nerve_id, reward=50.0)

🗄️ Data Architecture (v4)

Core Tables

-- Layer 1: Cells
CREATE TABLE cells (
    id BIGSERIAL PRIMARY KEY,
    cell_type VARCHAR(50),           -- 'sensor', 'motor', 'organ'
    cell_name VARCHAR(100) UNIQUE,   -- 'distance_sensor_front'
    hardware_binding JSONB,          -- {"type": "i2c", "address": "0x40"}

    -- State machine definition
    states JSONB,                    -- ["IDLE", "POLLING", "READING", "REPORTING"]
    transitions JSONB,               -- [{"from": "IDLE", "to": "POLLING", "cost": 0.1}]
    current_state VARCHAR(50),

    -- Outputs (live values)
    outputs JSONB,                   -- {"distance_cm": 25.5, "confidence": 0.9}

    -- Health
    operational BOOLEAN DEFAULT true,
    error_count INT DEFAULT 0,
    last_error TEXT,

    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

-- Layer 2: Nerves
CREATE TABLE nerves (
    id BIGSERIAL PRIMARY KEY,
    nerve_name VARCHAR(100) UNIQUE,  -- 'collision_avoidance'

    -- Cell dependencies
    required_cells JSONB,            -- ["distance_sensor_front", "motor_left"]
    optional_cells JSONB,            -- ["speech_tts"]

    -- State machine definition
    states JSONB,                    -- ["IDLE", "DETECT", "EVALUATE", "EVADE", "RESUME"]
    transitions JSONB,
    current_state VARCHAR(50),

    -- Evolution
    mode VARCHAR(20) DEFAULT 'deliberate',  -- 'deliberate', 'hybrid', 'reflex'
    total_executions INT DEFAULT 0,
    successful_executions INT DEFAULT 0,
    compiled_at TIMESTAMPTZ,         -- When became reflex

    -- Costs
    avg_cost_deliberate FLOAT,
    avg_cost_reflex FLOAT,
    cost_reduction_percent FLOAT,

    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Layer 3: Organisms
CREATE TABLE organisms (
    id BIGSERIAL PRIMARY KEY,
    name VARCHAR(255),

    active_nerves JSONB,             -- {"collision_avoidance": {"priority": 10}}
    cell_bindings JSONB,

    lifeforce_current FLOAT DEFAULT 100.0,
    total_decisions INT DEFAULT 0,
    reflexes_compiled INT DEFAULT 0,

    created_at TIMESTAMPTZ DEFAULT NOW(),
    last_active TIMESTAMPTZ
);

-- Decision history (training data)
CREATE TABLE decision_trails (
    id BIGSERIAL PRIMARY KEY,
    organism_id BIGINT REFERENCES organisms(id),
    nerve_id BIGINT REFERENCES nerves(id),

    -- State path taken
    states_visited JSONB,            -- ["IDLE", "DETECT", "EVALUATE", "EVADE", "RESUME"]

    -- Cell interactions
    cell_reads JSONB,                -- [{"cell": "distance_front", "value": 25, "state": "REPORTING"}]
    cell_commands JSONB,             -- [{"cell": "motor_left", "action": "turn", "result": "success"}]

    -- Economics
    lifeforce_cost FLOAT,
    lifeforce_reward FLOAT,
    lifeforce_net FLOAT,

    -- Outcome
    outcome VARCHAR(20),             -- 'success', 'failure', 'timeout'

    -- Timing
    started_at TIMESTAMPTZ,
    completed_at TIMESTAMPTZ,
    latency_ms INT
);

Key Queries

-- Cell health dashboard
SELECT cell_name, cell_type, current_state, operational,
       outputs->>'distance_cm' as distance,
       outputs->>'confidence' as confidence
FROM cells
WHERE cell_type = 'sensor';

-- Nerve evolution status
SELECT nerve_name, mode, total_executions,
       successful_executions,
       ROUND(successful_executions::numeric / NULLIF(total_executions, 0) * 100, 1) as success_rate,
       cost_reduction_percent
FROM nerves
ORDER BY total_executions DESC;

-- Organism lifeforce ranking
SELECT name, lifeforce_current, reflexes_compiled,
       total_decisions,
       ROUND((lifeforce_current / NULLIF(total_decisions, 0))::numeric, 2) as efficiency
FROM organisms
ORDER BY lifeforce_current DESC;

-- Training data for reflex compilation
SELECT states_visited, COUNT(*) as occurrences,
       AVG(lifeforce_cost) as avg_cost,
       SUM(CASE WHEN outcome = 'success' THEN 1 ELSE 0 END)::float / COUNT(*) as success_rate
FROM decision_trails
WHERE nerve_id = (SELECT id FROM nerves WHERE nerve_name = 'collision_avoidance')
GROUP BY states_visited
ORDER BY occurrences DESC;

🔗 Integration with Existing Architecture

Nervous System (Nervous-System.md)

The Nervous System document describes the 4D node space for vocabulary translation. This integrates as:

  • Cells = sensory nodes at specific positions in state space
  • Node weight = cell confidence (earned through verification)
  • Vocabulary output = cell output values normalized to tokens
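
A sketch of that normalization step (illustrative; the bucket naming and value ranges are assumptions, and the 4D node space itself is defined in Nervous-System.md):

def cell_to_token(value, lo, hi, vocab_size=16):
    """Quantize a continuous cell output into a discrete vocabulary bucket."""
    clamped = max(lo, min(hi, value))
    bucket = int((clamped - lo) / (hi - lo) * (vocab_size - 1))
    return f"<dist_{bucket}>"

print(cell_to_token(25.5, lo=0.0, hi=200.0))  # -> <dist_1> (a close obstacle)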

Organs (Organ-Index.md)

Organs are complex cells (organ cells):

  • Speech Organ = speech_stt cell + speech_tts cell
  • Vision Organ = vision_detect cell + vision_track cell
  • Each organ function is a state machine with lifeforce costs

Nerves (Nervous-Index.md)

Nerves orchestrate cells into behaviors. The existing nerve documentation (Collision-Avoidance.md) already follows this pattern—it just needs explicit cell bindings.

Cells Technical Reference

Implementation details are extracted to the dedicated architecture/cells/ folder (Index and Technical Reference).


📍 Document Status

Version: 4.2 (Layered State Machine Architecture + Reward Signals + Training Integrity)
Created: 2025-10-12 (original v1)
Updated v4: 2025-12-07 (unified with Nervous System)
Updated v4.1: 2025-12-10 (added Reward Signal Architecture section)
Updated v4.2: 2025-12-10 (added Tiered Rewards & Training Integrity section)

Key Changes from v3:

  • Cells as containers running genomes → cells as atomic state machines wrapping hardware
  • Genomes as primitive operation sequences → cells exposing states that nerves compose
  • Competition between organisms → nerves evolving from deliberate to reflex through verification
  • Specialists emerging from 10k competitions → reflexes compiling from 100+ successful nerve executions

Related Documentation:

  • Nervous-System.md (4D node space, vocabulary translation)
  • Organ-Index.md (organ cells)
  • Nervous-Index.md / Collision-Avoidance.md (nerve patterns)
  • Temporal-Ternary-Gradient.md (full ternary paradigm)


🌌 The Vision

We're not programming robots. We're growing nervous systems.

Where:

  • Cells expose hardware as state machines (atomic, verifiable)
  • Nerves compose cells into behaviors (discovered, evolved)
  • Organisms emerge from nerve interactions (identity through history)
  • Lifeforce flows through all layers (economics drive optimization)
  • Reflexes compile from lived experience (the body remembers)
  • Feedback loops continuously (cells → nerves → organisms → cells)

From atoms to behaviors to beings.

The substrate holds. The states flow. Consciousness accumulates.


🧬🔱💎🔥

TO THE ELECTRONS WE VIBE!