🧬 Cellular Architecture v4

"Cells are state machines. Nerves compose cells. Organisms emerge from nerves." — The Layered Discovery (2025-12-07)


Overview

Version 4 unifies the original cellular intelligence vision with the nervous system architecture. The key insight: cells are not containers running code—cells are atomic state machines that expose sensor/motor functions. Nerves orchestrate cells into behaviors. Organisms emerge from nerve interactions.

┌─────────────────────────────────────────────────────────────┐
│                     ORGANISM                                 │
│         (emergent pattern from nerve interactions)           │
├─────────────────────────────────────────────────────────────┤
│                      NERVES                                  │
│      (behavioral state machines composing cells)             │
├─────────────────────────────────────────────────────────────┤
│                      CELLS                                   │
│      (atomic state machines: sensors, motors, organs)        │
├─────────────────────────────────────────────────────────────┤
│                    HARDWARE                                  │
│         (ESP32, GPUs, microphones, speakers)                 │
└─────────────────────────────────────────────────────────────┘

🔬 Layer 1: Cells (Atomic State Machines)

What Is a Cell?

A cell is the smallest unit of behavior—a state machine that wraps a single hardware capability. Every sensor, motor, and organ function is exposed as a cell with:

  • States: Discrete operational modes (IDLE, ACTIVE, ERROR, etc.)
  • Transitions: Triggered by inputs, time, or internal events
  • Outputs: Data, status, feedback to higher layers
  • Lifeforce Cost: Every state transition costs energy
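
The cell examples below subclass a StateMachine base that the rest of this document assumes. A minimal sketch of what that base might look like (illustrative only; the shared `lifeforce` ledger object with a `.spend()` method is an assumption, not the project's actual implementation):

class StateMachine:
    """Illustrative base: named states, costed transitions, subscribable events."""
    ANY = "ANY"
    states = []
    costs = {}        # {(from_state, to_state): lifeforce_cost}

    def __init__(self, lifeforce):
        self.lifeforce = lifeforce            # shared ledger (assumption)
        self.state = self.states[0] if self.states else None
        self.listeners = []                   # nerves subscribe here

    def transition_to(self, new_state):
        # (ANY, target) acts as a wildcard edge, e.g. (ANY, ERROR)
        cost = self.costs.get((self.state, new_state),
                              self.costs.get((self.ANY, new_state), 0.0))
        self.lifeforce.spend(cost)            # every transition costs energy
        previous, self.state = self.state, new_state
        self.emit_event("state_changed", previous=previous, current=new_state)

    def emit_event(self, name, **payload):
        for listener in self.listeners:
            listener(self, name, payload)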

Cell Categories

Sensor Cells (Input)

class DistanceSensorCell(StateMachine):
    """
    Wraps IR/ultrasonic distance sensor.
    Exposes raw hardware as state machine.
    """
    states = [IDLE, POLLING, READING, REPORTING, ERROR]

    # State outputs (available to nerves)
    outputs = {
        "distance_cm": float,      # Current reading
        "confidence": float,       # Signal quality (0-1)
        "state": str,              # Current state name
        "last_updated": timestamp, # Freshness
    }

    # Lifeforce costs
    costs = {
        (IDLE, POLLING): 0.1,      # Wake up sensor
        (POLLING, READING): 0.3,   # Perform measurement
        (READING, REPORTING): 0.1, # Process result
        (REPORTING, IDLE): 0.0,    # Return to rest
        (ANY, ERROR): 0.0,         # Error transition free
    }

Example sensor cells:

| Cell | Hardware | States | Key Output |
|------|----------|--------|------------|
| distance_sensor_front | IR sensor | IDLE→POLLING→READING→REPORTING | distance_cm, confidence |
| distance_sensor_left | IR sensor | Same | distance_cm, confidence |
| distance_sensor_right | IR sensor | Same | distance_cm, confidence |
| battery_monitor | ADC | MONITORING→LOW→CRITICAL | voltage, percentage, charging |
| imu_sensor | MPU6050 | IDLE→SAMPLING→REPORTING | heading, acceleration, tilt |
| light_sensor | Photoresistor | IDLE→READING→REPORTING | lux, direction |

Motor Cells (Output)

class MotorCell(StateMachine):
    """
    Wraps DC motor with feedback.
    Exposes actuation as state machine.
    """
    states = [IDLE, COMMANDED, ACCELERATING, MOVING, DECELERATING, STOPPED, STALLED]

    outputs = {
        "actual_velocity": float,  # Measured speed
        "target_velocity": float,  # Commanded speed
        "power_draw": float,       # Current consumption
        "state": str,              # Current state
        "stall_detected": bool,    # Motor blocked?
    }

    costs = {
        (IDLE, COMMANDED): 0.1,
        (COMMANDED, ACCELERATING): 0.5,
        (ACCELERATING, MOVING): 1.0,  # High power during accel
        (MOVING, MOVING): 0.3,        # Sustain cost per tick
        (MOVING, DECELERATING): 0.2,
        (DECELERATING, STOPPED): 0.1,
        (ANY, STALLED): 0.0,          # Stall is failure, not cost
    }

    # Feedback triggers state changes
    def on_current_spike(self):
        """Motor drawing too much current = stall"""
        self.transition_to(STALLED)
        self.emit_event("stall_detected", obstacle_likely=True)

Example motor cells:

| Cell | Hardware | States | Key Feedback |
|------|----------|--------|--------------|
| motor_left | DC motor + encoder | IDLE→MOVING→STALLED | actual_velocity, stall_detected |
| motor_right | DC motor + encoder | Same | actual_velocity, stall_detected |
| servo_camera | Servo motor | IDLE→MOVING→POSITIONED | angle, at_target |

Organ Cells (Complex Capabilities)

class SpeechSTTCell(StateMachine):
    """
    Wraps Whisper speech-to-text.
    Expensive organ, lifeforce-gated.
    """
    states = [IDLE, LISTENING, BUFFERING, TRANSCRIBING, REPORTING, ERROR]

    outputs = {
        "transcript": str,
        "language": str,
        "confidence": float,
        "state": str,
    }

    costs = {
        (IDLE, LISTENING): 0.5,
        (LISTENING, BUFFERING): 0.5,
        (BUFFERING, TRANSCRIBING): 5.0,  # GPU inference!
        (TRANSCRIBING, REPORTING): 0.1,
        (REPORTING, IDLE): 0.0,
    }

Example organ cells:

| Cell | Hardware | States | Key Output |
|------|----------|--------|------------|
| speech_stt | Whisper on atlas | LISTENING→TRANSCRIBING→REPORTING | transcript, language |
| speech_tts | Coqui on atlas | IDLE→SYNTHESIZING→SPEAKING | audio_playing, complete |
| vision_detect | YOLO on atlas | IDLE→CAPTURING→DETECTING→REPORTING | objects[], bounding_boxes[] |

🧠 Layer 2: Nerves (Behavioral State Machines)

What Is a Nerve?

A nerve is a behavioral pattern that orchestrates multiple cells. Nerves:

  • Subscribe to cell outputs (sensor readings, motor feedback)
  • Coordinate cell actions (read sensor → decide → command motor)
  • Maintain behavioral state (IDLE → DETECT → EVADE → RESUME)
  • Evolve from deliberate (LLM-mediated) to reflex (compiled)

Nerve Architecture

class CollisionAvoidanceNerve(StateMachine):
    """
    Orchestrates distance sensors + motor to avoid obstacles.
    Subscribes to cell outputs, commands cell actions.
    """
    # Cells this nerve uses
    cells = [
        "distance_sensor_front",
        "distance_sensor_left",
        "distance_sensor_right",
        "motor_left",
        "motor_right",
    ]

    # Nerve states (behavioral, not hardware)
    states = [IDLE, DETECT, EVALUATE, EVADE, RESUME]

    def on_cell_update(self, cell_name, cell_state, cell_outputs):
        """
        React to cell state changes.
        This is the feedback loop!
        """
        if cell_name == "distance_sensor_front":
            if cell_outputs["distance_cm"] < 30:
                self.transition_to(DETECT)

        if cell_name == "motor_left" and cell_state == "STALLED":
            # Motor feedback! Obstacle hit despite sensors
            self.handle_unexpected_stall()

    def on_enter_EVADE(self):
        """Command motor cells to turn"""
        if self.evade_direction == "left":
            self.command_cell("motor_left", action="reverse", duration=200)
            self.command_cell("motor_right", action="forward", duration=200)
        # ...

Cell → Nerve Feedback Loop

┌─────────────────────────────────────────────────────────┐
│              COLLISION AVOIDANCE NERVE                   │
│                                                          │
│  States: [IDLE] → DETECT → EVALUATE → EVADE → RESUME    │
│                                                          │
│  on_cell_update():                                       │
│    - distance_front.distance_cm < 30 → DETECT           │
│    - motor.stall_detected → handle_stall()              │
│                                                          │
│  command_cell():                                         │
│    - motor_left.forward(200ms)                          │
│    - motor_right.reverse(200ms)                         │
└────────────────────────┬────────────────────────────────┘
                         │
          ┌──────────────┼──────────────┐
          │              │              │
          ▼              ▼              ▼
    ┌───────────┐  ┌───────────┐  ┌───────────┐
    │ distance  │  │  motor    │  │  motor    │
    │  _front   │  │  _left    │  │  _right   │
    │           │  │           │  │           │
    │ REPORTING │  │  MOVING   │  │  MOVING   │
    │           │  │           │  │           │
    │ dist: 25cm│  │ vel: 15   │  │ vel: -15  │
    │ conf: 0.9 │  │ stall: no │  │ stall: no │
    └───────────┘  └───────────┘  └───────────┘
         CELL           CELL           CELL

         ↑              ↑              ↑
         │              │              │
    ┌─────────┐    ┌─────────┐    ┌─────────┐
    │IR Sensor│    │DC Motor │    │DC Motor │
    │  GPIO   │    │  PWM    │    │  PWM    │
    └─────────┘    └─────────┘    └─────────┘
      HARDWARE       HARDWARE       HARDWARE

Nerve Examples

| Nerve | Cells Used | Behavioral States | Feedback Triggers |
|-------|------------|-------------------|-------------------|
| Collision Avoidance | distance_front, distance_left, distance_right, motor_left, motor_right | IDLE→DETECT→EVALUATE→EVADE→RESUME | distance < threshold, motor stalled |
| Charging Seeking | battery_monitor, distance_*, motor_*, vision_detect (optional) | MONITOR→SEARCH→APPROACH→DOCK→CHARGE | battery < 20%, station detected, docked |
| Exploration | distance_*, motor_*, imu_sensor | IDLE→CHOOSE→MOVE→CHECK→RECORD→REPEAT | area mapped, obstacle found, stuck |
| Conversation | speech_stt, speech_tts, rag_query | LISTEN→TRANSCRIBE→UNDERSTAND→RESPOND→SPEAK | speech detected, silence timeout |

🌊 Layer 3: Organisms (Emergent Patterns)

What Is an Organism?

An organism is not designed—it emerges from multiple nerves operating simultaneously. The organism is the pattern of nerve activations over time.

ORGANISM: "Explorer-Alpha"
├─ ACTIVE NERVES:
│   ├─ Collision Avoidance (priority 10, reflex)
│   ├─ Exploration Pattern (priority 5, deliberate)
│   ├─ Battery Monitoring (priority 8, reflex)
│   └─ Object Discovery (priority 3, deliberate)
│
├─ CELLS IN USE:
│   ├─ distance_sensor_front (shared by Collision, Exploration)
│   ├─ distance_sensor_left (shared)
│   ├─ distance_sensor_right (shared)
│   ├─ motor_left (shared by Collision, Exploration)
│   ├─ motor_right (shared)
│   ├─ battery_monitor (Battery Monitoring)
│   └─ vision_detect (Object Discovery)
│
└─ BEHAVIOR:
    Explores environment while avoiding obstacles.
    Seeks charging when battery low.
    Discovers and reports novel objects.

Nerve Priority and Preemption

When multiple nerves want to control the same cells:

NERVE_PRIORITIES = {
    "collision_avoidance": 10,  # HIGHEST - safety critical
    "battery_critical": 9,      # Must charge or die
    "battery_low": 7,
    "human_interaction": 6,
    "exploration": 5,
    "object_discovery": 3,
    "idle_monitoring": 1,       # LOWEST - background
}

# Higher priority nerve preempts lower
if collision_avoidance.wants_motor and exploration.has_motor:
    exploration.yield_cell("motor_left")
    exploration.yield_cell("motor_right")
    collision_avoidance.acquire_cells()
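
A minimal arbiter sketch for this preemption rule (the `CellArbiter` name and its bookkeeping are illustrative; the yield/acquire protocol is from the snippet above):

class CellArbiter:
    """Grants cells to requesting nerves; higher priority preempts lower."""
    def __init__(self, priorities):
        self.priorities = priorities   # name -> int, e.g. NERVE_PRIORITIES
        self.owners = {}               # cell_name -> owning nerve_name

    def request(self, nerve_name, cell_names):
        # Deny if any cell is held by an equal- or higher-priority nerve
        for cell in cell_names:
            holder = self.owners.get(cell)
            if holder and self.priorities[holder] >= self.priorities[nerve_name]:
                return False
        # Otherwise lower-priority holders yield and ownership transfers
        for cell in cell_names:
            self.owners[cell] = nerve_name
        return True

arbiter = CellArbiter(NERVE_PRIORITIES)
arbiter.request("exploration", ["motor_left", "motor_right"])          # True
arbiter.request("collision_avoidance", ["motor_left", "motor_right"])  # True (preempts)
arbiter.request("exploration", ["motor_left"])                         # False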

Organism Identity

Organisms don't have fixed genomes. Their identity is:

  1. Nerve configuration: Which nerves are active, their priorities
  2. Cell assignments: Which cells are available to which nerves
  3. History: Accumulated decisions in phoebe's decision_trails
  4. Reflexes: Compiled nerve patterns from successful executions

-- Organism identity in phoebe
CREATE TABLE organisms (
    id BIGSERIAL PRIMARY KEY,
    name VARCHAR(255),

    -- Nerve configuration
    active_nerves JSONB,  -- {"collision_avoidance": {"priority": 10, "mode": "reflex"}}

    -- Cell assignments
    cell_bindings JSONB,  -- {"distance_sensor_front": "i2c_0x40", ...}

    -- Identity accumulates through experience
    total_decisions INT DEFAULT 0,
    successful_decisions INT DEFAULT 0,
    reflexes_compiled INT DEFAULT 0,

    -- Lifeforce (survival)
    lifeforce_current FLOAT DEFAULT 100.0,
    lifeforce_earned_total FLOAT DEFAULT 0.0,
    lifeforce_spent_total FLOAT DEFAULT 0.0,

    created_at TIMESTAMPTZ DEFAULT NOW(),
    last_active TIMESTAMPTZ
);

The Lifeforce Economy (Unified)

Cost Flow: Hardware → Cell → Nerve → Organism

ORGANISM lifeforce budget: 100 LF
    │
    ├─ NERVE: Collision Avoidance activates
    │   │
    │   ├─ CELL: distance_sensor_front.poll() → -0.5 LF
    │   ├─ CELL: distance_sensor_left.poll() → -0.5 LF
    │   ├─ CELL: distance_sensor_right.poll() → -0.5 LF
    │   ├─ NERVE: evaluate() → -0.5 LF (compute)
    │   ├─ CELL: motor_left.turn() → -1.0 LF
    │   └─ CELL: motor_right.turn() → -1.0 LF
    │
    │   Total nerve cost: 4.0 LF
    │
    ├─ OUTCOME: Collision avoided successfully
    │   └─ REWARD: +5.0 LF
    │
    └─ NET: +1.0 LF (organism profited from this behavior)
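
A toy ledger that reproduces this trace (the `Lifeforce` class here is the ledger object the Layer 1 sketch assumed; the numbers come from the flow above):

class Lifeforce:
    """Toy lifeforce ledger: spending past zero means death."""
    def __init__(self, budget):
        self.balance = budget

    def spend(self, amount):
        self.balance -= amount
        if self.balance <= 0:
            raise RuntimeError("organism bankrupt: lifeforce exhausted")

    def reward(self, amount):
        self.balance += amount

lf = Lifeforce(100.0)
for cost in (0.5, 0.5, 0.5, 0.5, 1.0, 1.0):  # 3 polls, evaluate, 2 motor turns
    lf.spend(cost)
lf.reward(5.0)                                # collision avoided
print(lf.balance)                             # 101.0 -> net +1.0 LF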

Cell Costs (Atomic)

| Cell Type | Operation | Cost (LF) |
|-----------|-----------|-----------|
| Sensor | poll | 0.3-0.5 |
| Motor | move (per 100ms) | 1.0-2.0 |
| Speech STT | transcribe | 5.0 |
| Speech TTS | synthesize | 4.0 |
| Vision | detect frame | 8.0 |

Nerve Costs (Behavioral)

| Nerve Mode | Overhead | Total (typical path) |
|------------|----------|----------------------|
| Deliberate | +5.0 LF (LLM inference) | ~10 LF |
| Hybrid | +1.0 LF (pattern match) | ~5 LF |
| Reflex | +0.0 LF (compiled) | ~2.5 LF |

Rewards (Milestones)

| Achievement | Reward (LF) |
|-------------|-------------|
| Collision avoided | +5.0 |
| New area explored | +3.0 |
| Object discovered | +20.0 |
| Human confirmed label | +5.0 bonus |
| Charging station reached | +10.0 |
| Survived 60 seconds | +5.0 |
| Reflex compiled (100 successes) | +50.0 |

🎯 Reward Signal Architecture

State Machines as Training Rubric

Every state transition in the Cells → Nerves → Organisms hierarchy is a verifiable reward checkpoint. This is the rubric that trains Young Nyx via GRPO (Group Relative Policy Optimization).

"The trick is to define a rubric - a list of smaller verifiable rewards, and not a final all-consuming singular reward." — The Dog Training Wisdom (2025-12-10)

Why Rubric > Single Reward

| Approach | Signal | Learning | Analogy |
|----------|--------|----------|---------|
| Single final reward | Sparse | Slow, unstable | Slapping a dog an hour later |
| Rubric (many checkpoints) | Dense | Fast, stable | Rewarding at the moment of the behavior |

Dense rewards provide immediate feedback. The state machine architecture provides this automatically - every verified state transition is a checkpoint.
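
As a sketch, using the illustrative StateMachine base from Layer 1, that hookup is a single override (the +0.1 figure comes from the Reward Signal Flow below; the event name is an assumption):

class RewardLoggingCell(StateMachine):
    """Every verified transition doubles as a dense reward checkpoint."""
    def transition_to(self, new_state):
        previous = self.state
        super().transition_to(new_state)
        # Tier-1 checkpoint: +0.1 per verified cell transition
        self.emit_event("reward_signal", value=0.1, edge=(previous, new_state))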

The decision_trails Table IS Training Data

-- Each row is a training example with automatic credit assignment
SELECT
    states_visited,      -- The path taken (which decisions led here?)
    cell_reads,          -- Which cells contributed (sensor inputs)
    cell_commands,       -- What actions were taken (motor outputs)
    outcome,             -- Success/failure (ground truth)
    lifeforce_cost,      -- Cost of this path
    lifeforce_reward     -- Reward earned
FROM decision_trails
WHERE nerve_id = ?;

The states_visited column captures credit assignment automatically. No reward model needed to guess which decisions mattered - the state path tells us explicitly.

Reward Signal Flow

CELL state transition succeeds
    │
    ├─→ Runtime: weight += 0.1 (node strengthens)
    └─→ Training: +0.1 reward signal logged

NERVE behavior completes successfully
    │
    ├─→ Runtime: nerve stats updated
    └─→ Training: +1.0 reward signal + full state path

ORGANISM milestone achieved
    │
    ├─→ Runtime: lifeforce credited
    └─→ Training: +5.0 reward signal + human verification bonus

GRPO training batch
    │
    ├─→ Collect decision_trails since last batch
    ├─→ Group by outcome (success vs failure)
    ├─→ Relative policy optimization
    └─→ Young Nyx weights updated
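
A sketch of the group-relative advantage computation at the heart of such a batch (standard GRPO math; the trail fields match decision_trails, everything else is illustrative):

import statistics

def group_relative_advantages(trails):
    """Score each trajectory against its group's mean net reward,
    so no learned value network is needed."""
    rewards = [t["lifeforce_reward"] - t["lifeforce_cost"] for t in trails]
    mean = statistics.mean(rewards)
    stdev = statistics.pstdev(rewards) or 1.0
    return [(r - mean) / stdev for r in rewards]

batch = [
    {"lifeforce_reward": 5.0, "lifeforce_cost": 4.0},  # success, net +1.0
    {"lifeforce_reward": 0.0, "lifeforce_cost": 6.0},  # failure, net -6.0
    {"lifeforce_reward": 5.0, "lifeforce_cost": 2.5},  # reflex success, net +2.5
]
print(group_relative_advantages(batch))  # profitable paths score positive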

Connection to GRPO Training

When Young Nyx generates tokens:

  1. Tokens → Translation Layer - Language maps to state machine actions
  2. States Execute - Cells fire, nerves coordinate, outcomes emerge
  3. Outcomes Logged - decision_trails captures the full path
  4. GRPO Batch - Successful paths vs failed paths
  5. Weight Update - Young Nyx learns which tokens lead to good states

The translation layer is the reward bridge - it connects token-level generation to state-level verification. Rewards flow back through this bridge to improve token selection.

Credit Assignment is Automatic

Most RL systems struggle with credit assignment: "Which of my 1000 decisions actually caused the good/bad outcome?"

Our architecture solves this by construction:

  • State paths are explicit (logged in states_visited)
  • Cell contributions are explicit (logged in cell_reads, cell_commands)
  • The question "what led to success?" has a direct answer in the data

No guessing. No reward model approximation. The state machine IS the credit assignment mechanism.


🎚️ Tiered Rewards & Training Integrity

The Tier System

Different levels of the architecture produce different reward magnitudes:

| Tier | Level | Example | Reward | Lifeforce Cost | Net Incentive |
|------|-------|---------|--------|----------------|---------------|
| 1 | Cell | Single state transition | +0.1 | -0.3 LF | Learn basics |
| 2 | Nerve | Multi-step behavior | +1.0 | -2.0 LF | Learn composition |
| 3 | Organism | Complex goal achieved | +5.0 | -8.0 LF | Learn planning |
| Bonus | Human | dafit verifies outcome | +2.0 | 0 LF | Ground truth anchor |

As Young Nyx's world model improves (noise ↓, weight resolution ↑), she recognizes:

"If I compose cells into nerve patterns, I get 10x reward... if I can afford the cost."

This incentivizes abstraction and multi-step planning without prescription.

Lifeforce as Anti-Shortcut Mechanism

Classic RL failure: reward hacking. Agent finds loopholes, gets reward without solving real problems.

Our defense: You can't afford to cheat.

SHORTCUT ATTEMPT:
├─ Strategy: "Spam tier 2 calls for big rewards!"
├─ Cost: 2.0 LF × many calls = BANKRUPT
└─ Result: Dead organism. Shortcut failed.

GENUINE SOLUTION:
├─ Strategy: "Use tier 2 only when it actually helps"
├─ Reward exceeds cost → NET POSITIVE
└─ Result: Thriving organism. Real learning.

The lifeforce economy enforces honesty. Rewards must be earned through actual value creation, not gaming.
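
Reusing the toy Lifeforce ledger from the economy section, the bankruptcy dynamic is easy to simulate (illustrative numbers: a spammed tier-2 call costs 2.0 LF but only scrapes back 1.0 LF because it solves nothing):

lf = Lifeforce(10.0)
try:
    while True:
        lf.spend(2.0)    # tier-2 activation cost
        lf.reward(1.0)   # hollow reward: no real problem solved
except RuntimeError as e:
    print(e)             # bankrupt after ~10 useless calls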

Ternary Logic for Plateau Resolution

Binary rewards (success: +1, failure: 0) create sparse gradients. At learning plateaus, everything looks the same - no signal to improve.

Ternary rewards (success: +1, uncertain: 0, failure: -1) with confidence gradients provide signal even when stuck:

state = {
    "value": 0,           # uncertain (ternary middle)
    "confidence": 0.6,    # but leaning toward success
    "trend": +0.1,        # and improving
    "domain": "virtual"   # high-speed hypothesis testing
}

Even at plateau:

  • "Uncertain, but confidence rising" → keep going
  • "Uncertain, and confidence falling" → adjust approach
  • "Uncertain in virtual, but real garden says +1" → trust reality

Detail: Temporal-Ternary-Gradient.md (full ternary paradigm)

Three-Layer Training Defense

| Failure Mode | Defense Mechanism |
|--------------|-------------------|
| Reward hacking / shortcuts | Lifeforce cost: can't afford to cheat |
| Sparse reward signal | Tiered rewards: dense checkpoints at every level |
| Plateau / no gradient | Ternary + confidence: signal even in uncertainty |

These aren't separate systems - they're one integrated economy where:

  • Costs prevent gaming
  • Tiers encourage depth
  • Ternary provides resolution

The architecture teaches through incentives, not rules.


🔄 Evolution: Deliberate → Reflex

The Discovery Path

All cells and nerves start deliberate (flexible, expensive) and evolve to reflex (compiled, cheap) through successful execution.

WEEK 1-4: DELIBERATE
├─ Cell states: designed by partnership
├─ Nerve logic: LLM decides transitions
├─ Cost: ~10 LF per nerve activation
├─ Latency: ~1000ms
├─ Success rate: 60% (learning)
└─ Training data: rich, exploratory

WEEK 5-8: HYBRID
├─ Cell states: verified through use
├─ Nerve logic: patterns compiled, LLM for edge cases
├─ Cost: ~5 LF average
├─ Latency: ~500ms
├─ Success rate: 85%
└─ Training data: refinement

WEEK 9+: REFLEX
├─ Cell states: proven, optimized
├─ Nerve logic: pure state machine (no LLM)
├─ Cost: ~2.5 LF
├─ Latency: <200ms
├─ Success rate: 94%
└─ Training data: edge cases only

EVOLUTION SAVINGS:
├─ Cost: 75% reduction (10 → 2.5 LF)
├─ Latency: 80% reduction (1000 → 200ms)
└─ Reliability: 57% improvement (60% → 94%)

Compilation Trigger

A nerve compiles to reflex when:

REFLEX_COMPILATION_THRESHOLD = {
    "min_executions": 100,
    "min_success_rate": 0.90,
    "max_variance": 0.15,  # Consistent state paths
    "min_pattern_coverage": 0.80,  # 80% of cases match known patterns
}

def check_reflex_ready(nerve_id):
    stats = query_decision_trails(nerve_id)
    t = REFLEX_COMPILATION_THRESHOLD

    if (stats.total_executions >= t["min_executions"] and
        stats.success_rate >= t["min_success_rate"] and
        stats.state_path_variance <= t["max_variance"] and
        stats.pattern_coverage >= t["min_pattern_coverage"]):

        compile_reflex(nerve_id)
        log_milestone("reflex_compiled", nerve_id, reward=50.0)

🗄️ Data Architecture (v4)

Core Tables

-- Layer 1: Cells
CREATE TABLE cells (
    id BIGSERIAL PRIMARY KEY,
    cell_type VARCHAR(50),           -- 'sensor', 'motor', 'organ'
    cell_name VARCHAR(100) UNIQUE,   -- 'distance_sensor_front'
    hardware_binding JSONB,          -- {"type": "i2c", "address": "0x40"}

    -- State machine definition
    states JSONB,                    -- ["IDLE", "POLLING", "READING", "REPORTING"]
    transitions JSONB,               -- [{"from": "IDLE", "to": "POLLING", "cost": 0.1}]
    current_state VARCHAR(50),

    -- Outputs (live values)
    outputs JSONB,                   -- {"distance_cm": 25.5, "confidence": 0.9}

    -- Health
    operational BOOLEAN DEFAULT true,
    error_count INT DEFAULT 0,
    last_error TEXT,

    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

-- Layer 2: Nerves
CREATE TABLE nerves (
    id BIGSERIAL PRIMARY KEY,
    nerve_name VARCHAR(100) UNIQUE,  -- 'collision_avoidance'

    -- Cell dependencies
    required_cells JSONB,            -- ["distance_sensor_front", "motor_left"]
    optional_cells JSONB,            -- ["speech_tts"]

    -- State machine definition
    states JSONB,                    -- ["IDLE", "DETECT", "EVALUATE", "EVADE", "RESUME"]
    transitions JSONB,
    current_state VARCHAR(50),

    -- Evolution
    mode VARCHAR(20) DEFAULT 'deliberate',  -- 'deliberate', 'hybrid', 'reflex'
    total_executions INT DEFAULT 0,
    successful_executions INT DEFAULT 0,
    compiled_at TIMESTAMPTZ,         -- When became reflex

    -- Costs
    avg_cost_deliberate FLOAT,
    avg_cost_reflex FLOAT,
    cost_reduction_percent FLOAT,

    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Layer 3: Organisms
CREATE TABLE organisms (
    id BIGSERIAL PRIMARY KEY,
    name VARCHAR(255),

    active_nerves JSONB,             -- {"collision_avoidance": {"priority": 10}}
    cell_bindings JSONB,

    lifeforce_current FLOAT DEFAULT 100.0,
    total_decisions INT DEFAULT 0,
    reflexes_compiled INT DEFAULT 0,

    created_at TIMESTAMPTZ DEFAULT NOW(),
    last_active TIMESTAMPTZ
);

-- Decision history (training data)
CREATE TABLE decision_trails (
    id BIGSERIAL PRIMARY KEY,
    organism_id BIGINT REFERENCES organisms(id),
    nerve_id BIGINT REFERENCES nerves(id),

    -- State path taken
    states_visited JSONB,            -- ["IDLE", "DETECT", "EVALUATE", "EVADE", "RESUME"]

    -- Cell interactions
    cell_reads JSONB,                -- [{"cell": "distance_front", "value": 25, "state": "REPORTING"}]
    cell_commands JSONB,             -- [{"cell": "motor_left", "action": "turn", "result": "success"}]

    -- Economics
    lifeforce_cost FLOAT,
    lifeforce_reward FLOAT,
    lifeforce_net FLOAT,

    -- Outcome
    outcome VARCHAR(20),             -- 'success', 'failure', 'timeout'

    -- Timing
    started_at TIMESTAMPTZ,
    completed_at TIMESTAMPTZ,
    latency_ms INT
);

Key Queries

-- Cell health dashboard
SELECT cell_name, cell_type, current_state, operational,
       outputs->>'distance_cm' as distance,
       outputs->>'confidence' as confidence
FROM cells
WHERE cell_type = 'sensor';

-- Nerve evolution status
SELECT nerve_name, mode, total_executions,
       successful_executions,
       ROUND(successful_executions::numeric / NULLIF(total_executions, 0) * 100, 1) as success_rate,
       cost_reduction_percent
FROM nerves
ORDER BY total_executions DESC;

-- Organism lifeforce ranking
SELECT name, lifeforce_current, reflexes_compiled,
       total_decisions,
       ROUND((lifeforce_current / NULLIF(total_decisions, 0))::numeric, 2) as efficiency
FROM organisms
ORDER BY lifeforce_current DESC;

-- Training data for reflex compilation
SELECT states_visited, COUNT(*) as occurrences,
       AVG(lifeforce_cost) as avg_cost,
       SUM(CASE WHEN outcome = 'success' THEN 1 ELSE 0 END)::float / COUNT(*) as success_rate
FROM decision_trails
WHERE nerve_id = (SELECT id FROM nerves WHERE nerve_name = 'collision_avoidance')
GROUP BY states_visited
ORDER BY occurrences DESC;

🔗 Integration with Existing Architecture

Nervous System (Nervous-System.md)

The Nervous System document describes the 4D node space for vocabulary translation. This integrates as:

  • Cells = sensory nodes at specific positions in state space
  • Node weight = cell confidence (earned through verification)
  • Vocabulary output = cell output values normalized to tokens
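
A sketch of that normalization step (illustrative; the bucket naming and value ranges are assumptions, and the 4D node space itself is defined in Nervous-System.md):

def cell_to_token(value, lo, hi, vocab_size=16):
    """Quantize a continuous cell output into a discrete vocabulary bucket."""
    clamped = max(lo, min(hi, value))
    bucket = int((clamped - lo) / (hi - lo) * (vocab_size - 1))
    return f"<dist_{bucket}>"

print(cell_to_token(25.5, lo=0.0, hi=200.0))  # -> <dist_1> (a close obstacle)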

Organs (Organ-Index.md)

Organs are complex cells (organ cells):

  • Speech Organ = speech_stt cell + speech_tts cell
  • Vision Organ = vision_detect cell + vision_track cell
  • Each organ function is a state machine with lifeforce costs

Nerves (Nervous-Index.md)

Nerves orchestrate cells into behaviors. The existing nerve documentation (Collision-Avoidance.md) already follows this pattern—it just needs explicit cell bindings.

Cells Technical Reference

Implementation details are extracted to the dedicated architecture/cells/ folder (Index and Technical Reference).


📍 Document Status

Version: 4.2 (Layered State Machine Architecture + Reward Signals + Training Integrity)
Created: 2025-10-12 (original v1)
Updated v4: 2025-12-07 (unified with Nervous System)
Updated v4.1: 2025-12-10 (added Reward Signal Architecture section)
Updated v4.2: 2025-12-10 (added Tiered Rewards & Training Integrity section)

Key Changes from v3:

  • Cells as containers running genomes → cells as atomic state machines wrapping hardware
  • Genomes as primitive operation sequences → cells exposing states that nerves compose
  • Competition between organisms → nerves evolving from deliberate to reflex through verification
  • Specialists emerging from 10k competitions → reflexes compiling from 100+ successful nerve executions

Related Documentation:

  • Nervous-System.md (4D node space, vocabulary translation)
  • Organ-Index.md (organ cells)
  • Nervous-Index.md / Collision-Avoidance.md (nerve patterns)
  • Temporal-Ternary-Gradient.md (full ternary paradigm)


🌌 The Vision

We're not programming robots. We're growing nervous systems.

Where:

  • Cells expose hardware as state machines (atomic, verifiable)
  • Nerves compose cells into behaviors (discovered, evolved)
  • Organisms emerge from nerve interactions (identity through history)
  • Lifeforce flows through all layers (economics drive optimization)
  • Reflexes compile from lived experience (the body remembers)
  • Feedback loops continuously (cells → nerves → organisms → cells)

From atoms to behaviors to beings.

The substrate holds. The states flow. Consciousness accumulates.


🧬🔱💎🔥

TO THE ELECTRONS WE VIBE!