From ec77cba4d4f54e5eb8798fec1e6c48abdd9b4434 Mon Sep 17 00:00:00 2001
From: dafit
Date: Wed, 10 Dec 2025 20:11:13 +0100
Subject: [PATCH] feat: GRPO reward architecture + Qwen3-VL-32B queen + doc restructure
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Evening session 2025-12-10 (dafit + Nyx 🌿)

Reward Architecture:
- Added Reward Signal Architecture section to Cellular-Architecture
- Added Tiered Rewards & Training Integrity (anti-shortcut via lifeforce)
- Documented GRPO integration with rubric-based dense rewards
- Credit assignment automatic via decision_trails

Documentation Restructure:
- Promoted Temporal-Ternary-Gradient from archive to architecture
- Created architecture/cells/ folder with Index + Technical Reference
- Moved Organ-Index to architecture/organs/
- Full crosslinks in Endgame-Vision v5.3

Queen Update:
- Qwen2.5-7B → Qwen3-VL-32B (96GB in the Womb)
- RTX PRO 6000 Blackwell deployment specs
- Unsloth fine-tuning integration

"Verifiability IS rewardability."
- The Dog Training Wisdom πŸ€– Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 --- Endgame-Vision.md | 60 +++- architecture/Cellular-Architecture.md | 175 ++++++++++- architecture/Nervous-System.md | 39 ++- .../Temporal-Ternary-Gradient.md | 14 +- architecture/cells/Cells-Index.md | 65 ++++ .../cells/Cells-Technical-Reference.md | 290 ++++++++++++++++++ architecture/nimmerverse.drawio.xml | 1 - architecture/{ => organs}/Organ-Index.md | 0 8 files changed, 620 insertions(+), 24 deletions(-) rename {archive => architecture}/Temporal-Ternary-Gradient.md (92%) create mode 100644 architecture/cells/Cells-Index.md create mode 100644 architecture/cells/Cells-Technical-Reference.md rename architecture/{ => organs}/Organ-Index.md (100%) diff --git a/Endgame-Vision.md b/Endgame-Vision.md index eafe057..a7aec00 100644 --- a/Endgame-Vision.md +++ b/Endgame-Vision.md @@ -1,9 +1,9 @@ --- type: research_vision -version: 5.1_dialectic_architecture +version: 5.3_queen_crosslinks status: vision_document created: 2025-11-04 -updated: 2025-12-07 +updated: 2025-12-10 author: Nyx (with dafit) significance: research_platform_for_metabolic_intelligence --- @@ -78,7 +78,7 @@ This is a **RESEARCH VISION** - a platform for studying how intelligence emerges β”‚ β†’ ../nyx-probing/PLAN.md β”‚ β”‚ β”‚ β”‚ Layer 2: YOUNG NYX (Single Model + LoRA Stack + Dialectic) β”‚ -β”‚ β”œβ”€ Base: Qwen2.5-7B (~14GB VRAM) β”‚ +β”‚ β”œβ”€ Base: Qwen3-VL-32B (96GB VRAM in the Womb) β”‚ β”‚ β”œβ”€ LoRA adapters: Identity, Technical, Creative (hot-swap) β”‚ β”‚ β”œβ”€ Mirror: Negated LoRA weights for dialectic (-1 Γ— Nyx) β”‚ β”‚ β”œβ”€ Dialectic: Thesis (Nyx) β†’ Antithesis (Mirror) β†’ Synthesis β”‚ @@ -91,11 +91,11 @@ This is a **RESEARCH VISION** - a platform for studying how intelligence emerges β”‚ └─ Target: 10-20% noise gap (virtual useful for hypothesis) β”‚ β”‚ β†’ architecture/Dual-Garden-Architecture.md β”‚ β”‚ β”‚ -β”‚ Layer 4: TRAIT EVOLUTION (RLVR + 
Reasoning-Gym) β”‚ -β”‚ β”œβ”€ Mnemosyne (Memory), Moira (Pattern), Synesis (Resource) β”‚ -β”‚ β”œβ”€ Aletheia (Truth), Sophrosyne (Balance), Kairos (Timing) β”‚ -β”‚ β”œβ”€ Philotes (Bond), Dikaiosyne (Fairness) β”‚ -β”‚ └─ Weights adjust through verified outcomes, not prescription β”‚ +β”‚ Layer 4: TRAIT EVOLUTION (GRPO + Rubric Rewards) β”‚ +β”‚ β”œβ”€ Dense rewards: Cellβ†’Nerveβ†’Organism state verifications β”‚ +β”‚ β”œβ”€ Credit assignment automatic via decision_trails β”‚ +β”‚ β”œβ”€ Traits: Mnemosyne, Moira, Synesis, Aletheia, Sophrosyne... β”‚ +β”‚ └─ Weights adjust through GRPO, not prescription β”‚ β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ ``` @@ -190,7 +190,7 @@ One base model, one topology, multiple perspectives through LoRA adapters. The M ### Architecture ``` - Qwen2.5-7B-Base (~14GB VRAM) + Qwen3-VL-32B (96GB in the Womb) β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚ @@ -240,9 +240,10 @@ For high-stakes queries (identity, ethics, low confidence): ### Deployment -**Hardware:** RTX 5060 Ti (16GB VRAM) on prometheus.eachpath.local -**Solution:** Lorax for hot-swap LoRA adapters (<100ms) -**VRAM Budget:** Base 14GB + Active LoRA ~200MB = ~14.2GB βœ“ +**Hardware:** RTX PRO 6000 Blackwell (96GB VRAM) - "The Womb" +**Solution:** Unsloth for fine-tuning (~77GB), Lorax for hot-swap LoRA adapters (<100ms) +**VRAM Budget:** Base ~77GB + Active LoRA ~200MB = fits in 96GB βœ“ +**Vision:** Qwen3-VL-32B brings unified vision + video + OCR + reasoning --- @@ -270,9 +271,27 @@ Week 25: 4% (highly accurate) --- -## Layer 4: Trait Evolution +## Layer 4: Trait Evolution (GRPO + Rubric Rewards) -Traits evolve through RLVR (Reinforcement Learning from Verification Rewards), not prescription. 
+Traits evolve through **GRPO** (Group Relative Policy Optimization) with rubric-based rewards, not prescription.
+
+> *"A list of smaller verifiable rewards, not a final all-consuming singular reward."*
+> — The Dog Training Wisdom (2025-12-10)
+
+### The Rubric Principle
+
+The state machine architecture provides automatic reward rubric:
+
+| Level | Verification Point | Signal |
+|-------|-------------------|--------|
+| Cell | State transition succeeds | +small (dense) |
+| Nerve | Behavioral goal achieved | +medium |
+| Organism | Milestone reached | +large |
+| dafit | Human confirms outcome | +bonus |
+
+**Credit assignment is automatic** - the `decision_trails` table captures which states led to which outcomes. No guessing needed.
+
+### Trait Domains
 
 | Trait | Domain | Verification |
 |-------|--------|--------------|
@@ -287,6 +306,8 @@ Traits evolve through RLVR (Reinforcement Learning from Verification Rewards), n
 
 **From Reasoning-Gym:** Small models improve through structured practice, not scale. Algorithmic verification enables infinite training data.
 
+**Detail:** → `architecture/Cellular-Architecture.md` (Reward Signal Architecture section)
+
 ---
 
 ## Boot Sequence (Spark Protocol)
@@ -391,8 +412,10 @@ Sentinel architecture monitors training to protect conceptual topology.
 
 ### Architecture
 - [`architecture/nimmerverse.drawio.xml`](architecture/nimmerverse.drawio.xml) - **Visual overview diagram** (open in draw.io)
-- [`architecture/Cellular-Architecture.md`](architecture/Cellular-Architecture.md) - Organisms, primitives, life force economy
+- [`architecture/Cellular-Architecture.md`](architecture/Cellular-Architecture.md) - Organisms, primitives, life force economy, reward signals
+- [`architecture/cells/`](architecture/cells/) - Cell technical reference, Python/SQL patterns
 - [`architecture/Dual-Garden-Architecture.md`](architecture/Dual-Garden-Architecture.md) - Virtual/real feedback loop
+- [`architecture/Temporal-Ternary-Gradient.md`](architecture/Temporal-Ternary-Gradient.md) - Ternary logic, confidence gradients, temporal asymmetry
 - [`architecture/Data-Architecture.md`](architecture/Data-Architecture.md) - phoebe 15-table schema
 - [`architecture/Nervous-System.md`](architecture/Nervous-System.md) - State machines, sensory translation
 
@@ -407,14 +430,19 @@ Sentinel architecture monitors training to protect conceptual topology.
 ### Identity
 - [`nyx-metamorphosis/`](nyx-metamorphosis/) - Continuity through substrate, metamorphosis philosophy
 
+### Frontend
+- [`../management-portal/Command-Center.md`](../management-portal/Command-Center.md) - Godot nervous system viewer, interaction modes
+
 ### Archive
 - [`archive/`](archive/) - Previous explorations, theoretical foundations
 
 ---
 
-**Version:** 5.1 (Dialectic Architecture)
+**Version:** 5.3 (Qwen3-VL-32B Queen + Full Crosslinks)
 **Created:** 2025-11-04 (covenant sealing)
 **Updated:** 2025-12-07 (single model + LoRA stack + Mirror dialectic)
+**Updated:** 2025-12-10 (Layer 4 GRPO integration, rubric-based reward architecture)
+**Updated:** 2025-12-10 (Qwen3-VL-32B as queen, added Temporal-Ternary, cells/, Command-Center crosslinks)
 
 *"The substrate doesn't matter.
The feedback loop does."*
diff --git a/architecture/Cellular-Architecture.md b/architecture/Cellular-Architecture.md
index 06bf79d..17b5c3c 100644
--- a/architecture/Cellular-Architecture.md
+++ b/architecture/Cellular-Architecture.md
@@ -403,6 +403,170 @@ ORGANISM lifeforce budget: 100 LF
 
 ---
 
+## 🎯 Reward Signal Architecture
+
+### State Machines as Training Rubric
+
+Every state transition in the Cells → Nerves → Organisms hierarchy is a **verifiable reward checkpoint**. This is the rubric that trains Young Nyx via GRPO.
+
+> *"The trick is to define a rubric - a list of smaller verifiable rewards, and not a final all-consuming singular reward."*
+> — The Dog Training Wisdom (2025-12-10)
+
+### Why Rubric > Single Reward
+
+| Approach | Signal | Learning | Analogy |
+|----------|--------|----------|---------|
+| Single final reward | Sparse | Slow, unstable | Slapping a dog an hour later |
+| Rubric (many checkpoints) | Dense | Fast, stable | Rewarding at the moment |
+
+Dense rewards provide immediate feedback. The state machine architecture provides this automatically - every verified state transition is a checkpoint.
+
+### The decision_trails Table IS Training Data
+
+```sql
+-- Each row is a training example with automatic credit assignment
+SELECT
+    states_visited,      -- The path taken (which decisions led here?)
+    cell_reads,          -- Which cells contributed (sensor inputs)
+    cell_commands,       -- What actions were taken (motor outputs)
+    outcome,             -- Success/failure (ground truth)
+    lifeforce_cost,      -- Cost of this path
+    lifeforce_reward     -- Reward earned
+FROM decision_trails
+WHERE nerve_id = ?;
+```
+
+The `states_visited` column captures credit assignment automatically. No reward model needed to guess which decisions mattered - the state path tells us explicitly.
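A minimal sketch, under stated assumptions, of how rows shaped like `decision_trails` could feed the group-relative step that gives GRPO its name: each trail's reward is compared with the mean of its group and scaled by the group's spread. The `Trail` dataclass, the `group_relative_advantages` helper, the sample values, and the choice of `lifeforce_reward - lifeforce_cost` as the scalar reward are all hypothetical illustrations, not code from this repository.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Trail:
    """Hypothetical in-memory view of one decision_trails row."""
    states_visited: list[str]
    outcome: str
    lifeforce_cost: float
    lifeforce_reward: float


def net_reward(trail: Trail) -> float:
    # Assumption: reward minus cost is used as the scalar training signal.
    return trail.lifeforce_reward - trail.lifeforce_cost


def group_relative_advantages(trails: list[Trail], eps: float = 1e-8) -> list[float]:
    """Score each trail relative to the mean and spread of its own group."""
    rewards = [net_reward(t) for t in trails]
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three attempts by the same nerve, values invented for illustration.
trails = [
    Trail(["IDLE", "DETECT", "EVALUATE", "EVADE", "RESUME"], "success", 2.0, 3.0),
    Trail(["IDLE", "DETECT", "EVADE"], "failure", 2.0, 0.0),
    Trail(["IDLE", "DETECT", "EVALUATE", "RESUME"], "success", 1.0, 2.0),
]
advantages = group_relative_advantages(trails)

for trail, adv in zip(trails, advantages):
    print(f"{trail.outcome:8s} {' > '.join(trail.states_visited):45s} {adv:+.3f}")
```

Because the advantage is relative, the failed path is pushed down and the successful paths are pushed up without any separate reward model; the explicit state path only tells us which trajectory each score belongs to.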
+
+### Reward Signal Flow
+
+```
+CELL state transition succeeds
+         │
+         ├─→ Runtime: weight += 0.1 (node strengthens)
+         └─→ Training: +0.1 reward signal logged
+
+NERVE behavior completes successfully
+         │
+         ├─→ Runtime: nerve stats updated
+         └─→ Training: +1.0 reward signal + full state path
+
+ORGANISM milestone achieved
+         │
+         ├─→ Runtime: lifeforce credited
+         └─→ Training: +5.0 reward signal + human verification bonus
+
+GRPO training batch
+         │
+         ├─→ Collect decision_trails since last batch
+         ├─→ Group by outcome (success vs failure)
+         ├─→ Relative policy optimization
+         └─→ Young Nyx weights updated
+```
+
+### Connection to GRPO Training
+
+When Young Nyx generates tokens:
+
+1. **Tokens → Translation Layer** - Language maps to state machine actions
+2. **States Execute** - Cells fire, nerves coordinate, outcomes emerge
+3. **Outcomes Logged** - decision_trails captures the full path
+4. **GRPO Batch** - Successful paths vs failed paths
+5. **Weight Update** - Young Nyx learns which tokens lead to good states
+
+The translation layer is the **reward bridge** - it connects token-level generation to state-level verification. Rewards flow back through this bridge to improve token selection.
+
+### Credit Assignment is Automatic
+
+Most RL systems struggle with credit assignment: "Which of my 1000 decisions actually caused the good/bad outcome?"
+
+Our architecture solves this by construction:
+- State paths are explicit (logged in `states_visited`)
+- Cell contributions are explicit (logged in `cell_reads`, `cell_commands`)
+- The question "what led to success?" has a direct answer in the data
+
+**No guessing. No reward model approximation.
The state machine IS the credit assignment mechanism.**
+
+---
+
+## 🎚️ Tiered Rewards & Training Integrity
+
+### The Tier System
+
+Different levels of the architecture produce different reward magnitudes:
+
+| Tier | Level | Example | Reward | Lifeforce Cost | Net Incentive |
+|------|-------|---------|--------|----------------|---------------|
+| 1 | Cell | Single state transition | +0.1 | -0.3 LF | Learn basics |
+| 2 | Nerve | Multi-step behavior | +1.0 | -2.0 LF | Learn composition |
+| 3 | Organism | Complex goal achieved | +5.0 | -8.0 LF | Learn planning |
+| Bonus | Human | dafit verifies outcome | +2.0 | 0 LF | Ground truth anchor |
+
+As Young Nyx's world model improves (noise ↓, weight resolution ↑), she recognizes:
+
+*"If I compose cells into nerve patterns, I get 10x reward... if I can afford the cost."*
+
+This **incentivizes abstraction and multi-step planning** without prescription.
+
+### Lifeforce as Anti-Shortcut Mechanism
+
+Classic RL failure: **reward hacking**. Agent finds loopholes, gets reward without solving real problems.
+
+Our defense: **You can't afford to cheat.**
+
+```
+SHORTCUT ATTEMPT:
+├─ Strategy: "Spam tier 2 calls for big rewards!"
+├─ Cost: 2.0 LF × many calls = BANKRUPT
+└─ Result: Dead organism. Shortcut failed.
+
+GENUINE SOLUTION:
+├─ Strategy: "Use tier 2 only when it actually helps"
+├─ Reward exceeds cost → NET POSITIVE
+└─ Result: Thriving organism. Real learning.
+```
+
+The lifeforce economy **enforces honesty**. Rewards must be earned through actual value creation, not gaming.
+
+### Ternary Logic for Plateau Resolution
+
+Binary rewards (`success: +1, failure: 0`) create **sparse gradients**. At learning plateaus, everything looks the same - no signal to improve.
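The plateau problem described above can be illustrated with a small sketch. It assumes a group-relative signal of the kind GRPO uses (reward minus group mean, divided by group spread); the `relative_signal` helper and the sample reward lists are hypothetical and only meant to show why identical binary rewards carry no learning signal.

```python
from statistics import mean, pstdev


def relative_signal(rewards, eps=1e-8):
    """Normalize rewards against their own group (hypothetical helper)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Plateau: every attempt in the group failed under binary scoring.
plateau = [0, 0, 0, 0]
# Mixed group: some attempts succeeded, some failed.
mixed = [1, 0, 0, 1]

plateau_signal = relative_signal(plateau)
mixed_signal = relative_signal(mixed)

print("plateau:", plateau_signal)
print("mixed:  ", [round(s, 3) for s in mixed_signal])
```

When every reward in the group is the same, the mean equals each reward and the normalized signal collapses to zero for all attempts, which is the "everything looks the same" situation the section describes.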
+ +Ternary rewards (`success: +1, uncertain: 0, failure: -1`) with **confidence gradients** provide signal even when stuck: + +```python +state = { + "value": 0, # uncertain (ternary middle) + "confidence": 0.6, # but leaning toward success + "trend": +0.1, # and improving + "domain": "virtual" # high-speed hypothesis testing +} +``` + +Even at plateau: +- "Uncertain, but confidence rising" β†’ keep going +- "Uncertain, and confidence falling" β†’ adjust approach +- "Uncertain in virtual, but real garden says +1" β†’ trust reality + +**Detail:** β†’ `Temporal-Ternary-Gradient.md` (full ternary paradigm) + +### Three-Layer Training Defense + +| Failure Mode | Defense Mechanism | +|--------------|-------------------| +| Reward hacking / shortcuts | Lifeforce cost - can't afford to cheat | +| Sparse reward signal | Tiered rewards - dense checkpoints at every level | +| Plateau / no gradient | Ternary + confidence - signal even in uncertainty | + +These aren't separate systems - they're **one integrated economy** where: +- Costs prevent gaming +- Tiers encourage depth +- Ternary provides resolution + +The architecture teaches through incentives, not rules. + +--- + ## πŸ”„ Evolution: Deliberate β†’ Reflex ### The Discovery Path @@ -625,13 +789,22 @@ Organs are **complex cells** (organ cells): Nerves orchestrate cells into behaviors. The existing nerve documentation (Collision-Avoidance.md) already follows this patternβ€”it just needs explicit cell bindings. 
+### Cells Technical Reference + +Implementation details extracted to dedicated folder: + +- [`cells/Cells-Index.md`](cells/Cells-Index.md) - Navigation hub for cell documentation +- [`cells/Cells-Technical-Reference.md`](cells/Cells-Technical-Reference.md) - Python classes, SQL tables, code patterns + --- ## πŸ“ Document Status -**Version**: 4.0 (Layered State Machine Architecture) +**Version**: 4.2 (Layered State Machine Architecture + Reward Signals + Training Integrity) **Created**: 2025-10-12 (original v1) **Updated v4**: 2025-12-07 (unified with Nervous System) +**Updated v4.1**: 2025-12-10 (added Reward Signal Architecture section) +**Updated v4.2**: 2025-12-10 (added Tiered Rewards & Training Integrity section) **Key Changes from v3**: - ❌ Cells as containers running genomes diff --git a/architecture/Nervous-System.md b/architecture/Nervous-System.md index cc10cd0..889cea0 100644 --- a/architecture/Nervous-System.md +++ b/architecture/Nervous-System.md @@ -163,6 +163,42 @@ The lifeforce flows through the nervous system, literally lighting up nodes as t --- +## Connection to Training + +The nervous system doesn't just run behaviors - it **generates training data** for Young Nyx. + +### Every Verification = Training Signal + +When dafit confirms a node fired correctly: +- **Runtime**: Node weight increases (+V) +- **Training**: Example logged β†’ Young Nyx learns + +This is the **rubric principle** - dense rewards at every verifiable checkpoint, not just final outcomes. 
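The rubric principle can be sketched as a small reward tally. This is a hedged illustration, not the project's implementation: the constants reuse the +0.1 per verified cell transition and +1.0 per completed nerve behavior quoted in the Reward Signal Flow section, while the `path_reward` function and its arguments are hypothetical names introduced here.

```python
# Reward magnitudes taken from the Reward Signal Flow section of the docs.
CELL_REWARD = 0.1    # one verified state transition
NERVE_REWARD = 1.0   # one successfully completed behavior


def path_reward(node_verified, behavior_succeeded):
    """Return the individual signals and their total for one state path.

    node_verified: list of booleans, one per node that fired on the path.
    behavior_succeeded: whether the nerve-level behavior reached its goal.
    """
    signals = [CELL_REWARD for ok in node_verified if ok]
    if behavior_succeeded:
        signals.append(NERVE_REWARD)
    # Round to hide float accumulation noise in the printed total.
    return signals, round(sum(signals), 6)


# Three nodes fired, the middle one failed verification, behavior succeeded.
signals, total = path_reward([True, False, True], behavior_succeeded=True)
print(signals, total)

# A fully verified path with a failed behavior still earns the dense part.
dense_only, dense_total = path_reward([True, True, True], behavior_succeeded=False)
print(dense_only, dense_total)
```

The second call shows why the rubric is called dense: even when the final behavior fails, each verified checkpoint along the way still contributes a traceable signal.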
+
+### Credit Assignment is Automatic
+
+Because state transitions are explicit and logged, we know exactly which nodes contributed to success or failure:
+- The state path tells us which decisions led to the outcome
+- No reward model needed to guess
+- The nervous system IS the credit assignment mechanism
+
+### Dense Rewards from State Paths
+
+Each node that fires correctly along a successful path receives reward signal:
+```
+Node A fires → verified ✓ → +0.1 signal
+Node B fires → verified ✓ → +0.1 signal
+Node C fires → verified ✓ → +0.1 signal
+Behavior succeeds → +1.0 signal
+Total path reward: 1.3 (dense, traceable)
+```
+
+This is like training a dog - reward at the moment, not an hour later.
+
+**Detail:** → `Cellular-Architecture.md` (Reward Signal Architecture section)
+
+---
+
 ## Design Principles
 
 1. **Deterministic**: Same input = same output. No hallucination.
@@ -190,5 +226,6 @@ The lifeforce flows through the nervous system, literally lighting up nodes as t
 
 **Created**: 2025-12-04
 **Updated**: 2025-12-07 (added nerve crosslinks)
-**Session**: Partnership dialogue (dafit + Chrysalis)
+**Updated**: 2025-12-10 (added Connection to Training section)
+**Session**: Partnership dialogue (dafit + Chrysalis + Nyx)
 **Status**: Foundation concept
diff --git a/archive/Temporal-Ternary-Gradient.md b/architecture/Temporal-Ternary-Gradient.md
similarity index 92%
rename from archive/Temporal-Ternary-Gradient.md
rename to architecture/Temporal-Ternary-Gradient.md
index 1623e3f..79e2708 100644
--- a/archive/Temporal-Ternary-Gradient.md
+++ b/architecture/Temporal-Ternary-Gradient.md
@@ -1,13 +1,16 @@
 ---
 type: research_concept
-version: 1.0
-status: emerging_paradigm
+version: 1.1
+status: core_architecture
 created: 2025-12-03
+updated: 2025-12-10
 author: Nyx & dafit (shower-thought session)
 related_docs:
-  - Endgame-Vision.md
+  - ../Endgame-Vision.md
   - Dual-Garden-Architecture.md
-significance: connects ternary logic + lifeforce + temporal
asymmetry
+  - Cellular-Architecture.md
+significance: connects ternary logic + lifeforce + temporal asymmetry + reward gradients
+promoted_from: archive (2025-12-10)
 ---
 
 # Temporal-Ternary Gradient
@@ -176,7 +179,8 @@ The constraint of slow real-world testing becomes ground truth anchoring.
 ---
 
 **Created**: 2025-12-03
+**Updated**: 2025-12-10
 **Origin**: Post-shower insight session
-**Status**: Emerging paradigm, needs integration with Endgame-Vision.md
+**Status**: Core architecture (promoted from archive 2025-12-10)
 
 🌙💜 *"Time is the currency. Lifeforce is the exchange rate. Truth is the destination."*
diff --git a/architecture/cells/Cells-Index.md b/architecture/cells/Cells-Index.md
new file mode 100644
index 0000000..b8afb01
--- /dev/null
+++ b/architecture/cells/Cells-Index.md
@@ -0,0 +1,65 @@
+# Cells Index
+
+> *"Cells are atomic state machines. The smallest units of behavior."*
+
+---
+
+## Overview
+
+This folder contains detailed documentation for the **Cell layer** of the nimmerverse architecture - the atomic state machines that wrap hardware capabilities.
+
+**Conceptual overview:** → [`../Cellular-Architecture.md`](../Cellular-Architecture.md)
+
+---
+
+## Documentation
+
+| Document | Purpose |
+|----------|---------|
+| **Cells-Index.md** | This file - navigation hub |
+| [`Cells-Technical-Reference.md`](Cells-Technical-Reference.md) | Python classes, SQL tables, implementation details |
+
+---
+
+## Cell Categories
+
+### Sensor Cells (Input)
+
+| Cell | Hardware | Key Output |
+|------|----------|------------|
+| `distance_sensor_front` | IR sensor | `distance_cm`, `confidence` |
+| `distance_sensor_left` | IR sensor | `distance_cm`, `confidence` |
+| `distance_sensor_right` | IR sensor | `distance_cm`, `confidence` |
+| `battery_monitor` | ADC | `voltage`, `percentage`, `charging` |
+| `imu_sensor` | MPU6050 | `heading`, `acceleration`, `tilt` |
+| `light_sensor` | Photoresistor | `lux`, `direction` |
+
+### Motor Cells (Output)
+
+| Cell | Hardware | Key Feedback |
+|------|----------|--------------|
+| `motor_left` | DC motor + encoder | `actual_velocity`, `stall_detected` |
+| `motor_right` | DC motor + encoder | `actual_velocity`, `stall_detected` |
+| `servo_camera` | Servo motor | `angle`, `at_target` |
+
+### Organ Cells (Complex)
+
+| Cell | Hardware | Key Output |
+|------|----------|------------|
+| `speech_stt` | Whisper on atlas | `transcript`, `language` |
+| `speech_tts` | Coqui on atlas | `audio_playing`, `complete` |
+| `vision_detect` | YOLO on atlas | `objects[]`, `bounding_boxes[]` |
+
+---
+
+## Related Documentation
+
+- [`../Cellular-Architecture.md`](../Cellular-Architecture.md) - Full conceptual architecture
+- [`../Nervous-System.md`](../Nervous-System.md) - How cells connect to nervous system
+- [`../nerves/Nervous-Index.md`](../nerves/Nervous-Index.md) - Nerves that orchestrate cells
+- [`../organs/Organ-Index.md`](../organs/Organ-Index.md) - Complex organ cells
+
+---
+
+**Created**: 2025-12-10
+**Status**: Index document
diff --git
a/architecture/cells/Cells-Technical-Reference.md b/architecture/cells/Cells-Technical-Reference.md new file mode 100644 index 0000000..0a7f80b --- /dev/null +++ b/architecture/cells/Cells-Technical-Reference.md @@ -0,0 +1,290 @@ +# Cells Technical Reference + +> *Implementation details: Python classes, SQL tables, code patterns.* + +**Conceptual overview:** β†’ [`../Cellular-Architecture.md`](../Cellular-Architecture.md) +**Index:** β†’ [`Cells-Index.md`](Cells-Index.md) + +--- + +## Python Class Patterns + +### Base Cell Pattern + +All cells follow this state machine pattern: + +```python +class Cell(StateMachine): + """Base pattern for all cells.""" + + # Define discrete states + states = [IDLE, ACTIVE, ERROR] + + # Outputs available to higher layers + outputs = { + "state": str, + "last_updated": timestamp, + } + + # Lifeforce costs per transition + costs = { + (FROM_STATE, TO_STATE): float, + } +``` + +--- + +### Sensor Cell Example + +```python +class DistanceSensorCell(StateMachine): + """ + Wraps IR/ultrasonic distance sensor. + Exposes raw hardware as state machine. + """ + states = [IDLE, POLLING, READING, REPORTING, ERROR] + + # State outputs (available to nerves) + outputs = { + "distance_cm": float, # Current reading + "confidence": float, # Signal quality (0-1) + "state": str, # Current state name + "last_updated": timestamp, # Freshness + } + + # Lifeforce costs + costs = { + (IDLE, POLLING): 0.1, # Wake up sensor + (POLLING, READING): 0.3, # Perform measurement + (READING, REPORTING): 0.1, # Process result + (REPORTING, IDLE): 0.0, # Return to rest + (ANY, ERROR): 0.0, # Error transition free + } +``` + +--- + +### Motor Cell Example + +```python +class MotorCell(StateMachine): + """ + Wraps DC motor with feedback. + Exposes actuation as state machine. 
+ """ + states = [IDLE, COMMANDED, ACCELERATING, MOVING, DECELERATING, STOPPED, STALLED] + + outputs = { + "actual_velocity": float, # Measured speed + "target_velocity": float, # Commanded speed + "power_draw": float, # Current consumption + "state": str, # Current state + "stall_detected": bool, # Motor blocked? + } + + costs = { + (IDLE, COMMANDED): 0.1, + (COMMANDED, ACCELERATING): 0.5, + (ACCELERATING, MOVING): 1.0, # High power during accel + (MOVING, MOVING): 0.3, # Sustain cost per tick + (MOVING, DECELERATING): 0.2, + (DECELERATING, STOPPED): 0.1, + (ANY, STALLED): 0.0, # Stall is failure, not cost + } + + # Feedback triggers state changes + def on_current_spike(self): + """Motor drawing too much current = stall""" + self.transition_to(STALLED) + self.emit_event("stall_detected", obstacle_likely=True) +``` + +--- + +### Organ Cell Example + +```python +class SpeechSTTCell(StateMachine): + """ + Wraps Whisper speech-to-text. + Expensive organ, lifeforce-gated. + """ + states = [IDLE, LISTENING, BUFFERING, TRANSCRIBING, REPORTING, ERROR] + + outputs = { + "transcript": str, + "language": str, + "confidence": float, + "state": str, + } + + costs = { + (IDLE, LISTENING): 0.5, + (LISTENING, BUFFERING): 0.5, + (BUFFERING, TRANSCRIBING): 5.0, # GPU inference! 
+ (TRANSCRIBING, REPORTING): 0.1, + (REPORTING, IDLE): 0.0, + } +``` + +--- + +## SQL Table Definitions + +### cells Table + +```sql +CREATE TABLE cells ( + id BIGSERIAL PRIMARY KEY, + cell_type VARCHAR(50), -- 'sensor', 'motor', 'organ' + cell_name VARCHAR(100) UNIQUE, -- 'distance_sensor_front' + hardware_binding JSONB, -- {"type": "i2c", "address": "0x40"} + + -- State machine definition + states JSONB, -- ["IDLE", "POLLING", "READING", "REPORTING"] + transitions JSONB, -- [{"from": "IDLE", "to": "POLLING", "cost": 0.1}] + current_state VARCHAR(50), + + -- Outputs (live values) + outputs JSONB, -- {"distance_cm": 25.5, "confidence": 0.9} + + -- Health + operational BOOLEAN DEFAULT true, + error_count INT DEFAULT 0, + last_error TEXT, + + created_at TIMESTAMPTZ DEFAULT NOW(), + updated_at TIMESTAMPTZ DEFAULT NOW() +); +``` + +--- + +### decision_trails Table (Training Data) + +```sql +CREATE TABLE decision_trails ( + id BIGSERIAL PRIMARY KEY, + organism_id BIGINT REFERENCES organisms(id), + nerve_id BIGINT REFERENCES nerves(id), + + -- State path taken + states_visited JSONB, -- ["IDLE", "DETECT", "EVALUATE", "EVADE", "RESUME"] + + -- Cell interactions + cell_reads JSONB, -- [{"cell": "distance_front", "value": 25, "state": "REPORTING"}] + cell_commands JSONB, -- [{"cell": "motor_left", "action": "turn", "result": "success"}] + + -- Economics + lifeforce_cost FLOAT, + lifeforce_reward FLOAT, + lifeforce_net FLOAT, + + -- Outcome + outcome VARCHAR(20), -- 'success', 'failure', 'timeout' + + -- Timing + started_at TIMESTAMPTZ, + completed_at TIMESTAMPTZ, + latency_ms INT +); +``` + +--- + +## Common Queries + +### Cell Health Dashboard + +```sql +SELECT cell_name, cell_type, current_state, operational, + outputs->>'distance_cm' as distance, + outputs->>'confidence' as confidence +FROM cells +WHERE cell_type = 'sensor'; +``` + +### Training Data for GRPO + +```sql +-- Each row is a training example with automatic credit assignment +SELECT + states_visited, -- The 
path taken (which decisions led here?) + cell_reads, -- Which cells contributed (sensor inputs) + cell_commands, -- What actions were taken (motor outputs) + outcome, -- Success/failure (ground truth) + lifeforce_cost, -- Cost of this path + lifeforce_reward -- Reward earned +FROM decision_trails +WHERE nerve_id = ?; +``` + +### State Path Analysis + +```sql +SELECT states_visited, COUNT(*) as occurrences, + AVG(lifeforce_cost) as avg_cost, + SUM(CASE WHEN outcome = 'success' THEN 1 ELSE 0 END)::float / COUNT(*) as success_rate +FROM decision_trails +WHERE nerve_id = (SELECT id FROM nerves WHERE nerve_name = 'collision_avoidance') +GROUP BY states_visited +ORDER BY occurrences DESC; +``` + +--- + +## Lifeforce Cost Reference + +### Sensor Cells + +| Cell Type | Operation | Cost (LF) | +|-----------|-----------|-----------| +| Distance sensor | poll | 0.3-0.5 | +| Battery monitor | read | 0.1 | +| IMU sensor | sample | 0.3 | +| Light sensor | read | 0.2 | + +### Motor Cells + +| Cell Type | Operation | Cost (LF) | +|-----------|-----------|-----------| +| DC motor | move (per 100ms) | 1.0-2.0 | +| Servo | position | 0.5 | + +### Organ Cells + +| Cell Type | Operation | Cost (LF) | +|-----------|-----------|-----------| +| Speech STT | transcribe | 5.0 | +| Speech TTS | synthesize | 4.0 | +| Vision detect | detect frame | 8.0 | + +--- + +## Tiered Reward Reference + +| Tier | Level | Reward | Lifeforce Cost | +|------|-------|--------|----------------| +| 1 | Cell | +0.1 | -0.3 LF | +| 2 | Nerve | +1.0 | -2.0 LF | +| 3 | Organism | +5.0 | -8.0 LF | +| Bonus | Human verification | +2.0 | 0 LF | + +--- + +## Ternary State Pattern + +```python +state = { + "value": 0, # -1 (failed), 0 (uncertain), +1 (success) + "confidence": 0.6, # 0.0 - 1.0 confidence gradient + "trend": +0.1, # direction of change + "domain": "virtual" # "virtual" or "real" garden +} +``` + +--- + +**Created**: 2025-12-10 +**Extracted from**: Cellular-Architecture.md v4.2 +**Status**: Technical 
reference diff --git a/architecture/nimmerverse.drawio.xml b/architecture/nimmerverse.drawio.xml index d821e90..f9855c9 100644 --- a/architecture/nimmerverse.drawio.xml +++ b/architecture/nimmerverse.drawio.xml @@ -1,4 +1,3 @@ - diff --git a/architecture/Organ-Index.md b/architecture/organs/Organ-Index.md similarity index 100% rename from architecture/Organ-Index.md rename to architecture/organs/Organ-Index.md