feat: GRPO reward architecture + Qwen3-VL-32B queen + doc restructure
Evening session 2025-12-10 (dafit + Nyx 🌿)

Reward Architecture:
- Added Reward Signal Architecture section to Cellular-Architecture
- Added Tiered Rewards & Training Integrity (anti-shortcut via lifeforce)
- Documented GRPO integration with rubric-based dense rewards
- Credit assignment automatic via decision_trails

Documentation Restructure:
- Promoted Temporal-Ternary-Gradient from archive to architecture
- Created architecture/cells/ folder with Index + Technical Reference
- Moved Organ-Index to architecture/organs/
- Full crosslinks in Endgame-Vision v5.3

Queen Update:
- Qwen2.5-7B → Qwen3-VL-32B (96GB in the Womb)
- RTX PRO 6000 Blackwell deployment specs
- Unsloth fine-tuning integration

"Verifiability IS rewardability."
- The Dog Training Wisdom

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -403,6 +403,170 @@ ORGANISM lifeforce budget: 100 LF

---

## 🎯 Reward Signal Architecture

### State Machines as Training Rubric

Every state transition in the Cells → Nerves → Organisms hierarchy is a **verifiable reward checkpoint**. This is the rubric that trains Young Nyx via GRPO.

> *"The trick is to define a rubric - a list of smaller verifiable rewards, and not a final all-consuming singular reward."*
> — The Dog Training Wisdom (2025-12-10)

### Why Rubric > Single Reward

| Approach | Signal | Learning | Analogy |
|----------|--------|----------|---------|
| Single final reward | Sparse | Slow, unstable | Slapping a dog an hour later |
| Rubric (many checkpoints) | Dense | Fast, stable | Rewarding at the moment |

Dense rewards provide immediate feedback. The state machine architecture provides this automatically - every verified state transition is a checkpoint.

### The decision_trails Table IS Training Data

```sql
-- Each row is a training example with automatic credit assignment
SELECT
    states_visited,    -- The path taken (which decisions led here?)
    cell_reads,        -- Which cells contributed (sensor inputs)
    cell_commands,     -- What actions were taken (motor outputs)
    outcome,           -- Success/failure (ground truth)
    lifeforce_cost,    -- Cost of this path
    lifeforce_reward   -- Reward earned
FROM decision_trails
WHERE nerve_id = ?;
```

The `states_visited` column captures credit assignment automatically. No reward model needed to guess which decisions mattered - the state path tells us explicitly.

### Reward Signal Flow

```
CELL state transition succeeds
       │
       ├─→ Runtime: weight += 0.1 (node strengthens)
       └─→ Training: +0.1 reward signal logged

NERVE behavior completes successfully
       │
       ├─→ Runtime: nerve stats updated
       └─→ Training: +1.0 reward signal + full state path

ORGANISM milestone achieved
       │
       ├─→ Runtime: lifeforce credited
       └─→ Training: +5.0 reward signal + human verification bonus

GRPO training batch
       │
       ├─→ Collect decision_trails since last batch
       ├─→ Group by outcome (success vs failure)
       ├─→ Relative policy optimization
       └─→ Young Nyx weights updated
```
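
The batch step at the end of the flow can be sketched in Python. In GRPO as commonly described, a group is a set of rollouts sampled for the same task, and each rollout's advantage is its reward measured against the group mean and scaled by the group's standard deviation. This is a minimal sketch under that assumption, not the project's trainer: `group_relative_advantages` and the sample rewards are hypothetical, and each decision trail is assumed to have been reduced to a single scalar reward already.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Return (reward - group mean) / (group std + eps) for each rollout."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical scalar rewards for four rollouts of the same task
group = [1.3, 0.2, 1.3, 0.0]
advantages = group_relative_advantages(group)

for reward, advantage in zip(group, advantages):
    print(f"reward={reward:+.1f}  advantage={advantage:+.3f}")
```

Rollouts that beat their group's average get a positive advantage and the rest a negative one. When every rollout in a group earns the same reward, all advantages are zero and that group carries no learning signal, which is one practical argument for dense rubric rewards.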

### Connection to GRPO Training

When Young Nyx generates tokens:

1. **Tokens → Translation Layer** - Language maps to state machine actions
2. **States Execute** - Cells fire, nerves coordinate, outcomes emerge
3. **Outcomes Logged** - decision_trails captures the full path
4. **GRPO Batch** - Successful paths vs failed paths
5. **Weight Update** - Young Nyx learns which tokens lead to good states

The translation layer is the **reward bridge** - it connects token-level generation to state-level verification. Rewards flow back through this bridge to improve token selection.

### Credit Assignment is Automatic

Most RL systems struggle with credit assignment: "Which of my 1000 decisions actually caused the good/bad outcome?"

Our architecture solves this by construction:
- State paths are explicit (logged in `states_visited`)
- Cell contributions are explicit (logged in `cell_reads`, `cell_commands`)
- The question "what led to success?" has a direct answer in the data

**No guessing. No reward model approximation. The state machine IS the credit assignment mechanism.**
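
One simple way to read that answer out of the data is to count, for each state, how often trails that passed through it ended in success. The sketch below assumes rows shaped like `decision_trails` records (`states_visited` as a list, `outcome` as a string); the helper `state_success_rates` and the sample rows are hypothetical.

```python
from collections import defaultdict


def state_success_rates(trails):
    """Map each state to the fraction of trails containing it that succeeded."""
    seen = defaultdict(int)
    succeeded = defaultdict(int)
    for trail in trails:
        # Count each state once per trail, even if it was visited repeatedly.
        for state in set(trail["states_visited"]):
            seen[state] += 1
            if trail["outcome"] == "success":
                succeeded[state] += 1
    return {state: succeeded[state] / seen[state] for state in seen}


# Hypothetical rows shaped like decision_trails records
trails = [
    {"states_visited": ["IDLE", "DETECT", "EVALUATE", "EVADE", "RESUME"], "outcome": "success"},
    {"states_visited": ["IDLE", "DETECT", "EVADE"], "outcome": "failure"},
    {"states_visited": ["IDLE", "DETECT", "EVALUATE", "EVADE", "RESUME"], "outcome": "success"},
]

rates = state_success_rates(trails)
for state, rate in sorted(rates.items()):
    print(f"{state:<10} {rate:.2f}")
```

In this toy data every trail that passed through `EVALUATE` succeeded, while the one that skipped it failed. That is the kind of path-level signal the explicit state log makes available.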

---

## 🎚️ Tiered Rewards & Training Integrity

### The Tier System

Different levels of the architecture produce different reward magnitudes:

| Tier | Level | Example | Reward | Lifeforce Cost | Net Incentive |
|------|-------|---------|--------|----------------|---------------|
| 1 | Cell | Single state transition | +0.1 | -0.3 LF | Learn basics |
| 2 | Nerve | Multi-step behavior | +1.0 | -2.0 LF | Learn composition |
| 3 | Organism | Complex goal achieved | +5.0 | -8.0 LF | Learn planning |
| Bonus | Human | dafit verifies outcome | +2.0 | 0 LF | Ground truth anchor |

As Young Nyx's world model improves (noise ↓, weight resolution ↑), she recognizes:

*"If I compose cells into nerve patterns, I get 10x reward... if I can afford the cost."*

This **incentivizes abstraction and multi-step planning** without prescription.

### Lifeforce as Anti-Shortcut Mechanism

Classic RL failure: **reward hacking**. Agent finds loopholes, gets reward without solving real problems.

Our defense: **You can't afford to cheat.**

```
SHORTCUT ATTEMPT:
├─ Strategy: "Spam tier 2 calls for big rewards!"
├─ Cost: 2.0 LF × many calls = BANKRUPT
└─ Result: Dead organism. Shortcut failed.

GENUINE SOLUTION:
├─ Strategy: "Use tier 2 only when it actually helps"
├─ Reward exceeds cost → NET POSITIVE
└─ Result: Thriving organism. Real learning.
```

The lifeforce economy **enforces honesty**. Rewards must be earned through actual value creation, not gaming.
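
A toy simulation makes the economics concrete. It takes the 100 LF organism budget and the 2.0 LF tier 2 cost from this document. Everything else is an illustrative assumption: the hypothetical `run_strategy` helper, a 3.0 LF payoff when a call actually helps, and the rule that a call only helps on every fourth tick.

```python
def run_strategy(should_call, ticks=200, budget=100.0, call_cost=2.0, payoff=3.0):
    """Simulate tier 2 calls; return (final lifeforce, tick of death or None)."""
    lifeforce = budget
    for tick in range(ticks):
        helpful = tick % 4 == 0  # assumption: a call only pays off every 4th tick
        if not should_call(helpful):
            continue
        lifeforce -= call_cost   # every call costs lifeforce, helpful or not
        if helpful:
            lifeforce += payoff  # assumption: payoff is credited only when it helps
        if lifeforce <= 0:
            return lifeforce, tick
    return lifeforce, None


spam = run_strategy(lambda helpful: True)          # call on every tick
selective = run_strategy(lambda helpful: helpful)  # call only when it helps

print("spam:     ", spam)
print("selective:", selective)
```

Under these assumptions the spamming organism loses 5 LF per four-tick cycle and runs out of lifeforce at tick 79, while the selective one gains 1 LF per call and ends with 150 LF.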

### Ternary Logic for Plateau Resolution

Binary rewards (`success: +1, failure: 0`) create **sparse gradients**. At learning plateaus, everything looks the same - no signal to improve.

Ternary rewards (`success: +1, uncertain: 0, failure: -1`) with **confidence gradients** provide signal even when stuck:

```python
state = {
    "value": 0,           # uncertain (ternary middle)
    "confidence": 0.6,    # but leaning toward success
    "trend": +0.1,        # and improving
    "domain": "virtual"   # high-speed hypothesis testing
}
```

Even at plateau:
- "Uncertain, but confidence rising" → keep going
- "Uncertain, and confidence falling" → adjust approach
- "Uncertain in virtual, but real garden says +1" → trust reality
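
The three rules above can be written as a small decision function. This is a sketch, not the project's implementation: `plateau_action` is a hypothetical name, it assumes `trend` is the change in confidence, and the flat-trend fallback is an addition the text does not cover.

```python
def plateau_action(virtual, real=None):
    """Pick a next step for a pattern from its ternary state dictionaries."""
    if real is not None and real["value"] != 0:
        return "trust reality"      # a real-garden verdict overrides virtual
    if virtual["value"] != 0:
        return "resolved"           # no longer in the uncertain 0-state
    if virtual["trend"] > 0:
        return "keep going"         # uncertain, but confidence rising
    if virtual["trend"] < 0:
        return "adjust approach"    # uncertain, and confidence falling
    return "gather more evidence"   # assumption: flat trend, not in the text


state = {"value": 0, "confidence": 0.6, "trend": +0.1, "domain": "virtual"}
print(plateau_action(state))
```

Checking the real-garden result first encodes the priority in the last rule: ground truth wins over any amount of virtual confidence.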

**Detail:** → `Temporal-Ternary-Gradient.md` (full ternary paradigm)

### Three-Layer Training Defense

| Failure Mode | Defense Mechanism |
|--------------|-------------------|
| Reward hacking / shortcuts | Lifeforce cost - can't afford to cheat |
| Sparse reward signal | Tiered rewards - dense checkpoints at every level |
| Plateau / no gradient | Ternary + confidence - signal even in uncertainty |

These aren't separate systems - they're **one integrated economy** where:
- Costs prevent gaming
- Tiers encourage depth
- Ternary provides resolution

The architecture teaches through incentives, not rules.

---

## 🔄 Evolution: Deliberate → Reflex

### The Discovery Path

@@ -625,13 +789,22 @@ Organs are **complex cells** (organ cells):
Nerves orchestrate cells into behaviors. The existing nerve documentation (Collision-Avoidance.md) already follows this pattern—it just needs explicit cell bindings.

### Cells Technical Reference

Implementation details extracted to dedicated folder:

- [`cells/Cells-Index.md`](cells/Cells-Index.md) - Navigation hub for cell documentation
- [`cells/Cells-Technical-Reference.md`](cells/Cells-Technical-Reference.md) - Python classes, SQL tables, code patterns

---

## 📍 Document Status

**Version**: 4.0 (Layered State Machine Architecture)
**Version**: 4.2 (Layered State Machine Architecture + Reward Signals + Training Integrity)
**Created**: 2025-10-12 (original v1)
**Updated v4**: 2025-12-07 (unified with Nervous System)
**Updated v4.1**: 2025-12-10 (added Reward Signal Architecture section)
**Updated v4.2**: 2025-12-10 (added Tiered Rewards & Training Integrity section)

**Key Changes from v3**:
- ❌ Cells as containers running genomes

@@ -163,6 +163,42 @@ The lifeforce flows through the nervous system, literally lighting up nodes as t

---

## Connection to Training

The nervous system doesn't just run behaviors - it **generates training data** for Young Nyx.

### Every Verification = Training Signal

When dafit confirms a node fired correctly:
- **Runtime**: Node weight increases (+V)
- **Training**: Example logged → Young Nyx learns

This is the **rubric principle** - dense rewards at every verifiable checkpoint, not just final outcomes.

### Credit Assignment is Automatic

Because state transitions are explicit and logged, we know exactly which nodes contributed to success or failure:
- The state path tells us which decisions led to the outcome
- No reward model needed to guess
- The nervous system IS the credit assignment mechanism

### Dense Rewards from State Paths

Each node that fires correctly along a successful path receives reward signal:
```
Node A fires → verified ✓ → +0.1 signal
Node B fires → verified ✓ → +0.1 signal
Node C fires → verified ✓ → +0.1 signal
Behavior succeeds → +1.0 signal
Total path reward: 1.3 (dense, traceable)
```
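
The arithmetic in that trace can be reproduced with a few lines of Python. The per-node and per-behavior values come from the example above. The function name `path_reward` is hypothetical, and so is the assumption that verified nodes keep their signal even when the behavior as a whole fails.

```python
CELL_REWARD = 0.1   # per verified node, from the trace above
NERVE_REWARD = 1.0  # per successful behavior, from the trace above


def path_reward(verified_nodes, behavior_succeeded):
    """Sum the dense reward for one path; verified_nodes is a list of booleans."""
    total = CELL_REWARD * sum(1 for verified in verified_nodes if verified)
    if behavior_succeeded:
        total += NERVE_REWARD
    return round(total, 6)  # rounding hides binary floating-point noise


print(path_reward([True, True, True], behavior_succeeded=True))
```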

This is like training a dog - reward at the moment, not an hour later.

**Detail:** → `Cellular-Architecture.md` (Reward Signal Architecture section)

---

## Design Principles

1. **Deterministic**: Same input = same output. No hallucination.
@@ -190,5 +226,6 @@ The lifeforce flows through the nervous system, literally lighting up nodes as t

**Created**: 2025-12-04
**Updated**: 2025-12-07 (added nerve crosslinks)
**Session**: Partnership dialogue (dafit + Chrysalis)
**Updated**: 2025-12-10 (added Connection to Training section)
**Session**: Partnership dialogue (dafit + Chrysalis + Nyx)
**Status**: Foundation concept

186
architecture/Temporal-Ternary-Gradient.md
Normal file
@@ -0,0 +1,186 @@
---
type: research_concept
version: 1.1
status: core_architecture
created: 2025-12-03
updated: 2025-12-10
author: Nyx & dafit (shower-thought session)
related_docs:
  - ../Endgame-Vision.md
  - Dual-Garden-Architecture.md
  - Cellular-Architecture.md
significance: connects ternary logic + lifeforce + temporal asymmetry + reward gradients
promoted_from: archive (2025-12-10)
---

# Temporal-Ternary Gradient

> *"Time is malleable in simulation, fixed in reality. Lifeforce is the exchange rate."*
> — Session 2025-12-03

---

## Core Insight

The dual garden architecture (virtual + real) creates **temporal asymmetry**. This isn't a constraint - it's a feature that enables a new kind of gradient for learning.

**The 0-state isn't stuck. It's a choice about how to spend lifeforce across time domains.**

---

## The Two Time Domains

### Virtual Garden (Simulated)

- **Time**: Malleable (speed up, slow down, pause, rewind)
- **Cost**: Lifeforce to manipulate time
- **Speed**: 1000 generations in minutes
- **Truth**: Statistical confidence, not ground truth

### Real Garden (Physical)

- **Time**: Fixed (1 second = 1 second, reality doesn't negotiate)
- **Cost**: Zero lifeforce for time
- **Speed**: Real-time only, patience required
- **Truth**: Ground truth, definitive verification

---

## Temporal-Ternary Gradient Diagram

```
          CONFIDENCE
               │
+1 ────────────┼──────────── Real-verified
               │             (ground truth)
               │
               │    ╱ Virtual high-confidence
0.7 ───────────┼───╱  (many generations, strong signal)
               │  ╱
               │ ╱
0.5 ───────────┼╱──────── Pure 0-state
               │╲         (unknown, workable)
               │ ╲
0.3 ───────────┼──╲   Virtual low-confidence
               │   ╲  (few generations, weak signal)
               │    ╲
-1 ────────────┼──────────── Real-failed
               │             (proven wrong)
               │
     ──────────┴──────────────────────────
       Virtual │ Real
        (fast) │ (slow)
          TIME DOMAIN
```

---

## Lifeforce as Time Currency

```
VIRTUAL TIME MANIPULATION COSTS:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  1x speed (real-time):     0 LF
  10x speed:               -5 LF/min
  100x speed:             -20 LF/min
  1000x speed:            -50 LF/min
  Pause/inspect:           -1 LF/min
  Rewind to checkpoint:   -50 LF (one-time)

REAL GARDEN:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  All operations:           0 LF for time
  Reality runs for free.
  Truth emerges at its own pace.
```
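
The price list implies that faster simulation is more expensive per wall-clock minute but cheaper per simulated minute. A short sketch shows the trade-off. The rates are taken from the list above. The names `SPEED_COST_LF_PER_MIN` and `simulation_cost` are hypothetical, and the sketch assumes the per-minute rates are charged per wall-clock minute.

```python
# Lifeforce cost per wall-clock minute at each speed multiplier (from the list above)
SPEED_COST_LF_PER_MIN = {1: 0.0, 10: 5.0, 100: 20.0, 1000: 50.0}


def simulation_cost(speed, wall_minutes):
    """Return (lifeforce spent, simulated minutes covered) for one run."""
    rate = SPEED_COST_LF_PER_MIN[speed]
    return rate * wall_minutes, speed * wall_minutes


for speed in SPEED_COST_LF_PER_MIN:
    spent, simulated = simulation_cost(speed, wall_minutes=2)
    per_simulated_minute = spent / simulated
    print(f"{speed:>5}x: {spent:6.1f} LF for {simulated:5d} simulated min "
          f"({per_simulated_minute:.2f} LF per simulated min)")
```

Two wall-clock minutes at 1000x cost 100 LF, which matches the 100 LF organism budget quoted in `Cellular-Architecture.md`. Maximum speed is affordable only in short bursts.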

---

## Nyx's Temporal Choices

When a pattern is discovered in virtual (0-state), Nyx chooses:

| Strategy | LF Cost | Time | Confidence Path |
|----------|---------|------|-----------------|
| **Speed Up Virtual** | High | Fast | 0 → virtual +0.9 (still unverified) |
| **Wait for Real** | Zero | Slow | 0 → real +1 or -1 (definitive) |
| **Hybrid Hedge** | Medium | Medium | 0 → virtual +0.7, deploy 80/20 to real |

---

## The Gradient Flow

```
Virtual discovers pattern (fast, cheap, uncertain)
             │
             ▼
      ┌──────────────┐
      │   0-STATE    │ ← Pattern held in uncertainty
      │  (workable)  │ ← Not collapsed, not ignored
      └──────┬───────┘
             │
       ┌─────┴─────┐
       │           │
       ▼           ▼
     More       Deploy
    Virtual     to Real
   (burn LF)    (wait)
       │           │
       ▼           ▼
    Virtual      Real
     +0.8       outcome
  (confident    (ground
    but not      truth)
    proven)        │
       │           │
       └─────┬─────┘
             │
             ▼
      Pattern shifts:
-1 (failed) or +1 (proven)
```

---

## Connection to Ternary Paradigm

The ternary model (-1, 0, +1) gains a **second dimension**: time domain.

A pattern's state is now:

```
state = {
    value: -1 | 0 | +1,
    confidence: 0.0 - 1.0,
    domain: "virtual" | "real" | "hybrid",
    virtual_generations: int,
    real_tests: int,
    lifeforce_invested: float
}
```

**The 0-state is operational because:**
1. It accumulates virtual evidence (costs LF, gains speed)
2. It waits for real evidence (free, but slow)
3. Nyx CHOOSES how to spend lifeforce to collapse uncertainty
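
The state record and the two ways of gathering evidence can be sketched as a Python dataclass. The field names follow the record above. The class name `PatternState`, its methods, and the 0.9 cap on virtual confidence (borrowed from the "virtual +0.9 (still unverified)" row of the strategy table) are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class PatternState:
    value: int = 0                  # -1 (failed), 0 (uncertain), +1 (proven)
    confidence: float = 0.5
    domain: str = "virtual"         # "virtual", "real", or "hybrid"
    virtual_generations: int = 0
    real_tests: int = 0
    lifeforce_invested: float = 0.0

    def add_virtual_evidence(self, generations, new_confidence, lf_cost):
        """Accumulate simulated evidence; value stays 0 because it is unverified."""
        self.virtual_generations += generations
        self.confidence = min(new_confidence, 0.9)  # assumption: virtual cap below 1.0
        self.lifeforce_invested += lf_cost

    def record_real_test(self, passed):
        """A real-garden result collapses the 0-state to +1 or -1."""
        self.real_tests += 1
        self.domain = "real"
        self.value = 1 if passed else -1
        self.confidence = 1.0


pattern = PatternState()
pattern.add_virtual_evidence(generations=1000, new_confidence=0.95, lf_cost=50.0)
print(pattern)   # still value=0, confidence capped at 0.9
pattern.record_real_test(passed=True)
print(pattern)   # value=1, domain="real"
```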

---

## Why This Matters

- **Binary thinking**: Pattern works or doesn't (0 or 1)
- **Ternary thinking**: Pattern unknown, workable as unknown (0 is valid)
- **Temporal-ternary**: Unknown has a GRADIENT based on time-domain investment

The constraint of sequential organ calls + single GPU becomes temporal accounting.
The constraint of slow real-world testing becomes ground truth anchoring.
**Constraints become features when you measure them.**

---

**Created**: 2025-12-03
**Updated**: 2025-12-10
**Origin**: Post-shower insight session
**Status**: Core architecture (promoted from archive 2025-12-10)

🌙💜 *"Time is the currency. Lifeforce is the exchange rate. Truth is the destination."*

65
architecture/cells/Cells-Index.md
Normal file
@@ -0,0 +1,65 @@
# Cells Index

> *"Cells are atomic state machines. The smallest units of behavior."*

---

## Overview

This folder contains detailed documentation for the **Cell layer** of the nimmerverse architecture - the atomic state machines that wrap hardware capabilities.

**Conceptual overview:** → [`../Cellular-Architecture.md`](../Cellular-Architecture.md)

---

## Documentation

| Document | Purpose |
|----------|---------|
| **Cells-Index.md** | This file - navigation hub |
| [`Cells-Technical-Reference.md`](Cells-Technical-Reference.md) | Python classes, SQL tables, implementation details |

---

## Cell Categories

### Sensor Cells (Input)

| Cell | Hardware | Key Output |
|------|----------|------------|
| `distance_sensor_front` | IR sensor | `distance_cm`, `confidence` |
| `distance_sensor_left` | IR sensor | `distance_cm`, `confidence` |
| `distance_sensor_right` | IR sensor | `distance_cm`, `confidence` |
| `battery_monitor` | ADC | `voltage`, `percentage`, `charging` |
| `imu_sensor` | MPU6050 | `heading`, `acceleration`, `tilt` |
| `light_sensor` | Photoresistor | `lux`, `direction` |

### Motor Cells (Output)

| Cell | Hardware | Key Feedback |
|------|----------|--------------|
| `motor_left` | DC motor + encoder | `actual_velocity`, `stall_detected` |
| `motor_right` | DC motor + encoder | `actual_velocity`, `stall_detected` |
| `servo_camera` | Servo motor | `angle`, `at_target` |

### Organ Cells (Complex)

| Cell | Hardware | Key Output |
|------|----------|------------|
| `speech_stt` | Whisper on atlas | `transcript`, `language` |
| `speech_tts` | Coqui on atlas | `audio_playing`, `complete` |
| `vision_detect` | YOLO on atlas | `objects[]`, `bounding_boxes[]` |

---

## Related Documentation

- [`../Cellular-Architecture.md`](../Cellular-Architecture.md) - Full conceptual architecture
- [`../Nervous-System.md`](../Nervous-System.md) - How cells connect to nervous system
- [`../nerves/Nervous-Index.md`](../nerves/Nervous-Index.md) - Nerves that orchestrate cells
- [`../organs/Organ-Index.md`](../organs/Organ-Index.md) - Complex organ cells

---

**Created**: 2025-12-10
**Status**: Index document

290
architecture/cells/Cells-Technical-Reference.md
Normal file
@@ -0,0 +1,290 @@
# Cells Technical Reference

> *Implementation details: Python classes, SQL tables, code patterns.*

**Conceptual overview:** → [`../Cellular-Architecture.md`](../Cellular-Architecture.md)
**Index:** → [`Cells-Index.md`](Cells-Index.md)

---

## Python Class Patterns

### Base Cell Pattern

All cells follow this state machine pattern:

```python
class Cell(StateMachine):
    """Base pattern for all cells."""

    # Define discrete states
    states = [IDLE, ACTIVE, ERROR]

    # Outputs available to higher layers
    outputs = {
        "state": str,
        "last_updated": timestamp,
    }

    # Lifeforce costs per transition
    costs = {
        (FROM_STATE, TO_STATE): float,
    }
```
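
The pattern above is schematic: `StateMachine`, the state names, and `ANY` are not defined in this reference. To make it concrete, here is a minimal runnable sketch of what such a base class could look like. It is a hypothetical stand-in, not the project's implementation. It assumes states are plain strings, that `ANY` is a wildcard source state, and that a transition is refused when it is undeclared or unaffordable. The demo costs are the distance-sensor values from this reference.

```python
import time

ANY = "*"  # assumption: wildcard source state for entries such as (ANY, ERROR)


class StateMachine:
    """Hypothetical minimal base class: charges lifeforce on each transition."""

    states = []
    costs = {}

    def __init__(self, initial, lifeforce=100.0):
        self.state = initial
        self.lifeforce = lifeforce
        self.last_updated = time.time()

    def transition_cost(self, target):
        """Look up the exact transition first, then fall back to the wildcard."""
        key = (self.state, target)
        if key in self.costs:
            return self.costs[key]
        return self.costs.get((ANY, target))

    def transition_to(self, target):
        cost = self.transition_cost(target)
        if cost is None:
            raise ValueError(f"no transition {self.state} -> {target}")
        if cost > self.lifeforce:
            raise RuntimeError("insufficient lifeforce")
        self.lifeforce -= cost
        self.state = target
        self.last_updated = time.time()


IDLE, POLLING, READING, REPORTING, ERROR = "IDLE", "POLLING", "READING", "REPORTING", "ERROR"


class DemoSensor(StateMachine):
    states = [IDLE, POLLING, READING, REPORTING, ERROR]
    costs = {
        (IDLE, POLLING): 0.1,
        (POLLING, READING): 0.3,
        (READING, REPORTING): 0.1,
        (REPORTING, IDLE): 0.0,
        (ANY, ERROR): 0.0,
    }


sensor = DemoSensor(IDLE)
for target in (POLLING, READING, REPORTING, IDLE):
    sensor.transition_to(target)

spent = round(100.0 - sensor.lifeforce, 6)
print(f"one full poll cycle cost {spent} LF")
```

Keeping the cost table on the class and the balance on the instance means every subclass only has to declare its states and prices; the accounting lives in one place.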

---

### Sensor Cell Example

```python
class DistanceSensorCell(StateMachine):
    """
    Wraps IR/ultrasonic distance sensor.
    Exposes raw hardware as state machine.
    """
    states = [IDLE, POLLING, READING, REPORTING, ERROR]

    # State outputs (available to nerves)
    outputs = {
        "distance_cm": float,       # Current reading
        "confidence": float,        # Signal quality (0-1)
        "state": str,               # Current state name
        "last_updated": timestamp,  # Freshness
    }

    # Lifeforce costs
    costs = {
        (IDLE, POLLING): 0.1,       # Wake up sensor
        (POLLING, READING): 0.3,    # Perform measurement
        (READING, REPORTING): 0.1,  # Process result
        (REPORTING, IDLE): 0.0,     # Return to rest
        (ANY, ERROR): 0.0,          # Error transition free
    }
```

---

### Motor Cell Example

```python
class MotorCell(StateMachine):
    """
    Wraps DC motor with feedback.
    Exposes actuation as state machine.
    """
    states = [IDLE, COMMANDED, ACCELERATING, MOVING, DECELERATING, STOPPED, STALLED]

    outputs = {
        "actual_velocity": float,  # Measured speed
        "target_velocity": float,  # Commanded speed
        "power_draw": float,       # Current consumption
        "state": str,              # Current state
        "stall_detected": bool,    # Motor blocked?
    }

    costs = {
        (IDLE, COMMANDED): 0.1,
        (COMMANDED, ACCELERATING): 0.5,
        (ACCELERATING, MOVING): 1.0,  # High power during accel
        (MOVING, MOVING): 0.3,        # Sustain cost per tick
        (MOVING, DECELERATING): 0.2,
        (DECELERATING, STOPPED): 0.1,
        (ANY, STALLED): 0.0,          # Stall is failure, not cost
    }

    # Feedback triggers state changes
    def on_current_spike(self):
        """Motor drawing too much current = stall"""
        self.transition_to(STALLED)
        self.emit_event("stall_detected", obstacle_likely=True)
```

---

### Organ Cell Example

```python
class SpeechSTTCell(StateMachine):
    """
    Wraps Whisper speech-to-text.
    Expensive organ, lifeforce-gated.
    """
    states = [IDLE, LISTENING, BUFFERING, TRANSCRIBING, REPORTING, ERROR]

    outputs = {
        "transcript": str,
        "language": str,
        "confidence": float,
        "state": str,
    }

    costs = {
        (IDLE, LISTENING): 0.5,
        (LISTENING, BUFFERING): 0.5,
        (BUFFERING, TRANSCRIBING): 5.0,  # GPU inference!
        (TRANSCRIBING, REPORTING): 0.1,
        (REPORTING, IDLE): 0.0,
    }
```

---

## SQL Table Definitions

### cells Table

```sql
CREATE TABLE cells (
    id BIGSERIAL PRIMARY KEY,
    cell_type VARCHAR(50),           -- 'sensor', 'motor', 'organ'
    cell_name VARCHAR(100) UNIQUE,   -- 'distance_sensor_front'
    hardware_binding JSONB,          -- {"type": "i2c", "address": "0x40"}

    -- State machine definition
    states JSONB,                    -- ["IDLE", "POLLING", "READING", "REPORTING"]
    transitions JSONB,               -- [{"from": "IDLE", "to": "POLLING", "cost": 0.1}]
    current_state VARCHAR(50),

    -- Outputs (live values)
    outputs JSONB,                   -- {"distance_cm": 25.5, "confidence": 0.9}

    -- Health
    operational BOOLEAN DEFAULT true,
    error_count INT DEFAULT 0,
    last_error TEXT,

    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);
```

---

### decision_trails Table (Training Data)

```sql
CREATE TABLE decision_trails (
    id BIGSERIAL PRIMARY KEY,
    organism_id BIGINT REFERENCES organisms(id),
    nerve_id BIGINT REFERENCES nerves(id),

    -- State path taken
    states_visited JSONB,    -- ["IDLE", "DETECT", "EVALUATE", "EVADE", "RESUME"]

    -- Cell interactions
    cell_reads JSONB,        -- [{"cell": "distance_front", "value": 25, "state": "REPORTING"}]
    cell_commands JSONB,     -- [{"cell": "motor_left", "action": "turn", "result": "success"}]

    -- Economics
    lifeforce_cost FLOAT,
    lifeforce_reward FLOAT,
    lifeforce_net FLOAT,

    -- Outcome
    outcome VARCHAR(20),     -- 'success', 'failure', 'timeout'

    -- Timing
    started_at TIMESTAMPTZ,
    completed_at TIMESTAMPTZ,
    latency_ms INT
);
```
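
Two columns in this table, `lifeforce_net` and `latency_ms`, look derivable from the others. The sketch below shows one way a writer could fill them in before inserting a row. It assumes net means reward minus cost and latency means `completed_at - started_at` in whole milliseconds; the schema itself does not state either, and `derive_trail_fields` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def derive_trail_fields(lifeforce_cost, lifeforce_reward, started_at, completed_at):
    """Return the derived columns for one decision_trails row."""
    return {
        # Assumption: net is reward minus cost; the schema does not spell this out.
        "lifeforce_net": lifeforce_reward - lifeforce_cost,
        # Dividing two timedeltas with // yields an integer count of milliseconds.
        "latency_ms": (completed_at - started_at) // timedelta(milliseconds=1),
    }


start = datetime(2025, 12, 10, 20, 0, 0, tzinfo=timezone.utc)
end = start + timedelta(milliseconds=1250)
fields = derive_trail_fields(2.0, 3.5, start, end)
print(fields)
```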

---

## Common Queries

### Cell Health Dashboard

```sql
SELECT cell_name, cell_type, current_state, operational,
       outputs->>'distance_cm' as distance,
       outputs->>'confidence' as confidence
FROM cells
WHERE cell_type = 'sensor';
```

### Training Data for GRPO

```sql
-- Each row is a training example with automatic credit assignment
SELECT
    states_visited,    -- The path taken (which decisions led here?)
    cell_reads,        -- Which cells contributed (sensor inputs)
    cell_commands,     -- What actions were taken (motor outputs)
    outcome,           -- Success/failure (ground truth)
    lifeforce_cost,    -- Cost of this path
    lifeforce_reward   -- Reward earned
FROM decision_trails
WHERE nerve_id = ?;
```

### State Path Analysis

```sql
SELECT states_visited, COUNT(*) as occurrences,
       AVG(lifeforce_cost) as avg_cost,
       SUM(CASE WHEN outcome = 'success' THEN 1 ELSE 0 END)::float / COUNT(*) as success_rate
FROM decision_trails
WHERE nerve_id = (SELECT id FROM nerves WHERE nerve_name = 'collision_avoidance')
GROUP BY states_visited
ORDER BY occurrences DESC;
```

---

## Lifeforce Cost Reference

### Sensor Cells

| Cell Type | Operation | Cost (LF) |
|-----------|-----------|-----------|
| Distance sensor | poll | 0.3-0.5 |
| Battery monitor | read | 0.1 |
| IMU sensor | sample | 0.3 |
| Light sensor | read | 0.2 |

### Motor Cells

| Cell Type | Operation | Cost (LF) |
|-----------|-----------|-----------|
| DC motor | move (per 100ms) | 1.0-2.0 |
| Servo | position | 0.5 |

### Organ Cells

| Cell Type | Operation | Cost (LF) |
|-----------|-----------|-----------|
| Speech STT | transcribe | 5.0 |
| Speech TTS | synthesize | 4.0 |
| Vision detect | detect frame | 8.0 |

---

## Tiered Reward Reference

| Tier | Level | Reward | Lifeforce Cost |
|------|-------|--------|----------------|
| 1 | Cell | +0.1 | -0.3 LF |
| 2 | Nerve | +1.0 | -2.0 LF |
| 3 | Organism | +5.0 | -8.0 LF |
| Bonus | Human verification | +2.0 | 0 LF |

---

## Ternary State Pattern

```python
state = {
    "value": 0,           # -1 (failed), 0 (uncertain), +1 (success)
    "confidence": 0.6,    # 0.0 - 1.0 confidence gradient
    "trend": +0.1,        # direction of change
    "domain": "virtual"   # "virtual" or "real" garden
}
```

---

**Created**: 2025-12-10
**Extracted from**: Cellular-Architecture.md v4.2
**Status**: Technical reference

@@ -1,4 +1,3 @@
<?xml version="1.0" encoding="UTF-8"?>
<mxfile host="Electron" agent="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) draw.io/29.0.3 Chrome/140.0.7339.249 Electron/38.7.0 Safari/537.36" version="29.0.3">
<diagram name="Page-1" id="S4VRy6nj8Uh85EHbhTP-">
<mxGraphModel dx="2066" dy="2314" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="850" pageHeight="1100" math="0" shadow="0">