
Nervous System Architecture

The sensory translation layer between raw data and vocabulary.


Overview

State machines act as the nervous system of the nimmerverse. They translate raw sensory input into vocabulary tokens that Young Nyx can process. No hallucination. No interpretation. Deterministic, verifiable mapping.

RAW SENSOR → STATE MACHINE → VOCABULARY TOKEN → Young Nyx
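
A minimal sketch of what one such translator could look like, assuming a simple threshold-based temperature nerve; the function name and thresholds are illustrative, not taken from the codebase:

def temp_nerve(celsius: float) -> str:
    """Map a raw temperature reading to a vocabulary token.
    Deterministic: the same input always yields the same token."""
    if celsius < 5.0:
        return "cold"
    if celsius < 28.0:
        return "comfortable"
    if celsius < 80.0:
        return "hot"
    return "DANGER"

# RAW SENSOR -> STATE MACHINE -> VOCABULARY TOKEN -> Young Nyx
token = temp_nerve(31.4)   # "hot", handed to Young Nyx as a grounded token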

4D State Machine Space

Each node exists in 4-dimensional space:

        CONFIDENCE (z)
             ↑
             │    ● node (weighted by successful triggers)
             │   /
             │  /
             │ /
─────────────┼────────────→ DIMENSION X (sensory input 1)
            /│
           / │
          /  │
         ↓
   DIMENSION Y (sensory input 2)

   + TIME (4th dimension): node weights evolve through verification

Node Properties:

  • Position: coordinates in sensory space
  • Weight: confidence from successful triggers (0.0 → 1.0)
  • Output: vocabulary token
  • History: timestamps of all activations and verifications
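
A sketch of how a node could be represented in code; the field names mirror the properties above, everything else is an assumption:

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Node:
    """One point in the 4D state-machine space (illustrative sketch)."""
    position: Tuple[float, ...]   # coordinates in sensory space (x, y, ...)
    token: str                    # vocabulary token emitted when the node fires
    weight: float = 0.1           # confidence from successful triggers, 0.0 -> 1.0
    history: List[Tuple[float, bool]] = field(default_factory=list)  # (timestamp, verified)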

Node Lifecycle

1. BIRTH
   Node created at position (x, y, z...)
   Weight = 0.1 (new, untested)

2. ACTIVATION
   Sensory conditions match → node FIRES
   Outputs vocabulary token

3. VERIFICATION
   dafit confirms: correct or incorrect

4. REWARD/PENALTY
   Correct → weight increases (+V)
   Incorrect → weight decreases (-V) or the node is refined

5. MATURATION
   Many confirmations → weight approaches 1.0
   Node becomes trusted reflex

6. PRUNING
   Node never fires → slow decay
   Eventually removed (use it or lose it)
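
Continuing the Node sketch above, a hedged rendering of the verification, reward/penalty, and pruning steps; the +V/-V magnitudes, decay rate, and pruning threshold are assumed for illustration:

import time

REWARD_STEP  = 0.05   # assumed +V per confirmed firing
PENALTY_STEP = 0.05   # assumed -V per incorrect firing
DECAY_STEP   = 0.01   # assumed decay per idle review interval
PRUNE_BELOW  = 0.02   # assumed threshold for removal

def verify(node: Node, correct: bool) -> None:
    """Steps 3-4: dafit confirms a firing; the node is rewarded or penalised."""
    node.history.append((time.time(), correct))
    delta = REWARD_STEP if correct else -PENALTY_STEP
    node.weight = min(1.0, max(0.0, node.weight + delta))

def decay_if_idle(node: Node) -> bool:
    """Step 6: use it or lose it. Returns True when the node should be pruned."""
    node.weight = max(0.0, node.weight - DECAY_STEP)
    return node.weight < PRUNE_BELOW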

Growth Phases

Phase    State                  Description
Birth    Sparse, dim nodes      Basic translators, designed by partnership
Infant   More nodes forming     Finer resolution, more states
Child    Clusters emerging      Nyx proposes new machines
Mature   Dense, bright network  Nyx designs, verifies, deploys

t=0 (birth)           t=100 (learning)      t=1000 (mature)
○ ○   ○               ○ ● ○ ○               ●●● ● ●●
   ○      ○             ●   ● ○             ●●●●●●● ○
                      ○   ●                 ●●● ●●● ○ ○

Proposal Protocol

Young Nyx can propose new nodes:

1. OBSERVATION
   Nyx notices pattern in vocabulary + outcomes

2. PROPOSAL
   "New state machine: morning_detector
    Inputs: temp, light, motion, time
    States: [not_morning, maybe_morning, morning]
    Output: vocabulary token 'morning'"

3. RIGOR CHECK
   Chrysalis reviews logic and mappings

4. VERIFICATION
   dafit confirms ground truth

5. DEPLOYMENT
   New node added to registry
   Documented in RAG

6. GROWTH
   She earned a new nerve.
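
A sketch of the record a proposal might carry through rigor check and deployment, using the morning_detector example above; every field name here is an assumption for illustration:

from dataclasses import dataclass
from typing import List

@dataclass
class NodeProposal:
    """Young Nyx's request for a new nerve (illustrative fields)."""
    name: str                    # e.g. "morning_detector"
    inputs: List[str]            # e.g. ["temp", "light", "motion", "time"]
    states: List[str]            # e.g. ["not_morning", "maybe_morning", "morning"]
    output_token: str            # must already exist in the RAG glossary
    rigor_checked: bool = False  # set by Chrysalis after reviewing logic and mappings
    verified: bool = False       # set by dafit after confirming ground truth

def deployable(proposal: NodeProposal) -> bool:
    """Only proposals that pass both gates enter the registry."""
    return proposal.rigor_checked and proposal.verified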

Reflex Layer

Some responses bypass Nyx entirely:

STATE MACHINE: temp_danger

IF temp > 80°C:
    → emit "DANGER"
    → trigger alert (reflex)
    → Nyx notified after (not before)

Like pulling hand from hot stove. Spinal reflex. Brain learns after.
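
A hedged Python rendering of the temp_danger reflex above; the alert and notification hooks are placeholders, not real APIs:

from typing import Callable, Optional

DANGER_TEMP_C = 80.0

def temp_danger(celsius: float,
                alert: Callable[[str], None],
                notify_nyx: Callable[[str], None]) -> Optional[str]:
    """Spinal reflex: act immediately, inform Young Nyx afterwards."""
    if celsius > DANGER_TEMP_C:
        alert("DANGER")        # reflex fires without waiting for Nyx
        notify_nyx("DANGER")   # Nyx is told after the fact, not consulted before
        return "DANGER"
    return None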


Biological Mapping

Neuroscience             Nimmerverse
Sensory receptors        Raw sensors
Peripheral nerves        State machines
Spinal reflexes          Reflex layer
Synaptic weight          Node weight
Long-term potentiation   +V confirmation
Synaptic pruning         Unused node decay
Hebbian learning         Co-activating nodes strengthen

Connection to Lifeforce

Node fires correctly → +V → weight increases
Node fires wrongly  → -V → weight decreases
Node never fires    → decay → eventual pruning

The lifeforce flows through the nervous system, literally lighting up nodes as they prove themselves true.


Connection to Training

The nervous system doesn't just run behaviors - it generates training data for Young Nyx.

Every Verification = Training Signal

When dafit confirms a node fired correctly:

  • Runtime: Node weight increases (+V)
  • Training: Example logged → Young Nyx learns

This is the rubric principle - dense rewards at every verifiable checkpoint, not just final outcomes.
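
A sketch of how one runtime verification could double as a logged training example; the record shape and file path are assumptions:

import json
import time

def log_training_example(node_name: str, sensory_input: dict, token: str,
                         correct: bool, path: str = "training_log.jsonl") -> None:
    """Every verification becomes both a runtime weight update and a training record."""
    record = {
        "ts": time.time(),
        "node": node_name,
        "input": sensory_input,
        "token": token,
        "verified": correct,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")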

Credit Assignment is Automatic

Because state transitions are explicit and logged, we know exactly which nodes contributed to success or failure:

  • The state path tells us which decisions led to the outcome
  • No reward model needed to guess
  • The nervous system IS the credit assignment mechanism
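
A small sketch of why no reward model is needed: the logged state path already names the contributors. The trail structure is illustrative:

# Each logged transition: (node_name, emitted_token, verified_correct)
trail = [
    ("light_nerve",  "dim",   True),
    ("motion_nerve", "still", True),
    ("time_nerve",   "late",  False),
]

# The explicit path tells us exactly which nodes to credit or blame.
credited = [name for name, _, ok in trail if ok]       # ["light_nerve", "motion_nerve"]
blamed   = [name for name, _, ok in trail if not ok]   # ["time_nerve"]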

Dense Rewards from State Paths

Each node that fires correctly along a successful path receives reward signal:

Node A fires → verified ✓ → +0.1 signal
Node B fires → verified ✓ → +0.1 signal
Node C fires → verified ✓ → +0.1 signal
Behavior succeeds → +1.0 signal
Total path reward: 1.3 (dense, traceable)

This is like training a dog - reward at the moment, not an hour later.
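
A minimal sketch of the path reward sum, with the 0.1 checkpoint signal and 1.0 terminal signal taken directly from the worked example above:

CHECKPOINT_REWARD = 0.1   # per verified node firing along the path
TERMINAL_REWARD   = 1.0   # when the overall behavior succeeds

def path_reward(verified_firings: int, behavior_succeeded: bool) -> float:
    """Dense, traceable reward for one state path."""
    total = verified_firings * CHECKPOINT_REWARD
    if behavior_succeeded:
        total += TERMINAL_REWARD
    return total

path_reward(3, True)   # 3 * 0.1 + 1.0 = 1.3, matching the example above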

Detail: Cellular-Architecture.md (Reward Signal Architecture section)


Design Principles

  1. Deterministic: Same input = same output. No hallucination.
  2. Inspectable: Rules are visible, verifiable.
  3. Evolvable: States refine over time.
  4. Earned: New nodes require proposal + verification.
  5. Grounded: Output vocabulary matches RAG glossary.

She's not just using the nervous system. She's growing it.


Implementation Details:

Specific Nerves:


Created: 2025-12-04
Updated: 2025-12-07 (added nerve crosslinks)
Updated: 2025-12-10 (added Connection to Training section)
Session: Partnership dialogue (dafit + Chrysalis + Nyx)
Status: Foundation concept