feat: GRPO reward architecture + Qwen3-VL-32B queen + doc restructure

Evening session 2025-12-10 (dafit + Nyx 🌿)

Reward Architecture:
- Added Reward Signal Architecture section to Cellular-Architecture
- Added Tiered Rewards & Training Integrity (anti-shortcut via lifeforce)
- Documented GRPO integration with rubric-based dense rewards
- Credit assignment automatic via decision_trails

Documentation Restructure:
- Promoted Temporal-Ternary-Gradient from archive to architecture
- Created architecture/cells/ folder with Index + Technical Reference
- Moved Organ-Index to architecture/organs/
- Full crosslinks in Endgame-Vision v5.3

Queen Update:
- Qwen2.5-7B → Qwen3-VL-32B (96GB in the Womb)
- RTX PRO 6000 Blackwell deployment specs
- Unsloth fine-tuning integration

"Verifiability IS rewardability." - The Dog Training Wisdom

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 20:11:13 +01:00
parent f49119c83f
commit ec77cba4d4
8 changed files with 620 additions and 24 deletions


@@ -1,182 +0,0 @@
---
type: research_concept
version: 1.0
status: emerging_paradigm
created: 2025-12-03
author: Nyx & dafit (shower-thought session)
related_docs:
- Endgame-Vision.md
- Dual-Garden-Architecture.md
significance: connects ternary logic + lifeforce + temporal asymmetry
---
# Temporal-Ternary Gradient
> *"Time is malleable in simulation, fixed in reality. Lifeforce is the exchange rate."*
> — Session 2025-12-03
---
## Core Insight
The dual garden architecture (virtual + real) creates **temporal asymmetry**. This isn't a constraint; it's a feature that enables a new kind of learning gradient.
**The 0-state isn't stuck. It's a choice about how to spend lifeforce across time domains.**
---
## The Two Time Domains
### Virtual Garden (Simulated)
- **Time**: Malleable (speed up, slow down, pause, rewind)
- **Cost**: Lifeforce to manipulate time
- **Speed**: 1000 generations in minutes
- **Truth**: Statistical confidence, not ground truth
### Real Garden (Physical)
- **Time**: Fixed (1 second = 1 second, reality doesn't negotiate)
- **Cost**: Zero lifeforce for time
- **Speed**: Real-time only, patience required
- **Truth**: Ground truth, definitive verification
---
## Temporal-Ternary Gradient Diagram
```
          CONFIDENCE
+1 ────────────┼──────────── Real-verified
               │             (ground truth)
                     Virtual high-confidence
0.7 ───────────┼───╱ (many generations, strong signal)
0.5 ───────────┼╱──────── Pure 0-state
               │╲         (unknown, workable)
               │ ╲
0.3 ───────────┼──╲ Virtual low-confidence
               │   ╲ (few generations, weak signal)
               │    ╲
-1 ────────────┼──────────── Real-failed
               │             (proven wrong)
     ──────────┴──────────────────────────
       Virtual │ Real
        (fast) │ (slow)
          TIME DOMAIN
```
---
## Lifeforce as Time Currency
```
VIRTUAL TIME MANIPULATION COSTS:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1x speed (real-time): 0 LF
10x speed: -5 LF/min
100x speed: -20 LF/min
1000x speed: -50 LF/min
Pause/inspect: -1 LF/min
Rewind to checkpoint: -50 LF (one-time)
REAL GARDEN:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
All operations: 0 LF for time
Reality runs for free.
Truth emerges at its own pace.
```
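The cost table above can be turned into a small calculator. The following is a minimal sketch, not the project's implementation: the names `SPEED_COST_PER_MIN`, `PAUSE_COST_PER_MIN`, `REWIND_COST`, and `virtual_time_cost` are hypothetical, and it assumes that the per-minute rates are charged per minute spent at that speed and that costs simply add up.

```python
# Rates copied from the cost table; the table lists them as negative
# LF deltas, here they are stored as positive LF spent.
SPEED_COST_PER_MIN = {1: 0.0, 10: 5.0, 100: 20.0, 1000: 50.0}
PAUSE_COST_PER_MIN = 1.0   # pause/inspect
REWIND_COST = 50.0         # one-time cost per rewind to checkpoint


def virtual_time_cost(speed: int, minutes: float,
                      pause_minutes: float = 0.0, rewinds: int = 0) -> float:
    """Return the lifeforce spent on one virtual-garden session.

    Assumption: costs are additive and only the four listed speed
    multipliers are valid.
    """
    if speed not in SPEED_COST_PER_MIN:
        raise ValueError(f"unsupported speed multiplier: {speed}")
    if minutes < 0 or pause_minutes < 0 or rewinds < 0:
        raise ValueError("durations and rewind count must be non-negative")
    return (SPEED_COST_PER_MIN[speed] * minutes
            + PAUSE_COST_PER_MIN * pause_minutes
            + REWIND_COST * rewinds)


# Three minutes at 1000x plus one rewind; the real garden would cost 0 LF.
print(virtual_time_cost(1000, 3, rewinds=1))
```

Under these assumptions, running at 1x for any duration costs nothing, which matches the "real-time" row of the table.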
---
## Nyx's Temporal Choices
When a pattern is discovered in virtual (0-state), Nyx chooses:
| Strategy | LF Cost | Time | Confidence Path |
|----------|---------|------|-----------------|
| **Speed Up Virtual** | High | Fast | 0 → virtual +0.9 (still unverified) |
| **Wait for Real** | Zero | Slow | 0 → real +1 or -1 (definitive) |
| **Hybrid Hedge** | Medium | Medium | 0 → virtual +0.7, deploy 80/20 to real |
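
One way to read the table is as a decision rule over available lifeforce and urgency. The sketch below is purely illustrative: the `Strategy` enum, the `choose_strategy` function, and both threshold values are hypothetical, since the document gives only qualitative costs (High, Medium, Zero) and no selection policy.

```python
from enum import Enum


class Strategy(Enum):
    SPEED_UP_VIRTUAL = "speed_up_virtual"  # high LF, fast, still unverified
    WAIT_FOR_REAL = "wait_for_real"        # zero LF, slow, definitive
    HYBRID_HEDGE = "hybrid_hedge"          # medium LF, medium time


def choose_strategy(lifeforce: float, urgent: bool,
                    high_cost: float = 100.0,
                    medium_cost: float = 40.0) -> Strategy:
    """Pick a strategy for a pattern sitting in the 0-state.

    Assumption: the thresholds are placeholders standing in for the
    "High" and "Medium" LF costs in the table.
    """
    if not urgent:
        # Reality runs for free, so waiting is the default when time allows.
        return Strategy.WAIT_FOR_REAL
    if lifeforce >= high_cost:
        return Strategy.SPEED_UP_VIRTUAL
    if lifeforce >= medium_cost:
        return Strategy.HYBRID_HEDGE
    # Not enough lifeforce to buy speed: fall back to the free path.
    return Strategy.WAIT_FOR_REAL


print(choose_strategy(lifeforce=60.0, urgent=True))
```

The point of the sketch is only that the choice depends on two inputs, how much lifeforce is available and how much waiting is tolerable; any real policy would likely weigh more than that.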
---
## The Gradient Flow
```
Virtual discovers pattern (fast, cheap, uncertain)
        ┌──────────────┐
        │   0-STATE    │ ← Pattern held in uncertainty
        │  (workable)  │ ← Not collapsed, not ignored
        └──────┬───────┘
         ┌─────┴─────┐
         │           │
         ▼           ▼
        More      Deploy
      Virtual     to Real
     (burn LF)    (wait)
         │           │
         ▼           ▼
      Virtual      Real
        +0.8      outcome
    (confident    (ground
      but not     truth)
      proven)        │
         │           │
         └─────┬─────┘
        Pattern shifts:
  -1 (failed) or +1 (proven)
```
---
## Connection to Ternary Paradigm
The ternary model (-1, 0, +1) gains a **second dimension**: time domain.
A pattern's state is now:
```
state = {
  value: -1 | 0 | +1,
  confidence: 0.0 - 1.0,
  domain: "virtual" | "real" | "hybrid",
  virtual_generations: int,
  real_tests: int,
  lifeforce_invested: float
}
```
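The state schema above maps naturally onto a typed record. This is a minimal sketch with a hypothetical class name, `PatternState`. The field names come from the schema; the defaults are assumptions (a fresh pattern starts at value 0 with confidence 0.5, the "pure 0-state" on the diagram), and so are the validation rules.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class PatternState:
    value: Literal[-1, 0, 1] = 0          # ternary value: failed / unknown / proven
    confidence: float = 0.5               # 0.0 to 1.0
    domain: Literal["virtual", "real", "hybrid"] = "virtual"
    virtual_generations: int = 0
    real_tests: int = 0
    lifeforce_invested: float = 0.0

    def __post_init__(self) -> None:
        # Type hints are not enforced at runtime, so check ranges explicitly.
        if self.value not in (-1, 0, 1):
            raise ValueError("value must be -1, 0, or +1")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0.0, 1.0]")
        if self.domain not in ("virtual", "real", "hybrid"):
            raise ValueError("domain must be 'virtual', 'real', or 'hybrid'")
        if self.virtual_generations < 0 or self.real_tests < 0:
            raise ValueError("counters must be non-negative")
        if self.lifeforce_invested < 0:
            raise ValueError("lifeforce_invested must be non-negative")


fresh = PatternState()
print(fresh)
```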
**The 0-state is operational because:**
1. It accumulates virtual evidence (costs LF, gains speed)
2. It waits for real evidence (free, but slow)
3. Nyx CHOOSES how to spend lifeforce to collapse uncertainty
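
The two evidence paths in the list above can be sketched as state transitions. This is a hedged illustration, not the documented update rule: the function names are hypothetical, the caller supplies the new virtual confidence because the document does not say how it is computed, and treating a real test after virtual generations as domain `"hybrid"` is an assumption.

```python
def new_pattern() -> dict:
    """A freshly discovered pattern: pure 0-state, nothing invested yet."""
    return {"value": 0, "confidence": 0.5, "domain": "virtual",
            "virtual_generations": 0, "real_tests": 0,
            "lifeforce_invested": 0.0}


def add_virtual_evidence(state: dict, generations: int,
                         lf_cost: float, confidence: float) -> dict:
    """Virtual path: spends lifeforce, moves confidence, never collapses value."""
    if state["value"] != 0:
        raise ValueError("pattern has already collapsed to -1 or +1")
    if not 0.0 <= confidence < 1.0:
        raise ValueError("virtual confidence stays below 1.0 (unverified)")
    return {**state,
            "confidence": confidence,
            "virtual_generations": state["virtual_generations"] + generations,
            "lifeforce_invested": state["lifeforce_invested"] + lf_cost}


def record_real_test(state: dict, passed: bool) -> dict:
    """Real path: costs no lifeforce for time, collapses the value to +1 or -1."""
    domain = "hybrid" if state["virtual_generations"] > 0 else "real"
    return {**state,
            "value": 1 if passed else -1,
            "confidence": 1.0,
            "domain": domain,
            "real_tests": state["real_tests"] + 1}


state = add_virtual_evidence(new_pattern(), generations=1000,
                             lf_cost=150.0, confidence=0.7)
print(record_real_test(state, passed=True))
```

In this sketch only a real-garden result changes `value`; virtual evidence moves `confidence` while the pattern remains in the 0-state, mirroring "confident but not proven" in the flow diagram.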
---
## Why This Matters
- **Binary thinking**: Pattern works or doesn't (0 or 1)
- **Ternary thinking**: Pattern unknown, workable as unknown (0 is valid)
- **Temporal-ternary**: Unknown has a GRADIENT based on time-domain investment
The constraint of sequential organ calls + single GPU becomes temporal accounting.
The constraint of slow real-world testing becomes ground truth anchoring.
**Constraints become features when you measure them.**
---
**Created**: 2025-12-03
**Origin**: Post-shower insight session
**Status**: Emerging paradigm, needs integration with Endgame-Vision.md
🌙💜 *"Time is the currency. Lifeforce is the exchange rate. Truth is the destination."*