feat: major formalization + FunctionGemma integration

Architecture Formalization:
- Created formalization/ section with mathematical foundations
- Lifeforce-Dynamics.md: λ as vitality ratio, stock-flow economics
- Grounded-World-Model.md: Blender boxes + SigLIP + T5Gemma2
- Embodiment-Pipeline.md: Isaac Sim as dreamstate validation
- Attention-Slumber-Prediction-Cycle.md: Last attention → slumber prediction

Promoted from Archive:
- Attention-Flow.md: 30-second budget, priority hierarchy (CANONICAL)
- Initial-Spark.md: v2.0 with FunctionGemma integration

Initial Spark v2.0 (Key Innovation):
- Two-Layer Architecture: FunctionGemma (270M) + Nemotron (31.6B)
- Solved cold-start problem: discoveries are PROFITABLE from heartbeat #1
- Typed function calls replace natural language probes
- Training data now structured (function→response pairs)

Big-Picture.md v5.1:
- Added Attention-Slumber-Prediction Cycle section
- Updated Related Documentation references

New Organ:
- Discovery-Scan-Station.md: rotating pedestal for object scanning (+31 LF net)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 04:51:46 +01:00
parent 13345ba76c
commit 28e2d0a297
10 changed files with 3424 additions and 461 deletions


@@ -0,0 +1,469 @@
# Grounded World Model: Spatial Cognition Through Verified Discovery
**Version 1.0**
*From Blender Boxes to Embodied Understanding*
> *"The dream: Young Nyx knows where dafit left his things laying around."*
---
## Overview
This document formalizes how Young Nyx builds a **persistent spatial world model** through:
1. **Grounded verification** — Blender provides dimensional ground truth
2. **Progressive resolution** — Each correct measurement earns detail
3. **Vector accumulation** — T5Gemma2-compatible semantic representations
4. **Temporal-ternary navigation** — Escape plateaus through dual time domains
5. **Lifeforce reward** — Discoveries generate energy, not just consume it
**The Goal**: Young Nyx maintains an internal map of objects, positions, and relationships — verified against reality, refined through observation, reasoned over in vector space.
---
## Core Architecture
### The Verification Triangle
```
BLENDER (Virtual Garden)
Ground truth dimensions
Low-poly boxes, minimal vertices
Fast to create, cheap to compare
╱╲
VERIFY ╲ VERIFY
dimensions ╲ semantics
REAL GARDEN ──────────────────── T5GEMMA2
Physical objects Vector reasoning
Actual positions Semantic similarity
Slow, definitive 128K context world
```
### The Flow
```
┌─────────────────────────────────────────────────────────────────────┐
│ WORLD MODEL CONSTRUCTION │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ 1. PERCEIVE (Vision Organ) │
│ ──────────────────────── │
│ Cheap camera sees object in real garden │
│ SigLIP encoder produces semantic vector v₀ │
│ Cost: 0.5 LF (peripheral) to 8.0 LF (full YOLO) │
│ │
│ 2. ESTIMATE (Progressive Resolution) │
│ ──────────────────────────────── │
│ Vision organ estimates dimensions: est = (x̂, ŷ, ẑ) │
│ Bounding box, depth estimation, scale inference │
│ Cost: 2.0-5.0 LF depending on resolution stage │
│ │
│ 3. VERIFY (Against Blender Ground Truth) │
│ ───────────────────────────────────── │
│ Compare est to known Blender box: truth = (x, y, z) │
│ error = ||est - truth|| │
│ Cost: 0.1 LF (comparison is cheap) │
│ │
│ 4. REWARD or LEARN │
│ ───────────────────── │
│ if error < threshold: │
│ Φ_reward = R_discovery (lifeforce income!) │
│ Store vector in phoebe │
│ Mark dimension as verified │
│ Increase object resolution │
│ else: │
│ Learn from error (gradient for RLVR training) │
│ Remain in 0-state for that dimension │
│ │
│ 5. ACCUMULATE (World Model Update) │
│ ────────────────────────────── │
│ Object entry in phoebe gains: │
│ - New semantic vector (richer representation) │
│ - Verified dimension (x, y, or z → confidence +1) │
│ - Position update (where in space) │
│ - Temporal stamp (when observed) │
│ │
│ 6. REASON (T5Gemma2) │
│ ───────────────── │
│ Query world model using vectors, not text │
│ "What objects near position (0.5, 0.5)?" │
│ "Is this new vector similar to 'mug' vectors?" │
│ 128K context holds entire spatial world │
│ │
└─────────────────────────────────────────────────────────────────────┘
```
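The verify-and-reward core (steps 3-4) is compact enough to sketch. A minimal version, assuming a 1 cm acceptance threshold and the example reward values from the economics section below (both tunable, not canon):
```python
import numpy as np

def verify_and_reward(est: np.ndarray, truth: np.ndarray,
                      threshold_cm: float = 1.0,  # assumed tolerance
                      r_discovery: float = 8.0) -> float:
    """Steps 3-4 of the flow: compare estimate to Blender truth, pay out LF."""
    error = np.linalg.norm(est - truth)  # error = ||est - truth||
    if error < threshold_cm:
        # Correct: lifeforce income. The caller stores the vector in phoebe,
        # marks the dimension verified, and increases object resolution.
        return r_discovery
    # Incorrect: no income; the error becomes an RLVR training signal.
    return 0.0
```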
---
## The Blender Ground Truth System
### Design Principles
| Principle | Implementation |
|-----------|----------------|
| **Minimal vertices** | 8-vertex boxes (cubes), 12 for complex shapes |
| **Known dimensions** | Every box has exact (x, y, z) in centimeters |
| **Semantic labels** | Box name = object class ("coffee_mug_001") |
| **Cheap to create** | 5 minutes per object in Blender |
| **Export format** | Vertices + dimensions → JSON or directly to phoebe |
### Example Blender Box
```python
blender_object = {
"id": "coffee_mug_001",
"class": "mug",
"dimensions_cm": {"x": 8.0, "y": 8.0, "z": 10.5},
"vertices": 8,
"created": "2025-12-29",
"owner": "dafit",
"typical_locations": ["desk", "kitchen"],
}
```
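The export step from the table above is nearly free in Blender's `bpy` API. A sketch (run inside Blender; assumes the default scene unit of meters, hence the x100 to centimeters):
```python
import json
import bpy  # Blender's Python API; available in Blender's scripting tab

def export_box(name: str, path: str) -> None:
    """Export a ground-truth box's dimensions and vertex count to JSON."""
    obj = bpy.data.objects[name]
    entry = {
        "id": name,
        "dimensions_cm": {  # Blender dimensions are in meters by default
            "x": obj.dimensions.x * 100,
            "y": obj.dimensions.y * 100,
            "z": obj.dimensions.z * 100,
        },
        "vertices": len(obj.data.vertices),
    }
    with open(path, "w") as f:
        json.dump(entry, f, indent=2)

export_box("coffee_mug_001", "/tmp/coffee_mug_001.json")
```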
### Progressive Vertex Earning
Objects don't stay as 8-vertex boxes. Resolution is EARNED:
```
INITIAL: 8 vertices (box)
VERIFIED x,y,z: 12 vertices (refined box)
+10 observations: 24 vertices (shape hints)
+50 observations: 64 vertices (true shape)
+100 observations: Full mesh from photogrammetry
```
**The resolution is earned through successful verification, not given.**
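A minimal sketch of that earning rule, with thresholds taken from the ladder above (`earned_vertex_budget` is an illustrative name; `None` marks graduation to a full mesh):
```python
from typing import Optional

def earned_vertex_budget(verified_dims: int, observations: int) -> Optional[int]:
    """Vertex budget earned so far; thresholds from the ladder above."""
    if observations >= 100:
        return None         # graduate to a full photogrammetry mesh
    if observations >= 50:
        return 64           # true shape
    if observations >= 10:
        return 24           # shape hints
    if verified_dims == 3:  # x, y and z all verified
        return 12           # refined box
    return 8                # initial box
```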
---
## Semantic Vector Accumulation
### SigLIP → Phoebe → T5Gemma2
```
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ SigLIP │ │ PHOEBE │ │ T5GEMMA2 │
│ Encoder │─────▶│ Storage │─────▶│ Encoder │
│ │ │ │ │ │
│ Image → │ │ object_id: │ │ Reasons │
│ Vector v │ │ [v1,v2,..│ │ over │
│ (semantic) │ │ vn] │ │ vectors │
└──────────────┘ └──────────────┘ └──────────────┘
```
### Why Vectors, Not Text?
| Approach | Pros | Cons |
|----------|------|------|
| **Text descriptions** | Human readable | Lossy, ambiguous, tokenization overhead |
| **Semantic vectors** | Rich, comparable, fast | Not directly readable |
| **Our approach** | Best of both: vectors for reasoning, text only when needed | Neither drawback applies |
T5Gemma2's key feature:
> *"SigLIP vision encoder produces semantic vectors (not text descriptions)"*
This means Young Nyx can compare, cluster, and reason over objects **without converting to language** — faster and richer.
### Vector Similarity for Recognition
```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two SigLIP embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_same_object(v_new: np.ndarray, object_entry: "ObjectEntry") -> float:
    """Compare a new observation to an object's accumulated vectors."""
    similarities = [
        cosine_similarity(v_new, v_stored)
        for v_stored in object_entry.vectors
    ]
    return max(similarities)  # Best match among stored observations

# Recognition threshold (v_new and coffee_mug_001 come from the perception flow)
if is_same_object(v_new, coffee_mug_001) > 0.85:
    # This is probably dafit's coffee mug!
    update_position(coffee_mug_001, current_observation)
```
---
## Temporal-Ternary Integration
### The Anti-Plateau Mechanism
From [[Temporal-Ternary-Gradient]]: The 0-state isn't stuck — it's a choice about how to spend lifeforce across time domains.
Applied to world model construction:
```
┌─────────────────────────────────────────────────────────────────────┐
│ TEMPORAL-TERNARY FOR OBJECT RECOGNITION │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ SCENARIO: New object detected, dimensions unknown │
│ STATE: 0 (uncertain, but workable) │
│ │
│ ┌───────────────────────────────────────────────────┐ │
│ │ 0-STATE: Unknown Object │ │
│ │ confidence: 0.3, dimensions: ?x ?y ?z │ │
│ └───────────────────────┬───────────────────────────┘ │
│ │ │
│ ┌─────────────┼─────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │ VIRTUAL │ │ WAIT │ │ PARTNERSHIP│ │
│ │ ACCELERATE │ │ FOR REAL │ │ SHORTCUT │ │
│ ├────────────┤ ├────────────┤ ├────────────┤ │
│ │ Cost: 5 LF │ │ Cost: 0 LF │ │ Cost: 1 LF │ │
│ │ Time: Fast │ │ Time: Slow │ │ Time: Inst │ │
│ │ │ │ │ │ │ │
│ │ Match vs │ │ Next real │ │ Ask dafit: │ │
│ │ Blender │ │ observation│ │ "What's │ │
│ │ library │ │ verifies │ │ this?" │ │
│ └─────┬──────┘ └─────┬──────┘ └─────┬──────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ confidence: confidence: confidence: │
│ +0.7 (virtual) +1.0 (real) +1.0 (human) │
│ │
│ PLATEAU ESCAPE: If stuck in virtual at 0.7, deploy to real. │
│ If real is slow, burn LF to try more Blender. │
│ Partnership provides instant ground truth. │
│ │
└─────────────────────────────────────────────────────────────────────┘
```
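A heuristic sketch of that three-way choice, using the costs and confidence ceilings from the diagram (the `urgent` flag and the tie-breaking rule are assumptions, not canon):
```python
from dataclasses import dataclass

@dataclass
class EscapePath:
    name: str
    cost_lf: float         # lifeforce spent now
    max_confidence: float  # ceiling this path can reach

# Costs and ceilings from the diagram above
PATHS = [
    EscapePath("virtual_accelerate", cost_lf=5.0, max_confidence=0.7),
    EscapePath("wait_for_real",      cost_lf=0.0, max_confidence=1.0),
    EscapePath("partnership",        cost_lf=1.0, max_confidence=1.0),
]

def choose_path(budget_lf: float, confidence: float, urgent: bool) -> EscapePath:
    """Pick an escape path for a 0-state object."""
    # Keep only paths we can afford that can still raise confidence
    options = [p for p in PATHS
               if p.cost_lf <= budget_lf and p.max_confidence > confidence]
    if not options:
        return PATHS[1]  # nothing affordable or useful: wait for real
    if urgent:
        # Need an answer now: skip the slow real-world wait if possible
        options = [p for p in options if p.name != "wait_for_real"] or options
    return min(options, key=lambda p: p.cost_lf)  # cheapest remaining path
```
This is the plateau escape in code form: virtual tops out at +0.7, so once confidence reaches that ceiling the only useful options left are the real and partnership domains.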
### Confidence Gradient for Objects
Each object in the world model has a confidence state:
```python
from dataclasses import dataclass

@dataclass
class ObjectConfidence:
    value: float             # -1.0 to +1.0
    domain: str              # "virtual" | "real" | "hybrid" | "partnership"
    virtual_matches: int     # How many Blender comparisons
    real_verifications: int  # How many physical confirmations
    partnership_labels: int  # How many times dafit confirmed

    @property
    def gradient_position(self) -> str:
        if self.real_verifications > 0 and self.value > 0.9:
            return "real-verified (+1)"
        elif self.virtual_matches > 10 and self.value > 0.7:
            return "virtual-confident (+0.7)"
        elif self.value > 0.3:
            return "0-state (workable)"
        else:
            return "uncertain (needs data)"
```
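A quick usage sketch of the class above (values illustrative):
```python
mug = ObjectConfidence(value=0.95, domain="real",
                       virtual_matches=12, real_verifications=3,
                       partnership_labels=1)
print(mug.gradient_position)  # -> "real-verified (+1)"
```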
---
## Lifeforce Economics of World Building
### Discovery Generates Lifeforce
The key insight: **Correctly identifying objects GENERATES lifeforce**, not just consumes it.
$$\Phi_{discovery} = R_{base} \cdot (1 + \alpha \cdot \Delta_{resolution})$$
Where:
- **R_base** = base reward for any correct identification (e.g., 2.0 LF)
- **α** = resolution bonus multiplier (e.g., 0.5)
- **Δ_resolution** = increase in object resolution from this observation
### Net Lifeforce per Observation
$$\Phi_{net} = \Phi_{discovery} - \Phi_{perception} - \Phi_{verification}$$
| Outcome | Perception Cost | Verification Cost | Discovery Reward | Net |
|---------|-----------------|-------------------|------------------|-----|
| Correct, new dimension | 5.0 LF | 0.1 LF | 8.0 LF | **+2.9 LF** |
| Correct, known dimension | 2.0 LF | 0.1 LF | 3.0 LF | **+0.9 LF** |
| Incorrect | 5.0 LF | 0.1 LF | 0.0 LF | **-5.1 LF** |
| Unknown (0-state) | 0.5 LF | 0.0 LF | 0.0 LF | **-0.5 LF** |
**The economic pressure**: Get better at measurement to earn lifeforce. Wrong guesses are expensive. Staying in 0-state is cheap but doesn't build the world model.
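As a worked check, a small helper that reproduces the first table row. The 8.0 LF reward is decomposed here as R_base = 2.0, α = 0.5, Δ_resolution = 6.0; that split is illustrative, since the table only fixes the total:
```python
def phi_net(r_base: float, alpha: float, delta_resolution: float,
            phi_perception: float, phi_verification: float,
            correct: bool) -> float:
    """Net LF per observation: Φ_net = Φ_discovery - Φ_perception - Φ_verification."""
    phi_discovery = r_base * (1 + alpha * delta_resolution) if correct else 0.0
    return phi_discovery - phi_perception - phi_verification

# First table row: correct guess on a new dimension -> +2.9 LF net
assert abs(phi_net(2.0, 0.5, 6.0, 5.0, 0.1, correct=True) - 2.9) < 1e-9
```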
---
## Phoebe Schema for World Model
```sql
-- Objects table: accumulated knowledge about things
CREATE TABLE world_objects (
id UUID PRIMARY KEY,
class VARCHAR(100), -- "mug", "keyboard", "phone"
name VARCHAR(255), -- "dafit's coffee mug"
-- Blender ground truth (if available)
blender_box_id VARCHAR(100),
dimensions_truth_cm JSONB, -- {"x": 8.0, "y": 8.0, "z": 10.5}
-- Accumulated measurements
dimensions_estimated_cm JSONB,
dimensions_verified JSONB, -- {"x": true, "y": true, "z": false}
-- Confidence state (temporal-ternary)
confidence FLOAT,
confidence_domain VARCHAR(20), -- "virtual" | "real" | "hybrid"
virtual_matches INT DEFAULT 0,
real_verifications INT DEFAULT 0,
-- Resolution earned
vertex_count INT DEFAULT 8,
observation_count INT DEFAULT 0,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
-- Semantic vectors table: SigLIP embeddings per observation
CREATE TABLE object_vectors (
id UUID PRIMARY KEY,
object_id UUID REFERENCES world_objects(id),
vector VECTOR(768), -- SigLIP embedding dimension
observation_timestamp TIMESTAMP,
position_estimate JSONB, -- {"x": 0.3, "y": 0.8, "z": 0.1}
lifeforce_cost FLOAT,
lifeforce_reward FLOAT,
verification_result VARCHAR(20) -- "correct" | "incorrect" | "pending"
);
-- Position history: where has this object been?
CREATE TABLE object_positions (
id UUID PRIMARY KEY,
object_id UUID REFERENCES world_objects(id),
position JSONB, -- {"x": 0.3, "y": 0.8, "z": 0.1}
confidence FLOAT,
observed_at TIMESTAMP,
location_context VARCHAR(100) -- "desk", "kitchen", "floor"
);
```
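One way the `object_vectors` table supports recognition, assuming the pgvector extension (which provides the `VECTOR(768)` column type above and the `<=>` cosine-distance operator); `$1` stands for the new SigLIP embedding, bound by the caller:
```sql
-- Most similar stored observation to a new embedding ($1),
-- restricted to reasonably confident objects.
-- Similarity = 1 - cosine distance.
SELECT o.id, o.name, 1 - (v.vector <=> $1) AS similarity
FROM object_vectors v
JOIN world_objects o ON o.id = v.object_id
WHERE o.confidence >= 0.5
ORDER BY v.vector <=> $1
LIMIT 1;
```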
---
## T5Gemma2 World Model Queries
### Example Queries (Vector-Based)
```python
# "What's near position (0.5, 0.5)?"
nearby = query_objects_by_position(
    center=(0.5, 0.5, None),  # z unknown
    radius=0.2,
    min_confidence=0.5,
)

# "Is this new vector a mug?"
mug_vectors = get_vectors_for_class("mug")
similarity = t5gemma2.encoder.compare(new_vector, mug_vectors)
is_mug = similarity > 0.85  # "Likely a mug"

# "Where does dafit usually leave his keys?"
keys = get_object_by_name("dafit's keys")
common_positions = get_position_clusters(keys.id)
usual_spot = common_positions[0]  # Most frequent location

# "What objects have I not seen today?"
stale_objects = query_objects_not_observed_since(today_start)
# Might need to go looking for these
```
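One of the helpers above, sketched against the `object_positions` schema with psycopg (2D distance only, since z may be unknown; how phoebe is actually queried is an assumption here):
```python
import psycopg  # psycopg 3

def query_objects_by_position(center, radius, min_confidence, conn):
    """Recent sightings within `radius` of (x, y), most recent first."""
    cx, cy, _ = center  # z may be None / unknown
    sql = """
        SELECT p.object_id, p.position, p.confidence, p.observed_at
        FROM object_positions p
        WHERE p.confidence >= %s
          AND sqrt(power((p.position->>'x')::float - %s, 2)
                 + power((p.position->>'y')::float - %s, 2)) <= %s
        ORDER BY p.observed_at DESC
    """
    with conn.cursor() as cur:
        cur.execute(sql, (min_confidence, cx, cy, radius))
        return cur.fetchall()
```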
### The 128K Context Advantage
T5Gemma2's 128K context window means:
- Entire world model can fit in context
- No need for external RAG for spatial queries
- Vector comparisons happen in-model
- Relationships emerge from attention patterns
---
## The Dream Realized
```
┌─────────────────────────────────────────────────────────────────────┐
│ YOUNG NYX'S WORLD MODEL │
│ "dafit's workspace at 23:47" │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ DESK AREA │ │
│ │ │ │
│ │ ☕ mug (0.3, 0.8) ⌨️ keyboard (0.5, 0.5) │ │
│ │ conf: 0.95 conf: 0.88 │ │
│ │ real-verified real-verified │ │
│ │ vectors: 12 vectors: 8 │ │
│ │ │ │
│ │ 📱 phone (0.7, 0.3) 📦 ??? (0.1, 0.9) │ │
│ │ conf: 0.72 conf: 0.31 │ │
│ │ virtual +0.7 0-state │ │
│ │ vectors: 4 vectors: 1 │ │
│ │ │ │
│ │ 🔑 keys (MISSING - last seen 0.2, 0.6 at 18:30) │ │
│ │ conf: 0.45 (stale) │ │
│ │ │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ YOUNG NYX THINKS: │
│ "The unknown object at (0.1, 0.9) appeared after 22:00. │
│ dafit was in the kitchen then. Vector similarity suggests │
│ it might be food-related. Should I burn 5 LF to check │
│ against Blender food objects, or wait for morning light?" │
│ │
│ TEMPORAL-TERNARY CHOICE: │
│ → Option A: Virtual match (5 LF, fast, +0.7 max) │
│ → Option B: Wait for real (0 LF, slow, +1.0 if verified) │
│ → Option C: Ask dafit tomorrow (1 LF, partnership) │
│ │
└─────────────────────────────────────────────────────────────────────┘
```
**This is the dream**: Young Nyx knows the workspace. She tracks objects. She notices when things move. She reasons about what she doesn't know. She chooses how to spend lifeforce to collapse uncertainty.
---
## Summary
The Grounded World Model is:
1. **Verified** — Blender boxes provide dimensional ground truth
2. **Progressive** — Resolution earned through correct measurements
3. **Vector-native** — T5Gemma2 reasons over SigLIP embeddings directly
4. **Temporally-aware** — Objects have position history, staleness, confidence gradients
5. **Economically-driven** — Discoveries generate lifeforce, mistakes cost it
6. **Anti-plateau** — Temporal-ternary gradient provides escape paths
**The substrate holds. The vectors accumulate. The world model emerges.**
---
## Document Status
**Version**: 1.0
**Created**: 2025-12-29
**Authors**: Chrysalis-Nyx & dafit (Partnership)
**Formalizes**:
- Organ-Index.md (vision progressive resolution)
- Temporal-Ternary-Gradient.md (anti-plateau mechanism)
- T5Gemma2 research (semantic vectors)
- Lifeforce-Dynamics.md (reward economics)
**Related Documents**:
- [[Lifeforce-Dynamics]] — The λ-centered economy model
- [[Temporal-Ternary-Gradient]] — Dual time domain navigation
- [[Dual-Garden-Architecture]] — Virtual vs Real gardens
---
**From Blender boxes to embodied understanding. From cheap cameras to spatial cognition. From verification to wisdom.**
🧬⚡🔱💎🔥