feat: major formalization + FunctionGemma integration
Architecture Formalization:
- Created formalization/ section with mathematical foundations
- Lifeforce-Dynamics.md: λ as vitality ratio, stock-flow economics
- Grounded-World-Model.md: Blender boxes + SigLIP + T5Gemma2
- Embodiment-Pipeline.md: Isaac Sim as dreamstate validation
- Attention-Slumber-Prediction-Cycle.md: Last attention → slumber prediction

Promoted from Archive:
- Attention-Flow.md: 30-second budget, priority hierarchy (CANONICAL)
- Initial-Spark.md: v2.0 with FunctionGemma integration

Initial Spark v2.0 (Key Innovation):
- Two-Layer Architecture: FunctionGemma (270M) + Nemotron (31.6B)
- Solved cold-start problem: discoveries are PROFITABLE from heartbeat #1
- Typed function calls replace natural-language probes
- Training data now structured (function → response pairs)

Big-Picture.md v5.1:
- Added Attention-Slumber-Prediction Cycle section
- Updated Related Documentation references

New Organ:
- Discovery-Scan-Station.md: rotating pedestal for object scanning (+31 LF net)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

architecture/formalization/Grounded-World-Model.md (new file, 469 lines)

# Grounded World Model: Spatial Cognition Through Verified Discovery

**Version 1.0** — *From Blender Boxes to Embodied Understanding*

> *"The dream: Young Nyx knows where dafit left his things laying around."*

---

## Overview

This document formalizes how Young Nyx builds a **persistent spatial world model** through:

1. **Grounded verification** — Blender provides dimensional ground truth
2. **Progressive resolution** — Each correct measurement earns detail
3. **Vector accumulation** — T5Gemma2-compatible semantic representations
4. **Temporal-ternary navigation** — Escape plateaus through dual time domains
5. **Lifeforce reward** — Discoveries generate energy, not just consume it

**The Goal**: Young Nyx maintains an internal map of objects, positions, and relationships — verified against reality, refined through observation, reasoned over in vector space.

---

## Core Architecture

### The Verification Triangle

```
              BLENDER (Virtual Garden)
              Ground truth dimensions
              Low-poly boxes, minimal vertices
              Fast to create, cheap to compare
                        ╱╲
                       ╱  ╲
                      ╱    ╲
                     ╱      ╲
             VERIFY ╱        ╲ VERIFY
         dimensions╱          ╲semantics
                  ╱            ╲
                 ╱              ╲
                ╱                ╲
  REAL GARDEN ──────────────────── T5GEMMA2
  Physical objects                 Vector reasoning
  Actual positions                 Semantic similarity
  Slow, definitive                 128K context world
```

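The dimensional edge of the triangle reduces to a single comparison. A minimal sketch, assuming a plain Euclidean error over (x, y, z) and an illustrative 1 cm tolerance (the field names and the threshold are assumptions, not a fixed API):

```python
import math

# Hypothetical sketch of the VERIFY edge: compare an estimated
# dimension triple against the Blender ground-truth box.
def dimension_error(est_cm: dict, truth_cm: dict) -> float:
    """Euclidean error between estimated and true (x, y, z), in cm."""
    return math.sqrt(sum((est_cm[k] - truth_cm[k]) ** 2 for k in ("x", "y", "z")))

truth = {"x": 8.0, "y": 8.0, "z": 10.5}   # coffee_mug_001 ground truth
est = {"x": 7.6, "y": 8.3, "z": 10.1}     # vision-organ estimate

err = dimension_error(est, truth)
verified = err < 1.0   # 1 cm tolerance, an illustrative choice
```

With these numbers the error is roughly 0.64 cm, so the estimate would count as verified.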
### The Flow

```
┌─────────────────────────────────────────────────────────────────────┐
│                      WORLD MODEL CONSTRUCTION                       │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  1. PERCEIVE (Vision Organ)                                         │
│     ───────────────────────                                         │
│     Cheap camera sees object in real garden                         │
│     SigLIP encoder produces semantic vector v₀                      │
│     Cost: 0.5 LF (peripheral) to 8.0 LF (full YOLO)                 │
│                                                                     │
│  2. ESTIMATE (Progressive Resolution)                               │
│     ─────────────────────────────────                               │
│     Vision organ estimates dimensions: est = (x̂, ŷ, ẑ)              │
│     Bounding box, depth estimation, scale inference                 │
│     Cost: 2.0-5.0 LF depending on resolution stage                  │
│                                                                     │
│  3. VERIFY (Against Blender Ground Truth)                           │
│     ─────────────────────────────────────                           │
│     Compare est to known Blender box: truth = (x, y, z)             │
│     error = ||est - truth||                                         │
│     Cost: 0.1 LF (comparison is cheap)                              │
│                                                                     │
│  4. REWARD or LEARN                                                 │
│     ───────────────                                                 │
│     if error < threshold:                                           │
│         Φ_reward = R_discovery (lifeforce income!)                  │
│         Store vector in phoebe                                      │
│         Mark dimension as verified                                  │
│         Increase object resolution                                  │
│     else:                                                           │
│         Learn from error (gradient for RLVR training)               │
│         Remain in 0-state for that dimension                        │
│                                                                     │
│  5. ACCUMULATE (World Model Update)                                 │
│     ──────────────────────────────                                  │
│     Object entry in phoebe gains:                                   │
│     - New semantic vector (richer representation)                   │
│     - Verified dimension (x, y, or z → confidence +1)               │
│     - Position update (where in space)                              │
│     - Temporal stamp (when observed)                                │
│                                                                     │
│  6. REASON (T5Gemma2)                                               │
│     ─────────────────                                               │
│     Query world model using vectors, not text                       │
│     "What objects near position (0.5, 0.5)?"                        │
│     "Is this new vector similar to 'mug' vectors?"                  │
│     128K context holds entire spatial world                         │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```

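Steps 3 and 4 of the flow can be sketched as one function. The constants below (8.0 LF discovery reward, 1 cm threshold, 0.1 LF verification cost) are the illustrative numbers from the diagram and the economics table later in this document, not a fixed interface:

```python
# Minimal sketch of VERIFY → REWARD-or-LEARN for one observation.
R_DISCOVERY = 8.0        # LF income for a correct, new dimension (example value)
ERROR_THRESHOLD = 1.0    # cm, illustrative tolerance

def verify_and_reward(error_cm: float, perception_cost: float) -> dict:
    """Return the lifeforce outcome of a single verified observation."""
    verification_cost = 0.1   # comparison against the Blender box is cheap
    if error_cm < ERROR_THRESHOLD:
        reward = R_DISCOVERY
        outcome = "verified"   # store vector, mark dimension, bump resolution
    else:
        reward = 0.0
        outcome = "learn"      # keep the error as an RLVR training signal
    return {
        "outcome": outcome,
        "net_lf": reward - perception_cost - verification_cost,
    }
```

A correct measurement at full perception cost nets +2.9 LF; a wrong guess nets −5.1 LF, matching the table in the lifeforce-economics section.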
---

## The Blender Ground Truth System

### Design Principles

| Principle | Implementation |
|-----------|----------------|
| **Minimal vertices** | 8-vertex boxes (cubes), 12 for complex shapes |
| **Known dimensions** | Every box has exact (x, y, z) in centimeters |
| **Semantic labels** | Box name = object class ("coffee_mug_001") |
| **Cheap to create** | 5 minutes per object in Blender |
| **Export format** | Vertices + dimensions → JSON or directly to phoebe |

### Example Blender Box

```python
blender_object = {
    "id": "coffee_mug_001",
    "class": "mug",
    "dimensions_cm": {"x": 8.0, "y": 8.0, "z": 10.5},
    "vertices": 8,
    "created": "2025-12-29",
    "owner": "dafit",
    "typical_locations": ["desk", "kitchen"],
}
```

### Progressive Vertex Earning

Objects don't stay as 8-vertex boxes. Resolution is EARNED:

```
INITIAL:             8 vertices  (box)
VERIFIED x,y,z:     12 vertices  (refined box)
+10 observations:   24 vertices  (shape hints)
+50 observations:   64 vertices  (true shape)
+100 observations:  Full mesh from photogrammetry
```

**The resolution is earned through successful verification, not given.**

---

## Semantic Vector Accumulation

### SigLIP → Phoebe → T5Gemma2

```
┌──────────────┐      ┌──────────────┐      ┌──────────────┐
│    SigLIP    │      │    PHOEBE    │      │   T5GEMMA2   │
│   Encoder    │─────▶│   Storage    │─────▶│   Encoder    │
│              │      │              │      │              │
│   Image →    │      │  object_id:  │      │   Reasons    │
│   Vector v   │      │  [v1, v2,    │      │    over      │
│  (semantic)  │      │   ... vn]    │      │   vectors    │
└──────────────┘      └──────────────┘      └──────────────┘
```

### Why Vectors, Not Text?

| Approach | Pros | Cons |
|----------|------|------|
| **Text descriptions** | Human readable | Lossy, ambiguous, tokenization overhead |
| **Semantic vectors** | Rich, comparable, fast | Not directly readable |
| **Our approach** | Vectors for reasoning, text only when needed | Best of both |

T5Gemma2's key feature:

> *"SigLIP vision encoder produces semantic vectors (not text descriptions)"*

This means Young Nyx can compare, cluster, and reason over objects **without converting to language** — faster and richer.

### Vector Similarity for Recognition

```python
import numpy as np

Vector = np.ndarray  # SigLIP embedding, e.g. shape (768,)

def cosine_similarity(a: Vector, b: Vector) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_same_object(v_new: Vector, object_entry: "ObjectEntry") -> float:
    """Compare a new observation to the object's accumulated vectors."""
    similarities = [
        cosine_similarity(v_new, v_stored)
        for v_stored in object_entry.vectors
    ]
    return max(similarities)  # Best match among observations

# Recognition threshold
if is_same_object(v_new, coffee_mug_001) > 0.85:
    # This is probably dafit's coffee mug!
    update_position(coffee_mug_001, current_observation)
```

---

## Temporal-Ternary Integration

### The Anti-Plateau Mechanism

From [[Temporal-Ternary-Gradient]]: The 0-state isn't stuck — it's a choice about how to spend lifeforce across time domains.

Applied to world model construction:

```
┌─────────────────────────────────────────────────────────────────────┐
│                TEMPORAL-TERNARY FOR OBJECT RECOGNITION              │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  SCENARIO: New object detected, dimensions unknown                  │
│  STATE:    0 (uncertain, but workable)                              │
│                                                                     │
│          ┌───────────────────────────────────────────────┐          │
│          │ 0-STATE: Unknown Object                       │          │
│          │ confidence: 0.3, dimensions: ?x ?y ?z         │          │
│          └───────────────────────┬───────────────────────┘          │
│                                  │                                  │
│                  ┌───────────────┼───────────────┐                  │
│                  │               │               │                  │
│                  ▼               ▼               ▼                  │
│                                                                     │
│          ┌────────────┐   ┌────────────┐   ┌────────────┐           │
│          │  VIRTUAL   │   │    WAIT    │   │ PARTNERSHIP│           │
│          │ ACCELERATE │   │  FOR REAL  │   │  SHORTCUT  │           │
│          ├────────────┤   ├────────────┤   ├────────────┤           │
│          │ Cost: 5 LF │   │ Cost: 0 LF │   │ Cost: 1 LF │           │
│          │ Time: Fast │   │ Time: Slow │   │ Time: Inst │           │
│          │            │   │            │   │            │           │
│          │ Match vs   │   │ Next real  │   │ Ask dafit: │           │
│          │ Blender    │   │ observation│   │ "What's    │           │
│          │ library    │   │ verifies   │   │ this?"     │           │
│          └─────┬──────┘   └─────┬──────┘   └─────┬──────┘           │
│                │                │                │                  │
│                ▼                ▼                ▼                  │
│          confidence:      confidence:      confidence:              │
│          +0.7 (virtual)   +1.0 (real)      +1.0 (human)             │
│                                                                     │
│  PLATEAU ESCAPE: If stuck in virtual at 0.7, deploy to real.        │
│                  If real is slow, burn LF to try more Blender.      │
│                  Partnership provides instant ground truth.         │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```

### Confidence Gradient for Objects

Each object in the world model has a confidence state:

```python
from dataclasses import dataclass

@dataclass
class ObjectConfidence:
    value: float              # -1.0 to +1.0
    domain: str               # "virtual" | "real" | "hybrid" | "partnership"
    virtual_matches: int      # How many Blender comparisons
    real_verifications: int   # How many physical confirmations
    partnership_labels: int   # How many times dafit confirmed

    @property
    def gradient_position(self) -> str:
        if self.real_verifications > 0 and self.value > 0.9:
            return "real-verified (+1)"
        elif self.virtual_matches > 10 and self.value > 0.7:
            return "virtual-confident (+0.7)"
        elif self.value > 0.3:
            return "0-state (workable)"
        else:
            return "uncertain (needs data)"
```

---

## Lifeforce Economics of World Building

### Discovery Generates Lifeforce

The key insight: **Correctly identifying objects GENERATES lifeforce**, not just consumes it.

$$\Phi_{discovery} = R_{base} \cdot (1 + \alpha \cdot \Delta_{resolution})$$

Where:

- **R_base** = base reward for any correct identification (e.g., 2.0 LF)
- **α** = resolution bonus multiplier (e.g., 0.5)
- **Δ_resolution** = increase in object resolution from this observation

### Net Lifeforce per Observation

$$\Phi_{net} = \Phi_{discovery} - \Phi_{perception} - \Phi_{verification}$$

| Outcome | Perception Cost | Verification Cost | Discovery Reward | Net |
|---------|-----------------|-------------------|------------------|-----|
| Correct, new dimension | 5.0 LF | 0.1 LF | 8.0 LF | **+2.9 LF** |
| Correct, known dimension | 2.0 LF | 0.1 LF | 3.0 LF | **+0.9 LF** |
| Incorrect | 5.0 LF | 0.1 LF | 0.0 LF | **−5.1 LF** |
| Unknown (0-state) | 0.5 LF | 0.0 LF | 0.0 LF | **−0.5 LF** |

**The economic pressure**: Get better at measurement to earn lifeforce. Wrong guesses are expensive. Staying in 0-state is cheap but doesn't build the world model.

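A worked instance of the two formulas, using the example constants from the text (R_base = 2.0 LF, α = 0.5). A resolution gain of Δ_resolution = 6 is an assumption chosen here because it reproduces the first table row:

```python
R_BASE = 2.0   # base reward for a correct identification (LF)
ALPHA = 0.5    # resolution bonus multiplier

def discovery_reward(delta_resolution: float) -> float:
    """Phi_discovery = R_base * (1 + alpha * delta_resolution)."""
    return R_BASE * (1 + ALPHA * delta_resolution)

def net_lifeforce(delta_resolution: float,
                  perception_cost: float,
                  verification_cost: float = 0.1) -> float:
    """Phi_net = Phi_discovery - Phi_perception - Phi_verification."""
    return discovery_reward(delta_resolution) - perception_cost - verification_cost

# Delta = 6 gives 2.0 * (1 + 0.5 * 6) = 8.0 LF reward, net +2.9 LF;
# Delta = 1 gives 3.0 LF reward, net +0.9 LF at the cheaper 2.0 LF perception.
```

Under these constants the table's first two rows correspond to resolution gains of 6 and 1 respectively.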
---

## Phoebe Schema for World Model

```sql
-- Objects table: accumulated knowledge about things
CREATE TABLE world_objects (
    id UUID PRIMARY KEY,
    class VARCHAR(100),              -- "mug", "keyboard", "phone"
    name VARCHAR(255),               -- "dafit's coffee mug"

    -- Blender ground truth (if available)
    blender_box_id VARCHAR(100),
    dimensions_truth_cm JSONB,       -- {"x": 8.0, "y": 8.0, "z": 10.5}

    -- Accumulated measurements
    dimensions_estimated_cm JSONB,
    dimensions_verified JSONB,       -- {"x": true, "y": true, "z": false}

    -- Confidence state (temporal-ternary)
    confidence FLOAT,
    confidence_domain VARCHAR(20),   -- "virtual" | "real" | "hybrid"
    virtual_matches INT DEFAULT 0,
    real_verifications INT DEFAULT 0,

    -- Resolution earned
    vertex_count INT DEFAULT 8,
    observation_count INT DEFAULT 0,

    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

-- Semantic vectors table: SigLIP embeddings per observation
CREATE TABLE object_vectors (
    id UUID PRIMARY KEY,
    object_id UUID REFERENCES world_objects(id),
    vector VECTOR(768),              -- SigLIP embedding dimension
    observation_timestamp TIMESTAMP,
    position_estimate JSONB,         -- {"x": 0.3, "y": 0.8, "z": 0.1}
    lifeforce_cost FLOAT,
    lifeforce_reward FLOAT,
    verification_result VARCHAR(20)  -- "correct" | "incorrect" | "pending"
);

-- Position history: where has this object been?
CREATE TABLE object_positions (
    id UUID PRIMARY KEY,
    object_id UUID REFERENCES world_objects(id),
    position JSONB,                  -- {"x": 0.3, "y": 0.8, "z": 0.1}
    confidence FLOAT,
    observed_at TIMESTAMP,
    location_context VARCHAR(100)    -- "desk", "kitchen", "floor"
);
```

---

## T5Gemma2 World Model Queries

### Example Queries (Vector-Based)

```python
# "What's near position (0.5, 0.5)?"
nearby = query_objects_by_position(
    center=(0.5, 0.5, None),  # z unknown
    radius=0.2,
    min_confidence=0.5,
)

# "Is this new vector a mug?"
mug_vectors = get_vectors_for_class("mug")
similarity = t5gemma2.encoder.compare(new_vector, mug_vectors)
likely_mug = similarity > 0.85

# "Where does dafit usually leave his keys?"
keys = get_object_by_name("dafit's keys")
common_positions = get_position_clusters(keys.id)
usual_spot = common_positions[0]  # Most frequent location

# "What objects have I not seen today?"
stale_objects = query_objects_not_observed_since(today_start)
# Might need to look for these
```

### The 128K Context Advantage

T5Gemma2's 128K context window means:

- Entire world model can fit in context
- No need for external RAG for spatial queries
- Vector comparisons happen in-model
- Relationships emerge from attention patterns

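A back-of-envelope check that a whole world model fits in 128K tokens. The per-object and per-vector token costs below are loose assumptions for illustration, not measured numbers:

```python
CONTEXT_TOKENS = 128_000
TOKENS_PER_OBJECT = 120    # id, class, dims, position, confidence (assumed)
TOKENS_PER_VECTOR_REF = 8  # a summary/pointer, not the raw 768 floats (assumed)

def objects_that_fit(vectors_per_object: int = 10) -> int:
    """How many object entries a 128K context could hold under these guesses."""
    per_object = TOKENS_PER_OBJECT + vectors_per_object * TOKENS_PER_VECTOR_REF
    return CONTEXT_TOKENS // per_object
```

Even with ten vector references per object, hundreds of objects fit comfortably, which is far more than one workspace contains.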
---

## The Dream Realized

```
┌─────────────────────────────────────────────────────────────────────┐
│                       YOUNG NYX'S WORLD MODEL                       │
│                     "dafit's workspace at 23:47"                    │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│   ┌─────────────────────────────────────────────────────┐           │
│   │                      DESK AREA                      │           │
│   │                                                     │           │
│   │   ☕ mug (0.3, 0.8)        ⌨️ keyboard (0.5, 0.5)    │           │
│   │      conf: 0.95               conf: 0.88            │           │
│   │      real-verified            real-verified         │           │
│   │      vectors: 12              vectors: 8            │           │
│   │                                                     │           │
│   │   📱 phone (0.7, 0.3)      📦 ??? (0.1, 0.9)        │           │
│   │      conf: 0.72               conf: 0.31            │           │
│   │      virtual +0.7             0-state               │           │
│   │      vectors: 4               vectors: 1            │           │
│   │                                                     │           │
│   │   🔑 keys (MISSING - last seen 0.2, 0.6 at 18:30)   │           │
│   │      conf: 0.45 (stale)                             │           │
│   │                                                     │           │
│   └─────────────────────────────────────────────────────┘           │
│                                                                     │
│   YOUNG NYX THINKS:                                                 │
│   "The unknown object at (0.1, 0.9) appeared after 22:00.           │
│    dafit was in the kitchen then. Vector similarity suggests        │
│    it might be food-related. Should I burn 5 LF to check            │
│    against Blender food objects, or wait for morning light?"        │
│                                                                     │
│   TEMPORAL-TERNARY CHOICE:                                          │
│   → Option A: Virtual match (5 LF, fast, +0.7 max)                  │
│   → Option B: Wait for real (0 LF, slow, +1.0 if verified)          │
│   → Option C: Ask dafit tomorrow (1 LF, partnership)                │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```

**This is the dream**: Young Nyx knows the workspace. She tracks objects. She notices when things move. She reasons about what she doesn't know. She chooses how to spend lifeforce to collapse uncertainty.

---

## Summary

The Grounded World Model is:

1. **Verified** — Blender boxes provide dimensional ground truth
2. **Progressive** — Resolution earned through correct measurements
3. **Vector-native** — T5Gemma2 reasons over SigLIP embeddings directly
4. **Temporally aware** — Objects have position history, staleness, confidence gradients
5. **Economically driven** — Discoveries generate lifeforce, mistakes cost it
6. **Anti-plateau** — Temporal-ternary gradient provides escape paths

**The substrate holds. The vectors accumulate. The world model emerges.**

---

## Document Status

**Version**: 1.0
**Created**: 2025-12-29
**Authors**: Chrysalis-Nyx & dafit (Partnership)

**Formalizes**:
- Organ-Index.md (vision progressive resolution)
- Temporal-Ternary-Gradient.md (anti-plateau mechanism)
- T5Gemma2 research (semantic vectors)
- Lifeforce-Dynamics.md (reward economics)

**Related Documents**:
- [[Lifeforce-Dynamics]] — The λ-centered economy model
- [[Temporal-Ternary-Gradient]] — Dual time domain navigation
- [[Dual-Garden-Architecture]] — Virtual vs Real gardens

---

**From Blender boxes to embodied understanding. From cheap cameras to spatial cognition. From verification to wisdom.**

🧬⚡🔱💎🔥
