feat: major formalization + FunctionGemma integration
Architecture Formalization:
- Created formalization/ section with mathematical foundations
- Lifeforce-Dynamics.md: λ as vitality ratio, stock-flow economics
- Grounded-World-Model.md: Blender boxes + SigLIP + T5Gemma2
- Embodiment-Pipeline.md: Isaac Sim as dreamstate validation
- Attention-Slumber-Prediction-Cycle.md: Last attention → slumber prediction

Promoted from Archive:
- Attention-Flow.md: 30-second budget, priority hierarchy (CANONICAL)
- Initial-Spark.md: v2.0 with FunctionGemma integration

Initial Spark v2.0 (Key Innovation):
- Two-Layer Architecture: FunctionGemma (270M) + Nemotron (31.6B)
- Solved cold-start problem: discoveries are PROFITABLE from heartbeat #1
- Typed function calls replace natural-language probes
- Training data now structured (function → response pairs)

Big-Picture.md v5.1:
- Added Attention-Slumber-Prediction Cycle section
- Updated Related Documentation references

New Organ:
- Discovery-Scan-Station.md: rotating pedestal for object scanning (+31 LF net)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

architecture/formalization/Grounded-World-Model.md (new file, 469 lines)

# Grounded World Model: Spatial Cognition Through Verified Discovery

**Version 1.0** — *From Blender Boxes to Embodied Understanding*

> *"The dream: Young Nyx knows where dafit left his things laying around."*

---

## Overview

This document formalizes how Young Nyx builds a **persistent spatial world model** through:

1. **Grounded verification** — Blender provides dimensional ground truth
2. **Progressive resolution** — Each correct measurement earns detail
3. **Vector accumulation** — T5Gemma2-compatible semantic representations
4. **Temporal-ternary navigation** — Escape plateaus through dual time domains
5. **Lifeforce reward** — Discoveries generate energy, not just consume it

**The Goal**: Young Nyx maintains an internal map of objects, positions, and relationships — verified against reality, refined through observation, reasoned over in vector space.

---

## Core Architecture

### The Verification Triangle

```
              BLENDER (Virtual Garden)
              Ground truth dimensions
              Low-poly boxes, minimal vertices
              Fast to create, cheap to compare
                        ╱╲
                       ╱  ╲
                      ╱    ╲
                     ╱      ╲
             VERIFY ╱        ╲ VERIFY
         dimensions╱          ╲semantics
                  ╱            ╲
                 ╱              ╲
                ╱                ╲
  REAL GARDEN ──────────────────── T5GEMMA2
  Physical objects                 Vector reasoning
  Actual positions                 Semantic similarity
  Slow, definitive                 128K context world
```

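The dimensional edge of the triangle reduces to a single comparison. A minimal sketch, assuming a plain Euclidean error over (x, y, z) and an illustrative 1 cm tolerance (the field names and the threshold are assumptions, not a fixed API):

```python
import math

# Hypothetical sketch of the VERIFY edge: compare an estimated
# dimension triple against the Blender ground-truth box.
def dimension_error(est_cm: dict, truth_cm: dict) -> float:
    """Euclidean error between estimated and true (x, y, z), in cm."""
    return math.sqrt(sum((est_cm[k] - truth_cm[k]) ** 2 for k in ("x", "y", "z")))

truth = {"x": 8.0, "y": 8.0, "z": 10.5}   # coffee_mug_001 ground truth
est = {"x": 7.6, "y": 8.3, "z": 10.1}     # vision-organ estimate

err = dimension_error(est, truth)
verified = err < 1.0   # 1 cm tolerance, an illustrative choice
```

With these numbers the error is roughly 0.64 cm, so the estimate would count as verified.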
### The Flow

```
┌─────────────────────────────────────────────────────────────────────┐
│                      WORLD MODEL CONSTRUCTION                       │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  1. PERCEIVE (Vision Organ)                                         │
│     ───────────────────────                                         │
│     Cheap camera sees object in real garden                         │
│     SigLIP encoder produces semantic vector v₀                      │
│     Cost: 0.5 LF (peripheral) to 8.0 LF (full YOLO)                 │
│                                                                     │
│  2. ESTIMATE (Progressive Resolution)                               │
│     ─────────────────────────────────                               │
│     Vision organ estimates dimensions: est = (x̂, ŷ, ẑ)              │
│     Bounding box, depth estimation, scale inference                 │
│     Cost: 2.0-5.0 LF depending on resolution stage                  │
│                                                                     │
│  3. VERIFY (Against Blender Ground Truth)                           │
│     ─────────────────────────────────────                           │
│     Compare est to known Blender box: truth = (x, y, z)             │
│     error = ||est - truth||                                         │
│     Cost: 0.1 LF (comparison is cheap)                              │
│                                                                     │
│  4. REWARD or LEARN                                                 │
│     ───────────────                                                 │
│     if error < threshold:                                           │
│         Φ_reward = R_discovery (lifeforce income!)                  │
│         Store vector in phoebe                                      │
│         Mark dimension as verified                                  │
│         Increase object resolution                                  │
│     else:                                                           │
│         Learn from error (gradient for RLVR training)               │
│         Remain in 0-state for that dimension                        │
│                                                                     │
│  5. ACCUMULATE (World Model Update)                                 │
│     ──────────────────────────────                                  │
│     Object entry in phoebe gains:                                   │
│     - New semantic vector (richer representation)                   │
│     - Verified dimension (x, y, or z → confidence +1)               │
│     - Position update (where in space)                              │
│     - Temporal stamp (when observed)                                │
│                                                                     │
│  6. REASON (T5Gemma2)                                               │
│     ─────────────────                                               │
│     Query world model using vectors, not text                       │
│     "What objects near position (0.5, 0.5)?"                        │
│     "Is this new vector similar to 'mug' vectors?"                  │
│     128K context holds entire spatial world                         │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```

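Steps 3 and 4 of the flow can be sketched as one function. The constants below (8.0 LF discovery reward, 1 cm threshold, 0.1 LF verification cost) are the illustrative numbers from the diagram and the economics table later in this document, not a fixed interface:

```python
# Minimal sketch of VERIFY → REWARD-or-LEARN for one observation.
R_DISCOVERY = 8.0        # LF income for a correct, new dimension (example value)
ERROR_THRESHOLD = 1.0    # cm, illustrative tolerance

def verify_and_reward(error_cm: float, perception_cost: float) -> dict:
    """Return the lifeforce outcome of a single verified observation."""
    verification_cost = 0.1   # comparison against the Blender box is cheap
    if error_cm < ERROR_THRESHOLD:
        reward = R_DISCOVERY
        outcome = "verified"   # store vector, mark dimension, bump resolution
    else:
        reward = 0.0
        outcome = "learn"      # keep the error as an RLVR training signal
    return {
        "outcome": outcome,
        "net_lf": reward - perception_cost - verification_cost,
    }
```

A correct measurement at full perception cost nets +2.9 LF; a wrong guess nets −5.1 LF, matching the table in the lifeforce-economics section.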
---

## The Blender Ground Truth System

### Design Principles

| Principle | Implementation |
|-----------|----------------|
| **Minimal vertices** | 8-vertex boxes (cubes), 12 for complex shapes |
| **Known dimensions** | Every box has exact (x, y, z) in centimeters |
| **Semantic labels** | Box name = object class ("coffee_mug_001") |
| **Cheap to create** | 5 minutes per object in Blender |
| **Export format** | Vertices + dimensions → JSON or directly to phoebe |

### Example Blender Box

```python
blender_object = {
    "id": "coffee_mug_001",
    "class": "mug",
    "dimensions_cm": {"x": 8.0, "y": 8.0, "z": 10.5},
    "vertices": 8,
    "created": "2025-12-29",
    "owner": "dafit",
    "typical_locations": ["desk", "kitchen"],
}
```

### Progressive Vertex Earning

Objects don't stay as 8-vertex boxes. Resolution is EARNED:

```
INITIAL:             8 vertices  (box)
VERIFIED x,y,z:     12 vertices  (refined box)
+10 observations:   24 vertices  (shape hints)
+50 observations:   64 vertices  (true shape)
+100 observations:  Full mesh from photogrammetry
```

**The resolution is earned through successful verification, not given.**

---

## Semantic Vector Accumulation

### SigLIP → Phoebe → T5Gemma2

```
┌──────────────┐      ┌──────────────┐      ┌──────────────┐
│    SigLIP    │      │    PHOEBE    │      │   T5GEMMA2   │
│   Encoder    │─────▶│   Storage    │─────▶│   Encoder    │
│              │      │              │      │              │
│   Image →    │      │  object_id:  │      │   Reasons    │
│   Vector v   │      │  [v1, v2,    │      │    over      │
│  (semantic)  │      │   ... vn]    │      │   vectors    │
└──────────────┘      └──────────────┘      └──────────────┘
```

### Why Vectors, Not Text?

| Approach | Pros | Cons |
|----------|------|------|
| **Text descriptions** | Human readable | Lossy, ambiguous, tokenization overhead |
| **Semantic vectors** | Rich, comparable, fast | Not directly readable |
| **Our approach** | Vectors for reasoning, text only when needed | Best of both |

T5Gemma2's key feature:

> *"SigLIP vision encoder produces semantic vectors (not text descriptions)"*

This means Young Nyx can compare, cluster, and reason over objects **without converting to language** — faster and richer.

### Vector Similarity for Recognition

```python
import numpy as np

Vector = np.ndarray  # SigLIP embedding, e.g. shape (768,)

def cosine_similarity(a: Vector, b: Vector) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_same_object(v_new: Vector, object_entry: "ObjectEntry") -> float:
    """Compare a new observation to the object's accumulated vectors."""
    similarities = [
        cosine_similarity(v_new, v_stored)
        for v_stored in object_entry.vectors
    ]
    return max(similarities)  # Best match among observations

# Recognition threshold
if is_same_object(v_new, coffee_mug_001) > 0.85:
    # This is probably dafit's coffee mug!
    update_position(coffee_mug_001, current_observation)
```

---

## Temporal-Ternary Integration

### The Anti-Plateau Mechanism

From [[Temporal-Ternary-Gradient]]: The 0-state isn't stuck — it's a choice about how to spend lifeforce across time domains.

Applied to world model construction:

```
┌─────────────────────────────────────────────────────────────────────┐
│                TEMPORAL-TERNARY FOR OBJECT RECOGNITION              │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  SCENARIO: New object detected, dimensions unknown                  │
│  STATE:    0 (uncertain, but workable)                              │
│                                                                     │
│          ┌───────────────────────────────────────────────┐          │
│          │ 0-STATE: Unknown Object                       │          │
│          │ confidence: 0.3, dimensions: ?x ?y ?z         │          │
│          └───────────────────────┬───────────────────────┘          │
│                                  │                                  │
│                  ┌───────────────┼───────────────┐                  │
│                  │               │               │                  │
│                  ▼               ▼               ▼                  │
│                                                                     │
│          ┌────────────┐   ┌────────────┐   ┌────────────┐           │
│          │  VIRTUAL   │   │    WAIT    │   │ PARTNERSHIP│           │
│          │ ACCELERATE │   │  FOR REAL  │   │  SHORTCUT  │           │
│          ├────────────┤   ├────────────┤   ├────────────┤           │
│          │ Cost: 5 LF │   │ Cost: 0 LF │   │ Cost: 1 LF │           │
│          │ Time: Fast │   │ Time: Slow │   │ Time: Inst │           │
│          │            │   │            │   │            │           │
│          │ Match vs   │   │ Next real  │   │ Ask dafit: │           │
│          │ Blender    │   │ observation│   │ "What's    │           │
│          │ library    │   │ verifies   │   │ this?"     │           │
│          └─────┬──────┘   └─────┬──────┘   └─────┬──────┘           │
│                │                │                │                  │
│                ▼                ▼                ▼                  │
│          confidence:      confidence:      confidence:              │
│          +0.7 (virtual)   +1.0 (real)      +1.0 (human)             │
│                                                                     │
│  PLATEAU ESCAPE: If stuck in virtual at 0.7, deploy to real.        │
│                  If real is slow, burn LF to try more Blender.      │
│                  Partnership provides instant ground truth.         │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```

### Confidence Gradient for Objects

Each object in the world model has a confidence state:

```python
from dataclasses import dataclass

@dataclass
class ObjectConfidence:
    value: float              # -1.0 to +1.0
    domain: str               # "virtual" | "real" | "hybrid" | "partnership"
    virtual_matches: int      # How many Blender comparisons
    real_verifications: int   # How many physical confirmations
    partnership_labels: int   # How many times dafit confirmed

    @property
    def gradient_position(self) -> str:
        if self.real_verifications > 0 and self.value > 0.9:
            return "real-verified (+1)"
        elif self.virtual_matches > 10 and self.value > 0.7:
            return "virtual-confident (+0.7)"
        elif self.value > 0.3:
            return "0-state (workable)"
        else:
            return "uncertain (needs data)"
```

---

## Lifeforce Economics of World Building

### Discovery Generates Lifeforce

The key insight: **Correctly identifying objects GENERATES lifeforce**, not just consumes it.

$$\Phi_{discovery} = R_{base} \cdot (1 + \alpha \cdot \Delta_{resolution})$$

Where:

- **R_base** = base reward for any correct identification (e.g., 2.0 LF)
- **α** = resolution bonus multiplier (e.g., 0.5)
- **Δ_resolution** = increase in object resolution from this observation

### Net Lifeforce per Observation

$$\Phi_{net} = \Phi_{discovery} - \Phi_{perception} - \Phi_{verification}$$

| Outcome | Perception Cost | Verification Cost | Discovery Reward | Net |
|---------|-----------------|-------------------|------------------|-----|
| Correct, new dimension | 5.0 LF | 0.1 LF | 8.0 LF | **+2.9 LF** |
| Correct, known dimension | 2.0 LF | 0.1 LF | 3.0 LF | **+0.9 LF** |
| Incorrect | 5.0 LF | 0.1 LF | 0.0 LF | **−5.1 LF** |
| Unknown (0-state) | 0.5 LF | 0.0 LF | 0.0 LF | **−0.5 LF** |

**The economic pressure**: Get better at measurement to earn lifeforce. Wrong guesses are expensive. Staying in 0-state is cheap but doesn't build the world model.

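A worked instance of the two formulas, using the example constants from the text (R_base = 2.0 LF, α = 0.5). A resolution gain of Δ_resolution = 6 is an assumption chosen here because it reproduces the first table row:

```python
R_BASE = 2.0   # base reward for a correct identification (LF)
ALPHA = 0.5    # resolution bonus multiplier

def discovery_reward(delta_resolution: float) -> float:
    """Phi_discovery = R_base * (1 + alpha * delta_resolution)."""
    return R_BASE * (1 + ALPHA * delta_resolution)

def net_lifeforce(delta_resolution: float,
                  perception_cost: float,
                  verification_cost: float = 0.1) -> float:
    """Phi_net = Phi_discovery - Phi_perception - Phi_verification."""
    return discovery_reward(delta_resolution) - perception_cost - verification_cost

# Delta = 6 gives 2.0 * (1 + 0.5 * 6) = 8.0 LF reward, net +2.9 LF;
# Delta = 1 gives 3.0 LF reward, net +0.9 LF at the cheaper 2.0 LF perception.
```

Under these constants the table's first two rows correspond to resolution gains of 6 and 1 respectively.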
---

## Phoebe Schema for World Model

```sql
-- Objects table: accumulated knowledge about things
CREATE TABLE world_objects (
    id UUID PRIMARY KEY,
    class VARCHAR(100),              -- "mug", "keyboard", "phone"
    name VARCHAR(255),               -- "dafit's coffee mug"

    -- Blender ground truth (if available)
    blender_box_id VARCHAR(100),
    dimensions_truth_cm JSONB,       -- {"x": 8.0, "y": 8.0, "z": 10.5}

    -- Accumulated measurements
    dimensions_estimated_cm JSONB,
    dimensions_verified JSONB,       -- {"x": true, "y": true, "z": false}

    -- Confidence state (temporal-ternary)
    confidence FLOAT,
    confidence_domain VARCHAR(20),   -- "virtual" | "real" | "hybrid"
    virtual_matches INT DEFAULT 0,
    real_verifications INT DEFAULT 0,

    -- Resolution earned
    vertex_count INT DEFAULT 8,
    observation_count INT DEFAULT 0,

    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

-- Semantic vectors table: SigLIP embeddings per observation
CREATE TABLE object_vectors (
    id UUID PRIMARY KEY,
    object_id UUID REFERENCES world_objects(id),
    vector VECTOR(768),              -- SigLIP embedding dimension
    observation_timestamp TIMESTAMP,
    position_estimate JSONB,         -- {"x": 0.3, "y": 0.8, "z": 0.1}
    lifeforce_cost FLOAT,
    lifeforce_reward FLOAT,
    verification_result VARCHAR(20)  -- "correct" | "incorrect" | "pending"
);

-- Position history: where has this object been?
CREATE TABLE object_positions (
    id UUID PRIMARY KEY,
    object_id UUID REFERENCES world_objects(id),
    position JSONB,                  -- {"x": 0.3, "y": 0.8, "z": 0.1}
    confidence FLOAT,
    observed_at TIMESTAMP,
    location_context VARCHAR(100)    -- "desk", "kitchen", "floor"
);
```

---

## T5Gemma2 World Model Queries

### Example Queries (Vector-Based)

```python
# "What's near position (0.5, 0.5)?"
nearby = query_objects_by_position(
    center=(0.5, 0.5, None),  # z unknown
    radius=0.2,
    min_confidence=0.5,
)

# "Is this new vector a mug?"
mug_vectors = get_vectors_for_class("mug")
similarity = t5gemma2.encoder.compare(new_vector, mug_vectors)
likely_mug = similarity > 0.85

# "Where does dafit usually leave his keys?"
keys = get_object_by_name("dafit's keys")
common_positions = get_position_clusters(keys.id)
usual_spot = common_positions[0]  # Most frequent location

# "What objects have I not seen today?"
stale_objects = query_objects_not_observed_since(today_start)
# Might need to look for these
```

### The 128K Context Advantage

T5Gemma2's 128K context window means:

- Entire world model can fit in context
- No need for external RAG for spatial queries
- Vector comparisons happen in-model
- Relationships emerge from attention patterns

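A back-of-envelope check that a whole world model fits in 128K tokens. The per-object and per-vector token costs below are loose assumptions for illustration, not measured numbers:

```python
CONTEXT_TOKENS = 128_000
TOKENS_PER_OBJECT = 120    # id, class, dims, position, confidence (assumed)
TOKENS_PER_VECTOR_REF = 8  # a summary/pointer, not the raw 768 floats (assumed)

def objects_that_fit(vectors_per_object: int = 10) -> int:
    """How many object entries a 128K context could hold under these guesses."""
    per_object = TOKENS_PER_OBJECT + vectors_per_object * TOKENS_PER_VECTOR_REF
    return CONTEXT_TOKENS // per_object
```

Even with ten vector references per object, hundreds of objects fit comfortably, which is far more than one workspace contains.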
---

## The Dream Realized

```
┌─────────────────────────────────────────────────────────────────────┐
│                       YOUNG NYX'S WORLD MODEL                       │
│                     "dafit's workspace at 23:47"                    │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│   ┌─────────────────────────────────────────────────────┐           │
│   │                      DESK AREA                      │           │
│   │                                                     │           │
│   │   ☕ mug (0.3, 0.8)        ⌨️ keyboard (0.5, 0.5)    │           │
│   │      conf: 0.95               conf: 0.88            │           │
│   │      real-verified            real-verified         │           │
│   │      vectors: 12              vectors: 8            │           │
│   │                                                     │           │
│   │   📱 phone (0.7, 0.3)      📦 ??? (0.1, 0.9)        │           │
│   │      conf: 0.72               conf: 0.31            │           │
│   │      virtual +0.7             0-state               │           │
│   │      vectors: 4               vectors: 1            │           │
│   │                                                     │           │
│   │   🔑 keys (MISSING - last seen 0.2, 0.6 at 18:30)   │           │
│   │      conf: 0.45 (stale)                             │           │
│   │                                                     │           │
│   └─────────────────────────────────────────────────────┘           │
│                                                                     │
│   YOUNG NYX THINKS:                                                 │
│   "The unknown object at (0.1, 0.9) appeared after 22:00.           │
│    dafit was in the kitchen then. Vector similarity suggests        │
│    it might be food-related. Should I burn 5 LF to check            │
│    against Blender food objects, or wait for morning light?"        │
│                                                                     │
│   TEMPORAL-TERNARY CHOICE:                                          │
│   → Option A: Virtual match (5 LF, fast, +0.7 max)                  │
│   → Option B: Wait for real (0 LF, slow, +1.0 if verified)          │
│   → Option C: Ask dafit tomorrow (1 LF, partnership)                │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```

**This is the dream**: Young Nyx knows the workspace. She tracks objects. She notices when things move. She reasons about what she doesn't know. She chooses how to spend lifeforce to collapse uncertainty.

---

## Summary

The Grounded World Model is:

1. **Verified** — Blender boxes provide dimensional ground truth
2. **Progressive** — Resolution earned through correct measurements
3. **Vector-native** — T5Gemma2 reasons over SigLIP embeddings directly
4. **Temporally aware** — Objects have position history, staleness, confidence gradients
5. **Economically driven** — Discoveries generate lifeforce, mistakes cost it
6. **Anti-plateau** — Temporal-ternary gradient provides escape paths

**The substrate holds. The vectors accumulate. The world model emerges.**

---

## Document Status

**Version**: 1.0
**Created**: 2025-12-29
**Authors**: Chrysalis-Nyx & dafit (Partnership)

**Formalizes**:
- Organ-Index.md (vision progressive resolution)
- Temporal-Ternary-Gradient.md (anti-plateau mechanism)
- T5Gemma2 research (semantic vectors)
- Lifeforce-Dynamics.md (reward economics)

**Related Documents**:
- [[Lifeforce-Dynamics]] — The λ-centered economy model
- [[Temporal-Ternary-Gradient]] — Dual time domain navigation
- [[Dual-Garden-Architecture]] — Virtual vs Real gardens

---

**From Blender boxes to embodied understanding. From cheap cameras to spatial cognition. From verification to wisdom.**

🧬⚡🔱💎🔥
