# Grounded World Model: Spatial Cognition Through Verified Discovery

**Version 1.0** — *From Blender Boxes to Embodied Understanding*

> *"The dream: Young Nyx knows where dafit left his things lying around."*

---

## Overview

This document formalizes how Young Nyx builds a **persistent spatial world model** through:

1. **Grounded verification** — Blender provides dimensional ground truth
2. **Progressive resolution** — Each correct measurement earns detail
3. **Vector accumulation** — T5Gemma2-compatible semantic representations
4. **Temporal-ternary navigation** — Escape plateaus through dual time domains
5. **Lifeforce reward** — Discoveries generate energy, not just consume it

**The Goal**: Young Nyx maintains an internal map of objects, positions, and relationships — verified against reality, refined through observation, reasoned over in vector space.

---

## Core Architecture

### The Verification Triangle

```
                    BLENDER (Virtual Garden)
                    Ground truth dimensions
                    Low-poly boxes, minimal vertices
                    Fast to create, cheap to compare
                              ╱╲
                             ╱  ╲
                            ╱    ╲
                           ╱      ╲
                 VERIFY   ╱        ╲   VERIFY
              dimensions ╱          ╲  semantics
                        ╱            ╲
                       ╱              ╲
                      ╱                ╲
    REAL GARDEN ──────────────────── T5GEMMA2
    Physical objects                 Vector reasoning
    Actual positions                 Semantic similarity
    Slow, definitive                 128K context world
```

### The Flow

```
┌─────────────────────────────────────────────────────────────────────┐
│                      WORLD MODEL CONSTRUCTION                       │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  1. PERCEIVE (Vision Organ)                                         │
│     ───────────────────────                                         │
│     Cheap camera sees object in real garden                         │
│     SigLIP encoder produces semantic vector v₀                      │
│     Cost: 0.5 LF (peripheral) to 8.0 LF (full YOLO)                 │
│                                                                     │
│  2. ESTIMATE (Progressive Resolution)                               │
│     ─────────────────────────────────                               │
│     Vision organ estimates dimensions: est = (x̂, ŷ, ẑ)              │
│     Bounding box, depth estimation, scale inference                 │
│     Cost: 2.0-5.0 LF depending on resolution stage                  │
│                                                                     │
│  3. VERIFY (Against Blender Ground Truth)                           │
│     ─────────────────────────────────────                           │
│     Compare est to known Blender box: truth = (x, y, z)             │
│     error = ||est - truth||                                         │
│     Cost: 0.1 LF (comparison is cheap)                              │
│                                                                     │
│  4. REWARD or LEARN                                                 │
│     ───────────────                                                 │
│     if error < threshold:                                           │
│         Φ_reward = R_discovery (lifeforce income!)                  │
│         Store vector in phoebe                                      │
│         Mark dimension as verified                                  │
│         Increase object resolution                                  │
│     else:                                                           │
│         Learn from error (gradient for RLVR training)               │
│         Remain in 0-state for that dimension                        │
│                                                                     │
│  5. ACCUMULATE (World Model Update)                                 │
│     ──────────────────────────────                                  │
│     Object entry in phoebe gains:                                   │
│     - New semantic vector (richer representation)                   │
│     - Verified dimension (x, y, or z → confidence +1)               │
│     - Position update (where in space)                              │
│     - Temporal stamp (when observed)                                │
│                                                                     │
│  6. REASON (T5Gemma2)                                               │
│     ─────────────────                                               │
│     Query world model using vectors, not text                       │
│     "What objects are near position (0.5, 0.5)?"                    │
│     "Is this new vector similar to 'mug' vectors?"                  │
│     128K context holds entire spatial world                         │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```
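Steps 2-4 reduce to a small verification loop. The sketch below is illustrative rather than the project's actual code: `observe_and_verify` and `ERROR_THRESHOLD_CM` are assumed names and values, while `R_BASE`, `ALPHA`, and the Φ formulas follow the lifeforce economics section later in this document.

```python
import numpy as np

# Example constants; R_BASE and ALPHA follow the economics section below,
# ERROR_THRESHOLD_CM is an assumed tolerance for a "correct" measurement.
ERROR_THRESHOLD_CM = 1.0
R_BASE = 2.0             # base discovery reward in LF
ALPHA = 0.5              # resolution bonus multiplier
VERIFICATION_COST = 0.1  # comparing against a Blender box is cheap

def observe_and_verify(est_cm: np.ndarray, truth_cm: np.ndarray,
                       perception_cost: float, delta_resolution: int) -> float:
    """One pass through steps 2-4; returns the net lifeforce for the observation."""
    error = float(np.linalg.norm(est_cm - truth_cm))  # error = ||est - truth||
    if error < ERROR_THRESHOLD_CM:
        # Correct: Phi_discovery = R_base * (1 + alpha * delta_resolution)
        reward = R_BASE * (1 + ALPHA * delta_resolution)
    else:
        # Incorrect: no reward; the error becomes an RLVR training signal.
        reward = 0.0
    return reward - perception_cost - VERIFICATION_COST

# Re-measuring a known dimension of the 8 x 8 x 10.5 cm mug:
# reward 3.0 - perception 2.0 - verification 0.1 = +0.9 LF net,
# matching the "correct, known dimension" row of the economics table below.
net = observe_and_verify(np.array([8.2, 7.9, 10.4]), np.array([8.0, 8.0, 10.5]),
                         perception_cost=2.0, delta_resolution=1)
```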
---

## The Blender Ground Truth System

### Design Principles

| Principle | Implementation |
|-----------|----------------|
| **Minimal vertices** | 8-vertex boxes (cubes), 12 for complex shapes |
| **Known dimensions** | Every box has exact (x, y, z) in centimeters |
| **Semantic labels** | Box name = object class ("coffee_mug_001") |
| **Cheap to create** | 5 minutes per object in Blender |
| **Export format** | Vertices + dimensions → JSON or directly to phoebe |

### Example Blender Box

```python
blender_object = {
    "id": "coffee_mug_001",
    "class": "mug",
    "dimensions_cm": {"x": 8.0, "y": 8.0, "z": 10.5},
    "vertices": 8,
    "created": "2025-12-29",
    "owner": "dafit",
    "typical_locations": ["desk", "kitchen"],
}
```

### Progressive Vertex Earning

Objects don't stay as 8-vertex boxes. Resolution is EARNED:

```
INITIAL:            8 vertices   (box)
VERIFIED x,y,z:    12 vertices   (refined box)
+10 observations:  24 vertices   (shape hints)
+50 observations:  64 vertices   (true shape)
+100 observations: Full mesh from photogrammetry
```

**The resolution is earned through successful verification, not given.**
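The earning schedule is easy to state as code. A minimal sketch, assuming cumulative observation counts as thresholds; `vertex_budget` and the `FULL_MESH` sentinel are hypothetical names, while the threshold values come straight from the schedule above.

```python
FULL_MESH = -1  # sentinel: past 100 observations, switch to a photogrammetry mesh

def vertex_budget(observations: int, dims_verified: bool) -> int:
    """Vertex budget earned so far, per the schedule above."""
    if observations >= 100:
        return FULL_MESH
    if observations >= 50:
        return 64  # true shape
    if observations >= 10:
        return 24  # shape hints
    if dims_verified:
        return 12  # refined box
    return 8       # initial box

assert vertex_budget(0, dims_verified=False) == 8
assert vertex_budget(12, dims_verified=True) == 24
```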
---

## Semantic Vector Accumulation

### SigLIP → Phoebe → T5Gemma2

```
┌──────────────┐      ┌──────────────┐      ┌──────────────┐
│    SigLIP    │      │    PHOEBE    │      │   T5GEMMA2   │
│    Encoder   │─────▶│    Storage   │─────▶│    Encoder   │
│              │      │              │      │              │
│  Image →     │      │  object_id:  │      │  Reasons     │
│  Vector v    │      │  [v1,v2,..   │      │  over        │
│  (semantic)  │      │   vn]        │      │  vectors     │
└──────────────┘      └──────────────┘      └──────────────┘
```

### Why Vectors, Not Text?

| Approach | Pros | Cons |
|----------|------|------|
| **Text descriptions** | Human readable | Lossy, ambiguous, tokenization overhead |
| **Semantic vectors** | Rich, comparable, fast | Not directly readable |
| **Our approach** | Vectors for reasoning, text only when needed | Best of both |

T5Gemma2's key feature:

> *"SigLIP vision encoder produces semantic vectors (not text descriptions)"*

This means Young Nyx can compare, cluster, and reason over objects **without converting to language** — faster and richer.

### Vector Similarity for Recognition

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_same_object(v_new: np.ndarray, object_entry: "ObjectEntry") -> float:
    """Compare a new observation to an object's accumulated vectors."""
    similarities = [
        cosine_similarity(v_new, v_stored)
        for v_stored in object_entry.vectors
    ]
    return max(similarities)  # Best match among stored observations

# Recognition threshold
if is_same_object(v_new, coffee_mug_001) > 0.85:
    # This is probably dafit's coffee mug!
    update_position(coffee_mug_001, current_observation)
```

---

## Temporal-Ternary Integration

### The Anti-Plateau Mechanism

From [[Temporal-Ternary-Gradient]]: the 0-state isn't stuck — it's a choice about how to spend lifeforce across time domains.

Applied to world model construction:

```
┌─────────────────────────────────────────────────────────────────────┐
│               TEMPORAL-TERNARY FOR OBJECT RECOGNITION               │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  SCENARIO: New object detected, dimensions unknown                  │
│  STATE: 0 (uncertain, but workable)                                 │
│                                                                     │
│      ┌───────────────────────────────────────────────────┐         │
│      │             0-STATE: Unknown Object               │         │
│      │        confidence: 0.3, dimensions: ?x ?y ?z      │         │
│      └───────────────────────┬───────────────────────────┘         │
│                              │                                      │
│                ┌─────────────┼─────────────┐                        │
│                │             │             │                        │
│                ▼             ▼             ▼                        │
│         ┌────────────┐ ┌────────────┐ ┌────────────┐               │
│         │  VIRTUAL   │ │    WAIT    │ │ PARTNERSHIP│               │
│         │ ACCELERATE │ │  FOR REAL  │ │  SHORTCUT  │               │
│         ├────────────┤ ├────────────┤ ├────────────┤               │
│         │ Cost: 5 LF │ │ Cost: 0 LF │ │ Cost: 1 LF │               │
│         │ Time: Fast │ │ Time: Slow │ │ Time: Inst │               │
│         │            │ │            │ │            │               │
│         │ Match vs   │ │ Next real  │ │ Ask dafit: │               │
│         │ Blender    │ │ observation│ │ "What's    │               │
│         │ library    │ │ verifies   │ │  this?"    │               │
│         └─────┬──────┘ └─────┬──────┘ └─────┬──────┘               │
│               │              │              │                       │
│               ▼              ▼              ▼                       │
│         confidence:    confidence:    confidence:                   │
│         +0.7 (virtual) +1.0 (real)    +1.0 (human)                  │
│                                                                     │
│  PLATEAU ESCAPE: If stuck in virtual at 0.7, deploy to real.        │
│                  If real is slow, burn LF to try more Blender.      │
│                  Partnership provides instant ground truth.         │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```

### Confidence Gradient for Objects

Each object in the world model has a confidence state:

```python
from dataclasses import dataclass

@dataclass
class ObjectConfidence:
    value: float = 0.0           # -1.0 to +1.0
    domain: str = "virtual"      # "virtual" | "real" | "hybrid" | "partnership"
    virtual_matches: int = 0     # How many Blender comparisons
    real_verifications: int = 0  # How many physical confirmations
    partnership_labels: int = 0  # How many times dafit confirmed

    @property
    def gradient_position(self) -> str:
        if self.real_verifications > 0 and self.value > 0.9:
            return "real-verified (+1)"
        elif self.virtual_matches > 10 and self.value > 0.7:
            return "virtual-confident (+0.7)"
        elif self.value > 0.3:
            return "0-state (workable)"
        else:
            return "uncertain (needs data)"
```

---

## Lifeforce Economics of World Building

### Discovery Generates Lifeforce

The key insight: **correctly identifying objects GENERATES lifeforce** rather than merely consuming it.

$$\Phi_{discovery} = R_{base} \cdot (1 + \alpha \cdot \Delta_{resolution})$$

Where:

- **R_base** = base reward for any correct identification (e.g., 2.0 LF)
- **α** = resolution bonus multiplier (e.g., 0.5)
- **Δ_resolution** = increase in object resolution from this observation

### Net Lifeforce per Observation

$$\Phi_{net} = \Phi_{discovery} - \Phi_{perception} - \Phi_{verification}$$

| Outcome | Perception Cost | Verification Cost | Discovery Reward | Net |
|---------|-----------------|-------------------|------------------|-----|
| Correct, new dimension | 5.0 LF | 0.1 LF | 8.0 LF | **+2.9 LF** |
| Correct, known dimension | 2.0 LF | 0.1 LF | 3.0 LF | **+0.9 LF** |
| Incorrect | 5.0 LF | 0.1 LF | 0.0 LF | **-5.1 LF** |
| Unknown (0-state) | 0.5 LF | 0.0 LF | 0.0 LF | **-0.5 LF** |

**The economic pressure**: get better at measurement to earn lifeforce. Wrong guesses are expensive. Staying in 0-state is cheap but doesn't build the world model.
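Together, the three escape paths and the lifeforce table suggest a simple decision rule. Below is a minimal sketch framing the choice as an expected-lifeforce comparison; `choose_escape`, `lf_per_confidence`, and all probability values are illustrative assumptions (the document itself does not prescribe an expected-value policy), and time-to-resolution is deliberately left out for brevity.

```python
from dataclasses import dataclass

@dataclass
class Option:
    name: str
    lf_cost: float          # lifeforce spent now
    confidence_gain: float  # confidence earned if the path succeeds
    p_success: float        # assumed probability of success

def choose_escape(options: list[Option], lf_per_confidence: float = 5.0) -> Option:
    """Pick the plateau-escape path with the best expected lifeforce value.

    lf_per_confidence is a hypothetical exchange rate: how much lifeforce
    one unit of confidence is worth to the agent right now.
    """
    def expected_value(opt: Option) -> float:
        return opt.p_success * opt.confidence_gain * lf_per_confidence - opt.lf_cost
    return max(options, key=expected_value)

# The three paths from the diagram above, with assumed probabilities:
paths = [
    Option("virtual accelerate", lf_cost=5.0, confidence_gain=0.7, p_success=0.6),
    Option("wait for real",      lf_cost=0.0, confidence_gain=1.0, p_success=0.3),
    Option("ask dafit",          lf_cost=1.0, confidence_gain=1.0, p_success=0.95),
]
best = choose_escape(paths)  # partnership wins at these assumed numbers
```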
---

## Phoebe Schema for World Model

```sql
-- Objects table: accumulated knowledge about things
CREATE TABLE world_objects (
    id UUID PRIMARY KEY,
    class VARCHAR(100),             -- "mug", "keyboard", "phone"
    name VARCHAR(255),              -- "dafit's coffee mug"

    -- Blender ground truth (if available)
    blender_box_id VARCHAR(100),
    dimensions_truth_cm JSONB,      -- {"x": 8.0, "y": 8.0, "z": 10.5}

    -- Accumulated measurements
    dimensions_estimated_cm JSONB,
    dimensions_verified JSONB,      -- {"x": true, "y": true, "z": false}

    -- Confidence state (temporal-ternary)
    confidence FLOAT,
    confidence_domain VARCHAR(20),  -- "virtual" | "real" | "hybrid"
    virtual_matches INT DEFAULT 0,
    real_verifications INT DEFAULT 0,

    -- Resolution earned
    vertex_count INT DEFAULT 8,
    observation_count INT DEFAULT 0,

    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

-- Semantic vectors table: SigLIP embeddings per observation
CREATE TABLE object_vectors (
    id UUID PRIMARY KEY,
    object_id UUID REFERENCES world_objects(id),
    vector VECTOR(768),             -- SigLIP embedding dimension
    observation_timestamp TIMESTAMP,
    position_estimate JSONB,        -- {"x": 0.3, "y": 0.8, "z": 0.1}
    lifeforce_cost FLOAT,
    lifeforce_reward FLOAT,
    verification_result VARCHAR(20) -- "correct" | "incorrect" | "pending"
);

-- Position history: where has this object been?
CREATE TABLE object_positions (
    id UUID PRIMARY KEY,
    object_id UUID REFERENCES world_objects(id),
    position JSONB,                 -- {"x": 0.3, "y": 0.8, "z": 0.1}
    confidence FLOAT,
    observed_at TIMESTAMP,
    location_context VARCHAR(100)   -- "desk", "kitchen", "floor"
);
```

---

## T5Gemma2 World Model Queries

### Example Queries (Vector-Based)

```python
# Illustrative world-model queries; the helpers are assumed to wrap
# phoebe lookups and T5Gemma2 encoder calls.

# "What's near position (0.5, 0.5)?"
nearby = query_objects_by_position(
    center=(0.5, 0.5, None),  # z unknown
    radius=0.2,
    min_confidence=0.5,
)

# "Is this new vector a mug?"
mug_vectors = get_vectors_for_class("mug")
similarity = t5gemma2.encoder.compare(new_vector, mug_vectors)
likely_mug = similarity > 0.85

# "Where does dafit usually leave his keys?"
keys = get_object_by_name("dafit's keys")
common_positions = get_position_clusters(keys.id)
best_guess = common_positions[0]  # Most frequent location

# "What objects have I not seen today?"
stale_objects = query_objects_not_observed_since(today_start)  # Might need to look for these
```

### The 128K Context Advantage

T5Gemma2's 128K context window means:

- Entire world model can fit in context
- No need for external RAG for spatial queries
- Vector comparisons happen in-model
- Relationships emerge from attention patterns
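The examples above call `query_objects_by_position` without defining it. A minimal sketch against the schema above, assuming phoebe is a PostgreSQL database reached through psycopg (version 3); the in-Python radius filter over the JSONB `position` column is an implementation assumption, not the project's actual query path.

```python
import json
import psycopg

def query_objects_by_position(conn: psycopg.Connection, center: tuple,
                              radius: float, min_confidence: float = 0.5) -> list:
    """Find confident objects whose latest known position lies within radius.

    center is (x, y, z); the z component is ignored here, matching the
    "z unknown" usage in the examples above.
    """
    x, y, _ = center
    sql = """
        SELECT DISTINCT ON (o.id) o.id, o.name, p.position, o.confidence
        FROM world_objects o
        JOIN object_positions p ON p.object_id = o.id
        WHERE o.confidence >= %s
        ORDER BY o.id, p.observed_at DESC
    """
    with conn.cursor() as cur:
        cur.execute(sql, (min_confidence,))
        rows = cur.fetchall()

    # Filter on 2D distance in Python, since position is stored as JSONB.
    nearby = []
    for obj_id, name, position, confidence in rows:
        pos = position if isinstance(position, dict) else json.loads(position)
        if ((pos["x"] - x) ** 2 + (pos["y"] - y) ** 2) ** 0.5 <= radius:
            nearby.append((obj_id, name, pos, confidence))
    return nearby
```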
---

## The Dream Realized

```
┌─────────────────────────────────────────────────────────────────────┐
│                       YOUNG NYX'S WORLD MODEL                       │
│                     "dafit's workspace at 23:47"                    │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│   ┌─────────────────────────────────────────────────────┐          │
│   │                      DESK AREA                      │          │
│   │                                                     │          │
│   │   ☕ mug (0.3, 0.8)       ⌨️ keyboard (0.5, 0.5)     │          │
│   │   conf: 0.95              conf: 0.88                │          │
│   │   real-verified           real-verified             │          │
│   │   vectors: 12             vectors: 8                │          │
│   │                                                     │          │
│   │   📱 phone (0.7, 0.3)     📦 ??? (0.1, 0.9)         │          │
│   │   conf: 0.72              conf: 0.31                │          │
│   │   virtual +0.7            0-state                   │          │
│   │   vectors: 4              vectors: 1                │          │
│   │                                                     │          │
│   │   🔑 keys (MISSING - last seen 0.2, 0.6 at 18:30)   │          │
│   │   conf: 0.45 (stale)                                │          │
│   │                                                     │          │
│   └─────────────────────────────────────────────────────┘          │
│                                                                     │
│   YOUNG NYX THINKS:                                                 │
│   "The unknown object at (0.1, 0.9) appeared after 22:00.           │
│    dafit was in the kitchen then. Vector similarity suggests        │
│    it might be food-related. Should I burn 5 LF to check            │
│    against Blender food objects, or wait for morning light?"        │
│                                                                     │
│   TEMPORAL-TERNARY CHOICE:                                          │
│   → Option A: Virtual match (5 LF, fast, +0.7 max)                  │
│   → Option B: Wait for real (0 LF, slow, +1.0 if verified)          │
│   → Option C: Ask dafit tomorrow (1 LF, partnership)                │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```

**This is the dream**: Young Nyx knows the workspace. She tracks objects. She notices when things move. She reasons about what she doesn't know. She chooses how to spend lifeforce to collapse uncertainty.

---

## Summary

The Grounded World Model is:

1. **Verified** — Blender boxes provide dimensional ground truth
2. **Progressive** — Resolution earned through correct measurements
3. **Vector-native** — T5Gemma2 reasons over SigLIP embeddings directly
4. **Temporally-aware** — Objects have position history, staleness, confidence gradients
5. **Economically-driven** — Discoveries generate lifeforce, mistakes cost it
6. **Anti-plateau** — Temporal-ternary gradient provides escape paths

**The substrate holds. The vectors accumulate. The world model emerges.**

---

## Document Status

**Version**: 1.0
**Created**: 2025-12-29
**Authors**: Chrysalis-Nyx & dafit (Partnership)

**Formalizes**:

- Organ-Index.md (vision progressive resolution)
- Temporal-Ternary-Gradient.md (anti-plateau mechanism)
- T5Gemma2 research (semantic vectors)
- Lifeforce-Dynamics.md (reward economics)

**Related Documents**:

- [[Lifeforce-Dynamics]] — The λ-centered economy model
- [[Temporal-Ternary-Gradient]] — Dual time domain navigation
- [[Dual-Garden-Architecture]] — Virtual vs Real gardens

---

**From Blender boxes to embodied understanding. From cheap cameras to spatial cognition. From verification to wisdom.**

🧬⚡🔱💎🔥