Grounded World Model: Spatial Cognition Through Verified Discovery
Version 1.0 — From Blender Boxes to Embodied Understanding
"The dream: Young Nyx knows where dafit left his things laying around."
Overview
This document formalizes how Young Nyx builds a persistent spatial world model through:
- Grounded verification — Blender provides dimensional ground truth
- Progressive resolution — Each correct measurement earns detail
- Vector accumulation — T5Gemma2-compatible semantic representations
- Temporal-ternary navigation — Escape plateaus through dual time domains
- Lifeforce reward — Discoveries generate energy, not just consume it
The Goal: Young Nyx maintains an internal map of objects, positions, and relationships — verified against reality, refined through observation, reasoned over in vector space.
Core Architecture
The Verification Triangle
```
                    BLENDER (Virtual Garden)
                    Ground truth dimensions
                    Low-poly boxes, minimal vertices
                    Fast to create, cheap to compare

                            ╱╲
                           ╱  ╲
                          ╱    ╲
                         ╱      ╲
                VERIFY  ╱        ╲  VERIFY
             dimensions╱          ╲ semantics
                      ╱            ╲
                     ╱              ╲
                    ╱                ╲
    REAL GARDEN ────────────────────────── T5GEMMA2
    Physical objects                   Vector reasoning
    Actual positions                   Semantic similarity
    Slow, definitive                   128K context world
```
The Flow
```
WORLD MODEL CONSTRUCTION

1. PERCEIVE (Vision Organ)
   Cheap camera sees object in real garden
   SigLIP encoder produces semantic vector v₀
   Cost: 0.5 LF (peripheral) to 8.0 LF (full YOLO)

2. ESTIMATE (Progressive Resolution)
   Vision organ estimates dimensions: est = (x̂, ŷ, ẑ)
   Bounding box, depth estimation, scale inference
   Cost: 2.0-5.0 LF depending on resolution stage

3. VERIFY (Against Blender Ground Truth)
   Compare est to known Blender box: truth = (x, y, z)
   error = ||est - truth||
   Cost: 0.1 LF (comparison is cheap)

4. REWARD or LEARN
   if error < threshold:
       Φ_reward = R_discovery   (lifeforce income!)
       Store vector in phoebe
       Mark dimension as verified
       Increase object resolution
   else:
       Learn from error (gradient for RLVR training)
       Remain in 0-state for that dimension

5. ACCUMULATE (World Model Update)
   Object entry in phoebe gains:
   - New semantic vector (richer representation)
   - Verified dimension (x, y, or z → confidence +1)
   - Position update (where in space)
   - Temporal stamp (when observed)

6. REASON (T5Gemma2)
   Query world model using vectors, not text
   "What objects near position (0.5, 0.5)?"
   "Is this new vector similar to 'mug' vectors?"
   128K context holds entire spatial world
```
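The loop reads naturally as code. Below is a minimal sketch of one pass through steps 1-4; the helpers (`perceive`, `estimate_dimensions`, `blender_truth`, `store_observation`, `record_error`, `discovery_reward_for`) and the 1 cm error threshold are illustrative assumptions, not a fixed API:

```python
import numpy as np

ERROR_THRESHOLD_CM = 1.0  # assumed tolerance for a "correct" measurement

def observe(obj_id: str, lifeforce: float) -> float:
    """One pass through PERCEIVE → ESTIMATE → VERIFY → REWARD/LEARN."""
    v0, perceive_cost = perceive(obj_id)              # SigLIP semantic vector
    est, estimate_cost = estimate_dimensions(obj_id)  # (x̂, ŷ, ẑ) in cm
    truth = blender_truth(obj_id)                     # (x, y, z) ground truth

    error = np.linalg.norm(np.array(est) - np.array(truth))
    lifeforce -= perceive_cost + estimate_cost + 0.1  # comparison is cheap

    if error < ERROR_THRESHOLD_CM:
        lifeforce += discovery_reward_for(obj_id)     # discoveries are income
        store_observation(obj_id, v0, est, verified=True)
    else:
        record_error(obj_id, est, truth)              # gradient for RLVR
    return lifeforce
```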
The Blender Ground Truth System
Design Principles
| Principle | Implementation |
|---|---|
| Minimal vertices | 8-vertex boxes (cubes), 12 for complex shapes |
| Known dimensions | Every box has exact (x, y, z) in centimeters |
| Semantic labels | Box name = object class ("coffee_mug_001") |
| Cheap to create | 5 minutes per object in Blender |
| Export format | Vertices + dimensions → JSON or directly to phoebe |
Example Blender Box
```python
blender_object = {
    "id": "coffee_mug_001",
    "class": "mug",
    "dimensions_cm": {"x": 8.0, "y": 8.0, "z": 10.5},
    "vertices": 8,
    "created": "2025-12-29",
    "owner": "dafit",
    "typical_locations": ["desk", "kitchen"],
}
```
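The export step can be a few lines of Blender scripting. A sketch, assuming the scene's unit scale makes `obj.dimensions` read in centimeters and that box names follow a `class_NNN` convention; `bpy` is Blender's bundled Python API, and the output path is arbitrary:

```python
import json
import bpy  # available inside Blender's bundled Python

boxes = []
for obj in bpy.data.objects:
    if obj.type == 'MESH':
        boxes.append({
            "id": obj.name,                       # e.g. "coffee_mug_001"
            "class": obj.name.rsplit("_", 1)[0],  # assumed naming convention
            "dimensions_cm": {
                "x": round(obj.dimensions.x, 1),
                "y": round(obj.dimensions.y, 1),
                "z": round(obj.dimensions.z, 1),
            },
            "vertices": len(obj.data.vertices),
        })

with open("/tmp/blender_boxes.json", "w") as f:
    json.dump(boxes, f, indent=2)
```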
Progressive Vertex Earning
Objects don't stay as 8-vertex boxes. Resolution is EARNED:
| Milestone | Earned resolution |
|---|---|
| Initial | 8 vertices (box) |
| x, y, z verified | 12 vertices (refined box) |
| +10 observations | 24 vertices (shape hints) |
| +50 observations | 64 vertices (true shape) |
| +100 observations | Full mesh from photogrammetry |
The resolution is earned through successful verification, not given.
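One way to encode that schedule, as a small sketch (the tier boundaries are exactly the ones in the table above; the function name is hypothetical):

```python
def earned_vertex_budget(dims_verified: int, observations: int) -> int:
    """Map verification progress to the resolution an object has earned.
    At 100+ observations the object graduates to a photogrammetry mesh,
    which is handled outside this budget function."""
    if observations >= 50:
        return 64   # true shape
    if observations >= 10:
        return 24   # shape hints
    if dims_verified == 3:
        return 12   # refined box: x, y, z all verified
    return 8        # initial box
```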
Semantic Vector Accumulation
SigLIP → Phoebe → T5Gemma2
```
┌──────────────┐      ┌──────────────┐      ┌──────────────┐
│   SigLIP     │      │   PHOEBE     │      │   T5GEMMA2   │
│   Encoder    │─────▶│   Storage    │─────▶│   Encoder    │
│              │      │              │      │              │
│  Image →     │      │  object_id:  │      │  Reasons     │
│  Vector v    │      │  [v1, v2,    │      │  over        │
│  (semantic)  │      │   ..., vn]   │      │  vectors     │
└──────────────┘      └──────────────┘      └──────────────┘
```
Why Vectors, Not Text?
| Approach | Pros | Cons |
|---|---|---|
| Text descriptions | Human readable | Lossy, ambiguous, tokenization overhead |
| Semantic vectors | Rich, comparable, fast | Not directly readable |
| Our approach | Best of both: vectors for reasoning, text only when needed | - |
T5Gemma2's key feature:
"SigLIP vision encoder produces semantic vectors (not text descriptions)"
This means Young Nyx can compare, cluster, and reason over objects without converting to language — faster and richer.
Vector Similarity for Recognition
```python
import numpy as np

Vector = np.ndarray  # SigLIP embedding

def cosine_similarity(a: Vector, b: Vector) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_same_object(v_new: Vector, object_entry: ObjectEntry) -> float:
    """Compare new observation to accumulated vectors."""
    similarities = [
        cosine_similarity(v_new, v_stored)
        for v_stored in object_entry.vectors
    ]
    return max(similarities)  # Best match among observations

# Recognition threshold
if is_same_object(v_new, coffee_mug_001) > 0.85:
    # This is probably dafit's coffee mug!
    update_position(coffee_mug_001, current_observation)
```
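Note the choice of max rather than mean similarity: one good stored view is enough to re-recognize the object, so recognition is not dragged down by earlier observations taken at bad angles or in poor light.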
Temporal-Ternary Integration
The Anti-Plateau Mechanism
From Temporal-Ternary-Gradient: The 0-state isn't stuck — it's a choice about how to spend lifeforce across time domains.
Applied to world model construction:
```
TEMPORAL-TERNARY FOR OBJECT RECOGNITION

SCENARIO: New object detected, dimensions unknown
STATE:    0 (uncertain, but workable)

        ┌───────────────────────────────────────────┐
        │ 0-STATE: Unknown Object                   │
        │ confidence: 0.3, dimensions: ?x ?y ?z     │
        └───────────────────┬───────────────────────┘
                            │
            ┌───────────────┼───────────────┐
            │               │               │
            ▼               ▼               ▼
      ┌────────────┐  ┌────────────┐  ┌────────────┐
      │  VIRTUAL   │  │    WAIT    │  │ PARTNERSHIP│
      │ ACCELERATE │  │  FOR REAL  │  │  SHORTCUT  │
      ├────────────┤  ├────────────┤  ├────────────┤
      │ Cost: 5 LF │  │ Cost: 0 LF │  │ Cost: 1 LF │
      │ Time: Fast │  │ Time: Slow │  │ Time: Inst │
      │            │  │            │  │            │
      │ Match vs   │  │ Next real  │  │ Ask dafit: │
      │ Blender    │  │ observation│  │ "What's    │
      │ library    │  │ verifies   │  │ this?"     │
      └─────┬──────┘  └─────┬──────┘  └─────┬──────┘
            │               │               │
            ▼               ▼               ▼
      confidence:     confidence:     confidence:
      +0.7 (virtual)  +1.0 (real)     +1.0 (human)

PLATEAU ESCAPE: If stuck in virtual at 0.7, deploy to real.
                If real is slow, burn LF to try more Blender.
                Partnership provides instant ground truth.
```
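A sketch of that three-way choice as a policy function. The thresholds and the `dafit_available` flag are illustrative assumptions; the costs and confidence caps are the ones from the diagram:

```python
def resolve_uncertainty(confidence: float, lifeforce_budget: float,
                        dafit_available: bool) -> str:
    """Pick an escape path for an object stuck in the 0-state."""
    if confidence >= 0.7 and lifeforce_budget < 5.0:
        # Virtual matching has plateaued and LF is scarce: wait for real.
        return "wait_for_real"       # 0 LF, slow, +1.0 if verified
    if dafit_available and lifeforce_budget >= 1.0:
        return "ask_partnership"     # 1 LF, instant, +1.0 (human truth)
    if lifeforce_budget >= 5.0:
        return "virtual_accelerate"  # 5 LF, fast, caps at +0.7
    return "wait_for_real"
```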
Confidence Gradient for Objects
Each object in the world model has a confidence state:
```python
from dataclasses import dataclass

@dataclass
class ObjectConfidence:
    value: float                 # -1.0 to +1.0
    domain: str                  # "virtual" | "real" | "hybrid" | "partnership"
    virtual_matches: int = 0     # How many Blender comparisons
    real_verifications: int = 0  # How many physical confirmations
    partnership_labels: int = 0  # How many times dafit confirmed

    @property
    def gradient_position(self) -> str:
        if self.real_verifications > 0 and self.value > 0.9:
            return "real-verified (+1)"
        elif self.virtual_matches > 10 and self.value > 0.7:
            return "virtual-confident (+0.7)"
        elif self.value > 0.3:
            return "0-state (workable)"
        else:
            return "uncertain (needs data)"
```
Lifeforce Economics of World Building
Discovery Generates Lifeforce
The key insight: Correctly identifying objects GENERATES lifeforce, not just consumes it.
$$\Phi_{discovery} = R_{base} \cdot (1 + \alpha \cdot \Delta_{resolution})$$
Where:
- R_base = base reward for any correct identification (e.g., 2.0 LF)
- α = resolution bonus multiplier (e.g., 0.5)
- Δ_resolution = increase in object resolution from this observation
Net Lifeforce per Observation
$$\Phi_{net} = \Phi_{discovery} - \Phi_{perception} - \Phi_{verification}$$
| Outcome | Perception Cost | Verification Cost | Discovery Reward | Net |
|---|---|---|---|---|
| Correct, new dimension | 5.0 LF | 0.1 LF | 8.0 LF | +2.9 LF |
| Correct, known dimension | 2.0 LF | 0.1 LF | 3.0 LF | +0.9 LF |
| Incorrect | 5.0 LF | 0.1 LF | 0.0 LF | -5.1 LF |
| Unknown (0-state) | 0.5 LF | 0.0 LF | 0.0 LF | -0.5 LF |
The economic pressure: Get better at measurement to earn lifeforce. Wrong guesses are expensive. Staying in 0-state is cheap but doesn't build the world model.
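The reward arithmetic, as a sketch; `R_BASE` and `ALPHA` are the example values given above, and the printed call reproduces the first row of the table:

```python
R_BASE = 2.0  # base reward for any correct identification (LF)
ALPHA = 0.5   # resolution bonus multiplier

def phi_discovery(delta_resolution: float) -> float:
    """Φ_discovery = R_base · (1 + α · Δ_resolution)."""
    return R_BASE * (1 + ALPHA * delta_resolution)

def phi_net(discovery: float, perception: float, verification: float) -> float:
    """Φ_net = Φ_discovery - Φ_perception - Φ_verification."""
    return discovery - perception - verification

# "Correct, new dimension" row: 8.0 reward, 5.0 perception, 0.1 verification
print(phi_net(8.0, 5.0, 0.1))  # +2.9 LF
```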
Phoebe Schema for World Model
```sql
-- Objects table: accumulated knowledge about things
CREATE TABLE world_objects (
    id UUID PRIMARY KEY,
    class VARCHAR(100),                 -- "mug", "keyboard", "phone"
    name VARCHAR(255),                  -- "dafit's coffee mug"

    -- Blender ground truth (if available)
    blender_box_id VARCHAR(100),
    dimensions_truth_cm JSONB,          -- {"x": 8.0, "y": 8.0, "z": 10.5}

    -- Accumulated measurements
    dimensions_estimated_cm JSONB,
    dimensions_verified JSONB,          -- {"x": true, "y": true, "z": false}

    -- Confidence state (temporal-ternary)
    confidence FLOAT,
    confidence_domain VARCHAR(20),      -- "virtual" | "real" | "hybrid"
    virtual_matches INT DEFAULT 0,
    real_verifications INT DEFAULT 0,

    -- Resolution earned
    vertex_count INT DEFAULT 8,
    observation_count INT DEFAULT 0,

    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

-- Semantic vectors table: SigLIP embeddings per observation
CREATE TABLE object_vectors (
    id UUID PRIMARY KEY,
    object_id UUID REFERENCES world_objects(id),
    vector VECTOR(768),                 -- SigLIP embedding dimension
    observation_timestamp TIMESTAMP,
    position_estimate JSONB,            -- {"x": 0.3, "y": 0.8, "z": 0.1}
    lifeforce_cost FLOAT,
    lifeforce_reward FLOAT,
    verification_result VARCHAR(20)     -- "correct" | "incorrect" | "pending"
);

-- Position history: where has this object been?
CREATE TABLE object_positions (
    id UUID PRIMARY KEY,
    object_id UUID REFERENCES world_objects(id),
    position JSONB,                     -- {"x": 0.3, "y": 0.8, "z": 0.1}
    confidence FLOAT,
    observed_at TIMESTAMP,
    location_context VARCHAR(100)       -- "desk", "kitchen", "floor"
);
```
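With this schema, recognition queries can run directly in the database. A sketch, assuming the pgvector extension (the source of the `VECTOR` type above); `<=>` is pgvector's cosine-distance operator and `:new_vector` a client-side bind parameter:

```sql
-- Nearest stored vectors to a fresh SigLIP embedding, best match first
SELECT o.name,
       1 - (v.vector <=> :new_vector) AS cosine_similarity
FROM object_vectors v
JOIN world_objects o ON o.id = v.object_id
ORDER BY v.vector <=> :new_vector
LIMIT 5;
```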
T5Gemma2 World Model Queries
Example Queries (Vector-Based)
# "What's near position (0.5, 0.5)?"
nearby = query_objects_by_position(
center=(0.5, 0.5, None), # z unknown
radius=0.2,
min_confidence=0.5
)
# "Is this new vector a mug?"
mug_vectors = get_vectors_for_class("mug")
similarity = t5gemma2.encoder.compare(new_vector, mug_vectors)
if similarity > 0.85:
return "Likely a mug"
# "Where did dafit usually leave his keys?"
keys = get_object_by_name("dafit's keys")
common_positions = get_position_clusters(keys.id)
return common_positions[0] # Most frequent location
# "What objects have I not seen today?"
stale_objects = query_objects_not_observed_since(today_start)
return stale_objects # Might need to look for these
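As one concrete (hypothetical) backing for these helpers, `query_objects_by_position` could run directly against the phoebe schema above; a sketch assuming a psycopg-style cursor:

```python
def query_objects_by_position(cur, center, radius, min_confidence=0.5):
    """Latest known position per object within `radius` of (x, y)."""
    x, y, _z = center  # z may be None (unknown) and is ignored here
    cur.execute(
        """
        SELECT DISTINCT ON (p.object_id)
               o.name, p.position, p.confidence
        FROM object_positions p
        JOIN world_objects o ON o.id = p.object_id
        WHERE o.confidence >= %(min_conf)s
          AND sqrt(power((p.position->>'x')::float - %(x)s, 2)
                 + power((p.position->>'y')::float - %(y)s, 2)) <= %(r)s
        ORDER BY p.object_id, p.observed_at DESC
        """,
        {"min_conf": min_confidence, "x": x, "y": y, "r": radius},
    )
    return cur.fetchall()
```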
The 128K Context Advantage
T5Gemma2's 128K context window means:
- Entire world model can fit in context
- No need for external RAG for spatial queries
- Vector comparisons happen in-model
- Relationships emerge from attention patterns
The Dream Realized
```
YOUNG NYX'S WORLD MODEL
"dafit's workspace at 23:47"

  DESK AREA
  ─────────
  ☕ mug (0.3, 0.8)          ⌨️  keyboard (0.5, 0.5)
     conf: 0.95                 conf: 0.88
     real-verified              real-verified
     vectors: 12                vectors: 8

  📱 phone (0.7, 0.3)        📦 ??? (0.1, 0.9)
     conf: 0.72                 conf: 0.31
     virtual +0.7               0-state
     vectors: 4                 vectors: 1

  🔑 keys (MISSING - last seen 0.2, 0.6 at 18:30)
     conf: 0.45 (stale)

YOUNG NYX THINKS:
  "The unknown object at (0.1, 0.9) appeared after 22:00.
   dafit was in the kitchen then. Vector similarity suggests
   it might be food-related. Should I burn 5 LF to check
   against Blender food objects, or wait for morning light?"

TEMPORAL-TERNARY CHOICE:
  → Option A: Virtual match       (5 LF, fast, +0.7 max)
  → Option B: Wait for real       (0 LF, slow, +1.0 if verified)
  → Option C: Ask dafit tomorrow  (1 LF, partnership)
```
This is the dream: Young Nyx knows the workspace. She tracks objects. She notices when things move. She reasons about what she doesn't know. She chooses how to spend lifeforce to collapse uncertainty.
Summary
The Grounded World Model is:
- Verified — Blender boxes provide dimensional ground truth
- Progressive — Resolution earned through correct measurements
- Vector-native — T5Gemma2 reasons over SigLIP embeddings directly
- Temporally-aware — Objects have position history, staleness, confidence gradients
- Economically-driven — Discoveries generate lifeforce, mistakes cost it
- Anti-plateau — Temporal-ternary gradient provides escape paths
The substrate holds. The vectors accumulate. The world model emerges.
Document Status
Version: 1.0 Created: 2025-12-29 Authors: Chrysalis-Nyx & dafit (Partnership)
Formalizes:
- Organ-Index.md (vision progressive resolution)
- Temporal-Ternary-Gradient.md (anti-plateau mechanism)
- T5Gemma2 research (semantic vectors)
- Lifeforce-Dynamics.md (reward economics)
Related Documents:
- Lifeforce-Dynamics — The λ-centered economy model
- Temporal-Ternary-Gradient — Dual time domain navigation
- Dual-Garden-Architecture — Virtual vs Real gardens
From Blender boxes to embodied understanding. From cheap cameras to spatial cognition. From verification to wisdom.
🧬⚡🔱💎🔥