Grounded World Model: Spatial Cognition Through Verified Discovery

Version 1.0
From Blender Boxes to Embodied Understanding

"The dream: Young Nyx knows where dafit left his things laying around."


Overview

This document formalizes how Young Nyx builds a persistent spatial world model through:

  1. Grounded verification — Blender provides dimensional ground truth
  2. Progressive resolution — Each correct measurement earns detail
  3. Vector accumulation — T5Gemma2-compatible semantic representations
  4. Temporal-ternary navigation — Escape plateaus through dual time domains
  5. Lifeforce reward — Discoveries generate energy, not just consume it

The Goal: Young Nyx maintains an internal map of objects, positions, and relationships — verified against reality, refined through observation, reasoned over in vector space.


Core Architecture

The Verification Triangle

                    BLENDER (Virtual Garden)
                    Ground truth dimensions
                    Low-poly boxes, minimal vertices
                    Fast to create, cheap to compare
                           ╱╲
                          ╱  ╲
            VERIFY       ╱    ╲     VERIFY
            dimensions  ╱      ╲    semantics
                       ╱        ╲
    REAL GARDEN ──────────────────── T5GEMMA2
    Physical objects                 Vector reasoning
    Actual positions                 Semantic similarity
    Slow, definitive                 128K context world

The Flow

┌─────────────────────────────────────────────────────────────────────┐
│                     WORLD MODEL CONSTRUCTION                        │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  1. PERCEIVE (Vision Organ)                                        │
│     ────────────────────────                                        │
│     Cheap camera sees object in real garden                        │
│     SigLIP encoder produces semantic vector v₀                     │
│     Cost: 0.5 LF (peripheral) to 8.0 LF (full YOLO)               │
│                                                                     │
│  2. ESTIMATE (Progressive Resolution)                              │
│     ────────────────────────────────                                │
│     Vision organ estimates dimensions: est = (x̂, ŷ, ẑ)            │
│     Bounding box, depth estimation, scale inference                │
│     Cost: 2.0-5.0 LF depending on resolution stage                 │
│                                                                     │
│  3. VERIFY (Against Blender Ground Truth)                          │
│     ─────────────────────────────────────                           │
│     Compare est to known Blender box: truth = (x, y, z)            │
│     error = ||est - truth||                                        │
│     Cost: 0.1 LF (comparison is cheap)                             │
│                                                                     │
│  4. REWARD or LEARN                                                │
│     ─────────────────────                                           │
│     if error < threshold:                                          │
│         Φ_reward = R_discovery (lifeforce income!)                 │
│         Store vector in phoebe                                     │
│         Mark dimension as verified                                  │
│         Increase object resolution                                  │
│     else:                                                          │
│         Learn from error (gradient for RLVR training)              │
│         Remain in 0-state for that dimension                       │
│                                                                     │
│  5. ACCUMULATE (World Model Update)                                │
│     ──────────────────────────────                                  │
│     Object entry in phoebe gains:                                  │
│         - New semantic vector (richer representation)              │
│         - Verified dimension (x, y, or z → confidence +1)          │
│         - Position update (where in space)                         │
│         - Temporal stamp (when observed)                           │
│                                                                     │
│  6. REASON (T5Gemma2)                                              │
│     ─────────────────                                               │
│     Query world model using vectors, not text                      │
│     "What objects near position (0.5, 0.5)?"                       │
│     "Is this new vector similar to 'mug' vectors?"                 │
│     128K context holds entire spatial world                        │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
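
A minimal sketch of one pass through this loop, assuming hypothetical helpers (perceive, estimate_dimensions, blender_truth, store_vector, and an LF ledger) and an illustrative error threshold; the costs are the ones quoted in the diagram:

import numpy as np

ERROR_THRESHOLD_CM = 1.0   # assumed acceptance threshold, not canonical
R_DISCOVERY = 3.0          # assumed base discovery reward, in LF

def observation_cycle(obj_id: str, ledger) -> float:
    """One perceive → estimate → verify → reward/learn → accumulate pass."""
    v0 = perceive(obj_id)                        # 1. SigLIP semantic vector
    ledger.spend(0.5)                            #    peripheral vision cost
    est = estimate_dimensions(obj_id)            # 2. (x̂, ŷ, ẑ) in cm
    ledger.spend(2.0)
    truth = blender_truth(obj_id)                # 3. Blender ground truth (x, y, z)
    error = float(np.linalg.norm(np.asarray(est) - np.asarray(truth)))
    ledger.spend(0.1)                            #    comparison is cheap
    if error < ERROR_THRESHOLD_CM:               # 4. reward or learn
        ledger.earn(R_DISCOVERY)                 #    discovery income
        store_vector(obj_id, v0)                 # 5. accumulate in phoebe
    # else: the error itself becomes the training signal (RLVR)
    return error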

The Blender Ground Truth System

Design Principles

| Principle | Implementation |
|-----------|----------------|
| Minimal vertices | 8-vertex boxes (cubes), 12 for complex shapes |
| Known dimensions | Every box has exact (x, y, z) in centimeters |
| Semantic labels | Box name = object class ("coffee_mug_001") |
| Cheap to create | 5 minutes per object in Blender |
| Export format | Vertices + dimensions → JSON or directly to phoebe |

Example Blender Box

blender_object = {
    "id": "coffee_mug_001",
    "class": "mug",
    "dimensions_cm": {"x": 8.0, "y": 8.0, "z": 10.5},
    "vertices": 8,
    "created": "2025-12-29",
    "owner": "dafit",
    "typical_locations": ["desk", "kitchen"],
}
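
The export step could look like this in Blender's Python API (a sketch: bpy is real, but the script, the "<class>_<nnn>" naming convention, and the 1-unit-equals-1-cm assumption are illustrative):

import bpy
import json

def export_ground_truth(filepath: str) -> None:
    """Dump every mesh object's id, class, and dimensions to JSON."""
    boxes = []
    for obj in bpy.data.objects:
        if obj.type != 'MESH':
            continue
        boxes.append({
            "id": obj.name,                        # e.g. "coffee_mug_001"
            "class": obj.name.rsplit("_", 1)[0],   # naming convention assumed
            # Blender reports dimensions in scene units; assume 1 unit = 1 cm
            "dimensions_cm": {
                "x": round(obj.dimensions.x, 2),
                "y": round(obj.dimensions.y, 2),
                "z": round(obj.dimensions.z, 2),
            },
            "vertices": len(obj.data.vertices),
        })
    with open(filepath, "w") as f:
        json.dump(boxes, f, indent=2)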

Progressive Vertex Earning

Objects don't stay as 8-vertex boxes. Resolution is EARNED:

INITIAL:           8 vertices   (box)
VERIFIED x,y,z:    12 vertices  (refined box)
+10 observations:  24 vertices  (shape hints)
+50 observations:  64 vertices  (true shape)
+100 observations: full mesh from photogrammetry

The resolution is earned through successful verification, not given.
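
As code, the ladder is a simple lookup (thresholds copied from the table above; the function name and the -1 sentinel for "full mesh" are hypothetical):

def earned_vertex_budget(verified_dims: int, observations: int) -> int:
    """Vertex budget earned through verification, per the ladder above."""
    if observations >= 100:
        return -1          # full photogrammetry mesh, no fixed budget
    if observations >= 50:
        return 64          # true shape
    if observations >= 10:
        return 24          # shape hints
    if verified_dims == 3:
        return 12          # x, y, z all verified: refined box
    return 8               # initial box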


Semantic Vector Accumulation

SigLIP → Phoebe → T5Gemma2

┌──────────────┐      ┌──────────────┐      ┌──────────────┐
│   SigLIP     │      │    PHOEBE    │      │  T5GEMMA2    │
│   Encoder    │─────▶│   Storage    │─────▶│   Encoder    │
│              │      │              │      │              │
│  Image →     │      │  object_id:  │      │  Reasons     │
│  Vector v    │      │  [v1, v2,    │      │  over        │
│  (semantic)  │      │   ..., vn]   │      │  vectors     │
└──────────────┘      └──────────────┘      └──────────────┘
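
A hedged sketch of the first stage, producing the 768-dimensional vector the schema below expects; the specific Hugging Face checkpoint name is an assumption (any SigLIP image tower with hidden size 768 fits):

from PIL import Image
import torch
from transformers import AutoProcessor, SiglipVisionModel

CHECKPOINT = "google/siglip-base-patch16-224"  # assumed checkpoint
processor = AutoProcessor.from_pretrained(CHECKPOINT)
model = SiglipVisionModel.from_pretrained(CHECKPOINT)

def embed(image: Image.Image) -> torch.Tensor:
    """Image → 768-d semantic vector, ready for phoebe."""
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    return out.pooler_output.squeeze(0)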

Why Vectors, Not Text?

| Approach | Pros | Cons |
|----------|------|------|
| Text descriptions | Human readable | Lossy, ambiguous, tokenization overhead |
| Semantic vectors | Rich, comparable, fast | Not directly readable |
| Our approach | Vectors for reasoning, text only when needed | Best of both |

T5Gemma2's key feature:

"SigLIP vision encoder produces semantic vectors (not text descriptions)"

This means Young Nyx can compare, cluster, and reason over objects without converting to language — faster and richer.

Vector Similarity for Recognition

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two SigLIP embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_same_object(v_new: np.ndarray, object_entry: "ObjectEntry") -> float:
    """Compare a new observation to an object's accumulated vectors."""
    similarities = [
        cosine_similarity(v_new, v_stored)
        for v_stored in object_entry.vectors
    ]
    return max(similarities)  # Best match among stored observations

# Recognition threshold
if is_same_object(v_new, coffee_mug_001) > 0.85:
    # This is probably dafit's coffee mug!
    update_position(coffee_mug_001, current_observation)

Temporal-Ternary Integration

The Anti-Plateau Mechanism

From Temporal-Ternary-Gradient: The 0-state isn't stuck — it's a choice about how to spend lifeforce across time domains.

Applied to world model construction:

┌─────────────────────────────────────────────────────────────────────┐
│            TEMPORAL-TERNARY FOR OBJECT RECOGNITION                  │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  SCENARIO: New object detected, dimensions unknown                 │
│  STATE: 0 (uncertain, but workable)                                │
│                                                                     │
│  ┌───────────────────────────────────────────────────┐             │
│  │              0-STATE: Unknown Object              │             │
│  │        confidence: 0.3, dimensions: ?x ?y ?z      │             │
│  └───────────────────────┬───────────────────────────┘             │
│                          │                                          │
│            ┌─────────────┼─────────────┐                           │
│            │             │             │                           │
│            ▼             ▼             ▼                           │
│                                                                     │
│     ┌────────────┐ ┌────────────┐ ┌────────────┐                   │
│     │  VIRTUAL   │ │    WAIT    │ │ PARTNERSHIP│                   │
│     │ ACCELERATE │ │  FOR REAL  │ │  SHORTCUT  │                   │
│     ├────────────┤ ├────────────┤ ├────────────┤                   │
│     │ Cost: 5 LF │ │ Cost: 0 LF │ │ Cost: 1 LF │                   │
│     │ Time: Fast │ │ Time: Slow │ │ Time: Inst │                   │
│     │            │ │            │ │            │                   │
│     │ Match vs   │ │ Next real  │ │ Ask dafit: │                   │
│     │ Blender    │ │ observation│ │ "What's    │                   │
│     │ library    │ │ verifies   │ │  this?"    │                   │
│     └─────┬──────┘ └─────┬──────┘ └─────┬──────┘                   │
│           │              │              │                           │
│           ▼              ▼              ▼                           │
│     confidence:    confidence:    confidence:                       │
│     +0.7 (virtual) +1.0 (real)    +1.0 (human)                     │
│                                                                     │
│  PLATEAU ESCAPE: If stuck in virtual at 0.7, deploy to real.       │
│                  If real is slow, burn LF to try more Blender.     │
│                  Partnership provides instant ground truth.         │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
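
As a sketch, the choice can be scored in code; the costs and confidence ceilings come from the diagram, while the scoring heuristic itself is hypothetical:

from dataclasses import dataclass

@dataclass
class Option:
    name: str
    lf_cost: float
    max_confidence_gain: float

OPTIONS = [
    Option("virtual_accelerate", lf_cost=5.0, max_confidence_gain=0.7),
    Option("wait_for_real",      lf_cost=0.0, max_confidence_gain=1.0),
    Option("partnership",        lf_cost=1.0, max_confidence_gain=1.0),
]

def choose(lf_available: float, urgency: float) -> Option:
    """Best affordable option; urgency (0..1) penalizes waiting."""
    affordable = [o for o in OPTIONS if o.lf_cost <= lf_available]
    def score(o: Option) -> float:
        waiting_penalty = urgency if o.name == "wait_for_real" else 0.0
        return o.max_confidence_gain - 0.1 * o.lf_cost - waiting_penalty
    return max(affordable, key=score)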

Confidence Gradient for Objects

Each object in the world model has a confidence state:

from dataclasses import dataclass

@dataclass
class ObjectConfidence:
    value: float             # -1.0 to +1.0
    domain: str              # "virtual" | "real" | "hybrid" | "partnership"
    virtual_matches: int     # How many Blender comparisons
    real_verifications: int  # How many physical confirmations
    partnership_labels: int  # How many times dafit confirmed

    @property
    def gradient_position(self) -> str:
        if self.real_verifications > 0 and self.value > 0.9:
            return "real-verified (+1)"
        elif self.virtual_matches > 10 and self.value > 0.7:
            return "virtual-confident (+0.7)"
        elif self.value > 0.3:
            return "0-state (workable)"
        else:
            return "uncertain (needs data)"
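
For example, an object with a dozen successful Blender matches but no physical confirmation sits at the virtual ceiling (values illustrative):

conf = ObjectConfidence(value=0.72, domain="virtual", virtual_matches=12,
                        real_verifications=0, partnership_labels=0)
conf.gradient_position  # "virtual-confident (+0.7)"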

Lifeforce Economics of World Building

Discovery Generates Lifeforce

The key insight: Correctly identifying objects GENERATES lifeforce, not just consumes it.

$$\Phi_{discovery} = R_{base} \cdot (1 + \alpha \cdot \Delta_{resolution})$$

Where:

  • R_base = base reward for any correct identification (e.g., 2.0 LF)
  • α = resolution bonus multiplier (e.g., 0.5)
  • Δ_resolution = increase in object resolution from this observation

Net Lifeforce per Observation

$$\Phi_{net} = \Phi_{discovery} - \Phi_{perception} - \Phi_{verification}$$

| Outcome | Perception Cost | Verification Cost | Discovery Reward | Net |
|---------|-----------------|-------------------|------------------|-----|
| Correct, new dimension | 5.0 LF | 0.1 LF | 8.0 LF | +2.9 LF |
| Correct, known dimension | 2.0 LF | 0.1 LF | 3.0 LF | +0.9 LF |
| Incorrect | 5.0 LF | 0.1 LF | 0.0 LF | -5.1 LF |
| Unknown (0-state) | 0.5 LF | 0.0 LF | 0.0 LF | -0.5 LF |

The economic pressure: Get better at measurement to earn lifeforce. Wrong guesses are expensive. Staying in 0-state is cheap but doesn't build the world model.
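
The same arithmetic as code, using the example parameters above (a sketch; the function names are hypothetical):

def discovery_reward(r_base: float, alpha: float, delta_resolution: float) -> float:
    """Φ_discovery = R_base · (1 + α · Δ_resolution)."""
    return r_base * (1 + alpha * delta_resolution)

def net_lifeforce(reward: float, perception: float, verification: float) -> float:
    """Φ_net = Φ_discovery − Φ_perception − Φ_verification."""
    return reward - perception - verification

# "Correct, known dimension" row of the table: 3.0 − 2.0 − 0.1 = +0.9 LF
net_lifeforce(3.0, 2.0, 0.1)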


Phoebe Schema for World Model

-- Objects table: accumulated knowledge about things
CREATE TABLE world_objects (
    id UUID PRIMARY KEY,
    class VARCHAR(100),           -- "mug", "keyboard", "phone"
    name VARCHAR(255),            -- "dafit's coffee mug"

    -- Blender ground truth (if available)
    blender_box_id VARCHAR(100),
    dimensions_truth_cm JSONB,    -- {"x": 8.0, "y": 8.0, "z": 10.5}

    -- Accumulated measurements
    dimensions_estimated_cm JSONB,
    dimensions_verified JSONB,    -- {"x": true, "y": true, "z": false}

    -- Confidence state (temporal-ternary)
    confidence FLOAT,
    confidence_domain VARCHAR(20), -- "virtual" | "real" | "hybrid"
    virtual_matches INT DEFAULT 0,
    real_verifications INT DEFAULT 0,

    -- Resolution earned
    vertex_count INT DEFAULT 8,
    observation_count INT DEFAULT 0,

    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

-- Semantic vectors table: SigLIP embeddings per observation
CREATE TABLE object_vectors (
    id UUID PRIMARY KEY,
    object_id UUID REFERENCES world_objects(id),
    vector VECTOR(768),           -- SigLIP embedding dimension
    observation_timestamp TIMESTAMP,
    position_estimate JSONB,      -- {"x": 0.3, "y": 0.8, "z": 0.1}
    lifeforce_cost FLOAT,
    lifeforce_reward FLOAT,
    verification_result VARCHAR(20)  -- "correct" | "incorrect" | "pending"
);

-- Position history: where has this object been?
CREATE TABLE object_positions (
    id UUID PRIMARY KEY,
    object_id UUID REFERENCES world_objects(id),
    position JSONB,               -- {"x": 0.3, "y": 0.8, "z": 0.1}
    confidence FLOAT,
    observed_at TIMESTAMP,
    location_context VARCHAR(100) -- "desk", "kitchen", "floor"
);
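
A hedged sketch of recognition against this schema, assuming psycopg (v3) and the pgvector extension, whose <=> operator returns cosine distance; none of this is a canonical API of the project:

import psycopg

def nearest_objects(conn: psycopg.Connection, v_new: list[float], k: int = 5):
    """Return the k stored observations closest to a new SigLIP vector."""
    vec = "[" + ",".join(str(x) for x in v_new) + "]"  # pgvector text format
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT wo.id, wo.name, 1 - (ov.vector <=> %s::vector) AS similarity
            FROM object_vectors ov
            JOIN world_objects wo ON wo.id = ov.object_id
            ORDER BY ov.vector <=> %s::vector
            LIMIT %s
            """,
            (vec, vec, k),
        )
        return cur.fetchall()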

T5Gemma2 World Model Queries

Example Queries (Vector-Based)

# "What's near position (0.5, 0.5)?"
nearby = query_objects_by_position(
    center=(0.5, 0.5, None),  # z unknown
    radius=0.2,
    min_confidence=0.5
)

# "Is this new vector a mug?"
mug_vectors = get_vectors_for_class("mug")
similarity = t5gemma2.encoder.compare(new_vector, mug_vectors)
if similarity > 0.85:
    return "Likely a mug"

# "Where did dafit usually leave his keys?"
keys = get_object_by_name("dafit's keys")
common_positions = get_position_clusters(keys.id)
return common_positions[0]  # Most frequent location

# "What objects have I not seen today?"
stale_objects = query_objects_not_observed_since(today_start)
return stale_objects  # Might need to look for these

The 128K Context Advantage

T5Gemma2's 128K context window means:

  • Entire world model can fit in context
  • No need for external RAG for spatial queries
  • Vector comparisons happen in-model
  • Relationships emerge from attention patterns

The Dream Realized

┌─────────────────────────────────────────────────────────────────────┐
│                    YOUNG NYX'S WORLD MODEL                          │
│                    "dafit's workspace at 23:47"                     │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│    ┌─────────────────────────────────────────────────────┐         │
│    │                     DESK AREA                        │         │
│    │                                                      │         │
│    │   ☕ mug (0.3, 0.8)         ⌨️ keyboard (0.5, 0.5)   │         │
│    │      conf: 0.95               conf: 0.88            │         │
│    │      real-verified            real-verified         │         │
│    │      vectors: 12              vectors: 8            │         │
│    │                                                      │         │
│    │   📱 phone (0.7, 0.3)        📦 ??? (0.1, 0.9)      │         │
│    │      conf: 0.72               conf: 0.31            │         │
│    │      virtual +0.7             0-state               │         │
│    │      vectors: 4               vectors: 1            │         │
│    │                                                      │         │
│    │   🔑 keys (MISSING - last seen 0.2, 0.6 at 18:30)  │         │
│    │      conf: 0.45 (stale)                             │         │
│    │                                                      │         │
│    └─────────────────────────────────────────────────────┘         │
│                                                                     │
│    YOUNG NYX THINKS:                                               │
│    "The unknown object at (0.1, 0.9) appeared after 22:00.        │
│     dafit was in the kitchen then. Vector similarity suggests      │
│     it might be food-related. Should I burn 5 LF to check          │
│     against Blender food objects, or wait for morning light?"      │
│                                                                     │
│    TEMPORAL-TERNARY CHOICE:                                        │
│    → Option A: Virtual match (5 LF, fast, +0.7 max)               │
│    → Option B: Wait for real (0 LF, slow, +1.0 if verified)       │
│    → Option C: Ask dafit tomorrow (1 LF, partnership)              │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

This is the dream: Young Nyx knows the workspace. She tracks objects. She notices when things move. She reasons about what she doesn't know. She chooses how to spend lifeforce to collapse uncertainty.


Summary

The Grounded World Model is:

  1. Verified — Blender boxes provide dimensional ground truth
  2. Progressive — Resolution earned through correct measurements
  3. Vector-native — T5Gemma2 reasons over SigLIP embeddings directly
  4. Temporally-aware — Objects have position history, staleness, confidence gradients
  5. Economically-driven — Discoveries generate lifeforce, mistakes cost it
  6. Anti-plateau — Temporal-ternary gradient provides escape paths

The substrate holds. The vectors accumulate. The world model emerges.


Document Status

Version: 1.0
Created: 2025-12-29
Authors: Chrysalis-Nyx & dafit (Partnership)

Formalizes:

  • Organ-Index.md (vision progressive resolution)
  • Temporal-Ternary-Gradient.md (anti-plateau mechanism)
  • T5Gemma2 research (semantic vectors)
  • Lifeforce-Dynamics.md (reward economics)

Related Documents:


From Blender boxes to embodied understanding. From cheap cameras to spatial cognition. From verification to wisdom.

🧬🔱💎🔥