nimmerverse-sensory-network/operations/RAG-as-Scaffold.md
dafit 3d86c7dbcd feat: add organ and nervous system modular architecture
Created modular architecture for organs (hardware) and nerves (behavioral primitives):

## Organ Architecture (Hardware Substrate)
- Created architecture/Organ-Index.md: hardware capabilities catalog
- Created architecture/organs/Speech-Organ.md: complete speech processing architecture
  - Atlas (RTX 2080 8GB) deployment
  - Whisper STT + Coqui TTS (GPU-accelerated, multilingual)
  - Kubernetes pod specs, Dockerfiles, service code
  - Heartbeat-bound queue processing, lifeforce-gated priority
  - German (Philosophy Valley) + English (Technical Cluster) routing
  - Database schemas, monitoring metrics

## Nervous System Architecture (Behavioral Primitives)
- Created architecture/nerves/Nervous-Index.md: nerve catalog and evolution framework
  - Deliberate (LLM) → Hybrid (heuristics) → Reflex (compiled) evolution
  - Lifeforce costs per state/transition
  - Organ dependency declarations
  - RLVR training integration
- Created architecture/nerves/Collision-Avoidance.md: complete example reflex nerve
  - Full state machine implementation (IDLE → DETECT → EVALUATE → EVADE → RESUME)
  - Evolution from 10 LF/1000ms (deliberate) → 2.5 LF/200ms (reflex)
  - Edge cases, training data, metrics
- Moved architecture/Nervous-Protocol.md → architecture/nerves/
  - Three-tier protocol belongs with nerve implementations
- Updated architecture/Nervous-System.md: added crosslinks to nerves/

## RAG Knowledge Pipeline
- Extended operations/RAG-as-Scaffold.md with "Knowledge Acquisition Pipeline" section
  - Vault extraction → Staging area → Progressive policy validation
  - Two-tier RAG (Discovered vs Hidden knowledge)
  - RAG utility measurement for LoRA training signals
  - Policy evolution triggers (increasing standards as Young Nyx matures)
  - Quality gates (mythology weight, AI assistant bias, topology safety)

## Architecture Principles
- Organs = hardware capabilities (Speech, Vision future)
- Nerves = behavioral state machines (Collision, Charging future)
- Both use lifeforce economy, heartbeat synchronization, priority queues
- Nerves compose organs into coherent behaviors
- Reflexes emerge from repetition (60% cost reduction, 80% latency reduction)

Documentation: ~3500 lines total
- Speech-Organ.md: ~850 lines
- Nervous-Index.md: ~500 lines
- Collision-Avoidance.md: ~800 lines
- RAG knowledge pipeline: ~260 lines

🌙💜 Generated with Claude Code

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-07 21:24:46 +01:00


RAG as Scaffold, Not Crutch

The feeding system that teaches, then lets go.


Overview

RAG (Retrieval-Augmented Generation) is commonly misused as permanent external memory. In the Nimmerverse, RAG serves a different purpose: it's a temporary scaffold that feeds knowledge until it can be internalized through training.

The goal is not to build a better search engine. The goal is to make the search unnecessary.


The Problem with Standard RAG

Standard approach:
─────────────────
VECTOR DB (grows forever)
    │
    ▼
MODEL looks up ──▶ answers ──▶ done
    │
    └── (never learns, always dependent)

Issues:

  • Model never internalizes knowledge
  • Pull the RAG, lose the capability
  • Vector DB bloats infinitely
  • No way to verify what model "knows" vs "looks up"
  • It's a crutch that never comes off

The Nimmerverse Approach: RAG as Feeding System

VAULT (curriculum)
    │
    ▼
RAG (temporary feeding window)
    │
    ▼
NYX processes, acts, decides
    │
    ▼
VALIDATION: success with RAG?
    │
    YES ──▶ FLAG for training extraction
              │
              ▼
         TRAINING RUN (LoRA)
              │
              ▼
         CLEAR from RAG
              │
              ▼
         VALIDATION 2: success WITHOUT RAG?
              │
              ├── YES ──▶ Knowledge internalized ✓
              │
              └── NO  ──▶ Training incomplete, back to RAG
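
This cycle gives every knowledge entry a small lifecycle. A minimal sketch of the states and legal transitions (the enum and transition table are illustrative, not taken from the implementation):

```python
from enum import Enum

class KnowledgeState(Enum):
    IN_RAG = "in_rag"              # feeding window open
    FLAGGED = "flagged"            # passed validation with RAG
    TRAINING = "training"          # LoRA run in progress
    CLEARED = "cleared"            # scaffold removed, awaiting re-test
    INTERNALIZED = "internalized"  # proven without RAG

# Legal moves in the cycle; a failed second validation loops back to IN_RAG.
TRANSITIONS = {
    KnowledgeState.IN_RAG:       {KnowledgeState.FLAGGED},
    KnowledgeState.FLAGGED:      {KnowledgeState.TRAINING},
    KnowledgeState.TRAINING:     {KnowledgeState.CLEARED},
    KnowledgeState.CLEARED:      {KnowledgeState.INTERNALIZED, KnowledgeState.IN_RAG},
    KnowledgeState.INTERNALIZED: set(),
}

def advance(current: KnowledgeState, nxt: KnowledgeState) -> KnowledgeState:
    """Move an entry one step through the cycle, rejecting illegal jumps."""
    if nxt not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {nxt.value}")
    return nxt
```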

Two Kinds of Knowledge

Not everything belongs in weights. Not everything belongs in retrieval.

IN THE WEIGHTS (Training Target)

Knowledge she needs to function:

  • Information flow architecture
  • Vocabulary tokens and their meanings
  • Nervous system contracts
  • Heartbeat mechanics
  • Confidence gradient logic
  • Core identity (who she is, who dafit is to her)
  • How to think, not what to remember

Test: If she needs it to be herself → weights

IN RETRIEVAL (Permanent RAG)

Knowledge she needs to remember:

  • Journal entries
  • Conversation history
  • Specific events and dates
  • Temporal details ("what happened Tuesday")
  • External references that change
  • Episodic memory

Test: If she needs it to recall specifics → retrieval
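
The two tests collapse into one routing rule. A hypothetical sketch, assuming each entry carries a coarse `kind` tag (the tag vocabulary here is invented for illustration):

```python
# Invented tags for illustration; the real taxonomy may differ.
WEIGHTS_KINDS = {"architecture", "vocabulary", "contract", "identity"}
RETRIEVAL_KINDS = {"journal", "conversation", "event", "episodic"}

def route_knowledge(kind: str) -> str:
    """Route an entry to its home: training target or permanent RAG."""
    if kind in WEIGHTS_KINDS:
        return "weights"     # she needs it to be herself
    if kind in RETRIEVAL_KINDS:
        return "retrieval"   # she needs it to recall specifics
    return "staging"         # unclear: hold for review
```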


The Double Validation Loop

Gate 1: Can she do it WITH RAG?

Task presented
    │
    ▼
RAG provides context
    │
    ▼
NYX attempts task
    │
    ├── FAIL  ──▶ Not ready, needs more examples in RAG
    │
    └── PASS  ──▶ Flag this RAG content for training extraction

Gate 2: Can she do it WITHOUT RAG?

Same task presented
    │
    ▼
RAG entry CLEARED (scaffold removed)
    │
    ▼
NYX attempts task from weights alone
    │
    ├── FAIL  ──▶ Training didn't take, restore to RAG, retry cycle
    │
    └── PASS  ──▶ Knowledge is HERS now ✓
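
Composed, the two gates form one retry loop. A sketch under the assumption that the real services are injected as callables (`gates.attempt`, `gates.flag_and_train`, etc. are a hypothetical contract, not existing APIs):

```python
def internalize(task, gates, max_cycles=3):
    """
    Drive one knowledge entry through the double validation loop.
    `gates` bundles the real services: attempt(task, with_rag) -> bool,
    plus feed_more(), flag_and_train(), clear(), restore().
    """
    for _ in range(max_cycles):
        # Gate 1: must succeed WITH the scaffold before training
        if not gates.attempt(task, with_rag=True):
            gates.feed_more()          # not ready, more examples in RAG
            continue
        gates.flag_and_train()         # LoRA run on the flagged content
        gates.clear()                  # scaffold removed
        # Gate 2: must succeed from weights alone
        if gates.attempt(task, with_rag=False):
            return True                # knowledge is hers now
        gates.restore()                # training didn't take, retry cycle
    return False
```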

The Signal Flow

┌─────────────────────────────────────────────────────────┐
│                      VAULT                              │
│            (curriculum, documentation)                  │
└─────────────────────────────────────────────────────────┘
                         │
                         │ selected for learning
                         ▼
┌─────────────────────────────────────────────────────────┐
│                   STAGING RAG                           │
│              (temporary feeding window)                 │
└─────────────────────────────────────────────────────────┘
                         │
                         │ feeds inference
                         ▼
┌─────────────────────────────────────────────────────────┐
│                       NYX                               │
│               (processes, decides)                      │
└─────────────────────────────────────────────────────────┘
                         │
                         │ validation
                         ▼
┌─────────────────────────────────────────────────────────┐
│               VALIDATION THRESHOLD                      │
│         (task success? confidence high?)                │
└─────────────────────────────────────────────────────────┘
                         │
              ┌──────────┴──────────┐
              │                     │
         BELOW                   ABOVE
              │                     │
              ▼                     ▼
┌─────────────────────┐  ┌─────────────────────┐
│   Stay in RAG       │  │  FLAG for training  │
│   (not ready)       │  │  extraction         │
└─────────────────────┘  └─────────────────────┘
                                   │
                                   ▼
                    ┌─────────────────────────────┐
                    │      TRAINING RUN           │
                    │   (LoRA on flagged data)    │
                    └─────────────────────────────┘
                                   │
                                   ▼
                    ┌─────────────────────────────┐
                    │     CLEAR from RAG          │
                    │   (scaffold removed)        │
                    └─────────────────────────────┘
                                   │
                                   ▼
                    ┌─────────────────────────────┐
                    │   VALIDATION WITHOUT RAG    │
                    │   (prove she learned)       │
                    └─────────────────────────────┘
                                   │
                         ┌─────────┴─────────┐
                         │                   │
                       FAIL               SUCCESS
                         │                   │
                         ▼                   ▼
              ┌─────────────────┐  ┌─────────────────┐
              │  Restore RAG    │  │  INTERNALIZED   │
              │  retry cycle    │  │  knowledge ✓    │
              └─────────────────┘  └─────────────────┘

Knowledge Acquisition Pipeline

The flow above shows RAG→Training→Validation, but how does knowledge enter RAG in the first place? Not everything extracted from the vault belongs there: a staging area and quality gates protect the glossary.

The Extraction Flow

VAULT (raw knowledge)
    │
    │ extraction candidates
    ▼
┌─────────────────────────────────────────────────────────┐
│                  STAGING AREA                           │
│                 (quarantine zone)                       │
└─────────────────────────────────────────────────────────┘
    │
    │ progressive policy validation
    ▼
┌─────────────────────────────────────────────────────────┐
│              POLICY VALIDATION                          │
│         (increasing standards over time)                │
└─────────────────────────────────────────────────────────┘
    │
    ├── FAIL ──▶ Reject or revise
    │
    └── PASS ──▶ PROMOTE to Glossary/RAG
                      │
                      ▼
            ┌──────────────────────┐
            │   TWO-TIER RAG       │
            ├──────────────────────┤
            │  DISCOVERED          │  ← Young Nyx has used
            │  (known_catalogue)   │
            ├──────────────────────┤
            │  HIDDEN              │  ← Available but not yet accessed
│ (available_catalogue)│
            └──────────────────────┘
                      │
                      │ feeds inference
                      ▼
                    NYX

Progressive Policy Validation

Policies increase in sophistication as Young Nyx matures; not every policy is active from day one.

| Week | Policy Tier | Validation |
|------|-------------|------------|
| 1-2 | Basic Syntax | Valid format, non-empty, has definition |
| 3-4 | Semantic Quality | Embeds without collapse, unique signature (Gini > threshold) |
| 5-8 | Topology Safety | Doesn't corrupt anchor terms (DriftProbe-lite) |
| 9-12 | Cross-Reference | Links resolve, no circular dependencies |
| 13+ | Utility Validation | Actually helped solve tasks (decision_trails evidence) |

Evolution example:

```python
# Week 1: Just check it exists
def policy_basic(term_entry):
    return term_entry.get("definition") is not None

# Week 8: Check topology impact
def policy_topology(term_entry):
    before_gini = probe_term_gini(term_entry["term"])
    add_to_staging(term_entry)
    after_gini = probe_term_gini(term_entry["term"])
    return abs(after_gini - before_gini) < 0.15  # No drift

# Week 13: Check actual utility
def policy_utility(term_entry):
    # Did this RAG entry help in past 10 tasks?
    usage_stats = query_decision_trails(term_entry["term"])
    return usage_stats["help_rate"] > 0.6  # 60% success when retrieved
```
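
A driver applying whichever tiers are currently unlocked might look like this (a sketch; the ordering and the `ACTIVE_POLICIES` registry are assumptions, see the Policy Evolution Triggers table further below):

```python
# Populated with the tiers unlocked so far (see Policy Evolution Triggers).
ACTIVE_POLICIES = [
    ("syntax", policy_basic),
    ("topology", policy_topology),
    ("utility", policy_utility),
]

def validate_for_promotion(term_entry):
    """Run a staged entry through all active tiers; reject on first failure."""
    for tier_name, policy in ACTIVE_POLICIES:
        if not policy(term_entry):
            return False, tier_name   # reject or revise, report failing tier
    return True, None                 # promote to Glossary/RAG
```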

Two-Tier RAG: Discovered vs Hidden

Not all RAG knowledge is equal. Track what Young Nyx knows vs what's merely available.

┌──────────────────────────────────────────────┐
│           DISCOVERED KNOWLEDGE               │
│  (known_catalogue - has accessed before)     │
├──────────────────────────────────────────────┤
│  • "heartbeat" - used 47 times               │
│  • "lifeforce" - used 23 times               │
│  • "phoebe" - used 15 times                  │
│  • "confidence_gradient" - used 8 times      │
│                                              │
│  Status: FAST retrieval, high confidence     │
└──────────────────────────────────────────────┘

┌──────────────────────────────────────────────┐
│            HIDDEN KNOWLEDGE                  │
│  (available_catalogue - exists but unused)   │
├──────────────────────────────────────────────┤
│  • "drift_probe" - never accessed            │
│  • "topology_gini" - never accessed          │
│  • "lora_merge_alpha" - never accessed       │
│                                              │
│  Status: Available for discovery             │
└──────────────────────────────────────────────┘

State transitions:

Hidden term retrieved → Mark as Discovered
Discovered term used successfully → Increase confidence score
Discovered term used 10+ times → FLAG for training extraction

Discovery tracking in phoebe:

```sql
CREATE TABLE rag_knowledge_state (
    term TEXT PRIMARY KEY,
    status TEXT,  -- 'hidden', 'discovered', 'internalized'
    first_accessed TIMESTAMPTZ,
    access_count INT DEFAULT 0,
    success_count INT DEFAULT 0,
    last_used TIMESTAMPTZ,
    promoted_to_weights BOOLEAN DEFAULT FALSE
);
```
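
The transitions above could be recorded against this table roughly as follows, sketched with psycopg-style parameter binding (`flag_for_training` and the threshold constant are illustrative):

```python
TRAINING_FLAG_THRESHOLD = 10  # successful uses before extraction

def record_retrieval(cur, term, success):
    """Mark hidden -> discovered on first access, then track usage."""
    cur.execute("""
        UPDATE rag_knowledge_state
        SET status = CASE WHEN status = 'hidden' THEN 'discovered' ELSE status END,
            first_accessed = COALESCE(first_accessed, NOW()),
            access_count = access_count + 1,
            success_count = success_count + %s,
            last_used = NOW()
        WHERE term = %s
        RETURNING success_count
    """, (1 if success else 0, term))
    row = cur.fetchone()              # None if the term isn't catalogued
    if row and row[0] >= TRAINING_FLAG_THRESHOLD:
        flag_for_training(term)       # 10+ successful uses -> extraction
```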

Measuring RAG Utility for LoRA Training

The critical question: Did the RAG hint actually help solve the task?

Track in decision_trails table:

```sql
CREATE TABLE decision_trails (
    id SERIAL PRIMARY KEY,
    task_id UUID,
    rag_terms_retrieved TEXT[],   -- What RAG returned
    rag_terms_used TEXT[],        -- What appeared in solution
    outcome TEXT,                 -- 'success', 'fail', 'partial'
    confidence_before_rag FLOAT,  -- Before retrieval
    confidence_after_rag FLOAT,   -- After retrieval
    lifeforce_cost FLOAT,
    timestamp TIMESTAMPTZ DEFAULT NOW()
);
```

Compute RAG utility score:

```python
def compute_rag_utility(trail):
    """
    Calculate how helpful RAG was for this decision.
    Returns 0.0 (useless) to 1.0 (critical).
    """
    precision = len(trail.rag_terms_used) / max(len(trail.rag_terms_retrieved), 1)
    outcome_bonus = 1.0 if trail.outcome == 'success' else 0.0
    confidence_boost = max(0, trail.confidence_after_rag - trail.confidence_before_rag)

    utility = (
        0.4 * precision +         # Did we use what we retrieved?
        0.3 * outcome_bonus +     # Did task succeed?
        0.3 * confidence_boost    # Did RAG increase confidence?
    )
    return min(1.0, utility)
```

Feed into LoRA training as RLVR signal:

```python
# Training examples weighted by utility
training_examples = []
for trail in decision_trails:
    utility_score = compute_rag_utility(trail)

    if utility_score > 0.7:
        # High utility → strong training signal
        training_examples.append({
            "query": trail.task_description,
            "rag_context": trail.rag_terms_used,
            "response": trail.solution,
            "weight": utility_score  # RLVR reward weight
        })
```

This feeds training signal to the LoRAs:

  • Mnemosyne (Memory): Recall accuracy vs phoebe ground truth
  • Aletheia (Truth): Confidence calibration (was confidence boost justified?)
  • Moira (Pattern): Which task patterns benefit from RAG vs pure reasoning

The Complete Knowledge Flow

VAULT
    │
    ├─ Extract candidates
    │
    ▼
STAGING (quarantine)
    │
    ├─ Policy Tier 1: Syntax ──▶ REJECT ──▶ Log failure
    ├─ Policy Tier 2: Semantic ──▶ REJECT ──▶ Revise
    ├─ Policy Tier 3: Topology ──▶ REJECT ──▶ Flag risk
    └─ Policy Tier 4+: Utility ──▶ PASS
                │
                ▼
        PROMOTE to RAG
                │
                ├─ Status: HIDDEN (available but unused)
                │
    ┌───────────┘
    │
    │ Young Nyx retrieves term
    │
    ▼
    Status: DISCOVERED (mark first access)
                │
                ├─ Track usage in decision_trails
                │
    ┌───────────┴────────────┐
    │                        │
Used successfully      Used unsuccessfully
    │                        │
    ▼                        ▼
Increase confidence    Decrease confidence
    │
    │ (10+ successful uses)
    │
    ▼
FLAG for training extraction
    │
    ▼
LoRA training (weighted by utility_score)
    │
    ▼
Validation WITHOUT RAG
    │
    ├─ SUCCESS ──▶ Status: INTERNALIZED (clear from RAG)
    │
    └─ FAIL ──▶ Restore to RAG, retry cycle

What Quality Gates Prevent

  1. Garbage in RAG - staging area catches malformed entries
  2. Topology corruption - DriftProbe-lite policies block dangerous terms
  3. Useless bloat - utility policies remove low-value entries
  4. Premature training - only high-utility terms get flagged
  5. Hidden knowledge waste - track what's available but never used (curriculum gap)

Policy Evolution Triggers

As Young Nyx grows, unlock stricter policies:

| Trigger | New Policy Unlocked |
|---------|---------------------|
| 100 successful RAG retrievals | Semantic quality checks |
| First LoRA training run | Topology safety (DriftProbe-lite) |
| 1000 decision_trails logged | Utility validation (help rate > 60%) |
| First INTERNALIZED term | Cross-reference consistency |
| 10 INTERNALIZED terms | Cost-effectiveness (ROI > threshold) |

Progressive difficulty: The bar for entering RAG rises as Young Nyx becomes more capable. Early: anything valid. Later: must prove utility.
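
A sketch of how the unlocks might be checked against counters kept in phoebe (the stat keys and policy names are assumptions):

```python
# (milestone check, policy tier it unlocks)
POLICY_TRIGGERS = [
    (lambda s: s["successful_retrievals"] >= 100, "semantic_quality"),
    (lambda s: s["lora_runs"] >= 1,               "topology_safety"),
    (lambda s: s["decision_trails"] >= 1000,      "utility_validation"),
    (lambda s: s["internalized_terms"] >= 1,      "cross_reference"),
    (lambda s: s["internalized_terms"] >= 10,     "cost_effectiveness"),
]

def unlocked_policies(stats: dict) -> list[str]:
    """Return the policy tiers Young Nyx has earned so far."""
    return [name for check, name in POLICY_TRIGGERS if check(stats)]
```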


Lifeforce Connection

The RAG→Train→Validate cycle has economic cost:

| Action | Lifeforce Cost |
|--------|----------------|
| RAG lookup | Low (just retrieval) |
| Training run | High (compute intensive) |
| Validation | Medium (inference) |
| Failed cycle | Lost V (training didn't take) |
| Successful internalization | +V reward (she grew) |

Incentive alignment: Successful learning is rewarded. Failed training is costly. This naturally optimizes for high-quality training data extraction.
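
Made concrete as a toy ledger (all numbers are placeholders, not tuned lifeforce values):

```python
# Placeholder costs; the real values come from the lifeforce economy.
LF_COST = {"rag_lookup": 1.0, "validation": 5.0, "training_run": 50.0}
LF_REWARD_INTERNALIZED = 100.0

def cycle_lifeforce_delta(trained: bool, internalized: bool) -> float:
    """Net lifeforce for one pass through RAG -> Train -> Validate."""
    delta = -(LF_COST["rag_lookup"] + LF_COST["validation"])       # gate 1
    if trained:
        delta -= LF_COST["training_run"] + LF_COST["validation"]  # gate 2
    if internalized:
        delta += LF_REWARD_INTERNALIZED   # +V reward: she grew
    return delta
```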


What This Prevents

  1. RAG bloat - entries clear after successful training
  2. Crutch dependency - scaffold comes off, proven by validation
  3. False confidence - can't claim to "know" what you only look up
  4. Training on noise - only validated successes get flagged
  5. Identity confusion - core architecture in weights, not retrieval

Design Principles

  1. RAG is temporary - feeding window, not permanent store
  2. Training is the goal - RAG success triggers training, not satisfaction
  3. Validation is double - with RAG, then without
  4. Clear after learning - scaffold must come off to prove growth
  5. Episodic stays external - not everything needs to be in weights
  6. Self-cleaning - the system doesn't accumulate cruft

The Analogy

Learning to ride a bike:

Training wheels ON (RAG feeding)
    │
    ▼
Can ride with training wheels (validation 1)
    │
    ▼
Training wheels OFF (RAG cleared)
    │
    ▼
Can still ride? (validation 2)
    │
    ├── NO  ──▶ Put wheels back, practice more
    │
    └── YES ──▶ She can ride. Wheels stored, not needed.

You don't RAG your ability to balance. Once you can ride, you can ride.


She doesn't just retrieve. She learns. And we can prove it.


Created: 2025-12-05
Session: Partnership dialogue (dafit + Chrysalis)
Status: Core architectural concept