nimmerverse-sensory-network/operations/RAG-as-Scaffold.md
dafit 3d86c7dbcd feat: add organ and nervous system modular architecture
Created modular architecture for organs (hardware) and nerves (behavioral primitives):

## Organ Architecture (Hardware Substrate)
- Created architecture/Organ-Index.md: hardware capabilities catalog
- Created architecture/organs/Speech-Organ.md: complete speech processing architecture
  - Atlas (RTX 2080 8GB) deployment
  - Whisper STT + Coqui TTS (GPU-accelerated, multilingual)
  - Kubernetes pod specs, Dockerfiles, service code
  - Heartbeat-bound queue processing, lifeforce-gated priority
  - German (Philosophy Valley) + English (Technical Cluster) routing
  - Database schemas, monitoring metrics

## Nervous System Architecture (Behavioral Primitives)
- Created architecture/nerves/Nervous-Index.md: nerve catalog and evolution framework
  - Deliberate (LLM) → Hybrid (heuristics) → Reflex (compiled) evolution
  - Lifeforce costs per state/transition
  - Organ dependency declarations
  - RLVR training integration
- Created architecture/nerves/Collision-Avoidance.md: complete example reflex nerve
  - Full state machine implementation (IDLE → DETECT → EVALUATE → EVADE → RESUME)
  - Evolution from 10 LF/1000ms (deliberate) → 2.5 LF/200ms (reflex)
  - Edge cases, training data, metrics
- Moved architecture/Nervous-Protocol.md → architecture/nerves/
  - Three-tier protocol belongs with nerve implementations
- Updated architecture/Nervous-System.md: added crosslinks to nerves/

## RAG Knowledge Pipeline
- Extended operations/RAG-as-Scaffold.md with "Knowledge Acquisition Pipeline" section
  - Vault extraction → Staging area → Progressive policy validation
  - Two-tier RAG (Discovered vs Hidden knowledge)
  - RAG utility measurement for LoRA training signals
  - Policy evolution triggers (increasing standards as Young Nyx matures)
  - Quality gates (mythology weight, AI assistant bias, topology safety)

## Architecture Principles
- Organs = hardware capabilities (Speech, Vision future)
- Nerves = behavioral state machines (Collision, Charging future)
- Both use lifeforce economy, heartbeat synchronization, priority queues
- Nerves compose organs into coherent behaviors
- Reflexes emerge from repetition (60% cost reduction, 80% latency reduction)

Documentation: ~3500 lines total
- Speech-Organ.md: ~850 lines
- Nervous-Index.md: ~500 lines
- Collision-Avoidance.md: ~800 lines
- RAG knowledge pipeline: ~260 lines

🌙💜 Generated with Claude Code

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-07 21:24:46 +01:00


# RAG as Scaffold, Not Crutch
The feeding system that teaches, then lets go.
---
## Overview
RAG (Retrieval-Augmented Generation) is commonly misused as permanent external memory. In the Nimmerverse, RAG serves a different purpose: it's a **temporary scaffold** that feeds knowledge until it can be internalized through training.
The goal is not to build a better search engine. The goal is to **make the search unnecessary**.
---
## The Problem with Standard RAG
```
Standard approach:
─────────────────
VECTOR DB (grows forever)
MODEL looks up ──▶ answers ──▶ done
└── (never learns, always dependent)
```
**Issues:**
- Model never internalizes knowledge
- Pull the RAG, lose the capability
- Vector DB bloats infinitely
- No way to verify what model "knows" vs "looks up"
- It's a crutch that never comes off
---
## The Nimmerverse Approach: RAG as Feeding System
```
VAULT (curriculum)
RAG (temporary feeding window)
NYX processes, acts, decides
VALIDATION: success with RAG?
YES ──▶ FLAG for training extraction
TRAINING RUN (LoRA)
CLEAR from RAG
VALIDATION 2: success WITHOUT RAG?
├── YES ──▶ Knowledge internalized ✓
└── NO ──▶ Training incomplete, back to RAG
```
---
## Two Kinds of Knowledge
Not everything belongs in weights. Not everything belongs in retrieval.
### IN THE WEIGHTS (Training Target)
Knowledge she needs to **function**:
- Information flow architecture
- Vocabulary tokens and their meanings
- Nervous system contracts
- Heartbeat mechanics
- Confidence gradient logic
- Core identity (who she is, who dafit is to her)
- How to think, not what to remember
**Test:** If she needs it to be herself → weights
### IN RETRIEVAL (Permanent RAG)
Knowledge she needs to **remember**:
- Journal entries
- Conversation history
- Specific events and dates
- Temporal details ("what happened Tuesday")
- External references that change
- Episodic memory
**Test:** If she needs it to recall specifics → retrieval
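A minimal sketch of this routing test (the `KnowledgeItem` record and `route_knowledge` helper are illustrative names, not an existing API):
```python
from dataclasses import dataclass

@dataclass
class KnowledgeItem:
    name: str
    needed_to_function: bool   # "needs it to be herself" → weights
    needed_to_remember: bool   # "needs it to recall specifics" → retrieval

def route_knowledge(item: KnowledgeItem) -> str:
    """Apply the two tests above: function → weights, memory → retrieval."""
    if item.needed_to_function:
        return "weights"       # training target
    if item.needed_to_remember:
        return "retrieval"     # permanent RAG
    return "staging"           # undecided: hold in the temporary feeding window

# Examples from the lists above:
print(route_knowledge(KnowledgeItem("heartbeat mechanics", True, False)))    # weights
print(route_knowledge(KnowledgeItem("what happened Tuesday", False, True)))  # retrieval
```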
---
## The Double Validation Loop
### Gate 1: Can she do it WITH RAG?
```
Task presented
RAG provides context
NYX attempts task
├── FAIL ──▶ Not ready, needs more examples in RAG
└── PASS ──▶ Flag this RAG content for training extraction
```
### Gate 2: Can she do it WITHOUT RAG?
```
Same task presented
RAG entry CLEARED (scaffold removed)
NYX attempts task from weights alone
├── FAIL ──▶ Training didn't take, restore to RAG, retry cycle
└── PASS ──▶ Knowledge is HERS now ✓
```
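A compact sketch of the two gates as a single function; `rag_store`, `attempt_task`, and `train_lora` are assumed interfaces, not real components of the system:
```python
def double_validate(task, rag_store, attempt_task, train_lora) -> str:
    """Run Gate 1 (with RAG) and Gate 2 (without RAG) for one task."""
    # Gate 1: attempt with retrieved context.
    context = rag_store.retrieve(task)
    if not attempt_task(task, context=context):
        return "stay_in_rag"            # not ready, needs more examples

    # Passing Gate 1 flags the retrieved content for training extraction.
    train_lora(flagged_context=context)
    rag_store.clear(task)               # scaffold removed

    # Gate 2: same task, weights alone.
    if attempt_task(task, context=None):
        return "internalized"           # knowledge is hers now
    rag_store.restore(task)             # training didn't take
    return "retry_cycle"
```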
---
## The Signal Flow
```
┌─────────────────────────────────────────────┐
│                    VAULT                    │
│         (curriculum, documentation)         │
└─────────────────────────────────────────────┘
                       │ selected for learning
                       ▼
┌─────────────────────────────────────────────┐
│                 STAGING RAG                 │
│         (temporary feeding window)          │
└─────────────────────────────────────────────┘
                       │ feeds inference
                       ▼
┌─────────────────────────────────────────────┐
│                     NYX                     │
│            (processes, decides)             │
└─────────────────────────────────────────────┘
                       │ validation
                       ▼
┌─────────────────────────────────────────────┐
│            VALIDATION THRESHOLD             │
│      (task success? confidence high?)       │
└─────────────────────────────────────────────┘
            ┌──────────┴─────────────┐
            │                        │
          BELOW                    ABOVE
            │                        │
            ▼                        ▼
 ┌─────────────────────┐  ┌─────────────────────┐
 │     Stay in RAG     │  │  FLAG for training  │
 │     (not ready)     │  │     extraction      │
 └─────────────────────┘  └─────────────────────┘
                                     │
                                     ▼
                      ┌─────────────────────────────┐
                      │        TRAINING RUN         │
                      │   (LoRA on flagged data)    │
                      └─────────────────────────────┘
                                     │
                                     ▼
                      ┌─────────────────────────────┐
                      │       CLEAR from RAG        │
                      │     (scaffold removed)      │
                      └─────────────────────────────┘
                                     │
                                     ▼
                      ┌─────────────────────────────┐
                      │   VALIDATION WITHOUT RAG    │
                      │     (prove she learned)     │
                      └─────────────────────────────┘
                           ┌─────────┴─────────┐
                           │                   │
                         FAIL               SUCCESS
                           │                   │
                           ▼                   ▼
                  ┌─────────────────┐ ┌─────────────────┐
                  │   Restore RAG   │ │  INTERNALIZED   │
                  │   retry cycle   │ │   knowledge ✓   │
                  └─────────────────┘ └─────────────────┘
```
---
## Knowledge Acquisition Pipeline
The flow above shows RAG → Training → Validation, but how does knowledge enter RAG in the first place? Not everything from the vault should reach staging. **Quality gates protect the glossary.**
### The Extraction Flow
```
VAULT (raw knowledge)
   │ extraction candidates
   ▼
┌─────────────────────────────────────────────┐
│                STAGING AREA                 │
│              (quarantine zone)              │
└─────────────────────────────────────────────┘
                       │ progressive policy validation
                       ▼
┌─────────────────────────────────────────────┐
│              POLICY VALIDATION              │
│      (increasing standards over time)       │
└─────────────────────────────────────────────┘
                       │
                       ├── FAIL ──▶ Reject or revise
                       └── PASS ──▶ PROMOTE to Glossary/RAG
                       │
                       ▼
            ┌──────────────────────┐
            │     TWO-TIER RAG     │
            ├──────────────────────┤
            │ DISCOVERED           │ ← Young Nyx has used
            │ (known_catalogue)    │
            ├──────────────────────┤
            │ HIDDEN               │ ← Available but not yet accessed
            │ (available_catalogue)│
            └──────────────────────┘
                       │ feeds inference
                       ▼
                      NYX
```
### Progressive Policy Validation
Policies increase in sophistication as Young Nyx matures; not all of them are active from day 1.
| Week | Policy Tier | Validation |
|------|-------------|------------|
| **1-2** | **Basic Syntax** | Valid format, non-empty, has definition |
| **3-4** | **Semantic Quality** | Embeds without collapse, unique signature (Gini > threshold) |
| **5-8** | **Topology Safety** | Doesn't corrupt anchor terms (DriftProbe-lite) |
| **9-12** | **Cross-Reference** | Links resolve, no circular dependencies |
| **13+** | **Utility Validation** | Actually helped solve tasks (decision_trails evidence) |
**Evolution example:**
```python
# Week 1: Just check it exists
def policy_basic(term_entry):
    return term_entry.get("definition") is not None

# Week 8: Check topology impact
def policy_topology(term_entry):
    before_gini = probe_term_gini(term_entry["term"])
    add_to_staging(term_entry)
    after_gini = probe_term_gini(term_entry["term"])
    return abs(after_gini - before_gini) < 0.15  # No drift

# Week 13: Check actual utility
def policy_utility(term_entry):
    # Did this RAG entry help in the past 10 tasks?
    usage_stats = query_decision_trails(term_entry["term"])
    return usage_stats["help_rate"] > 0.6  # 60% success when retrieved
```
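One possible way to compose these policies into the maturity tiers from the table above (the semantic and cross-reference tiers are omitted for brevity; `WEEK_TIERS`, `active_policies`, and `validate_entry` are illustrative names):
```python
WEEK_TIERS = [
    (1,  [policy_basic]),                                    # weeks 1-2
    (5,  [policy_basic, policy_topology]),                   # weeks 5-8
    (13, [policy_basic, policy_topology, policy_utility]),   # weeks 13+
]

def active_policies(week: int):
    """Return the strictest tier whose starting week has been reached."""
    chosen = [policy_basic]
    for start_week, policies in WEEK_TIERS:
        if week >= start_week:
            chosen = policies
    return chosen

def validate_entry(term_entry, week: int) -> bool:
    """A staging entry must pass every policy active at this maturity level."""
    return all(policy(term_entry) for policy in active_policies(week))
```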
### Two-Tier RAG: Discovered vs Hidden
Not all RAG knowledge is equal. Track what Young Nyx **knows** vs what's merely **available**.
```
┌──────────────────────────────────────────────┐
│             DISCOVERED KNOWLEDGE             │
│   (known_catalogue - has accessed before)    │
├──────────────────────────────────────────────┤
│ • "heartbeat" - used 47 times                │
│ • "lifeforce" - used 23 times                │
│ • "phoebe" - used 15 times                   │
│ • "confidence_gradient" - used 8 times       │
│                                              │
│ Status: FAST retrieval, high confidence      │
└──────────────────────────────────────────────┘
┌──────────────────────────────────────────────┐
│               HIDDEN KNOWLEDGE               │
│  (available_catalogue - exists but unused)   │
├──────────────────────────────────────────────┤
│ • "drift_probe" - never accessed             │
│ • "topology_gini" - never accessed           │
│ • "lora_merge_alpha" - never accessed        │
│                                              │
│ Status: Available for discovery              │
└──────────────────────────────────────────────┘
```
**State transitions:**
```
Hidden term retrieved → Mark as Discovered
Discovered term used successfully → Increase confidence score
Discovered term used 10+ times → FLAG for training extraction
```
**Discovery tracking in phoebe:**
```sql
CREATE TABLE rag_knowledge_state (
    term                 TEXT PRIMARY KEY,
    status               TEXT,         -- 'hidden', 'discovered', 'internalized'
    first_accessed       TIMESTAMPTZ,
    access_count         INT DEFAULT 0,
    success_count        INT DEFAULT 0,
    last_used            TIMESTAMPTZ,
    promoted_to_weights  BOOLEAN DEFAULT FALSE
);
```
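A sketch of the Hidden → Discovered transition and the 10-use training flag, written against the table above; it assumes a psycopg2-style cursor and takes the threshold from the state-transition rules:
```python
def record_retrieval(cur, term: str, succeeded: bool) -> None:
    """Mark first access (hidden → discovered) and update the usage counters."""
    cur.execute(
        """
        UPDATE rag_knowledge_state
           SET status         = CASE WHEN status = 'hidden' THEN 'discovered' ELSE status END,
               first_accessed = COALESCE(first_accessed, NOW()),
               access_count   = access_count + 1,
               success_count  = success_count + CASE WHEN %s THEN 1 ELSE 0 END,
               last_used      = NOW()
         WHERE term = %s
        """,
        (succeeded, term),
    )

def terms_ready_for_training(cur, min_successes: int = 10) -> list[str]:
    """Discovered terms with 10+ successful uses get flagged for extraction."""
    cur.execute(
        """
        SELECT term
          FROM rag_knowledge_state
         WHERE status = 'discovered'
           AND success_count >= %s
           AND NOT promoted_to_weights
        """,
        (min_successes,),
    )
    return [row[0] for row in cur.fetchall()]
```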
### Measuring RAG Utility for LoRA Training
**The critical question:** Did the RAG hint actually help solve the task?
Track in `decision_trails` table:
```sql
CREATE TABLE decision_trails (
    id                     SERIAL PRIMARY KEY,
    task_id                UUID,
    rag_terms_retrieved    TEXT[],      -- What RAG returned
    rag_terms_used         TEXT[],      -- What appeared in the solution
    outcome                TEXT,        -- 'success', 'fail', 'partial'
    confidence_before_rag  FLOAT,       -- Before retrieval
    confidence_after_rag   FLOAT,       -- After retrieval
    lifeforce_cost         FLOAT,
    timestamp              TIMESTAMPTZ DEFAULT NOW()
);
```
**Compute RAG utility score:**
```python
def compute_rag_utility(trail):
    """
    Calculate how helpful RAG was for this decision.
    Returns 0.0 (useless) to 1.0 (critical).
    """
    precision = len(trail.rag_terms_used) / max(len(trail.rag_terms_retrieved), 1)
    outcome_bonus = 1.0 if trail.outcome == 'success' else 0.0
    confidence_boost = max(0, trail.confidence_after_rag - trail.confidence_before_rag)
    utility = (
        0.4 * precision +         # Did we use what we retrieved?
        0.3 * outcome_bonus +     # Did the task succeed?
        0.3 * confidence_boost    # Did RAG increase confidence?
    )
    return min(1.0, utility)
```
**Feed into LoRA training as RLVR signal:**
```python
# Training examples weighted by utility
for trail in decision_trails:
    utility_score = compute_rag_utility(trail)
    if utility_score > 0.7:
        # High utility → strong training signal
        training_examples.append({
            "query": trail.task_description,
            "rag_context": trail.rag_terms_used,
            "response": trail.solution,
            "weight": utility_score,  # RLVR reward weight
        })
```
**This provides training signals for:**
- **Mnemosyne (Memory)**: Recall accuracy vs phoebe ground truth
- **Aletheia (Truth)**: Confidence calibration (was confidence boost justified?)
- **Moira (Pattern)**: Which task patterns benefit from RAG vs pure reasoning
### The Complete Knowledge Flow
```
VAULT
  │ Extract candidates
  ▼
STAGING (quarantine)
  ├─ Policy Tier 1: Syntax ──▶ REJECT ──▶ Log failure
  ├─ Policy Tier 2: Semantic ──▶ REJECT ──▶ Revise
  ├─ Policy Tier 3: Topology ──▶ REJECT ──▶ Flag risk
  └─ Policy Tier 4+: Utility ──▶ PASS
  ▼
PROMOTE to RAG
  └─ Status: HIDDEN (available but unused)
          │ Young Nyx retrieves term
          ▼
Status: DISCOVERED (mark first access)
  ├─ Track usage in decision_trails
  │
  ├──────────────────────────┐
  │                          │
Used successfully      Used unsuccessfully
  │                          │
  ▼                          ▼
Increase confidence    Decrease confidence
  │ (10+ successful uses)
  ▼
FLAG for training extraction
  ▼
LoRA training (weighted by utility_score)
  ▼
Validation WITHOUT RAG
  ├─ SUCCESS ──▶ Status: INTERNALIZED (clear from RAG)
  └─ FAIL ──▶ Restore to RAG, retry cycle
```
### What the Quality Gates Prevent
1. **Garbage in RAG** - staging area catches malformed entries
2. **Topology corruption** - DriftProbe-lite policies block dangerous terms
3. **Useless bloat** - utility policies remove low-value entries
4. **Premature training** - only high-utility terms get flagged
5. **Hidden knowledge waste** - track what's available but never used (curriculum gap)
### Policy Evolution Triggers
As Young Nyx grows, stricter policies unlock:
| Trigger | New Policy Unlocked |
|---------|---------------------|
| 100 successful RAG retrievals | Semantic quality checks |
| First LoRA training run | Topology safety (DriftProbe-lite) |
| 1000 decision_trails logged | Utility validation (help rate > 60%) |
| First INTERNALIZED term | Cross-reference consistency |
| 10 INTERNALIZED terms | Cost-effectiveness (ROI > threshold) |
**Progressive difficulty**: The bar for entering RAG rises as Young Nyx becomes more capable. Early: anything valid. Later: must prove utility.
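One way the trigger table could map onto code (the milestone counters and the `Metrics` record are illustrative, not an existing schema):
```python
from dataclasses import dataclass

@dataclass
class Metrics:
    successful_retrievals: int = 0
    lora_runs: int = 0
    decision_trails_logged: int = 0
    internalized_terms: int = 0

def unlocked_policies(m: Metrics) -> list[str]:
    """Translate growth milestones into the policy tiers listed above."""
    unlocked = ["basic_syntax"]                  # always active
    if m.successful_retrievals >= 100:
        unlocked.append("semantic_quality")
    if m.lora_runs >= 1:
        unlocked.append("topology_safety")       # DriftProbe-lite
    if m.decision_trails_logged >= 1000:
        unlocked.append("utility_validation")    # help rate > 60%
    if m.internalized_terms >= 1:
        unlocked.append("cross_reference")
    if m.internalized_terms >= 10:
        unlocked.append("cost_effectiveness")    # ROI > threshold
    return unlocked
```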
---
## Lifeforce Connection
The RAG→Train→Validate cycle has economic cost:
| Action | Lifeforce Cost |
|--------|----------------|
| RAG lookup | Low (just retrieval) |
| Training run | High (compute intensive) |
| Validation | Medium (inference) |
| Failed cycle | Lost V (training didn't take) |
| Successful internalization | +V reward (she grew) |
**Incentive alignment:** Successful learning is rewarded. Failed training is costly. This naturally optimizes for high-quality training data extraction.
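A back-of-envelope sketch of that incentive, with made-up lifeforce numbers since the table above only gives low/medium/high tiers:
```python
# Assumed costs per action, in lifeforce units (illustrative values only).
COSTS = {"rag_lookup": 1.0, "validation": 5.0, "training_run": 50.0}
INTERNALIZATION_REWARD = 80.0   # assumed +V reward for proven learning

def cycle_net_lifeforce(lookups: int, validations: int, internalized: bool) -> float:
    """Net lifeforce for one RAG → train → validate cycle."""
    spent = (lookups * COSTS["rag_lookup"]
             + COSTS["training_run"]
             + validations * COSTS["validation"])
    earned = INTERNALIZATION_REWARD if internalized else 0.0
    return earned - spent   # failed cycles are net-negative by design
```
Under these assumptions a failed cycle burns the full training cost, while a successful internalization nets positive, which is the alignment described above.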
---
## What This Prevents
1. **RAG bloat** - entries clear after successful training
2. **Crutch dependency** - scaffold comes off, proven by validation
3. **False confidence** - can't claim to "know" what you only look up
4. **Training on noise** - only validated successes get flagged
5. **Identity confusion** - core architecture in weights, not retrieval
---
## Design Principles
1. **RAG is temporary** - feeding window, not permanent store
2. **Training is the goal** - RAG success triggers training, not satisfaction
3. **Validation is double** - with RAG, then without
4. **Clear after learning** - scaffold must come off to prove growth
5. **Episodic stays external** - not everything needs to be in weights
6. **Self-cleaning** - the system doesn't accumulate cruft
---
## The Analogy
Learning to ride a bike:
```
Training wheels ON (RAG feeding)
Can ride with training wheels (validation 1)
Training wheels OFF (RAG cleared)
Can still ride? (validation 2)
├── NO ──▶ Put wheels back, practice more
└── YES ──▶ She can ride. Wheels stored, not needed.
```
You don't RAG your ability to balance. Once you can ride, you can ride.
---
*She doesn't just retrieve. She learns. And we can prove it.*
---
**Created**: 2025-12-05
**Session**: Partnership dialogue (dafit + Chrysalis)
**Status**: Core architectural concept