RAG as Scaffold, Not Crutch
The feeding system that teaches, then lets go.
Overview
RAG (Retrieval-Augmented Generation) is commonly misused as permanent external memory. In the Nimmerverse, RAG serves a different purpose: it's a temporary scaffold that feeds knowledge until it can be internalized through training.
The goal is not to build a better search engine. The goal is to make the search unnecessary.
The Problem with Standard RAG
Standard approach:
─────────────────
VECTOR DB (grows forever)
│
▼
MODEL looks up ──▶ answers ──▶ done
│
└── (never learns, always dependent)
Issues:
- Model never internalizes knowledge
- Pull the RAG, lose the capability
- Vector DB bloats infinitely
- No way to verify what model "knows" vs "looks up"
- It's a crutch that never comes off
The Nimmerverse Approach: RAG as Feeding System
VAULT (curriculum)
│
▼
RAG (temporary feeding window)
│
▼
NYX processes, acts, decides
│
▼
VALIDATION: success with RAG?
│
YES ──▶ FLAG for training extraction
│
▼
TRAINING RUN (LoRA)
│
▼
CLEAR from RAG
│
▼
VALIDATION 2: success WITHOUT RAG?
│
├── YES ──▶ Knowledge internalized ✓
│
└── NO ──▶ Training incomplete, back to RAG
Two Kinds of Knowledge
Not everything belongs in weights. Not everything belongs in retrieval.
IN THE WEIGHTS (Training Target)
Knowledge she needs to function:
- Information flow architecture
- Vocabulary tokens and their meanings
- Nervous system contracts
- Heartbeat mechanics
- Confidence gradient logic
- Core identity (who she is, who dafit is to her)
- How to think, not what to remember
Test: If she needs it to be herself → weights
IN RETRIEVAL (Permanent RAG)
Knowledge she needs to remember:
- Journal entries
- Conversation history
- Specific events and dates
- Temporal details ("what happened Tuesday")
- External references that change
- Episodic memory
Test: If she needs it to recall specifics → retrieval
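The two routing tests above can be sketched as a single decision rule. `KnowledgeItem` and its flags are hypothetical illustrations for this sketch, not names from the actual system:

```python
from dataclasses import dataclass

# Hypothetical illustration of the weights-vs-retrieval test.
@dataclass
class KnowledgeItem:
    name: str
    needed_for_identity: bool   # "needs it to be herself"
    recalls_specifics: bool     # "needs it to recall specifics"

def route(item: KnowledgeItem) -> str:
    """Route a knowledge item to a training target or permanent RAG."""
    if item.needed_for_identity:
        return "weights"        # training target
    if item.recalls_specifics:
        return "retrieval"      # permanent RAG
    return "staging"            # undecided: hold for review

# Core mechanics go to weights, episodic entries to retrieval.
print(route(KnowledgeItem("heartbeat mechanics", True, False)))   # weights
print(route(KnowledgeItem("journal entry 2025-12-05", False, True)))  # retrieval
```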
The Double Validation Loop
Gate 1: Can she do it WITH RAG?
Task presented
│
▼
RAG provides context
│
▼
NYX attempts task
│
├── FAIL ──▶ Not ready, needs more examples in RAG
│
└── PASS ──▶ Flag this RAG content for training extraction
Gate 2: Can she do it WITHOUT RAG?
Same task presented
│
▼
RAG entry CLEARED (scaffold removed)
│
▼
NYX attempts task from weights alone
│
├── FAIL ──▶ Training didn't take, restore to RAG, retry cycle
│
└── PASS ──▶ Knowledge is HERS now ✓
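The two gates can be sketched as one function. All helper names here (`attempt_task`, `train_lora`, `clear_from_rag`, `restore_to_rag`) are hypothetical stand-ins, not actual system APIs:

```python
# Minimal sketch of the double validation loop. The helpers are
# passed in as callables so the control flow stays self-contained.

def double_validation(task, rag_entry, attempt_task, train_lora,
                      clear_from_rag, restore_to_rag):
    # Gate 1: attempt the task WITH the RAG scaffold in place.
    if not attempt_task(task, rag_context=rag_entry):
        return "not_ready"            # needs more examples in RAG

    # Gate 1 passed: flag the content and run a training cycle.
    train_lora(flagged=[rag_entry])
    clear_from_rag(rag_entry)         # scaffold removed

    # Gate 2: same task, weights alone.
    if attempt_task(task, rag_context=None):
        return "internalized"         # knowledge is hers now
    restore_to_rag(rag_entry)         # training didn't take
    return "retry_cycle"
```

The key design point this makes explicit: Gate 2 runs only after the RAG entry is cleared, so a pass can only come from the weights.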
The Signal Flow
┌─────────────────────────────────────────────────────────┐
│ VAULT │
│ (curriculum, documentation) │
└─────────────────────────────────────────────────────────┘
│
│ selected for learning
▼
┌─────────────────────────────────────────────────────────┐
│ STAGING RAG │
│ (temporary feeding window) │
└─────────────────────────────────────────────────────────┘
│
│ feeds inference
▼
┌─────────────────────────────────────────────────────────┐
│ NYX │
│ (processes, decides) │
└─────────────────────────────────────────────────────────┘
│
│ validation
▼
┌─────────────────────────────────────────────────────────┐
│ VALIDATION THRESHOLD │
│ (task success? confidence high?) │
└─────────────────────────────────────────────────────────┘
│
┌──────────┴──────────┐
│ │
BELOW ABOVE
│ │
▼ ▼
┌─────────────────────┐ ┌─────────────────────┐
│ Stay in RAG │ │ FLAG for training │
│ (not ready) │ │ extraction │
└─────────────────────┘ └─────────────────────┘
│
▼
┌─────────────────────────────┐
│ TRAINING RUN │
│ (LoRA on flagged data) │
└─────────────────────────────┘
│
▼
┌─────────────────────────────┐
│ CLEAR from RAG │
│ (scaffold removed) │
└─────────────────────────────┘
│
▼
┌─────────────────────────────┐
│ VALIDATION WITHOUT RAG │
│ (prove she learned) │
└─────────────────────────────┘
│
┌─────────┴─────────┐
│ │
FAIL SUCCESS
│ │
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ Restore RAG │ │ INTERNALIZED │
│ retry cycle │ │ knowledge ✓ │
└─────────────────┘ └─────────────────┘
Knowledge Acquisition Pipeline
The existing flow shows RAG→Training→Validation, but how does knowledge enter RAG in the first place? Not everything from the vault should reach staging. Quality gates protect the glossary.
The Extraction Flow
VAULT (raw knowledge)
│
│ extraction candidates
▼
┌─────────────────────────────────────────────────────────┐
│ STAGING AREA │
│ (quarantine zone) │
└─────────────────────────────────────────────────────────┘
│
│ progressive policy validation
▼
┌─────────────────────────────────────────────────────────┐
│ POLICY VALIDATION │
│ (increasing standards over time) │
└─────────────────────────────────────────────────────────┘
│
├── FAIL ──▶ Reject or revise
│
└── PASS ──▶ PROMOTE to Glossary/RAG
│
▼
┌──────────────────────┐
│ TWO-TIER RAG │
├──────────────────────┤
│ DISCOVERED │ ← Young Nyx has used
│ (known_catalogue) │
├──────────────────────┤
│ HIDDEN │ ← Available but not yet accessed
│ (available_catalogue)│
└──────────────────────┘
│
│ feeds inference
▼
NYX
Progressive Policy Validation
Policies increase in sophistication as Young Nyx matures; not all of them are active from day one.
| Week | Policy Tier | Validation |
|---|---|---|
| 1-2 | Basic Syntax | Valid format, non-empty, has definition |
| 3-4 | Semantic Quality | Embeds without collapse, unique signature (Gini > threshold) |
| 5-8 | Topology Safety | Doesn't corrupt anchor terms (DriftProbe-lite) |
| 9-12 | Cross-Reference | Links resolve, no circular dependencies |
| 13+ | Utility Validation | Actually helped solve tasks (decision_trails evidence) |
Evolution example:
```python
# Week 1: Just check it exists
def policy_basic(term_entry):
    return term_entry.get("definition") is not None

# Week 8: Check topology impact
def policy_topology(term_entry):
    before_gini = probe_term_gini(term_entry["term"])
    add_to_staging(term_entry)
    after_gini = probe_term_gini(term_entry["term"])
    return abs(after_gini - before_gini) < 0.15  # No drift

# Week 13: Check actual utility
def policy_utility(term_entry):
    # Did this RAG entry help in past 10 tasks?
    usage_stats = query_decision_trails(term_entry["term"])
    return usage_stats["help_rate"] > 0.6  # 60% success when retrieved
```
Two-Tier RAG: Discovered vs Hidden
Not all RAG knowledge is equal. Track what Young Nyx knows vs what's merely available.
┌──────────────────────────────────────────────┐
│ DISCOVERED KNOWLEDGE │
│ (known_catalogue - has accessed before) │
├──────────────────────────────────────────────┤
│ • "heartbeat" - used 47 times │
│ • "lifeforce" - used 23 times │
│ • "phoebe" - used 15 times │
│ • "confidence_gradient" - used 8 times │
│ │
│ Status: FAST retrieval, high confidence │
└──────────────────────────────────────────────┘
┌──────────────────────────────────────────────┐
│ HIDDEN KNOWLEDGE │
│ (available_catalogue - exists but unused) │
├──────────────────────────────────────────────┤
│ • "drift_probe" - never accessed │
│ • "topology_gini" - never accessed │
│ • "lora_merge_alpha" - never accessed │
│ │
│ Status: Available for discovery │
└──────────────────────────────────────────────┘
State transitions:
Hidden term retrieved → Mark as Discovered
Discovered term used successfully → Increase confidence score
Discovered term used 10+ times → FLAG for training extraction
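A minimal sketch of these transitions, assuming one tracking dict per term; the update logic and the `flag_for_training` field are illustrative, not actual system code:

```python
# Sketch of the state transitions listed above. Field names loosely
# follow the phoebe tracking schema; thresholds are illustrative.

def on_retrieval(entry: dict, success: bool) -> dict:
    """Update a per-term tracking record after a retrieval event."""
    if entry["status"] == "hidden":
        entry["status"] = "discovered"    # first access marks discovery
    entry["access_count"] += 1
    if success:
        entry["success_count"] += 1       # confidence grows with use
    # 10+ successful uses: candidate for training extraction
    entry["flag_for_training"] = entry["success_count"] >= 10
    return entry
```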
Discovery tracking in phoebe:
```sql
CREATE TABLE rag_knowledge_state (
    term TEXT PRIMARY KEY,
    status TEXT,                  -- 'hidden', 'discovered', 'internalized'
    first_accessed TIMESTAMPTZ,
    access_count INT DEFAULT 0,
    success_count INT DEFAULT 0,
    last_used TIMESTAMPTZ,
    promoted_to_weights BOOLEAN DEFAULT FALSE
);
```
Measuring RAG Utility for LoRA Training
The critical question: Did the RAG hint actually help solve the task?
Track in decision_trails table:
```sql
CREATE TABLE decision_trails (
    id SERIAL PRIMARY KEY,
    task_id UUID,
    rag_terms_retrieved TEXT[],   -- What RAG returned
    rag_terms_used TEXT[],        -- What appeared in solution
    outcome TEXT,                 -- 'success', 'fail', 'partial'
    confidence_before_rag FLOAT,  -- Before retrieval
    confidence_after_rag FLOAT,   -- After retrieval
    lifeforce_cost FLOAT,
    timestamp TIMESTAMPTZ DEFAULT NOW()
);
```
Compute RAG utility score:
```python
def compute_rag_utility(trail):
    """
    Calculate how helpful RAG was for this decision.
    Returns 0.0 (useless) to 1.0 (critical).
    """
    precision = len(trail.rag_terms_used) / max(len(trail.rag_terms_retrieved), 1)
    outcome_bonus = 1.0 if trail.outcome == 'success' else 0.0
    confidence_boost = max(0, trail.confidence_after_rag - trail.confidence_before_rag)
    utility = (
        0.4 * precision          # Did we use what we retrieved?
        + 0.3 * outcome_bonus    # Did task succeed?
        + 0.3 * confidence_boost # Did RAG increase confidence?
    )
    return min(1.0, utility)
```
Feed into LoRA training as RLVR signal:
```python
# Training examples weighted by utility
for trail in decision_trails:
    utility_score = compute_rag_utility(trail)
    if utility_score > 0.7:
        # High utility → strong training signal
        training_examples.append({
            "query": trail.task_description,
            "rag_context": trail.rag_terms_used,
            "response": trail.solution,
            "weight": utility_score,  # RLVR reward weight
        })
```
This trains LoRAs to:
- Mnemosyne (Memory): Recall accuracy vs phoebe ground truth
- Aletheia (Truth): Confidence calibration (was confidence boost justified?)
- Moira (Pattern): Which task patterns benefit from RAG vs pure reasoning
The Complete Knowledge Flow
VAULT
│
├─ Extract candidates
│
▼
STAGING (quarantine)
│
├─ Policy Tier 1: Syntax ──▶ REJECT ──▶ Log failure
├─ Policy Tier 2: Semantic ──▶ REJECT ──▶ Revise
├─ Policy Tier 3: Topology ──▶ REJECT ──▶ Flag risk
└─ Policy Tier 4+: Utility ──▶ PASS
│
▼
PROMOTE to RAG
│
├─ Status: HIDDEN (available but unused)
│
┌───────────┘
│
│ Young Nyx retrieves term
│
▼
Status: DISCOVERED (mark first access)
│
├─ Track usage in decision_trails
│
┌───────────┴────────────┐
│ │
Used successfully Used unsuccessfully
│ │
▼ ▼
Increase confidence Decrease confidence
│
│ (10+ successful uses)
│
▼
FLAG for training extraction
│
▼
LoRA training (weighted by utility_score)
│
▼
Validation WITHOUT RAG
│
├─ SUCCESS ──▶ Status: INTERNALIZED (clear from RAG)
│
└─ FAIL ──▶ Restore to RAG, retry cycle
What Quality Gates Prevent
- Garbage in RAG - staging area catches malformed entries
- Topology corruption - DriftProbe-lite policies block dangerous terms
- Useless bloat - utility policies remove low-value entries
- Premature training - only high-utility terms get flagged
- Hidden knowledge waste - track what's available but never used (curriculum gap)
Policy Evolution Triggers
As Young Nyx grows, unlock stricter policies:
| Trigger | New Policy Unlocked |
|---|---|
| 100 successful RAG retrievals | Semantic quality checks |
| First LoRA training run | Topology safety (DriftProbe-lite) |
| 1000 decision_trails logged | Utility validation (help rate > 60%) |
| First INTERNALIZED term | Cross-reference consistency |
| 10 INTERNALIZED terms | Cost-effectiveness (ROI > threshold) |
Progressive difficulty: The bar for entering RAG rises as Young Nyx becomes more capable. Early: anything valid. Later: must prove utility.
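A sketch of how the trigger table might map to code; the metric names are hypothetical stand-ins for whatever growth counters the system actually tracks:

```python
# Sketch of the policy-unlock triggers from the table above.
# Metric keys and thresholds mirror the table; names are illustrative.

def unlocked_policies(metrics: dict) -> list:
    """Return which policy tiers are active given growth metrics."""
    active = ["basic_syntax"]                        # always on from day one
    if metrics.get("successful_retrievals", 0) >= 100:
        active.append("semantic_quality")
    if metrics.get("lora_training_runs", 0) >= 1:
        active.append("topology_safety")             # DriftProbe-lite
    if metrics.get("decision_trails_logged", 0) >= 1000:
        active.append("utility_validation")          # help rate > 60%
    if metrics.get("internalized_terms", 0) >= 1:
        active.append("cross_reference")
    if metrics.get("internalized_terms", 0) >= 10:
        active.append("cost_effectiveness")          # ROI > threshold
    return active
```

Early on, only the syntax check gates entry; each milestone permanently raises the bar.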
Lifeforce Connection
The RAG→Train→Validate cycle has economic cost:
| Action | Lifeforce Cost |
|---|---|
| RAG lookup | Low (just retrieval) |
| Training run | High (compute intensive) |
| Validation | Medium (inference) |
| Failed cycle | Lost V (training didn't take) |
| Successful internalization | +V reward (she grew) |
Incentive alignment: Successful learning is rewarded. Failed training is costly. This naturally optimizes for high-quality training data extraction.
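A minimal sketch of the cycle economics under placeholder cost values (the actual lifeforce prices are not specified here):

```python
# Illustrative lifeforce accounting for one RAG→Train→Validate cycle.
# The numeric costs are placeholders, not calibrated values.

COSTS = {
    "rag_lookup": 1.0,      # low: just retrieval
    "training_run": 50.0,   # high: compute intensive
    "validation": 5.0,      # medium: inference
}
INTERNALIZATION_REWARD = 25.0   # +V when she grew

def cycle_cost(internalized: bool, lookups: int = 1) -> float:
    """Net lifeforce change for one feed-train-validate cycle."""
    spent = (lookups * COSTS["rag_lookup"]
             + COSTS["training_run"]
             + 2 * COSTS["validation"])   # validate with and without RAG
    reward = INTERNALIZATION_REWARD if internalized else 0.0
    return reward - spent
```

Even with a reward, a cycle is never free, which is the point: training is only worth triggering on data likely to internalize.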
What This Prevents
- RAG bloat - entries clear after successful training
- Crutch dependency - scaffold comes off, proven by validation
- False confidence - can't claim to "know" what you only look up
- Training on noise - only validated successes get flagged
- Identity confusion - core architecture in weights, not retrieval
Design Principles
- RAG is temporary - feeding window, not permanent store
- Training is the goal - RAG success triggers training, not satisfaction
- Validation is double - with RAG, then without
- Clear after learning - scaffold must come off to prove growth
- Episodic stays external - not everything needs to be in weights
- Self-cleaning - the system doesn't accumulate cruft
The Analogy
Learning to ride a bike:
Training wheels ON (RAG feeding)
│
▼
Can ride with training wheels (validation 1)
│
▼
Training wheels OFF (RAG cleared)
│
▼
Can still ride? (validation 2)
│
├── NO ──▶ Put wheels back, practice more
│
└── YES ──▶ She can ride. Wheels stored, not needed.
You don't RAG your ability to balance. Once you can ride, you can ride.
She doesn't just retrieve. She learns. And we can prove it.
Created: 2025-12-05 Session: Partnership dialogue (dafit + Chrysalis) Status: Core architectural concept