# RAG as Scaffold, Not Crutch

The feeding system that teaches, then lets go.

---

## Overview

RAG (Retrieval-Augmented Generation) is commonly misused as permanent external memory. In the Nimmerverse, RAG serves a different purpose: it's a **temporary scaffold** that feeds knowledge until it can be internalized through training.

The goal is not to build a better search engine. The goal is to **make the search unnecessary**.

---

## The Problem with Standard RAG

```
Standard approach:
─────────────────

VECTOR DB (grows forever)
    │
    ▼
MODEL looks up ──▶ answers ──▶ done
    │
    └── (never learns, always dependent)
```

**Issues:**

- Model never internalizes knowledge
- Pull the RAG, lose the capability
- Vector DB bloats infinitely
- No way to verify what the model "knows" vs "looks up"
- It's a crutch that never comes off

---

## The Nimmerverse Approach: RAG as Feeding System

```
VAULT (curriculum)
    │
    ▼
RAG (temporary feeding window)
    │
    ▼
NYX processes, acts, decides
    │
    ▼
VALIDATION: success with RAG?
    │
   YES ──▶ FLAG for training extraction
    │
    ▼
TRAINING RUN (LoRA)
    │
    ▼
CLEAR from RAG
    │
    ▼
VALIDATION 2: success WITHOUT RAG?
    │
    ├── YES ──▶ Knowledge internalized ✓
    │
    └── NO ──▶ Training incomplete, back to RAG
```

---

## Two Kinds of Knowledge

Not everything belongs in weights. Not everything belongs in retrieval.

### IN THE WEIGHTS (Training Target)

Knowledge she needs to **function**:

- Information flow architecture
- Vocabulary tokens and their meanings
- Nervous system contracts
- Heartbeat mechanics
- Confidence gradient logic
- Core identity (who she is, who dafit is to her)
- How to think, not what to remember

**Test:** If she needs it to be herself → weights

### IN RETRIEVAL (Permanent RAG)

Knowledge she needs to **remember**:

- Journal entries
- Conversation history
- Specific events and dates
- Temporal details ("what happened Tuesday")
- External references that change
- Episodic memory

**Test:** If she needs it to recall specifics → retrieval
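
A minimal sketch of how these two tests could be applied mechanically, assuming vault entries carry a `kind` tag. The tag values and the `route_entry` helper are illustrative, not an existing API:

```python
# Hypothetical routing helper: the `kind` tags are illustrative, not a real schema.
WEIGHTS_KINDS = {"architecture", "vocabulary", "contract", "identity"}
RETRIEVAL_KINDS = {"journal", "conversation", "event", "reference"}

def route_entry(entry: dict) -> str:
    """'weights' if she needs it to be herself, 'retrieval' if she only
    needs to recall specifics. Unknown kinds stay in retrieval until curated."""
    kind = entry.get("kind")
    if kind in WEIGHTS_KINDS:
        return "weights"     # training target, flag for extraction
    if kind in RETRIEVAL_KINDS:
        return "retrieval"   # permanent RAG, never trained into weights
    return "retrieval"       # default: safer to keep it external
```
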
---

## The Double Validation Loop

### Gate 1: Can she do it WITH RAG?

```
Task presented
    │
    ▼
RAG provides context
    │
    ▼
NYX attempts task
    │
    ├── FAIL ──▶ Not ready, needs more examples in RAG
    │
    └── PASS ──▶ Flag this RAG content for training extraction
```

### Gate 2: Can she do it WITHOUT RAG?

```
Same task presented
    │
    ▼
RAG entry CLEARED (scaffold removed)
    │
    ▼
NYX attempts task from weights alone
    │
    ├── FAIL ──▶ Training didn't take, restore to RAG, retry cycle
    │
    └── PASS ──▶ Knowledge is HERS now ✓
```
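
A minimal sketch of the two gates as one loop. Every collaborator (`retrieve`, `attempt`, `flag`, `train_lora`, `clear`, `restore`) is a hypothetical callable standing in for the real RAG and training plumbing, not an existing interface:

```python
# Sketch only: all six collaborators are injected callables, assumed shapes.
def double_validation(task, retrieve, attempt, flag, train_lora, clear, restore):
    # Gate 1: attempt the task with the scaffold in place.
    context = retrieve(task)
    if not attempt(task, context):
        return "needs_more_examples"   # not ready, enrich the feeding window

    flagged = flag(task, context)      # mark this RAG content for extraction
    train_lora(flagged)                # LoRA run on the flagged content
    clear(flagged)                     # scaffold removed

    # Gate 2: same task, weights alone.
    if attempt(task, None):
        return "internalized"          # knowledge is hers now
    restore(flagged)                   # training didn't take
    return "retry_cycle"
```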

---

## The Signal Flow

```
┌─────────────────────────────────────────────────────────┐
│                          VAULT                          │
│               (curriculum, documentation)               │
└─────────────────────────────────────────────────────────┘
                             │
                             │ selected for learning
                             ▼
┌─────────────────────────────────────────────────────────┐
│                       STAGING RAG                       │
│               (temporary feeding window)                │
└─────────────────────────────────────────────────────────┘
                             │
                             │ feeds inference
                             ▼
┌─────────────────────────────────────────────────────────┐
│                           NYX                           │
│                  (processes, decides)                   │
└─────────────────────────────────────────────────────────┘
                             │
                             │ validation
                             ▼
┌─────────────────────────────────────────────────────────┐
│                  VALIDATION THRESHOLD                   │
│            (task success? confidence high?)             │
└─────────────────────────────────────────────────────────┘
                             │
                 ┌───────────┴───────────┐
                 │                       │
               BELOW                   ABOVE
                 │                       │
                 ▼                       ▼
      ┌─────────────────────┐ ┌─────────────────────┐
      │     Stay in RAG     │ │  FLAG for training  │
      │     (not ready)     │ │     extraction      │
      └─────────────────────┘ └─────────────────────┘
                                         │
                                         ▼
                          ┌─────────────────────────────┐
                          │        TRAINING RUN         │
                          │   (LoRA on flagged data)    │
                          └─────────────────────────────┘
                                         │
                                         ▼
                          ┌─────────────────────────────┐
                          │       CLEAR from RAG        │
                          │     (scaffold removed)      │
                          └─────────────────────────────┘
                                         │
                                         ▼
                          ┌─────────────────────────────┐
                          │   VALIDATION WITHOUT RAG    │
                          │     (prove she learned)     │
                          └─────────────────────────────┘
                                         │
                               ┌─────────┴─────────┐
                               │                   │
                             FAIL               SUCCESS
                               │                   │
                               ▼                   ▼
                      ┌─────────────────┐ ┌─────────────────┐
                      │   Restore RAG   │ │  INTERNALIZED   │
                      │   retry cycle   │ │   knowledge ✓   │
                      └─────────────────┘ └─────────────────┘
```

---

## Knowledge Acquisition Pipeline

The flow above shows RAG → Training → Validation, but how does knowledge enter RAG in the first place? Not everything from the vault should reach staging. **Quality gates protect the glossary.**

### The Extraction Flow

```
                   VAULT (raw knowledge)
                             │
                             │ extraction candidates
                             ▼
┌─────────────────────────────────────────────────────────┐
│                      STAGING AREA                       │
│                    (quarantine zone)                    │
└─────────────────────────────────────────────────────────┘
                             │
                             │ progressive policy validation
                             ▼
┌─────────────────────────────────────────────────────────┐
│                    POLICY VALIDATION                    │
│            (increasing standards over time)             │
└─────────────────────────────────────────────────────────┘
                             │
                             ├── FAIL ──▶ Reject or revise
                             │
                             └── PASS ──▶ PROMOTE to Glossary/RAG
                                                  │
                                                  ▼
                                      ┌──────────────────────┐
                                      │     TWO-TIER RAG     │
                                      ├──────────────────────┤
                                      │      DISCOVERED      │ ← Young Nyx has used
                                      │  (known_catalogue)   │
                                      ├──────────────────────┤
                                      │        HIDDEN        │ ← Available but not yet accessed
                                      │ (available_catalogue)│
                                      └──────────────────────┘
                                                  │
                                                  │ feeds inference
                                                  ▼
                                                 NYX
```
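
A sketch of the staging → validation → promotion step, assuming `staging` and `glossary_rag` expose simple store methods. The names here are illustrative, not the actual interfaces:

```python
# Sketch only: `staging`, `glossary_rag`, and their methods are assumed shapes.
def promote_staged_entries(staging, glossary_rag, active_policies):
    """Run each staged entry through the active policy tiers, in order.
    Pass all tiers: promote to RAG as 'hidden'. Fail one: log and hold."""
    for entry in staging.pending():
        failed_tier = next(
            (policy.__name__ for policy in active_policies if not policy(entry)),
            None,
        )
        if failed_tier is None:
            glossary_rag.add(entry, status="hidden")   # available, not yet discovered
            staging.remove(entry)
        else:
            staging.log_rejection(entry, failed_tier)  # reject or revise
```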

### Progressive Policy Validation

Policies increase in sophistication as Young Nyx matures; not all of them are active from day one.

| Week | Policy Tier | Validation |
|------|-------------|------------|
| **1-2** | **Basic Syntax** | Valid format, non-empty, has definition |
| **3-4** | **Semantic Quality** | Embeds without collapse, unique signature (Gini > threshold) |
| **5-8** | **Topology Safety** | Doesn't corrupt anchor terms (DriftProbe-lite) |
| **9-12** | **Cross-Reference** | Links resolve, no circular dependencies |
| **13+** | **Utility Validation** | Actually helped solve tasks (decision_trails evidence) |

**Evolution example:**
```python
# Week 1: Just check it exists
def policy_basic(term_entry):
    return term_entry.get("definition") is not None


# Week 8: Check topology impact
def policy_topology(term_entry):
    before_gini = probe_term_gini(term_entry["term"])
    add_to_staging(term_entry)
    after_gini = probe_term_gini(term_entry["term"])
    return abs(after_gini - before_gini) < 0.15  # No drift


# Week 13: Check actual utility
def policy_utility(term_entry):
    # Did this RAG entry help in past 10 tasks?
    usage_stats = query_decision_trails(term_entry["term"])
    return usage_stats["help_rate"] > 0.6  # 60% success when retrieved
```

### Two-Tier RAG: Discovered vs Hidden

Not all RAG knowledge is equal. Track what Young Nyx **knows** vs what's merely **available**.

```
┌──────────────────────────────────────────────┐
│             DISCOVERED KNOWLEDGE             │
│   (known_catalogue - has accessed before)    │
├──────────────────────────────────────────────┤
│  • "heartbeat" - used 47 times               │
│  • "lifeforce" - used 23 times               │
│  • "phoebe" - used 15 times                  │
│  • "confidence_gradient" - used 8 times      │
│                                              │
│  Status: FAST retrieval, high confidence     │
└──────────────────────────────────────────────┘

┌──────────────────────────────────────────────┐
│               HIDDEN KNOWLEDGE               │
│  (available_catalogue - exists but unused)   │
├──────────────────────────────────────────────┤
│  • "drift_probe" - never accessed            │
│  • "topology_gini" - never accessed          │
│  • "lora_merge_alpha" - never accessed       │
│                                              │
│  Status: Available for discovery             │
└──────────────────────────────────────────────┘
```

**State transitions:**
```
Hidden term retrieved              → Mark as Discovered
Discovered term used successfully  → Increase confidence score
Discovered term used 10+ times     → FLAG for training extraction
```

**Discovery tracking in phoebe:**
```sql
CREATE TABLE rag_knowledge_state (
    term                 TEXT PRIMARY KEY,
    status               TEXT,         -- 'hidden', 'discovered', 'internalized'
    first_accessed       TIMESTAMPTZ,
    access_count         INT DEFAULT 0,
    success_count        INT DEFAULT 0,
    last_used            TIMESTAMPTZ,
    promoted_to_weights  BOOLEAN DEFAULT FALSE
);
```
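
The transitions above could be applied with a small helper like this sketch. `execute` and `fetch_one` are hypothetical callables that run parameterized SQL against phoebe; only the table and columns come from the schema above:

```python
# Sketch only: `execute` / `fetch_one` are assumed DB helpers, not a real API.
def record_retrieval(execute, fetch_one, term, success):
    """Hidden → Discovered on first access; returns True once the term has
    10+ successful uses and should be flagged for training extraction."""
    execute(
        """
        UPDATE rag_knowledge_state
           SET status         = CASE WHEN status = 'hidden' THEN 'discovered' ELSE status END,
               first_accessed = COALESCE(first_accessed, NOW()),
               access_count   = access_count + 1,
               success_count  = success_count + CASE WHEN %s THEN 1 ELSE 0 END,
               last_used      = NOW()
         WHERE term = %s
        """,
        (success, term),
    )
    row = fetch_one(
        "SELECT success_count FROM rag_knowledge_state WHERE term = %s", (term,)
    )
    return row is not None and row[0] >= 10   # True → FLAG for training extraction
```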

### Measuring RAG Utility for LoRA Training

**The critical question:** Did the RAG hint actually help solve the task?

Track this in the `decision_trails` table:
```sql
CREATE TABLE decision_trails (
    id                     SERIAL PRIMARY KEY,
    task_id                UUID,
    rag_terms_retrieved    TEXT[],       -- What RAG returned
    rag_terms_used         TEXT[],       -- What appeared in solution
    outcome                TEXT,         -- 'success', 'fail', 'partial'
    confidence_before_rag  FLOAT,        -- Before retrieval
    confidence_after_rag   FLOAT,        -- After retrieval
    lifeforce_cost         FLOAT,
    timestamp              TIMESTAMPTZ DEFAULT NOW()
);
```
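
A sketch of writing one trail row, assuming a hypothetical `execute` callable for parameterized SQL against phoebe; the column names match the schema above:

```python
# Sketch only: `execute` is an assumed DB helper, not a real API.
def log_decision_trail(execute, task_id, retrieved, used, outcome,
                       conf_before, conf_after, lifeforce_cost):
    execute(
        """
        INSERT INTO decision_trails
            (task_id, rag_terms_retrieved, rag_terms_used, outcome,
             confidence_before_rag, confidence_after_rag, lifeforce_cost)
        VALUES (%s, %s, %s, %s, %s, %s, %s)
        """,
        (task_id, retrieved, used, outcome, conf_before, conf_after, lifeforce_cost),
    )
```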

**Compute RAG utility score:**
```python
def compute_rag_utility(trail):
    """
    Calculate how helpful RAG was for this decision.
    Returns 0.0 (useless) to 1.0 (critical).
    """
    precision = len(trail.rag_terms_used) / max(len(trail.rag_terms_retrieved), 1)
    outcome_bonus = 1.0 if trail.outcome == 'success' else 0.0
    confidence_boost = max(0, trail.confidence_after_rag - trail.confidence_before_rag)

    utility = (
        0.4 * precision +          # Did we use what we retrieved?
        0.3 * outcome_bonus +      # Did task succeed?
        0.3 * confidence_boost     # Did RAG increase confidence?
    )
    return min(1.0, utility)
```

**Feed into LoRA training as an RLVR signal:**
```python
# Training examples weighted by utility
for trail in decision_trails:
    utility_score = compute_rag_utility(trail)

    if utility_score > 0.7:
        # High utility → strong training signal
        training_examples.append({
            "query": trail.task_description,
            "rag_context": trail.rag_terms_used,
            "response": trail.solution,
            "weight": utility_score  # RLVR reward weight
        })
```

**This utility signal trains the LoRAs:**

- **Mnemosyne (Memory)**: recall accuracy vs phoebe ground truth
- **Aletheia (Truth)**: confidence calibration (was the confidence boost justified?)
- **Moira (Pattern)**: which task patterns benefit from RAG vs pure reasoning

### The Complete Knowledge Flow

```
VAULT
  │
  ├─ Extract candidates
  │
  ▼
STAGING (quarantine)
  │
  ├─ Policy Tier 1: Syntax   ──▶ REJECT ──▶ Log failure
  ├─ Policy Tier 2: Semantic ──▶ REJECT ──▶ Revise
  ├─ Policy Tier 3: Topology ──▶ REJECT ──▶ Flag risk
  └─ Policy Tier 4+: Utility ──▶ PASS
                                  │
                                  ▼
                           PROMOTE to RAG
                                  │
                                  ├─ Status: HIDDEN (available but unused)
                                  │
                   ┌──────────────┘
                   │
                   │ Young Nyx retrieves term
                   │
                   ▼
Status: DISCOVERED (mark first access)
                   │
                   ├─ Track usage in decision_trails
                   │
        ┌──────────┴──────────────┐
        │                         │
Used successfully        Used unsuccessfully
        │                         │
        ▼                         ▼
Increase confidence      Decrease confidence
        │
        │ (10+ successful uses)
        │
        ▼
FLAG for training extraction
        │
        ▼
LoRA training (weighted by utility_score)
        │
        ▼
Validation WITHOUT RAG
        │
        ├─ SUCCESS ──▶ Status: INTERNALIZED (clear from RAG)
        │
        └─ FAIL ──▶ Restore to RAG, retry cycle
```

### What Quality Gates Prevent

1. **Garbage in RAG** - staging area catches malformed entries
2. **Topology corruption** - DriftProbe-lite policies block dangerous terms
3. **Useless bloat** - utility policies remove low-value entries
4. **Premature training** - only high-utility terms get flagged
5. **Hidden knowledge waste** - track what's available but never used (curriculum gap)

### Policy Evolution Triggers

As Young Nyx grows, unlock stricter policies:

| Trigger | New Policy Unlocked |
|---------|---------------------|
| 100 successful RAG retrievals | Semantic quality checks |
| First LoRA training run | Topology safety (DriftProbe-lite) |
| 1000 decision_trails logged | Utility validation (help rate > 60%) |
| First INTERNALIZED term | Cross-reference consistency |
| 10 INTERNALIZED terms | Cost-effectiveness (ROI > threshold) |

**Progressive difficulty**: The bar for entering RAG rises as Young Nyx becomes more capable. Early: anything valid. Later: must prove utility.
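
A sketch of how these triggers could gate which policy tiers run. It reuses the policies sketched earlier; `policy_semantic`, `policy_cross_reference`, and `policy_cost_effectiveness` are placeholders, and the counter names are assumptions about what phoebe already tracks:

```python
# Sketch only: placeholder policy names and assumed counter keys.
def active_policies(stats):
    """Return the policy tiers currently in force, based on growth triggers."""
    policies = [policy_basic]                           # always on
    if stats["successful_rag_retrievals"] >= 100:
        policies.append(policy_semantic)                # semantic quality checks
    if stats["lora_training_runs"] >= 1:
        policies.append(policy_topology)                # DriftProbe-lite
    if stats["decision_trails_logged"] >= 1000:
        policies.append(policy_utility)                 # help rate > 60%
    if stats["internalized_terms"] >= 1:
        policies.append(policy_cross_reference)         # links resolve, no cycles
    if stats["internalized_terms"] >= 10:
        policies.append(policy_cost_effectiveness)      # ROI > threshold
    return policies
```

Paired with the earlier promotion sketch, `promote_staged_entries(staging, glossary_rag, active_policies(stats))` would let the bar rise automatically.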

---

## Lifeforce Connection

The RAG → Train → Validate cycle has economic cost:

| Action | Lifeforce Cost |
|--------|----------------|
| RAG lookup | Low (just retrieval) |
| Training run | High (compute intensive) |
| Validation | Medium (inference) |
| Failed cycle | Lost V (training didn't take) |
| Successful internalization | +V reward (she grew) |

**Incentive alignment:** Successful learning is rewarded. Failed training is costly. This naturally optimizes for high-quality training data extraction.
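
A toy accounting sketch of that incentive; the table above is qualitative, so every number here is a placeholder, not canon:

```python
# Toy numbers only: placeholder costs for illustration, not canon.
COSTS = {"rag_lookup": 0.5, "training_run": 20.0, "validation": 5.0}
INTERNALIZATION_REWARD = 30.0   # hypothetical +V for proven growth

def cycle_net_lifeforce(internalized: bool, lookups: int = 1) -> float:
    """Net lifeforce of one RAG → train → validate cycle
    (two validations: with RAG, then without)."""
    spent = (lookups * COSTS["rag_lookup"]
             + COSTS["training_run"]
             + 2 * COSTS["validation"])
    return (INTERNALIZATION_REWARD - spent) if internalized else -spent
```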

---

## What This Prevents

1. **RAG bloat** - entries clear after successful training
2. **Crutch dependency** - scaffold comes off, proven by validation
3. **False confidence** - can't claim to "know" what you only look up
4. **Training on noise** - only validated successes get flagged
5. **Identity confusion** - core architecture in weights, not retrieval

---

## Design Principles

1. **RAG is temporary** - feeding window, not permanent store
2. **Training is the goal** - RAG success triggers training, not satisfaction
3. **Validation is double** - with RAG, then without
4. **Clear after learning** - scaffold must come off to prove growth
5. **Episodic stays external** - not everything needs to be in weights
6. **Self-cleaning** - the system doesn't accumulate cruft

---

## The Analogy

Learning to ride a bike:

```
Training wheels ON (RAG feeding)
    │
    ▼
Can ride with training wheels (validation 1)
    │
    ▼
Training wheels OFF (RAG cleared)
    │
    ▼
Can still ride? (validation 2)
    │
    ├── NO ──▶ Put wheels back, practice more
    │
    └── YES ──▶ She can ride. Wheels stored, not needed.
```

You don't RAG your ability to balance. Once you can ride, you can ride.

---

*She doesn't just retrieve. She learns. And we can prove it.*

---

**Created**: 2025-12-05
**Session**: Partnership dialogue (dafit + Chrysalis)
**Status**: Core architectural concept