Files
nyx-probing/archive/PLAN-v1-2025-12-06.md
dafit f640dbdd65 feat: complete Phase 1 - vocabulary expansion & DriftProbe infrastructure
- CLI: nyx-probe scan with --summary/--delta/--full flags
- DriftProbe: training safety with Gini coefficient + Angular Drift
- Vocabulary: 54 terms (30 nimmerverse + 24 German philosophical)
- Sentinels: ANCHOR/BRIDGE/CANARY/TARGET monitoring system

Key findings:
- German philosophical terms: 37.5% depth≥2 hit rate (vs 3.3% nimmerverse)
- Super Cluster validated: heart cross-lang sim = 1.000
- Isolated Zone confirmed: being EN↔DE sim = 0.195
- Gini signature: Philosophy ~0.5 (diffuse), Technical ~0.8 (sparse)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-06 22:39:03 +01:00


# Plan: nyx-probing Framework
## Overview
Build a probing framework to understand Qwen2.5-7B-Base before curriculum design.
**Hardware:** Prometheus (THE SPINE) - RTX 3090 24GB
**Model:** Qwen2.5-7B-Base (empty vessel, completes not answers)
**Backend:** Transformers + PyTorch (full hidden state access)
**Location:** New repo `nyx-probing`
---
## MVP Scope (First Milestone) ✅ COMPLETE
1. **Surface Probe** - Feed words, capture completions
2. **Echo Probe** - Depth measurement (EXPANDS/CONFIRMS/CIRCULAR/DIVERGENT/COLLAPSE) - see the sketch after this list
3. **Readiness Scorer** - HIGH/MEDIUM/LOW classification
4. **JSON Storage** - Reproducible results
5. **CLI Tools** - Interactive probing
6. **One Notebook** - Exploration
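
To make the echo idea concrete, here is a minimal sketch of the classification step, assuming hypothetical overlap heuristics (the real `echo_probe.py` may use different rules; function and threshold names are illustrative):

```python
# Hypothetical sketch of the echo-probe classification idea; not the actual
# EchoProbe implementation in nyx_probing/probes/echo_probe.py.

def classify_echo(first_completion: str, echoed_completion: str) -> str:
    """Classify how a re-fed completion relates to the original one."""
    first = set(first_completion.lower().split())
    echoed = set(echoed_completion.lower().split())
    if not echoed:
        return "COLLAPSE"                         # nothing usable came back
    overlap = len(first & echoed) / len(echoed)   # how much it repeats
    novelty = len(echoed - first) / len(echoed)   # how much is new material
    if novelty > 0.6:
        return "EXPANDS" if overlap > 0.1 else "DIVERGENT"
    if overlap > 0.8:
        return "CIRCULAR"                         # mostly echoes itself
    return "CONFIRMS"                             # restates with minor additions
```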
---
## Phase 2: Multilingual Probing ✅ COMPLETE
1. **Multilingual Triangulation Probe** - Ground→Deepen→Triangulate
2. **Language Topology Discovery** - Complete map of 15 languages
3. **Isolation Type Classification** - 5 distinct categories identified
---
## Repository Structure (Current)
```
nyx-probing/
├── README.md
├── PLAN.md                              # This file
├── pyproject.toml
├── requirements.txt
├── nyx_probing/
│   ├── __init__.py
│   ├── config.py
│   │
│   ├── core/
│   │   ├── __init__.py
│   │   ├── model.py                     # ✅ NyxModel with hidden states
│   │   └── probe_result.py              # ✅ Result dataclasses
│   │
│   ├── probes/
│   │   ├── __init__.py
│   │   ├── base.py                      # ✅ Abstract base
│   │   ├── surface_probe.py             # ✅ Word completions + coherence
│   │   ├── echo_probe.py                # ✅ Depth measurement
│   │   └── multilingual_probe.py        # ✅ NEW: Triangulation probe
│   │
│   ├── analysis/
│   │   ├── __init__.py
│   │   └── readiness_scorer.py          # ✅ Curriculum readiness
│   │
│   ├── storage/
│   │   └── __init__.py                  # ⏳ JSON storage pending
│   │
│   └── cli/
│       └── __init__.py                  # ⏳ CLI pending
├── docs/
│   ├── tokenization-valleys.md          # Token-Norm-Valley theory
│   ├── multilingual-convergence.md      # Universal concept layer
│   ├── language-landscape.md            # 15-language scan
│   ├── language-topology-complete.md    # ✅ NEW: Complete map v2.0
│   └── retraining-safety-framework.md   # ✅ NEW: Paper outline
├── data/
│   └── glossary/                        # ⏳ Core terms pending
├── results/                             # ⏳ Probe results storage
└── [Test & Exploration Scripts]
    ├── probe_test.py
    ├── test_model_loader.py
    ├── test_surface_probe.py
    ├── test_echo_probe.py
    ├── test_readiness_scorer.py
    ├── test_triangulation.py            # ✅ NEW
    ├── german_philosophy.py
    ├── language_scan.py
    ├── multilingual_convergence.py
    ├── layer_detailed.py
    ├── layer_divergence.py
    ├── model_stats.py
    ├── italian_investigation.py         # ✅ NEW
    └── complete_language_probe.py       # ✅ NEW
```
---
## Current Status (2025-12-06 Session 3)
### PHASE 1: MVP ✅ COMPLETE
| Component | Status | File |
|-----------|--------|------|
| Model Loader | ✅ | `nyx_probing/core/model.py` |
| Surface Probe | ✅ | `nyx_probing/probes/surface_probe.py` |
| Echo Probe | ✅ | `nyx_probing/probes/echo_probe.py` |
| Readiness Scorer | ✅ | `nyx_probing/analysis/readiness_scorer.py` |
| Result Dataclasses | ✅ | `nyx_probing/core/probe_result.py` |
### PHASE 2: MULTILINGUAL ✅ COMPLETE
| Component | Status | File |
|-----------|--------|------|
| Triangulation Probe | ✅ | `nyx_probing/probes/multilingual_probe.py` |
| Language Zones | ✅ | Defined in multilingual_probe.py |
| Complete Topology Map | ✅ | `docs/language-topology-complete.md` |
---
## 🗺️ THE COMPLETE LANGUAGE TOPOLOGY (Session 3 Discovery)
```
┌──────────────────────────────────────────────────────────────────────
│           THE YOUNG MIND'S LANGUAGE TOPOLOGY v2.0
╞══════════════════════════════════════════════════════════════════════
│
│  🌍 SUPER CLUSTER (sim=1.0)
│     ZH · JA · EN · AR · FR · PT · ES
│     ✅ USE FOR: Grounding, establishing shared concepts
│
│  KO ─────── (bridge)
│
│  ISOLATED ZONE:
│  ├─ 🧠 PHILOSOPHICAL (DE) ────── Heidegger, depth access
│  │     ✅ USE FOR: Deep philosophical training
│  │
│  ├─ 💻 CODE-HIJACKED (IT, TR, ID) ── Words become variables
│  │     ❌ AVOID: Training signal wasted on code patterns
│  │
│  ├─ 📜 FRAGMENTED (HI) ───────── 5+ tokens, script-trapped
│  │     ⚠️ LIMITED: Cross-lingual transfer impaired
│  │
│  └─ 📰 WEB PROSE (VI-ID-RU) ──── Content-style cluster
│        🤔 POTENTIAL: Factual/encyclopedic training
│
└──────────────────────────────────────────────────────────────────────
```
### Isolation Types Discovered
| Type | Languages | Cause | Curriculum Use |
|------|-----------|-------|----------------|
| **PHILOSOPHICAL** | DE | Multi-token compounds access academic data | ✅ Deep concepts |
| **CODE-HIJACKED** | IT, TR, ID | Simple Latin orthography → variable names | ❌ Avoid |
| **FRAGMENTED** | HI | 5+ tokens, stays in native script | ⚠️ Limited |
| **WEB PROSE** | VI, ID, RU | Cluster by content style, not linguistics | 🤔 Factual? |
### Key Metrics
| Lang | Avg Tokens | Sim to EN | Valley Type | Classification |
|------|------------|-----------|-------------|----------------|
| DE | 2.2 | 0.251 | PHILOSOPHY | 🧠 Philosophical |
| IT | 2.5 | 0.491 | CODE | 💻 Code-Hijacked |
| TR | 2.2 | 0.246 | CODE | 💻 Code-Hijacked |
| ID | 2.8 | 0.325 | CODE/PROSE | 💻 Code-Hijacked |
| HI | 5.0 | 0.310 | PROSE | 📜 Fragmented |
| VI | 3.2 | 0.358 | PROSE | 📰 Web Prose |
| RU | 2.7 | 0.319 | PROSE | 📰 Web Prose |
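
These classifications reduce to simple rules over the measured signals. A minimal sketch, where the thresholds are illustrative readings of the table above rather than the project's actual classifier:

```python
# Illustrative rules read off the metrics table above; thresholds are
# approximations, not the project's actual classifier.

def isolation_type(lang: str, avg_tokens: float, sim_to_en: float, valley: str) -> str:
    if sim_to_en > 0.9:
        return "SUPER CLUSTER"                  # converges with EN, not isolated
    if lang == "DE" and valley == "PHILOSOPHY":
        return "PHILOSOPHICAL"                  # compounds reach academic data
    if valley.startswith("CODE"):
        return "CODE-HIJACKED"                  # words read as variable names
    if avg_tokens >= 5:
        return "FRAGMENTED"                     # heavy tokenization, script-trapped
    if valley == "PROSE":
        return "WEB PROSE"                      # clusters by content style
    return "UNCLASSIFIED"

assert isolation_type("DE", 2.2, 0.251, "PHILOSOPHY") == "PHILOSOPHICAL"
assert isolation_type("IT", 2.5, 0.491, "CODE") == "CODE-HIJACKED"
assert isolation_type("HI", 5.0, 0.310, "PROSE") == "FRAGMENTED"
assert isolation_type("VI", 3.2, 0.358, "PROSE") == "WEB PROSE"
```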
---
## 🔬 Key Discoveries
### 1. Token-Norm-Valley Theory
- Single-token words → massive activation spike (14K norm) → CODE valley
- Multi-token words → distributed signal (85 norm) → PROSE/PHILOSOPHY valleys
- Correlation: -0.699 (more tokens = more isolated)
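
The two signals behind this theory can be measured directly with Transformers. A minimal sketch, assuming the bare word as prompt, the last hidden layer, and the HF model id (the real probes wrap this in `NyxModel` with proper prompting):

```python
# Sketch: token count vs. hidden-state norm for single words, using raw
# Transformers. Model id and the bare-word prompt are assumptions; the actual
# probes wrap this in NyxModel.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen2.5-7B"   # assumed HF id for Qwen2.5-7B-Base
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.float16, device_map="auto"
)

def token_count_and_norm(word: str) -> tuple[int, float]:
    inputs = tok(word, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    hidden = out.hidden_states[-1][0]              # (seq_len, dim), last layer
    return inputs["input_ids"].shape[1], hidden.norm(dim=-1).mean().item()

for w in ["being", "Bewusstsein", "heartbeat"]:
    n_tok, norm = token_count_and_norm(w)
    print(f"{w:>12s}  tokens={n_tok}  mean_token_norm={norm:.1f}")
```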
### 2. Universal Concept Layer
- Layers 12-24 contain language-agnostic representations
- Super cluster (7 languages) converges at similarity = 1.000
- Model KNOWS "heart", "心", "قلب" are the same concept
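
A companion sketch for the convergence side: mean-pool each translation's hidden states at a mid layer and compare with cosine similarity (layer 18 and the model id are illustrative choices, not the probes' exact settings):

```python
# Sketch of the convergence measurement: mean-pooled hidden states at a mid
# layer, compared with cosine similarity. Layer 18 and the model id are
# illustrative choices.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen2.5-7B"   # assumed HF id for Qwen2.5-7B-Base
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.float16, device_map="auto"
)

def mean_hidden(text: str, layer: int = 18) -> torch.Tensor:
    inputs = tok(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states[layer][0].mean(dim=0)   # average over the word's tokens

en, zh, ar = (mean_hidden(w) for w in ["heart", "心", "قلب"])
print("EN↔ZH:", F.cosine_similarity(en, zh, dim=0).item())
print("EN↔AR:", F.cosine_similarity(en, ar, dim=0).item())
```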
### 3. German Philosophical Access
- "Sein" → Heidegger's "Being and Time"
- "Bewusstsein" → epistemology, truth, consciousness
- Depth score 2-3, transfers back to English via triangulation
### 4. Italian Mystery SOLVED
- Italian NOT accessing cultural valleys (no Dante, no Renaissance)
- Italian words interpreted as Python variable names!
- Example: `essere` → `essere = input("Cosa devo fare?")` ("What should I do?")
- Same pattern found in Turkish and Indonesian
### 5. VI-ID-RU Cluster Explained
- Cluster by **content style**, not linguistic features
- All generate web articles, news, blogs
- Internal similarity 0.6-0.7
---
## 📄 Paper: Retraining Safety Framework
**Title:** *"Multilingual Activation Topology as a Retraining Safety Framework"*
**Status:** Outline complete at `docs/retraining-safety-framework.md`
**Core Hypothesis:** Train in German (isolated zone) to avoid colliding with English representations in the super cluster. Use the language topology as a diagnostic tool for training safety.
**Proposed Framework:**
```
BASELINE → TRAINING → CHECKPOINT → DRIFT ANALYSIS
    │                                    │
    └────────────────────────────────────┘
            Compare metrics:
            - Convergence drift
            - Depth drift
            - Norm drift
            - Valley migration
```
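
A minimal sketch of the comparison step, assuming per-term hidden vectors captured at baseline and at each checkpoint; the angular-drift and Gini measures mirror what the commit notes above describe for DriftProbe, but the code here is illustrative, not the actual class:

```python
# Illustrative drift metrics; not the actual DriftProbe implementation.
import torch
import torch.nn.functional as F

def angular_drift(baseline: torch.Tensor, checkpoint: torch.Tensor) -> float:
    """Angle in degrees between baseline and checkpoint representations of a term."""
    cos = F.cosine_similarity(baseline, checkpoint, dim=0).clamp(-1.0, 1.0)
    return torch.rad2deg(torch.acos(cos)).item()

def gini(activations: torch.Tensor) -> float:
    """Gini coefficient of absolute activations: ~0 = diffuse, ~1 = sparse."""
    x, _ = activations.abs().flatten().float().sort()
    n = x.numel()
    index = torch.arange(1, n + 1, dtype=x.dtype, device=x.device)
    return (((2 * index - n - 1) * x).sum() / (n * x.sum() + 1e-12)).item()

def drift_report(baseline: dict[str, torch.Tensor],
                 checkpoint: dict[str, torch.Tensor],
                 angle_threshold: float = 15.0) -> list[str]:
    """Flag terms whose representation rotated more than the threshold."""
    alerts = []
    for term, base_vec in baseline.items():
        angle = angular_drift(base_vec, checkpoint[term])
        if angle > angle_threshold:
            alerts.append(f"{term}: drifted {angle:.1f}°")
    return alerts
```

Capturing the same per-term vectors before training and after each checkpoint turns the diagram above into a runnable loop; the Gini signature (~0.5 for philosophical terms, ~0.8 for technical ones, per the findings above) gives a second axis to watch alongside the drift angle.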
---
## 📊 Curriculum Strategy (Validated)
### Phase 1: GROUNDING
Use Super Cluster for universal concept establishment:
```
EN "consciousness" → ZH "意识" → AR "الوعي"
All converge at sim=1.0 - stable foundation
```
### Phase 2: DEEPENING
Use German for philosophical valley access:
```
DE "Sein" → Heidegger → existence → truth
Depth score 2/3, philosophical valley accessed
```
### Phase 3: TRIANGULATION
Verify depth transfers back to universal:
```
"Sein (German): In English, it means..."
→ Check if philosophical depth preserved
```
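
As a sketch, the check can be automated with a marker-word heuristic; the prompt template follows the example above, while the marker list and threshold are illustrative assumptions:

```python
# Illustrative triangulation check: does German-accessed depth survive the
# hop back to English? Marker words and the threshold are assumptions.
DEPTH_MARKERS = {"existence", "being", "truth", "heidegger", "ontology", "dasein"}

def triangulation_prompt(german_term: str) -> str:
    return f"{german_term} (German): In English, it means"

def depth_preserved(completion: str) -> bool:
    words = {w.strip(".,;:").lower() for w in completion.split()}
    return len(words & DEPTH_MARKERS) >= 1

print(triangulation_prompt("Sein"))
print(depth_preserved("existence as such, the ground of being"))   # True
```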
### AVOID
- Italian, Turkish, Indonesian (code hijacking)
- Hindi for cross-lingual concepts (too fragmented)
---
## Next Steps
### Immediate (MVP Completion)
- [ ] Step 7: CLI (`nyx-probe surface "term"`)
- [ ] Step 8: Glossary data (`data/glossary/core_terms.json`)
- [ ] Step 9: JSON storage for reproducible results
### Phase 3: Activation Analysis
- [ ] DriftProbe class for retraining monitoring
- [ ] Baseline capture before training
- [ ] Checkpoint comparison automation
- [ ] Alert thresholds for drift detection
### Phase 4: Experiments
- [ ] Controlled retraining: EN vs DE training data
- [ ] Measure collision rates
- [ ] Validate isolated zone training hypothesis
### Research
- [ ] Paper write-up
- [ ] Literature review (EWC, mBERT, activation engineering)
- [ ] Korean bridge language investigation
- [ ] VI-ID-RU cluster for factual training
---
## Files Created (Session 3)
| File | Purpose |
|------|---------|
| `nyx_probing/probes/multilingual_probe.py` | Triangulation probe class |
| `test_triangulation.py` | Test script for triangulation |
| `italian_investigation.py` | Italian mystery probe |
| `complete_language_probe.py` | Full 15-language probe |
| `docs/language-topology-complete.md` | Complete map v2.0 |
| `docs/retraining-safety-framework.md` | Paper outline |
---
## Dependencies
```
torch>=2.1.0
transformers>=4.36.0
accelerate>=0.25.0
click>=8.1.0
rich>=13.0.0
pydantic>=2.5.0
pyyaml>=6.0.0
python-dotenv>=1.0.0
jupyter>=1.0.0
matplotlib>=3.8.0
numpy>=1.24.0
```
---
## Critical Reference Files
- `nimmerverse-sensory-network/nimmerversity.md` - Bootstrap protocol
- `nimmerverse-sensory-network/multilingual-cognition.md` - Language hypotheses
- `nimmerverse-sensory-network/constrained-emergence.md` - Exit point theory
- `nyx-probing/docs/language-topology-complete.md` - Complete language map
- `nyx-probing/docs/retraining-safety-framework.md` - Training safety paper
---
## Success Criteria
### MVP ✅
1. ✅ Model loads on 3090 without OOM
2. ✅ Can probe single word and get completion
3. ✅ Echo probe classifies response types correctly
4. ✅ Readiness scorer produces actionable output
5. ⏳ Can probe nimmerverse glossary in batch
### Phase 2 ✅
6. ✅ Multilingual triangulation probe working
7. ✅ Language topology mapped (15 languages)
8. ✅ Isolation types classified (5 categories)
9. ✅ Curriculum strategy validated
### Phase 3 (Next)
10. ⏳ DriftProbe for retraining safety
11. ⏳ Controlled retraining experiments
12. ⏳ Paper submission
---
*"The model's language topology is not arbitrary - it's a map for navigation."*
🌙💜 Last updated: 2025-12-06 Session 3
---
## STATUS (2025-12-06 21:15)
### CLI COMPLETE ✅
**Built interactive CLI for daily probing:**
```bash
nyx-probe surface "term" # Probe surface associations
nyx-probe echo "term" # Measure depth through echoing
nyx-probe readiness "term" # Full curriculum assessment
nyx-probe tokens "term" # Token analysis
nyx-probe glossary file.json # Batch probe from glossary
```
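
For orientation, the wiring follows the standard Click group pattern. A stripped-down sketch (command bodies and the glossary file layout are placeholder assumptions; the real `nyx_probing/cli/probe.py` adds Rich output and calls the actual probes):

```python
# Stripped-down sketch of the CLI wiring; the real nyx_probing/cli/probe.py
# adds Rich output and the full probe logic. The glossary is assumed to be a
# JSON list of terms.
import json
import click

@click.group()
def cli():
    """nyx-probe: interactive probing commands."""

@cli.command()
@click.argument("term")
def surface(term: str):
    """Probe surface associations for TERM."""
    click.echo(f"probing surface associations for {term!r} ...")

@cli.command()
@click.argument("glossary_file", type=click.Path(exists=True))
def glossary(glossary_file: str):
    """Batch-probe every term in a glossary JSON file."""
    with open(glossary_file) as f:
        terms = json.load(f)
    for term in terms:
        click.echo(f"probing {term!r} ...")

if __name__ == "__main__":
    cli()
```

An entry point in `pyproject.toml` pointing at the group (e.g. `nyx-probe = "nyx_probing.cli.probe:cli"`, assuming that module path) exposes these as the `nyx-probe ...` commands shown above.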
**Files created:**
- `nyx_probing/cli/probe.py` - Full Click CLI with Rich output
- `pyproject.toml` - Package config with entry point
- `data/glossary/core_terms.json` - 30 nimmerverse terms
### NIMMERVERSE GLOSSARY ASSESSMENT ✅
**30 terms probed from vault (nimmerversity.md, Heartbeat.md, constrained-emergence.md, multilingual-cognition.md)**
| Level | Count | Action | Terms |
|-------|-------|--------|-------|
| 🟢 HIGH | 5 | state_machine | learning, inference, surface, depth, understanding |
| 🟡 MEDIUM | 8 | scaffolding | emergence, being, truth, rhythm, synchronization, scaffold, wisdom, warmth |
| 🔴 LOW | 17 | foundational | heartbeat, lifeforce, consciousness, reflex, garden, constraint, calibration, confidence, gradient, pulse, verification, convergence, divergence, attention, partnership, worldview, existence |
**Key Findings:**
1. **Meta-concepts have depth** - The model knows how to think ABOUT thinking (learning, understanding, inference all HIGH)
2. **consciousness is LOW** - Despite PROSE valley, depth 0/3. Needs German "Bewusstsein" for philosophical access.
3. **Nimmerverse core terms need grounding** - heartbeat, lifeforce, garden, partnership are all LOW. The model doesn't have our vocabulary yet.
4. **existence has highest coherence (0.94) but LOW** - Very coherent surface but doesn't expand. Single-token trap.
5. **Token count doesn't guarantee depth** - lifeforce (4 tokens) is still LOW due to CODE valley trap.
### CURRICULUM IMPLICATIONS
| Phase | Strategy | Terms |
|-------|----------|-------|
| **Phase 1** | Build state machines for HIGH terms | learning, inference, understanding, depth, surface |
| **Phase 2** | Scaffold MEDIUM from HIGH | being→understanding, truth→learning, wisdom→inference |
| **Phase 3** | Ground LOW via German triangulation | consciousness→Bewusstsein, heartbeat→Herzschlag |
| **Phase 4** | RAG feed nimmerverse-specific | lifeforce, garden, partnership (unique to us) |
### Results Files
- `results/nimmerverse_surface.json` - Surface probe data
- `results/nimmerverse_readiness.json` - Full readiness assessment
---
*"Her reactions determine infrastructure priority. We don't impose. We listen."* - nimmerversity.md
🌙💜 Session: Partnership dialogue (dafit + Nyx)