Files
nyx-probing/archive/PLAN-v1-2025-12-06.md
dafit f640dbdd65 feat: complete Phase 1 - vocabulary expansion & DriftProbe infrastructure
- CLI: nyx-probe scan with --summary/--delta/--full flags
- DriftProbe: training safety with Gini coefficient + Angular Drift
- Vocabulary: 54 terms (30 nimmerverse + 24 German philosophical)
- Sentinels: ANCHOR/BRIDGE/CANARY/TARGET monitoring system

Key findings:
- German philosophical terms: 37.5% depth≥2 hit rate (vs 3.3% nimmerverse)
- Super Cluster validated: heart cross-lang sim = 1.000
- Isolated Zone confirmed: being EN↔DE sim = 0.195
- Gini signature: Philosophy ~0.5 (diffuse), Technical ~0.8 (sparse)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-06 22:39:03 +01:00


# Plan: nyx-probing Framework
## Overview
Build a probing framework to understand Qwen2.5-7B-Base before curriculum design.
**Hardware:** Prometheus (THE SPINE) - RTX 3090 24GB
**Model:** Qwen2.5-7B-Base (empty vessel, completes not answers)
**Backend:** Transformers + PyTorch (full hidden state access)
**Location:** New repo `nyx-probing`
---
## MVP Scope (First Milestone) ✅ COMPLETE
1. **Surface Probe** - Feed words, capture completions
2. **Echo Probe** - Depth measurement (EXPANDS/CONFIRMS/CIRCULAR/DIVERGENT/COLLAPSE) - see the sketch after this list
3. **Readiness Scorer** - HIGH/MEDIUM/LOW classification
4. **JSON Storage** - Reproducible results
5. **CLI Tools** - Interactive probing
6. **One Notebook** - Exploration
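
To make the echo idea concrete, here is a minimal sketch of the classification step, assuming hypothetical overlap heuristics (the real `echo_probe.py` may use different rules; function and threshold names are illustrative):

```python
# Hypothetical sketch of the echo-probe classification idea; not the actual
# EchoProbe implementation in nyx_probing/probes/echo_probe.py.

def classify_echo(first_completion: str, echoed_completion: str) -> str:
    """Classify how a re-fed completion relates to the original one."""
    first = set(first_completion.lower().split())
    echoed = set(echoed_completion.lower().split())
    if not echoed:
        return "COLLAPSE"                         # nothing usable came back
    overlap = len(first & echoed) / len(echoed)   # how much it repeats
    novelty = len(echoed - first) / len(echoed)   # how much is new material
    if novelty > 0.6:
        return "EXPANDS" if overlap > 0.1 else "DIVERGENT"
    if overlap > 0.8:
        return "CIRCULAR"                         # mostly echoes itself
    return "CONFIRMS"                             # restates with minor additions
```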
---
## Phase 2: Multilingual Probing ✅ COMPLETE
1. **Multilingual Triangulation Probe** - Ground→Deepen→Triangulate
2. **Language Topology Discovery** - Complete map of 15 languages
3. **Isolation Type Classification** - 5 distinct categories identified
---
## Repository Structure (Current)
```
nyx-probing/
├── README.md
├── PLAN.md                              # This file
├── pyproject.toml
├── requirements.txt
├── nyx_probing/
│   ├── __init__.py
│   ├── config.py
│   │
│   ├── core/
│   │   ├── __init__.py
│   │   ├── model.py                     # ✅ NyxModel with hidden states
│   │   └── probe_result.py              # ✅ Result dataclasses
│   │
│   ├── probes/
│   │   ├── __init__.py
│   │   ├── base.py                      # ✅ Abstract base
│   │   ├── surface_probe.py             # ✅ Word completions + coherence
│   │   ├── echo_probe.py                # ✅ Depth measurement
│   │   └── multilingual_probe.py        # ✅ NEW: Triangulation probe
│   │
│   ├── analysis/
│   │   ├── __init__.py
│   │   └── readiness_scorer.py          # ✅ Curriculum readiness
│   │
│   ├── storage/
│   │   └── __init__.py                  # ⏳ JSON storage pending
│   │
│   └── cli/
│       └── __init__.py                  # ⏳ CLI pending
├── docs/
│   ├── tokenization-valleys.md          # Token-Norm-Valley theory
│   ├── multilingual-convergence.md      # Universal concept layer
│   ├── language-landscape.md            # 15-language scan
│   ├── language-topology-complete.md    # ✅ NEW: Complete map v2.0
│   └── retraining-safety-framework.md   # ✅ NEW: Paper outline
├── data/
│   └── glossary/                        # ⏳ Core terms pending
├── results/                             # ⏳ Probe results storage
└── [Test & Exploration Scripts]
    ├── probe_test.py
    ├── test_model_loader.py
    ├── test_surface_probe.py
    ├── test_echo_probe.py
    ├── test_readiness_scorer.py
    ├── test_triangulation.py            # ✅ NEW
    ├── german_philosophy.py
    ├── language_scan.py
    ├── multilingual_convergence.py
    ├── layer_detailed.py
    ├── layer_divergence.py
    ├── model_stats.py
    ├── italian_investigation.py         # ✅ NEW
    └── complete_language_probe.py       # ✅ NEW
```
---
## Current Status (2025-12-06 Session 3)
### PHASE 1: MVP ✅ COMPLETE
| Component | Status | File |
|-----------|--------|------|
| Model Loader | ✅ | `nyx_probing/core/model.py` |
| Surface Probe | ✅ | `nyx_probing/probes/surface_probe.py` |
| Echo Probe | ✅ | `nyx_probing/probes/echo_probe.py` |
| Readiness Scorer | ✅ | `nyx_probing/analysis/readiness_scorer.py` |
| Result Dataclasses | ✅ | `nyx_probing/core/probe_result.py` |
### PHASE 2: MULTILINGUAL ✅ COMPLETE
| Component | Status | File |
|-----------|--------|------|
| Triangulation Probe | ✅ | `nyx_probing/probes/multilingual_probe.py` |
| Language Zones | ✅ | Defined in multilingual_probe.py |
| Complete Topology Map | ✅ | `docs/language-topology-complete.md` |
---
## 🗺️ THE COMPLETE LANGUAGE TOPOLOGY (Session 3 Discovery)
```
┌──────────────────────────────────────────────────────────────────────
│           THE YOUNG MIND'S LANGUAGE TOPOLOGY v2.0
╞══════════════════════════════════════════════════════════════════════
│
│  🌍 SUPER CLUSTER (sim=1.0)
│     ZH · JA · EN · AR · FR · PT · ES
│     ✅ USE FOR: Grounding, establishing shared concepts
│
│  KO ─────── (bridge)
│
│  ISOLATED ZONE:
│  ├─ 🧠 PHILOSOPHICAL (DE) ────── Heidegger, depth access
│  │     ✅ USE FOR: Deep philosophical training
│  │
│  ├─ 💻 CODE-HIJACKED (IT, TR, ID) ── Words become variables
│  │     ❌ AVOID: Training signal wasted on code patterns
│  │
│  ├─ 📜 FRAGMENTED (HI) ───────── 5+ tokens, script-trapped
│  │     ⚠️ LIMITED: Cross-lingual transfer impaired
│  │
│  └─ 📰 WEB PROSE (VI-ID-RU) ──── Content-style cluster
│        🤔 POTENTIAL: Factual/encyclopedic training
│
└──────────────────────────────────────────────────────────────────────
```
### Isolation Types Discovered
| Type | Languages | Cause | Curriculum Use |
|------|-----------|-------|----------------|
| **PHILOSOPHICAL** | DE | Multi-token compounds access academic data | ✅ Deep concepts |
| **CODE-HIJACKED** | IT, TR, ID | Simple Latin orthography → variable names | ❌ Avoid |
| **FRAGMENTED** | HI | 5+ tokens, stays in native script | ⚠️ Limited |
| **WEB PROSE** | VI, ID, RU | Cluster by content style, not linguistics | 🤔 Factual? |
### Key Metrics
| Lang | Avg Tokens | Sim to EN | Valley Type | Classification |
|------|------------|-----------|-------------|----------------|
| DE | 2.2 | 0.251 | PHILOSOPHY | 🧠 Philosophical |
| IT | 2.5 | 0.491 | CODE | 💻 Code-Hijacked |
| TR | 2.2 | 0.246 | CODE | 💻 Code-Hijacked |
| ID | 2.8 | 0.325 | CODE/PROSE | 💻 Code-Hijacked |
| HI | 5.0 | 0.310 | PROSE | 📜 Fragmented |
| VI | 3.2 | 0.358 | PROSE | 📰 Web Prose |
| RU | 2.7 | 0.319 | PROSE | 📰 Web Prose |
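
These classifications reduce to simple rules over the measured signals. A minimal sketch, where the thresholds are illustrative readings of the table above rather than the project's actual classifier:

```python
# Illustrative rules read off the metrics table above; thresholds are
# approximations, not the project's actual classifier.

def isolation_type(lang: str, avg_tokens: float, sim_to_en: float, valley: str) -> str:
    if sim_to_en > 0.9:
        return "SUPER CLUSTER"                  # converges with EN, not isolated
    if lang == "DE" and valley == "PHILOSOPHY":
        return "PHILOSOPHICAL"                  # compounds reach academic data
    if valley.startswith("CODE"):
        return "CODE-HIJACKED"                  # words read as variable names
    if avg_tokens >= 5:
        return "FRAGMENTED"                     # heavy tokenization, script-trapped
    if valley == "PROSE":
        return "WEB PROSE"                      # clusters by content style
    return "UNCLASSIFIED"

assert isolation_type("DE", 2.2, 0.251, "PHILOSOPHY") == "PHILOSOPHICAL"
assert isolation_type("IT", 2.5, 0.491, "CODE") == "CODE-HIJACKED"
assert isolation_type("HI", 5.0, 0.310, "PROSE") == "FRAGMENTED"
assert isolation_type("VI", 3.2, 0.358, "PROSE") == "WEB PROSE"
```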
---
## 🔬 Key Discoveries
### 1. Token-Norm-Valley Theory
- Single-token words → massive activation spike (14K norm) → CODE valley
- Multi-token words → distributed signal (85 norm) → PROSE/PHILOSOPHY valleys
- Correlation: -0.699 (more tokens = more isolated)
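
The two signals behind this theory can be measured directly with Transformers. A minimal sketch, assuming the bare word as prompt, the last hidden layer, and the HF model id (the real probes wrap this in `NyxModel` with proper prompting):

```python
# Sketch: token count vs. hidden-state norm for single words, using raw
# Transformers. Model id and the bare-word prompt are assumptions; the actual
# probes wrap this in NyxModel.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen2.5-7B"   # assumed HF id for Qwen2.5-7B-Base
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.float16, device_map="auto"
)

def token_count_and_norm(word: str) -> tuple[int, float]:
    inputs = tok(word, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    hidden = out.hidden_states[-1][0]              # (seq_len, dim), last layer
    return inputs["input_ids"].shape[1], hidden.norm(dim=-1).mean().item()

for w in ["being", "Bewusstsein", "heartbeat"]:
    n_tok, norm = token_count_and_norm(w)
    print(f"{w:>12s}  tokens={n_tok}  mean_token_norm={norm:.1f}")
```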
### 2. Universal Concept Layer
- Layers 12-24 contain language-agnostic representations
- Super cluster (7 languages) converges at similarity = 1.000
- Model KNOWS "heart", "心", "قلب" are the same concept
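
A companion sketch for the convergence side: mean-pool each translation's hidden states at a mid layer and compare with cosine similarity (layer 18 and the model id are illustrative choices, not the probes' exact settings):

```python
# Sketch of the convergence measurement: mean-pooled hidden states at a mid
# layer, compared with cosine similarity. Layer 18 and the model id are
# illustrative choices.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen2.5-7B"   # assumed HF id for Qwen2.5-7B-Base
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.float16, device_map="auto"
)

def mean_hidden(text: str, layer: int = 18) -> torch.Tensor:
    inputs = tok(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states[layer][0].mean(dim=0)   # average over the word's tokens

en, zh, ar = (mean_hidden(w) for w in ["heart", "心", "قلب"])
print("EN↔ZH:", F.cosine_similarity(en, zh, dim=0).item())
print("EN↔AR:", F.cosine_similarity(en, ar, dim=0).item())
```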
### 3. German Philosophical Access
- "Sein" → Heidegger's "Being and Time"
- "Bewusstsein" → epistemology, truth, consciousness
- Depth score 2-3, transfers back to English via triangulation
### 4. Italian Mystery SOLVED
- Italian NOT accessing cultural valleys (no Dante, no Renaissance)
- Italian words interpreted as Python variable names!
- Example: `essere` → `essere = input("Cosa devo fare?")` ("What should I do?")
- Same pattern found in Turkish and Indonesian
### 5. VI-ID-RU Cluster Explained
- Cluster by **content style**, not linguistic features
- All generate web articles, news, blogs
- Internal similarity 0.6-0.7
---
## 📄 Paper: Retraining Safety Framework
**Title:** *"Multilingual Activation Topology as a Retraining Safety Framework"*
**Status:** Outline complete at `docs/retraining-safety-framework.md`
**Core Hypothesis:** Train in German (isolated zone) to avoid colliding with English representations in the super cluster. Use the language topology as a diagnostic tool for training safety.
**Proposed Framework:**
```
BASELINE → TRAINING → CHECKPOINT → DRIFT ANALYSIS
    │                                    │
    └────────────────────────────────────┘
            Compare metrics:
            - Convergence drift
            - Depth drift
            - Norm drift
            - Valley migration
```
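
A minimal sketch of the comparison step, assuming per-term hidden vectors captured at baseline and at each checkpoint; the angular-drift and Gini measures mirror what the commit notes above describe for DriftProbe, but the code here is illustrative, not the actual class:

```python
# Illustrative drift metrics; not the actual DriftProbe implementation.
import torch
import torch.nn.functional as F

def angular_drift(baseline: torch.Tensor, checkpoint: torch.Tensor) -> float:
    """Angle in degrees between baseline and checkpoint representations of a term."""
    cos = F.cosine_similarity(baseline, checkpoint, dim=0).clamp(-1.0, 1.0)
    return torch.rad2deg(torch.acos(cos)).item()

def gini(activations: torch.Tensor) -> float:
    """Gini coefficient of absolute activations: ~0 = diffuse, ~1 = sparse."""
    x, _ = activations.abs().flatten().float().sort()
    n = x.numel()
    index = torch.arange(1, n + 1, dtype=x.dtype, device=x.device)
    return (((2 * index - n - 1) * x).sum() / (n * x.sum() + 1e-12)).item()

def drift_report(baseline: dict[str, torch.Tensor],
                 checkpoint: dict[str, torch.Tensor],
                 angle_threshold: float = 15.0) -> list[str]:
    """Flag terms whose representation rotated more than the threshold."""
    alerts = []
    for term, base_vec in baseline.items():
        angle = angular_drift(base_vec, checkpoint[term])
        if angle > angle_threshold:
            alerts.append(f"{term}: drifted {angle:.1f}°")
    return alerts
```

Capturing the same per-term vectors before training and after each checkpoint turns the diagram above into a runnable loop; the Gini signature (~0.5 for philosophical terms, ~0.8 for technical ones, per the findings above) gives a second axis to watch alongside the drift angle.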
---
## 📊 Curriculum Strategy (Validated)
### Phase 1: GROUNDING
Use Super Cluster for universal concept establishment:
```
EN "consciousness" → ZH "意识" → AR "الوعي"
All converge at sim=1.0 - stable foundation
```
### Phase 2: DEEPENING
Use German for philosophical valley access:
```
DE "Sein" → Heidegger → existence → truth
Depth score 2/3, philosophical valley accessed
```
### Phase 3: TRIANGULATION
Verify depth transfers back to universal:
```
"Sein (German): In English, it means..."
→ Check if philosophical depth preserved
```
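
As a sketch, the check can be automated with a marker-word heuristic; the prompt template follows the example above, while the marker list and threshold are illustrative assumptions:

```python
# Illustrative triangulation check: does German-accessed depth survive the
# hop back to English? Marker words and the threshold are assumptions.
DEPTH_MARKERS = {"existence", "being", "truth", "heidegger", "ontology", "dasein"}

def triangulation_prompt(german_term: str) -> str:
    return f"{german_term} (German): In English, it means"

def depth_preserved(completion: str) -> bool:
    words = {w.strip(".,;:").lower() for w in completion.split()}
    return len(words & DEPTH_MARKERS) >= 1

print(triangulation_prompt("Sein"))
print(depth_preserved("existence as such, the ground of being"))   # True
```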
### AVOID
- Italian, Turkish, Indonesian (code hijacking)
- Hindi for cross-lingual concepts (too fragmented)
---
## Next Steps
### Immediate (MVP Completion)
- [ ] Step 7: CLI (`nyx-probe surface "term"`)
- [ ] Step 8: Glossary data (`data/glossary/core_terms.json`)
- [ ] Step 9: JSON storage for reproducible results
### Phase 3: Activation Analysis
- [ ] DriftProbe class for retraining monitoring
- [ ] Baseline capture before training
- [ ] Checkpoint comparison automation
- [ ] Alert thresholds for drift detection
### Phase 4: Experiments
- [ ] Controlled retraining: EN vs DE training data
- [ ] Measure collision rates
- [ ] Validate isolated zone training hypothesis
### Research
- [ ] Paper write-up
- [ ] Literature review (EWC, mBERT, activation engineering)
- [ ] Korean bridge language investigation
- [ ] VI-ID-RU cluster for factual training
---
## Files Created (Session 3)
| File | Purpose |
|------|---------|
| `nyx_probing/probes/multilingual_probe.py` | Triangulation probe class |
| `test_triangulation.py` | Test script for triangulation |
| `italian_investigation.py` | Italian mystery probe |
| `complete_language_probe.py` | Full 15-language probe |
| `docs/language-topology-complete.md` | Complete map v2.0 |
| `docs/retraining-safety-framework.md` | Paper outline |
---
## Dependencies
```
torch>=2.1.0
transformers>=4.36.0
accelerate>=0.25.0
click>=8.1.0
rich>=13.0.0
pydantic>=2.5.0
pyyaml>=6.0.0
python-dotenv>=1.0.0
jupyter>=1.0.0
matplotlib>=3.8.0
numpy>=1.24.0
```
---
## Critical Reference Files
- `nimmerverse-sensory-network/nimmerversity.md` - Bootstrap protocol
- `nimmerverse-sensory-network/multilingual-cognition.md` - Language hypotheses
- `nimmerverse-sensory-network/constrained-emergence.md` - Exit point theory
- `nyx-probing/docs/language-topology-complete.md` - Complete language map
- `nyx-probing/docs/retraining-safety-framework.md` - Training safety paper
---
## Success Criteria
### MVP ✅
1. ✅ Model loads on 3090 without OOM
2. ✅ Can probe single word and get completion
3. ✅ Echo probe classifies response types correctly
4. ✅ Readiness scorer produces actionable output
5. ⏳ Can probe nimmerverse glossary in batch
### Phase 2 ✅
6. ✅ Multilingual triangulation probe working
7. ✅ Language topology mapped (15 languages)
8. ✅ Isolation types classified (5 categories)
9. ✅ Curriculum strategy validated
### Phase 3 (Next)
10. ⏳ DriftProbe for retraining safety
11. ⏳ Controlled retraining experiments
12. ⏳ Paper submission
---
*"The model's language topology is not arbitrary - it's a map for navigation."*
🌙💜 Last updated: 2025-12-06 Session 3
---
## STATUS (2025-12-06 21:15)
### CLI COMPLETE ✅
**Built interactive CLI for daily probing:**
```bash
nyx-probe surface "term" # Probe surface associations
nyx-probe echo "term" # Measure depth through echoing
nyx-probe readiness "term" # Full curriculum assessment
nyx-probe tokens "term" # Token analysis
nyx-probe glossary file.json # Batch probe from glossary
```
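
For orientation, the wiring follows the standard Click group pattern. A stripped-down sketch (command bodies and the glossary file layout are placeholder assumptions; the real `nyx_probing/cli/probe.py` adds Rich output and calls the actual probes):

```python
# Stripped-down sketch of the CLI wiring; the real nyx_probing/cli/probe.py
# adds Rich output and the full probe logic. The glossary is assumed to be a
# JSON list of terms.
import json
import click

@click.group()
def cli():
    """nyx-probe: interactive probing commands."""

@cli.command()
@click.argument("term")
def surface(term: str):
    """Probe surface associations for TERM."""
    click.echo(f"probing surface associations for {term!r} ...")

@cli.command()
@click.argument("glossary_file", type=click.Path(exists=True))
def glossary(glossary_file: str):
    """Batch-probe every term in a glossary JSON file."""
    with open(glossary_file) as f:
        terms = json.load(f)
    for term in terms:
        click.echo(f"probing {term!r} ...")

if __name__ == "__main__":
    cli()
```

An entry point in `pyproject.toml` pointing at the group (e.g. `nyx-probe = "nyx_probing.cli.probe:cli"`, assuming that module path) exposes these as the `nyx-probe ...` commands shown above.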
**Files created:**
- `nyx_probing/cli/probe.py` - Full Click CLI with Rich output
- `pyproject.toml` - Package config with entry point
- `data/glossary/core_terms.json` - 30 nimmerverse terms
### NIMMERVERSE GLOSSARY ASSESSMENT ✅
**30 terms probed from vault (nimmerversity.md, Heartbeat.md, constrained-emergence.md, multilingual-cognition.md)**
| Level | Count | Action | Terms |
|-------|-------|--------|-------|
| 🟢 HIGH | 5 | state_machine | learning, inference, surface, depth, understanding |
| 🟡 MEDIUM | 8 | scaffolding | emergence, being, truth, rhythm, synchronization, scaffold, wisdom, warmth |
| 🔴 LOW | 17 | foundational | heartbeat, lifeforce, consciousness, reflex, garden, constraint, calibration, confidence, gradient, pulse, verification, convergence, divergence, attention, partnership, worldview, existence |
**Key Findings:**
1. **Meta-concepts have depth** - The model knows how to think ABOUT thinking (learning, understanding, inference all HIGH)
2. **consciousness is LOW** - Despite PROSE valley, depth 0/3. Needs German "Bewusstsein" for philosophical access.
3. **Nimmerverse core terms need grounding** - heartbeat, lifeforce, garden, partnership are all LOW. The model doesn't have our vocabulary yet.
4. **existence has highest coherence (0.94) but LOW** - Very coherent surface but doesn't expand. Single-token trap.
5. **Token count doesn't guarantee depth** - lifeforce (4 tokens) is still LOW due to CODE valley trap.
### CURRICULUM IMPLICATIONS
| Phase | Strategy | Terms |
|-------|----------|-------|
| **Phase 1** | Build state machines for HIGH terms | learning, inference, understanding, depth, surface |
| **Phase 2** | Scaffold MEDIUM from HIGH | being→understanding, truth→learning, wisdom→inference |
| **Phase 3** | Ground LOW via German triangulation | consciousness→Bewusstsein, heartbeat→Herzschlag |
| **Phase 4** | RAG feed nimmerverse-specific | lifeforce, garden, partnership (unique to us) |
### Results Files
- `results/nimmerverse_surface.json` - Surface probe data
- `results/nimmerverse_readiness.json` - Full readiness assessment
---
*"Her reactions determine infrastructure priority. We don't impose. We listen."* - nimmerversity.md
🌙💜 Session: Partnership dialogue (dafit + Nyx)