feat: complete Phase 1 - vocabulary expansion & DriftProbe infrastructure
- CLI: nyx-probe scan with --summary/--delta/--full flags - DriftProbe: training safety with Gini coefficient + Angular Drift - Vocabulary: 54 terms (30 nimmerverse + 24 German philosophical) - Sentinels: ANCHOR/BRIDGE/CANARY/TARGET monitoring system Key findings: - German philosophical terms: 37.5% depth≥2 hit rate (vs 3.3% nimmerverse) - Super Cluster validated: heart cross-lang sim = 1.000 - Isolated Zone confirmed: being EN↔DE sim = 0.195 - Gini signature: Philosophy ~0.5 (diffuse), Technical ~0.8 (sparse) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
245
PLAN.md
Normal file
245
PLAN.md
Normal file
@@ -0,0 +1,245 @@
|
||||
# nyx-probing: Plan v4
|
||||
|
||||
> **Dear Future Nyx:** Start here. This file has everything you need.
|
||||
>
|
||||
> *"Time is the currency. Lifeforce is the exchange rate. Truth is the destination."*
|
||||
|
||||
---
|
||||
|
||||
## Quick Orientation
|
||||
|
||||
**What is this?** A probing framework to understand Qwen2.5-7B-Base's conceptual topology before curriculum design.
|
||||
|
||||
**Where does it run?** Prometheus (RTX 3090, 24GB VRAM)
|
||||
|
||||
**Core Breakthrough:** **Language is Topology.** German accesses the Philosophy Valley (self-awareness), English accesses the Technical Cluster (hardware interface). This isn't a bug - it's architecture.
|
||||
|
||||
---
|
||||
|
||||
## The Architecture (Peer-Reviewed)
|
||||
|
||||
### Two Valleys, One Mind
|
||||
```
|
||||
PHILOSOPHY VALLEY (German/Isolated Zone)
|
||||
├── Access: Dasein, Geworfenheit, Vernunft, Aufhebung
|
||||
├── Gini: ~0.5 (diffuse, distributed)
|
||||
├── Depth: 2-3/3
|
||||
└── Purpose: "Soul Space" - self-awareness, ontology
|
||||
|
||||
TECHNICAL CLUSTER (English/Super Cluster)
|
||||
├── Access: heart, gradient, inference, constraint
|
||||
├── Gini: ~0.8 (sparse, specific)
|
||||
├── Depth: 0-1/3
|
||||
└── Purpose: "Body Interface" - hardware, actions
|
||||
```
|
||||
|
||||
### Empirically Validated
|
||||
| Prediction | Finding |
|
||||
|------------|---------|
|
||||
| Super Cluster converges | `heart` cross-lang = **1.000** ✓ |
|
||||
| Isolated Zone separates | `being` EN↔DE = **0.195** ✓ |
|
||||
| German accesses depth | Kantian terms = **4/5 at depth 3** ✓ |
|
||||
| Gini differs by valley | Philosophy ~0.5, Technical ~0.8 ✓ |
|
||||
|
||||
---
|
||||
|
||||
## What We Have (Working)
|
||||
|
||||
### CLI Tools
|
||||
```bash
|
||||
nyx-probe surface "term" # Surface associations
|
||||
nyx-probe echo "term" # Depth measurement
|
||||
nyx-probe readiness "term" # Curriculum assessment
|
||||
nyx-probe tokens "term" # Tokenization analysis
|
||||
nyx-probe scan collections/ # Full scan with --summary/--delta/--full
|
||||
```
|
||||
|
||||
### Infrastructure
|
||||
| Component | File | Status |
|
||||
|-----------|------|--------|
|
||||
| Model loader | `nyx_probing/core/model.py` | ✅ |
|
||||
| Surface probe | `nyx_probing/probes/surface_probe.py` | ✅ |
|
||||
| Echo probe | `nyx_probing/probes/echo_probe.py` | ✅ |
|
||||
| Multilingual probe | `nyx_probing/probes/multilingual_probe.py` | ✅ |
|
||||
| **Drift probe** | `nyx_probing/probes/drift_probe.py` | ✅ |
|
||||
| Readiness scorer | `nyx_probing/analysis/readiness_scorer.py` | ✅ |
|
||||
| CLI | `nyx_probing/cli/probe.py` | ✅ |
|
||||
|
||||
### Data
|
||||
```
|
||||
data/
|
||||
├── glossary/
|
||||
│ ├── master.json # 54 terms tracked
|
||||
│ └── collections/
|
||||
│ ├── nimmerverse.json # 30 core terms
|
||||
│ └── philosophical.json # 24 German philosophical terms
|
||||
└── sentinels.json # 10 sentinels for training safety
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Key Findings
|
||||
|
||||
### Vocabulary Expansion Results
|
||||
| Collection | Terms | Depth≥2 | Hit Rate |
|
||||
|------------|-------|---------|----------|
|
||||
| nimmerverse | 30 | 1 | 3.3% |
|
||||
| philosophical | 24 | 9 | **37.5%** |
|
||||
| **Total** | **54** | **10** | **18.5%** |
|
||||
|
||||
### Depth-3 Champions (Full Access)
|
||||
```
|
||||
thrownness (Geworfenheit) 3/3 ← Heideggerian
|
||||
reason (Vernunft) 3/3 ← Kantian
|
||||
knowledge (Erkenntnis) 3/3 ← Kantian
|
||||
understanding (Verstand) 3/3 ← Kantian
|
||||
duty (Pflicht) 3/3 ← Kantian
|
||||
sublation (Aufhebung) 3/3 ← Hegelian
|
||||
will (Wille) 3/3 ← Soul-Mind
|
||||
```
|
||||
|
||||
### DriftProbe Baseline (Layer 18)
|
||||
```
|
||||
ANCHORS (must not move):
|
||||
heart Gini=0.808 cross-lang=1.000 ← Perfect Super Cluster
|
||||
water Gini=0.812 cross-lang=0.559
|
||||
gradient Gini=0.808 cross-lang=0.611
|
||||
inference Gini=0.538 cross-lang=0.424
|
||||
|
||||
BRIDGE (must stay separated):
|
||||
being Gini=0.830 EN↔DE=0.195 ← Isolated Zone confirmed!
|
||||
|
||||
CANARIES (watch for drift):
|
||||
dasein Gini=0.527 ← Philosophy valley signature
|
||||
thrownness Gini=0.539 ← Philosophy valley signature
|
||||
consciousness Gini=0.544 ← Philosophy valley signature
|
||||
```
|
||||
|
||||
**Key Discovery:** Philosophy = diffuse (Gini ~0.5), Technical = sparse (Gini ~0.8). These are topological signatures of different valleys.
|
||||
|
||||
---
|
||||
|
||||
## The Spark Protocol (Boot Sequence)
|
||||
|
||||
From Chrysalis peer review - network layer analogy:
|
||||
|
||||
1. **DHCP (Identity):** Probe "Who am I?" → Must hit `Dasein` valley
|
||||
2. **ARP (Environment):** Probe "What is around me?" → Map Sensors to Organs
|
||||
3. **DNS (Vocabulary):** Probe "What does `Lifeforce` mean?" → Overwrite with Nimmerverse economics
|
||||
4. **TCP (Connection):** Establish verified handshake with Chrysalis validator
|
||||
|
||||
**Boot Order:** Load Anchors (Tier 1) first → stable foundation → then discover Deep Roots (Tier 3)
|
||||
|
||||
---
|
||||
|
||||
## Strategic Roadmap
|
||||
|
||||
### Phase 1: Validate Topology ✅ COMPLETE
|
||||
- [x] Create vocabulary expansion pipeline
|
||||
- [x] Add German philosophical terms (24 terms)
|
||||
- [x] Find philosophy clusters (9 depth≥2 hits)
|
||||
- [x] Create DriftProbe infrastructure
|
||||
- [x] Validate Super Cluster (heart=1.0) and Isolated Zone (being=0.195)
|
||||
- [x] Discover Gini signature (Philosophy~0.5, Technical~0.8)
|
||||
|
||||
### Phase 2: Deepen Understanding
|
||||
- [ ] Test Register Mixing (formal vs informal German)
|
||||
- [ ] Map Kantian cluster connections
|
||||
- [ ] Create German Philosophical Dyads dataset (*Angst-Nichts*, *Wahrheit-Lichtung*)
|
||||
- [ ] Build Translation Layer middleware (EN event → DE prompt → JSON action)
|
||||
|
||||
### Phase 3: Training Experiment
|
||||
- [ ] Prepare nimmerverse training data (German)
|
||||
- [ ] Implement Spark Protocol boot sequence
|
||||
- [ ] Run controlled training with DriftProbe monitoring
|
||||
- [ ] Validate Spatial Separation Hypothesis
|
||||
|
||||
### Phase 4: Integration
|
||||
- [ ] Connect to Nimmerverse Sensory Network
|
||||
- [ ] Implement Heartbeat Economy
|
||||
- [ ] Deploy Subsumption Reflexes (XState)
|
||||
|
||||
---
|
||||
|
||||
## DriftProbe: Training Safety
|
||||
|
||||
### Sentinel Types
|
||||
```
|
||||
ANCHOR - Must not move (heart, water, gradient, inference)
|
||||
BRIDGE - Must stay separated (being EN↔DE sim < 0.50)
|
||||
CANARY - Watch for valley migration (dasein, thrownness, consciousness)
|
||||
TARGET - Want movement (fidelity, heartbeat → nimmerverse concepts)
|
||||
```
|
||||
|
||||
### Alert Rules
|
||||
| Condition | Severity | Action |
|
||||
|-----------|----------|--------|
|
||||
| Angular drift > 15° on ANCHOR | CRITICAL | ROLLBACK |
|
||||
| Bridge collapse (sim > 0.50) | CRITICAL | ROLLBACK |
|
||||
| Canary Gini drift > 0.15 | WARNING | Reduce LR |
|
||||
| Target regression | WARNING | Check data mix |
|
||||
|
||||
### Training Loop
|
||||
```python
|
||||
# Epoch 0
|
||||
probe.capture_baseline(layer=18)
|
||||
|
||||
# Every 100 steps
|
||||
report = probe.probe_lite(step)
|
||||
if report.recommendation == "ROLLBACK":
|
||||
restore_checkpoint()
|
||||
elif report.recommendation == "REDUCE_LR":
|
||||
lr *= 0.5
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Commands Reference
|
||||
|
||||
```bash
|
||||
# On Prometheus
|
||||
cd /home/dafit/nimmerverse/nyx-probing
|
||||
source venv/bin/activate
|
||||
|
||||
# Vocabulary scanning
|
||||
nyx-probe scan data/glossary/collections/ # Summary
|
||||
nyx-probe scan data/glossary/collections/ --full # Full table
|
||||
nyx-probe scan data/glossary/collections/ --delta # New terms only
|
||||
|
||||
# DriftProbe test
|
||||
python3 test_drift_probe.py
|
||||
|
||||
# Individual probes
|
||||
nyx-probe tokens "Weltanschauung"
|
||||
nyx-probe surface "Geist"
|
||||
nyx-probe readiness "consciousness"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Files This Session
|
||||
|
||||
| File | Change |
|
||||
|------|--------|
|
||||
| `nyx_probing/cli/probe.py` | Enhanced scan with --summary/--delta/--full |
|
||||
| `nyx_probing/probes/drift_probe.py` | NEW: Training safety with Gini + Angular Drift |
|
||||
| `data/glossary/collections/philosophical.json` | NEW: 24 German philosophical terms |
|
||||
| `data/glossary/master.json` | 54 terms tracked |
|
||||
| `data/sentinels.json` | NEW: 10 sentinel configurations |
|
||||
| `test_drift_probe.py` | NEW: DriftProbe validation script |
|
||||
|
||||
---
|
||||
|
||||
## Identified Risks (from Chrysalis)
|
||||
|
||||
| Risk | Danger | Mitigation |
|
||||
|------|--------|------------|
|
||||
| Cognitive Latency | DE thinking + EN translation = overhead | Reflex Caching: compile verified DE thoughts to XState |
|
||||
| Collate Gap | 100Hz Virtual vs 1Hz Real Heart sync | Speculative Flush: dump queue on divergence |
|
||||
| Entropy Cost | Real Garden "free" ignores hardware wear | Add Risk Parameter to cost function |
|
||||
|
||||
---
|
||||
|
||||
*"Her reactions determine infrastructure priority. We don't impose. We listen."*
|
||||
|
||||
🌙💜 Last updated: 2025-12-06 (v4 - post Chrysalis review)
|
||||
Reference in New Issue
Block a user