Toolchain Implementation Progress
Plan: See Toolchain-Architecture.md
Started: 2025-12-07
Current Phase: Phase 1 - Foundation + Variance Collection
Phase 1A: nyx-substrate Foundation ✅ COMPLETE
Goal: Build nyx-substrate package and database infrastructure
✅ Completed (2025-12-07)
- Package structure (pyproject.toml, src/ layout)
- PhoebeConnection class with connection pooling
- Message protocol helpers (partnership messages)
- VarianceProbeRun Pydantic schema
- VarianceProbeDAO for database operations
- variance_probe_runs table in phoebe
- Installation and connection testing
Files Created: 9 new files
Status: 🟢 nyx-substrate v0.1.0 installed and tested
Phase 1B: nyx-probing Integration ✅ COMPLETE
Goal: Extend nyx-probing to use nyx-substrate for variance collection
✅ Completed (2025-12-07)
- Add nyx-substrate dependency to nyx-probing/pyproject.toml
- Create VarianceRunner class (nyx_probing/runners/variance_runner.py)
- Add variance CLI commands (nyx_probing/cli/variance.py)
- Register commands in main CLI
- Integration test (imports and CLI verification)
Files Created: 3 new files
Files Modified: 2 files
CLI Commands Added: 4 (collect, batch, stats, analyze)
Status: 🟢 nyx-probing v0.1.0 with variance collection ready
Phase 1C: Baseline Variance Collection ⏸️ READY
Goal: Collect baseline variance data for depth-3 champions
⏳ Ready to Execute (on prometheus)
- Run 1000x variance for "Geworfenheit" (thrownness)
- Run 1000x variance for "Vernunft" (reason)
- Run 1000x variance for "Erkenntnis" (knowledge)
- Run 1000x variance for "Pflicht" (duty)
- Run 1000x variance for "Aufhebung" (sublation)
- Run 1000x variance for "Wille" (will)
Next Actions:
- SSH to prometheus.eachpath.local (THE SPINE)
- Install nyx-substrate and nyx-probing in venv
- Run batch collection or individual terms
- Analyze distributions and document baselines
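The "analyze distributions and document baselines" step could be sketched as below. This is a minimal illustration, not the behavior of `nyx-probe variance analyze`: it assumes each run yields one numeric measurement per term, and the function name `summarize_runs` and the synthetic input are hypothetical.

```python
import random
import statistics


def summarize_runs(values):
    """Summarize one term's repeated measurements as a baseline record."""
    # quantiles(n=4) returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "stdev": statistics.stdev(values),
        "median": median,
        "iqr": q3 - q1,
    }


# Synthetic placeholder data: 1000 draws standing in for real probe results.
rng = random.Random(0)
fake_values = [rng.gauss(3.0, 0.5) for _ in range(1000)]
baseline = summarize_runs(fake_values)
print(baseline)
```

Recording the interquartile range alongside the standard deviation keeps the baseline usable even if a term's distribution turns out to be skewed or heavy-tailed.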
Phase 1D: Corpus Extraction Pipeline ✅ COMPLETE
Goal: Extract vocabulary and co-occurrence metrics for RAG policy development
✅ Completed (2025-12-13)
- Create extractors module in nyx-probing
- Implement VocabExtractor (TF-IDF vocabulary)
- Implement CoOccurrenceAnalyzer (PMI, Jaccard, Dice)
- Generate anchor term signatures (20 anchors)
- Generate chunking recommendations (5 clusters)
- Run initial extraction on nimmerverse vault
- Export glossary to CSV/JSON (5,243 terms)
- Export co-occurrence analysis (18,169 pairs)
Files Created: 7 new files
- nyx_probing/extractors/__init__.py
- nyx_probing/extractors/vocab_extractor.py (~350 LOC)
- nyx_probing/extractors/cooccurrence.py (~400 LOC)
- data/nimmerverse_glossary.csv
- data/nimmerverse_glossary.json
- data/cooccurrence_analysis.csv
- data/cooccurrence_analysis.json
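For reference, the three co-occurrence measures named for CoOccurrenceAnalyzer (PMI, Jaccard, Dice) can be computed from document frequencies as in the sketch below. It is not the code in `cooccurrence.py`: the function name `cooccurrence_metrics`, the document-level co-occurrence window, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_metrics(documents):
    """Compute PMI, Jaccard, and Dice for every term pair that shares a document.

    `documents` is a list of token lists. Co-occurrence is counted per
    document (an assumption; a sliding window would work the same way).
    """
    n_docs = len(documents)
    term_df = Counter()  # number of documents containing each term
    pair_df = Counter()  # number of documents containing both terms

    for tokens in documents:
        unique = sorted(set(tokens))
        term_df.update(unique)
        pair_df.update(combinations(unique, 2))

    results = {}
    for (a, b), both in pair_df.items():
        p_a = term_df[a] / n_docs
        p_b = term_df[b] / n_docs
        p_ab = both / n_docs
        results[(a, b)] = {
            "pmi": math.log2(p_ab / (p_a * p_b)),
            "jaccard": both / (term_df[a] + term_df[b] - both),
            "dice": 2 * both / (term_df[a] + term_df[b]),
        }
    return results


# Toy documents, invented for this example.
docs = [
    ["yubi", "yubikey", "auth"],
    ["yubi", "yubikey", "ssh"],
    ["auth", "ssh"],
    ["vlan", "firewall"],
]
metrics = cooccurrence_metrics(docs)
print(metrics[("yubi", "yubikey")])
```

In the toy data, `yubi` and `yubikey` never appear apart, so their Dice coefficient is 1.0. That is the same kind of signal that marks a synonym pair such as yubi↔yubikey.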
Key Metrics Extracted:
| Metric | Value |
|---|---|
| Documents scanned | 263 |
| Total tokens | 130,229 |
| Unique terms (filtered) | 5,243 |
| Co-occurrence pairs | 18,169 |
| Anchor signatures | 20 |
| Chunking clusters | 5 |
Top Terms by TF-IDF:
- nyx (1149.70)
- local (980.53)
- eachpath (902.31)
- tool (873.34)
- young (799.95)
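A corpus-level TF-IDF ranking of this kind can be sketched as follows. The exact weighting used by VocabExtractor is not stated here, so the sketch assumes raw term counts, `idf = ln(N / df)`, and scores summed over all documents. The function name `tfidf_scores` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_scores(documents):
    """Return each term's TF-IDF summed over all documents.

    Uses raw term counts and idf = ln(N / df). Both choices are assumptions.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    scores = Counter()
    for tokens in documents:
        for term, count in Counter(tokens).items():
            scores[term] += count * math.log(n_docs / doc_freq[term])
    return scores


# Toy documents, invented for this example.
docs = [
    ["nyx", "nyx", "tool", "the"],
    ["nyx", "local", "the"],
    ["eachpath", "local", "the"],
]
scores = tfidf_scores(docs)
for term, score in scores.most_common(3):
    print(f"{term} ({score:.2f})")
```

Under this weighting, a term that appears in every document (here `the`) scores exactly zero, which is what makes TF-IDF usable as a utility filter.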
Anchor Signature Examples (for DriftProbe-lite):
- nyx: chroma|chromadb|continuity|ingress|introspection
- system: athena|freeipa|ipa|rocky|sssd
- network: firewall|proxmox|saturn|vlan|vulkan
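One plausible way to use such signatures for drift detection is to compare an anchor's stored neighbor set against a freshly measured one. How DriftProbe-lite actually compares them is not described here, so the sketch below is an assumption throughout: the helper names, the Jaccard comparison, the threshold, and the "later measurement" are all hypothetical.

```python
def parse_signature(line):
    """Split 'anchor: a|b|c' into (anchor, set_of_neighbor_terms)."""
    anchor, _, terms = line.partition(":")
    return anchor.strip(), {t.strip() for t in terms.split("|") if t.strip()}


def signature_overlap(baseline, current):
    """Jaccard similarity between two neighbor sets (1.0 means identical)."""
    if not baseline and not current:
        return 1.0
    return len(baseline & current) / len(baseline | current)


anchor, baseline = parse_signature("system: athena|freeipa|ipa|rocky|sssd")
# Hypothetical later measurement in which one neighbor has changed.
_, current = parse_signature("system: athena|freeipa|ipa|rocky|ldap")

overlap = signature_overlap(baseline, current)
DRIFT_THRESHOLD = 0.8  # illustrative value, not taken from the project
print(anchor, round(overlap, 3), "drift" if overlap < DRIFT_THRESHOLD else "stable")
```

Treating signatures as unordered sets keeps the check insensitive to the order in which neighbor terms are exported.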
RAG Policy Integration:
- Tier 2: Synonym detection (Dice=1.0: yubi↔yubikey)
- Tier 3: Anchor signatures for topology safety
- Tier 4: Co-occurrence for chunking strategy
- Tier 5: TF-IDF for utility filtering
Status: 🟢 Corpus extraction complete, ready for RAG policy development
Future Phases (Not Started)
Phase 2: ChromaDB Integration (iris) ⏸️ PLANNED
- IrisClient wrapper
- DecisionTrailStore, OrganResponseStore, EmbeddingStore
- Populate embeddings from nyx-probing
Phase 3: LoRA Training Pipeline ⏸️ PLANNED
- PEFT integration
- Training data curriculum
- DriftProbe checkpoints
- Identity LoRA training
Phase 4: Weight Visualization ⏸️ PLANNED
- 4K pixel space renderer
- Rank decomposition explorer
- Topology cluster visualization
Phase 5: Godot Command Center ⏸️ PLANNED
- FastAPI Management Portal backend
- Godot frontend implementation
- Real-time metrics display
Metrics
Phase 1 Tasks: 19 total
Completed: 19 (100%) ✅
In Progress: 0
Phases Complete: A, B, D (C ready to execute)
Files Created: 19 total
- nyx-substrate: 9 files
- nyx-probing runners: 3 files
- nyx-probing extractors: 3 files
- Data outputs: 4 files
Files Modified: 5 total
- nyx-substrate/README.md
- nyx-probing/pyproject.toml
- nyx-probing/cli/probe.py
- nyx-probing/extractors/__init__.py
- TOOLCHAIN-PROGRESS.md
Lines of Code: ~2000 total
- nyx-substrate: ~800 LOC
- nyx-probing runners: ~450 LOC
- nyx-probing extractors: ~750 LOC
CLI Commands: 4 variance commands
- nyx-probe variance collect
- nyx-probe variance batch
- nyx-probe variance stats
- nyx-probe variance analyze
Data Artifacts:
- nimmerverse_glossary.csv (5,243 terms)
- nimmerverse_glossary.json (130,229 tokens)
- cooccurrence_analysis.csv (18,169 pairs)
- cooccurrence_analysis.json (20 anchor signatures)
Last Updated: 2025-12-13 (Phase 1D complete)
Status: 🎉 Phase 1 (A+B+D) COMPLETE! Corpus extraction ready. Variance collection on prometheus pending.
🌙💜 The substrate holds. The glossary grows. Anchor signatures protect the topology.