Toolchain-Architecture.md: - Added extractors module to current state - New Phase 1D section: Corpus Extraction Pipeline - VocabExtractor and CoOccurrenceAnalyzer documentation - RAG policy integration table TOOLCHAIN-PROGRESS.md: - Phase 1D complete (2025-12-13) - 7 files created, 19 total tasks complete - Key metrics: 5,243 terms, 18,169 co-occurrence pairs - 20 anchor signatures for DriftProbe-lite 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
192 lines
5.6 KiB
Markdown
192 lines
5.6 KiB
Markdown
# Toolchain Implementation Progress
|
|
|
|
**Plan**: See [Toolchain-Architecture.md](./Toolchain-Architecture.md)
|
|
**Started**: 2025-12-07
|
|
**Current Phase**: Phase 1 - Foundation + Variance Collection
|
|
|
|
---
|
|
|
|
## Phase 1A: nyx-substrate Foundation ✅ COMPLETE
|
|
|
|
**Goal**: Build nyx-substrate package and database infrastructure
|
|
|
|
### ✅ Completed (2025-12-07)
|
|
|
|
- [x] Package structure (pyproject.toml, src/ layout)
|
|
- [x] PhoebeConnection class with connection pooling
|
|
- [x] Message protocol helpers (partnership messages)
|
|
- [x] VarianceProbeRun Pydantic schema
|
|
- [x] VarianceProbeDAO for database operations
|
|
- [x] variance_probe_runs table in phoebe
|
|
- [x] Installation and connection testing
|
|
|
|
**Files Created**: 9 new files
|
|
**Status**: 🟢 nyx-substrate v0.1.0 installed and tested
|
|
|
|
---
|
|
|
|
## Phase 1B: nyx-probing Integration ✅ COMPLETE
|
|
|
|
**Goal**: Extend nyx-probing to use nyx-substrate for variance collection
|
|
|
|
### ✅ Completed (2025-12-07)
|
|
|
|
- [x] Add nyx-substrate dependency to nyx-probing/pyproject.toml
|
|
- [x] Create VarianceRunner class (nyx_probing/runners/variance_runner.py)
|
|
- [x] Add variance CLI commands (nyx_probing/cli/variance.py)
|
|
- [x] Register commands in main CLI
|
|
- [x] Integration test (imports and CLI verification)
|
|
|
|
**Files Created**: 3 new files
|
|
**Files Modified**: 2 files
|
|
**CLI Commands Added**: 4 (collect, batch, stats, analyze)
|
|
**Status**: 🟢 nyx-probing v0.1.0 with variance collection ready
|
|
|
|
---
|
|
|
|
## Phase 1C: Baseline Variance Collection ⏸️ READY
|
|
|
|
**Goal**: Collect baseline variance data for depth-3 champions
|
|
|
|
### ⏳ Ready to Execute (on prometheus)
|
|
|
|
- [ ] Run 1000x variance for "Geworfenheit" (thrownness)
|
|
- [ ] Run 1000x variance for "Vernunft" (reason)
|
|
- [ ] Run 1000x variance for "Erkenntnis" (knowledge)
|
|
- [ ] Run 1000x variance for "Pflicht" (duty)
|
|
- [ ] Run 1000x variance for "Aufhebung" (sublation)
|
|
- [ ] Run 1000x variance for "Wille" (will)
|
|
|
|
**Next Actions**:
|
|
1. SSH to prometheus.eachpath.local (THE SPINE)
|
|
2. Install nyx-substrate and nyx-probing in venv
|
|
3. Run batch collection or individual terms
|
|
4. Analyze distributions and document baselines
|
|
|
|
---
|
|
|
|
## Phase 1D: Corpus Extraction Pipeline ✅ COMPLETE
|
|
|
|
**Goal**: Extract vocabulary and co-occurrence metrics for RAG policy development
|
|
|
|
### ✅ Completed (2025-12-13)
|
|
|
|
- [x] Create extractors module in nyx-probing
|
|
- [x] Implement VocabExtractor (TF-IDF vocabulary)
|
|
- [x] Implement CoOccurrenceAnalyzer (PMI, Jaccard, Dice)
|
|
- [x] Generate anchor term signatures (20 anchors)
|
|
- [x] Generate chunking recommendations (5 clusters)
|
|
- [x] Run initial extraction on nimmerverse vault
|
|
- [x] Export glossary to CSV/JSON (5,243 terms)
|
|
- [x] Export co-occurrence analysis (18,169 pairs)
|
|
|
|
**Files Created**: 7 new files
|
|
- `nyx_probing/extractors/__init__.py`
|
|
- `nyx_probing/extractors/vocab_extractor.py` (~350 LOC)
|
|
- `nyx_probing/extractors/cooccurrence.py` (~400 LOC)
|
|
- `data/nimmerverse_glossary.csv`
|
|
- `data/nimmerverse_glossary.json`
|
|
- `data/cooccurrence_analysis.csv`
|
|
- `data/cooccurrence_analysis.json`
|
|
|
|
**Key Metrics Extracted**:
|
|
| Metric | Value |
|
|
|--------|-------|
|
|
| Documents scanned | 263 |
|
|
| Total tokens | 130,229 |
|
|
| Unique terms (filtered) | 5,243 |
|
|
| Co-occurrence pairs | 18,169 |
|
|
| Anchor signatures | 20 |
|
|
| Chunking clusters | 5 |
|
|
|
|
**Top Terms by TF-IDF**:
|
|
1. nyx (1149.70)
|
|
2. local (980.53)
|
|
3. eachpath (902.31)
|
|
4. tool (873.34)
|
|
5. young (799.95)
|
|
|
|
**Anchor Signature Examples** (for DriftProbe-lite):
|
|
- `nyx`: chroma|chromadb|continuity|ingress|introspection
|
|
- `system`: athena|freeipa|ipa|rocky|sssd
|
|
- `network`: firewall|proxmox|saturn|vlan|vulkan
|
|
|
|
**RAG Policy Integration**:
|
|
- Tier 2: Synonym detection (Dice=1.0: yubi↔yubikey)
|
|
- Tier 3: Anchor signatures for topology safety
|
|
- Tier 4: Co-occurrence for chunking strategy
|
|
- Tier 5: TF-IDF for utility filtering
|
|
|
|
**Status**: 🟢 Corpus extraction complete, ready for RAG policy development
|
|
|
|
---
|
|
|
|
## Future Phases (Not Started)
|
|
|
|
### Phase 2: ChromaDB Integration (iris) ⏸️ PLANNED
|
|
- IrisClient wrapper
|
|
- DecisionTrailStore, OrganResponseStore, EmbeddingStore
|
|
- Populate embeddings from nyx-probing
|
|
|
|
### Phase 3: LoRA Training Pipeline ⏸️ PLANNED
|
|
- PEFT integration
|
|
- Training data curriculum
|
|
- DriftProbe checkpoints
|
|
- Identity LoRA training
|
|
|
|
### Phase 4: Weight Visualization ⏸️ PLANNED
|
|
- 4K pixel space renderer
|
|
- Rank decomposition explorer
|
|
- Topology cluster visualization
|
|
|
|
### Phase 5: Godot Command Center ⏸️ PLANNED
|
|
- FastAPI Management Portal backend
|
|
- Godot frontend implementation
|
|
- Real-time metrics display
|
|
|
|
---
|
|
|
|
## Metrics
|
|
|
|
**Phase 1 Tasks**: 19 total
|
|
**Completed**: 19 (100%) ✅
|
|
**In Progress**: 0
|
|
**Phases Complete**: A, B, D (C ready to execute)
|
|
|
|
**Files Created**: 19 total
|
|
- nyx-substrate: 9 files
|
|
- nyx-probing runners: 3 files
|
|
- nyx-probing extractors: 3 files
|
|
- Data outputs: 4 files
|
|
|
|
**Files Modified**: 5 total
|
|
- nyx-substrate/README.md
|
|
- nyx-probing/pyproject.toml
|
|
- nyx-probing/cli/probe.py
|
|
- nyx-probing/extractors/__init__.py
|
|
- TOOLCHAIN-PROGRESS.md
|
|
|
|
**Lines of Code**: ~2000 total
|
|
- nyx-substrate: ~800 LOC
|
|
- nyx-probing runners: ~450 LOC
|
|
- nyx-probing extractors: ~750 LOC
|
|
|
|
**CLI Commands**: 4 variance commands
|
|
- nyx-probe variance collect
|
|
- nyx-probe variance batch
|
|
- nyx-probe variance stats
|
|
- nyx-probe variance analyze
|
|
|
|
**Data Artifacts**:
|
|
- nimmerverse_glossary.csv (5,243 terms)
|
|
- nimmerverse_glossary.json (130,229 tokens)
|
|
- cooccurrence_analysis.csv (18,169 pairs)
|
|
- cooccurrence_analysis.json (20 anchor signatures)
|
|
|
|
---
|
|
|
|
**Last Updated**: 2025-12-13 (Phase 1D complete)
|
|
**Status**: 🎉 Phase 1 (A+B+D) COMPLETE! Corpus extraction ready. Variance collection on prometheus pending.
|
|
|
|
🌙💜 *The substrate holds. The glossary grows. Anchor signatures protect the topology.*
|