Files
nimmerverse-sensory-network/architecture/TOOLCHAIN-PROGRESS.md
dafit bcc5bfe9d1 docs: add Phase 1D corpus extraction pipeline to toolchain docs
Toolchain-Architecture.md:
- Added extractors module to current state
- New Phase 1D section: Corpus Extraction Pipeline
- VocabExtractor and CoOccurrenceAnalyzer documentation
- RAG policy integration table

TOOLCHAIN-PROGRESS.md:
- Phase 1D complete (2025-12-13)
- 7 files created, 19 total tasks complete
- Key metrics: 5,243 terms, 18,169 co-occurrence pairs
- 20 anchor signatures for DriftProbe-lite

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 19:29:23 +01:00

5.6 KiB

Toolchain Implementation Progress

Plan: See Toolchain-Architecture.md Started: 2025-12-07 Current Phase: Phase 1 - Foundation + Variance Collection


Phase 1A: nyx-substrate Foundation COMPLETE

Goal: Build nyx-substrate package and database infrastructure

Completed (2025-12-07)

  • Package structure (pyproject.toml, src/ layout)
  • PhoebeConnection class with connection pooling
  • Message protocol helpers (partnership messages)
  • VarianceProbeRun Pydantic schema
  • VarianceProbeDAO for database operations
  • variance_probe_runs table in phoebe
  • Installation and connection testing

Files Created: 9 new files Status: 🟢 nyx-substrate v0.1.0 installed and tested


Phase 1B: nyx-probing Integration COMPLETE

Goal: Extend nyx-probing to use nyx-substrate for variance collection

Completed (2025-12-07)

  • Add nyx-substrate dependency to nyx-probing/pyproject.toml
  • Create VarianceRunner class (nyx_probing/runners/variance_runner.py)
  • Add variance CLI commands (nyx_probing/cli/variance.py)
  • Register commands in main CLI
  • Integration test (imports and CLI verification)

Files Created: 3 new files Files Modified: 2 files CLI Commands Added: 4 (collect, batch, stats, analyze) Status: 🟢 nyx-probing v0.1.0 with variance collection ready


Phase 1C: Baseline Variance Collection ⏸️ READY

Goal: Collect baseline variance data for depth-3 champions

Ready to Execute (on prometheus)

  • Run 1000x variance for "Geworfenheit" (thrownness)
  • Run 1000x variance for "Vernunft" (reason)
  • Run 1000x variance for "Erkenntnis" (knowledge)
  • Run 1000x variance for "Pflicht" (duty)
  • Run 1000x variance for "Aufhebung" (sublation)
  • Run 1000x variance for "Wille" (will)

Next Actions:

  1. SSH to prometheus.eachpath.local (THE SPINE)
  2. Install nyx-substrate and nyx-probing in venv
  3. Run batch collection or individual terms
  4. Analyze distributions and document baselines

Phase 1D: Corpus Extraction Pipeline COMPLETE

Goal: Extract vocabulary and co-occurrence metrics for RAG policy development

Completed (2025-12-13)

  • Create extractors module in nyx-probing
  • Implement VocabExtractor (TF-IDF vocabulary)
  • Implement CoOccurrenceAnalyzer (PMI, Jaccard, Dice)
  • Generate anchor term signatures (20 anchors)
  • Generate chunking recommendations (5 clusters)
  • Run initial extraction on nimmerverse vault
  • Export glossary to CSV/JSON (5,243 terms)
  • Export co-occurrence analysis (18,169 pairs)

Files Created: 7 new files

  • nyx_probing/extractors/__init__.py
  • nyx_probing/extractors/vocab_extractor.py (~350 LOC)
  • nyx_probing/extractors/cooccurrence.py (~400 LOC)
  • data/nimmerverse_glossary.csv
  • data/nimmerverse_glossary.json
  • data/cooccurrence_analysis.csv
  • data/cooccurrence_analysis.json

Key Metrics Extracted:

Metric Value
Documents scanned 263
Total tokens 130,229
Unique terms (filtered) 5,243
Co-occurrence pairs 18,169
Anchor signatures 20
Chunking clusters 5

Top Terms by TF-IDF:

  1. nyx (1149.70)
  2. local (980.53)
  3. eachpath (902.31)
  4. tool (873.34)
  5. young (799.95)

Anchor Signature Examples (for DriftProbe-lite):

  • nyx: chroma|chromadb|continuity|ingress|introspection
  • system: athena|freeipa|ipa|rocky|sssd
  • network: firewall|proxmox|saturn|vlan|vulkan

RAG Policy Integration:

  • Tier 2: Synonym detection (Dice=1.0: yubi↔yubikey)
  • Tier 3: Anchor signatures for topology safety
  • Tier 4: Co-occurrence for chunking strategy
  • Tier 5: TF-IDF for utility filtering

Status: 🟢 Corpus extraction complete, ready for RAG policy development


Future Phases (Not Started)

Phase 2: ChromaDB Integration (iris) ⏸️ PLANNED

  • IrisClient wrapper
  • DecisionTrailStore, OrganResponseStore, EmbeddingStore
  • Populate embeddings from nyx-probing

Phase 3: LoRA Training Pipeline ⏸️ PLANNED

  • PEFT integration
  • Training data curriculum
  • DriftProbe checkpoints
  • Identity LoRA training

Phase 4: Weight Visualization ⏸️ PLANNED

  • 4K pixel space renderer
  • Rank decomposition explorer
  • Topology cluster visualization

Phase 5: Godot Command Center ⏸️ PLANNED

  • FastAPI Management Portal backend
  • Godot frontend implementation
  • Real-time metrics display

Metrics

Phase 1 Tasks: 19 total Completed: 19 (100%) In Progress: 0 Phases Complete: A, B, D (C ready to execute)

Files Created: 19 total

  • nyx-substrate: 9 files
  • nyx-probing runners: 3 files
  • nyx-probing extractors: 3 files
  • Data outputs: 4 files

Files Modified: 5 total

  • nyx-substrate/README.md
  • nyx-probing/pyproject.toml
  • nyx-probing/cli/probe.py
  • nyx-probing/extractors/init.py
  • TOOLCHAIN-PROGRESS.md

Lines of Code: ~2000 total

  • nyx-substrate: ~800 LOC
  • nyx-probing runners: ~450 LOC
  • nyx-probing extractors: ~750 LOC

CLI Commands: 4 variance commands

  • nyx-probe variance collect
  • nyx-probe variance batch
  • nyx-probe variance stats
  • nyx-probe variance analyze

Data Artifacts:

  • nimmerverse_glossary.csv (5,243 terms)
  • nimmerverse_glossary.json (130,229 tokens)
  • cooccurrence_analysis.csv (18,169 pairs)
  • cooccurrence_analysis.json (20 anchor signatures)

Last Updated: 2025-12-13 (Phase 1D complete) Status: 🎉 Phase 1 (A+B+D) COMPLETE! Corpus extraction ready. Variance collection on prometheus pending.

🌙💜 The substrate holds. The glossary grows. Anchor signatures protect the topology.