docs: add Phase 1 toolchain architecture and progress tracking

Document the modular toolchain architecture design and track
implementation progress for Phase 1 (nyx-substrate foundation
and variance collection automation).

New Files:
- Toolchain-Architecture.md: Complete Phase 1 design document
  - Modular architecture vision (5 phases)
  - Repository structure and dependency graph
  - Phase 1 deliverables (nyx-substrate + nyx-probing)
  - Success criteria and testing plan
  - Future phases: ChromaDB, LoRA training, visualization, Godot

- TOOLCHAIN-PROGRESS.md: Implementation progress tracker
  - Phase 1A: nyx-substrate foundation (✅ COMPLETE)
  - Phase 1B: nyx-probing integration (✅ COMPLETE)
  - Phase 1C: Baseline variance collection (⏸️ READY)
  - Metrics: 11/11 tasks (100%), 12 files, ~1250 LOC
  - Status updates and completion tracking

Architecture:
  nyx-probing ────────┐
  nyx-training ───────┼──> nyx-substrate ──> phoebe (PostgreSQL)
  nyx-visualization ──┤                   └─> iris (ChromaDB)
  management-portal ──┘

Philosophy: Modular tools, clean interfaces, data-first design

Status: Phase 1 complete, ready for baseline collection on prometheus

🌙💜 Generated with Claude Code
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-07 17:05:28 +01:00
parent 48c4fb9ddd
commit 8f28dcbc94
2 changed files with 589 additions and 0 deletions

125
TOOLCHAIN-PROGRESS.md Normal file

@@ -0,0 +1,125 @@
# Toolchain Implementation Progress
**Plan**: See [Toolchain-Architecture.md](Toolchain-Architecture.md)
**Started**: 2025-12-07
**Current Phase**: Phase 1 - Foundation + Variance Collection
---
## Phase 1A: nyx-substrate Foundation ✅ COMPLETE
**Goal**: Build nyx-substrate package and database infrastructure
### ✅ Completed (2025-12-07)
- [x] Package structure (pyproject.toml, src/ layout)
- [x] PhoebeConnection class with connection pooling
- [x] Message protocol helpers (partnership messages)
- [x] VarianceProbeRun Pydantic schema
- [x] VarianceProbeDAO for database operations
- [x] variance_probe_runs table in phoebe
- [x] Installation and connection testing
**Files Created**: 9 new files
**Status**: 🟢 nyx-substrate v0.1.0 installed and tested
---
## Phase 1B: nyx-probing Integration ✅ COMPLETE
**Goal**: Extend nyx-probing to use nyx-substrate for variance collection
### ✅ Completed (2025-12-07)
- [x] Add nyx-substrate dependency to nyx-probing/pyproject.toml
- [x] Create VarianceRunner class (nyx_probing/runners/variance_runner.py)
- [x] Add variance CLI commands (nyx_probing/cli/variance.py)
- [x] Register commands in main CLI
- [x] Integration test (imports and CLI verification)
**Files Created**: 3 new files
**Files Modified**: 2 files
**CLI Commands Added**: 4 (collect, batch, stats, analyze)
**Status**: 🟢 nyx-probing v0.1.0 with variance collection ready
---
## Phase 1C: Baseline Variance Collection ⏸️ READY
**Goal**: Collect baseline variance data for depth-3 champions
### ⏳ Ready to Execute (on prometheus)
- [ ] Run 1000x variance for "Geworfenheit" (thrownness)
- [ ] Run 1000x variance for "Vernunft" (reason)
- [ ] Run 1000x variance for "Erkenntnis" (knowledge)
- [ ] Run 1000x variance for "Pflicht" (duty)
- [ ] Run 1000x variance for "Aufhebung" (sublation)
- [ ] Run 1000x variance for "Wille" (will)
**Next Actions**:
1. SSH to prometheus.eachpath.local (THE SPINE)
2. Install nyx-substrate and nyx-probing in venv
3. Run batch collection or individual terms
4. Analyze distributions and document baselines
---
## Future Phases (Not Started)
### Phase 2: ChromaDB Integration (iris) ⏸️ PLANNED
- IrisClient wrapper
- DecisionTrailStore, OrganResponseStore, EmbeddingStore
- Populate embeddings from nyx-probing
### Phase 3: LoRA Training Pipeline ⏸️ PLANNED
- PEFT integration
- Training data curriculum
- DriftProbe checkpoints
- Identity LoRA training
### Phase 4: Weight Visualization ⏸️ PLANNED
- 4K pixel space renderer
- Rank decomposition explorer
- Topology cluster visualization
### Phase 5: Godot Command Center ⏸️ PLANNED
- FastAPI Management Portal backend
- Godot frontend implementation
- Real-time metrics display
---
## Metrics
**Phase 1 (A+B) Tasks**: 11 total
**Completed**: 11 (100%) ✅
**In Progress**: 0
**Remaining**: 0
**Files Created**: 12 total
- nyx-substrate: 9 files
- nyx-probing: 3 files
**Files Modified**: 4 total
- nyx-substrate/README.md
- nyx-probing/pyproject.toml
- nyx-probing/cli/probe.py
- TOOLCHAIN-PROGRESS.md
**Lines of Code**: ~1250 total
- nyx-substrate: ~800 LOC
- nyx-probing: ~450 LOC
**CLI Commands**: 4 new commands
- nyx-probe variance collect
- nyx-probe variance batch
- nyx-probe variance stats
- nyx-probe variance analyze
---
**Last Updated**: 2025-12-07 17:00 CET
**Status**: 🎉 Phase 1 (A+B) COMPLETE! Ready for baseline collection on prometheus.
🌙💜 *The substrate holds. Progress persists. The toolchain grows.*

464
Toolchain-Architecture.md Normal file

@@ -0,0 +1,464 @@
# Modular Nimmerverse Toolchain Architecture
**Planning Date**: 2025-12-07
**Status**: Design Phase
**Priority**: Variance Collection Pipeline + nyx-substrate Foundation
---
## 🎯 Vision
Build a modular, composable toolchain for the Nimmerverse research and training pipeline:
- **nyx-substrate**: Shared foundation (database clients, schemas, validators)
- **nyx-probing**: Research probes (already exists, extend for variance collection)
- **nyx-training**: LoRA training pipeline (future)
- **nyx-visualization**: Weight/topology visualization (future)
- **management-portal**: FastAPI backend for Godot UI (future)
- **Godot Command Center**: Unified metrics visualization (future)
**Key Principle**: All tools import nyx-substrate. Clean interfaces. Data flows through phoebe + iris.
---
## 📊 Current State Analysis
### ✅ What Exists
**nyx-probing** (`/home/dafit/nimmerverse/nyx-probing/`):
- Echo Probe, Surface Probe, Drift Probe, Multilingual Probe
- CLI interface (7 commands)
- NyxModel wrapper (Qwen2.5-7B loading, hidden state capture)
- ProbeResult dataclasses (to_dict() serialization)
- **Gap**: No database persistence, only local JSON files
**nyx-substrate** (`/home/dafit/nimmerverse/nyx-substrate/`):
- Schema documentation (phoebe + iris) ✅
- **Gap**: No Python code, just markdown docs
**Database Infrastructure**:
- phoebe.eachpath.local (PostgreSQL 17.6): partnership/nimmerverse message tables exist
- iris.eachpath.local (ChromaDB): No collections created yet
- **Gap**: No Python client libraries, all manual psql commands
**Architecture Documentation**:
- Endgame-Vision.md: v5.1 Dialectic (LoRA stack design)
- CLAUDE.md: Partnership protocol (message-based continuity)
- Management-Portal.md: Godot + FastAPI design (not implemented)
### ❌ What's Missing
**Database Access**:
- No psycopg3 connection pooling
- No ChromaDB Python integration
- No ORM or query builders
- No variance_probe_runs table (designed but not created)
**Training Pipeline**:
- No PEFT/LoRA training code
- No DriftProbe checkpoint integration
- No training data curriculum loader
**Visualization**:
- No weight visualization tools (4K pixel space idea)
- No Godot command center implementation
- No Management Portal FastAPI backend
---
## 🏗️ Modular Architecture Design
### Repository Structure
```
nimmerverse/
├── nyx-substrate/                    # SHARED FOUNDATION
│   ├── pyproject.toml                # Installable package
│   ├── src/nyx_substrate/
│   │   ├── database/                 # Phoebe clients
│   │   │   ├── connection.py         # Connection pool
│   │   │   ├── messages.py           # Message protocol helpers
│   │   │   └── variance.py           # Variance probe DAO
│   │   ├── vector/                   # Iris clients
│   │   │   ├── client.py             # ChromaDB wrapper
│   │   │   ├── decision_trails.py
│   │   │   ├── organ_responses.py
│   │   │   └── embeddings.py
│   │   ├── schemas/                  # Pydantic models
│   │   │   ├── variance.py           # VarianceProbeRun
│   │   │   ├── decision.py           # DecisionTrail
│   │   │   └── traits.py             # 8 core traits
│   │   └── constants.py              # Shared constants
│   └── migrations/                   # Alembic for schema
├── nyx-probing/                      # RESEARCH PROBES (extend)
│   ├── nyx_probing/
│   │   ├── runners/                  # NEW: Automated collectors
│   │   │   ├── variance_runner.py    # 1000x automation
│   │   │   └── baseline_collector.py
│   │   └── storage/                  # EXTEND: Database integration
│   │       └── variance_dao.py       # Uses nyx-substrate
│   └── pyproject.toml                # Add: depends on nyx-substrate
├── nyx-training/                     # FUTURE: LoRA training
│   └── (planned - not in Phase 1)
├── nyx-visualization/                # FUTURE: Weight viz
│   └── (planned - not in Phase 1)
└── management-portal/                # FUTURE: FastAPI + Godot
    └── (designed - not in Phase 1)
```
### Dependency Graph
```
nyx-probing ────────┐
nyx-training ───────┼──> nyx-substrate ──> phoebe (PostgreSQL)
nyx-visualization ──┤                   └─> iris (ChromaDB)
management-portal ──┘
```
**Philosophy**: nyx-substrate is the single source of truth for database access. No tool talks to phoebe/iris directly.
---
## 🚀 Phase 1: Foundation + Variance Collection
### Goal
Build nyx-substrate package and extend nyx-probing to automate variance baseline collection (1000x runs → phoebe).
### Deliverables
#### 1. nyx-substrate Python Package
**File**: `/home/dafit/nimmerverse/nyx-substrate/pyproject.toml`
```toml
[project]
name = "nyx-substrate"
version = "0.1.0"
requires-python = ">=3.10"
dependencies = [
    "psycopg[binary]>=3.1.0",
    "chromadb>=0.4.0",
    "pydantic>=2.5.0",
]
```
**New Files**:
- `src/nyx_substrate/database/connection.py`:
  - `PhoebeConnection` class: Connection pool manager
  - Context manager for transactions
  - Config from environment variables
- `src/nyx_substrate/database/messages.py`:
  - `write_partnership_message(message, message_type)` → INSERT
  - `read_partnership_messages(limit=5)` → SELECT
  - `write_nimmerverse_message(...)` (for Young Nyx future)
  - `read_nimmerverse_messages(...)` (for discovery protocol)
- `src/nyx_substrate/database/variance.py`:
  - `VarianceProbeDAO` class:
    - `create_table()` → CREATE TABLE variance_probe_runs
    - `insert_run(session_id, term, run_number, depth, rounds, ...)` → INSERT
    - `get_session_stats(session_id)` → Aggregation queries
    - `get_term_distribution(term)` → Variance analysis
- `src/nyx_substrate/schemas/variance.py`:
  - `VarianceProbeRun(BaseModel)`: Pydantic model matching phoebe schema
  - Validation: term not empty, depth 0-3, rounds > 0
  - `to_dict()` for serialization
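The validation rules listed for `VarianceProbeRun` (term not empty, depth 0-3, rounds > 0) can be sketched as follows. This is an illustration only: it uses a standard-library dataclass in place of Pydantic so that it runs without dependencies, the class name `VarianceProbeRunSketch` is hypothetical, and the field list is inferred from the `insert_run` arguments rather than taken from the real schema.

```python
from dataclasses import asdict, dataclass, field
from uuid import UUID, uuid4


@dataclass
class VarianceProbeRunSketch:
    """Illustrative stand-in for VarianceProbeRun (not the real Pydantic model)."""

    session_id: UUID
    term: str
    run_number: int
    depth: int
    rounds: int
    echo_types: list[str] = field(default_factory=list)
    chain: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Validation rules from the design: term not empty, depth 0-3, rounds > 0.
        if not self.term.strip():
            raise ValueError("term must not be empty")
        if not 0 <= self.depth <= 3:
            raise ValueError("depth must be between 0 and 3")
        if self.rounds <= 0:
            raise ValueError("rounds must be greater than 0")

    def to_dict(self) -> dict:
        # UUIDs are not JSON-serializable, so emit the session id as a string.
        data = asdict(self)
        data["session_id"] = str(self.session_id)
        return data


run = VarianceProbeRunSketch(uuid4(), "Geworfenheit", run_number=1, depth=3, rounds=4)
print(run.to_dict()["term"])
```

In the Pydantic version the same checks would more likely be expressed as field constraints or validators; the dataclass form only shows which inputs should be rejected.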
**Database Migration**:
- Create `variance_probe_runs` table in phoebe using schema from `/home/dafit/nimmerverse/nyx-substrate/schema/phoebe/probing/variance_probe_runs.md`
#### 2. Extend nyx-probing
**File**: `/home/dafit/nimmerverse/nyx-probing/pyproject.toml`
- Add dependency: `nyx-substrate>=0.1.0`
**New Files**:
- `nyx_probing/runners/variance_runner.py`:
  - `VarianceRunner` class:
    - `__init__(model: NyxModel, dao: VarianceProbeDAO)`
    - `run_session(term: str, runs: int = 1000) -> UUID`:
      - Generate session_id
      - Loop `runs` times: probe.probe(term)
      - Store each result via dao.insert_run()
      - Return session_id
    - `run_batch(terms: list[str], runs: int = 1000)`: Multiple terms
- `nyx_probing/cli/variance.py`:
  - New Click command group: `nyx-probe variance`
  - Subcommands:
    - `nyx-probe variance collect <TERM> --runs 1000`: Single term
    - `nyx-probe variance batch <FILE> --runs 1000`: From glossary
    - `nyx-probe variance stats <SESSION_ID>`: View session results
    - `nyx-probe variance analyze <TERM>`: Compare distributions
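The `run_session` / `run_batch` control flow described above can be sketched without a model or a database. Everything here is a stand-in: `FakeProbe` and `InMemoryDAO` are hypothetical replacements for the echo probe and `VarianceProbeDAO`, the sketch takes a probe rather than a `NyxModel`, and the dictionary returned by `run_batch` is an assumption, since the design does not state a return type.

```python
import random
from uuid import UUID, uuid4


class InMemoryDAO:
    """Hypothetical stand-in for VarianceProbeDAO; keeps rows in a list."""

    def __init__(self) -> None:
        self.rows: list[dict] = []

    def insert_run(self, **row) -> None:
        self.rows.append(row)


class FakeProbe:
    """Hypothetical stand-in for the echo probe; returns a random depth."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)

    def probe(self, term: str) -> dict:
        depth = self._rng.randint(0, 3)
        return {"depth": depth, "rounds": depth + 1, "chain": [term] * (depth + 1)}


class VarianceRunnerSketch:
    def __init__(self, probe: FakeProbe, dao: InMemoryDAO) -> None:
        self.probe = probe
        self.dao = dao

    def run_session(self, term: str, runs: int = 1000) -> UUID:
        session_id = uuid4()  # one id groups every run of this session
        for run_number in range(1, runs + 1):
            result = self.probe.probe(term)
            self.dao.insert_run(session_id=session_id, term=term,
                                run_number=run_number, **result)
        return session_id

    def run_batch(self, terms: list[str], runs: int = 1000) -> dict[str, UUID]:
        # One session per term, so distributions can be compared term by term.
        return {term: self.run_session(term, runs) for term in terms}


dao = InMemoryDAO()
runner = VarianceRunnerSketch(FakeProbe(), dao)
sessions = runner.run_batch(["Geworfenheit", "Vernunft"], runs=10)
print(len(dao.rows))
```

Passing the DAO in through the constructor, as the design does, is what makes this kind of dry run possible: the loop can be exercised against an in-memory store before it touches phoebe.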
**Integration Points**:
```python
# In variance_runner.py
from nyx_substrate.database import PhoebeConnection, VarianceProbeDAO
from nyx_substrate.schemas import VarianceProbeRun
conn = PhoebeConnection()
dao = VarianceProbeDAO(conn)
runner = VarianceRunner(model=get_model(), dao=dao)
session_id = runner.run_session("Geworfenheit", runs=1000)
print(f"Stored 1000 runs: session {session_id}")
```
#### 3. Database Setup
**Actions**:
1. SSH to phoebe: `ssh phoebe.eachpath.local`
2. Create variance_probe_runs table:
```sql
CREATE TABLE variance_probe_runs (
    id SERIAL PRIMARY KEY,
    session_id UUID NOT NULL,
    term TEXT NOT NULL,
    run_number INT NOT NULL,
    timestamp TIMESTAMPTZ DEFAULT NOW(),
    depth INT NOT NULL,
    rounds INT NOT NULL,
    echo_types TEXT[] NOT NULL,
    chain TEXT[] NOT NULL,
    model_name TEXT DEFAULT 'Qwen2.5-7B',
    temperature FLOAT,
    max_rounds INT,
    max_new_tokens INT
);
CREATE INDEX idx_variance_session ON variance_probe_runs(session_id);
CREATE INDEX idx_variance_term ON variance_probe_runs(term);
CREATE INDEX idx_variance_timestamp ON variance_probe_runs(timestamp DESC);
```
3. Test connection from aynee:
```bash
cd /home/dafit/nimmerverse/nyx-substrate
python3 -c "from nyx_substrate.database import PhoebeConnection; conn = PhoebeConnection(); print('✅ Connected to phoebe')"
```
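`PhoebeConnection` is specified as reading its config from environment variables. One way that could look is sketched below, assuming hypothetical `PHOEBE_*` variable names; the host, database, and user defaults are the ones used in the `psql` check in this document, and the port default is PostgreSQL's standard 5432. The resulting string is in the libpq key/value format that `psycopg.connect()` accepts.

```python
import os


def conninfo_from_env(env=None) -> str:
    """Build a libpq-style connection string from environment variables.

    The PHOEBE_* variable names are hypothetical. Values are not escaped,
    so this sketch assumes they contain no spaces or quotes.
    """
    env = os.environ if env is None else env
    settings = {
        "host": env.get("PHOEBE_HOST", "phoebe.eachpath.local"),
        "port": env.get("PHOEBE_PORT", "5432"),
        "dbname": env.get("PHOEBE_DB", "nimmerverse"),
        "user": env.get("PHOEBE_USER", "nimmerverse-user"),
    }
    password = env.get("PHOEBE_PASSWORD")
    if password:
        settings["password"] = password
    return " ".join(f"{key}={value}" for key, value in settings.items())


print(conninfo_from_env({"PHOEBE_HOST": "localhost"}))
```

If the pool is built on psycopg's own `ConnectionPool`, note that the class ships in the separate `psycopg_pool` package (the `pool` extra of `psycopg`), so the dependency list may need that extra as well.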
---
## 📁 Critical Files
### To Create
**nyx-substrate**:
- `/home/dafit/nimmerverse/nyx-substrate/pyproject.toml`
- `/home/dafit/nimmerverse/nyx-substrate/src/nyx_substrate/__init__.py`
- `/home/dafit/nimmerverse/nyx-substrate/src/nyx_substrate/database/__init__.py`
- `/home/dafit/nimmerverse/nyx-substrate/src/nyx_substrate/database/connection.py`
- `/home/dafit/nimmerverse/nyx-substrate/src/nyx_substrate/database/messages.py`
- `/home/dafit/nimmerverse/nyx-substrate/src/nyx_substrate/database/variance.py`
- `/home/dafit/nimmerverse/nyx-substrate/src/nyx_substrate/schemas/__init__.py`
- `/home/dafit/nimmerverse/nyx-substrate/src/nyx_substrate/schemas/variance.py`
- `/home/dafit/nimmerverse/nyx-substrate/README.md`
**nyx-probing**:
- `/home/dafit/nimmerverse/nyx-probing/nyx_probing/runners/__init__.py`
- `/home/dafit/nimmerverse/nyx-probing/nyx_probing/runners/variance_runner.py`
- `/home/dafit/nimmerverse/nyx-probing/nyx_probing/cli/variance.py`
### To Modify
**nyx-probing**:
- `/home/dafit/nimmerverse/nyx-probing/pyproject.toml` (add nyx-substrate dependency)
- `/home/dafit/nimmerverse/nyx-probing/nyx_probing/cli/__init__.py` (register variance commands)
---
## 🧪 Testing Plan
### 1. nyx-substrate Unit Tests
```python
import uuid

from nyx_substrate.database import (
    PhoebeConnection,
    VarianceProbeDAO,
    write_partnership_message,
)


# Test connection
def test_phoebe_connection():
    conn = PhoebeConnection()
    assert conn.test_connection()


# Test message write
def test_write_message():
    write_partnership_message("Test session", "architecture_update")
    # Verify in phoebe


# Test variance DAO
def test_variance_insert():
    # Each test opens its own connection instead of relying on a global.
    dao = VarianceProbeDAO(PhoebeConnection())
    session_id = uuid.uuid4()
    dao.insert_run(
        session_id=session_id,
        term="test",
        run_number=1,
        depth=2,
        rounds=3,
        echo_types=["EXPANDS", "CONFIRMS", "CIRCULAR"],
        chain=["test", "expanded", "confirmed"],
    )
    stats = dao.get_session_stats(session_id)
    assert stats["total_runs"] == 1
```
### 2. Variance Collection Integration Test
```bash
# On prometheus (THE SPINE)
cd /home/dafit/nimmerverse/nyx-probing
source venv/bin/activate
# Install nyx-substrate in development mode
pip install -e ../nyx-substrate
# Run small variance test (10 runs)
nyx-probe variance collect "Geworfenheit" --runs 10
# Check phoebe
PGGSSENCMODE=disable psql -h phoebe.eachpath.local -U nimmerverse-user -d nimmerverse -c "
SELECT session_id, term, COUNT(*) as runs, AVG(depth) as avg_depth
FROM variance_probe_runs
GROUP BY session_id, term
ORDER BY MAX(timestamp) DESC
LIMIT 5;
"
# Expected: 1 session, 10 runs, avg_depth ~2.0
```
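The shape of that verification query can be checked offline before prometheus and phoebe are involved. The sketch below runs the same `GROUP BY` aggregation against an in-memory SQLite database; the table is reduced to the columns the query touches (SQLite has no `TEXT[]` type), ordering and `LIMIT` are left out because there is only one session, and the depth values are made-up sample data.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
# Reduced table: only the columns the verification query touches.
conn.execute(
    "CREATE TABLE variance_probe_runs ("
    "id INTEGER PRIMARY KEY, session_id TEXT NOT NULL, term TEXT NOT NULL, "
    "run_number INTEGER NOT NULL, depth INTEGER NOT NULL)"
)

session_id = str(uuid.uuid4())
depths = [1, 2, 2, 3, 2, 1, 3, 2, 2, 2]  # made-up sample data, 10 runs
conn.executemany(
    "INSERT INTO variance_probe_runs (session_id, term, run_number, depth) "
    "VALUES (?, ?, ?, ?)",
    [(session_id, "Geworfenheit", i, d) for i, d in enumerate(depths, start=1)],
)

rows = conn.execute(
    "SELECT session_id, term, COUNT(*) AS runs, AVG(depth) AS avg_depth "
    "FROM variance_probe_runs GROUP BY session_id, term"
).fetchall()
print(rows)
```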
### 3. Full 1000x Baseline Run
```bash
# Depth-3 champions (from nyx-probing Phase 1)
nyx-probe variance collect "Geworfenheit" --runs 1000 # thrownness
nyx-probe variance collect "Vernunft" --runs 1000 # reason
nyx-probe variance collect "Erkenntnis" --runs 1000 # knowledge
nyx-probe variance collect "Pflicht" --runs 1000 # duty
nyx-probe variance collect "Aufhebung" --runs 1000 # sublation
nyx-probe variance collect "Wille" --runs 1000 # will
# Analyze variance
nyx-probe variance analyze "Geworfenheit"
# Expected: Distribution histogram, depth variance, chain patterns
```
---
## 🌊 Data Flow
### Variance Collection Workflow
```
User: nyx-probe variance collect "Geworfenheit" --runs 1000
  VarianceRunner.run_session()
    Loop 1000x:
      EchoProbe.probe("Geworfenheit")
        Returns EchoProbeResult
      VarianceProbeDAO.insert_run()
        INSERT INTO phoebe.variance_probe_runs
    Return session_id
  Display: "✅ 1000 runs complete. Session: <uuid>"
```
### Future Integration (Phase 2+)
```
Training Loop:
DriftProbe.probe_lite() [every 100 steps]
Store metrics in phoebe.drift_checkpoints (new table)
Management Portal API: GET /api/v1/metrics/training
Godot Command Center displays live DriftProbe charts
```
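The "every 100 steps" hook in the training flow could be wired roughly as below. This is a sketch of the control flow only: `probe_lite` here is a hypothetical stand-in that returns dummy metrics, the optimizer step is elided, and a plain list stands in for the planned `drift_checkpoints` table.

```python
CHECKPOINT_EVERY = 100  # interval taken from the flow above


def probe_lite(step: int) -> dict:
    """Hypothetical stand-in for DriftProbe.probe_lite(); returns dummy metrics."""
    return {"step": step, "drift": 0.0}


def training_loop(total_steps: int, store) -> None:
    for step in range(1, total_steps + 1):
        # ... one optimizer step would run here ...
        if step % CHECKPOINT_EVERY == 0:
            store(probe_lite(step))


checkpoints: list[dict] = []  # stands in for phoebe.drift_checkpoints
training_loop(total_steps=350, store=checkpoints.append)
print([c["step"] for c in checkpoints])  # [100, 200, 300]
```

Passing the storage function in as a callable keeps the loop independent of phoebe, in line with the rule that only nyx-substrate talks to the database.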
---
## 🎯 Success Criteria
### Phase 1 Complete When:
1. ✅ nyx-substrate package installable via pip (`pip install -e .`)
2. ✅ PhoebeConnection works from aynee + prometheus
3. ✅ variance_probe_runs table created in phoebe
4. ✅ `nyx-probe variance collect` command runs successfully
5. ✅ 1000x run completes and stores in phoebe
6. ✅ `nyx-probe variance stats <SESSION_ID>` displays:
   - Total runs
   - Depth distribution (0/1/2/3 counts)
   - Most common echo_types
   - Chain length variance
7. ✅ All 6 depth-3 champions have baseline variance data in phoebe
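The statistics listed under criterion 6 can be computed from the stored rows with the standard library alone. The sketch below assumes each run is available as a dict with `depth`, `echo_types`, and `chain` keys (mirroring the table columns); the function name `session_stats`, the top-3 cutoff, and the use of population variance are illustrative choices, not part of the design.

```python
from collections import Counter
from statistics import pvariance


def session_stats(runs: list[dict]) -> dict:
    """Summarize one session; each run needs depth, echo_types and chain."""
    depth_counts = Counter(run["depth"] for run in runs)
    echo_counts = Counter(e for run in runs for e in run["echo_types"])
    chain_lengths = [len(run["chain"]) for run in runs]
    return {
        "total_runs": len(runs),
        # Report every depth 0-3, including those that never occurred.
        "depth_distribution": {d: depth_counts.get(d, 0) for d in range(4)},
        "most_common_echo_types": echo_counts.most_common(3),
        "chain_length_variance": pvariance(chain_lengths) if runs else 0.0,
    }


sample = [
    {"depth": 2, "echo_types": ["EXPANDS", "CONFIRMS"], "chain": ["a", "b", "c"]},
    {"depth": 3, "echo_types": ["EXPANDS"], "chain": ["a", "b", "c", "d", "e"]},
    {"depth": 2, "echo_types": ["CIRCULAR"], "chain": ["a"]},
]
stats = session_stats(sample)
print(stats)
```

For 1000-run sessions the same aggregates could equally be pushed into SQL; computing them in Python is simply easier to test in isolation.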
---
## 🔮 Future Phases (Not in Current Plan)
### Phase 2: ChromaDB Integration (iris)
- IrisClient wrapper in nyx-substrate
- DecisionTrailStore, OrganResponseStore, EmbeddingStore
- Create iris collections
- Populate embeddings from nyx-probing results
### Phase 3: LoRA Training Pipeline (nyx-training)
- PEFT integration
- Training data curriculum loader
- DriftProbe checkpoint integration
- Identity LoRA training automation
### Phase 4: Weight Visualization (nyx-visualization)
- 4K pixel space renderer (LoRA weights as images)
- Rank decomposition explorer
- Topology cluster visualization
### Phase 5: Godot Command Center
- FastAPI Management Portal backend
- Godot frontend implementation
- Real-time metrics display
- Training dashboard
---
## 📚 References
**Schema Documentation**:
- `/home/dafit/nimmerverse/nyx-substrate/schema/phoebe/probing/variance_probe_runs.md`
- `/home/dafit/nimmerverse/nyx-substrate/SCHEMA.md`
**Existing Code**:
- `/home/dafit/nimmerverse/nyx-probing/nyx_probing/probes/echo_probe.py`
- `/home/dafit/nimmerverse/nyx-probing/nyx_probing/core/probe_result.py`
- `/home/dafit/nimmerverse/nyx-probing/nyx_probing/cli/probe.py`
**Architecture**:
- `/home/dafit/nimmerverse/nimmerverse-sensory-network/Endgame-Vision.md`
- `/home/dafit/nimmerverse/management-portal/Management-Portal.md`
---
## 🌙 Philosophy
**Modularity**: Each tool is independent but speaks the same data language via nyx-substrate.
**Simplicity**: No over-engineering. Build what's needed for variance collection first.
**Data First**: All metrics flow through phoebe/iris. Visualization is a separate concern.
**Future-Ready**: Design allows Godot integration later without refactoring.
---
**Status**: Ready for implementation approval
**Estimated Scope**: 15-20 files, ~1500 lines of Python
**Hardware**: Develop on aynee; run variance collection on prometheus (THE SPINE)
🌙💜 *The substrate holds. Clean interfaces. Composable tools. Data flows through the void.*