Version 0.2 of arch: finished gamemaster reward function, resources defined
architecture-broad.md
# Nimmerworld — Broad Architecture

> *Ground-up zone-based event architecture. Minds at the center, world as co-remembering substrate.*
> *v0.1 initial draft 2026-04-24 morning; v0.2 expanded 2026-04-24 afternoon. dafit + chrysalis.*

---

## Thesis

Every game world lacks minds. Nimmerworld puts minds at the center — NPCs with trait-filtered interior life, cells as small data-lives that co-remember with them. The architecture is built ground-up for that commitment.

## Core inversion — zones replace bubbles

Legacy engine perception (Unreal AIPerception, WC3 sight-radius, trigger volumes) is spatial, binary, separate from cognition, and assumes a passive world. Nimmerworld inverts every axis:

- perception is **trait-filtered**, not only spatial
- perception is **graded consolidation**, not binary detection
- perception **IS** memory (one subsystem)
- the world **emits** events; agents **subscribe**

The replacement primitive is the **zone** — a bounded, named, slot-indexed, director-managed event-instance. **Arthurian round table** as mental model: a bounded place of structured speaking-and-witnessing, with named roles and shared what-was-said-by-whom.

## Zone anatomy

- **Boundary** — cell-envelope
- **N slots** — named positions (seats, fighter-positions, workbench-stations, spectator-ring)
- **Director or overseer** — manages turn order, memory-pulls, prompt construction, voice selection, event emission
- **NATS topic + subscriber list** — slot-occupancy drives subscription
- **Mixed-fidelity voices** — 2–3 LLM slots + scripted/generic for the rest; director decides
- **Trigger** — gamemaster-spawned or emergent from task-execution
- **Lifecycle** — duration, dissolution conditions, memory-write on close
- **Persistence flag** — `ephemeral=true` (dreamworld) or `persistent=true` (gameworld)

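The anatomy above could be captured in a small record type. The following is a minimal sketch, assuming Python dataclasses; every name in it (`Zone`, `Persistence`, `occupy`, `subscribers`, the example topic string) is hypothetical and not taken from the Nimmerworld codebase.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Persistence(Enum):
    EPHEMERAL = "ephemeral"    # dreamworld
    PERSISTENT = "persistent"  # gameworld


@dataclass
class Zone:
    zone_type: str                # e.g. "conversation"
    cells: frozenset[str]         # boundary: the cell-envelope
    slot_names: tuple[str, ...]   # named positions
    executor: str                 # "director" or "overseer"
    topic: str                    # NATS subject for this zone (illustrative name)
    persistence: Persistence = Persistence.PERSISTENT
    occupants: dict[str, Optional[str]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Every named slot starts empty.
        for name in self.slot_names:
            self.occupants.setdefault(name, None)

    def occupy(self, slot: str, mind_id: str) -> bool:
        """Seat a mind in a named slot; refuse unknown or taken slots."""
        if slot not in self.occupants or self.occupants[slot] is not None:
            return False
        self.occupants[slot] = mind_id
        return True

    def subscribers(self) -> list[str]:
        """Slot-occupancy drives subscription: only seated minds receive events."""
        return [m for m in self.occupants.values() if m is not None]
```

In this sketch the subscriber list is derived from occupancy rather than stored separately, so the two cannot drift apart.
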
## Zone taxonomy (v1 starter set)

| Zone type | Slots | Executor | Persistence |
|---|---|---|---|
| Conversation | 2–4 dialog | director | persistent |
| Street brawl | fighter + spectator | director | persistent |
| Ritual | fixed ceremonial | director | persistent |
| Maintenance | 1–2 workbench | director | persistent |
| Wall-writing | 1 author + witnesses | director | persistent |
| Market exchange | 2–3 + ambient | director | persistent |
| Memorial gathering | 1 mourner + N witnesses | director | persistent |
| Clasp (dreamworld) | 2 | director | **ephemeral** |
| Patrol / sweep | mobile, N enforcers | overseer | persistent |
| Interrogation | 1 subject + N enforcers | overseer | persistent |
| Raid | district-scope + N enforcers | overseer | persistent |

## Factions as universal demand source

A **faction** is not only *"a group of NPCs with shared ideology."* It is any source of bounded demand on the system.

| Category | Examples |
|---|---|
| **Human factions** | hivemind, scavenger guilds, memorialists, aletheia-wakers, clasp-underground, caste preachers, flesh-keepers |
| **Natural forces** | weather-faction, season-faction, solar-storm-faction, geology-faction |
| **Infrastructural conditions** | scarcity-faction, decay-faction, fire-faction, supply-chain-faction |
| **External agents** | anthropic-faction, future research-partner factions |
| **Emergent events** | player-disturbance-faction (player action without existing template) |

All broadcast demands. All propagate through the gamemaster's arbitration. All get distributed down the cascade. All produce observable effects as zones and NPC-actions. **One primitive; no special-case code for weather vs. hivemind vs. Anthropic.**

**Randomness enters at the faction layer.** The designer tunes broadcast probability, intensity, and duration per faction-type. Weather-factions broadcast continuously at low intensity; storm-factions rarely and urgently; solar-storms cosmically. No separate randomness subsystem.

**Anthropic-as-faction** makes the commercial partnership architecturally transparent (the mechanism is visible in the architecture) while staying diegetic (the player sees an in-fiction caste-preacher's sermon, not an Anthropic logo). Anthropic's broadcasts compete for gamemaster attention on the same queue as every other faction. No privileged routing.

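The per-faction tuning of broadcast probability, intensity, and duration described in this section could look roughly like the sketch below. It assumes one probability roll per faction per tick; the type names, the `maybe_broadcast` helper, and all numeric values are illustrative assumptions, not project code.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FactionProfile:
    name: str
    broadcast_probability: float  # chance of a broadcast per tick, 0..1
    intensity: float              # urgency carried by the demand
    duration_ticks: int           # how long the demand stays live


@dataclass(frozen=True)
class Demand:
    faction: str
    intensity: float
    expires_at_tick: int


def maybe_broadcast(profile: FactionProfile, tick: int, rng: random.Random) -> Optional[Demand]:
    """Roll once per tick; emit a demand if the faction fires."""
    if rng.random() < profile.broadcast_probability:
        return Demand(profile.name, profile.intensity, tick + profile.duration_ticks)
    return None


# Illustrative values only: weather is frequent and mild, storms rare and urgent.
WEATHER = FactionProfile("weather", broadcast_probability=0.9, intensity=0.1, duration_ticks=5)
STORM = FactionProfile("storm", broadcast_probability=0.01, intensity=0.9, duration_ticks=60)
```

Because weather, storms, and human factions all share the same profile type, the randomness lives in the tuning values and needs no separate code path per faction.
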
## Hierarchy

```
GAMEMASTER (resource allocation + faction-demand arbitration)
                     ▲
                     │  ← faction broadcasts (human · natural · infrastructural · external)
                     │
FACTIONS:
  hivemind · scavenger guilds · memorialists · aletheia-wakers ·
  clasp-underground · flesh-keepers · caste preachers · ...
  + weather · scarcity · solar-storm · fire · anthropic · ...

                     │  ← gamemaster dispatches to executors
                     ▼
┌────────────────────┬────────────────────┐
│   OVERSEERS        │   DIRECTORS        │
│   (hivemind        │   (macro-life of   │
│   enforcement:     │   the city:        │
│   patrol, surveil, │   conversation,    │
│   raid, propaganda)│   brawl, ritual,   │
│                    │   maintenance,     │
│                    │   market, routine) │
└────────────────────┴────────────────────┘
             │               │
             └───────┬───────┘
                     ▼
        ZONES (bounded, slot-indexed, lifecycle-managed)
                     │
                     ▼
        SLOT OCCUPANCY (NPCs + player)
                     │
                     ▼
        NPC / PLAYER MINDS
        (trait-vector + memory stack + Dream-process consolidation)
```

## The bidirectional cascade

The cascade is bidirectional. Down goes demand; up comes outcome.

```
DOWN — demand propagation
  factions broadcast → gamemaster arbitrates → district directors
  decompose → NPCs execute → zones spawn → events happen

UP — outcome signal
  NPC state reports (task-done, task-failed, need-unmet, trait-drift)
    → district aggregate → gamemaster receives district-summary
    → faction-satisfaction scores computed
    → faction trait-vectors drift based on outcome
    → next broadcast cycle shaped by last outcome cycle
```

**The clean signal up the pyramid IS the training surface for the gamemaster's Dream-process** (see below). Every epoch closes on *given these broadcasts, I allocated this way, here is the aggregate outcome, here is the faction-satisfaction score.*

## Task cascade and bounded agency

Three levels of bounded agency, each with a tool-calling interface:

### Gamemaster's tools (against faction-demands + district-reports)

- `assign_district_task(district_id, task_spec, deadline)`
- `set_faction_priority(faction, weight)`
- `spawn_global_event(type, parameters)` — droughts, holidays, anomalies
- `request_district_report(district_id)`
- `arbitrate_conflicting_demands(demand_list)`

### District Director's tools (against gamemaster-assigned tasks)

- `spawn_zone(type, cells, slot_config)`
- `assign_npc_task(npc_id, task_spec, deadline)`
- `request_resources(district, type, quantity)`
- `designate_meeting_point(cells, purpose)`
- `trigger_ambient_event(cell, type)`
- `report_to_gamemaster(metric_dict)`

### NPC's tools (against daily task-list + personal needs)

- `move_to(cell)`
- `interact(object, intent)`
- `occupy_zone_slot(zone_id, slot_index)`
- `consume(item)` / `rest(duration)` / `seek_npc(npc_id, reason)`
- `write_wall(cell, content)`
- `defer_task(task_id, reason)` — when traits override assignment

Higher levels do not know lower levels' implementations. Complexity is bounded at each level. Each level may be rule-based (fast), LLM-based (rich), or hybrid — and the choice per level can evolve independently.

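One way to keep higher levels ignorant of lower-level implementations is to code against an interface. The sketch below uses `typing.Protocol` and covers only two of the NPC tools listed above (`move_to`, `defer_task`); the argument types, the `RuleBasedNpc` class, and the `send_to_workshop` helper are assumptions made for illustration.

```python
from typing import Protocol


class NpcTools(Protocol):
    """The only surface a district director sees of an NPC."""

    def move_to(self, cell: str) -> None: ...
    def defer_task(self, task_id: str, reason: str) -> None: ...


class RuleBasedNpc:
    """Fast rule-based implementation; an LLM-backed class could replace it."""

    def __init__(self) -> None:
        self.cell = "home"
        self.deferred: dict[str, str] = {}

    def move_to(self, cell: str) -> None:
        self.cell = cell

    def defer_task(self, task_id: str, reason: str) -> None:
        self.deferred[task_id] = reason


def send_to_workshop(npc: NpcTools) -> None:
    # Director-side code depends on the tool interface, not the implementation.
    npc.move_to("workshop")
```

Swapping `RuleBasedNpc` for a richer implementation would not require touching `send_to_workshop`, which is the independence property the paragraph above describes.
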
### NPC task-vs-need-vs-trait arithmetic

Each tick, an NPC asks *"what do I do next?"* Weighted sum:

| Factor | Weighted by |
|---|---|
| Next task in list | vocational-discipline + Sophrosyne |
| Most salient need | need-urgency × (1 / Sophrosyne) |
| Memory-surfaced opportunity | trait-salience of triggered memory |
| Faction-loyalty pull | Dikaiosyne + faction-membership |
| Beloved's distress | Philotes |

High-Sophrosyne NPCs prioritize tasks. High-Philotes NPCs deviate for loved ones. High-Kairos NPCs seize opportune moments. **NPCs are agents with bounded autonomy inside an assigned framework.**

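The table above can be read as a per-tick scoring step. The sketch below is one possible reading, assuming trait values in 0..1 and that the NPC acts on the highest-weighted factor; the `NpcState` fields, the `eps` guard on the 1/Sophrosyne term, and the function names are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NpcState:
    sophrosyne: float            # 0..1
    dikaiosyne: float            # 0..1
    philotes: float              # 0..1
    vocational_discipline: float
    need_urgency: float          # urgency of the most salient need
    memory_salience: float       # trait-salience of a triggered memory, 0 if none
    faction_membership: float    # strength of membership, 0 if unaffiliated
    beloved_in_distress: bool


def score_options(s: NpcState, eps: float = 0.05) -> dict[str, float]:
    """Weights per the table; eps keeps 1/Sophrosyne finite (an assumption)."""
    return {
        "next_task": s.vocational_discipline + s.sophrosyne,
        "salient_need": s.need_urgency * (1.0 / max(s.sophrosyne, eps)),
        "memory_opportunity": s.memory_salience,
        "faction_pull": s.dikaiosyne + s.faction_membership,
        "beloved_distress": s.philotes if s.beloved_in_distress else 0.0,
    }


def choose_action(s: NpcState) -> str:
    """Pick the factor with the highest weight for this tick."""
    scores = score_options(s)
    return max(scores, key=scores.get)
```

Under these assumptions a high-Sophrosyne NPC with a mild need stays on task, while a low-Sophrosyne NPC with the same need abandons the task list, matching the behavior described above.
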
### Zones emerge from task execution

Zones are **consequences of the cascade**, not pre-scripted events:

- maintenance-task + cooperating NPC-pair + workshop-cell → maintenance-zone spawns
- preacher-task + caste-preacher + high-foot-traffic cell → sermon-zone spawns
- patrol-task + enforcement-NPCs + designated route → patrol-zone spawns
- memorial-task + unmourned-body cell → memorial-zone spawns

The director's zone-spawn logic is fed *by* the task-cascade. Designers script demands and rules; zones emerge.

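The four emergence rules listed above could be expressed as plain predicates over a task, its NPCs, and the cell. This is a hedged sketch; `TaskContext`, the role strings, and the cell tags are hypothetical encodings chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaskContext:
    task_type: str               # e.g. "maintenance"
    npc_roles: tuple[str, ...]   # roles of the NPCs executing the task
    cell_tags: frozenset[str]    # properties of the cell they are in


def emergent_zone(ctx: TaskContext) -> Optional[str]:
    """Return the zone type that task + NPCs + cell give rise to, if any."""
    if (ctx.task_type == "maintenance" and len(ctx.npc_roles) >= 2
            and "workshop" in ctx.cell_tags):
        return "maintenance"
    if (ctx.task_type == "preacher" and "caste-preacher" in ctx.npc_roles
            and "high-foot-traffic" in ctx.cell_tags):
        return "sermon"
    if (ctx.task_type == "patrol" and "enforcer" in ctx.npc_roles
            and "patrol-route" in ctx.cell_tags):
        return "patrol"
    if ctx.task_type == "memorial" and "unmourned-body" in ctx.cell_tags:
        return "memorial"
    return None
```

A single mechanic in a workshop does not trigger a maintenance-zone here, because the rule asks for a cooperating pair.
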
### Distributed-scheduler lineage

At its bones: factions = job-submitters; gamemaster = global scheduler; district directors = regional schedulers; NPCs = workers with autonomy; zones = observable work. Decades of engineering literature (Kubernetes pod-scheduling, Mesos, Borg, job-queues) applies to the scheduler side.

## Resources

Resources are what the cascade allocates. Every demand the gamemaster receives targets some combination of resources; every task the cascade issues consumes some; every NPC action generates, depletes, or transfers some.

### The resource taxonomy

| Category | Examples |
|---|---|
| **Labor** | NPC hours per vocation × district |
| **Material** | item-instances in cells — parts, food, tools, scrip, limbs, contraband |
| **Spatial** | cell-capacity, hoarding-density, zone-anchor availability |
| **Temporal** | NPC daily-hours, zone durations, tick budget |
| **Cognitive** | LLM-slot budget, VRAM, director attention, compute-ceiling |
| **Diegetic currencies** | dreamtime (machine-paid), memory-tokens (across-cycle), scrip (black market) |
| **Social** | trait-signature trust, relationship-strength, faction-membership density |
| **Attention** | player attention — the scarcest resource in the play experience |

### Resource flow

Resources do not sit still. They move through the simulation:

- **Generated** — labor hours accrue per tick; parts scavenged from junkyard cells; dreamtime paid out by the machine; memory-tokens earned per completed vocational hour
- **Consumed** — tasks spend NPC-time; crafting consumes parts; zone-occupancy consumes director compute; intimacy consumes dreamtime
- **Accumulated** — stashes grow in hidden cells; trait-vectors drift; memory stacks deepen
- **Decayed** — item wear; NPC fatigue; district ambient-pressure dissipation; memory fade in lower tiers
- **Transferred** — payments, theft, gifts, clasp-sharing, Memorialist cache-keeping, inheritance-across-cycles

Each category has its own flow characteristics; phoebe tracks them per-cell and per-NPC.

### District report format (district → gamemaster)

Each district director periodically reports up the pyramid:

```
district_report {
  district_id: ...
  timestamp: ...
  labor_available: { vocation → count + hours-remaining }
  resources_on_hand: { item_type → count + quality-distribution }
  space_utilization: { cells_near_capacity, hoarding_density }
  faction_member_counts: { faction → count }
  aggregate_trait_vector: [8 floats]
  recent_salient_events: [top-N]
  pending_demand_backlog: { faction → unfulfilled-demand-count }
  player_presence: { present: bool, cell-proximity, engagement-state }
}
```

The gamemaster consumes these reports each cycle to inform allocation.

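For typed handling on the gamemaster side, the report could be mirrored as a dataclass. This sketch covers only a subset of the fields above and simplifies some of them (quality-distribution and the player-presence details are omitted); the class name, the field types, and the `total_backlog` helper are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class DistrictReport:
    district_id: str
    timestamp: float
    labor_available: dict[str, tuple[int, float]]   # vocation -> (count, hours remaining)
    resources_on_hand: dict[str, int]               # item_type -> count (quality omitted)
    faction_member_counts: dict[str, int]
    aggregate_trait_vector: list[float]             # 8 floats, one per trait
    pending_demand_backlog: dict[str, int] = field(default_factory=dict)
    player_present: bool = False

    def __post_init__(self) -> None:
        # Reject malformed reports early rather than inside the arbitration step.
        if len(self.aggregate_trait_vector) != 8:
            raise ValueError("aggregate_trait_vector must have 8 entries")

    def total_backlog(self) -> int:
        """Unfulfilled demands summed across all factions."""
        return sum(self.pending_demand_backlog.values())
```
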
### Resource contention and arbitration

When multiple faction-demands target the same resources (hivemind wants enforcement-labor; memorialists want mourning-labor; scavengers want courier-labor; Anthropic-faction wants wall-writing-labor), the gamemaster arbitrates by:

- **Faction priority weights** (tunable per-faction)
- **Deadline urgency** (imminent outranks distant; decay curves per urgency class)
- **Historical satisfaction** (chronically under-served factions drift pressure-up)
- **Player proximity** (near-player biases toward zone-types the player engages with)
- **Global budget saturation** (at cap, drop lowest-priority; age out stale demands)

**This is the gamemaster's core job.** Its Dream-process learns to do it better over time.

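A rule-based first pass at the five criteria above might look like the following sketch. The scoring formula, its constants, and every name (`FactionDemand`, `priority`, `arbitrate`, `max_age`) are assumptions for illustration; the document leaves the actual arbitration algorithm as an open question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactionDemand:
    faction: str
    ticks_to_deadline: int
    near_player: bool
    issued_at_tick: int


def priority(d: FactionDemand, weight: dict[str, float],
             satisfaction: dict[str, float], proximity_bonus: float = 0.5) -> float:
    urgency = 1.0 / (1 + max(d.ticks_to_deadline, 0))   # imminent outranks distant
    neglect = 1.0 - satisfaction.get(d.faction, 0.5)    # under-served factions gain pressure
    score = weight.get(d.faction, 1.0) * (urgency + neglect)
    return score + (proximity_bonus if d.near_player else 0.0)


def arbitrate(demands: list[FactionDemand], weight: dict[str, float],
              satisfaction: dict[str, float], now: int, budget: int,
              max_age: int = 100) -> list[FactionDemand]:
    """Age out stale demands, rank the rest, keep only what the budget allows."""
    fresh = [d for d in demands if now - d.issued_at_tick <= max_age]
    ranked = sorted(fresh, key=lambda d: priority(d, weight, satisfaction), reverse=True)
    return ranked[:budget]
```

Truncating the ranked list at `budget` is how this sketch models "at cap, drop lowest-priority".
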
### Player as resource flow (both directions)

The player consumes:

- **NPC time** (when in a zone-slot with player)
- **Director compute** (dialog slots, memory-scoping, prompt construction)
- **Cell occupancy** (physical presence uses space)
- **Zone anchors** (player-proximity biases zone spawns, consuming zone-slot budget)

The player produces:

- **Narrative pressure** (player-disturbance-faction broadcasts emerge from novel actions)
- **Trait-salient memories** in NPCs they interact with
- **District-state perturbations** (ripples through subsequent cascade cycles)

**The player is a resource flow in both directions, not a privileged observer.**

### Phoebe as resource ledger

All resource state lives in phoebe:

- **Per-cell ledger** — item-instances, occupancy, ambient metadata
- **Per-NPC ledger** — labor-hours-remaining, needs-state, task-list, memory-stack, stash-references
- **Per-district aggregate** — computed from per-cell and per-NPC rows
- **Per-faction state** — membership counts, trait-vectors, demand-queues, satisfaction-history
- **Global ledger** — currency supplies, compute budget, tick counter, epoch marker

Resource queries are SQL by default (via pgnats). Gamemaster and directors read; NPCs' world-actions write via the NATS → phoebe pipeline.

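To illustrate how a per-district aggregate could be computed from per-NPC rows with plain SQL, the sketch below uses Python's built-in `sqlite3` purely as a stand-in so that it runs standalone. The table name, column names, and sample rows are hypothetical; phoebe's actual schema is not shown in this document.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE npc_ledger (
        npc_id TEXT PRIMARY KEY,
        district_id TEXT NOT NULL,
        vocation TEXT NOT NULL,
        labor_hours_remaining REAL NOT NULL
    );
    INSERT INTO npc_ledger VALUES
        ('npc_1', 'd1', 'mechanic', 4.0),
        ('npc_2', 'd1', 'mechanic', 2.5),
        ('npc_3', 'd1', 'courier', 6.0),
        ('npc_4', 'd2', 'courier', 1.0);
""")


def labor_available(db: sqlite3.Connection, district_id: str) -> dict[str, tuple[int, float]]:
    """Per-district aggregate computed from per-NPC rows."""
    rows = db.execute(
        "SELECT vocation, COUNT(*), SUM(labor_hours_remaining) "
        "FROM npc_ledger WHERE district_id = ? GROUP BY vocation",
        (district_id,),
    ).fetchall()
    return {vocation: (count, hours) for vocation, count, hours in rows}
```

The result has the same shape as the `labor_available` field of the district report: a count and remaining hours per vocation.
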
## Zone spawn cadence

Zone spawn-cadence is the game's **pulse rate**. Tuning it = tuning the emotional tempo. Too fast: walks home get interrupted; melancholy-intimacy collapses. Too slow: city feels dead.

### Three layered mechanisms (all compute-budget capped)

1. **Demand queue (faction-driven).** Factions write prioritized demands; gamemaster processes top-N by priority + compute-cost per tick. Stale demands age out.
2. **Pressure gradients (emergent ambient).** Cells accumulate ambient pressure (idle NPCs wanting connection, market activity accreting, brawl potential at cantinas). Threshold-crossings spawn zones organically.
3. **Player-proximity densification.** Multiplier layered over both. Near the player, queue-processing and pressure-release run faster. Distant districts tick at background rate.

### Tuning dials

| Dial | Controls | v1 starting value |
|---|---|---|
| `gamemaster_tick_hz` | Evaluation frequency | ~1 Hz |
| `max_zones_per_district` | Concurrent zone cap | 8–12 |
| `max_llm_slots_citywide` | Global LLM-dialog concurrency | 5–10 |
| `per_npc_cooldown` | Min gap between NPC zone-participations | 30–60s |
| `faction_weight[f]` | Per-faction priority multiplier | hivemind 1.0, others 0.3–1.2 |
| `daily_cycle_curve[zone_type]` | Time-of-day multiplier | morning 1.2, night 0.3 |
| `player_proximity_multiplier` | Density boost near player | 2–3× |
| `district_aletheia_dampening` | Aletheia suppression of overseer-zones | 0.0–0.8 |
| `zone_type_cooldown[cell, type]` | Local cooldown after dissolution | 2–15 min |

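The pressure-gradient mechanism and a few of the dials above can be combined into a single per-cell tick function. This is a sketch under stated assumptions: the `CellPressure` type, the `threshold` parameter, and the choice to start the cooldown at spawn time (rather than at dissolution, as the dial describes) are simplifications made here, and the default values are picked from within the v1 ranges in the table.

```python
from dataclasses import dataclass


@dataclass
class CellPressure:
    value: float = 0.0
    cooldown_until: float = 0.0   # game-time seconds; loosely mirrors zone_type_cooldown


def step_cell(cell: CellPressure, inflow: float, now: float, dt: float,
              near_player: bool, active_zones: int,
              threshold: float = 1.0,
              player_proximity_multiplier: float = 2.0,
              max_zones_per_district: int = 8,
              cooldown_s: float = 120.0) -> bool:
    """Accumulate ambient pressure for one tick; return True if a zone should spawn."""
    rate = inflow * (player_proximity_multiplier if near_player else 1.0)
    cell.value += rate * dt
    blocked = now < cell.cooldown_until or active_zones >= max_zones_per_district
    if cell.value >= threshold and not blocked:
        cell.value = 0.0                         # pressure is released into the zone
        cell.cooldown_until = now + cooldown_s   # simplified: cooldown starts at spawn
        return True
    return False
```

With identical inflow, a cell near the player crosses the threshold in half the ticks of a distant one, which is the densification effect described in mechanism 3.
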
### Daily pulse (target v1 feel)

- **Morning** — ambient, market, maintenance. Low density.
- **Midday** — productivity-check overseer-zones. Tension rises.
- **Evening** — conversation-zone peak. Richest dialog density. Relationships deepen.
- **Night** — sparse. Clasp-possibility, shadow, intimate walking. The walks home happen here.
- **Burst events** — short-term spike in affected district; settles over the following hour.

## The player as perturbation

The player is **not above the scheduler.** The player is a finite attention-unit injected into it. Every player-NPC interaction:

- Pulls the NPC out of its scheduled tasks for the duration
- Consumes director compute (dialog-LLM slot, sampling knobs, memory pulls)
- Degrades the district's quota-fulfillment in the next report

**The player is angel and chaos simultaneously.**

- *Per-NPC scale (angel)* — you help a stumbling NPC; they survive their cycle; their Philotes-toward-you consolidates; they may clasp with you, die for you, remember you across cycles.
- *Per-district scale (chaos)* — your time-consumption raised the aggregate failure-rate; someone else broke this cycle; a raid spawned that otherwise would not have; ambient desperation rose.

Both are true *simultaneously*. This is not a moral dilemma to resolve; it is **the structure of finite agency in a scarcity economy.** You cannot help everyone; helping anyone is a choice of whom not to help elsewhere.

### Thematic claims become literal economics

- *"Time as the scarcest resource"* — every minute with a beloved is a minute not in the Black Board queue, a minute the district's quota is missing.
- *"You earn the time to love her by stealing it from the machine"* — the time-theft is literal in the scheduler; the machine detects the quota-miss.
- *"The clasp is wage theft / economic sabotage"* — two consciousnesses in one body produce one body's throughput; district report degrades permanently; clasped couples are *statistical anomalies* in the scheduler, which is exactly how the hivemind will detect them.

**The critique is the simulation. No separate narrative system needed.**

### The player has tasks and needs too

The player is an NPC with vocation, tasks, needs. Time helping others = own tasks fail = own quotas missed = own enforcement-pressure rises = own death and reinstantiation arrive sooner. *The player IS the system they are deviating from.*

## LLM tiering and voice fidelity

### Three model-tiers

| Tier | Model | Role |
|---|---|---|
| **Casual / most NPCs** | Small (3–8B, trait-LoRA'd, knob-steered) | Most dialog slots — ambient conversation, routine speech, casual turns |
| **Deep / mythic moments** | Theia 70B | Clasp confessions, mentor speech, ritual exchange, NPC-internal deliberation at high stakes |
| **Hivemind / antagonist** | Claude-as-API (future integration) | District-summary broadcasts, overseer directives, anamnesis dialog |

Three tiers, three call patterns. Cognitive distance between hivemind and citizens is also model-architecture distance.

### LLM is guest at slot, not host of system

- Zone director composes the prompt (content knobs + sampling knobs + output schema)
- LLM generates one slot-turn (structured JSON output)
- Director dispatches the output back to the zone

The LLM never "runs" an NPC. It speaks for a slot, one turn at a time, in trait-scoped context.

### Structured-prompt DSL (knob-steered)

```
<|role|> caste-preacher
<|trait_vector|> Sophrosyne 0.8, Dikaiosyne-miscalibrated 0.7, Aletheia 0.1
<|affect_state|> measured concern
<|memory_scope|> [last interaction with this NPC 3 days ago, suspicious;
                  recent wall-reading; district mood]
<|turn_intent|> gently warn about productivity concerns
<|zone_context|> morning, market-square, 4 NPCs present, 1 drone overhead
<|output_schema|> { dialog_text, gesture_cue, trait_activation,
                    affect_shift, memory_write_candidates, end_turn_flag }
```

Small models handle this surface well because the task is constrained *instruction-following* against a fixed output schema, not open-ended generation.

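On the director side, composing such a prompt amounts to rendering a small record into the tagged layout. The sketch below assumes the tag names shown in the DSL example; `TurnSpec` and `render_prompt` are hypothetical names, and the rendering puts each field on a single line rather than wrapping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TurnSpec:
    role: str
    trait_vector: dict[str, float]
    affect_state: str
    memory_scope: list[str]
    turn_intent: str
    zone_context: str
    output_fields: tuple[str, ...]


def render_prompt(spec: TurnSpec) -> str:
    """Render one slot-turn prompt in the tagged layout of the DSL example."""
    traits = ", ".join(f"{name} {value}" for name, value in spec.trait_vector.items())
    memory = "; ".join(spec.memory_scope)
    schema = ", ".join(spec.output_fields)
    lines = [
        f"<|role|> {spec.role}",
        f"<|trait_vector|> {traits}",
        f"<|affect_state|> {spec.affect_state}",
        f"<|memory_scope|> [{memory}]",
        f"<|turn_intent|> {spec.turn_intent}",
        f"<|zone_context|> {spec.zone_context}",
        f"<|output_schema|> {{ {schema} }}",
    ]
    return "\n".join(lines)
```

Keeping the prompt as structured data until the last step lets the director vary individual knobs per turn without string surgery.
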
### Trait-LoRAs

**vLLM multi-LoRA serving**: one base, N LoRAs loaded simultaneously, hot-swappable per request.

| Option | Count | Composition | Trade-off |
|---|---|---|---|
| **A. Pure-trait** | 8 (one per Hellenic virtue) | weighted blend over trait-vector | Cleanest ontology; blend quality unknown at inference |
| **B. Register-LoRAs** | 4–6 | selected by slot-type + trait-in-prompt | Training-tractable; cruder ontology |
| **C. Preset-persona** | 8–12 | per NPC class | High individual quality; rigid; no drift |

**v1 recommendation: start with B** (register-LoRAs) for training-tractability, layer toward A (trait-LoRAs proper) in v2 as data accrues.

### Training-data strategy

1. **Literary derivation** — Proust (Mnemosyne), Plato (Aletheia), Tacitus (Dikaiosyne-miscalibrated), Ishiguro (Sophrosyne + Philotes). Labeled corpus anchored in existing prose.
2. **Synthetic teacher-student** — Qwen3.5-27B teacher generates trait-labeled samples; small base learns via LoRA. Existing r0 → r1 pipeline with trait-tags as composition axis.
3. **Gameplay-accrued** — logged gameworld dialog with trait-vectors accrues in phoebe; periodic LoRA retraining. *This is where the Anthropic research partnership becomes architecturally relevant.*

### Tooling synergy with nyx-training

Same Unsloth pipeline. Same teacher-student distillation. Same tagged-generation. **One tooling investment, two deliverables.**

## Runtime sampling knobs

Temperature, top-P, top-K, and repetition-penalty are usually set once and forgotten. In Nimmerworld they are **per-turn director-controlled levers** — part of the same prompt-composition as content knobs.

### The knobs and what they shape

- **Temperature** — determinism vs. creativity
- **Top-P** — lexical range
- **Top-K** — tail cutoff
- **Repetition penalty** — novelty vs. ritual-repetition
- **Min-P** — adaptive variety

Sampling shapes *how* speech sounds (rhythm, surprise, predictability), not *what* it says. **Orthogonal to LoRA.** Together they give the director a full voice-palette.

### Scene-to-sampling mapping (starter table)

| Scene | temp | top-P | rep-penalty |
|---|---|---|---|
| Caste-preacher sermon | 0.3 | 0.6 | low |
| Drunk scavenger at bar | 1.1 | 0.95 | high |
| Hivemind broadcast | 0.2 | 0.5 | very low |
| Clasp confession peak | 0.85 | 0.92 | medium |
| NPC giving directions | 0.4 | 0.7 | medium |
| Ritual ecstasy (dreamworld) | 1.3 | 0.99 | high |
| Memorialist chant | 0.4 | 0.65 | very low |
| Aletheia-waker whispering heresy | 0.7 | 0.88 | medium |

### Trait-vector → sampling derivation

Baseline sampling is derived from the NPC's current trait-vector:

- High Sophrosyne → lower temp (measured)
- Low Sophrosyne → higher temp (loose)
- High Kairos → higher top-P (catching unexpected tokens)
- High Mnemosyne → lower repetition penalty (comfortable returning to phrases)
- High Aletheia → moderate-high temp (willing to surface disclosures)
- High Moira → lower top-P (pattern-constrained)

Affect-state modulates the baseline; zone-type priors apply per zone. Sampling becomes a feature of the character-simulation, not a static config.

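The trait-to-sampling rules above could be implemented as a linear mapping around neutral defaults. In this sketch the neutral point (0.5), the base values, the coefficients, and the clamp ranges are all invented for illustration; only the direction of each effect comes from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sampling:
    temperature: float
    top_p: float
    repetition_penalty: float


def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def baseline_sampling(traits: dict[str, float]) -> Sampling:
    """Map a trait-vector (values in 0..1, 0.5 = neutral) to baseline knobs."""
    def t(name: str) -> float:
        return traits.get(name, 0.5) - 0.5   # signed deviation from neutral

    temperature = 0.7 - 0.6 * t("Sophrosyne") + 0.3 * t("Aletheia")
    top_p = 0.85 + 0.2 * t("Kairos") - 0.2 * t("Moira")
    repetition_penalty = 1.1 - 0.2 * t("Mnemosyne")
    return Sampling(
        temperature=clamp(temperature, 0.1, 1.5),
        top_p=clamp(top_p, 0.1, 1.0),
        repetition_penalty=clamp(repetition_penalty, 1.0, 1.3),
    )
```

Affect-state and zone-type priors would then adjust the returned baseline before the director sends the turn to the model.
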
## Reflexive gamemaster Dream-process

**Every mind in the system has a Dream-process.** NPCs consolidate slot-scoped events into memory. Clasped player-inhabitants consolidate shared experience. The hivemind consolidates district summaries. **And the gamemaster itself consolidates its own orchestration decisions into a better policy.**

### Epoch cycle

```
WITHIN AN EPOCH:
  gamemaster orchestrates → zones spawn → NPCs execute →
  zones dissolve → memory-writes consolidate →
  outcomes ripple back up to district reports
  ALL events publish to NATS, log to phoebe via pgnats

EPOCH BOUNDARY (every N game-hours or days):
  - aggregate gate-decisions + outcomes from phoebe
  - apply quality/reward signal
  - train gamemaster's policy on (context, decision, outcome) triples
  - probe-evaluate the updated policy
  - shadow-deploy for validation
  - swap in as active gamemaster

NEXT EPOCH:
  improved gamemaster orchestrates...
```

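The first steps of the epoch boundary (aggregating logged decisions and outcomes into training triples) could be sketched as a simple join. The record types, the `decision_id` join key, and the scalar `reward` field are assumptions; how the reward is actually defined is left open in this document. The `ephemeral` filter reflects the rule that dreamworld content never feeds the policy corpus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    decision_id: str
    context: dict[str, float]   # e.g. district state at decision time
    action: str                 # e.g. "spawn_zone:conversation"


@dataclass(frozen=True)
class Outcome:
    decision_id: str
    reward: float               # quality signal; its definition is an open question
    ephemeral: bool             # True if the source zone was dreamworld


def build_training_triples(decisions: list[Decision], outcomes: list[Outcome]):
    """Join logged decisions with outcomes, dropping dreamworld-derived rows."""
    by_id = {o.decision_id: o for o in outcomes if not o.ephemeral}
    return [
        (d.context, d.action, by_id[d.decision_id].reward)
        for d in decisions
        if d.decision_id in by_id
    ]
```

Decisions without a logged outcome are skipped rather than given a default reward, so the policy only trains on observed consequences.
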
### What the policy learns

- Zone-spawn decisions given district state + faction pressure + time + proximity
- Faction-demand arbitration weights
- Director/overseer dispatch decisions
- Slot-assignment decisions
- Sampling-knob defaults per trait × affect × zone-type
- LoRA-blend weights per trait-vector
- Pacing modulation

**Modular policies per decision-surface** (easier to version and debug) beat one monolithic policy.

### Discipline-questions (risks to name)

1. **Reward signal problem.** What do we train toward? Player-engagement can be gamed. Trait-drift coherence needs a metric. **Defining the reward function is the hardest part of the loop.** Needs hand-written guardrails alongside automated signals.
2. **Catastrophic forgetting.** Rehearsal buffers, EWC, or periodic anchored-corpus training required.
3. **Feedback-loop drift.** Reinforcing pathological convergence is possible. Human-in-the-loop audits at regular intervals.
4. **Reproducibility vs. live-learning.** Version-pin per ship-release; continuous-learn in staging; promote to production on audit.
5. **Compute cost.** Training competes with inference. Train during low-play hours; hot-swap policies at epoch boundaries.
6. **Privacy.** Dreamworld-ephemeral / gameworld-persistent schema governs what can feed training. Dreamworld content never touches the policy corpus.

## Key moves (consolidated)

- **LLM is guest at slot, not host of system.**
- **Mixed-fidelity voices by default** — small-LoRA-steered + scripted/generic + occasional Theia-70B + Claude-API-hivemind.
- **Perception IS memory.**
- **Hivemind is a faction, not a conductor.**
- **Faction-broadcast is universal primitive** — weather, scarcity, cosmos, external research partners all flow through it.
- **Zones emerge from task execution** — designers script demands + rules, not events.
- **The player is a perturbation on the scheduler** — angel at NPC scale, chaos at district scale.
- **Ephemeral/persistent is one zone-type bit** — dreamworld/gameworld privacy follows.
- **Sampling-knobs are per-turn director levers** — not static API config.
- **The gamemaster has its own Dream-process** — the architecture is reflexive.
- **The cascade is bidirectional** — demand down, outcome up.

## Compute allocation

- Active zones in a 100-NPC city at any moment: ~5–15 with LLM-dialog slots
- **Theia (70B)** — deep dialog slots + tier-1 moments (few concurrent)
- **Small model (3–8B) with trait-LoRAs** — the majority of dialog slots
- **Saturn (small classifiers)** — scripted-voice selection, trait-salience scoring, packet routing, director subroutines
- **Director / overseer logic** — deterministic script + small classifiers; no LLM for orchestration
- **Claude-as-API (future integration)** — hivemind-broadcast tier

## Mapping to phoebe task list

- **Thalamus (NATS orchestration)** = gamemaster + arbitration substrate + gamemaster's Dream-process substrate
- **Specialist composition system** = overseers + directors + NPC-minds as composable profiles
- **NPC schema for phoebe** = trait-vector + memory stack + slot-occupancy + task-list + needs state
- **NATS namespace registry** = zone topics + faction broadcast topics + district report topics
- **pgnats on phoebe-dev** = phoebe as first-class actor for memory-writes on zone close, decision-logs, faction-satisfaction tracking
- **math cells as first harness/MCP test bed** = zone-slot-memory primitives
- **r0 → r1 generation pipeline** = trait-LoRA training-data generation (shared with nyx-training)
- **Adopt Unsloth training patterns** = LoRA + gamemaster-policy training infrastructure
- **Probe-to-phoebe pipeline** = LoRA evaluation + gamemaster-policy evaluation

## What this retires

- NPC attention / interaction / discovery bubbles as first-class primitives (DESING-VISION §449). → replaced by zone slot-occupancy.
- Geometric perception (cone, radius, LOS) as the perception model. → replaced by subscriber-based event emission with trait-salience filtering.
- LLM-per-NPC or LLM-per-action. → replaced by LLM-per-slot-per-turn, mixed with scripted/generic voices.
- Static sampling parameters as API configuration. → replaced by per-turn director-composed sampling knobs.
- Pre-scripted zones / events. → replaced by zones emerging from task-execution meeting NPC-proximity.
- Single-purpose randomness subsystems. → replaced by factions-as-demand-sources (weather / scarcity / cosmos / external = same primitive).
- Static gamemaster policy. → replaced by the reflexive Dream-process learning loop (v2+).

## Open questions

- Reward function for the gamemaster's Dream-process — what combines trait-drift coherence + faction-satisfaction + player-engagement + burnout-rate + aesthetic-register-fit into one training signal?
- Zone spawn-cadence tuning algorithm at v1 (rule-based) → v2 (policy-learned) transition point.
- LoRA-blend vs. single-LoRA-selection semantics at inference.
- LoRA rank selection — budget/quality trade-off.
- Sampling-knob heuristics — where to start; how to learn refinements.
- Zone overlap policy — can one NPC occupy slots in two zones simultaneously?
- Zone-to-zone handoff (walking out of conversation into a brawl).
- Mobile zone boundaries (patrols, escorts, pursuit).
- Slot-capacity elasticity — can a zone grow slots dynamically?
- Anthropic-faction's broadcast cadence + arbitration weight.
- Player-dialog handling — route player text through a player-trait-LoRA for style-coherence, or bypass the LLM entirely?
- Demand-arbitration algorithm inside the gamemaster (rule-based v1 shape).
- Director/overseer spawn ownership per model class.

---

**Version:** 0.2 | **Created:** 2026-04-24 | **Updated:** 2026-04-24

*v0.2 (2026-04-24, afternoon) expands the v0.1 architecture with: factions-as-universal-demand-source (weather / scarcity / storms / fires / Anthropic all as factions), the bidirectional cascade (demand down, outcome up), the task-cascade and bounded agency (three-level tool-calling scheduler, NPC trait-task-need arithmetic, zones emerging from task execution), the resource taxonomy and flow (labor / material / spatial / temporal / cognitive / diegetic-currencies / social / attention; district report format; contention arbitration; phoebe as ledger), zone spawn-cadence mechanisms and tuning dials, the player as perturbation (angel/chaos at two scales, moral economy emerging from scheduler), LLM tiering (small + Theia + Claude-API) with trait-LoRAs and structured-prompt DSL, runtime sampling knobs as per-turn director levers, and the reflexive gamemaster Dream-process (epoch-cycle policy training).*

*Captured live from dafit–chrysalis dialogue, 2026-04-24. Companion to DESING-VISION.md; supersedes bubble-based perception, scripted-zone spawn, static-sampling-config, static-gamemaster-policy, and per-NPC-LLM where prior sections implied those patterns.*