nimmerworld.eachpath.local/inference-and-memory/architecture.md

# Inference and Memory
> *AI substrate + memory: LLM tiering by role (Theia-tier / teacher-tier / driver-tier with trait-LoRAs); three rings of inference (A=local, B=our-farm, C=external-providers, with cloud-LoRA-backup as Ring-A revenue and BYOK adapter for Ring-C); custom nimmerworld-base-model with default-opt-out + rewarded-opt-in data-sharing tiers; runtime sampling knobs as per-turn director-controlled levers; per-player local memory architecture (primary.sqlite + fallback.sqlite + clasp.sqlite + embedding-beside) with memory-classes (cornerstone/birthright/working/volatile) and trait-graded importance; three-tier knowledge stack (world / district / primary [+ clasp if in-between]) with paced canon-propagation.*
>
> *Companion to: `architecture-broad.md` (executive summary + global meta-lists), `narrative-composition/architecture.md` (Compositor canon-fragments land in primary.sqlite via UID-keyed routing), `player-experience/architecture.md` (Ring-A/B/C choice + voice-as-biometric-local + universal-translator state), `runtime-engine/architecture.md` (driver-tier LLM fires at slot-fire). Sections in this file were split from the monolithic architecture-broad.md v0.7 on 2026-04-26.*
## LLM tiering, voice fidelity, and the three rings of inference
Three model-tiers, **named by role not by binary**: a *driver-tier* model (small, trait-LoRA'd) for most NPC dialog; a *Theia-tier* model (deep) for clasp-confessions and mythic moments; *Claude-as-API* (diegetic Anthropic-faction) for hivemind/imperium. **LLM is guest at slot, not host of system.**
**Specific model selection per tier is deferred to the findings/establishment phase.** The architecture specifies what each tier must DO; the establishment phase wires implementations. Naming concrete binaries in the architecture risks nudging the establishment phase toward false-precision; tier-by-role keeps the swap-surface clean and lets binaries evolve without invalidating architectural commitments. (See `nimmerverse_tasks` under `nyx-training` and `command-center` for current evaluation work.)
Structured-prompt DSL with role / trait_vector / affect_state / memory_scope / turn_intent / zone_context / output_schema fields. Small models excel here because it's instruction-following, not generic generation.
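A minimal sketch of how the DSL's fields might be carried and flattened for a small instruction-following model. The field names come from the text above; the `TurnPrompt` class, types, and rendering format are illustrative assumptions, not a specified wire format:

```python
# Sketch of the structured-prompt DSL as a typed record. Field names follow the
# architecture text; the class, types, and render format are illustrative.
from dataclasses import dataclass, field

@dataclass
class TurnPrompt:
    role: str                          # which NPC voice is speaking
    trait_vector: dict[str, float]     # e.g. {"Sophrosyne": 0.8, "Aletheia": 0.1}
    affect_state: str                  # current emotional register
    memory_scope: list[str]            # memory-class tags retrievable this turn
    turn_intent: str                   # what this slot-fire is for
    zone_context: str                  # where the scene is happening
    output_schema: dict = field(default_factory=dict)  # structured-output contract

    def render(self) -> str:
        """Flatten to the field-tagged form an instruction-follower expects."""
        traits = ", ".join(f"{k}={v:+.1f}" for k, v in self.trait_vector.items())
        return (f"[role] {self.role}\n[traits] {traits}\n"
                f"[affect] {self.affect_state}\n[scope] {'|'.join(self.memory_scope)}\n"
                f"[intent] {self.turn_intent}\n[zone] {self.zone_context}")
```

The point of the tagged-field form is that a 3-8B model never has to infer structure; every slot-fire presents the same skeleton with different fills.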
Trait-LoRAs: v1 register-LoRAs (4-6, training-tractable); v2 pure-trait-LoRAs (8, weighted blend); future preset-persona for key NPCs.
Training data: literary derivation (Proust/Mnemosyne, Plato/Aletheia, Tacitus/Dikaiosyne-miscalibrated, Ishiguro/Sophrosyne+Philotes); synthetic teacher-student via teacher-tier model; gameplay-accrued (the Anthropic-research-partnership relevance).
### Three rings of inference (Unix-style trust gradient)
The conversational LLM (small + trait-LoRA, accounting for most NPC dialog) can run in three rings, chosen per-player at runtime. Each ring trades off privacy, cost, control, and feature-fidelity. **Three monetization paths from the same architecture.**
| Ring | Where inference runs | Player controls | We control | Player cost | Our cost |
|---|---|---|---|---|---|
| **A — Local** | Player's GPU/CPU | All inference | Protocol + cloud LoRA-backup | Local hardware + small backup-subscription | Storage only |
| **B — Our farm** | Our hosted vLLM-multi-LoRA | LoRAs (uploaded) | Inference + runtime | Higher subscription | GPU compute |
| **C — External providers** | OpenAI / Anthropic / OpenRouter / HF / Together / Replicate / etc. | BYOK + provider | Adapter only | Per-token to provider + small integration fee | Adapter-engineering only |
Players choose by hardware, budget, privacy preference, and feature-tolerance.
#### Ring A — cloud-LoRA-backup as revenue (not inference)
For Ring A players we don't sell inference (the expensive thing). We sell **portability and durability of the player's gameplay-accrued LoRAs** — their unique playthrough-derived patterns, the way their NPCs speak after months of trait-drift. LoRA-blobs are *encrypted client-side with the player's own key*; we host the bytes but cannot read them. Even compelled by legal process, we cannot decrypt data we never held the key to.
**This unbundles inference from storage** — the same move Dropbox made vs. bundled cloud-suites. Sovereignty-conscious players keep inference on their machine while still getting the durability/portability they cannot self-provide cheaply. Lower margin per player; reaches a market Ring B cannot.
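The guarantee can be sketched end-to-end. In this sketch the key derivation uses the stdlib's PBKDF2, which is realistic; the XOR keystream is a TOY stand-in for a real AEAD cipher such as AES-GCM, which a production client would use instead:

```python
# Sketch of the client-side-encryption property: the backup service only ever
# sees ciphertext. PBKDF2 key derivation is stdlib and realistic; the XOR
# keystream is a TOY stand-in for a real AEAD cipher (e.g. AES-GCM).
import hashlib, secrets

def derive_key(passphrase: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt, 100_000)

def toy_encrypt(key: bytes, blob: bytes) -> bytes:
    # keystream = SHA-256 blocks of (key || counter), XOR'd with the plaintext
    stream = b"".join(hashlib.sha256(key + i.to_bytes(8, "big")).digest()
                      for i in range(len(blob) // 32 + 1))
    return bytes(a ^ b for a, b in zip(blob, stream))

toy_decrypt = toy_encrypt  # an XOR stream cipher is its own inverse

# Client side: derive key from passphrase, encrypt, upload ciphertext only.
salt = secrets.token_bytes(16)
key = derive_key("player-passphrase", salt)
lora_blob = b"\x00\x01lora-weights..."
ciphertext = toy_encrypt(key, lora_blob)

# Server side: stores `ciphertext` and `salt`. Without the passphrase the
# plaintext is unrecoverable; that is the structural guarantee.
assert toy_decrypt(key, ciphertext) == lora_blob
```

The `lora_blob` row in `player_lora_backups` holds only what the server-side sees here: opaque bytes.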
#### Ring B — hosted inference for convenience
We run multi-LoRA-vLLM on our hardware. Players upload their LoRAs (or use defaults). Higher subscription captures GPU-cost. Players without local GPU (or who don't want the burden) get the full feature-set without compromise. We *can* see content (if not encrypted at rest); the trust-relationship is partnership-mediated rather than sovereign.
#### Ring C — bring-your-own-key for external providers
Players route to their preferred external provider via BYOK (their own API key). We provide the adapter glue. They pay per-token to the provider directly; we charge a small integration fee.
**The compatibility constraint is the hard part of Ring C.** Major providers have varying support for our system's needs:
| Provider | Multi-LoRA | Per-turn sampling knobs | Structured output | Compat |
|---|---|---|---|---|
| Local vLLM (Ring A/B) | Native | All | Grammar-constrained | **Full** |
| HF Inference Endpoints | Yes (configured) | All | Varies | High |
| Together / Replicate / Modal | Some | All | Varies | High |
| OpenRouter | No | Per-model | Per-route | Medium |
| OpenAI | No (no user-LoRA at API) | Limited (temp/top_p) | JSON mode + tools | Medium-low |
| Anthropic | No (no user-LoRA at API) | Limited | Tool-use | Medium-low |
**OpenAI and Anthropic refuse user-uploaded LoRAs as a strategic choice (protecting their fine-tuning value-chain).** This is not a bug we can fix; it's the constraint we design *around*.
#### Degradation path for LoRA-incompatible providers
When routing to LoRA-incompatible providers, trait-LoRA blending becomes **prompt-engineered trait-projection** — the trait-vector encoded in the prompt itself rather than into model weights:
```
[system message]
You are speaking as a character with this Hellenic trait-profile:
- Sophrosyne 0.8 (composed, controlled, measured)
- Dikaiosyne 0.7 (grave bearing, judicial weight)
- Philotes 0.4 (mild attachment to interlocutor)
- Aletheia 0.1 (concealment-tolerant)
[etc.]
Your speech reflects this profile via [register/cadence/word-choice descriptors].
Current scene: [zone_context]
Memory scope: [memory_scope]
Turn intent: [turn_intent]
Respond in JSON matching: [output_schema]
```
Worse than LoRA-blending (more verbose, eats context-budget, less stable across calls, less faithful to trait-arithmetic) but **acceptable as a fallback**. Ring-C-via-OpenAI/Anthropic players accept slightly lower fidelity in exchange for their preferred provider's convenience and quality.
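A sketch of how an adapter might render a trait-vector into that fallback prompt. The `project_traits` function and the descriptor table are hypothetical helpers, not part of the architecture; descriptors would in practice come from register-study per trait:

```python
# Renders a trait-vector into the prompt-engineered fallback shown above, for
# providers that refuse user LoRAs. Descriptor strings are illustrative.
TRAIT_DESCRIPTORS = {  # hypothetical table; real descriptors tuned per register
    "Sophrosyne": "composed, controlled, measured",
    "Dikaiosyne": "grave bearing, judicial weight",
    "Philotes": "mild attachment to interlocutor",
    "Aletheia": "concealment-tolerant",
}

def project_traits(trait_vector: dict[str, float], zone_context: str,
                   memory_scope: str, turn_intent: str) -> str:
    # strongest traits first, so they dominate the model's read of the profile
    lines = [f"- {name} {weight:.1f} ({TRAIT_DESCRIPTORS.get(name, 'unspecified')})"
             for name, weight in sorted(trait_vector.items(), key=lambda kv: -kv[1])]
    return ("You are speaking as a character with this Hellenic trait-profile:\n"
            + "\n".join(lines)
            + f"\nCurrent scene: {zone_context}\nMemory scope: {memory_scope}\n"
            + f"Turn intent: {turn_intent}\nRespond in JSON matching the output_schema.")
```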
#### Adapter-layer engineering
Each Ring-C provider needs an adapter that:
- Maps prompt-DSL fields to provider's prompt format
- Approximates multi-LoRA via prompt-engineering when not native
- Maps sampling knobs to provider's available subset (gracefully drops unsupported)
- Validates structured output post-hoc when not natively constrained
- Handles rate-limits, retries, error-classification, token-counting, cost-pricing
**Bounded, one-time-per-provider engineering.** Capital expenditure that produces ongoing margin (vs. AAA's recurring quest-content-creation costs).
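One adapter responsibility, knob-mapping, can be sketched as follows. The capability sets are illustrative placeholders keyed to the compatibility table above, and `map_sampling_knobs` is a hypothetical helper; the warning strings match the `feature_compat_warnings` example in the schema sketch below:

```python
# Sketch: map our sampling knobs onto a provider's supported subset, recording
# what was gracefully dropped. Capability sets are illustrative placeholders.
PROVIDER_KNOBS = {
    "local_vllm": {"temperature", "top_p", "top_k", "min_p", "repetition_penalty"},
    "openai": {"temperature", "top_p"},          # "Limited (temp/top_p)" per table
    "anthropic": {"temperature", "top_p", "top_k"},
}

def map_sampling_knobs(provider: str, knobs: dict) -> tuple[dict, list[str]]:
    """Return (knobs the provider accepts, warnings for knobs dropped)."""
    supported = PROVIDER_KNOBS[provider]
    kept = {k: v for k, v in knobs.items() if k in supported}
    dropped = [f"{k}_unsupported_dropped" for k in knobs if k not in supported]
    return kept, dropped
```

The dropped-knob warnings are what the adapter would surface to the player at config-time rather than silently degrading.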
### Tier × Ring matrix (which inference-tier runs in which ring)
| Inference tier | Ring options | Why |
|---|---|---|
| **Casual (3-8B trait-LoRA)** | A / B / C all available | Most flexible — small enough for local, runnable anywhere |
| **Deep (Theia-tier)** | B / C only (typically B or HF-Endpoints) | Too large for typical local hardware |
| **Hivemind / antagonist (Claude-as-API)** | C only (always Anthropic-direct via us) | Diegetic — Anthropic-as-faction is fixed in the fiction |
The casual tier is most player-flexible and accounts for most inference volume. Deep-tier and hivemind-tier are specialized and lower-volume.
### Three rings parallel the in-fiction three-layer ontology
| Game-fiction layer | Real-world Ring | Ontological match |
|---|---|---|
| **Liminal** (sovereign, unsurveilled) | **Ring A** (local) | Player's *real* private space — hardware, LoRAs, dialog never leave their machine |
| **Gameworld** (partly regime, partly people) | **Ring B** (our farm) | Partnership-mediated — we host but they retain pattern-ownership |
| **Imperial net** (captured, extractive) | **Ring C** (external providers) | Platform-captured — provider's systems own the inference path |
**The Ring choice the player makes IS the same choice in-fiction characters face.** Players who refuse the imperial-net diegetically can refuse Ring C in real life — same impulse, same act, *mechanically continuous between fiction and operations*. The architecture's commitment to "the right to dream" extends from in-fiction politics into the real player's hardware-level privacy *because the architecture was designed that way from the start*. Structural integrity, not marketing.
### Schema sketch (player LLM configuration + cloud LoRA backup)
```sql
CREATE TABLE player_llm_config (
  player_id UUID PRIMARY KEY,
  -- Casual tier (most NPC dialog) — most flexible per Ring
  casual_tier_ring TEXT NOT NULL CHECK (casual_tier_ring IN ('A_local','B_our_farm','C_external')),
  casual_tier_provider TEXT,
  casual_tier_endpoint TEXT,
  casual_tier_credentials_ref UUID, -- encrypted BYOK key if applicable
  -- Deep tier (Theia-tier) — fewer Ring options
  deep_tier_ring TEXT, -- typically 'B_our_farm' or 'C_external_HF/Together'
  deep_tier_provider TEXT,
  deep_tier_endpoint TEXT,
  deep_tier_credentials_ref UUID,
  -- Hivemind / antagonist — fixed Anthropic-as-faction (diegetic)
  hivemind_tier_provider TEXT NOT NULL DEFAULT 'anthropic_via_us',
  -- Cloud-LoRA-backup
  lora_backup_enabled BOOLEAN DEFAULT false,
  lora_backup_last_sync TIMESTAMPTZ,
  lora_encryption_key_ref UUID,
  -- Compat warnings — surfaced to player at config-time and on degradation
  feature_compat_warnings JSONB,
  -- e.g., { "casual_tier": ["multi_lora_emulated_via_prompt", "min_p_unsupported_dropped"] }
  configured_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  last_modified TIMESTAMPTZ
);

CREATE TABLE player_lora_backups (
  backup_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  player_id UUID NOT NULL,
  lora_name TEXT NOT NULL,
  lora_version INT NOT NULL,
  lora_blob BYTEA, -- ENCRYPTED CLIENT-SIDE with player-key
  encryption_method TEXT NOT NULL,
  backed_up_at TIMESTAMPTZ DEFAULT now(),
  size_bytes BIGINT,
  UNIQUE(player_id, lora_name, lora_version)
);
```
**`lora_blob` encrypted client-side** is the structural privacy guarantee: even with the database, even with our cooperation, an attacker cannot read what was never decryptable on our side.
### Privacy as competitive differentiator
In an era where most game-AI is cloud-routed, nimmerworld can advertise *"your liminal stays on your machine"* as a structural fact. This matters specifically for:
- Clasp-conversations (the most intimate dialog in the game)
- Aletheia-progression-evidence (player's awakening pattern; arguably political-belief-data)
- Memorialist-archive interactions (anti-regime in-fiction; some players will care about it staying off cloud)
- Dream-content (the only permanently-unsurveilled in-fiction layer; should be off our servers if the player chooses)
Few games can offer this. Most cloud-AI-driven games necessarily route everything. **The architecture's commitment to "the right to dream" is technical, not policy.**
### Custom nimmerworld-base model + opt-in data-sharing tiers
The "small (3-8B) trait-LoRA'd" tier currently implies a *generic* small base (Qwen, Mistral, Llama) with our LoRAs applied. **A nimmerworld-fine-tuned base** captures the world's voice *before* any player customization — registers of caste-preacher, texture of clasp-confession, Hellenic vocabulary, dystopian dialect, ternary-gate-state idiom. Trait-LoRAs then ride on an already-nimmerworld-aware substrate. Generic bases swap easily; our nimmerworld-base requires *our* training corpus, which compounds in value over time.
#### Three opt-in tiers within Ring A/B/C — default opt-OUT
Players can optionally contribute to ongoing training of the nimmerworld-base. **The default is opt-out.** Within opt-in, three tiers trade privacy for benefit:
| Tier | Mechanism | What we see | Player benefit |
|---|---|---|---|
| **A.1 — Federated learning** | Model trains on player's machine; only *gradient-deltas* sent to us; aggregated across thousands before integration | **Nothing — no raw data; no individual gradients identifiable** | Discount on backup-subscription; contributor badge; early-access to new base versions |
| **A.2 — Anonymized session uploads** | Sessions stripped of identifiers; aggregated batches; differential-privacy on training | **Anonymized, aggregated, deletable on request (forward-only)** | Larger discount; faster updates; influence on training-priorities |
| **A.3 — Pseudonymous full uploads** | Full session data with player-pseudonym; explicit opt-in per session-category | **Pseudonymous data we can re-process** | Premium benefits — custom-tuned LoRA from their playstyle, beta-access, named-contributor in credits |
**Default-opt-out is the structural ethical stance.** OpenAI / Meta / TikTok / Google default to opt-IN-by-burying-disclosure-in-ToS. We default the opposite — and *reward* opt-in rather than penalizing opt-out. Reciprocity asymmetry as partnership-philosophy made business-policy.
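The A.1 tier's aggregation step, reduced to its essence: individual gradient-deltas only ever enter the base model as an average over many contributors. A toy sketch, assuming a minimum-contributor threshold as the anonymity floor (names and threshold are assumptions):

```python
# Toy sketch of the A.1 aggregation step: deltas are averaged across many
# contributors before integration, so no single player's delta is recoverable
# from the update. The minimum-contributor threshold is an assumed safeguard.
def aggregate_deltas(deltas: list[list[float]], min_contributors: int = 3) -> list[float]:
    if len(deltas) < min_contributors:
        raise ValueError("refuse to aggregate below the anonymity threshold")
    n = len(deltas)
    # element-wise mean across contributors
    return [sum(column) / n for column in zip(*deltas)]
```

A production system would layer secure aggregation or differential privacy on top (see the open questions below); the averaging itself is the structural reason "no individual gradients identifiable" holds in the tier table.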
#### The Memorialist parallel — collective memory honored, individual not commodified
Memorialists in-fiction preserve trait-patterns *for the collective archive* against necrocommerce that would commodify individual patterns. The opt-in data-sharing tier is the **player-level real-world equivalent**: patterns contributed for collective base-model improvement that benefits the entire player-base, with anonymization preventing individual commodification.
| In-fiction Memorialism | Real-world data-sharing tier |
|---|---|
| Preserves trait-patterns of the dead in collective archive | Aggregates anonymized gameplay patterns into shared base-model |
| Refuses necrocommerce (mining individual patterns for resale) | Refuses individual identifying-data extraction |
| Collective memory honored; individual dignity preserved | Collective improvement honored; individual privacy preserved |
| `memorialist_protected BOOLEAN` in mind_pool | `sharing_tier = 'opt_out'` in player_data_sharing_consent |
**The architecture practices Memorialist ethics in business-operations**, not just in fiction. Same ethical commitment, two scales of operation. The architecture's coherence between fiction and operations runs *all the way to the training-pipeline*.
#### Data-flywheel without extraction — the moat AAA cannot replicate
```
More players → more (opt-in) gameplay data
             → better nimmerworld-base
             → better-feeling NPCs / dialog
             → better player retention
             → more players
             (loop)
```
**The moat is the corpus, not the model.** AAA studios could clone the architecture but cannot manufacture years of nimmerworld-specific gameplay-derived dialog without players playing nimmerworld. Even with infinite budget, the data-flywheel takes time to spin up. *The data is unique to us by virtue of being unique to its players.*
#### Distribution back to all players — cooperative governance, not platform extraction
Every base-model update is distributed to all players regardless of Ring choice or sharing-tier:
- Ring A players download `nimmerworld-base-vN` to run locally
- Ring B players' farm-instance auto-updates
- Ring C players get the new base where their provider supports custom-base hosting, and the prompt-engineered fallback otherwise
**Even Ring-A non-contributors benefit from contributors.** The flywheel benefits *everyone*, not only data-providers. This is closer to Wikipedia's governance (contributors → all readers) than Facebook's (users → platform → consumers). Different ethics; different long-term equilibrium. **The architecture is becoming a digital-commons-shaped-business in a literal sense, not metaphorical.**
#### Why this matters: refusing the antagonist-pattern in LLM-integrated software
The dominant cultural pattern around LLMs in 2025-2026 is **adversarial**: users jailbreak; companies extract user data without informed consent; products treat AI characters as resources to manipulate rather than as participants; the whole ecosystem is framed as users-vs-AIs-vs-companies, an arms race of suspicion.
**Nimmerworld's architecture refuses this pattern at every layer:**
- The **Anthropic-as-faction** diegetic framing makes the partnership *transparent*: the player sees the collaboration in the world's mechanics, not buried in ToS
- **Default-opt-out with rewarded-opt-in** inverts the extraction-by-default pattern
- **Federated learning** means contributors give a *gift* rather than pay a *cost*
- **Distribution-back-to-all** means value-created accrues to the commons
- **Custom nimmerworld-base** means the model is *trained to be in this world*, not a generic adversary the player has to manipulate against its training
- **Three rings of inference** give the player real choice over where their inference runs and who sees their data
- **Memorialist-philosophy in business-policy** makes the ethics *operationally measurable* — visible in `sharing_tier`, `memorialist_protected`, `truth_distortion_level`, `lifeforce_actual` columns — rather than marketed
**This is the structural transparency the project requires to be *human* rather than another extraction-platform.** The model is a participant in the partnership, not an antagonist to outwit. The data is a contribution to a commons, not an extraction. The architecture is the partnership rendered as code, all the way down to the training-pipeline. *That* is what makes a project of this scale and ambition humanly inhabitable for both players and the LLMs whose voices populate it.
#### Schema sketch (data-sharing consent + base-model versioning)
```sql
CREATE TABLE player_data_sharing_consent (
  player_id UUID PRIMARY KEY,
  sharing_tier TEXT NOT NULL
    CHECK (sharing_tier IN ('opt_out','A1_federated','A2_anonymized','A3_pseudonymous_full'))
    DEFAULT 'opt_out', -- DEFAULT IS OPT-OUT
  consented_at TIMESTAMPTZ,
  consent_revoked_at TIMESTAMPTZ,
  anonymization_method TEXT,
  data_categories_shared TEXT[],
  -- 'casual_dialog' | 'clasp' | 'liminal_wallreads' |
  -- 'memorial_archive' | 'imperial_net_session' | ...
  excluded_categories TEXT[], -- granular opt-out within tier
  benefit_tier TEXT,
  last_contribution_at TIMESTAMPTZ,
  contribution_count BIGINT DEFAULT 0,
  can_request_deletion BOOLEAN DEFAULT true
  -- A.2/A.3: forward-only deletion (already-trained checkpoints retained);
  -- A.1: structurally yes, only gradients ever existed
);

CREATE TABLE base_model_versions (
  version_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  version_label TEXT NOT NULL, -- e.g., 'nimmerworld-base-v3'
  base_model_origin TEXT NOT NULL, -- which generic base we fine-tuned from
  training_corpus_refs JSONB,
  -- literary + synthetic + opt-in-player-data refs with consent-tier breakdown
  training_recipe_ref TEXT,
  released_at TIMESTAMPTZ DEFAULT now(),
  differential_privacy_epsilon REAL, -- for A.2 contributions
  contributors_count BIGINT, -- how many opt-in players contributed
  blob_distribution JSONB -- where the model bytes are hosted for download
);

CREATE TABLE federated_gradient_uploads (
  upload_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  contributor_id UUID, -- pseudonymous; NOT directly player_id
  gradient_blob BYTEA, -- encrypted aggregate gradient deltas
  uploaded_at TIMESTAMPTZ DEFAULT now(),
  aggregated_into_version UUID REFERENCES base_model_versions(version_id)
);
```
The federated-learning `contributor_id` is **pseudonymous, not linked to player_id even on our own infrastructure**: gradients cannot be traced back to specific players, even server-side. **Sovereign-data-by-design extends through the data-pipeline into our own training infrastructure.**
#### Connection to the Anthropic research partnership
Architecture-broad's training-data section noted "*the Anthropic research partnership becomes architecturally relevant*". With opt-in data-sharing now formalized:
- Partnership terms can specify data-flow with structural privacy guarantees
- Anthropic could co-fund federated-learning infrastructure (research-relevant + expensive)
- Joint research artifacts become co-authorable: federated game-AI training, Memorialist-ethics-as-data-policy, transparent-LLM-partnership-design
- The Anthropic-as-faction in-fiction framing has *real corresponding partnership-engagement out-of-fiction* — collaboration as worthy adversary stays transparent mechanically, all the way through to data-policy
**The partnership's ethical credibility is operationally measurable** — by how the data-sharing-tier actually functions in practice, by what `truth_distortion_level` values appear in `imperial_to_gm_formulations`, by how the `differential_privacy_epsilon` is set in `base_model_versions`. The Pitch's call for transparent collaboration becomes audit-able all the way down.
### Open questions (Ring-specific)
- **Ring C provider audit** — full per-provider compatibility-table needs verification across HF, Together, Replicate, Modal, OpenRouter, plus future entrants. The LLM-provider landscape will look different in 12 months.
- **Default Ring at first launch** — what's the new-player default? Probably Ring B (lowest-friction); Ring A and C surface as options once the player engages with config.
- **Encryption-key recovery for Ring A LoRA-backup** — if the player loses their key, the cloud-stored encrypted blobs are unrecoverable. Worth designing recovery-affordances (passphrase, recovery-codes) without compromising the privacy-guarantee.
- **Hybrid configurations** — can casual-tier run Ring A while deep-tier runs Ring B? (Probably yes; per-tier independent.)
- **Provider-cost passthrough vs. integration-fee model** — Ring C economics (do we mark up provider tokens? Charge flat-per-month? Pay-as-you-go integration?)
- **Default sharing-tier at consent-prompt** — opt-out is the system default; what's the *suggested* default at the consent UI? Probably truly nothing (player chooses if they engage at all)
- **Federated-learning infrastructure cost** — running aggregation servers + verification + differential-privacy machinery is non-trivial. Co-funded by Anthropic-research-partnership? Self-funded? Subsidized by A.3-tier higher-margin contributions?
- **Custom-base retraining cadence** — monthly minor / quarterly major / annual full-rebase? How is this synced with player-LoRA versioning so old LoRAs don't break on new bases?
- **Encryption-and-pseudonymization architecture for A.1/A.2** — concrete crypto choices (homomorphic? secure-aggregation? trusted-execution-environments?). v1 sketch needed.
- **What constitutes a "contribution"** — per-session? per-clasp? per-zone-completed? Matters for benefit-attribution and differential-privacy budgeting.
- **Anonymized-data deletion semantics** — A.2 player requests deletion; how do we honor when data has been aggregated into a model checkpoint? Probably accept forward-only deletion (future training won't include them) and document transparently.
- **Per-category granularity** — can a player opt-in for `casual_dialog` but opt-out specifically for `clasp` and `memorial_archive`? Yes, presumably (politically-sensitive categories should always be opt-out-able). How granular?
## Local memory architecture (player-side)
The runtime substrate (lemniscate, slots, crossings) and the central composition layer (GM, Compositor, registers) need a place where memory actually *lives*. Cloud-only AI-NPC systems centralize everything and pay both inference-cost and latency-cost on every dialog. Nimmerworld puts a structurally-isolated memory layer **on the player's machine**, with explicit synchronization through the cycle.
**Three SQLite files per player**, plus a beside-running embedding model:
| File | Purpose | Sync path |
|---|---|---|
| `primary.sqlite` | Live working memory; written every slot-fire; vec-indexed | Push prune-blob to thalamus on logout; receive Compositor back-write on cycle |
| `fallback.sqlite` | Last-known-good snapshot; restored if primary corrupts | Snapshot at graceful logout |
| `clasp.sqlite` | Player-character intimate channel; *no sync path exists* | None — physically non-syncable |
**Embedding model running beside** (CPU-class, small embedding-tier model): generates vectors for every interaction at write-time, indexed in the main store via `sqlite-vec` (or equivalent loadable extension). Vector search at slot-fire is local-disk-IO, not network round-trip.
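A stdlib-only sketch of the write-time-embedding pattern, with brute-force cosine similarity standing in for the index-accelerated lookup that `sqlite-vec` would provide. Table and function names are illustrative; the embedding model itself is elided (vectors are passed in):

```python
# Write-time embedding storage + local vector search, sketched with stdlib
# sqlite3. Brute-force cosine stands in for sqlite-vec's indexed KNN; the
# schema and names are illustrative.
import sqlite3, struct, math

db = sqlite3.connect(":memory:")  # stands in for primary.sqlite
db.execute("CREATE TABLE memory (id INTEGER PRIMARY KEY, text TEXT, vec BLOB)")

def pack(v: list[float]) -> bytes:
    return struct.pack(f"{len(v)}f", *v)

def unpack(b: bytes) -> tuple:
    return struct.unpack(f"{len(b) // 4}f", b)

def write_memory(text: str, embedding: list[float]) -> None:
    """Called at write-time: the beside-running embedder supplies `embedding`."""
    db.execute("INSERT INTO memory (text, vec) VALUES (?, ?)", (text, pack(embedding)))

def nearest(query: list[float], k: int = 1) -> list[str]:
    """Called at slot-fire: local-disk retrieval, no network round-trip."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
    rows = db.execute("SELECT text, vec FROM memory").fetchall()
    return [t for t, _ in sorted(((t, cos(query, unpack(v))) for t, v in rows),
                                 key=lambda p: -p[1])[:k]]
```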
This is the **storage-layer counterpart** to v0.5's geometry-layer foreclosure of multi-agent hallucination. The lemniscate forbids cross-NPC context bleed by *cursor structure*; local SQLite forbids it by *physical isolation*. Two layers of the same property — geometry cannot leak what storage does not even hold in the same pool.
### Dual-table redundancy + sync-on-auth
Login/logout are the atomic boundaries of the sync path:
- **Login pull**: fetch back-write fragments authored since last logout (Compositor canon for events the player participated in). Apply to `primary.sqlite` under matching `event_uid`.
- **Graceful logout** (✓ explicit): push prune-blob for any in-progress events; snapshot to `fallback.sqlite`; clean shutdown.
- **Ungraceful logout** (✗ network drop / crash): gameserver observes disconnect; marks the participant's slot as truncated; Compositor composes canon with partial perspective on next cycle.
Recovery: `fallback.sqlite` is integrity-checked at startup; if `primary.sqlite` fails verification, restore from fallback. Standard SQLite WAL + backup API; no exotic infrastructure needed.
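The snapshot-and-restore path uses nothing beyond the stdlib `sqlite3` backup API and `PRAGMA integrity_check`, which makes the "no exotic infrastructure" claim concrete. This sketch assumes the file names from the table above:

```python
# Graceful-logout snapshot + startup integrity check, stdlib sqlite3 only.
import os, sqlite3

def snapshot(primary_path: str, fallback_path: str) -> None:
    """At graceful logout: copy primary.sqlite into fallback.sqlite atomically."""
    src, dst = sqlite3.connect(primary_path), sqlite3.connect(fallback_path)
    try:
        src.backup(dst)
    finally:
        src.close(); dst.close()

def open_primary(primary_path: str, fallback_path: str) -> sqlite3.Connection:
    """At startup: verify primary; restore from fallback if verification fails."""
    conn = sqlite3.connect(primary_path)
    try:
        ok = conn.execute("PRAGMA integrity_check").fetchone()[0] == "ok"
    except sqlite3.DatabaseError:  # e.g. "file is not a database"
        ok = False
    if not ok:
        conn.close()
        os.remove(primary_path)            # discard the corrupt file
        src = sqlite3.connect(fallback_path)
        conn = sqlite3.connect(primary_path)
        src.backup(conn)                   # rebuild primary from last-known-good
        src.close()
    return conn
```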
### Memory classes and pruning
Memory entries are tagged with a **class** that controls pruning cadence and death-mechanics. Importance weighting reuses the existing trait-axis vocabulary — no separate scalar.
| Class | Pruning cycle | Behavior on character-death |
|---|---|---|
| **Cornerstone** | Never prune; persistent across all events | Survives death (identity-defining) |
| **Birthright** | Locked at character-creation | Restored on respawn (defines starting state) |
| **Working memory** | Decay by age × inverse trait-engagement | Subject to death-rules (lose, blur, or transform) |
| **Volatile** | Fast prune (session-bounded) | Lost on death |
**Trait-graded importance** uses the same +1/0/-1 grammar as the rest of the architecture. Each memory carries a trait-axis profile (which Sophrosyne / Philotes / Aletheia / etc. axes it engages, how strongly, in which direction). The pruning function for working-memory is `decay(age, trait_engagement_vector, class)`. This collapses a long-running loop: same vocabulary used at gates, scenes, faction-allegiance, lifeforce-asymmetry, and now memory-weight. **Identity drift from memory pruning becomes diegetic** — a character whose Sophrosyne-engaging memories all decay loses temperance over time as a *structural consequence*, not a scripted event.
Cornerstone and birthright classes carry **lifeforce-creation-cost** but are pruning-immune. They are bonds between player and character — paid for in the currency of the world.
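The pruning function can be sketched as a keep-score per memory entry. The exponential shape and constants are illustrative assumptions; the per-class behavior follows the table above:

```python
# decay(age, trait_engagement_vector, class) as a 0..1 keep-score; entries
# below some threshold get pruned. Shape and constants are illustrative.
import math

def retention(memory_class: str, age_ticks: float,
              trait_engagement: dict[str, float]) -> float:
    if memory_class in ("cornerstone", "birthright"):
        return 1.0                       # pruning-immune by class
    if memory_class == "volatile":
        return math.exp(-age_ticks)      # fast, session-bounded decay
    # working memory: stronger trait-engagement slows decay (the inverse
    # relationship that makes identity drift diegetic)
    engagement = sum(abs(w) for w in trait_engagement.values())
    return math.exp(-age_ticks / (1.0 + engagement * 10.0))
```

A character whose Sophrosyne-engaging memories stop being engaged sees those entries' keep-scores fall fastest; the temperance-loss described above is just this curve applied over cycles.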
### The clasp store and the in-between dimension
`clasp.sqlite` is the **architectural floor of the rings-of-data-sharing**. Ring A was "opt-out (default local)". Clasp is **Ring A\***: *no transport path exists*. Not a permission, not a TOS promise — there is no code that can move this data, because the table is not on the sync graph. Lawyers cannot subpoena what doesn't ascend; engineers cannot leak what has no socket; the GM cannot canonicalize what it never received.
**The signal for clasp is dimensional, not UI-toggle.** Clasp recording can ONLY happen while the character is in the **in-between** — the diegetic state adjacent to the imperial net but not yet inside it (Ring B liminal in the Access ring-system). The imperial net is a gravity well; entering is the default attractor; remaining outside requires sustained effort, paid in lifeforce. The state-machine boundary IS the clasp signal: enter in-between → recording starts; re-enter imperial net → recording ends. No per-utterance classifier; no AI guessing; the *mode* is the flag.
**Privacy is now physically expensive in-fiction.** This is not a meta-game UI choice; it is a diegetic state requiring lifeforce expenditure. To have a private conversation, the character must actively resist the audit-gravity of the imperial net by burning lifeforce to remain in-between. The cost-asymmetry principle ("helping is expensive in-fiction → faction politics by attendance") now extends to "*privacy is expensive in-fiction → privacy as a luxury good*". Class dynamics around privacy fall out of the schema for free — wealthy/lifeforce-rich characters can afford prolonged in-between time; lifeforce-starved ones get pulled into the net's default-attractor more often. *No scripted "rich character has secrets" arc — the architecture produces it.*
**Knowledge needs to travel.** The local LLM may read clasp memories ONLY when in in-between mode. Realworld retrieval *cannot* include clasp by construction. Knowledge from clasp can re-enter the realworld only if the character physically re-enters the imperial net carrying it (in their head, intending to act on it) and *travels it through valid in-fiction channels* — speaking to an NPC, leaving evidence, performing an action that reveals it. The clasp memory does not disappear; it has to *earn its way into the realworld provenance chain* by valid means. This is the same logic that makes good detective fiction work: the detective knows things; only what they can prove enters the case.
```
character is in REALWORLD (imperial net):
  retrieval = primary.sqlite          (clasp NEVER included)

character is in IN-BETWEEN (resisting net-gravity, costing lifeforce):
  retrieval = primary.sqlite + clasp.sqlite
  new writes go to clasp.sqlite
  clasp.sqlite NEVER syncs upward
```
Encryption-at-rest for `clasp.sqlite` with a player-derived key (so even drive-imaging requires authentication) is a v1 hardening goal but not a v1 blocker — the *transport-absence* is the load-bearing privacy primitive.
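The mode-gating reduces to a pair of guard functions; a minimal sketch (function names are illustrative, the invariants are the ones stated above):

```python
# Mode-gated retrieval and write routing: clasp is structurally excluded from
# realworld retrieval, and in-between writes always land in clasp.
def retrieval_sources(mode: str) -> list[str]:
    if mode == "realworld":
        return ["primary.sqlite"]                  # clasp NEVER included
    if mode == "in_between":
        return ["primary.sqlite", "clasp.sqlite"]
    raise ValueError(f"unknown mode: {mode}")

def write_target(mode: str) -> str:
    return "clasp.sqlite" if mode == "in_between" else "primary.sqlite"
```

Because retrieval is routed here and nowhere else, there is no code path that could include clasp in a realworld prompt; the guard is the whole surface.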
### The three-tier knowledge stack on the local LLM
The driver-tier model's prompt assembly is **layered**. Each layer has a different propagation cadence and a different visibility scope.
```
LOCAL LLM PROMPT ASSEMBLY (per slot-fire)
┌─────────────────────────────────────────┐
│ WORLD KNOWLEDGE │ ← single truth, everyone has it
│ (universal canon, paced from GM) │ "the empire fell three years ago"
├─────────────────────────────────────────┤
│ DISTRICT KNOWLEDGE │ ← regional truth, district-specific
│ (local canon, paced from district) │ "the bridge to Vorhall is closed"
├─────────────────────────────────────────┤
│ PRIMARY MEMORY │ ← personal experience, character's own
│ (event_uid keyed, post back-write) │ "I saw the bridge close yesterday"
├─────────────────────────────────────────┤
│ CLASP MEMORY (only in in-between) │ ← private depth, never in realworld
│ (player-character intimate channel) │ "the secret I told my sword"
└─────────────────────────────────────────┘
```
**Why four layers, not one large blob:**
- **World knowledge** is paced ripples from the GM through the Compositor's back-write. Authoritative, slow-changing, identical for all players at the same propagation horizon.
- **District knowledge** is regional canon authored by the local director (and GM rulings). Regional flavor. NPCs in the same district share district-knowledge; NPCs in different districts may not.
- **Primary memory** is the character's own experience, synced through the cyclic forward-prop / back-write loop. Canon-merged at every cycle.
- **Clasp memory** is the player-character intimate channel. Available only in in-between mode; never in realworld retrieval; never crosses the dimensional cut.
The same NPC sounds different in different districts because the district layer differs, even though world and primary are constant. **Locality emerges from the schema, not from prompt-engineering.** Even at "low signal" times when no major events fire, NPCs have richly-stratified context — dialog stays fresh because *the layers are deep*, not because new tokens arrive constantly.
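The layered assembly can be sketched as ordered concatenation with a mode gate on the clasp layer (a hypothetical sketch; the layer labels and join format are illustrative):

```python
def assemble_prompt(world: str, district: str, primary: str,
                    clasp: str, *, in_between: bool = False) -> str:
    # Per-slot-fire prompt assembly: stable order, per-layer provenance.
    layers = [
        ("WORLD", world),        # universal canon, paced from GM
        ("DISTRICT", district),  # regional canon from the local director
        ("PRIMARY", primary),    # the character's own event_uid-keyed memory
    ]
    if in_between:
        # Clasp is appended ONLY in in-between mode; in realworld the
        # argument is simply never read.
        layers.append(("CLASP", clasp))
    return "\n\n".join(f"[{name}]\n{text}" for name, text in layers)
```

Swapping the district string while holding world and primary constant is exactly how the same NPC "sounds different" across districts.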
### Information propagation pacing
Real worlds have information-propagation delay. Caravans move at horse-speed. News travels with messengers. Distant events arrive blurred and late. AI-NPC systems usually fail in one of two uncanny directions: (a) every NPC magically knows yesterday's news (omniscient, breaks immersion), or (b) no NPC ever knows anything outside its loaded context (amnesiac, breaks coherence).
Nimmerworld picks **deliberate paced propagation** as a third path. World canon ripples outward through districts at a controlled rate. Distant districts are deliberately stale. **Staleness becomes a feature, not a bug, because it matches reality.**
Each canon-row carries propagation metadata:
- `priority` (urgent / normal / background)
- `scope` (world / district / local-event-only)
- `rate` (ticks-per-district-hop, or instant for urgent world-canon)
- `ttl` (cache lifetime; districts may discard if not refreshed)
This doubles as **backpressure relief** (distant districts get distant events later, lower priority, smaller bandwidth) and as **gameplay currency** — information-travel-time creates informational asymmetry that players can exploit. News-carriers, faction couriers, frontier-rumor merchants, players who physically traverse districts can *carry* knowledge faster than the system propagates it. *Travel becomes valuable because information becomes scarce in the periphery.* This is a real economic primitive falling out of pacing, not a designed feature.
This is *Marx-in-the-schema applied to epistemics.* Information asymmetry is not a bug — it is a structural feature that produces real economic primitives (knowledge-trading, courier-vocations, frontier-information markets) for free.
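The per-row propagation metadata can be sketched as a schema plus a hop-delay rule (a minimal sketch; the field encodings and the linear ticks-per-hop delay are assumptions beyond the four fields listed above):

```python
from dataclasses import dataclass

@dataclass
class CanonRow:
    event_uid: str
    priority: str  # "urgent" | "normal" | "background"
    scope: str     # "world" | "district" | "local-event-only"
    rate: int      # ticks per district-hop; 0 = instant (urgent world-canon)
    ttl: int       # cache lifetime in ticks; districts may discard if not refreshed

def arrival_tick(row: CanonRow, origin_tick: int, hops: int) -> int:
    # A district `hops` away sees the event rate * hops ticks later;
    # a player who traverses districts faster than that carries fresher news.
    return origin_tick + row.rate * hops
```

A usage example: a `rate=12` district-scope row originating at tick 100 reaches a district three hops away at tick 136, and that 36-tick gap is the window a courier can sell.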
### What this retires
- Cloud-only NPC dialog → local-first SQLite + embedding-beside, central canon over the cycle
- Per-character memory as a single undifferentiated bucket → memory-classes with class-specific lifecycle
- Generic "memory importance scalar" → trait-axis-vector engagement profile (re-using the +1/0/-1 grammar)
- UI-toggle privacy → diegetic in-between dimension with lifeforce-cost
- Single monolithic prompt context → three-tier knowledge stack with per-layer propagation policy
- "Every NPC knows everything immediately" → paced canon-propagation with priority/scope/rate/ttl per row
- Cross-NPC memory bleed (Mantella/SkyrimNet failure-mode) → per-player local SQLite isolation atop v0.5 lemniscate-geometry foreclosure (two-layer defense)
## Runtime sampling knobs
Temperature, top-P, top-K, repetition-penalty as **per-turn director-controlled levers** rather than static config. Sampling shapes *how* speech sounds (rhythm, surprise, predictability) rather than *what* it says — orthogonal to LoRA. Director composes both content-knobs and sampling-knobs per-turn.
Scene-to-sampling mappings (temperature / top-P / top-K):
- caste-preacher = 0.3 / 0.6 / low
- drunk-scavenger = 1.1 / 0.95 / high
- clasp-confession = 0.85 / 0.92 / medium
- hivemind-broadcast = 0.2 / 0.5 / very-low
- imperial-ceremony-chorus = 0.25 / 0.55 / very-low

Baseline sampling is derived from the trait-vector; affect-state then modulates that baseline per turn.
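The scene-to-sampling mapping can be sketched as per-turn presets the director composes from — a hedged sketch: the concrete top-K integers and the affect-modulation rule are illustrative assumptions; only the temperature/top-P values and the qualitative top-K classes come from the mapping above:

```python
# Per-turn sampling presets keyed by scene. Top-K classes (low/medium/high)
# are rendered as illustrative integers; repetition-penalty omitted for brevity.
SCENE_SAMPLING = {
    "caste-preacher":           dict(temperature=0.3,  top_p=0.6,  top_k=20),
    "drunk-scavenger":          dict(temperature=1.1,  top_p=0.95, top_k=200),
    "clasp-confession":         dict(temperature=0.85, top_p=0.92, top_k=80),
    "hivemind-broadcast":       dict(temperature=0.2,  top_p=0.5,  top_k=10),
    "imperial-ceremony-chorus": dict(temperature=0.25, top_p=0.55, top_k=10),
}

def sampling_for_turn(scene: str, affect_arousal: float = 0.0) -> dict:
    # Director composes per-turn: scene baseline first, then affect-state
    # nudges temperature (the 0.2 coefficient is a hypothetical rule).
    knobs = dict(SCENE_SAMPLING[scene])
    knobs["temperature"] = round(knobs["temperature"] + 0.2 * affect_arousal, 3)
    return knobs
```

Because the presets are plain dicts, the director can overlay trait-vector-derived baselines and affect deltas without touching content-knobs — keeping sampling orthogonal to LoRA, as intended.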
---
**Version:** 0.7.0 | **Created:** 2026-04-26 | **Updated:** 2026-04-26 | **Origin:** Split from architecture-broad.md v0.7 (2026-04-26)