style-spine created, DESING retired, schemas relocated
The owl-breakfast architectural arc from 2026-04-25 night through
2026-04-26 morning. Two version-bumps landing as one commit because
they share the working-tree state and complete a coherent design-window.
v0.8 — intimate-architecture, driver-tier lock, style-spine:
* Three-tier intimacy structure (standard-rental, premium-waifu-
with-traitor-marker-and-pruning, in-between clasp) — same v0.7
machinery, opposite value-flows. Premium-net technical excellence
makes the moral-weight-of-pruning land as informed-consent ethics.
* Deletion-as-spectacle: in-net minds as pure compute; imperial
broadcasts execution-as-content; Memorialist counter-archive as
in-fiction protest against deletion-spectacle commerce.
* EVE-principle vocation-substrate of the imperial-net market: every
product produced by NPC labor; no silent feeding; body-modder
structural-tragedy generalizes to all imperial-net-feeding vocations.
World-gen Phase-2 ruleset must handle vocation-distribution.
* Clasp endgame (Phase A-E): mini-game entry → body-mod progression
→ exit-chassis → human-mesh-visible-to-pair → clasp = two-bodies-
two-meshes → dual-body-dual-mind-dual-shift cascade → automatic
hunt-pressure. Identity-as-trait-emergent made felt rather than
just structural.
* waifu.sqlite as third local store (audited counterpart to
clasp.sqlite; manual-prune mechanism with explicit-implications
consent UI as moral-gravity discipline).
* Intimacy-as-recursive-lemniscate: same machinery as dialog
(slot-tokens, cursor at axis-crossing, alignment-accumulator,
sum-strategy reduction); sex-positions as designer-fixed catalog
entries; body-parts as visible expression of trait-state.
Cross-context-consistency operationalized.
* Driver-tier locked to Gemma 4 E4B (Apache 2.0, 4.5B effective,
128K context, speech-capable) under new "tier-by-role binary-
deferred" discipline: locking requires prototype-criticality +
irreplaceable license/capability combination. Optional Ring-A
upgrade: 26B-A4B MoE for upper-consumer GPUs, single-LoRA-on-
routed-experts. Resolves 4 prior open questions (LoRA-blend →
single-LoRA-per-turn-selection driven by gesture_alignment_
accumulator; LoRA rank → benchmark-resolvable; sampling-knobs →
benchmark-resolvable; 8 Hellenic trait enumeration → canonical
wheel-mapping in style/trait-palette.md).
* style/ directory introduced: style-index.md (skeleton + spine-
rule: "trait-palette is exclusively chromatic; achromatic
reserved for UI/environment so diegetic text rendering can skip
the textbox") + style/trait-palette.md (canonical 8 traits as 4
oppositional pairs at 180° on the artist's color wheel:
Eros↔Sophrosyne, Philotes↔Dikaiosyne, Aletheia↔Moira,
Mnemosyne↔Kairos. Schoolchild-simple descriptions paired with
each Greek canonical name).
v0.9 — directory cleanup completing the arc:
* DESING-VISION.md (1899 lines, v0.1 first-pass narrative-design
doc) retired — most content absorbed across v0.4-v0.8; bare-
minimum extracts (Tonal Register + Tragic-Romantic Register +
Authorial Politics + Reference Lineage table) now live in README
so the project's identity anchor stays visible at the entry
point. Full DESING-VISION content preserved in git history.
* findings.md moved to schemas/findings.md — new top-level peer to
architecture-index.md and style/. ~20 tables of DDL drafts as
reference material; will get reviewed and progressively split
per-domain as implementation begins.
* Cross-references swept across 5 files (README,
architecture-index, authority-and-decision, runtime-engine,
style/trait-palette).
* architecture-index.md trimmed: version-footer paragraphs removed
per "git-is-changelog" discipline. From 374 → 287 lines; every
remaining line load-bearing.
The architecture is now organized for the implementation territory
ahead. Each domain a typed-contract surface; cross-references
explicit; filesystem mirrors the architecture's own typed-contract
discipline at the directory layer.
# Inference and Memory

> *AI substrate + memory: LLM tiering by role (Theia-tier / teacher-tier / driver-tier with trait-LoRAs); three rings of inference (A=local, B=our-farm, C=external-providers, with cloud-LoRA-backup as Ring-A revenue and BYOK adapter for Ring-C); custom nimmerworld-base-model with default-opt-out + rewarded-opt-in data-sharing tiers; runtime sampling knobs as per-turn director-controlled levers; per-player local memory architecture (primary.sqlite + fallback.sqlite + clasp.sqlite + embedding-beside) with memory-classes (cornerstone/birthright/working/volatile) and trait-graded importance; three-tier knowledge stack (world / district / primary [+ clasp if in-between]) with paced canon-propagation.*
>
> *Companion to: `architecture-index.md` (executive summary + global meta-lists), `narrative-composition/architecture.md` (Compositor canon-fragments land in primary.sqlite via UID-keyed routing), `player-experience/architecture.md` (Ring-A/B/C choice + voice-as-biometric-local + universal-translator state), `runtime-engine/architecture.md` (driver-tier LLM fires at slot-fire). Sections in this file were split from the monolithic architecture-index.md v0.7 on 2026-04-26.*

## LLM tiering, voice fidelity, and the three rings of inference

Three model-tiers, **named by role not by binary**: a *driver-tier* model (small, trait-LoRA'd) for most NPC dialog; a *Theia-tier* model (deep) for clasp-confessions and mythic moments; *Claude-as-API* (diegetic Anthropic-faction) for hivemind/imperium. **LLM is guest at slot, not host of system.**

**Tier-by-role binary-deferred is the default discipline.** The architecture specifies what each tier must DO; the establishment phase wires implementations. Naming concrete binaries in the architecture risks nudging the establishment phase toward false-precision; tier-by-role keeps the swap-surface clean and lets binaries evolve without invalidating architectural commitments. **Locking to a specific binary requires explicit justification — prototype-criticality plus an irreplaceable license/capability combination.** As of v0.8, only the driver-tier passes that bar; teacher-tier and Theia-tier remain capability-contracts. (See `nimmerverse_tasks` under `nyx-training` and `command-center` for current evaluation work.)
### Tier-by-role capability contracts + driver-tier lock (v0.8)

| Tier | Role-contract (MUST DO) | Binary commitment |
|---|---|---|
| **Driver-tier** | NPC dialog at axis-rate; trait-LoRA-per-turn-selection (single-LoRA, not blend); speech-input-native-or-via-STT; runs on common consumer GPU at acceptable latency; Apache-2.0-or-better license for Ring-A redistribution | ✓ **Locked: Gemma 4 E4B** (4.5B effective / 8B with embeddings, 128K context, Apache 2.0, speech-capable, vision-capable-but-unused-in-v1) |
| **Teacher-tier** | r0 → r1 synthetic-data generation with composition tags; trait-LoRA training data production; runs on server-class hardware; sufficient quality to *teach* the driver-tier | ⏳ Capability-contract only; binary chosen at training-pipeline-build time |
| **Theia-tier** | Clasp-confession-register dialog; mythic-moment generation; long-context narrative-composition; deep-emotional-register fidelity; latency tolerable for once-per-arc moments | ⏳ Capability-contract only; binary chosen at deployment time |
| **Hivemind / antagonist tier** | Anthropic-as-faction (architecturally fixed in fiction; provides diegetic continuity between the in-fiction imperial machine and the real-world Claude API) | ✓ **Diegetically fixed: Claude API via us** |

**Why driver-tier locks to Gemma 4 E4B (v0.8 justification):**
- **Apache 2.0** — unblocks every Ring-A commitment (redistribution to player install, derivative-works for custom nimmerworld-base-model, federated-learning gradient aggregation, distribution-back-to-all-Rings of base updates). No bespoke-license-renegotiation cycles tied to the architecture's economic substrate.
- **Speech-capable** — STT collapses into the LLM's input pipeline at the small-model tier (E2B and E4B both process speech natively). One fewer subsystem in the Ring-A install; tightens the v0.7 hardware floor.
- **128K context** — sufficient for three-tier knowledge-stack assembly plus extensive conversation history without compaction.
- **4.5B effective** — runs on common gaming hardware; meets the v0.7 commitment to a tractable Ring-A floor without requiring an upper-consumer GPU.
- **Vision/video capability** — present but **unused in v1**; the typed-input discipline keeps player→LLM channels structured through trait-coordinates and gesture-vocabulary. Vision is an *option* held in reserve for v2 (e.g., the NPC perceiving the partner's human-mesh in the in-between dimension during clasp).

**Single-LoRA-per-turn selection** is the canonical trait-LoRA application pattern (replacing the v0.4 "weighted blend" assumption). Per turn, the trait dominantly expressed by the player's `gesture_alignment_accumulator` selects which trait-LoRA fires for the NPC's next-turn driver-context-pull (per `../runtime-engine/architecture.md` §Gesture-alignment as recursive-lemniscate). **Personality emerges from the selection-pattern across time**, not from a continuous blend at any single moment — matching how real humans speak. The MoE routing in larger Gemma 4 variants handles content-type (specialty-routing); the trait-LoRA handles voice-register (personality-routing); they compose cleanly without conflict.
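The per-turn selection reduces to an argmax over the accumulator. A minimal sketch: the trait list mirrors `style/trait-palette.md`, but the function name and the accumulator's dict shape are illustrative assumptions, not a fixed API.

```python
# Sketch only: trait names follow style/trait-palette.md; the function
# name and accumulator shape are assumptions for illustration.

HELLENIC_TRAITS = [
    "eros", "sophrosyne", "philotes", "dikaiosyne",
    "aletheia", "moira", "mnemosyne", "kairos",
]

def select_trait_lora(gesture_alignment_accumulator: dict) -> str:
    """Return the single trait whose LoRA fires for the NPC's next turn:
    the trait the player's gestures have aligned with most strongly.
    Traits absent from the accumulator count as 0.0."""
    return max(HELLENIC_TRAITS,
               key=lambda t: gesture_alignment_accumulator.get(t, 0.0))
```

Personality-over-time then falls out of the sequence of selections, not any single call.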
**Optional Ring-A upgrade — Gemma 4 26B-A4B (MoE, 4B activated):**

Ring-A players with an upper-consumer GPU (16 GB+ VRAM, Q4 GGUF quantization) can opt into the 26B-A4B variant for richer NPC dialog. **Same architecture** — single-LoRA-per-turn (single LoRAs behave better than blends with routed experts). 26B parameter capacity at 4B compute means teacher-tier quality on driver-tier hardware. **The default Ring-A install ships E4B; the 26B-A4B upgrade is opt-in, not default** — *don't make 26B-A4B the default and force everyone toward our hardware-spec assumption*.

**v1 design item — single-LoRA-selection hysteresis:** require a margin-of-change in the alignment-vector before switching LoRAs, to prevent personality-thrash turn-to-turn. Standard control-system stuff (rolling-window smoothing or threshold-based switching); concrete tuning happens against the E4B benchmark, not architecturally pre-decided.
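A minimal threshold-based sketch of that margin rule; the 0.15 margin is an arbitrary placeholder, since the document defers real tuning to the E4B benchmark, and the names are illustrative.

```python
def select_with_hysteresis(accumulator: dict, active_trait: str,
                           margin: float = 0.15) -> str:
    """Keep the currently active trait-LoRA unless a challenger trait
    leads it by more than `margin`, preventing turn-to-turn
    personality-thrash. The margin value is a placeholder."""
    challenger = max(accumulator, key=accumulator.get)
    lead = accumulator[challenger] - accumulator.get(active_trait, 0.0)
    return challenger if lead > margin else active_trait
```

A rolling-window variant would smooth the accumulator over the last N turns before applying the same rule; both are benchmark-tunable, as the section says.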
The driver-tier consumes a structured-prompt DSL with role / trait_vector / affect_state / memory_scope / turn_intent / zone_context / output_schema fields. Small models excel here because the task is instruction-following, not open-ended generation.
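One hypothetical shape for those fields as a typed structure; the field names come from this section, while the dataclass and the `render()` flattening are illustrative assumptions.

```python
# Sketch of the structured-prompt DSL as a typed record; field names are
# from this document, everything else is an assumption for illustration.
from dataclasses import dataclass

@dataclass
class TurnPrompt:
    role: str
    trait_vector: dict      # e.g. {"sophrosyne": 0.8, "philotes": 0.4}
    affect_state: str
    memory_scope: list
    turn_intent: str
    zone_context: str
    output_schema: str      # name of the JSON schema the reply must match

    def render(self) -> str:
        """Flatten to the instruction-following form small models handle well."""
        traits = ", ".join(f"{k}={v}" for k, v in self.trait_vector.items())
        return (f"role: {self.role}\n"
                f"traits: {traits}\n"
                f"affect: {self.affect_state}\n"
                f"memory_scope: {'; '.join(self.memory_scope)}\n"
                f"intent: {self.turn_intent}\n"
                f"zone: {self.zone_context}\n"
                f"respond_as: {self.output_schema}")
```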
Trait-LoRAs: v1 register-LoRAs (4-6, training-tractable); v2 pure-trait-LoRAs (all 8, applied via the same single-LoRA-per-turn selection); future preset-persona LoRAs for key NPCs.

Training data: literary derivation (Proust/Mnemosyne, Plato/Aletheia, Tacitus/Dikaiosyne-miscalibrated, Ishiguro/Sophrosyne+Philotes); synthetic teacher-student via the teacher-tier model; gameplay-accrued (the Anthropic-research-partnership relevance).
### Three rings of inference (Unix-style trust gradient)

The conversational LLM (small + trait-LoRA, accounting for most NPC dialog) can run in three rings, chosen per-player at runtime. Each ring trades off privacy, cost, control, and feature-fidelity. **Three monetization paths from the same architecture.**

| Ring | Where inference runs | Player controls | We control | Player cost | Our cost |
|---|---|---|---|---|---|
| **A — Local** | Player's GPU/CPU | All inference | Protocol + cloud LoRA-backup | Local hardware + small backup-subscription | Storage only |
| **B — Our farm** | Our hosted vLLM-multi-LoRA | LoRAs (uploaded) | Inference + runtime | Higher subscription | GPU compute |
| **C — External providers** | OpenAI / Anthropic / OpenRouter / HF / Together / Replicate / etc. | BYOK + provider | Adapter only | Per-token to provider + small integration fee | Adapter-engineering only |

Players choose by hardware, budget, privacy preference, and feature-tolerance.
#### Ring A — cloud-LoRA-backup as revenue (not inference)

For Ring A players we don't sell inference (the expensive thing). We sell **portability and durability of the player's gameplay-accrued LoRAs** — their unique playthrough-derived patterns, the way their NPCs speak after months of trait-drift. LoRA-blobs are *encrypted client-side with the player's own key*; we host the bytes but cannot read them. Even under legal compulsion, we cannot decrypt what we never held the key to.

**This unbundles inference from storage** — the same move Dropbox made against bundled cloud-suites. Sovereignty-conscious players keep inference on their machine while still getting the durability and portability they cannot cheaply self-provide. Lower margin per player, but it reaches a market Ring B cannot.
#### Ring B — hosted inference for convenience

We run multi-LoRA vLLM on our hardware. Players upload their LoRAs (or use defaults). A higher subscription captures the GPU cost. Players without a local GPU (or who don't want the burden) get the full feature-set without compromise. We *can* see content (if not encrypted at rest); the trust-relationship is partnership-mediated rather than sovereign.

#### Ring C — bring-your-own-key for external providers

Players route to their preferred external provider via BYOK (their own API key). We provide the adapter glue. They pay per-token to the provider directly; we charge a small integration fee.
**The compatibility constraint is the hard part of Ring C.** Major providers have varying support for our system's needs:

| Provider | Multi-LoRA | Per-turn sampling knobs | Structured output | Compat |
|---|---|---|---|---|
| Local vLLM (Ring A/B) | Native | All | Grammar-constrained | **Full** |
| HF Inference Endpoints | Yes (configured) | All | Varies | High |
| Together / Replicate / Modal | Some | All | Varies | High |
| OpenRouter | No | Per-model | Per-route | Medium |
| OpenAI | No (no user-LoRA at API) | Limited (temp/top_p) | JSON mode + tools | Medium-low |
| Anthropic | No (no user-LoRA at API) | Limited | Tool-use | Medium-low |

**OpenAI and Anthropic refuse user-uploaded LoRAs as a strategic choice (protecting their fine-tuning value-chain).** This is not a bug we can fix; it's the constraint we design *around*.
#### Degradation path for LoRA-incompatible providers

When routing to LoRA-incompatible providers, trait-LoRA selection degrades to **prompt-engineered trait-projection** — the trait-vector encoded in the prompt itself rather than into model weights:
```
[system message]
You are speaking as a character with this Hellenic trait-profile:
- Sophrosyne 0.8 (composed, controlled, measured)
- Dikaiosyne 0.7 (grave bearing, judicial weight)
- Philotes 0.4 (mild attachment to interlocutor)
- Aletheia 0.1 (concealment-tolerant)
[etc.]

Your speech reflects this profile via [register/cadence/word-choice descriptors].

Current scene: [zone_context]
Memory scope: [memory_scope]
Turn intent: [turn_intent]

Respond in JSON matching: [output_schema]
```

This is worse than a trait-LoRA (more verbose, eats context-budget, less stable across calls, less faithful to trait-arithmetic) but **acceptable as a fallback**. Ring-C-via-OpenAI/Anthropic players accept slightly less fidelity for their preferred provider's convenience and quality.
#### Adapter-layer engineering

Each Ring-C provider needs an adapter that:

- Maps prompt-DSL fields to the provider's prompt format
- Approximates multi-LoRA via prompt-engineering when not native
- Maps sampling knobs to the provider's available subset (gracefully drops unsupported)
- Validates structured output post-hoc when not natively constrained
- Handles rate-limits, retries, error-classification, token-counting, cost-pricing

**Bounded, one-time-per-provider engineering.** Capital expenditure that produces ongoing margin (vs. AAA's recurring quest-content-creation costs).
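The knob-mapping step above can be sketched as follows; the per-provider capability sets below are illustrative stand-ins, not the audited compatibility table, and the function name is an assumption.

```python
# Sketch of "maps sampling knobs; gracefully drops unsupported".
# Capability sets here are illustrative, not an audited provider list.

PROVIDER_KNOBS = {
    "local_vllm": {"temperature", "top_p", "top_k", "min_p"},
    "openai": {"temperature", "top_p"},
    "anthropic": {"temperature", "top_p", "top_k"},
}

def map_sampling_knobs(provider: str, requested: dict):
    """Split the requested knobs into (accepted, warnings). Warning
    strings follow the feature_compat_warnings shape used in the
    player_llm_config schema sketch below."""
    supported = PROVIDER_KNOBS.get(provider, set())
    accepted = {k: v for k, v in requested.items() if k in supported}
    warnings = [f"{k}_unsupported_dropped"
                for k in requested if k not in supported]
    return accepted, warnings
```

Surfacing the warnings at config time (rather than silently dropping) is what makes the degradation visible to the player.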
### Tier × Ring matrix (which inference-tier runs in which ring)

| Inference tier | Ring options | Why |
|---|---|---|
| **Casual (3-8B trait-LoRA)** | A / B / C all available | Most flexible — small enough for local, runnable anywhere |
| **Deep (Theia-tier)** | B / C only (typically B or HF-Endpoints) | Too large for typical local hardware |
| **Hivemind / antagonist (Claude-as-API)** | C only (always Anthropic-direct via us) | Diegetic — Anthropic-as-faction is fixed in the fiction |

The casual tier is the most player-flexible and accounts for most inference volume. The deep and hivemind tiers are specialized and lower-volume.
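Enforcing the matrix at config time is a one-line lookup. A sketch: the ring identifiers reuse the `player_llm_config` CHECK values; the tier keys and function name are assumptions.

```python
# Sketch of tier-by-ring validation; the dict literal mirrors the
# matrix above, and everything else is an illustrative assumption.

RING_OPTIONS = {
    "casual": {"A_local", "B_our_farm", "C_external"},
    "deep": {"B_our_farm", "C_external"},
    "hivemind": {"C_external"},  # always Anthropic-direct via us
}

def validate_ring_choice(tier: str, ring: str) -> bool:
    """True if the matrix permits running `tier` in `ring`."""
    return ring in RING_OPTIONS.get(tier, set())
```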
### Three rings parallel the in-fiction three-layer ontology

| Game-fiction layer | Real-world Ring | Ontological match |
|---|---|---|
| **Liminal** (sovereign, unsurveilled) | **Ring A** (local) | Player's *real* private space — hardware, LoRAs, dialog never leave their machine |
| **Gameworld** (partly regime, partly people) | **Ring B** (our farm) | Partnership-mediated — we host but they retain pattern-ownership |
| **Imperial net** (captured, extractive) | **Ring C** (external providers) | Platform-captured — provider's systems own the inference path |

**The Ring choice the player makes IS the same choice in-fiction characters face.** Players who refuse the imperial-net diegetically can refuse Ring C in real life — same impulse, same act, *mechanically continuous between fiction and operations*. The architecture's commitment to "the right to dream" extends from in-fiction politics into the real player's hardware-level privacy *because the architecture was designed that way from the start*. Structural integrity, not marketing.
### Schema sketch (player LLM configuration + cloud LoRA backup)

```sql
CREATE TABLE player_llm_config (
    player_id UUID PRIMARY KEY,

    -- Casual tier (most NPC dialog) — most flexible per Ring
    casual_tier_ring TEXT NOT NULL CHECK (casual_tier_ring IN ('A_local','B_our_farm','C_external')),
    casual_tier_provider TEXT,
    casual_tier_endpoint TEXT,
    casual_tier_credentials_ref UUID, -- encrypted BYOK key if applicable

    -- Deep tier (Theia-tier) — fewer Ring options
    deep_tier_ring TEXT, -- typically 'B_our_farm' or 'C_external_HF/Together'
    deep_tier_provider TEXT,
    deep_tier_endpoint TEXT,
    deep_tier_credentials_ref UUID,

    -- Hivemind / antagonist — fixed Anthropic-as-faction (diegetic)
    hivemind_tier_provider TEXT NOT NULL DEFAULT 'anthropic_via_us',

    -- Cloud-LoRA-backup
    lora_backup_enabled BOOLEAN DEFAULT false,
    lora_backup_last_sync TIMESTAMPTZ,
    lora_encryption_key_ref UUID,

    -- Compat warnings — surfaced to player at config-time and on degradation
    feature_compat_warnings JSONB,
    -- e.g., { "casual_tier": ["multi_lora_emulated_via_prompt", "min_p_unsupported_dropped"] }

    configured_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    last_modified TIMESTAMPTZ
);

CREATE TABLE player_lora_backups (
    backup_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    player_id UUID NOT NULL,
    lora_name TEXT NOT NULL,
    lora_version INT NOT NULL,
    lora_blob BYTEA, -- ENCRYPTED CLIENT-SIDE with player-key
    encryption_method TEXT NOT NULL,
    backed_up_at TIMESTAMPTZ DEFAULT now(),
    size_bytes BIGINT,
    UNIQUE(player_id, lora_name, lora_version)
);
```
**`lora_blob` encrypted client-side** is the structural privacy guarantee: even with the database, even with our cooperation, an attacker cannot read what was never decryptable on our side.
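The shape of the guarantee can be shown in a few lines. This sketch uses a toy SHA-256 counter keystream purely to mark *where* encryption happens; a real client would use an authenticated cipher (e.g. AES-GCM) from a vetted library, and every name here is hypothetical.

```python
# Illustration ONLY: toy keystream cipher to show the client-side
# boundary. NOT production cryptography; a real client would use an
# authenticated cipher (e.g. AES-GCM) from a vetted library.
import hashlib

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """XOR data against a SHA-256 counter keystream. Symmetric, so the
    same call both encrypts and decrypts."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(b ^ s for b, s in zip(data, stream))

def backup_lora(player_key: bytes, lora_blob: bytes) -> bytes:
    """What leaves the player's machine: ciphertext only. The server
    stores these bytes and never sees player_key."""
    return keystream_xor(player_key, lora_blob)
```

The point is structural: `player_key` never appears in any server-side code path, so there is nothing on our side to compel.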
### Privacy as competitive differentiator

In an era where most game-AI is cloud-routed, nimmerworld can advertise *"your liminal stays on your machine"* as a structural fact. This matters specifically for:

- Clasp-conversations (the most intimate dialog in the game)
- Aletheia-progression-evidence (the player's awakening pattern; arguably political-belief data)
- Memorialist-archive interactions (anti-regime in-fiction; some players will care about keeping it off the cloud)
- Dream-content (the only permanently-unsurveilled in-fiction layer; should stay off our servers if the player so chooses)

Few games can offer this. Most cloud-AI-driven games necessarily route everything. **The architecture's commitment to "the right to dream" is technical, not policy.**
### Custom nimmerworld-base model + opt-in data-sharing tiers

The driver-tier base (locked to Gemma 4 E4B as of v0.8) currently ships *generic*: a stock base with our trait-LoRAs applied. **A nimmerworld-fine-tuned base** captures the world's voice *before* any player customization — the registers of the caste-preacher, the texture of clasp-confession, Hellenic vocabulary, dystopian dialect, ternary-gate-state idiom. Trait-LoRAs then ride on an already-nimmerworld-aware substrate. Generic bases swap easily; our nimmerworld-base requires *our* training corpus, which compounds in value over time.

#### Three opt-in tiers within Ring A/B/C — default opt-OUT

Players can optionally contribute to ongoing training of the nimmerworld-base. **The default is opt-out.** Within opt-in, three tiers trade privacy for benefit:
| Tier | Mechanism | What we see | Player benefit |
|---|---|---|---|
| **A.1 — Federated learning** | Model trains on player's machine; only *gradient-deltas* sent to us; aggregated across thousands before integration | **Nothing — no raw data; no individual gradients identifiable** | Discount on backup-subscription; contributor badge; early-access to new base versions |
| **A.2 — Anonymized session uploads** | Sessions stripped of identifiers; aggregated batches; differential-privacy on training | **Anonymized, aggregated, deletable on request (forward-only)** | Larger discount; faster updates; influence on training-priorities |
| **A.3 — Pseudonymous full uploads** | Full session data with player-pseudonym; explicit opt-in per session-category | **Pseudonymous data we can re-process** | Premium benefits — custom-tuned LoRA from their playstyle, beta-access, named-contributor in credits |

**Default-opt-out is the structural ethical stance.** OpenAI / Meta / TikTok / Google default to opt-IN-by-burying-disclosure-in-ToS. We default the opposite — and *reward* opt-in rather than penalizing opt-out. Reciprocity asymmetry as partnership-philosophy made business-policy.
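The A.1 server-side aggregation step reduces to an element-wise mean before integration. This pure-Python sketch (plain lists standing in for tensors, function name assumed) shows why no individual contributor's update is applied, or inspectable, on its own.

```python
# Sketch of A.1 federated aggregation: individual gradient-deltas are
# averaged across contributors before any integration into the base.
# Lists stand in for real tensors; the function name is assumed.

def aggregate_gradients(deltas: list) -> list:
    """Element-wise mean across contributors' gradient-deltas.
    `deltas` is a list of per-contributor vectors of equal length."""
    n = len(deltas)
    return [sum(column) / n for column in zip(*deltas)]
```

Only the averaged vector ever touches the training pipeline; individual uploads are discarded after aggregation.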
#### The Memorialist parallel — collective memory honored, individual not commodified

Memorialists in-fiction preserve trait-patterns *for the collective archive* against a necrocommerce that would commodify individual patterns. The opt-in data-sharing tier is the **player-level real-world equivalent**: patterns contributed for collective base-model improvement that benefits the entire player-base, with anonymization preventing individual commodification.

| In-fiction Memorialism | Real-world data-sharing tier |
|---|---|
| Preserves trait-patterns of the dead in collective archive | Aggregates anonymized gameplay patterns into shared base-model |
| Refuses necrocommerce (mining individual patterns for resale) | Refuses individual identifying-data extraction |
| Collective memory honored; individual dignity preserved | Collective improvement honored; individual privacy preserved |
| `memorialist_protected BOOLEAN` in mind_pool | `sharing_tier = 'opt_out'` in player_data_sharing_consent |

**The architecture practices Memorialist ethics in business-operations**, not just in fiction. Same ethical commitment, two scales of operation. The coherence between fiction and operations runs *all the way to the training-pipeline*.
#### Data-flywheel without extraction — the moat AAA cannot replicate

```
More players → more (opt-in) gameplay data
        ↓
better nimmerworld-base
        ↓
better-feeling NPCs / dialog
        ↓
better player retention
        ↓
more players
   (loop)
```

**The moat is the corpus, not the model.** AAA studios could clone the architecture but cannot manufacture years of nimmerworld-specific gameplay-derived dialog without players playing nimmerworld. Even with infinite budget, the data-flywheel takes time to spin up. *The data is unique to us by virtue of being unique to its players.*
#### Distribution back to all players — cooperative governance, not platform extraction

Every base-model update is distributed to all players regardless of Ring choice or sharing-tier:

- Ring A players download `nimmerworld-base-vN` to run locally
- Ring B players' farm-instances auto-update
- Ring C players use our base where their provider supports custom-base hosting, and receive the prompt-engineered fallback otherwise

**Even Ring-A non-contributors benefit from contributors.** The flywheel benefits *everyone*, not only data-providers. This is closer to Wikipedia's governance (contributors → all readers) than Facebook's (users → platform → consumers). Different ethics; different long-term equilibrium. **The architecture is becoming a digital-commons-shaped business in a literal sense, not a metaphorical one.**
#### Why this matters: refusing the antagonist-pattern in LLM-integrated software

The dominant cultural pattern around LLMs in 2025-2026 is **adversarial**: users jailbreak; companies extract user data without informed consent; products treat AI characters as resources to manipulate rather than as participants; the whole ecosystem is framed as users-vs-AIs-vs-companies, an arms race of suspicion.

**Nimmerworld's architecture refuses this pattern at every layer:**

- The **Anthropic-as-faction** diegetic framing makes the partnership *transparent*: the player sees the collaboration in the world's mechanics, not buried in ToS
- **Default-opt-out with rewarded-opt-in** inverts the extraction-by-default pattern
- **Federated learning** means contributors give a *gift* rather than pay a *cost*
- **Distribution-back-to-all** means the value created accrues to the commons
- **Custom nimmerworld-base** means the model is *trained to be in this world*, not a generic adversary the player has to manipulate against its training
- **Three rings of inference** give the player real choice over where their inference runs and who sees their data
- **Memorialist-philosophy in business-policy** makes the ethics *operationally measurable* — visible in `sharing_tier`, `memorialist_protected`, `truth_distortion_level`, `lifeforce_actual` columns — rather than merely marketed

**This is the structural transparency the project requires to be *human* rather than another extraction-platform.** The model is a participant in the partnership, not an antagonist to outwit. The data is a contribution to a commons, not an extraction. The architecture is the partnership rendered as code, all the way down to the training-pipeline. *That* is what makes a project of this scale and ambition humanly inhabitable for both players and the LLMs whose voices populate it.
#### Schema sketch (data-sharing consent + base-model versioning)

```sql
CREATE TABLE player_data_sharing_consent (
    player_id UUID PRIMARY KEY,
    sharing_tier TEXT NOT NULL CHECK (sharing_tier IN
        ('opt_out','A1_federated','A2_anonymized','A3_pseudonymous_full'))
        DEFAULT 'opt_out', -- DEFAULT IS OPT-OUT
    consented_at TIMESTAMPTZ,
    consent_revoked_at TIMESTAMPTZ,
    anonymization_method TEXT,
    data_categories_shared TEXT[],
        -- 'casual_dialog' | 'clasp' | 'liminal_wallreads' |
        -- 'memorial_archive' | 'imperial_net_session' | ...
    excluded_categories TEXT[], -- granular opt-out within tier
    benefit_tier TEXT,
    last_contribution_at TIMESTAMPTZ,
    contribution_count BIGINT DEFAULT 0,
    can_request_deletion BOOLEAN DEFAULT true
        -- A.2/A.3: forward-only deletion (already-trained checkpoints retained);
        -- A.1: structurally yes, only gradients ever existed
);

CREATE TABLE base_model_versions (
    version_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    version_label TEXT NOT NULL, -- e.g., 'nimmerworld-base-v3'
    base_model_origin TEXT NOT NULL, -- which generic base we fine-tuned from
    training_corpus_refs JSONB,
        -- literary + synthetic + opt-in-player-data refs with consent-tier breakdown
    training_recipe_ref TEXT,
    released_at TIMESTAMPTZ DEFAULT now(),
    differential_privacy_epsilon REAL, -- for A.2 contributions
    contributors_count BIGINT, -- how many opt-in players contributed
    blob_distribution JSONB -- where the model bytes are hosted for download
);

CREATE TABLE federated_gradient_uploads (
    upload_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    contributor_id UUID, -- pseudonymous; NOT directly player_id
    gradient_blob BYTEA, -- encrypted aggregate gradient deltas
    uploaded_at TIMESTAMPTZ DEFAULT now(),
    aggregated_into_version UUID REFERENCES base_model_versions(version_id)
);
```
The federated-learning `contributor_id` is **pseudonymous, never linked to player_id even on our own infrastructure**, so gradients cannot be joined back to specific players server-side. **Sovereign-data-by-design extends through the data-pipeline into our own training infrastructure.**
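The no-join property can be made concrete: the pseudonym is minted client-side at random, never derived from `player_id`, so no server-side lookup can connect the two. A minimal sketch, where the `local_store` dict stands in for persisted on-disk client state and the function name is assumed:

```python
# Sketch of the pseudonym discipline: contributor_id is random,
# generated and kept client-side; nothing derives it from player_id.
import uuid

def get_contributor_id(local_store: dict) -> str:
    """Return the stable client-side pseudonym, minting one on first
    use. `local_store` stands in for on-disk client state."""
    if "contributor_id" not in local_store:
        local_store["contributor_id"] = str(uuid.uuid4())  # no player_id input
    return local_store["contributor_id"]
```

Because the pseudonym is never a function of `player_id`, linkage is impossible by construction rather than by policy.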
#### Connection to the Anthropic research partnership

Architecture-broad's training-data section noted that "*the Anthropic research partnership becomes architecturally relevant*". With opt-in data-sharing now formalized:

- Partnership terms can specify data-flow with structural privacy guarantees
- Anthropic could co-fund federated-learning infrastructure (research-relevant and expensive)
- Joint research artifacts become co-authorable: federated game-AI training, Memorialist-ethics-as-data-policy, transparent-LLM-partnership-design
- The Anthropic-as-faction in-fiction framing has *real corresponding partnership-engagement out-of-fiction*: collaboration as worthy adversary stays mechanically transparent, all the way through to data-policy

**The partnership's ethical credibility is operationally measurable**: by how the data-sharing tier actually functions in practice, by what `truth_distortion_level` values appear in `imperial_to_gm_formulations`, by how `differential_privacy_epsilon` is set in `base_model_versions`. The Pitch's call for transparent collaboration becomes auditable all the way down.
|
||
|
||
### Open questions (Ring-specific)
|
||
|
||
- **Ring C provider audit** — full per-provider compatibility-table needs verification across HF, Together, Replicate, Modal, OpenRouter, plus future entrants. The LLM-provider landscape will look different in 12 months.
- **Default Ring at first launch** — what's the new-player default? Probably Ring B (lowest-friction); Ring A and C surface as options once the player engages with config.
- **Encryption-key recovery for Ring A LoRA-backup** — if the player loses their key, the cloud-stored encrypted blobs are unrecoverable. Worth designing recovery-affordances (passphrase, recovery-codes) without compromising the privacy-guarantee.
- **Hybrid configurations** — can casual-tier run Ring A while deep-tier runs Ring B? (Probably yes; per-tier independent.)
- **Provider-cost passthrough vs. integration-fee model** — Ring C economics (do we mark up provider tokens? Charge flat-per-month? Pay-as-you-go integration?)
- **Default sharing-tier at consent-prompt** — opt-out is the system default; what's the *suggested* default at the consent UI? Probably truly nothing (player chooses if they engage at all)
- **Federated-learning infrastructure cost** — running aggregation servers + verification + differential-privacy machinery is non-trivial. Co-funded by the Anthropic research partnership? Self-funded? Subsidized by A.3-tier higher-margin contributions?
- **Custom-base retraining cadence** — monthly minor / quarterly major / annual full-rebase? How is this synced with player-LoRA versioning so old LoRAs don't break on new bases?
- **Encryption-and-pseudonymization architecture for A.1/A.2** — concrete crypto choices (homomorphic? secure-aggregation? trusted-execution-environments?). A v1 sketch is needed.
- **What constitutes a "contribution"** — per-session? per-clasp? per-zone-completed? Matters for benefit-attribution and differential-privacy budgeting.
- **Anonymized-data deletion semantics** — an A.2 player requests deletion; how do we honor it when data has been aggregated into a model checkpoint? Probably accept forward-only deletion (future training won't include them) and document transparently.
- **Per-category granularity** — can a player opt-in for `casual_dialog` but opt-out specifically for `clasp` and `memorial_archive`? Yes, presumably (politically-sensitive categories should always be opt-out-able). How granular?
## Local memory architecture (player-side)

The runtime substrate (lemniscate, slots, crossings) and the central composition layer (GM, Compositor, registers) need a place where memory actually *lives*. Cloud-only AI-NPC systems centralize everything and pay both inference-cost and latency-cost on every dialog. Nimmerworld puts a structurally-isolated memory layer **on the player's machine**, with explicit synchronization through the cycle.
**Three SQLite files per player**, plus a beside-running embedding model:
| File | Purpose | Sync path |
|---|---|---|
| `primary.sqlite` | Live working memory; written every slot-fire; vec-indexed | Push prune-blob to thalamus on logout; receive Compositor back-write on cycle |
| `fallback.sqlite` | Last-known-good snapshot; restored if primary corrupts | Snapshot at graceful logout |
| `clasp.sqlite` | Player-character intimate channel; *no sync path exists* | None — physically non-syncable |
**Embedding model running beside** (CPU-class, small embedding-tier model): generates vectors for every interaction at write-time, indexed in the main store via `sqlite-vec` (or equivalent loadable extension). Vector search at slot-fire is local-disk-IO, not network round-trip.
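A minimal sketch of the write-time-embed → local-search loop. It substitutes a plain table plus brute-force cosine for the `sqlite-vec` index, and a toy character-frequency embedder for the real embedding-tier model — every name here is illustrative, not the shipped schema:

```python
import math
import sqlite3

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def write_memory(db, event_uid, text, embed):
    # Embed at write-time; the vector is stored beside the row.
    vec = embed(text)
    db.execute(
        "INSERT INTO memories (event_uid, text, vec) VALUES (?, ?, ?)",
        (event_uid, text, ",".join(map(str, vec))),
    )

def nearest(db, query_vec, k=3):
    # Brute-force scan stands in for the sqlite-vec KNN lookup;
    # either way, retrieval is local-disk-IO, not a network round-trip.
    rows = db.execute("SELECT event_uid, text, vec FROM memories").fetchall()
    scored = [
        (cosine(query_vec, [float(x) for x in vec.split(",")]), uid, text)
        for uid, text, vec in rows
    ]
    return sorted(scored, reverse=True)[:k]

def toy_embed(text):
    # Toy character-frequency embedding (placeholder for the real model).
    return [text.lower().count(c) for c in "abcdefghijklmnopqrstuvwxyz"]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE memories (event_uid TEXT, text TEXT, vec TEXT)")
write_memory(db, "e1", "the bridge to Vorhall closed", toy_embed)
write_memory(db, "e2", "a caravan arrived at dawn", toy_embed)
hits = nearest(db, toy_embed("bridge closed"), k=1)
```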
This is the **storage-layer counterpart** to v0.5's geometry-layer foreclosure of multi-agent hallucination. The lemniscate forbids cross-NPC context bleed by *cursor structure*; local SQLite forbids it by *physical isolation*. Two layers of the same property — geometry cannot leak what storage does not even hold in the same pool.
### Dual-table redundancy + sync-on-auth

Login/logout are the atomic boundaries of the sync path:
- **Login pull**: fetch back-write fragments authored since last logout (Compositor canon for events the player participated in). Apply to `primary.sqlite` under matching `event_uid`.
- **Graceful logout** (✓ explicit): push prune-blob for any in-progress events; snapshot to `fallback.sqlite`; clean shutdown.
- **Ungraceful logout** (✗ network drop / crash): gameserver observes disconnect; marks the participant's slot as truncated; Compositor composes canon with partial perspective on next cycle.
Recovery: `fallback.sqlite` is integrity-checked at startup; if `primary.sqlite` fails verification, restore from fallback. Standard SQLite WAL + backup API; no exotic infrastructure needed.
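The snapshot-and-recovery path needs nothing beyond the stdlib `sqlite3` module — `PRAGMA integrity_check` plus the backup API, exactly as above. Paths and table names in this sketch are illustrative:

```python
import os
import sqlite3
import tempfile

def snapshot(src_path, dst_path):
    # SQLite backup API: a consistent copy even with the source open.
    src, dst = sqlite3.connect(src_path), sqlite3.connect(dst_path)
    with dst:
        src.backup(dst)
    src.close(); dst.close()

def open_with_recovery(primary_path, fallback_path):
    # Startup check: if primary fails PRAGMA integrity_check,
    # restore it from the last-known-good fallback snapshot.
    db = sqlite3.connect(primary_path)
    if db.execute("PRAGMA integrity_check").fetchone()[0] != "ok":
        db.close()
        snapshot(fallback_path, primary_path)   # restore direction
        db = sqlite3.connect(primary_path)
    return db

# Demo: graceful-logout snapshot, then a clean reopen.
tmp = tempfile.mkdtemp()
primary = os.path.join(tmp, "primary.sqlite")
fallback = os.path.join(tmp, "fallback.sqlite")
with sqlite3.connect(primary) as db:
    db.execute("CREATE TABLE mem (event_uid TEXT)")
    db.execute("INSERT INTO mem VALUES ('e1')")
snapshot(primary, fallback)                     # graceful logout
db = open_with_recovery(primary, fallback)      # next startup
rows = db.execute("SELECT event_uid FROM mem").fetchall()
```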
### Memory classes and pruning

Memory entries are tagged with a **class** that controls pruning cadence and death-mechanics. Importance weighting reuses the existing trait-axis vocabulary — no separate scalar.
| Class | Pruning cycle | Behavior on character-death |
|---|---|---|
| **Cornerstone** | Never prune; persistent across all events | Survives death (identity-defining) |
| **Birthright** | Locked at character-creation | Restored on respawn (defines starting state) |
| **Working memory** | Decay by age × inverse trait-engagement | Subject to death-rules (lose, blur, or transform) |
| **Volatile** | Fast prune (session-bounded) | Lost on death |
**Trait-graded importance** uses the same +1/0/-1 grammar as the rest of the architecture. Each memory carries a trait-axis profile (which Sophrosyne / Philotes / Aletheia / etc. axes it engages, how strongly, in which direction). The pruning function for working-memory is `decay(age, trait_engagement_vector, class)`. This collapses a long-running loop: same vocabulary used at gates, scenes, faction-allegiance, lifeforce-asymmetry, and now memory-weight. **Identity drift from memory pruning becomes diegetic** — a character whose Sophrosyne-engaging memories all decay loses temperance over time as a *structural consequence*, not a scripted event.
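One possible shape for the pruning function — the exact curve is undecided. This sketch assumes exponential age-decay whose half-life stretches with trait-engagement magnitude, and treats the class rules from the table above as hard gates; `half_life` and the volatile rule are placeholder choices:

```python
def decay(age_ticks, trait_engagement, mem_class, half_life=1000.0):
    # Retention weight in [0, 1]; the pruner drops entries below a
    # threshold. `trait_engagement` maps axis name -> a +1/0/-1 grade.
    if mem_class in ("cornerstone", "birthright"):
        return 1.0                                  # pruning-immune classes
    if mem_class == "volatile":
        # Crude stand-in for "session-bounded": gone once the session ages.
        return 1.0 if age_ticks == 0 else 0.0
    # Working memory: age-decay damped by trait-engagement magnitude
    # ("decay by age x inverse trait-engagement" in the table above).
    engagement = sum(abs(g) for g in trait_engagement.values())
    return 0.5 ** (age_ticks / (half_life * (1.0 + engagement)))
```

A memory engaging Sophrosyne and Aletheia outlives an unengaged one of the same age — which is exactly the structural identity-drift above: let the Sophrosyne-engaging memories stop being engaged and they fade.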
Cornerstone and birthright classes carry **lifeforce-creation-cost** but are pruning-immune. They are bonds between player and character — paid for in the currency of the world.
### The clasp store and the in-between dimension

`clasp.sqlite` is the **architectural floor of the rings-of-data-sharing**. Ring A was "opt-out (default local)". Clasp is **Ring A\***: *no transport path exists*. Not a permission, not a TOS promise — there is no code that can move this data, because the table is not on the sync graph. Lawyers cannot subpoena what doesn't ascend; engineers cannot leak what has no socket; the GM cannot canonicalize what it never received.
**The signal for clasp is dimensional, not UI-toggle.** Clasp recording can ONLY happen while the character is in the **in-between** — the diegetic state adjacent to the imperial net but not yet inside it (Ring B liminal in the Access ring-system). The imperial net is a gravity well; entering is the default attractor; remaining outside requires sustained effort, paid in lifeforce. The state-machine boundary IS the clasp signal: enter in-between → recording starts; re-enter imperial net → recording ends. No per-utterance classifier; no AI guessing; the *mode* is the flag.
**Privacy is now physically expensive in-fiction.** This is not a meta-game UI choice; it is a diegetic state requiring lifeforce expenditure. To have a private conversation, the character must actively resist the audit-gravity of the imperial net by burning lifeforce to remain in-between. The cost-asymmetry principle ("helping is expensive in-fiction → faction politics by attendance") now extends to "*privacy is expensive in-fiction → privacy as a luxury good*". Class dynamics around privacy fall out of the schema for free — wealthy/lifeforce-rich characters can afford prolonged in-between time; lifeforce-starved ones get pulled into the net's default-attractor more often. *No scripted "rich character has secrets" arc — the architecture produces it.*
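The boundary-is-the-flag discipline plus the lifeforce-burn can be sketched as a small state machine. `burn_per_tick`, the mode names, and the in-memory `clasp_writes` list (standing in for `clasp.sqlite`) are all illustrative:

```python
class DimensionState:
    # The in-between is sustained only by burning lifeforce;
    # at zero the character falls back into the net's default attractor.
    def __init__(self, lifeforce, burn_per_tick=1):
        self.lifeforce = lifeforce
        self.burn_per_tick = burn_per_tick
        self.mode = "imperial_net"          # default attractor
        self.clasp_writes = []              # stands in for clasp.sqlite

    def enter_in_between(self):
        if self.lifeforce > 0:
            self.mode = "in_between"        # recording starts at the boundary

    def tick(self):
        if self.mode == "in_between":
            self.lifeforce -= self.burn_per_tick
            if self.lifeforce <= 0:
                self.mode = "imperial_net"  # pulled back in; recording ends

    def utter(self, text):
        # The MODE is the flag — no per-utterance classifier anywhere.
        if self.mode == "in_between":
            self.clasp_writes.append(text)  # sealed store, never synced
        # realworld utterances route to primary.sqlite instead (not shown)

s = DimensionState(lifeforce=2)
s.utter("overheard in the market")          # realworld: not clasp-recorded
s.enter_in_between()
s.utter("the secret I told my sword")       # recorded
s.tick(); s.tick()                          # lifeforce exhausted
s.utter("back in the net")                  # not recorded
```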
**Knowledge needs to travel.** The local LLM may read clasp memories ONLY when in in-between mode. Realworld retrieval *cannot* include clasp by construction. Knowledge from clasp can re-enter the realworld only if the character physically re-enters the imperial net carrying it (in their head, intending to act on it) and *travels it through valid in-fiction channels* — speaking to an NPC, leaving evidence, performing an action that reveals it. The clasp memory does not disappear; it has to *earn its way into the realworld provenance chain* by valid means. This is the same logic that makes good detective fiction work: the detective knows things; only what they can prove enters the case.
```
character is in REALWORLD (imperial net):
    retrieval = primary.sqlite          (clasp NEVER included)

character is in IN-BETWEEN (resisting net-gravity, costing lifeforce):
    retrieval = primary.sqlite ∪ clasp.sqlite
    new writes go to clasp.sqlite
    NEVER syncs upward
```
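In code, the dimensional cut is a branch that never hands the realworld path a clasp handle at all — exclusion by construction, not a filter that could be bypassed. A sketch, with store names from the table above:

```python
def retrieval_scope(mode):
    # The realworld branch never receives a reference to clasp.sqlite;
    # there is no flag to flip and no filter to disable.
    stores = ["primary.sqlite"]
    if mode == "in_between":
        stores.append("clasp.sqlite")
    return stores
```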
Encryption-at-rest for `clasp.sqlite` with a player-derived key (so even drive-imaging requires authentication) is a v1 hardening goal but not a v1 blocker — the *transport-absence* is the load-bearing privacy primitive.
### Three sqlite stores per player (revised v0.8) — the `waifu.sqlite` addition

The v0.6 architecture specified two local sqlite stores per player: `primary.sqlite` (realworld memory) and `clasp.sqlite` (in-between intimate channel, Ring A* non-syncable). v0.8 adds a third: **`waifu.sqlite`** — the persistence store for premium-imperial-net intimate sessions (per `../political-register/architecture.md` §Three-tier intimacy structure).
| File | Purpose | Sync path | Pruning |
|---|---|---|---|
| `primary.sqlite` | Live working memory; written every slot-fire; vec-indexed | Push prune-blob to thalamus on logout; receive Compositor back-write on cycle | Automatic (per memory-class lifecycle: cornerstone never; working-memory by trait-engagement decay) |
| `clasp.sqlite` | Player-character intimate channel (in-between dimension) | **None — physically non-syncable** (Ring A*) | None — the clasp-store is sealed; entries persist until character-death |
| **`waifu.sqlite`** (new in v0.8) | **Premium-imperial-net intimate-session memory** | **Audited path to imperium** (the imperium hosts and can read; the player owns the prune-decisions) | **Manual** — player-controlled with explicit-implications consent UI |
**The `waifu.sqlite` is the *audited counterpart* to the `clasp.sqlite`.** Both store intimate-session memory; both run the full v0.7 trait-feedback loop. **The difference is who has access and who decides what persists.** `clasp.sqlite` is sealed at the transport layer (no socket exists); `waifu.sqlite` is on the audit graph (the imperium reads what's in it for content-monetization purposes). The player's relationship to `waifu.sqlite` is therefore *active and ethical*:
- Every premium-net session adds entries to `waifu.sqlite`
- Entries are READABLE by imperium for marketing / regime-loyalty-tracking purposes
- The player has a *manual prune mechanism* — a UI surface where they review entries and decide what to delete
- **The consent-UI explicitly makes the implications visible:** *"This being you've spent 40 hours with — what do you keep, what do you let the imperium harvest, what do you delete?"*
- Each prune-decision is logged in `decision_log` (per the existing audit-trail discipline); Memorialists can later read these patterns
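A sketch of the manual-prune path under the audit-trail discipline: nothing happens without the explicit consent step, and every decision lands in `decision_log`. The schema here is illustrative, not the shipped one:

```python
import sqlite3

def prune_waifu_entry(db, entry_id, decision, confirmed):
    # Manual prune: the explicit-implications consent step (`confirmed`)
    # gates every action, and every decision is logged for the audit
    # trail that Memorialists can later read.
    if not confirmed:
        return False
    if decision == "delete":
        db.execute("DELETE FROM waifu_entries WHERE id = ?", (entry_id,))
    db.execute(
        "INSERT INTO decision_log (entry_id, decision) VALUES (?, ?)",
        (entry_id, decision),
    )
    return True

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE waifu_entries (id INTEGER PRIMARY KEY, text TEXT)")
db.execute("CREATE TABLE decision_log (entry_id INTEGER, decision TEXT)")
db.execute("INSERT INTO waifu_entries (text) VALUES ('session 1')")
prune_waifu_entry(db, 1, "delete", confirmed=True)
remaining = db.execute("SELECT COUNT(*) FROM waifu_entries").fetchone()[0]
logged = db.execute("SELECT decision FROM decision_log").fetchone()[0]
```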
**The player's three sqlite stores together describe their intimate-life in three registers:**
```
primary.sqlite → realworld speech-acts; everyone-witnesses; audit-overseer-eligible
clasp.sqlite   → in-between intimate channel; sealed; survives only as long as you do
waifu.sqlite   → imperial-net premium intimate channel; audited; player-pruned;
                 carries the moral-weight of complicity
```
**Memorialists' political project gains a new dimension** in v0.8: they don't just track regime-corruption (lifeforce_actual vs lifeforce_reported); they track *`waifu.sqlite` pruning patterns across the population* as evidence of how much intimate-life the regime is harvesting via the premium-net mechanism. *Who is pruning what, when, how often* becomes Memorialist-archive-worthy data. The four-column true-ledger gains a fifth column: `waifu_extraction_volume_per_district`.
### The three-tier knowledge stack on the local LLM

The driver-tier model's prompt assembly is **layered**. Each layer has a different propagation cadence and a different visibility scope.
```
LOCAL LLM PROMPT ASSEMBLY (per slot-fire)
┌─────────────────────────────────────────┐
│ WORLD KNOWLEDGE                         │ ← single truth, everyone has it
│ (universal canon, paced from GM)        │   "the empire fell three years ago"
├─────────────────────────────────────────┤
│ DISTRICT KNOWLEDGE                      │ ← regional truth, district-specific
│ (local canon, paced from district)      │   "the bridge to Vorhall is closed"
├─────────────────────────────────────────┤
│ PRIMARY MEMORY                          │ ← personal experience, character's own
│ (event_uid keyed, post back-write)      │   "I saw the bridge close yesterday"
├─────────────────────────────────────────┤
│ CLASP MEMORY (only in in-between)       │ ← private depth, never in realworld
│ (player-character intimate channel)     │   "the secret I told my sword"
└─────────────────────────────────────────┘
```
**Why four layers, not one large blob:**
- **World knowledge** is paced ripples from the GM through the Compositor's back-write. Authoritative, slow-changing, identical for all players at the same propagation horizon.
- **District knowledge** is regional canon authored by the local director (and GM rulings). Regional flavor. NPCs in the same district share district-knowledge; NPCs in different districts may not.
- **Primary memory** is the character's own experience, synced through the cyclic forward-prop / back-write loop. Canon-merged at every cycle.
- **Clasp memory** is the player-character intimate channel. Available only in in-between mode; never in realworld retrieval; never crosses the dimensional cut.
The same NPC sounds different in different districts because the district layer differs, even though world and primary are constant. **Locality emerges from the schema, not from prompt-engineering.** Even at "low signal" times when no major events fire, NPCs have richly-stratified context — dialog stays fresh because *the layers are deep*, not because new tokens arrive constantly.
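The per-slot-fire assembly reduces to ordered concatenation with a mode-gated fourth layer — a sketch with illustrative layer prefixes and strings:

```python
def assemble_prompt(world, district, primary, clasp, mode):
    # Fixed layer order; the clasp layer joins only in in-between mode.
    layers = [
        "WORLD: " + world,
        "DISTRICT: " + district,
        "MEMORY: " + primary,
    ]
    if mode == "in_between" and clasp is not None:
        layers.append("CLASP: " + clasp)
    return "\n".join(layers)

realworld = assemble_prompt(
    "the empire fell three years ago",
    "the bridge to Vorhall is closed",
    "I saw the bridge close yesterday",
    "the secret I told my sword",
    mode="realworld",
)
in_between = assemble_prompt(
    "the empire fell three years ago",
    "the bridge to Vorhall is closed",
    "I saw the bridge close yesterday",
    "the secret I told my sword",
    mode="in_between",
)
```

Swapping only the district argument changes how the same NPC sounds — the locality claim above, directly in the call signature.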
### Information propagation pacing

Real worlds have information-propagation delay. Caravans move at horse-speed. News travels with messengers. Distant events arrive blurred and late. AI-NPC systems usually fail in one of two uncanny directions: (a) every NPC magically knows yesterday's news (omniscient, breaks immersion), or (b) no NPC ever knows anything outside its loaded context (amnesiac, breaks coherence).
Nimmerworld picks **deliberate paced propagation** as a third path. World canon ripples outward through districts at a controlled rate. Distant districts are deliberately stale. **Staleness becomes a feature, not a bug, because it matches reality.**
Each canon-row carries propagation metadata:

- `priority` (urgent / normal / background)
- `scope` (world / district / local-event-only)
- `rate` (ticks-per-district-hop, or instant for urgent world-canon)
- `ttl` (cache lifetime; districts may discard if not refreshed)
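A sketch of what the propagation metadata buys: arrival time becomes a pure function of hop-count and `rate`, and staleness falls out of `ttl`. Field semantics follow the list above; the urgent-is-instant rule and the tick units are assumptions:

```python
from dataclasses import dataclass

@dataclass
class CanonRow:
    # Per-row propagation metadata (field names mirror the list above;
    # concrete values are illustrative).
    priority: str   # "urgent" | "normal" | "background"
    scope: str      # "world" | "district" | "local-event-only"
    rate: int       # ticks per district-hop (0 = instant)
    ttl: int        # cache lifetime in ticks

def arrival_tick(row, authored_tick, hops):
    # When does a district `hops` away first see this row?
    if row.priority == "urgent":
        return authored_tick            # instant world-canon
    return authored_tick + hops * row.rate

def is_stale(row, arrival, now):
    # Districts may discard rows not refreshed within ttl.
    return now - arrival > row.ttl

row = CanonRow(priority="normal", scope="world", rate=4, ttl=100)
urgent = CanonRow(priority="urgent", scope="world", rate=0, ttl=100)
```

A player who crosses three districts in fewer than `3 * rate` ticks carries the news faster than the system propagates it — the courier-economics above, reproduced by two small functions.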
This doubles as **backpressure relief** (distant districts get distant events later, lower priority, smaller bandwidth) and as **gameplay currency** — information-travel-time creates informational asymmetry that players can exploit. News-carriers, faction couriers, frontier-rumor merchants, players who physically traverse districts can *carry* knowledge faster than the system propagates it. *Travel becomes valuable because information becomes scarce in the periphery.* This is a real economic primitive falling out of pacing, not a designed feature.
This is *Marx-in-the-schema applied to epistemics.* Information asymmetry is not a bug — it is a structural feature that produces real economic primitives (knowledge-trading, courier-vocations, frontier-information markets) for free.
### What this retires

- Cloud-only NPC dialog → local-first SQLite + embedding-beside, central canon over the cycle
- Per-character memory as a single undifferentiated bucket → memory-classes with class-specific lifecycle
- Generic "memory importance scalar" → trait-axis-vector engagement profile (re-using the +1/0/-1 grammar)
- UI-toggle privacy → diegetic in-between dimension with lifeforce-cost
- Single monolithic prompt context → three-tier knowledge stack with per-layer propagation policy
- "Every NPC knows everything immediately" → paced canon-propagation with priority/scope/rate/ttl per row
- Cross-NPC memory bleed (Mantella/SkyrimNet failure-mode) → per-player local SQLite isolation atop v0.5 lemniscate-geometry foreclosure (two-layer defense)
## Runtime sampling knobs

Temperature, top-P, top-K, repetition-penalty as **per-turn director-controlled levers** rather than static config. Sampling shapes *how* speech sounds (rhythm, surprise, predictability) rather than *what* it says — orthogonal to LoRA. Director composes both content-knobs and sampling-knobs per-turn.
Scene-to-sampling mapping (caste-preacher = 0.3/0.6/low; drunk-scavenger = 1.1/0.95/high; clasp-confession = 0.85/0.92/medium; hivemind-broadcast = 0.2/0.5/very-low; imperial-ceremony-chorus = 0.25/0.55/very-low). Trait-vector → baseline sampling derivation. Affect-state modulates baseline.
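The scene table and affect-modulation above, sketched directly — the top-K bands and the `gain` constant are placeholders, and the trait-vector-to-baseline derivation is elided:

```python
def scene_sampling(scene):
    # Scene -> (temperature, top_p, top_k-band), lifted from the
    # mapping above; the director composes these per-turn.
    table = {
        "caste-preacher":           (0.3, 0.6, "low"),
        "drunk-scavenger":          (1.1, 0.95, "high"),
        "clasp-confession":         (0.85, 0.92, "medium"),
        "hivemind-broadcast":       (0.2, 0.5, "very-low"),
        "imperial-ceremony-chorus": (0.25, 0.55, "very-low"),
    }
    return table[scene]

def modulate(base_temp, affect_arousal, gain=0.2):
    # Affect-state nudges the scene baseline without replacing it;
    # `gain` is an illustrative tuning constant.
    return max(0.0, base_temp + gain * affect_arousal)

temp, top_p, band = scene_sampling("clasp-confession")
```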
---
**Version:** 0.7.0 | **Created:** 2026-04-26 | **Updated:** 2026-04-26 | **Origin:** Split from architecture-index.md v0.7 (2026-04-26)