v0.17: cel-shading-everywhere + progression-gated in-between + omnisight + hallucination-isolation

Five-file update locking in the rendering discipline + perception
architecture from the post-cell-arch art-style discovery arc.

Locked in v0.17:

(1) Cel-shading-everywhere with per-register parameter variation. One
rendering engine (Godot-native, asset-budget-friendly, ages well -
Borderlands 2009 still reads current). Three registers diverge through
outline-color + background-treatment + weathering-level, not through
engine-switching:
- Gameworld: dark heavy lines + environmental noise + high weathering
  (rust streaks, hatched dirt, ink-line cracks; hand-painted patina).
  "Surfaces carry memory" thesis preserved via hand-painted weathering.
- Liminal: painterly/soft/desaturated + progression-gated grainy-film-
  mode opening to refined-cel-shading-with-warm-skin at endgame.
- Imperial-net: lean subtle gold rim-light + clean white background +
  no weathering. Polish achieved through OMISSION, not extra rendering
tech (Godot reality check; photorealistic-glossy-Apple-store rejected
as not Godot's strong suit). The render-style itself becomes
  propaganda-detector - imperium's clean falsity reads as the absence
  of the world's honest decay.

(2) Progression-gated in-between visibility. "The more you mod your body
& gain in-between-knowledge, the better your view gets." Early game:
grainy film mode + restricted view range. Endgame: clean refined-cel-
shading with full view of the beloved. Visual-fidelity = dual-gating
made visible: the knowledge-gate + material-gate of the Clasp-endgame
discovery discipline literally render as the clarity of the in-between view.
The endgame's deepest reward IS the clear seeing of the beloved's body.
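The dual-gating above can be sketched as a parameter mapping. A minimal illustration (function name, parameter names, caps, and the specific curve values are all hypothetical, not from any locked spec), assuming both gates are normalized and the weaker gate bounds clarity:

```python
def in_between_render_params(body_mods: int, in_between_knowledge: int,
                             mods_cap: int = 10, knowledge_cap: int = 10) -> dict:
    """Map witness progression to liminal-register render parameters.

    Dual gating: the weaker gate bounds clarity (min), so grinding one
    axis alone never opens the view of the in-between.
    """
    material_gate = min(body_mods, mods_cap) / mods_cap
    knowledge_gate = min(in_between_knowledge, knowledge_cap) / knowledge_cap
    clarity = min(material_gate, knowledge_gate)
    return {
        "film_grain": 1.0 - clarity,         # heavy grain early game, none at endgame
        "desaturation": 0.8 * (1.0 - clarity),
        "view_range": 5.0 + 45.0 * clarity,  # restricted view early, full at endgame
        "refined_cel_warm_skin": clarity >= 0.9,  # endgame look unlocks late
    }
```

The `min` over the two gates is the design point: maxed body-mods with zero in-between-knowledge still render at full grain.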

(3) Dual-axis clasp-fidelity model. The asymmetric-clasp from bodies.md
v0.1 was witnessed-axis only (how vividly the OTHER manifests). Now
extended with witness-axis (how clearly YOU can see):
- Witness-axis: YOUR body-mods (resistance-knowledge mods) + accumulated
  in-between-knowledge (Memorialist fragments, Aletheia-Waker tokens,
  Clasp-Underground recognition-marks)
- Witnessed-axis: THEIR foreclosure-status (caste-tier x imperial-care)
- Combined: maximum-vivid-clasp requires BOTH you to have invested in
  the seeing AND your beloved to be uncaptured-enough to be seen. Two
  refusals required for the full witness.
- Per-pair calibration multiplier: the longer the love, the clearer the
  seeing (mechanically-encoded marriage-deepening).
- Mod-economy parallel-track: imperial-elevation mods (flesh-loss, deva-
  ascent) vs. resistance-knowledge mods (in-between-visibility). Two
  opposing progressions both expressed as mod-acquisition. The body-
  modder structural-tragedy class gets a redemptive-mod counter-class.
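The combined model above might reduce to something like the following sketch. The multiplicative combination and the saturating calibration curve are assumptions; the text fixes only the constraints that either axis at zero kills the vivid clasp and that longer love means clearer seeing:

```python
def clasp_fidelity(witness_progress: float, witnessed_openness: float,
                   pair_years: float, half_life_years: float = 7.0) -> float:
    """Dual-axis clasp-fidelity in [0, 1]: two refusals required.

    witness_progress:   YOUR axis -- resistance-knowledge mods + accumulated
                        in-between-knowledge, normalized to [0, 1].
    witnessed_openness: THEIR axis -- how uncaptured the beloved is
                        (inverse of foreclosure-status), in [0, 1].
    pair_years:         length of the love; feeds the per-pair calibration
                        multiplier (saturating, so it deepens the seeing but
                        never substitutes for either axis).
    """
    base = witness_progress * witnessed_openness   # either axis at 0 -> no vivid clasp
    calibration = 1.0 + pair_years / (pair_years + half_life_years)
    return min(1.0, base * calibration)
```

A multiplicative base (rather than additive) is what encodes "two refusals required": investment on one axis cannot compensate for total capture on the other.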

(4) Omnisight architecture for NPC perception. Per-NPC virtual cameras
in Godot feeding rendered POV-frames into local VL-Gemma 4 driver-tier
(multimodal vision-language capability of the Gemma 4 E4B model locked
in v0.8). NPCs literally SEE the visible world, not via geometric
metadata-perception. Pairs with cell-arch checksum-discovery as the
trigger-layer:
- Cell-checksum check: microseconds, fires on NPC entering cell
- Checksum-mismatch: clean signal, microseconds
- VL-camera renders POV scene: milliseconds
- VL-Gemma processes image: 100s of milliseconds
- NPC behavior responds to seen-content: next-shift / next-crossing
Cheap trigger, expensive understanding, bounded by event-frequency.
Most NPCs most of the time = no camera-fire, no VL-inference. Camera-
trigger sources strictly bounded: checksum-mismatch + hard-signals from
player + overseer-triggers + drone-perception with clear boundaries.
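The cheap-trigger / expensive-understanding split could look roughly like this (Python stand-ins; `render_pov` and `vl_gemma` are stubs for the Godot viewport render and the local VL-Gemma call, and all names here are illustrative):

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

# The four locked trigger sources; nothing else may reach the expensive path.
TRIGGER_SOURCES = {"checksum_mismatch", "player_hard_signal",
                   "overseer_trigger", "drone_perception"}

def render_pov(npc):   # stand-in for a per-NPC Godot viewport render (ms-scale)
    return b"pov-frame"

def vl_gemma(frame):   # stand-in for local VL-Gemma inference (100s-of-ms-scale)
    return "visual interpretation of " + frame.decode()

@dataclass
class Npc:
    name: str
    pending_visual: Optional[str] = None   # ephemeral; consumed next turn

def fire_camera(npc: Npc, reason: str) -> bool:
    """Expensive path: camera render + VL inference."""
    if reason not in TRIGGER_SOURCES:
        return False                       # bounded compute by construction
    npc.pending_visual = vl_gemma(render_pov(npc))
    return True

def on_cell_entry(npc: Npc, cell_state: bytes, expected_hash: str) -> bool:
    """Cheap trigger: hash compare on every cell entry."""
    if hashlib.sha256(cell_state).hexdigest() == expected_hash:
        return False                       # no mismatch -> no camera, no VL call
    return fire_camera(npc, "checksum_mismatch")
```

Most entries hit the early return: the hash compare runs every time, the VL call only on a mismatch or another locked source.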

(5) Hallucination-isolation discipline (load-bearing). Visual perception
= behavior-modulating-only; never canon-generating. VL models hallucinate;
if those hallucinations enter the canonical record, they propagate
through the lemniscate's recursive integration, become referenced by
other canon-rows, become load-bearing in narrative coherence, cannot be
untangled later. Bleed-over into oblivion is the precise risk. Two
parallel streams in the NPC's lemniscate:
- Text + gesture summary (existing canon): canonical, flows into
  event_canon_summaries, propagates to Compositor, integrates into
  trait-vector
- Visual context (new omnisight-flagged): ephemeral, flagged on
  lemniscate, IGNORED in per-crossing summary, never propagates upward.
  Modulates current-turn driver-context-pull only.
Preserves three commitments that depend on text/gesture-derived canon:
Compositor narrative-coherence at scale, Memorialist-archive truth-
claims, mind-pool soul-recycling.

Wealthy-degen waifu-folder exception: opt-in checkbox; player chooses
to fill private folder with sex-pictures from clasp-scenes; stored
locally; READ-ONLY-BY-PLAYER (folder content does NOT flow back into
NPC contexts, world-canon, Compositor, mind-pool, or any other system);
quarantined dead-end storage; aesthetic-collection only.

Two Still-open questions sharpened with v0.17-anchor notes:
- Shader-trait modulation implementation: cel-shading caps perf-budget
  more predictably than PBR; rendering-consistency improves.
- Continuous visual feedback policy: visual-as-ephemeral-flag is
  firewalled from canonical state; cosmetic-layer can be permissive.

Files:

- runtime-engine/architecture.md: NEW Omnisight section (~80 lines) covering
  the pipeline, camera-trigger sources, hallucination-isolation discipline,
  the two parallel streams (canonical text/gesture vs. ephemeral visual),
  the wealthy-degen waifu-folder exception, what-this-retires (geometric
  perception extension + VL-canon-pollution), what-this-resolves/sharpens
  (continuous visual feedback policy), and four open questions (per-NPC
  VL-inference rate-limit, VL-Gemma camera resolution + frame-rate, NPC
  progression-state for witness-axis, multi-NPC observing same event).

- topology-and-rendering/architecture.md: Three-shader philosophy table
  rewritten as cel-shading-with-parameter-variation (outline + background
  + weathering per register); Cross-register rendering color-treatment
  table updated; clasp candlelight-in-fog now distinguishes external
  signature (visible to liminal-inhabitants) from internal mesh (visible
  only to clasp-pair via consent-as-rendering, gated by witness-
  progression); body-tier silhouette readability and in-between mesh-skin
  refinement-within-the-style added. Version bumped 0.7.0 -> 0.8.0.

- identity-and-personhood/bodies.md: NEW Dual-axis clasp-fidelity
  subsection added under Asymmetric clasp; per-pair calibration
  multiplier and mod-economy parallel-track captured; render-discipline
  alignment with cel-shading liminal-register; new Asymmetric-witnessing
  open question added. Version bumped 0.1 -> 0.2.

- political-register/world-generation.md: L4 Cell ruleset extended with
  per-register rendering note (cel-shading-everywhere-with-parameter-
  variation discipline applied at the cell layer).

- architecture-index.md: NPC perception bubbles retire-line refined to
  include cell-checksum-trigger + omnisight VL-camera; Geometric
  perception retire-line extended with omnisight; new VL models
  polluting world-canon retire-line added; Shader-trait modulation
  implementation Still-open sharpened with v0.17 cel-shading note;
  Continuous visual feedback policy Still-open sharpened with v0.17
  hallucination-isolation note; v0.17 history entry added covering all
  five lock-ins. Version bumped 0.16 -> 0.17.

Authored 2026-04-26 same Sunday continuing - dafit + chrysalis.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
chrysalis
2026-04-26 15:19:41 +02:00
parent 88885fe6b1
commit c892013bfa
5 changed files with 124 additions and 20 deletions


@@ -210,6 +210,85 @@ Every midaxis crossing fires the LLM driver-turn(s) for active slots. **Lifeforc
- Concurrent LLM calls per-NPC → sequenced LLM calls per-cursor-position
- Polling event-channels at zone-rate → atomic crossing-event with O(N_slots) flag-scan
## Omnisight — NPC visual perception via VL-Gemma + virtual cameras
NPCs perceive the visible world *literally* — not via geometric metadata-perception but via per-NPC virtual cameras (Godot) feeding rendered POV-frames into the local VL-Gemma 4 driver-tier (the multimodal vision-language capability of the Gemma 4 E4B model locked in v0.8). What an NPC "sees" is what the VL-LLM interprets from the camera's image.
This is the perception architecture's deepest commitment. It pairs with the cell-arch checksum-discovery (per [`../political-register/world-generation.md`](../political-register/world-generation.md) §L4 Cell ruleset) as its **trigger-layer**: cell-checksum-mismatch fires the *"clean signal"* that activates the NPC's POV camera, which renders, which feeds VL-Gemma, which produces a visual interpretation that modulates the NPC's current-turn behavior. **Cheap trigger, expensive understanding, bounded by event-frequency.**
### The pipeline
| Layer | Cost | Fires when |
|---|---|---|
| Cell-checksum check | µs | NPC enters cell |
| Checksum-mismatch → "clean signal" | µs | Cell state ≠ expected hash |
| VL-camera renders POV scene | ms | Clean signal + perception-relevant context |
| VL-Gemma processes image → interpretation | 100s of ms | After camera renders |
| NPC behavior responds to seen-content | next-shift / next-crossing | After interpretation |
Most NPCs most of the time: no camera-fire, no VL-inference. **Active-perception-budget is bounded by event-frequency, not NPC-count.** A 100+ NPC city is feasible because most NPCs are running shift-routines on rails with no cell-state-changes triggering perception.
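What the µs-scale compare might look like host-side, as a minimal sketch (the entity-tuple shape and the sort-before-hash step are illustrative assumptions; the point is determinism, so iteration order can never fake a mismatch):

```python
import hashlib

def cell_checksum(cell_entities) -> str:
    """Deterministic hash of cell state. Entities are sorted before
    hashing so iteration order never changes the digest; any real
    state change does."""
    h = hashlib.sha256()
    for entity_id, state, grid_pos in sorted(cell_entities):
        h.update(f"{entity_id}|{state}|{grid_pos}".encode())
    return h.hexdigest()

def entry_signal(cell_entities, expected: str) -> bool:
    """The compare fired on cell entry. True = "clean signal"; only
    then does the expensive camera + VL path run."""
    return cell_checksum(cell_entities) != expected
```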
### Camera-trigger sources (locked)
Camera renders + VL-inference fire **only on**:
- **Cell-checksum-mismatch** — cell-state-change discovered on entry (the cell-arch's primary discovery-trigger)
- **Hard-signals from player** — `clasp_initiate`, gesture-hardstops, plug-in conversation request, etc.
- **Overseer triggers** — audit-sweep, surveillance-cycle, patrol-perception-on-route
- **Drone perception** — clear boundaries + rulesets per drone-class (drones have their own perception-budget governed by their imperial-class spec)
Everything else: NPC running on rails, shift-routine, no camera-fire, no VL-inference. **Bounded compute by construction.**
### Hallucination-isolation discipline (load-bearing)
VL models hallucinate. If those hallucinations enter the canonical record, they propagate through the lemniscate's recursive integration → become referenced by other canon-rows → become load-bearing in the world's narrative coherence → **cannot be untangled later**. *Bleed-over into oblivion* is the precise risk.
The discipline that prevents this:
> **Visual perception = behavior-modulating-only; never canon-generating.**
Visual context flows on a *separate stream* from the canonical text + gesture summary, with a strict firewall between them:
| Stream | Source | Persistence | Purpose |
|---|---|---|---|
| **Text + gesture summary** (existing canonical pipeline) | STT + gesture-circle-presses + per-token trait-coordinates per §Gesture-alignment as recursive-lemniscate | Canonical; flows into `event_canon_summaries`; propagates to Compositor; integrates into trait-vector | What the NPC *remembers* and what becomes world-canon |
| **Visual context** (omnisight-flagged, new) | VL-Gemma processing POV camera-render | **Ephemeral**; flagged on the lemniscate; **ignored in the per-crossing summary**; never propagates upward | What the NPC *sees in this moment*; modulates current-turn `driver_context_pull` only |
**Concretely:** the visual interpretation is appended to `driver_context_pull` for the NPC's next turn (so the NPC can react to what it sees), but it is **not** appended to the `gesture_alignment_accumulator`'s sum-strategy reduction at the axis-crossing, and it is **not** included in the `event_canon_summaries` row that the Compositor pulls from `transient_waiting_flag`. **The visual content lives one turn and dies.**
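The firewall described above, as a minimal Python sketch (`CrossingState` and the list-based accumulator are illustrative stand-ins for the actual lemniscate structures):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class CrossingState:
    driver_context_pull: List[str] = field(default_factory=list)            # per-turn context
    gesture_alignment_accumulator: List[str] = field(default_factory=list)  # canonical stream
    visual_ephemeral: Optional[str] = None                                  # omnisight flag

def ingest_visual(state: CrossingState, vl_interpretation: str) -> None:
    # Visual stream: flagged on the state, touching nothing canonical.
    state.visual_ephemeral = vl_interpretation

def build_driver_turn(state: CrossingState) -> List[str]:
    ctx = list(state.driver_context_pull)
    if state.visual_ephemeral is not None:
        ctx.append(state.visual_ephemeral)
        state.visual_ephemeral = None      # the visual content lives one turn and dies
    return ctx

def crossing_summary(state: CrossingState) -> str:
    # Canonical reduction: text/gesture only. VL output has no path from
    # here toward event_canon_summaries, the Compositor, or the trait-vector.
    return "; ".join(state.gesture_alignment_accumulator)
```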
This preserves three architectural commitments that depend on text/gesture-derived canon:
- *Compositor narrative-coherence at scale* — Compositor never sees VL-output; only deterministic text/gesture-derived summaries. **Hallucination-firewall preserves the canon-coherence Compositor depends on.**
- *Memorialist-archive truth-claims* — Memorialists index cell-checksum-divergence (canonical, deterministic), NOT VL-generated visual-content. The archive's evidentiary value depends on this distinction.
- *Mind-pool soul-recycling* — when a mind cycles through the pool and is redistributed into a new body, the trait-vector that persists is text/gesture-derived. **VL hallucinations do not survive transmigration; they were ephemeral by construction.**
### Wealthy-degen waifu-folder exception
A specific opt-in special case for player-stored visual-content:
- A wealthy player who already has waifu-dialog stored (per `../political-register/architecture.md` §The vocation-substrate of the imperial-net market) can check a box that allows **storage of sex-pictures from clasp-scenes in a private folder**.
- Stored locally (their machine, their problem — privacy, storage, content).
- **Read-only-by-player** — folder content does **not** flow back into NPC contexts, world-canon, the Compositor, the mind-pool, or any other system.
- **Quarantined dead-end storage** — aesthetic-collection only.
The folder is architecturally inert with respect to the rest of the system. It exists *for the player*; it does not exist *for the world*.
### What this retires
- *Geometric perception (cone, radius, LOS)* → already retired by zone slot-occupancy + subscriber-event-emission; **omnisight extends the retirement** by giving NPCs *literal* visual perception within those subscribed events, not metadata-perception
- *VL models polluting world-canon* → text/gesture-derived summaries are the only canonical input; VL is behavior-modulating-ephemeral-flag-only; player-stored visual-content is read-only-by-player quarantined storage
### What this resolves / sharpens
- *Continuous visual feedback policy* (architecture-index Still-open) → with cel-shaded bodies (per `../topology-and-rendering/architecture.md` §Three-shader philosophy) and visual-as-ephemeral-flag, the body-shader pulses are *legible without canon-pollution risk*. The visual-feedback policy can be permissive at the cosmetic layer because it is firewalled from canonical state.
### Open questions
- **Per-NPC VL-inference rate-limit** — how many camera-renders + VL-inferences per second are affordable per active NPC at MMO scale? Pending: benchmark against Gemma 4 E4B VL-inference latency on typical-deployment hardware.
- **VL-Gemma camera resolution + frame-rate** — what camera-budget per NPC fits the rule-catalogue? Pending: rule catalogue + benchmark.
- **NPC progression-state for witness-axis** — how does an NPC accumulate in-between-knowledge that drives their dual-axis-clasp witness-fidelity (per `../identity-and-personhood/bodies.md` §Asymmetric clasp / §Dual-axis clasp-fidelity)? Their own clasps? Fragments encountered? Caste-class-default? Pending: design pass.
- **Multi-NPC observing same event** — each NPC runs independent VL-inference; how do their perceptions combine into a shared event-record? *(Connects to Compositor narrative-coherence-at-scale Still-open.)* Probable answer under the hallucination-isolation discipline: *they don't combine* — each NPC's visual context is private to their own next-turn `driver_context_pull`; the shared event-record is built from text/gesture-summaries only. Worth confirming explicitly.
## Zone taxonomy (v1 starter set)
| Zone type | Register | Slots | Executor | Persistence |