SkyrimNet Architecture — High-Level Model

What SkyrimNet is

A multi-agent LLM orchestrator that hijacks vanilla Skyrim NPC behavior — replacing static dialogue topics and idle routines with context-aware, LLM-driven scenes. NPCs talk to each other and the player through generated dialogue; their world-affecting actions are picked from a registry of "actions" contributed by SkyrimNet itself and any cooperating mod.

[verified] from SkyrimNet.log:14123-14266 (action library initialization), Source/Scripts/SkyrimNetApi.psc (public API), prompts/gamemaster_action_selector.prompt (GM orchestrator prompt).

The two-plugin architecture

SkyrimNet ships alongside a sibling SKSE plugin called IntelEngine. They are independent SKSE plugins that use the same SQLite-backed persistence scheme, each writing its own per-session database files.

| Plugin | Role | Storage |
|---|---|---|
| SkyrimNet | LLM orchestration, dialogue generation, agent pipelines, TTS/STT, action dispatch | overwrite/SKSE/Plugins/SkyrimNet/data/SkyrimNet-{epoch}-{nnnnnn}.db |
| IntelEngine | Persistent narrative/intelligence layer (third-party "story DM"-style agent) | overwrite/SKSE/Plugins/IntelEngine/data/IntelEngine-{epoch}-{nnnnnn}.db |

[verified] from disk layout. Per-game-session DB sharding (epoch suffix = save game timestamp).
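
As a minimal sketch of working with this layout, the snippet below groups database files by plugin and epoch suffix. The filename pattern comes from the Storage paths listed above; the regex, the assumption that both placeholders are plain decimal digits, and the helper name `group_by_session` are illustrative and not taken from SkyrimNet itself.

```python
import re
from collections import defaultdict

# Pattern from the Storage paths: <Plugin>-{epoch}-{nnnnnn}.db
# Assumption: {epoch} and {nnnnnn} are both plain decimal digits.
DB_NAME = re.compile(r"^(SkyrimNet|IntelEngine)-(\d+)-(\d+)\.db$")


def group_by_session(filenames):
    """Map (plugin, epoch) -> list of matching database filenames."""
    sessions = defaultdict(list)
    for name in filenames:
        match = DB_NAME.match(name)
        if match is None:
            continue  # skip anything that does not fit the pattern exactly
        plugin, epoch, _suffix = match.groups()
        sessions[(plugin, int(epoch))].append(name)
    return dict(sessions)
```

Keying on `(plugin, epoch)` avoids assuming that the two plugins use the same epoch value for a given session, which the disk layout alone does not establish.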

The four code layers

┌─────────────────────────────────────────────────┐
│  Closed-source C++ DLL                          │
│    SKSE/Plugins/SkyrimNet.dll                   │
│    - LLM orchestration, agent dispatch          │
│    - Action parser (ParseEmbeddedAction)        │
│    - Decorator implementation                   │
│    - SQLite persistence + vector embeddings     │
└─────────────────────────────────────────────────┘
                       ▲ ▼
┌─────────────────────────────────────────────────┐
│  Open-source Papyrus glue                       │
│    mods/SkyrimNet/Source/Scripts/*.psc          │
│    - SkyrimNetApi.psc (public API surface)      │
│    - SkyrimNetInternal.psc (DLL callbacks)      │
│    - skynet_MainController.psc (quest entry)    │
│    - skynet_Library.psc (shipped action impls)  │
│    - skynet_VoiceInput*.psc (STT integration)   │
└─────────────────────────────────────────────────┘
                       ▲ ▼
┌─────────────────────────────────────────────────┐
│  Open-source .esp content (Spriggit JSON)       │
│    mods/SkyrimNet/plugins/SkyrimNet/            │
│    - 8 custom AI Packages (NPC/Player Dialogue, │
│      Follow, TalkToPlayer)                      │
│    - Custom Magic Effects (voice input spells)  │
│    - Factions (Whitelist/Blacklist/Following)   │
│    - Keywords (DialogueTarget/FollowTarget)     │
│    - Quests (skynet_MainController, skynet_Mcm) │
└─────────────────────────────────────────────────┘
                       ▲ ▼
┌─────────────────────────────────────────────────┐
│  Configuration & content (text files)           │
│    mods/SkyrimNet/SKSE/Plugins/SkyrimNet/       │
│    - prompts/ (Inja templates, three-layer)     │
│    - sql/migrations/ (17 schema migrations)     │
│    overwrite/SKSE/Plugins/SkyrimNet/            │
│    - config/ (38 YAML files + defaults_manifest)│
│    - data/ (SQLite per-session DBs)             │
│    - prompts/ (runtime UI overrides)            │
│  Plus contributing mods' config/actions/*.yaml  │
└─────────────────────────────────────────────────┘

[verified] All layers exist. The closed-source DLL is the only piece we cannot read directly — we infer behavior from logs, headers, Papyrus callbacks, and traces.

The four agent families

Each agent maps to a "variant" in OpenRouter.yaml, which maps to a model/endpoint. See agent-pipelines.md for the full table.
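
The variant indirection can be pictured as a small lookup table. This is a sketch only: the variant names and targets are the ones that appear in the orchestration traces in this document, while the `Variant` dataclass, the `VARIANTS` dict, and `resolve` are hypothetical and say nothing about the real schema of OpenRouter.yaml.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    target: str                        # model or endpoint alias, as seen in the traces
    max_tokens: Optional[int] = None   # None = not stated in this document


# Values copied from the traces; the dict layout itself is an assumption.
VARIANTS = {
    "AgentDefault": Variant("eva"),
    "meta": Variant("omega"),
    "gamemaster_evaluation": Variant("claude-sonnet-4-5", max_tokens=256),
}


def resolve(variant_name: str) -> Variant:
    """Look up the target for an agent's variant, failing loudly on typos."""
    try:
        return VARIANTS[variant_name]
    except KeyError:
        raise KeyError(f"unknown variant: {variant_name!r}") from None
```

The point of the indirection is that an agent only names a variant; swapping the model behind it is a config change, not a pipeline change.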

  1. Gamemaster (GM) — scene-level orchestrator. Decides "should anything happen now, and if so what?" Polls every ~30s in continuous mode + fires on player input. Emits one ACTION: line.
  2. Dialogue — generates the actual NPC speech. Triggered by GM actions like StartConversation / ContinueConversation or by player dialogue input. Can optionally append an ACTION: line for inline action firing.
  3. Meta — classifiers and helpers (mood eval, memory query generation, dialogue speaker selection). Capped at ~100 tokens per call.
  4. Vision (OmniSight) — describes the current scene from a screenshot. Uses a local Qwen3-VL model. Fires on player_text_input and player_direct_input_voice events.

Plus a fifth, implicit agent type:

  5. Native Action Selector: a post-dialogue classifier that asks "what in-game action does this NPC's spoken line imply?" Two-stage: category → leaf. Distinct from the GM's scene-level action selection.
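
The category → leaf drilldown can be sketched as two small classification calls instead of one large one. Everything here is illustrative: `select_action`, the `Chooser` type, and the demo registry are hypothetical names, and the chooser is a stand-in for whatever LLM call the closed-source DLL actually makes.

```python
from typing import Callable, Dict, List, Optional

# A chooser stands in for one LLM classification call:
# given the spoken line and a list of options, return one option or None.
Chooser = Callable[[str, List[str]], Optional[str]]


def select_action(
    line: str, registry: Dict[str, List[str]], choose: Chooser
) -> Optional[str]:
    """Two-stage drilldown: pick a category first, then a leaf action in it."""
    category = choose(line, sorted(registry))
    if category is None or category not in registry:
        return None  # no action implied, or the chooser answered off-list
    leaf = choose(line, registry[category])
    return leaf if leaf in registry[category] else None


def scripted_chooser(answers: List[Optional[str]]) -> Chooser:
    """Demo chooser that replays canned answers instead of calling a model."""
    remaining = iter(answers)

    def choose(_line: str, _options: List[str]) -> Optional[str]:
        return next(remaining, None)

    return choose


# Hypothetical registry; real action names come from the contributing mods.
DEMO_REGISTRY = {"movement": ["follow", "wait"], "trade": ["barter", "gift"]}
```

Each call sees only a short option list (the categories, then the leaves of one category), which is the reason to split the decision in two.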

End-to-end orchestration trace

For a player text-input event (verified against all_traces_1776478948530.json):

event_received
  ├─ papyrus_decorator_cache_warmup
  │     ├─ get_player
  │     ├─ get_nearby_actors
  │     └─ papyrus_decorators_async       ← warm caches before LLM render
  ├─ scene_capture
  │     └─ omnisight_immediate_scene_capture
  │           └─ omnisight_capture_image  ← screenshot for vision model
  ├─ chat_ui_open                          ← UI block for input
  ├─ warmup_player_dialogue
  │     └─ many decorator:* spans (decnpc, render_subcomponent, …)
  └─ dialogue_manager_handle_player_speech
        ├─ target_selection_llm           ← meta-model: who responds?
        └─ generate_response
              ├─ initiate_eligibility_checks   (Papyrus IsEligible callbacks)
              ├─ build_action_context
              │     ├─ wait_eligibility_results  (≤ 2500ms)
              │     ├─ filter_eligible_actions
              │     └─ build_action_schemas      (JSON schema list for LLM)
              ├─ build_payload
              │     └─ render_template           (Inja render of dialogue_response.prompt)
              ├─ llm_request                    (variant=AgentDefault → eva)
              ├─ tts_generation
              │     └─ tts_segment_0…N
              ├─ mood_evaluation                 (variant=meta → omega, parallel)
              └─ memory_search_query_generation  (variant=meta → omega, parallel)

For a continuous-mode GM tick (also [verified] from trace):

gamemaster_evaluation_llm
  └─ gamemaster_async_llm
        └─ llm_request (variant=gamemaster_evaluation → claude-sonnet-4-5, max_tokens=256)
              ↓
[parser extracts ACTION: line]
              ↓
if action == StartConversation or ContinueConversation:
   player_dialogue_manager_process_event
     └─ dialogue_manager_handle_perceived_event
           └─ generate_response (full pipeline above)
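
The parse-and-route step of the GM tick can be sketched as follows. The exact format of the ACTION: line is not documented here, so this assumes the action name is the first identifier after the prefix and ignores anything that follows; `extract_action` and `routes_to_dialogue` are hypothetical helpers, not the DLL's ParseEmbeddedAction.

```python
import re
from typing import Optional

# The two actions that hand off to the dialogue pipeline, per the trace above.
CONVERSATION_ACTIONS = {"StartConversation", "ContinueConversation"}

_IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def extract_action(llm_output: str) -> Optional[str]:
    """Return the action name from the first 'ACTION:' line, or None."""
    for raw in llm_output.splitlines():
        line = raw.strip()
        if not line.startswith("ACTION:"):
            continue
        rest = line[len("ACTION:"):].strip()
        # Assumption: the name is the leading identifier; arguments (if any)
        # are ignored because their format is not described in this document.
        match = _IDENTIFIER.match(rest)
        return match.group(0) if match else None
    return None


def routes_to_dialogue(llm_output: str) -> bool:
    """True when the GM's chosen action should trigger generate_response."""
    return extract_action(llm_output) in CONVERSATION_ACTIONS
```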

Where the bottlenecks are

[hypothesis] based on the trace structure and log volumes:

  • GM max_tokens: 256 is a hard ceiling. With three contributor mods registering ~105 actions total, the GM has to reason over a large eligible_actions list and emit one ACTION line — the two-stage drilldown and category wrapper exist precisely to compress this cognitive load.
  • wait_eligibility_results blocks for up to 2500ms. Slow Papyrus eligibility callbacks shrink the available action set, presumably because any callback that has not answered by the deadline is treated as ineligible. This is a Skyrim-VM-side performance dependency that no LLM tuning can fix.
  • OmniSight vision runs locally on a Qwen3-VL model. Image capture + inference adds latency before any text generation can begin.
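
The eligibility bottleneck can be illustrated with a deadline-bounded gather. This is a sketch under the assumption that checks which miss the deadline are simply dropped; the DLL's real behavior is not readable, and `eligible_actions` and its parameters are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def eligible_actions(
    checks: Dict[str, Callable[[], Awaitable[bool]]],
    deadline_s: float = 2.5,  # 2500 ms, from wait_eligibility_results
) -> List[str]:
    """Run all eligibility checks concurrently; keep those that pass in time."""
    tasks = {name: asyncio.ensure_future(check()) for name, check in checks.items()}
    if not tasks:
        return []  # asyncio.wait rejects an empty collection
    done, pending = await asyncio.wait(tasks.values(), timeout=deadline_s)
    for task in pending:
        task.cancel()  # late answers are discarded (assumed, not confirmed)
    return [
        name
        for name, task in tasks.items()
        if task in done and task.exception() is None and task.result()
    ]
```

Under this model a slow callback costs nothing in wall-clock time beyond the deadline, but it silently removes its action from what the LLM is offered.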

Adjacent technologies in the substrate

  • whisper.cpp for local STT (SKSE/Plugins/SkyrimNet/libs/whisper.dll + ggml*.dll for CPU/CUDA/Vulkan/OpenCL backends).
  • all-MiniLM-L6-v2 sentence-transformer for semantic embedding of NPC memories (SKSE/Plugins/SkyrimNet/models/all-MiniLM-L6-v2-tokenizer.json).
  • ONNX Runtime (onnxruntime_skyrimnet.dll), likely for VAD or auxiliary model inference.
  • espeak-ng voice data (SKSE/Plugins/SkyrimNet/models/espeak-ng-data/) — TTS phoneme tables for Piper/PocketTTS.
  • Spriggit to git-track the .esp content as JSON.
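
To make the embedding item above concrete, here is a minimal sketch of ranking stored memories by cosine similarity to a query vector. How SkyrimNet actually scores memories is not visible from outside the DLL; the function names are hypothetical, and the toy 3-dimensional vectors stand in for real all-MiniLM-L6-v2 embeddings.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(
    query: Sequence[float],
    memories: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[str]:
    """Return the texts of the k memories most similar to the query vector."""
    ranked = sorted(memories, key=lambda m: cosine(query, m[1]), reverse=True)
    return [text for text, _vector in ranked[:k]]
```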

Cross-references