SkyrimNet Architecture — High-Level Model
What SkyrimNet is
A multi-agent LLM orchestrator that hijacks vanilla Skyrim NPC behavior — replacing static dialogue topics and idle routines with context-aware, LLM-driven scenes. NPCs talk to each other and the player through generated dialogue; their world-affecting actions are picked from a registry of "actions" contributed by SkyrimNet itself and any cooperating mod.
[verified] from SkyrimNet.log:14123-14266 (action library initialization), Source/Scripts/SkyrimNetApi.psc (public API), prompts/gamemaster_action_selector.prompt (GM orchestrator prompt).
The two-plugin architecture
SkyrimNet ships alongside a sibling SKSE plugin called IntelEngine. They're independent SKSE plugins that use the same SQLite-backed persistence layout; each writes its own per-session database files.
| Plugin | Role | Storage |
|---|---|---|
| SkyrimNet | LLM orchestration, dialogue generation, agent pipelines, TTS/STT, action dispatch | overwrite/SKSE/Plugins/SkyrimNet/data/SkyrimNet-{epoch}-{nnnnnn}.db |
| IntelEngine | Persistent narrative/intelligence layer (third-party "story DM"-style agent) | overwrite/SKSE/Plugins/IntelEngine/data/IntelEngine-{epoch}-{nnnnnn}.db |
[verified] from disk layout. Per-game-session DB sharding (epoch suffix = save game timestamp).
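The epoch and sequence are encoded in the file name, so a tool that wants the current session's database can pick it by sorting on those two fields. The sketch below is a minimal Python version. It assumes `{epoch}` is an integer timestamp and `{nnnnnn}` is a zero-padded six-digit counter. `parse_db_name`, `pick_latest`, and `latest_session_db` are hypothetical helpers, not part of either plugin.

```python
from __future__ import annotations

import re
from collections.abc import Iterable
from pathlib import Path

# Hypothetical helpers. The {epoch}-{nnnnnn} naming comes from the disk layout;
# reading {epoch} as an integer timestamp and {nnnnnn} as a six-digit sequence
# number within that session is an assumption.
DB_NAME = re.compile(r"^(?P<plugin>SkyrimNet|IntelEngine)-(?P<epoch>\d+)-(?P<seq>\d{6})\.db$")


def parse_db_name(name: str) -> tuple[str, int, int] | None:
    """Return (plugin, epoch, sequence) for a per-session DB file name, or None."""
    m = DB_NAME.match(name)
    if m is None:
        return None
    return m["plugin"], int(m["epoch"]), int(m["seq"])


def pick_latest(names: Iterable[str], plugin: str = "SkyrimNet") -> str | None:
    """Choose the file name with the highest (epoch, sequence) for one plugin."""
    best: tuple[int, int, str] | None = None
    for name in names:
        parsed = parse_db_name(name)
        if parsed is None or parsed[0] != plugin:
            continue
        key = (parsed[1], parsed[2], name)
        if best is None or key > best:
            best = key
    return None if best is None else best[2]


def latest_session_db(data_dir: Path, plugin: str = "SkyrimNet") -> Path | None:
    """Filesystem wrapper around pick_latest for a data/ directory."""
    name = pick_latest((p.name for p in data_dir.glob("*.db")), plugin)
    return None if name is None else data_dir / name
```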
The four code layers
```
┌─────────────────────────────────────────────────┐
│ Closed-source C++ DLL                           │
│ SKSE/Plugins/SkyrimNet.dll                      │
│ - LLM orchestration, agent dispatch             │
│ - Action parser (ParseEmbeddedAction)           │
│ - Decorator implementation                      │
│ - SQLite persistence + vector embeddings        │
└─────────────────────────────────────────────────┘
                       ▲ ▼
┌─────────────────────────────────────────────────┐
│ Open-source Papyrus glue                        │
│ mods/SkyrimNet/Source/Scripts/*.psc             │
│ - SkyrimNetApi.psc (public API surface)         │
│ - SkyrimNetInternal.psc (DLL callbacks)         │
│ - skynet_MainController.psc (quest entry)       │
│ - skynet_Library.psc (shipped action impls)     │
│ - skynet_VoiceInput*.psc (STT integration)      │
└─────────────────────────────────────────────────┘
                       ▲ ▼
┌─────────────────────────────────────────────────┐
│ Open-source .esp content (Spriggit JSON)        │
│ mods/SkyrimNet/plugins/SkyrimNet/               │
│ - 8 custom AI Packages (NPC/Player Dialogue,    │
│   Follow, TalkToPlayer)                         │
│ - Custom Magic Effects (voice input spells)     │
│ - Factions (Whitelist/Blacklist/Following)      │
│ - Keywords (DialogueTarget/FollowTarget)        │
│ - Quests (skynet_MainController, skynet_Mcm)    │
└─────────────────────────────────────────────────┘
                       ▲ ▼
┌─────────────────────────────────────────────────┐
│ Configuration & content (text files)            │
│ mods/SkyrimNet/SKSE/Plugins/SkyrimNet/          │
│ - prompts/ (Inja templates, three-layer)        │
│ - sql/migrations/ (17 schema migrations)        │
│ overwrite/SKSE/Plugins/SkyrimNet/               │
│ - config/ (38 YAML files + defaults_manifest)   │
│ - data/ (SQLite per-session DBs)                │
│ - prompts/ (runtime UI overrides)               │
│ Plus contributing mods' config/actions/*.yaml   │
└─────────────────────────────────────────────────┘
```
[verified] All layers exist. The closed-source DLL is the only piece we cannot read directly — we infer behavior from logs, headers, Papyrus callbacks, and traces.
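Prompt files appear both in `mods/SkyrimNet/SKSE/Plugins/SkyrimNet/prompts/` and in `overwrite/SKSE/Plugins/SkyrimNet/prompts/`. That suggests a lookup in which the runtime copy shadows the shipped default. The sketch below shows such a precedence rule. The ordering is an assumption inferred from the "runtime UI overrides" label, because the DLL's actual resolution code cannot be read. `resolve_prompt` is a hypothetical name.

```python
from __future__ import annotations

from pathlib import Path

# Assumed precedence (not confirmed from the closed-source DLL): a runtime override
# under overwrite/ shadows the shipped default under mods/.
SHIPPED_ROOT = Path("mods/SkyrimNet/SKSE/Plugins/SkyrimNet")
OVERRIDE_ROOT = Path("overwrite/SKSE/Plugins/SkyrimNet")


def resolve_prompt(relative: str, roots: tuple[Path, ...] = (OVERRIDE_ROOT, SHIPPED_ROOT)) -> Path | None:
    """Return the first existing prompts/<relative> file, searching roots in priority order."""
    for root in roots:
        candidate = root / "prompts" / relative
        if candidate.is_file():
            return candidate
    return None
```

Deleting the override file makes the shipped template visible again, which is the usual reason to keep defaults and user edits in separate trees.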
The four agent families
Each agent maps to a "variant" in OpenRouter.yaml, which maps to a model/endpoint. See agent-pipelines.md for the full table.
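The agent → variant → model indirection is a two-hop lookup, sketched below. The three variant/target pairs are the ones recorded in the traces in this document. The dictionary layout and the agent keys are hypothetical stand-ins for `OpenRouter.yaml`, whose real schema is not reproduced here.

```python
from __future__ import annotations

# Hypothetical in-memory stand-in for OpenRouter.yaml: variant name -> model/endpoint.
# These pairs are the ones seen in the traces; the real file holds more.
VARIANTS: dict[str, str] = {
    "AgentDefault": "eva",
    "meta": "omega",
    "gamemaster_evaluation": "claude-sonnet-4-5",
}

# Hypothetical agent-family -> variant mapping, inferred from which variant
# each pipeline's llm_request span reports.
AGENT_VARIANT: dict[str, str] = {
    "dialogue": "AgentDefault",
    "meta": "meta",
    "gamemaster": "gamemaster_evaluation",
}


def resolve_model(agent: str) -> str:
    """Two-hop lookup: agent family -> variant -> model; raises KeyError if unmapped."""
    variant = AGENT_VARIANT[agent]
    return VARIANTS[variant]
```

With this layout, swapping the model behind a variant is a single-entry change that every agent using that variant picks up.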
- Gamemaster (GM) — scene-level orchestrator. Decides "should anything happen now, and if so what?" Polls every ~30s in continuous mode + fires on player input. Emits one `ACTION:` line.
- Dialogue — generates the actual NPC speech. Triggered by GM actions like `StartConversation`/`ContinueConversation` or by player dialogue input. Can optionally append an `ACTION:` line for inline action firing.
- Meta — classifiers and helpers (mood eval, memory query generation, dialogue speaker selection). Capped at ~100 tokens per call.
- Vision (OmniSight) — describes the current scene from a screenshot. Uses a local Qwen3-VL model. Fires on `player_text_input` and `player_direct_input_voice` events.
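The GM's firing rule reduces to a small predicate: a periodic poll in continuous mode, plus an immediate run on player input. The sketch below rests on stated assumptions. `should_run_gm` and its parameters are hypothetical. The interval is the approximate 30 s quoted above. The rule that polling stops outside continuous mode is inferred from the wording, not confirmed.

```python
from __future__ import annotations

# Hypothetical GM trigger check. The ~30 s interval and "also fire on player
# input" come from the description; names and the time source are assumptions.
GM_POLL_INTERVAL_S = 30.0


def should_run_gm(now: float, last_run: float | None, continuous: bool, player_input: bool) -> bool:
    """Return True when the GM should evaluate the scene at time `now` (seconds)."""
    if player_input:
        return True  # player input always triggers an evaluation
    if not continuous:
        return False  # assumed: no periodic polling outside continuous mode
    # First tick fires immediately; later ticks wait for the poll interval.
    return last_run is None or now - last_run >= GM_POLL_INTERVAL_S
```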
Plus a fifth implicit agent type:
- Native Action Selector — post-dialogue classifier that asks "what in-game action does this NPC's spoken line imply?" Two-stage: category → leaf. Distinct from the GM's scene-level action selection.
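The two-stage category → leaf selection can be sketched with the LLM call abstracted as an `ask(prompt, options)` callback that returns one option. Everything in the sketch is hypothetical: the callback, the catalog shape, and the `None` escape option. It only illustrates that each stage shows the model a short list instead of the full action list.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical stand-in for an LLM call that must answer with one of `options`.
Ask = Callable[[str, list[str]], str]


def select_action(line: str, catalog: dict[str, list[str]], ask: Ask) -> str | None:
    """Pick a category first, then a leaf action inside it; None if the model declines."""
    categories = sorted(catalog) + ["None"]
    category = ask(f"Which category of action does this line imply?\n{line}", categories)
    if category not in catalog:
        return None  # "None" or an invalid answer: no action
    leaves = catalog[category] + ["None"]
    leaf = ask(f"Which {category} action does this line imply?\n{line}", leaves)
    return leaf if leaf in catalog[category] else None
```

Both stages accept `None`, so an ordinary line of speech can imply no action at all.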
End-to-end orchestration trace
For a player text-input event (verified against all_traces_1776478948530.json):
```
event_received
├─ papyrus_decorator_cache_warmup
│  ├─ get_player
│  ├─ get_nearby_actors
│  └─ papyrus_decorators_async          ← warm caches before LLM render
├─ scene_capture
│  └─ omnisight_immediate_scene_capture
│     └─ omnisight_capture_image        ← screenshot for vision model
├─ chat_ui_open                         ← UI block for input
├─ warmup_player_dialogue
│  └─ many decorator:* spans (decnpc, render_subcomponent, …)
└─ dialogue_manager_handle_player_speech
   ├─ target_selection_llm              ← meta-model: who responds?
   └─ generate_response
      ├─ initiate_eligibility_checks    (Papyrus IsEligible callbacks)
      ├─ build_action_context
      │  ├─ wait_eligibility_results    (≤ 2500ms)
      │  ├─ filter_eligible_actions
      │  └─ build_action_schemas        (JSON schema list for LLM)
      ├─ build_payload
      │  └─ render_template             (Inja render of dialogue_response.prompt)
      ├─ llm_request                    (variant=AgentDefault → eva)
      ├─ tts_generation
      │  └─ tts_segment_0…N
      ├─ mood_evaluation                (variant=meta → omega, parallel)
      └─ memory_search_query_generation (variant=meta → omega, parallel)
```
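The `build_action_context` spans suggest a fan-out/fan-in with a deadline: start every `IsEligible` check, wait at most 2500 ms, and keep only the actions that answered yes in time. The sketch below is a minimal asyncio version of that reading. The function and parameter names are hypothetical, and the real checks are Papyrus callbacks driven by the DLL. Treating a late answer as "not eligible" is an assumption.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable

# Hypothetical model of build_action_context: initiate_eligibility_checks ->
# wait_eligibility_results (bounded) -> filter_eligible_actions.
ELIGIBILITY_TIMEOUT_S = 2.5

Check = Callable[[], Awaitable[bool]]


async def eligible_actions(checks: dict[str, Check], timeout: float = ELIGIBILITY_TIMEOUT_S) -> list[str]:
    """Run all checks concurrently; return names that answered True before the deadline."""
    tasks = {name: asyncio.create_task(check()) for name, check in checks.items()}
    if not tasks:
        return []  # asyncio.wait rejects an empty set
    done, pending = await asyncio.wait(tasks.values(), timeout=timeout)
    for task in pending:
        task.cancel()  # late callbacks are dropped, not awaited
    return [
        name
        for name, task in tasks.items()
        if task in done and not task.cancelled() and task.exception() is None and task.result()
    ]
```

A check slower than the deadline silently removes its action, even if it would have answered yes. This is the effect the bottleneck notes attribute to slow Papyrus callbacks.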
For a continuous-mode GM tick (also [verified] from trace):
```
gamemaster_evaluation_llm
└─ gamemaster_async_llm
   └─ llm_request (variant=gamemaster_evaluation → claude-sonnet-4-5, max_tokens=256)
         ↓
   [parser extracts ACTION: line]
         ↓
   if action == StartConversation or ContinueConversation:
      player_dialogue_manager_process_event
      └─ dialogue_manager_handle_perceived_event
         └─ generate_response (full pipeline above)
```
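The `[parser extracts ACTION: line]` step and the conversation branch can be sketched as follows. This is a hypothetical stand-in for the closed-source `ParseEmbeddedAction`. The line grammar (`ACTION: <Name>` followed by an opaque argument string) and the choice to keep the last match are assumptions, not the DLL's documented behavior.

```python
from __future__ import annotations

import re

# Assumed grammar: "ACTION: <Name>" plus an argument string kept opaque here.
ACTION_LINE = re.compile(r"^\s*ACTION:\s*(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?P<args>.*)$", re.MULTILINE)

CONVERSATION_ACTIONS = {"StartConversation", "ContinueConversation"}


def parse_action(text: str) -> tuple[str, str] | None:
    """Return (name, raw_args) from the last ACTION line in an LLM reply, or None."""
    matches = list(ACTION_LINE.finditer(text))
    if not matches:
        return None
    last = matches[-1]  # arbitrary choice in this sketch if several lines appear
    return last["name"], last["args"].strip()


def route(text: str) -> str:
    """Mirror the branch in the GM trace: conversation actions go to the dialogue pipeline."""
    parsed = parse_action(text)
    if parsed is None:
        return "no_action"
    name, _args = parsed
    return "dialogue_pipeline" if name in CONVERSATION_ACTIONS else "dispatch_action"
```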
Where the bottlenecks are
[hypothesis] based on the trace structure and log volumes:
- GM `max_tokens: 256` is a hard ceiling. With three contributor mods registering ~105 actions total, the GM has to reason over a large `eligible_actions` list and emit one ACTION line — the two-stage drilldown and category wrapper exist precisely to compress this cognitive load.
- `wait_eligibility_results` blocks for up to 2500ms. Slow Papyrus eligibility callbacks shrink the available action set. This is a Skyrim-VM-side performance dependency that no LLM tuning can fix.
- OmniSight vision runs locally on a Qwen3-VL model. Image capture + inference adds latency before any text generation can begin.
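One way to tell whether the `max_tokens: 256` ceiling is actually biting is to look for GM replies that stop on the length limit without ever reaching an `ACTION:` line. The sketch below assumes the GM response arrives in the OpenAI-compatible chat-completion shape that OpenRouter returns (`choices[0].message.content`, `choices[0].finish_reason`). `gm_reply_status` and its return labels are hypothetical.

```python
from __future__ import annotations


def gm_reply_status(response: dict) -> str:
    """Classify a GM reply as ok / truncated_before_action / no_action (assumed labels)."""
    choice = response["choices"][0]
    content = choice.get("message", {}).get("content") or ""
    has_action = any(line.lstrip().startswith("ACTION:") for line in content.splitlines())
    if has_action:
        return "ok"
    if choice.get("finish_reason") == "length":
        return "truncated_before_action"  # token budget ran out mid-reasoning
    return "no_action"  # model stopped on its own without choosing an action
```

Counting `truncated_before_action` over a play session would test the hypothesis above directly, rather than inferring it from log volume.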
Adjacent technologies in the substrate
- whisper.cpp for local STT (`SKSE/Plugins/SkyrimNet/libs/whisper.dll` + `ggml*.dll` for CPU/CUDA/Vulkan/OpenCL backends).
- all-MiniLM-L6-v2 sentence-transformer for semantic embedding of NPC memories (`SKSE/Plugins/SkyrimNet/models/all-MiniLM-L6-v2-tokenizer.json`).
- ONNX runtime (`onnxruntime_skyrimnet.dll`) — likely VAD or auxiliary model inference.
- espeak-ng voice data (`SKSE/Plugins/SkyrimNet/models/espeak-ng-data/`) — TTS phoneme tables for Piper/PocketTTS.
- Spriggit to git-track the .esp content as JSON.
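The memory embeddings suggest recall by vector similarity: the `memory_search_query_generation` meta call produces a query, and the stored memories closest to its embedding are retrieved. The sketch below is a minimal pure-Python version of cosine top-k ranking. It assumes the embeddings are already computed; all-MiniLM-L6-v2 produces 384-dimensional vectors. How SkyrimNet actually stores and scores vectors in SQLite is not visible, and `top_k_memories` is a hypothetical name.

```python
from __future__ import annotations

import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_memories(query: list[float], memories: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the ids of the k memories most similar to the query embedding, best first."""
    return heapq.nlargest(k, memories, key=lambda mid: cosine(query, memories[mid]))
```

Real vectors would have 384 entries; the ranking logic does not change with dimension.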
Cross-references
- For per-agent firing details see `agent-pipelines.md`.
- For the prompt template system see `prompt-templates.md`.
- For action registration and the `ACTION:` parser see `action-system.md`.
- For YAML config behavior see `config-knobs.md`.
- For known bugs and what was tried see `bugs-and-fixes.md`.