Files
nimmersky/skyrimnet/agent-pipelines.md

8.5 KiB

Agent Pipelines

Each agent is selected via a "variant" in overwrite/.../config/OpenRouter.yaml, which maps to a model + endpoint + per-call parameters. Variants are referenced from the C++ DLL when it constructs each request.

Variant routing table [verified] from OpenRouter.yaml + trace dump cross-check

Variant Model alias Endpoint (this install) max_tokens Purpose
gamemaster_evaluation claude-sonnet-4-5-20250929 127.0.0.1:8000 (local Claude proxy) 256 "Should I act and how?" — fires GM action selector
AgentDefault (Dialogue) eva (custom local alias) 10.0.30.21:31000 4096 NPC dialogue generation
meta omega 10.0.30.22:31004 100 Mood eval, memory query gen, classifiers, target selection
vision Qwen3-VL-8B-Instruct-abliterated-v2.Q4_K_M.gguf 10.0.30.22:31005 4000 OmniSight scene description from screenshot
combat / action_evaluation eva same as AgentDefault 500 Combat-flavor dialogue / native action selection
gamemaster_scene_planner (no dedicated variant captured — likely uses AgentDefault) Pre-plans 4-6-beat scenes (consumed by gamemaster_action_selector.prompt:96-119 via scene_plan context var)
intel_story_dm claude-sonnet-4-5-20250929 local proxy IntelEngine plugin's persistent narrative DM
gamemaster_evaluation (TTON) claude-sonnet-4-5-20250929 local proxy OstimNet plugin's nearby-NPC GM

[note] Models can be reconfigured per-agent by editing OpenRouter.yaml. The aliases (eva, omega) are user-defined and resolved by the OpenRouter routing layer.


Gamemaster (GM)

Prompt: prompts/gamemaster_action_selector.prompt (action selector) + prompts/gamemaster_scene_planner.prompt (optional scene-plan generator).

When it fires [verified] from trace dump:

  • Polling tick in continuous mode, roughly every gamemaster.continuousSceneCooldownSeconds (= 30s in this install). [hypothesis] exact timer source not isolated.
  • On player input arrival.
  • After a non-trivial in-game event (combat start, NPC death, location change) — though events are filterable via Events.yaml.

Input context:

  • Recent events (gamemaster.recentEventsCount: 25 controls volume).
  • Nearby actors (gamemaster.nearbyActorRadius: 600).
  • Eligible actions list (populated dynamically from C++; in continuous mode ACTION: None is allowed only if exposed by the prompt — see bugs-and-fixes.md Bug #1).
  • Optional scene_plan if scene planner has run.

Output: exactly one line of the form

ACTION: ActionName PARAMS: {"key": "value", ...}

or ACTION: None. max_tokens: 256 enforces this — no room for prose.

Consumers: the C++ orchestrator parses the ACTION line and dispatches:

  • StartConversation / ContinueConversation → kicks the Dialogue pipeline for the named speaker/target with the given topic.
  • Narrate → triggers a narration-mode LLM call (no specific TTS speaker).
  • None → no-op, scene breathes.
  • Native actions (e.g. OpenTrade) when registered as eligible — fires the Papyrus/C++ callback.

Dialogue Agent (AgentDefault)

Prompt: prompts/dialogue_response.prompt (16-line wrapper) + submodules/system_head/* (load-ordered) + submodules/user_final_instructions/* (load-ordered) + submodules/character_bio/* (per-NPC).

When it fires:

  • GM dispatches StartConversation or ContinueConversation with a target speaker.
  • Player provides text or voice input addressed to an NPC.
  • An NPC's AI Package activates one of SkyrimNet's custom packages (Player Dialogue, NPC Dialogue, TalkToPlayer).

Input context: the NPC's full character bio (assembled from character_bio/ submodules), the recent dialogue history (event_history component), the eligible actions list (if embed_actions_in_dialogue: true), the OmniSight scene description (if vision is enabled), and the topic from the GM (when GM-initiated).

Output: the NPC's spoken line(s), optionally followed on a separate line by ACTION: ActionName ... if embed_actions_in_dialogue: true.

Consumers:

  • FilterActionLines (in DLL, ~ActionManager.cpp:1840) strips any ACTION: line from the dialogue text before TTS.
  • ParseEmbeddedAction (in DLL, ~ActionManager.cpp:1783) extracts the action and dispatches it the same way as a GM-emitted action.
  • The remaining dialogue text is fed to the TTS pipeline (tts_generation span in trace).
  • After dialogue completes, mood evaluation and memory search query generation fire in parallel (both meta variant).

Meta Agents (meta variant)

A family of small classifier/helper calls, all capped at max_tokens: 100.

Prompts:

  • prompts/helpers/evaluate_mood.prompt — post-dialogue mood update for the speaker
  • prompts/helpers/generate_search_query.prompt — turns a dialogue into a memory-retrieval query
  • prompts/helpers/generate_profile.prompt — generates/updates a character profile
  • prompts/target_selectors/dialogue_speaker_selector.prompt — picks who in a group should respond to player
  • prompts/target_selectors/player_dialogue_target_selector.prompt — picks the best NPC for player to address
  • prompts/memory/generate_memory.prompt and memory_ranker.prompt — memory creation/ranking
  • prompts/transformers/native_dialogue_transformer.prompt — text→text transformations
  • prompts/transformers/universal_translator.prompt — translation pipeline

When they fire: mostly post-dialogue. target_selection_llm runs before dialogue when player input arrives.

Output: small structured responses (mood enum, search query string, JSON profile, NPC UUID).


Native Action Selector (action_evaluation variant)

Prompts: prompts/native_action_selector.prompt (stage 1: pick category) + prompts/native_action_selector_drilldown.prompt (stage 2: pick leaf action under that category).

When it fires: after the Dialogue agent produces text, asking "what in-game action does this dialogue imply?". [hypothesis] may not fire if embed_actions_in_dialogue: true and the Dialogue agent already emitted a valid ACTION: line — needs verification (see open-questions.md).

Input: the NPC's just-spoken line + the eligible action list with category groupings.

Output: ACTION: CategoryName PARAMS: {"intent": "..."} from stage 1, then ACTION: LeafActionName PARAMS: {...} from stage 2.

Why two stages: with up to ~105 actions across contributor mods, asking the LLM to pick directly from a flat list is cognitively expensive. Categorizing first (Combat/Communication/Travel/Economy/etc.) narrows the choice set dramatically for stage 2.

[verified] action firing pattern from SkyrimNet.log:104547: ACTION: Communication PARAMS: {"intent": "express gratitude..."}.


Vision Agent (vision variant — OmniSight)

Prompts: prompts/omnisight/describe_actor.prompt, describe_scene.prompt, describe_item.prompt, describe_location.prompt, describe_furniture.prompt, with rendering-mode submodules in submodules/omnisight_*/.

When it fires: on player_text_input and player_direct_input_voice events. Captures a Skyrim screenshot via omnisight_capture_image, then feeds it to the local Qwen3-VL model.

Output: scene description text (up to 4000 tokens), inserted into the Dialogue agent's context as the omnisight block.

Consumers: the Dialogue agent uses this to ground its response in what's visually present (objects, characters, environment) — not just what's in event logs.


How agents chain

[verified] agent chains observed in trace trace_1776469194689_100 (583 spans):

GM tick → ACTION: ContinueConversation → DialoguePipeline kicks
   └─ target_selection_llm (meta) → picks speaker
   └─ Dialogue agent generates text + optional ACTION line
       ├─ inline ACTION parsed → action dispatched
       ├─ remaining text → TTS
       ├─ mood_evaluation (meta) parallel
       └─ memory_search_query_generation (meta) parallel

For player input:

player_text_input → OmniSight (vision) capture → DialoguePipeline kicks
   └─ same downstream as above

Open questions about pipelines

See open-questions.md for unresolved items:

  • Does native_action_selector always fire, or only when embed_actions_in_dialogue: false?
  • What event types currently trigger non-GM agent firings? (Events.yaml lists ~40 event types with toggles.)
  • Does gamemaster_scene_planner ever fire in current config?