chrysalis 7a451f86d2 docs(mod-to-sknpack): design sketch for xEdit → sknpack pipeline
Two-stage pipeline (xEdit Pascal dumper → JSONL → Python converter → .sknpack)
to distill mod-specific lore into SkyrimNet knowledge packs alongside the
vanilla Oghma corpus. Sketches target record signatures (BOOK/WEAP/ARMO/ACTI/
QUST/LCTN/NPC_/SPEL/DIAL), importance grading rules, pack-split convention
(Nimmersky_-_{Mod}_-_{Category}), LOTD-specific design, and open questions
around deduplication, condition_expr gating, and book markup sanitization.

Version 0.1 — design doc, not spec.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 10:51:58 +00:00


mod-to-sknpack — Extract Mod Lore into SkyrimNet Knowledge Packs

A pipeline that reads record data out of specific Skyrim mods via xEdit, then distills it into .sknpack files that slot in alongside the vanilla Oghma corpus. Each mod gets its own knowledge pack family, so NPCs in a modded load order can speak about content the vanilla Oghma distillation never saw.

Why

oghma-sknpack/ covers vanilla Tamriel lore beautifully but knows nothing about what a modded load order actually adds. A playthrough with Legacy of the Dragonborn, Vigilant, Wyrmstooth, or Beyond Skyrim puts hundreds of authored books, quest journals, unique artifacts, and named locations into the game world. Without mod-specific knowledge packs, SkyrimNet NPCs will either hallucinate about these or refuse to engage.

The authored narrative content is already in the plugin files — we just need to lift it out, clean it, categorize it, and grade it. xEdit gives us deterministic record access; the .sknpack format is plain JSON; the glue between them is this project.

Pipeline (two stages)

Mod plugin (.esp / .esm) in load order
    │  xEdit Pascal dumper script
    ▼
mod-dump.jsonl  (one record per line)
    │  Python converter (sknpack-from-dump.py)
    ▼
Nimmersky_-_{Mod}_-_{Category}.sknpack
    │  SkyrimNet UI → Knowledge Packs → Import
    ▼
In-game NPC knowledge

Why two stages: xEdit's Pascal is painful for JSON escaping and for cleaning book-body markup (<p>, <br>, <font face='$HandwrittenFont'>). Python handles those trivially. The JSONL intermediate is also inspectable — we can eyeball what xEdit actually found before any transformation runs.
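Because the intermediate is one JSON object per line, eyeballing it can start with a per-signature count. A minimal sketch, assuming each line carries a `signature` key (a hypothetical field name; use whatever the Pascal dumper actually writes). A malformed line is reported with its line number, which is where an escaping bug on the xEdit side would surface.

```python
import json
from collections import Counter
from typing import Iterable


def summarize_dump(lines: Iterable[str]) -> Counter:
    """Count records per signature in a mod-dump.jsonl stream."""
    counts: Counter = Counter()
    for lineno, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            record = json.loads(raw)
        except json.JSONDecodeError as exc:
            # An escaping bug in the Pascal dumper surfaces here, before any transformation.
            raise ValueError(f"line {lineno}: invalid JSON ({exc.msg})") from exc
        # "signature" is a hypothetical key; "?" marks records that lack it.
        counts[record.get("signature", "?")] += 1
    return counts
```

Usage: `summarize_dump(open("mod-dump.jsonl", encoding="utf-8")).most_common()` lists signatures by frequency.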

Target record signatures

Signature    Field(s) extracted           Pack category    Default importance
BOOK         FULL, DESC (body)            {Mod}_Books      0.75 (authored narrative)
WEAP         FULL, DESC                   {Mod}_Weapons    0.40 (0.75 if named artifact)
ARMO         FULL, DESC                   {Mod}_Armor      0.40 (0.75 if named artifact)
MISC         FULL, DESC                   {Mod}_Items      0.40
ACTI         FULL, DESC                   {Mod}_Displays   0.40 (display plaques, static props)
QUST         FULL, journal CNAM entries   {Mod}_Quests     0.75
LCTN         FULL, keywords               {Mod}_Locations  0.50 → type: LOCATION
NPC_         FULL, class, faction         {Mod}_NPCs       0.40
SPEL         FULL, DESC                   {Mod}_Spells     0.50 → type: SKILL
DIAL + INFO  named-NPC dialogue topics    {Mod}_Dialogue   0.40
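The table can travel into the Python converter as plain data. A sketch under two assumptions the table does not state: rows without an explicit type default to KNOWLEDGE, and whether a WEAP/ARMO record counts as a named artifact is decided by the caller (no detection rule is designed yet). The key "DIAL" stands in for the DIAL + INFO pair; `grade` and `Grade` are hypothetical names.

```python
from typing import NamedTuple


class Grade(NamedTuple):
    category: str      # {Category} part of the {Mod}_{Category} pack category
    importance: float  # default importance from the table
    entry_type: str    # sknpack "type"; KNOWLEDGE unless the table says otherwise


# Transcribed from the table; KNOWLEDGE as the default type is an assumption.
# "DIAL" stands for the DIAL + INFO pair.
DEFAULT_GRADES = {
    "BOOK": Grade("Books", 0.75, "KNOWLEDGE"),
    "WEAP": Grade("Weapons", 0.40, "KNOWLEDGE"),
    "ARMO": Grade("Armor", 0.40, "KNOWLEDGE"),
    "MISC": Grade("Items", 0.40, "KNOWLEDGE"),
    "ACTI": Grade("Displays", 0.40, "KNOWLEDGE"),
    "QUST": Grade("Quests", 0.75, "KNOWLEDGE"),
    "LCTN": Grade("Locations", 0.50, "LOCATION"),
    "NPC_": Grade("NPCs", 0.40, "KNOWLEDGE"),
    "SPEL": Grade("Spells", 0.50, "SKILL"),
    "DIAL": Grade("Dialogue", 0.40, "KNOWLEDGE"),
}

# Only weapons and armor get the named-artifact bump in the table.
ARTIFACT_SIGNATURES = {"WEAP", "ARMO"}
ARTIFACT_IMPORTANCE = 0.75


def grade(signature: str, named_artifact: bool = False) -> Grade:
    """Look up the default grade; named_artifact is decided by the caller."""
    base = DEFAULT_GRADES[signature]  # KeyError for signatures outside the table
    if named_artifact and signature in ARTIFACT_SIGNATURES:
        return base._replace(importance=ARTIFACT_IMPORTANCE)
    return base
```

Keeping the table as data means a change to the grading rules is a one-line edit rather than a change to converter logic.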

Always read records through xEdit's WinningOverride so we get the active version of each record under the full load order. Filter on GetLoadOrderFormID, keeping only FormIDs that carry the target plugin's load-order prefix, so we emit records originating in the mod rather than every vanilla record it overrides.
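The origin filter belongs in the xEdit script, but the same check is cheap to repeat on the Python side as a sanity pass over the dump. A sketch, assuming the dumper writes the load-order FormID as an integer or a hex string (parse with `int(s, 16)`) and that the caller knows the target plugin's load-order index. A regular plugin owns FormIDs whose top byte is its index; an ESL-flagged light plugin shares the 0xFE prefix and uses the next 12 bits as its light-slot index. The helper name `originates_in` is hypothetical.

```python
def originates_in(form_id: int, plugin_index: int, light: bool = False) -> bool:
    """True if a load-order-corrected FormID was created by the target plugin.

    Overrides of master records keep the master's prefix, so they fail this check.
    """
    top_byte = (form_id >> 24) & 0xFF
    if light:
        # ESL-flagged plugin: 0xFE prefix, then a 12-bit light-slot index.
        return top_byte == 0xFE and ((form_id >> 12) & 0xFFF) == plugin_index
    # Regular plugin: the top byte is its load-order index.
    return top_byte == plugin_index
```

For example, `originates_in(int(rec["form_id"], 16), 0x1A)`, where `form_id` is a hypothetical JSONL field.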

sknpack format (reminder)

See ../oghma-sknpack/README.md for the full envelope. Each entry is flat:

{
  "content": "...narrative prose, cleaned of markup...",
  "display_name": "stable_editor_id_or_slug",
  "type": "KNOWLEDGE | SKILL | LOCATION",
  "importance": 0.75,
  "location": "",
  "tags": [],
  "emotion": "",
  "always_inject": false,
  "condition_expr": ""
}

Pack envelope: skyrimnet_knowledge_pack with name, description, author, version, format_version: 1, exported_at, entries[], entry_count, npc_groups: [].
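A sketch of building one entry and wrapping entries in the envelope. The field names come from the reminder above; the nesting is an assumption: this version treats `skyrimnet_knowledge_pack` as a top-level key holding the other envelope fields, and `version` as a string. Check ../oghma-sknpack/README.md for the real layout before relying on it. `make_entry`, `make_pack`, and `write_pack` are hypothetical helper names.

```python
import json
from datetime import datetime, timezone
from typing import Dict, List

VALID_TYPES = {"KNOWLEDGE", "SKILL", "LOCATION"}


def make_entry(display_name: str, content: str, entry_type: str,
               importance: float, location: str = "") -> Dict:
    """Build one flat sknpack entry with the fields listed above."""
    if entry_type not in VALID_TYPES:
        raise ValueError(f"unknown entry type: {entry_type!r}")
    return {
        "content": content,
        "display_name": display_name,
        "type": entry_type,
        "importance": importance,
        "location": location,
        "tags": [],
        "emotion": "",
        "always_inject": False,
        "condition_expr": "",  # left empty until the gating question is settled
    }


def make_pack(name: str, description: str, author: str, version: str,
              entries: List[Dict]) -> Dict:
    """Wrap entries in the envelope (nesting under one top-level key is an assumption)."""
    return {
        "skyrimnet_knowledge_pack": {
            "name": name,
            "description": description,
            "author": author,
            "version": version,
            "format_version": 1,
            "exported_at": datetime.now(timezone.utc).isoformat(),
            "entries": entries,
            "entry_count": len(entries),  # derived, so it cannot drift from entries
            "npc_groups": [],
        }
    }


def write_pack(path: str, pack: Dict) -> None:
    """Write the pack as UTF-8 JSON, keeping non-ASCII book text readable."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(pack, fh, ensure_ascii=False, indent=2)
```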

Importance grading (carries over from Oghma)

  • 0.75 — Authored narrative: book text, quest journals, named-artifact backstory. The "scholar" tier.
  • 0.50 — Locations, spells, visual descriptions.
  • 0.40 — Generic items, display plaques, unnamed NPCs. The "commoner" tier — provides scaffolding without dominating token budget.

Where a single topic has both authored and generic variants (e.g. a named weapon has both its lore-book entry and a terse display plaque), emit both — SkyrimNet's ranking picks the right depth for the NPC.

Pack split convention

Nimmersky_-_{Mod}_-_{Category}.sknpack

Examples for Legacy of the Dragonborn:

  • Nimmersky_-_LOTD_-_Displays.sknpack — per-relic display descriptions
  • Nimmersky_-_LOTD_-_Books.sknpack — Explorer's Guide series, Auryen's journals, in-game books
  • Nimmersky_-_LOTD_-_Museum.sknpack — one LOCATION entry per wing (Hall of Heroes, Dragonborn Hall, Daedric, Library, Safehouse, Natural Science, Dwemer, Guildhouse, Hall of Lost Empires, Airship), all with location="Solitude"
  • Nimmersky_-_LOTD_-_Quests.sknpack — quest journals (Auryen's commission lines, relic-hunt objectives)
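Naming and splitting can be sketched as a grouping step over (category, entry) pairs. The restriction to alphanumeric name components is an assumption, added so a stray `_-_` or a path separator can never end up inside a component; it means a mod such as Beyond Skyrim would need a compact slug like BeyondSkyrim. Both function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

_COMPONENT = re.compile(r"[A-Za-z0-9]+")


def pack_filename(mod: str, category: str) -> str:
    """Nimmersky_-_{Mod}_-_{Category}.sknpack, with "_-_" as the separator."""
    for part in (mod, category):
        # Assumption: components are alphanumeric slugs, so "_-_" stays unambiguous.
        if not _COMPONENT.fullmatch(part):
            raise ValueError(f"unsafe pack name component: {part!r}")
    return f"Nimmersky_-_{mod}_-_{category}.sknpack"


def split_into_packs(mod: str, entries: Iterable[Tuple[str, Dict]]) -> Dict[str, List[Dict]]:
    """Group (category, entry) pairs into one entry list per pack file."""
    packs: Dict[str, List[Dict]] = defaultdict(list)
    for category, entry in entries:
        packs[pack_filename(mod, category)].append(entry)
    return dict(packs)
```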

Mod-specific design notes

Legacy of the Dragonborn (LOTD)

  • Museum wings are the natural chunking unit for LOCATION entries — LOTD tags them via keywords and has dedicated location records.
  • The Explorer's Guide books are pure gold (0.75) — long-form authored narrative about relic backstories.
  • Display plaques are mostly terse ("Blade of Woe — dagger once carried by the Dark Brotherhood assassin Astrid"). These go at 0.40.

Vigilant, Wyrmstooth, Beyond Skyrim

  • Larger quest arcs — QUST dumping becomes more valuable than for LOTD.
  • Beyond Skyrim: Bruma has whole LCTN hierarchies to preserve (province → hold → settlement → dungeon).
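Preserving a hierarchy like Bruma's mostly means keeping each location's parent link and walking it when the entry is written. A minimal sketch, assuming the dumper also emits each LCTN's parent location keyed by EditorID (a hypothetical JSONL layout); `location_path` is a hypothetical name. The walk returns the chain root-first and raises on a cycle instead of looping forever.

```python
from typing import Dict, List, Optional


def location_path(edid: str, parents: Dict[str, Optional[str]], max_depth: int = 32) -> List[str]:
    """Return the chain of location EditorIDs from the root down to edid.

    parents maps a location's EditorID to its parent's EditorID (None at the root);
    a location missing from the map is also treated as a root.
    """
    chain: List[str] = []
    current: Optional[str] = edid
    while current is not None:
        if current in chain or len(chain) >= max_depth:
            raise ValueError(f"cycle or runaway depth at {current!r}")
        chain.append(current)
        current = parents.get(current)
    chain.reverse()  # root first, e.g. province before hold
    return chain
```

The resulting chain can be joined (for instance with " > ") into a LOCATION entry's content or tags, so the province → hold → settlement → dungeon context survives the flattening into sknpack entries.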

Open questions

  • Deduplication against vanilla Oghma. A mod may rename or re-describe a vanilla artifact. Do we suppress the Oghma entry when the mod overrides it, or emit both and let importance grading sort it out?
  • condition_expr gating. SkyrimNet supports quest-gated visibility (e.g. Auryen only knows about Relic X after quest Y). Phase 2 of Oghma intends to use this. For mods, do we hand-author conditions or infer them from QUST stage references in the dump?
  • Cross-mod entanglement. A patch plugin (e.g. LOTD Patches) may carry records that belong logically to the parent mod. Decide: filter by originating FormID only, or include overrides that enrich parent-mod content?
  • Book HTML quirks. Skyrim books embed <p>, <br>, <font>, <pre>, and occasional [pagebreak] markers. Need a tested sanitizer that preserves paragraph structure but strips all presentation.
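A first cut at the sanitizer, covering only the markup listed above: <p> and [pagebreak] become paragraph breaks, <br> becomes a line break, every other tag (<font>, <pre>, and so on) is dropped, and HTML entities are decoded. It is a regex sketch, not an HTML parser, so a literal "<" in book prose would be misread as the start of a tag; `sanitize_book` is a hypothetical name.

```python
import html
import re

# Paragraph-level breaks: opening or closing <p> (any attributes) and [pagebreak].
_PARA = re.compile(r"\[pagebreak\]|<\s*/?\s*p\b[^>]*>", re.IGNORECASE)
# Line-level breaks: <br>, <br/>, <br />.
_LINEBREAK = re.compile(r"<\s*br\s*/?\s*>", re.IGNORECASE)
# Any remaining tag, e.g. <font face='$HandwrittenFont'> or <pre>.
_TAG = re.compile(r"<[^>]+>")


def sanitize_book(body: str) -> str:
    """Strip presentation markup from a book body while keeping paragraph structure."""
    text = body.replace("\r\n", "\n")
    text = _PARA.sub("\n\n", text)
    text = _LINEBREAK.sub("\n", text)
    text = _TAG.sub("", text)
    text = html.unescape(text)  # &amp; -> &, etc.
    # Collapse runs of spaces/tabs within each line; whitespace-only lines become empty.
    text = "\n".join(" ".join(line.split()) for line in text.split("\n"))
    text = re.sub(r"\n{3,}", "\n\n", text)  # at most one blank line between paragraphs
    return text.strip()
```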

Not yet decided

  • Whether to go via ChromaDB (matches Oghma architecture — ingestion into iris-dev, then export_packs.py-style exporter) or straight to .sknpack from the JSONL (simpler, no RAG search over mod content).
  • Whether to generate two importance tiers per topic by running an LLM summarization step (Qwen3.5-27B on theia was flagged for Oghma Phase 2).

Version: 0.1 | Created: 2026-04-16 | Updated: 2026-04-16