diff --git a/mod-to-sknpack/readme.md b/mod-to-sknpack/readme.md
index e69de29..4f3b184 100644
--- a/mod-to-sknpack/readme.md
+++ b/mod-to-sknpack/readme.md
@@ -0,0 +1,156 @@
+# mod-to-sknpack — Extract Mod Lore into SkyrimNet Knowledge Packs
+
+A pipeline that reads record data out of specific Skyrim mods via xEdit, then
+distills it into `.sknpack` files that slot in alongside the Oghma vanilla
+corpus. Each mod gets its own knowledge pack family so NPCs modded into the
+game can speak about content the vanilla Oghma distillation never saw.
+
+## Why
+
+`oghma-sknpack/` covers vanilla Tamriel lore beautifully but knows nothing
+about what a modded load order actually adds. A playthrough with Legacy of
+the Dragonborn, Vigilant, Wyrmstooth, or Beyond Skyrim puts hundreds of
+authored books, quest journals, unique artifacts, and named locations into
+the game world. Without mod-specific knowledge packs, SkyrimNet NPCs will
+either hallucinate about these or refuse to engage.
+
+The authored narrative content is already *in the plugin files* — we just
+need to lift it out, clean it, categorize it, and grade it. xEdit gives us
+deterministic record access; the `.sknpack` format is plain JSON; the
+glue between them is this project.
+
+## Pipeline (two stages)
+
+```
+Mod plugin (.esp / .esm) in load order
+        │  xEdit Pascal dumper script
+        ▼
+mod-dump.jsonl  (one record per line)
+        │  Python converter (sknpack-from-dump.py)
+        ▼
+Nimmersky_-_{Mod}_-_{Category}.sknpack
+        │  SkyrimNet UI → Knowledge Packs → Import
+        ▼
+In-game NPC knowledge
+```
+
+**Why two stages:** xEdit's Pascal is painful for JSON escaping and for
+cleaning book-body markup (HTML-style paragraph, line-break, and inline tags).
+Python handles those trivially. The JSONL intermediate is also *inspectable*
+— we can eyeball what xEdit actually found before any transformation runs.
+
+## Target record signatures
+
+| Signature | Field(s) extracted | Pack category | Default importance |
+|---|---|---|---|
+| `BOOK` | `FULL`, `DESC` (body) | `{Mod}_Books` | 0.75 (authored narrative) |
+| `WEAP` | `FULL`, `DESC` | `{Mod}_Weapons` | 0.40 (0.75 if named artifact) |
+| `ARMO` | `FULL`, `DESC` | `{Mod}_Armor` | 0.40 (0.75 if named artifact) |
+| `MISC` | `FULL`, `DESC` | `{Mod}_Items` | 0.40 |
+| `ACTI` | `FULL`, `DESC` | `{Mod}_Displays` | 0.40 (display plaques, static props) |
+| `QUST` | `FULL`, journal `CNAM` entries | `{Mod}_Quests` | 0.75 |
+| `LCTN` | `FULL`, keywords | `{Mod}_Locations` | 0.50 → `type: LOCATION` |
+| `NPC_` | `FULL`, class, faction | `{Mod}_NPCs` | 0.40 |
+| `SPEL` | `FULL`, `DESC` | `{Mod}_Spells` | 0.50 → `type: SKILL` |
+| `DIAL` + `INFO` | named-NPC dialogue topics | `{Mod}_Dialogue` | 0.40 |
+
+**Always use xEdit's `WinningOverride`** so we get the active version of each
+record under the full load order. Filter `GetLoadOrderFormID` against the
+target plugin's FormID range so we only emit records *originating* in the
+mod, not every vanilla record it touches.
+
+## sknpack format (reminder)
+
+See `../oghma-sknpack/README.md` for the full envelope. Each entry is flat:
+
+```json
+{
+  "content": "...narrative prose, cleaned of markup...",
+  "display_name": "stable_editor_id_or_slug",
+  "type": "KNOWLEDGE | SKILL | LOCATION",
+  "importance": 0.75,
+  "location": "",
+  "tags": [],
+  "emotion": "",
+  "always_inject": false,
+  "condition_expr": ""
+}
+```
+
+Pack envelope: `skyrimnet_knowledge_pack` with `name`, `description`,
+`author`, `version`, `format_version: 1`, `exported_at`, `entries[]`,
+`entry_count`, `npc_groups: []`.
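
A minimal sketch of how a converter might assemble one entry and wrap it in the pack envelope, using only the field names listed in the README text above. Several details are assumptions rather than anything the README confirms: that `skyrimnet_knowledge_pack` is the top-level key wrapping the other fields, that `version` is a string, that `exported_at` is an ISO 8601 UTC timestamp, and the helper names `make_entry`, `make_pack`, `pack_filename`, and `dump_pack`, which are hypothetical.

```python
import json
from datetime import datetime, timezone


def make_entry(content, display_name, entry_type="KNOWLEDGE", importance=0.75):
    """Build one flat entry with the fields shown in the README's JSON sample."""
    return {
        "content": content,
        "display_name": display_name,
        "type": entry_type,  # KNOWLEDGE, SKILL, or LOCATION
        "importance": importance,
        "location": "",
        "tags": [],
        "emotion": "",
        "always_inject": False,
        "condition_expr": "",
    }


def make_pack(name, description, author, entries, version="0.1"):
    """Wrap entries in the envelope.

    Assumed, not confirmed: `skyrimnet_knowledge_pack` is the top-level key,
    `version` is a string, and `exported_at` is an ISO 8601 UTC timestamp.
    """
    return {
        "skyrimnet_knowledge_pack": {
            "name": name,
            "description": description,
            "author": author,
            "version": version,
            "format_version": 1,
            "exported_at": datetime.now(timezone.utc).isoformat(),
            "entries": entries,
            "entry_count": len(entries),  # kept in sync with entries[]
            "npc_groups": [],
        }
    }


def pack_filename(mod, category):
    """File name pattern taken from the pipeline diagram."""
    return f"Nimmersky_-_{mod}_-_{category}.sknpack"


def dump_pack(pack):
    """Serialize to JSON text; ensure_ascii=False keeps accented names readable."""
    return json.dumps(pack, ensure_ascii=False, indent=2)
```

Deriving `entry_count` from `len(entries)` rather than passing it in keeps the two values from drifting apart when entries are filtered late in the converter.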
+
+## Importance grading (carries over from Oghma)
+
+- **0.75** — Authored narrative: book text, quest journals, named-artifact
+  backstory. The "scholar" tier.
+- **0.50** — Locations, spells, visual descriptions.
+- **0.40** — Generic items, display plaques, unnamed NPCs. The "commoner"
+  tier — provides scaffolding without dominating token budget.
+
+Where a single topic has both authored and generic variants (e.g. a named
+weapon has both its lore-book entry and a terse display plaque), emit
+*both* — SkyrimNet's ranking picks the right depth for the NPC.
+
+## Pack split convention
+
+```
+Nimmersky_-_{Mod}_-_{Category}.sknpack
+```
+
+Examples for Legacy of the Dragonborn:
+
+- `Nimmersky_-_LOTD_-_Displays.sknpack` — per-relic display descriptions
+- `Nimmersky_-_LOTD_-_Books.sknpack` — Explorer's Guide series, Auryen's
+  journals, in-game books
+- `Nimmersky_-_LOTD_-_Museum.sknpack` — one LOCATION entry per wing (Hall
+  of Heroes, Dragonborn Hall, Daedric, Library, Safehouse, Natural Science,
+  Dwemer, Guildhouse, Hall of Lost Empires, Airship), all with
+  `location="Solitude"`
+- `Nimmersky_-_LOTD_-_Quests.sknpack` — quest journals (Auryen's commission
+  lines, relic-hunt objectives)
+
+## Mod-specific design notes
+
+### Legacy of the Dragonborn (LOTD)
+- Museum wings are the natural chunking unit for LOCATION entries — LOTD
+  tags them via keywords and has dedicated location records.
+- The Explorer's Guide books are pure gold (0.75) — long-form authored
+  narrative about relic backstories.
+- Display plaques are mostly terse ("Blade of Woe — dagger once carried
+  by the Dark Brotherhood assassin Astrid"). These go at 0.40.
+
+### Vigilant, Wyrmstooth, Beyond Skyrim
+- Larger quest arcs — QUST dumping becomes more valuable than for LOTD.
+- Beyond Skyrim: Bruma has whole LCTN hierarchies to preserve (province
+  → hold → settlement → dungeon).
+
+## Open questions
+
+- **Deduplication against vanilla Oghma.** A mod may rename or re-describe
+  a vanilla artifact. Do we suppress the Oghma entry when the mod overrides
+  it, or emit both and let importance grading sort it out?
+- **`condition_expr` gating.** SkyrimNet supports quest-gated visibility
+  (e.g. Auryen only knows about Relic X after quest Y). Phase 2 of Oghma
+  intends to use this. For mods, do we hand-author conditions or infer them
+  from QUST stage references in the dump?
+- **Cross-mod entanglement.** A patch plugin (e.g. LOTD Patches) may carry
+  records that belong *logically* to the parent mod. Decide: filter by
+  originating FormID only, or include overrides that enrich parent-mod
+  content?
+- **Book HTML quirks.** Skyrim books embed HTML-style paragraph, line-break,
+  and inline formatting tags, plus occasional `[pagebreak]` markers. Need a
+  tested sanitizer that preserves paragraph structure but strips all presentation.
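
One possible shape for the sanitizer raised in the last open question, offered as a sketch rather than the project's implementation. The tag names are assumptions, since the README does not list them: `<p>` and `[pagebreak]` are treated as paragraph boundaries, `<br>` as a soft line break, and every other tag as presentation to drop. The function name `sanitize_book_text` is hypothetical.

```python
import html
import re

# Assumed markup: <p ...>, </p> and [pagebreak] end a paragraph; <br> is a
# soft break inside one. Everything else in angle brackets is presentation.
_PARA = re.compile(r"<\s*/?\s*p\b[^>]*>|\[pagebreak\]", re.IGNORECASE)
_BR = re.compile(r"<\s*br\b[^>]*>", re.IGNORECASE)
_TAG = re.compile(r"<[^>]+>")


def sanitize_book_text(raw: str) -> str:
    """Strip presentation markup while keeping paragraph boundaries."""
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    text = _PARA.sub("\n\n", text)   # paragraph-level markers
    text = _BR.sub("\n", text)       # soft breaks
    text = _TAG.sub("", text)        # any remaining tags
    # Unescape after tag removal so an escaped "&lt;b&gt;" stays as text.
    text = html.unescape(text)

    paragraphs = []
    for block in re.split(r"\n\s*\n", text):
        # Collapse runs of whitespace (including soft breaks) to one space.
        cleaned = " ".join(block.split())
        if cleaned:
            paragraphs.append(cleaned)
    return "\n\n".join(paragraphs)
```

A regex pass like this is deliberately crude: literal `<` and `>` characters in book prose would be misread as a tag, so a real sanitizer would want test fixtures drawn from the actual dump before being trusted.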
+
+## Not yet decided
+
+- Whether to go via ChromaDB (matches Oghma architecture — ingestion into
+  iris-dev, then `export_packs.py`-style exporter) or straight to
+  `.sknpack` from the JSONL (simpler, no RAG search over mod content).
+- Whether to generate two importance tiers per topic by running an LLM
+  summarization step (Qwen3.5-27B on theia was flagged for Oghma Phase 2).
+
+---
+
+**Version:** 0.1 | **Created:** 2026-04-16 | **Updated:** 2026-04-16