diff --git a/mod-to-sknpack/readme.md b/mod-to-sknpack/readme.md
index e69de29..4f3b184 100644
--- a/mod-to-sknpack/readme.md
+++ b/mod-to-sknpack/readme.md
@@ -0,0 +1,156 @@
+# mod-to-sknpack: Extract Mod Lore into SkyrimNet Knowledge Packs
+
+A pipeline that reads record data out of specific Skyrim mods via xEdit, then
+distills it into `.sknpack` files that slot in alongside the Oghma vanilla
+corpus. Each mod gets its own knowledge pack family so NPCs modded into the
+game can speak about content the vanilla Oghma distillation never saw.
+
+## Why
+
+`oghma-sknpack/` covers vanilla Tamriel lore beautifully but knows nothing
+about what a modded load order actually adds. A playthrough with Legacy of
+the Dragonborn, Vigilant, Wyrmstooth, or Beyond Skyrim puts hundreds of
+authored books, quest journals, unique artifacts, and named locations into
+the game world. Without mod-specific knowledge packs, SkyrimNet NPCs will
+either hallucinate about these or refuse to engage.
+
+The authored narrative content is already *in the plugin files*; we just
+need to lift it out, clean it, categorize it, and grade it. xEdit gives us
+deterministic record access; the `.sknpack` format is plain JSON; the
+glue between them is this project.
+
+## Pipeline (two stages)
+
+```
+Mod plugin (.esp / .esm) in load order
+  │ xEdit Pascal dumper script
+  ▼
+mod-dump.jsonl (one record per line)
+  │ Python converter (sknpack-from-dump.py)
+  ▼
+Nimmersky_-_{Mod}_-_{Category}.sknpack
+  │ SkyrimNet UI → Knowledge Packs → Import
+  ▼
+In-game NPC knowledge
+```
+
+**Why two stages:** xEdit's Pascal is painful for JSON escaping and for
+cleaning book-body markup (`<br>`, `<p>`, `<font>`, and similar tags).
+Python handles those trivially. The JSONL intermediate is also *inspectable*
+— we can eyeball what xEdit actually found before any transformation runs.
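This minimal sketch makes that inspection concrete. It tallies dumped records per signature and lists records that have neither a name nor body text. It assumes each JSONL line is an object with hypothetical `signature`, `editor_id`, `full`, and `desc` keys, because this README does not yet fix the dumper's real field names.

```python
import json
from collections import Counter
from typing import Iterable


def summarize_dump(lines: Iterable[str]) -> tuple[Counter, list[str]]:
    """Tally records per signature and list editor IDs that carry no text.

    Field names ('signature', 'editor_id', 'full', 'desc') are hypothetical;
    adjust them to whatever the xEdit dumper actually writes.
    """
    counts: Counter = Counter()
    empty: list[str] = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(raw)
        counts[record.get("signature", "?")] += 1
        # A record with neither a name nor a body is probably a dumper miss
        # (or a record type the converter should skip).
        if not record.get("full") and not record.get("desc"):
            empty.append(record.get("editor_id", "<no editor id>"))
    return counts, empty
```

Passing an open handle on `mod-dump.jsonl` shows at a glance whether the per-signature counts look plausible before any conversion runs.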
+
+## Target record signatures
+
+| Signature | Field(s) extracted | Pack category | Default importance |
+|---|---|---|---|
+| `BOOK` | `FULL`, `DESC` (body) | `{Mod}_Books` | 0.75 (authored narrative) |
+| `WEAP` | `FULL`, `DESC` | `{Mod}_Weapons` | 0.40 (0.75 if named artifact) |
+| `ARMO` | `FULL`, `DESC` | `{Mod}_Armor` | 0.40 (0.75 if named artifact) |
+| `MISC` | `FULL`, `DESC` | `{Mod}_Items` | 0.40 |
+| `ACTI` | `FULL`, `DESC` | `{Mod}_Displays` | 0.40 (display plaques, static props) |
+| `QUST` | `FULL`, journal `CNAM` entries | `{Mod}_Quests` | 0.75 |
+| `LCTN` | `FULL`, keywords | `{Mod}_Locations` | 0.50 → `type: LOCATION` |
+| `NPC_` | `FULL`, class, faction | `{Mod}_NPCs` | 0.40 |
+| `SPEL` | `FULL`, `DESC` | `{Mod}_Spells` | 0.50 → `type: SKILL` |
+| `DIAL` + `INFO` | named-NPC dialogue topics | `{Mod}_Dialogue` | 0.40 |
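The table maps directly onto a lookup that the converter can consult. This sketch makes three assumptions. Signatures with no explicit `type` in the table map to `KNOWLEDGE`. Whether a WEAP or ARMO record counts as a "named artifact" is decided upstream, and the `named_artifact` flag is hypothetical. The dumper folds INFO responses into their parent DIAL topic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    category: str                # pack category suffix, e.g. "Books"
    importance: float            # default importance from the table
    entry_type: str = "KNOWLEDGE"
    artifact_importance: Optional[float] = None  # bump for named artifacts


# One rule per row of the table. DIAL stands in for the DIAL + INFO pair,
# assuming the dumper folds INFO responses into their parent topic.
RULES: dict[str, Rule] = {
    "BOOK": Rule("Books", 0.75),
    "WEAP": Rule("Weapons", 0.40, artifact_importance=0.75),
    "ARMO": Rule("Armor", 0.40, artifact_importance=0.75),
    "MISC": Rule("Items", 0.40),
    "ACTI": Rule("Displays", 0.40),
    "QUST": Rule("Quests", 0.75),
    "LCTN": Rule("Locations", 0.50, entry_type="LOCATION"),
    "NPC_": Rule("NPCs", 0.40),
    "SPEL": Rule("Spells", 0.50, entry_type="SKILL"),
    "DIAL": Rule("Dialogue", 0.40),
}


def classify(signature: str, mod: str,
             named_artifact: bool = False) -> Optional[tuple[str, float, str]]:
    """Return (category, importance, type) for a record, or None if untargeted."""
    rule = RULES.get(signature)
    if rule is None:
        return None
    importance = rule.importance
    if named_artifact and rule.artifact_importance is not None:
        importance = rule.artifact_importance
    return f"{mod}_{rule.category}", importance, rule.entry_type
```

When the grading lives in one data structure, retuning a tier means editing a single row rather than hunting through converter logic.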
+
+**Always use xEdit's `WinningOverride`** so we get the active version of each
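+record under the full load order. Then filter on `GetLoadOrderFormID` so we
+only emit records *originating* in the mod, not every vanilla record it
+overrides. For a regular plugin, the top byte of that FormID is the load-order
+index of the plugin that first defined the record. ESL-flagged plugins
+share the `FE` prefix and are told apart by the next 12 bits.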
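The FormID check is easy to get wrong once light (ESL-flagged) plugins are in the load order. For those plugins the top byte is `FE` and the plugin is identified by the next 12 bits. The filter itself belongs in the xEdit script. The same arithmetic is sketched here in Python as a cross-check on the dump, assuming the dumper writes the load-order FormID as an integer. `originates_in` is a hypothetical helper, not part of xEdit.

```python
from typing import Optional


def originates_in(load_order_form_id: int,
                  plugin_index: Optional[int] = None,
                  light_index: Optional[int] = None) -> bool:
    """Return True if a load-order FormID was defined by the given plugin.

    Regular plugins own the whole top byte (0x00-0xFD). ESL-flagged (light)
    plugins share the 0xFE prefix and are told apart by the next 12 bits.
    Pass exactly one of plugin_index or light_index.
    """
    if (plugin_index is None) == (light_index is None):
        raise ValueError("pass exactly one of plugin_index or light_index")
    top_byte = (load_order_form_id >> 24) & 0xFF
    if plugin_index is not None:
        if not 0x00 <= plugin_index <= 0xFD:
            raise ValueError("regular plugin index must be 0x00-0xFD")
        return top_byte == plugin_index
    if not 0x000 <= light_index <= 0xFFF:
        raise ValueError("light plugin index must be 0x000-0xFFF")
    return top_byte == 0xFE and ((load_order_form_id >> 12) & 0xFFF) == light_index
```

Suppose the target mod sits at load-order index `0x2A`. Then `originates_in(fid, plugin_index=0x2A)` keeps the mod's own records and drops its overrides of `Skyrim.esm` records, whose FormIDs start with `00`.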
+
+## sknpack format (reminder)
+
+See `../oghma-sknpack/README.md` for the full envelope. Each entry is flat:
+
+```json
+{
+ "content": "...narrative prose, cleaned of markup...",
+ "display_name": "stable_editor_id_or_slug",
+ "type": "KNOWLEDGE | SKILL | LOCATION",
+ "importance": 0.75,
+ "location": "",
+ "tags": [],
+ "emotion": "",
+ "always_inject": false,
+ "condition_expr": ""
+}
+```
+
+Pack envelope: `skyrimnet_knowledge_pack` with `name`, `description`,
+`author`, `version`, `format_version: 1`, `exported_at`, `entries[]`,
+`entry_count`, `npc_groups: []`.
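Below is a minimal builder for one entry and its envelope, using only the fields listed above. It makes three assumptions: `skyrimnet_knowledge_pack` is a top-level key that wraps the other envelope fields, `version` is a string, and `exported_at` is an ISO 8601 UTC timestamp. Check `../oghma-sknpack/README.md` for the authoritative shape before relying on any of these.

```python
import json
from datetime import datetime, timezone

ENTRY_TYPES = {"KNOWLEDGE", "SKILL", "LOCATION"}


def make_entry(content: str, display_name: str, importance: float,
               entry_type: str = "KNOWLEDGE", location: str = "") -> dict:
    """Build one flat entry; optional fields default to empty values."""
    if entry_type not in ENTRY_TYPES:
        raise ValueError(f"unknown entry type: {entry_type}")
    return {
        "content": content,
        "display_name": display_name,
        "type": entry_type,
        "importance": importance,
        "location": location,
        "tags": [],
        "emotion": "",
        "always_inject": False,
        "condition_expr": "",
    }


def make_pack(name: str, description: str, author: str, version: str,
              entries: list[dict]) -> dict:
    # ASSUMPTION: envelope fields nest under a top-level
    # "skyrimnet_knowledge_pack" key; verify against the Oghma README.
    return {
        "skyrimnet_knowledge_pack": {
            "name": name,
            "description": description,
            "author": author,
            "version": version,
            "format_version": 1,
            "exported_at": datetime.now(timezone.utc).isoformat(),
            "entries": entries,
            "entry_count": len(entries),
            "npc_groups": [],
        }
    }


def write_pack(path: str, pack: dict) -> None:
    """Write UTF-8 JSON, keeping non-ASCII lore text readable."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(pack, fh, ensure_ascii=False, indent=2)
```

Deriving `entry_count` from the list, rather than tracking it by hand, keeps the count and the entries consistent.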
+
+## Importance grading (carries over from Oghma)
+
+- **0.75** — Authored narrative: book text, quest journals, named-artifact
+ backstory. The "scholar" tier.
+- **0.50** — Locations, spells, visual descriptions.
+- **0.40** — Generic items, display plaques, unnamed NPCs. The "commoner"
+ tier — provides scaffolding without dominating token budget.
+
+Where a single topic has both authored and generic variants (e.g. a named
+weapon has both its lore-book entry and a terse display plaque), emit
+*both* — SkyrimNet's ranking picks the right depth for the NPC.
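Emitting both depths for one topic can be as simple as producing two entries that differ in `importance` and `display_name`. In this sketch, the `__lore`/`__plaque` suffix scheme is a hypothetical way to keep display names stable and distinct. Only the grading-related fields are shown.

```python
from typing import Optional


def topic_entries(slug: str, authored: Optional[str],
                  plaque: Optional[str]) -> list[dict]:
    """Emit up to two entries for one topic, one per available depth.

    Authored text gets the 0.75 "scholar" tier, the terse plaque the 0.40
    "commoner" tier. Missing or whitespace-only variants are skipped.
    """
    entries: list[dict] = []
    if authored and authored.strip():
        entries.append({"display_name": f"{slug}__lore",
                        "content": authored.strip(),
                        "type": "KNOWLEDGE", "importance": 0.75})
    if plaque and plaque.strip():
        entries.append({"display_name": f"{slug}__plaque",
                        "content": plaque.strip(),
                        "type": "KNOWLEDGE", "importance": 0.40})
    return entries
```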
+
+## Pack split convention
+
+```
+Nimmersky_-_{Mod}_-_{Category}.sknpack
+```
+
+Examples for Legacy of the Dragonborn:
+
+- `Nimmersky_-_LOTD_-_Displays.sknpack` — per-relic display descriptions
+- `Nimmersky_-_LOTD_-_Books.sknpack` — Explorer's Guide series, Auryen's
+ journals, in-game books
+- `Nimmersky_-_LOTD_-_Museum.sknpack` — one LOCATION entry per wing (Hall
+ of Heroes, Dragonborn Hall, Daedric, Library, Safehouse, Natural Science,
+ Dwemer, Guildhouse, Hall of Lost Empires, Airship), all with
+ `location="Solitude"`
+- `Nimmersky_-_LOTD_-_Quests.sknpack` — quest journals (Auryen's commission
+ lines, relic-hunt objectives)
+
+## Mod-specific design notes
+
+### Legacy of the Dragonborn (LOTD)
+- Museum wings are the natural chunking unit for LOCATION entries — LOTD
+ tags them via keywords and has dedicated location records.
+- The Explorer's Guide books are pure gold (0.75) — long-form authored
+ narrative about relic backstories.
+- Display plaques are mostly terse ("Blade of Woe — dagger once carried
+ by the Dark Brotherhood assassin Astrid"). These go at 0.40.
+
+### Vigilant, Wyrmstooth, Beyond Skyrim
+- Larger quest arcs — QUST dumping becomes more valuable than for LOTD.
+- Beyond Skyrim: Bruma has whole LCTN hierarchies to preserve (province
+ → hold → settlement → dungeon).
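Preserving an LCTN hierarchy mostly means following each location's parent link up to the root. This sketch makes two assumptions. First, the dump yields a hypothetical `parent_of` mapping from each location ID to its parent's ID, with `None` at the root. Second, there is a `name_of` mapping from ID to `FULL` name.

```python
from typing import Optional


def location_path(loc_id: str, parent_of: dict[str, Optional[str]],
                  name_of: dict[str, str], max_depth: int = 16) -> list[str]:
    """Walk parent links from a location to the root; return names root-first.

    Stops on a missing parent, a cycle, or max_depth, so a malformed dump
    cannot loop forever. Unknown names fall back to the raw ID.
    """
    chain: list[str] = []
    seen: set[str] = set()
    current: Optional[str] = loc_id
    while current is not None and current not in seen and len(chain) < max_depth:
        seen.add(current)
        chain.append(name_of.get(current, current))
        current = parent_of.get(current)
    chain.reverse()
    return chain
```

The root-first list can then be joined into a breadcrumb (province → hold → settlement → dungeon) for an entry's `content`. Which level belongs in the `location` field is left open.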
+
+## Open questions
+
+- **Deduplication against vanilla Oghma.** A mod may rename or re-describe
+ a vanilla artifact. Do we suppress the Oghma entry when the mod overrides
+ it, or emit both and let importance grading sort it out?
+- **`condition_expr` gating.** SkyrimNet supports quest-gated visibility
+ (e.g. Auryen only knows about Relic X after quest Y). Phase 2 of Oghma
+ intends to use this. For mods, do we hand-author conditions or infer them
+ from QUST stage references in the dump?
+- **Cross-mod entanglement.** A patch plugin (e.g. LOTD Patches) may carry
+ records that belong *logically* to the parent mod. Decide: filter by
+ originating FormID only, or include overrides that enrich parent-mod
+ content?
+- **Book HTML quirks.** Skyrim books embed HTML-like tags such as `<br>`,
+ `<p>`, and `<font>`, plus occasional `[pagebreak]` markers. Need a tested
+ sanitizer that preserves paragraph structure but strips all presentation.
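Here is a minimal regex-based sanitizer sketch for the book-markup question above. It assumes the markup is limited to the HTML-like subset Skyrim books use (`<p>`, `<br>`, `<font>`, `<img>`, and similar) plus `[pagebreak]`. Regexes are not a general HTML parser. Treating single line breaks as soft wraps, joined into one paragraph, is a design choice rather than a spec.

```python
import html
import re

_PAGEBREAK = re.compile(r"\[pagebreak\]", re.IGNORECASE)
_PARA_TAG = re.compile(r"<\s*/?\s*p\b[^>]*>", re.IGNORECASE)  # <p ...> and </p>
_BREAK_TAG = re.compile(r"<\s*br\s*/?\s*>", re.IGNORECASE)    # <br>, <br/>
_ANY_TAG = re.compile(r"<[^>]+>")                               # <font>, <img>, <b>, ...


def sanitize_book(text: str) -> str:
    """Strip Skyrim book markup, returning paragraphs separated by blank lines."""
    text = text.replace("\r\n", "\n")
    text = _PAGEBREAK.sub("\n\n", text)   # page breaks become paragraph breaks
    text = _PARA_TAG.sub("\n\n", text)    # <p> boundaries delimit paragraphs
    text = _BREAK_TAG.sub("\n", text)     # one <br> is a line break; two make a blank line
    text = _ANY_TAG.sub("", text)         # drop all remaining presentation tags
    text = html.unescape(text)            # &amp; -> & and friends
    # Collapse spaces inside each line, then group non-blank lines into paragraphs.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.split("\n")]
    paragraphs: list[str] = []
    current: list[str] = []
    for line in lines:
        if line:
            current.append(line)
        elif current:
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)
```

A handful of real book records from the dump, paired with their expected output, would make a natural regression suite for the "tested sanitizer" the question calls for.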
+
+## Not yet decided
+
+- Whether to go via ChromaDB (matches Oghma architecture — ingestion into
+ iris-dev, then `export_packs.py`-style exporter) or straight to
+ `.sknpack` from the JSONL (simpler, no RAG search over mod content).
+- Whether to generate two importance tiers per topic by running an LLM
+ summarization step (Qwen3.5-27B on theia was flagged for Oghma Phase 2).
+
+---
+
+**Version:** 0.1 | **Created:** 2026-04-16 | **Updated:** 2026-04-16