nimmersky/mod-to-sknpack/readme.md
chrysalis 7a451f86d2 docs(mod-to-sknpack): design sketch for xEdit → sknpack pipeline
Two-stage pipeline (xEdit Pascal dumper → JSONL → Python converter → .sknpack)
to distill mod-specific lore into SkyrimNet knowledge packs alongside the
vanilla Oghma corpus. Sketches target record signatures (BOOK/WEAP/ARMO/ACTI/
QUST/LCTN/NPC_/SPEL/DIAL), importance grading rules, pack-split convention
(Nimmersky_-_{Mod}_-_{Category}), LOTD-specific design, and open questions
around deduplication, condition_expr gating, and book markup sanitization.

Version 0.1 — design doc, not spec.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 10:51:58 +00:00


# mod-to-sknpack — Extract Mod Lore into SkyrimNet Knowledge Packs
A pipeline that reads record data out of specific Skyrim mods via xEdit, then
distills it into `.sknpack` files that slot in alongside the vanilla Oghma
corpus. Each mod gets its own knowledge pack family so that NPCs can speak
about mod-added content the vanilla Oghma distillation never saw.
## Why
`oghma-sknpack/` covers vanilla Tamriel lore beautifully but knows nothing
about what a modded load order actually adds. A playthrough with Legacy of
the Dragonborn, Vigilant, Wyrmstooth, or Beyond Skyrim puts hundreds of
authored books, quest journals, unique artifacts, and named locations into
the game world. Without mod-specific knowledge packs, SkyrimNet NPCs will
either hallucinate about these or refuse to engage.

The authored narrative content is already *in the plugin files*; we just
need to lift it out, clean it, categorize it, and grade it. xEdit gives us
deterministic record access, the `.sknpack` format is plain JSON, and the
glue between them is this project.
## Pipeline (two stages)
```
Mod plugin (.esp / .esm) in load order
  │  xEdit Pascal dumper script
  ▼
mod-dump.jsonl (one record per line)
  │  Python converter (sknpack-from-dump.py)
  ▼
Nimmersky_-_{Mod}_-_{Category}.sknpack
  │  SkyrimNet UI → Knowledge Packs → Import
  ▼
In-game NPC knowledge
```
**Why two stages:** xEdit's Pascal is painful for JSON escaping and for
cleaning book-body markup (`<p>`, `<br>`, `<font face='$HandwrittenFont'>`).
Python handles those trivially. The JSONL intermediate is also *inspectable*
— we can eyeball what xEdit actually found before any transformation runs.
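
A minimal sketch of that inspection step, assuming each JSONL line is an object with a `signature` key. The dump schema is not fixed yet, so `signature`, `edid`, and the sample records here are hypothetical.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty JSONL line, failing loudly on bad JSON."""
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"bad JSON on line {line_no}: {exc}") from exc


def count_by_signature(lines: Iterable[str]) -> Counter:
    """Tally records per signature ("signature" is an assumed key name)."""
    return Counter(rec.get("signature", "?") for rec in iter_records(lines))


# Hypothetical dump lines; a real run would pass an open file handle instead.
sample = [
    '{"signature": "BOOK", "edid": "ExampleBook01", "FULL": "An Example Book"}',
    "",
    '{"signature": "WEAP", "edid": "ExampleSword", "FULL": "Example Sword"}',
    '{"signature": "BOOK", "edid": "ExampleBook02", "FULL": "Another Book"}',
]
signature_counts = count_by_signature(sample)
```

Because a file object is itself an iterable of lines, `count_by_signature(open("mod-dump.jsonl", encoding="utf-8"))` would work unchanged on a real dump.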
## Target record signatures
| Signature | Field(s) extracted | Pack category | Default importance |
|---|---|---|---|
| `BOOK` | `FULL`, `DESC` (body) | `{Mod}_Books` | 0.75 (authored narrative) |
| `WEAP` | `FULL`, `DESC` | `{Mod}_Weapons` | 0.40 (0.75 if named artifact) |
| `ARMO` | `FULL`, `DESC` | `{Mod}_Armor` | 0.40 (0.75 if named artifact) |
| `MISC` | `FULL`, `DESC` | `{Mod}_Items` | 0.40 |
| `ACTI` | `FULL`, `DESC` | `{Mod}_Displays` | 0.40 (display plaques, static props) |
| `QUST` | `FULL`, journal `CNAM` entries | `{Mod}_Quests` | 0.75 |
| `LCTN` | `FULL`, keywords | `{Mod}_Locations` | 0.50 → `type: LOCATION` |
| `NPC_` | `FULL`, class, faction | `{Mod}_NPCs` | 0.40 |
| `SPEL` | `FULL`, `DESC` | `{Mod}_Spells` | 0.50 → `type: SKILL` |
| `DIAL` + `INFO` | named-NPC dialogue topics | `{Mod}_Dialogue` | 0.40 |

**Always use xEdit's `WinningOverride`** so we get the active version of each
record under the full load order. Filter `GetLoadOrderFormID` against the
target plugin's FormID range so we only emit records *originating* in the
mod, not every vanilla record it touches.
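
The filter itself belongs in the xEdit script, but the converter can re-check it defensively. A rough sketch, assuming the dump carries each load-order FormID as an 8-digit hex string and that the target is a full (non-light) plugin, so the top byte is its load-order index. Light (ESL-flagged) plugins share the `FE` index and would need a different check. The FormIDs and the index `0x2A` are made up for illustration.

```python
def load_order_index(form_id_hex: str) -> int:
    """Return the top byte of an 8-digit load-order FormID."""
    return int(form_id_hex, 16) >> 24


def originates_in(form_id_hex: str, plugin_index: int) -> bool:
    """True when the record's FormID was allocated by the given plugin slot."""
    return load_order_index(form_id_hex) == plugin_index


# Hypothetical FormIDs: one vanilla, two from the target mod, one light plugin.
dumped = ["0001397E", "2A000D62", "2A01F3B0", "FE00A801"]
kept = [fid for fid in dumped if originates_in(fid, 0x2A)]
```

An override of a vanilla record keeps the vanilla FormID, so it falls outside the target index and is dropped, which matches the intent of emitting only records that originate in the mod.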
## sknpack format (reminder)
See `../oghma-sknpack/README.md` for the full envelope. Each entry is flat:
```json
{
"content": "...narrative prose, cleaned of markup...",
"display_name": "stable_editor_id_or_slug",
"type": "KNOWLEDGE | SKILL | LOCATION",
"importance": 0.75,
"location": "",
"tags": [],
"emotion": "",
"always_inject": false,
"condition_expr": ""
}
```
Pack envelope: `skyrimnet_knowledge_pack` with `name`, `description`,
`author`, `version`, `format_version: 1`, `exported_at`, `entries[]`,
`entry_count`, `npc_groups: []`.
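
A sketch of assembling an entry and the envelope in the converter. Two assumptions are baked in and should be checked against `../oghma-sknpack/README.md`: that `skyrimnet_knowledge_pack` is a top-level wrapper key, and that `exported_at` is an ISO 8601 UTC string. The helper names `make_entry` and `make_pack` are hypothetical.

```python
import json
from datetime import datetime, timezone


def make_entry(content, display_name, entry_type="KNOWLEDGE",
               importance=0.75, location=""):
    """Build one flat entry with the fields listed above."""
    return {
        "content": content,
        "display_name": display_name,
        "type": entry_type,
        "importance": importance,
        "location": location,
        "tags": [],
        "emotion": "",
        "always_inject": False,
        "condition_expr": "",
    }


def make_pack(name, description, author, version, entries):
    """Wrap entries in the envelope (wrapper key is an assumption)."""
    return {
        "skyrimnet_knowledge_pack": {
            "name": name,
            "description": description,
            "author": author,
            "version": version,
            "format_version": 1,
            # Timestamp format is assumed, not confirmed by the source.
            "exported_at": datetime.now(timezone.utc).isoformat(),
            "entries": entries,
            "entry_count": len(entries),
            "npc_groups": [],
        }
    }


entry = make_entry("Example cleaned prose.", "example_book_01")
pack = make_pack("Nimmersky - LOTD - Books", "Example pack", "nimmersky",
                 "0.1", [entry])
pack_json = json.dumps(pack, ensure_ascii=False, indent=2)
```

Deriving `entry_count` from `len(entries)` keeps the two from drifting apart when entries are filtered late in the pipeline.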
## Importance grading (carries over from Oghma)
- **0.75** — Authored narrative: book text, quest journals, named-artifact
backstory. The "scholar" tier.
- **0.50** — Locations, spells, visual descriptions.
- **0.40** — Generic items, display plaques, unnamed NPCs. The "commoner"
tier — provides scaffolding without dominating token budget.

Where a single topic has both authored and generic variants (e.g. a named
weapon has both its lore-book entry and a terse display plaque), emit
*both* and let SkyrimNet's ranking pick the right depth for the NPC.
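
The defaults from the signature table and the tiers above could be encoded as a lookup, roughly as follows. This is a sketch: rows without an explicit `type` are assumed to be `KNOWLEDGE`, and how a "named artifact" is detected is still open, so it is passed in as a flag. `Grade`, `DEFAULTS`, and `grade` are hypothetical names.

```python
from typing import NamedTuple


class Grade(NamedTuple):
    category: str
    importance: float
    entry_type: str


# signature -> (category suffix, default importance, entry type)
DEFAULTS = {
    "BOOK": ("Books", 0.75, "KNOWLEDGE"),
    "WEAP": ("Weapons", 0.40, "KNOWLEDGE"),
    "ARMO": ("Armor", 0.40, "KNOWLEDGE"),
    "MISC": ("Items", 0.40, "KNOWLEDGE"),
    "ACTI": ("Displays", 0.40, "KNOWLEDGE"),
    "QUST": ("Quests", 0.75, "KNOWLEDGE"),
    "LCTN": ("Locations", 0.50, "LOCATION"),
    "NPC_": ("NPCs", 0.40, "KNOWLEDGE"),
    "SPEL": ("Spells", 0.50, "SKILL"),
    "DIAL": ("Dialogue", 0.40, "KNOWLEDGE"),
}


def grade(signature: str, mod: str, named_artifact: bool = False) -> Grade:
    """Return pack category, importance, and type for one record."""
    suffix, importance, entry_type = DEFAULTS[signature]
    # Only weapons and armor are promoted to the scholar tier when named.
    if named_artifact and signature in ("WEAP", "ARMO"):
        importance = 0.75
    return Grade(f"{mod}_{suffix}", importance, entry_type)


plain_sword = grade("WEAP", "LOTD")
relic_sword = grade("WEAP", "LOTD", named_artifact=True)
museum_wing = grade("LCTN", "LOTD")
```

An unknown signature raises `KeyError`, which seems preferable to silently grading a record type the table never covered.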
## Pack split convention
```
Nimmersky_-_{Mod}_-_{Category}.sknpack
```
Examples for Legacy of the Dragonborn:
- `Nimmersky_-_LOTD_-_Displays.sknpack` — per-relic display descriptions
- `Nimmersky_-_LOTD_-_Books.sknpack` — Explorer's Guide series, Auryen's
journals, in-game books
- `Nimmersky_-_LOTD_-_Museum.sknpack` — one LOCATION entry per wing (Hall
of Heroes, Dragonborn Hall, Daedric, Library, Safehouse, Natural Science,
Dwemer, Guildhouse, Hall of Lost Empires, Airship), all with
`"location": "Solitude"`
- `Nimmersky_-_LOTD_-_Quests.sknpack` — quest journals (Auryen's commission
lines, relic-hunt objectives)
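
A small sketch of building filenames that follow this convention. The cleaning rule (whitespace becomes `_`, anything outside letters, digits, and `_` is dropped) is an assumption, chosen so the `_-_` separator stays unambiguous; `pack_filename` is a hypothetical helper.

```python
import re


def pack_filename(mod: str, category: str) -> str:
    """Return Nimmersky_-_{Mod}_-_{Category}.sknpack with cleaned parts."""

    def clean(part: str) -> str:
        part = re.sub(r"\s+", "_", part.strip())
        # Dropping "-" keeps "_-_" unique as the field separator.
        return re.sub(r"[^A-Za-z0-9_]", "", part)

    return f"Nimmersky_-_{clean(mod)}_-_{clean(category)}.sknpack"


lotd_displays = pack_filename("LOTD", "Displays")
bruma_books = pack_filename("Beyond Skyrim Bruma", "Books")
```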
## Mod-specific design notes
### Legacy of the Dragonborn (LOTD)
- Museum wings are the natural chunking unit for LOCATION entries — LOTD
tags them via keywords and has dedicated location records.
- The Explorer's Guide books are pure gold (0.75) — long-form authored
narrative about relic backstories.
- Display plaques are mostly terse ("Blade of Woe — dagger once carried
by the Dark Brotherhood assassin Astrid"). These go at 0.40.
### Vigilant, Wyrmstooth, Beyond Skyrim
- Larger quest arcs — QUST dumping becomes more valuable than for LOTD.
- Beyond Skyrim: Bruma has whole LCTN hierarchies to preserve (province
→ hold → settlement → dungeon).
## Open questions
- **Deduplication against vanilla Oghma.** A mod may rename or re-describe
a vanilla artifact. Do we suppress the Oghma entry when the mod overrides
it, or emit both and let importance grading sort it out?
- **`condition_expr` gating.** SkyrimNet supports quest-gated visibility
(e.g. Auryen only knows about Relic X after quest Y). Phase 2 of Oghma
intends to use this. For mods, do we hand-author conditions or infer them
from QUST stage references in the dump?
- **Cross-mod entanglement.** A patch plugin (e.g. LOTD Patches) may carry
records that belong *logically* to the parent mod. Decide: filter by
originating FormID only, or include overrides that enrich parent-mod
content?
- **Book HTML quirks.** Skyrim books embed `<p>`, `<br>`, `<font>`,
`<pre>`, and occasional `[pagebreak]` markers. Need a tested sanitizer
that preserves paragraph structure but strips all presentation.
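
For the book-markup question, a first-pass sanitizer might look like the sketch below. It is regex-based and untested against real mod books: `<p>` and `[pagebreak]` are treated as paragraph breaks, `<br>` as a line break (so a doubled `<br>` also ends a paragraph), and every other tag is dropped. One known caveat is that any angle-bracket token in the text, such as `<Alias=...>` substitutions, would be stripped too.

```python
import html
import re

_PARA_BREAK = re.compile(r"</?p\b[^>]*>|\[pagebreak\]", re.IGNORECASE)
_LINE_BREAK = re.compile(r"<br\b[^>]*>", re.IGNORECASE)
_ANY_TAG = re.compile(r"<[^>]+>")
_BLANK_LINE = re.compile(r"\n\s*\n")


def sanitize_book_text(raw: str) -> str:
    """Strip presentation markup but keep paragraph boundaries."""
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    text = _PARA_BREAK.sub("\n\n", text)
    text = _LINE_BREAK.sub("\n", text)
    text = _ANY_TAG.sub("", text)      # <font>, <pre>, and anything else
    text = html.unescape(text)         # &amp; -> &, etc.
    paragraphs = (
        re.sub(r"\s+", " ", chunk).strip()
        for chunk in _BLANK_LINE.split(text)
    )
    return "\n\n".join(p for p in paragraphs if p)


raw_book = (
    "<font face='$HandwrittenFont'><p>Dear friend,</p><br>"
    "The relic is safe.<br><br>[pagebreak]Yours &amp; always,<br>A.</font>"
)
clean_book = sanitize_book_text(raw_book)
```

Single line breaks inside a paragraph collapse to spaces here, which suits prose fed to an LLM but would flatten verse or lists; that trade-off is worth revisiting once real book samples are in hand.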
## Not yet decided
- Whether to go via ChromaDB (matches Oghma architecture — ingestion into
iris-dev, then `export_packs.py`-style exporter) or straight to
`.sknpack` from the JSONL (simpler, no RAG search over mod content).
- Whether to generate two importance tiers per topic by running an LLM
summarization step (Qwen3.5-27B on theia was flagged for Oghma Phase 2).
---
**Version:** 0.1 | **Created:** 2026-04-16 | **Updated:** 2026-04-16