Two-stage pipeline (xEdit Pascal dumper → JSONL → Python converter → .sknpack)
to distill mod-specific lore into SkyrimNet knowledge packs alongside the
vanilla Oghma corpus. Sketches target record signatures (BOOK/WEAP/ARMO/ACTI/
QUST/LCTN/NPC_/SPEL/DIAL), importance grading rules, pack-split convention
(Nimmersky_-_{Mod}_-_{Category}), LOTD-specific design, and open questions
around deduplication, condition_expr gating, and book markup sanitization.
Version 0.1 — design doc, not spec.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
157 lines
6.5 KiB
Markdown
# mod-to-sknpack — Extract Mod Lore into SkyrimNet Knowledge Packs

A pipeline that reads record data out of specific Skyrim mods via xEdit, then
distills it into `.sknpack` files that slot in alongside the Oghma vanilla
corpus. Each mod gets its own knowledge pack family so NPCs modded into the
game can speak about content the vanilla Oghma distillation never saw.

## Why

`oghma-sknpack/` covers vanilla Tamriel lore beautifully but knows nothing
about what a modded load order actually adds. A playthrough with Legacy of
the Dragonborn, Vigilant, Wyrmstooth, or Beyond Skyrim puts hundreds of
authored books, quest journals, unique artifacts, and named locations into
the game world. Without mod-specific knowledge packs, SkyrimNet NPCs will
either hallucinate about this content or refuse to engage with it.

The authored narrative content is already *in the plugin files*; we just
need to lift it out, clean it, categorize it, and grade it. xEdit gives us
deterministic record access, the `.sknpack` format is plain JSON, and this
project is the glue between them.

## Pipeline (two stages)

```
Mod plugin (.esp / .esm) in load order
        │  xEdit Pascal dumper script
        ▼
mod-dump.jsonl (one record per line)
        │  Python converter (sknpack-from-dump.py)
        ▼
Nimmersky_-_{Mod}_-_{Category}.sknpack
        │  SkyrimNet UI → Knowledge Packs → Import
        ▼
In-game NPC knowledge
```

**Why two stages:** xEdit's Pascal is painful for JSON escaping and for
cleaning book-body markup (`<p>`, `<br>`, `<font face='$HandwrittenFont'>`).
Python handles both trivially. The JSONL intermediate is also *inspectable*:
we can eyeball what xEdit actually found before any transformation runs.

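Because the intermediate file is plain JSONL, a few lines of Python are enough to inspect a dump before conversion. The sketch below tallies records per signature. The field name `signature` and the sample records are assumptions for illustration, since the dump schema is not fixed yet.

```python
import json
from collections import Counter
from typing import Iterable


def count_by_signature(lines: Iterable[str]) -> Counter:
    """Tally dumped records per signature, skipping blank lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # "signature" is an assumed key; adjust to whatever the dumper emits.
        counts[record.get("signature", "?")] += 1
    return counts


# Hypothetical sample lines standing in for mod-dump.jsonl.
sample = [
    '{"signature": "BOOK", "edid": "ExampleBook01", "full": "An Example Book"}',
    '{"signature": "WEAP", "edid": "ExampleBlade", "full": "Example Blade"}',
    '{"signature": "BOOK", "edid": "ExampleBook02", "full": "Another Book"}',
]

for signature, count in sorted(count_by_signature(sample).items()):
    print(signature, count)
```

For a real dump, passing an open file handle (`open("mod-dump.jsonl", encoding="utf-8")`) in place of `sample` should work the same way, since the function only iterates over lines.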
## Target record signatures

| Signature | Field(s) extracted | Pack category | Default importance |
|---|---|---|---|
| `BOOK` | `FULL`, `DESC` (body) | `{Mod}_Books` | 0.75 (authored narrative) |
| `WEAP` | `FULL`, `DESC` | `{Mod}_Weapons` | 0.40 (0.75 if named artifact) |
| `ARMO` | `FULL`, `DESC` | `{Mod}_Armor` | 0.40 (0.75 if named artifact) |
| `MISC` | `FULL`, `DESC` | `{Mod}_Items` | 0.40 |
| `ACTI` | `FULL`, `DESC` | `{Mod}_Displays` | 0.40 (display plaques, static props) |
| `QUST` | `FULL`, journal `CNAM` entries | `{Mod}_Quests` | 0.75 |
| `LCTN` | `FULL`, keywords | `{Mod}_Locations` | 0.50 → `type: LOCATION` |
| `NPC_` | `FULL`, class, faction | `{Mod}_NPCs` | 0.40 |
| `SPEL` | `FULL`, `DESC` | `{Mod}_Spells` | 0.50 → `type: SKILL` |
| `DIAL` + `INFO` | named-NPC dialogue topics | `{Mod}_Dialogue` | 0.40 |

**Always use xEdit's `WinningOverride`** so we get the active version of each
record under the full load order. Filter `GetLoadOrderFormID` against the
target plugin's FormID range so we only emit records *originating* in the
mod, not every vanilla record it touches.

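The origin check can also be repeated on the Python side as a sanity pass over the dump. This is a minimal sketch, assuming the dumper writes each record's load-order FormID as an integer and that the target plugin's load-order index is known. `originates_in` is a hypothetical helper, not an xEdit function. Full plugins carry their index in the top byte of the FormID, while light (ESL-flagged) plugins use `FE` in the top byte followed by a 12-bit index.

```python
def originates_in(load_order_form_id: int, plugin_index: int, light: bool = False) -> bool:
    """Return True if the FormID's load-order prefix matches the target plugin.

    plugin_index is the plugin's slot in the load order: 0x00-0xFD for a full
    plugin, or 0x000-0xFFF within the FE space for a light plugin.
    """
    top_byte = (load_order_form_id >> 24) & 0xFF
    if light:
        light_index = (load_order_form_id >> 12) & 0xFFF
        return top_byte == 0xFE and light_index == plugin_index
    return top_byte == plugin_index


# Hypothetical FormIDs: the target mod sits at load-order index 0x05.
print(originates_in(0x0500ABCD, 0x05))               # record defined by the mod
print(originates_in(0x0001A2B3, 0x05))               # vanilla record the mod overrides
print(originates_in(0xFE012345, 0x012, light=True))  # record from a light plugin
```

Doing the check in both stages is redundant by design: a mismatch between the two would flag a bug in the dumper before bad records reach a pack.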
## sknpack format (reminder)

See `../oghma-sknpack/README.md` for the full envelope. Each entry is flat:

```json
{
  "content": "...narrative prose, cleaned of markup...",
  "display_name": "stable_editor_id_or_slug",
  "type": "KNOWLEDGE | SKILL | LOCATION",
  "importance": 0.75,
  "location": "",
  "tags": [],
  "emotion": "",
  "always_inject": false,
  "condition_expr": ""
}
```

Pack envelope: `skyrimnet_knowledge_pack` with `name`, `description`,
`author`, `version`, `format_version: 1`, `exported_at`, `entries[]`,
`entry_count`, `npc_groups: []`.

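As a rough illustration of how the converter might assemble a pack, the sketch below builds one entry and wraps it in the envelope fields listed above. Several details are assumptions to verify against `../oghma-sknpack/README.md`: that `skyrimnet_knowledge_pack` is the top-level key, that `exported_at` is an ISO 8601 UTC timestamp, and the default `author` value. `make_entry` and `build_pack` are hypothetical helper names.

```python
import json
from datetime import datetime, timezone


def make_entry(content, display_name, entry_type="KNOWLEDGE",
               importance=0.40, location="", tags=None):
    """Build one flat entry with the fields from the schema above."""
    return {
        "content": content,
        "display_name": display_name,
        "type": entry_type,
        "importance": importance,
        "location": location,
        "tags": list(tags or []),
        "emotion": "",
        "always_inject": False,
        "condition_expr": "",
    }


def build_pack(name, description, entries, author="Nimmersky", version="0.1"):
    """Wrap entries in the pack envelope (top-level key is an assumption)."""
    entries = list(entries)
    return {
        "skyrimnet_knowledge_pack": {
            "name": name,
            "description": description,
            "author": author,
            "version": version,
            "format_version": 1,
            "exported_at": datetime.now(timezone.utc).isoformat(),
            "entries": entries,
            "entry_count": len(entries),
            "npc_groups": [],
        }
    }


entry = make_entry("Example book text.", "ExampleBook01", importance=0.75)
pack = build_pack("Nimmersky - Example - Books", "Illustrative pack", [entry])
print(json.dumps(pack, indent=2, ensure_ascii=False))
```

Deriving `entry_count` from the list rather than passing it in keeps the two fields from drifting apart.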
## Importance grading (carries over from Oghma)

- **0.75**: Authored narrative (book text, quest journals, named-artifact
  backstory). The "scholar" tier.
- **0.50**: Locations, spells, visual descriptions.
- **0.40**: Generic items, display plaques, unnamed NPCs. The "commoner"
  tier, which provides scaffolding without dominating the token budget.

Where a single topic has both authored and generic variants (e.g. a named
weapon has both its lore-book entry and a terse display plaque), emit
*both*; SkyrimNet's ranking picks the right depth for the NPC.

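The tiers above, combined with the per-signature defaults from the table, reduce to a small lookup. The following is a sketch under assumptions: how a record gets flagged as a named artifact is left to the caller, and the 0.40 fallback for unlisted signatures is a guess rather than a documented rule.

```python
# Default importance per record signature, taken from the table above.
DEFAULT_IMPORTANCE = {
    "BOOK": 0.75, "QUST": 0.75,
    "LCTN": 0.50, "SPEL": 0.50,
    "WEAP": 0.40, "ARMO": 0.40, "MISC": 0.40,
    "ACTI": 0.40, "NPC_": 0.40, "DIAL": 0.40,
}

# Signatures whose entries use a non-default type.
ENTRY_TYPE = {"LCTN": "LOCATION", "SPEL": "SKILL"}

# Only weapons and armor are promoted when they are named artifacts.
PROMOTABLE = {"WEAP", "ARMO"}


def grade(signature: str, named_artifact: bool = False) -> float:
    """Return the importance for a record of the given signature."""
    if named_artifact and signature in PROMOTABLE:
        return 0.75
    # Fallback for unlisted signatures is an assumption.
    return DEFAULT_IMPORTANCE.get(signature, 0.40)


def entry_type(signature: str) -> str:
    """Return the sknpack entry type for a record signature."""
    return ENTRY_TYPE.get(signature, "KNOWLEDGE")


print(grade("WEAP"), grade("WEAP", named_artifact=True), entry_type("LCTN"))
```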
## Pack split convention

```
Nimmersky_-_{Mod}_-_{Category}.sknpack
```

Examples for Legacy of the Dragonborn:

- `Nimmersky_-_LOTD_-_Displays.sknpack`: per-relic display descriptions
- `Nimmersky_-_LOTD_-_Books.sknpack`: Explorer's Guide series, Auryen's
  journals, in-game books
- `Nimmersky_-_LOTD_-_Museum.sknpack`: one LOCATION entry per wing (Hall
  of Heroes, Dragonborn Hall, Daedric, Library, Safehouse, Natural Science,
  Dwemer, Guildhouse, Hall of Lost Empires, Airship), all with
  `location="Solitude"`
- `Nimmersky_-_LOTD_-_Quests.sknpack`: quest journals (Auryen's commission
  lines, relic-hunt objectives)

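The naming convention is simple enough to generate mechanically. A minimal sketch follows; replacing runs of non-alphanumeric characters with underscores is an assumption added here to keep filenames safe, and both `pack_filename` and `split_by_category` are hypothetical helpers.

```python
import re
from collections import defaultdict


def _clean(part: str) -> str:
    """Collapse anything that is not a letter or digit into underscores."""
    return re.sub(r"[^A-Za-z0-9]+", "_", part).strip("_")


def pack_filename(mod: str, category: str) -> str:
    """Build a pack filename following Nimmersky_-_{Mod}_-_{Category}.sknpack."""
    return f"Nimmersky_-_{_clean(mod)}_-_{_clean(category)}.sknpack"


def split_by_category(mod, categorized_entries):
    """Group (category, entry) pairs into one entry list per pack filename."""
    packs = defaultdict(list)
    for category, entry in categorized_entries:
        packs[pack_filename(mod, category)].append(entry)
    return dict(packs)


print(pack_filename("LOTD", "Displays"))
print(split_by_category("LOTD", [("Books", {"display_name": "a"}),
                                 ("Displays", {"display_name": "b"}),
                                 ("Books", {"display_name": "c"})]))
```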
## Mod-specific design notes

### Legacy of the Dragonborn (LOTD)
- Museum wings are the natural chunking unit for LOCATION entries: LOTD
  tags them via keywords and has dedicated location records.
- The Explorer's Guide books are pure gold (0.75): long-form authored
  narrative about relic backstories.
- Display plaques are mostly terse ("Blade of Woe: dagger once carried
  by the Dark Brotherhood assassin Astrid"). These go at 0.40.

### Vigilant, Wyrmstooth, Beyond Skyrim
- Larger quest arcs, so QUST dumping becomes more valuable than for LOTD.
- Beyond Skyrim: Bruma has whole LCTN hierarchies to preserve (province
  → hold → settlement → dungeon).

## Open questions

- **Deduplication against vanilla Oghma.** A mod may rename or re-describe
  a vanilla artifact. Do we suppress the Oghma entry when the mod overrides
  it, or emit both and let importance grading sort it out?
- **`condition_expr` gating.** SkyrimNet supports quest-gated visibility
  (e.g. Auryen only knows about Relic X after quest Y). Phase 2 of Oghma
  intends to use this. For mods, do we hand-author conditions or infer them
  from QUST stage references in the dump?
- **Cross-mod entanglement.** A patch plugin (e.g. LOTD Patches) may carry
  records that belong *logically* to the parent mod. Decide: filter by
  originating FormID only, or include overrides that enrich parent-mod
  content?
- **Book HTML quirks.** Skyrim books embed `<p>`, `<br>`, `<font>`,
  `<pre>`, and occasional `[pagebreak]` markers. Need a tested sanitizer
  that preserves paragraph structure but strips all presentation.

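For the book markup question, one possible starting point is a regex-based sanitizer like the sketch below. It is not a tested solution: it treats `<p>` tags and `[pagebreak]` as paragraph boundaries, folds `<br>` line breaks into spaces within a paragraph, and strips every other tag. Whether single line breaks should survive (for example in `<pre>` blocks or verse) is an open design choice, and `sanitize_book_text` is a hypothetical name.

```python
import html
import re

# <p ...>, </p> and [pagebreak] mark paragraph boundaries.
_PARAGRAPH_BREAK = re.compile(r"</?p\b[^>]*>|\[pagebreak\]", re.IGNORECASE)
# <br>, <br/>, <br /> mark line breaks inside a paragraph.
_LINE_BREAK = re.compile(r"<br\s*/?>", re.IGNORECASE)
# Any remaining tag (<font ...>, <pre>, ...) is presentation only.
_ANY_TAG = re.compile(r"<[^>]+>")


def sanitize_book_text(raw: str) -> str:
    """Strip book markup while keeping paragraph boundaries."""
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    text = _PARAGRAPH_BREAK.sub("\n\n", text)
    text = _LINE_BREAK.sub("\n", text)
    text = _ANY_TAG.sub("", text)
    text = html.unescape(text)  # turn entities such as &amp; back into characters
    paragraphs = []
    for block in re.split(r"\n\s*\n", text):
        collapsed = " ".join(block.split())  # fold line breaks and runs of spaces
        if collapsed:
            paragraphs.append(collapsed)
    return "\n\n".join(paragraphs)


raw = ("<font face='$HandwrittenFont'><p>First line.<br>Second line.</p>"
       "<p>Next &amp; last.</p></font>[pagebreak]Tail")
print(sanitize_book_text(raw))
```

A regex approach is fragile against malformed markup, so running it over a full dump and diffing a sample by hand would be a reasonable first test before trusting it.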
## Not yet decided

- Whether to go via ChromaDB (matches the Oghma architecture: ingestion into
  iris-dev, then an `export_packs.py`-style exporter) or straight to
  `.sknpack` from the JSONL (simpler, but no RAG search over mod content).
- Whether to generate two importance tiers per topic by running an LLM
  summarization step (Qwen3.5-27B on theia was flagged for Oghma Phase 2).

---

**Version:** 0.1 | **Created:** 2026-04-16 | **Updated:** 2026-04-16