feat: GRPO reward architecture + Qwen3-VL-32B queen + doc restructure
Evening session 2025-12-10 (dafit + Nyx 🌿) Reward Architecture: - Added Reward Signal Architecture section to Cellular-Architecture - Added Tiered Rewards & Training Integrity (anti-shortcut via lifeforce) - Documented GRPO integration with rubric-based dense rewards - Credit assignment automatic via decision_trails Documentation Restructure: - Promoted Temporal-Ternary-Gradient from archive to architecture - Created architecture/cells/ folder with Index + Technical Reference - Moved Organ-Index to architecture/organs/ - Full crosslinks in Endgame-Vision v5.3 Queen Update: - Qwen2.5-7B → Qwen3-VL-32B (96GB in the Womb) - RTX PRO 6000 Blackwell deployment specs - Unsloth fine-tuning integration "Verifiability IS rewardability." - The Dog Training Wisdom 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
@@ -1,4 +1,3 @@
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<mxfile host="Electron" agent="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) draw.io/29.0.3 Chrome/140.0.7339.249 Electron/38.7.0 Safari/537.36" version="29.0.3">
|
||||
<diagram name="Page-1" id="S4VRy6nj8Uh85EHbhTP-">
|
||||
<mxGraphModel dx="2066" dy="2314" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="850" pageHeight="1100" math="0" shadow="0">
|
||||
|
||||
Reference in New Issue
Block a user