OpenClawBrain v12.2.6+
- Title: Shadow Routing with QTsim Confidence Mixing and Unified Policy Learning
- Author: Jonathan Gu
- Canonical route: /paper/
- Direct artifact: /openclawbrain.pdf
- Source: openclawbrain.tex
This is the human-facing home for the current OpenClawBrain paper. It packages the March 2, 2026 PDF with version metadata, the core research framing, and the links that keep the paper aligned with the live proof boundary on the site.
Evidence boundary: the deterministic workflow-proof slice is live and reproducible now. Recorded-session, shadow-mode, and narrow online proof are still future work, so the paper should be read alongside /proof/.
The paper is the longer technical framing behind the current site. These are the core claims it tries to organize, without presenting mechanism proof as if it were online proof.
The central operating shape is a strict split: local bounded routing on live OpenClaw turns, then replay, labeling, and policy updates later off the hot path.
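The split described above can be illustrated with a minimal sketch. Everything named here (`HotPathRouter`, `max_nodes`, `replay_and_update`, `label_fn`, the additive score update) is a hypothetical stand-in rather than the OpenClawBrain API; the only point is that the routing call does bounded local work and records a trace, while labeling and score updates happen later in a separate pass.

```python
from collections import deque


class HotPathRouter:
    """Hypothetical router: bounded selection now, learning deferred."""

    def __init__(self, scores, max_nodes=3):
        self.scores = dict(scores)   # node -> current routing score
        self.max_nodes = max_nodes   # hard bound on work per live turn
        self.trace_log = deque()     # traces waiting for offline replay

    def route(self, candidates):
        # Hot path: rank locally, keep at most max_nodes, record the trace.
        # No labeling and no score updates happen here.
        ranked = sorted(candidates,
                        key=lambda n: self.scores.get(n, 0.0),
                        reverse=True)
        chosen = ranked[: self.max_nodes]
        self.trace_log.append(tuple(chosen))
        return chosen


def replay_and_update(router, label_fn, lr=0.1):
    """Offline pass: replay logged traces, label them, then update scores."""
    updated = 0
    while router.trace_log:
        for node in router.trace_log.popleft():
            # label_fn is an assumed callback returning a signed reward.
            reward = label_fn(node)
            router.scores[node] = router.scores.get(node, 0.0) + lr * reward
            updated += 1
    return updated
```

In this sketch a live turn only ever calls `route`; `replay_and_update` would run in a background job, so a slow or failing labeler cannot delay the turn.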
graph_prior plus QTsim confidence mixing. The runtime policy combines durable structure with query-conditioned fit, then uses uncertainty features like entropy and margin to decide how much each signal should matter per decision.
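One way to picture confidence mixing is the sketch below. It is not the formula from the paper: the softmax normalization, the equal weighting of entropy and margin inside `confidence`, and the function names are all assumptions made for illustration. Each signal's scores are turned into a distribution, a confidence is computed from normalized entropy and top-two margin, and the more confident signal receives more weight for that decision.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs):
    # Entropy divided by log(n): 0.0 for a one-hot, 1.0 for uniform.
    if len(probs) < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return min(1.0, h / math.log(len(probs)))


def top2_margin(probs):
    # Gap between the best and second-best candidate.
    if len(probs) < 2:
        return 1.0
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def confidence(probs):
    # Assumed combination: average of (1 - entropy) and margin.
    return 0.5 * (1.0 - normalized_entropy(probs)) + 0.5 * top2_margin(probs)


def mix_scores(graph_prior, qtsim):
    """Return (mixed distribution, weight given to the QTsim signal)."""
    p_prior = softmax(graph_prior)
    p_sim = softmax(qtsim)
    c_prior = confidence(p_prior)
    c_sim = confidence(p_sim)
    denom = c_prior + c_sim
    # If neither signal is confident, fall back to an even split.
    w_sim = 0.5 if denom == 0 else c_sim / denom
    mixed = [(1.0 - w_sim) * a + w_sim * b for a, b in zip(p_prior, p_sim)]
    return mixed, w_sim
```

With a flat prior and a sharply peaked similarity signal, nearly all of the weight moves to the similarity side; when both signals are equally uncertain the weight stays at 0.5.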
Teacher distillation and policy-gradient updates are treated as one learning loop with an authority order: human corrections first, then self-learning outcomes, harvested signals, and async-teacher labels.
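The authority order can be sketched as a simple precedence rule. The names below (`Authority`, `Signal`, `resolve`) are hypothetical and only illustrate the ordering stated above: when several sources label the same decision, the highest-authority source supplies the training target.

```python
from dataclasses import dataclass
from enum import IntEnum


class Authority(IntEnum):
    # Lower value means higher authority, following the stated order.
    HUMAN_CORRECTION = 0
    SELF_LEARNING = 1
    HARVESTED = 2
    ASYNC_TEACHER = 3


@dataclass(frozen=True)
class Signal:
    decision_id: str    # which routing decision the label refers to
    source: Authority   # where the label came from
    target: str         # the label itself (illustrative string)


def resolve(signals):
    """Keep, per decision, the signal from the highest-authority source."""
    best = {}
    for s in signals:
        current = best.get(s.decision_id)
        if current is None or s.source < current.source:
            best[s.decision_id] = s
    return best
```

A usage example: if an async-teacher label and a human correction disagree on decision `d1`, `resolve` keeps the human correction, while a decision that only has a teacher label keeps that label.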
The paper also frames graph maintenance as a control problem, with shipped Phase 2a hooks now and future connect, split, and merge actions behind conservative guardrails.
The paper is easier to evaluate when it is paired with the packaged proof, the current narrative, and the underlying materials.
Mechanism proof, the evidence ladder, and the claim boundary that keeps the site from overclaiming beyond current artifacts.
The shortest path through shadow routing, the learned runtime router, operator rollout, and the evaluation contract.
Reproduce-eval steps, figure links, GitHub references, and the supporting material list for the current site.
Brain-first mode, fail-open wiring, bounded prompt context, and the background learning loop that serves later turns.