Canonical paper route

Paper, PDF, and supporting materials.

This is the human-facing home for the current OpenClawBrain paper. It packages the March 2, 2026 PDF with version metadata, the core research framing, and the links that keep the paper aligned with the live proof boundary on the site.

Open the PDF
Review the proof package
Read the current series
Supporting materials

Evidence boundary: the deterministic workflow-proof slice is live and reproducible now. Recorded-session, shadow-mode, and narrow online proof are still future work, so the paper should be read alongside /proof/.

Version: v12.2.6+
Author: Jonathan Gu
Date: March 2, 2026
Direct PDF: /openclawbrain.pdf

Current artifact

OpenClawBrain v12.2.6+

Title: Shadow Routing with QTsim Confidence Mixing and Unified Policy Learning
Author: Jonathan Gu
Canonical route: /paper/
Direct artifact: /openclawbrain.pdf
Source: openclawbrain.tex

Read it honestly

Mechanism proof is live. Full product proof is not.

The paper covers the hot/cold path split, route signals, confidence mixing, and unified policy learning.
/proof/ shows what is actually proven now and what still needs stronger evidence.
/blog/v12.2.6-series/ explains how the product story and rollout path fit together.

What the paper covers

The paper is the longer technical framing behind the current site. These are the core claims it organizes, without presenting mechanism proof as online proof.

System split

Hot path local, cold path asynchronous

The central operating shape is a strict split: local bounded routing on live OpenClaw turns, then replay, labeling, and policy updates later off the hot path.
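The split can be illustrated with a minimal sketch. It assumes a toy router in which the hot path only reads weights and appends a trace, while all learning happens in a separate cold step. The class `Router`, its methods, the `max_candidates` bound, and the additive update are hypothetical names and choices for illustration, not the OpenClawBrain implementation.

```python
from collections import deque


class Router:
    """Toy hot/cold split: route() never learns, cold_update() never routes."""

    def __init__(self, max_candidates=8):
        self.max_candidates = max_candidates  # hypothetical bound on hot-path work
        self.weights = {}                     # candidate -> learned score
        self.trace_log = deque()              # traces awaiting offline replay

    def route(self, turn_id, candidates):
        """Hot path: bounded, local, read-only with respect to the weights."""
        bounded = candidates[: self.max_candidates]
        choice = max(bounded, key=lambda c: self.weights.get(c, 0.0))
        self.trace_log.append((turn_id, choice))  # record now, label later
        return choice

    def cold_update(self, label_fn, lr=0.1):
        """Cold path: replay recorded traces, label them, update the weights."""
        processed = 0
        while self.trace_log:
            turn_id, choice = self.trace_log.popleft()
            reward = label_fn(turn_id, choice)  # assumed offline labeling step
            self.weights[choice] = self.weights.get(choice, 0.0) + lr * reward
            processed += 1
        return processed
```

In this sketch a live turn costs one bounded `max` over candidates plus a queue append. Labeling and weight changes wait until `cold_update` is run off the hot path.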

Route signals

`graph_prior` plus QTsim confidence mixing

The runtime policy combines durable structure with query-conditioned fit, then uses uncertainty features such as entropy and margin to decide how much weight each signal gets per decision.
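One way to picture per-decision mixing is the sketch below. It assumes that the query-conditioned scores are turned into a distribution, that confidence is an equal blend of (1 - normalized entropy) and top-two margin, and that the final score is a linear interpolation between the prior and the query fit. None of these choices, and none of the function names, come from the paper; they only show how entropy and margin can drive a mixing weight.

```python
import math


def softmax(scores):
    """Convert raw scores to a probability distribution (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def margin(probs):
    """Gap between the two most likely candidates (1.0 if there is only one)."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1] if len(top) > 1 else 1.0


def mix_weight(query_scores):
    """Hypothetical confidence in [0, 1]: high when the query fit is peaked."""
    probs = softmax(query_scores)
    n = len(probs)
    norm_entropy = entropy(probs) / math.log(n) if n > 1 else 0.0
    confidence = 0.5 * (1.0 - norm_entropy) + 0.5 * margin(probs)
    return min(1.0, max(0.0, confidence))


def mixed_scores(graph_prior, query_scores):
    """Interpolate per decision: lean on the prior when the query fit is unsure."""
    w = mix_weight(query_scores)
    query_probs = softmax(query_scores)
    return [(1.0 - w) * g + w * q for g, q in zip(graph_prior, query_probs)]
```

Under these assumptions, a flat query distribution yields a weight near zero, so the decision falls back to `graph_prior`. A sharply peaked one yields a weight near one, so the query fit dominates.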

Learning rule

Unified policy learning

Teacher distillation and policy-gradient updates are treated as one learning loop with an authority order: human corrections first, then self-learning outcomes, harvested signals, and async-teacher labels.
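The authority order can be sketched as a simple conflict-resolution step. The sketch assumes that when several sources label the same decision, the highest-authority source wins outright. The source names, the `Signal` type, and the strict-override rule are illustrative assumptions; the paper may instead weight sources rather than override them.

```python
from dataclasses import dataclass

# Order taken from the text above: earlier entries have more authority.
AUTHORITY = ["human_correction", "self_learning", "harvested", "async_teacher"]
RANK = {name: i for i, name in enumerate(AUTHORITY)}


@dataclass(frozen=True)
class Signal:
    decision_id: str  # which routing decision this label refers to
    source: str       # one of AUTHORITY
    label: str        # the supervision the source provides


def resolve(signals):
    """Keep one signal per decision: the one from the highest-authority source."""
    best = {}
    for s in signals:
        current = best.get(s.decision_id)
        if current is None or RANK[s.source] < RANK[current.source]:
            best[s.decision_id] = s
    return best
```

For example, if an async-teacher label and a human correction disagree on the same decision, `resolve` keeps the human correction. A decision with only a teacher label still contributes that label to training.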

Maintenance path

RL-native graph maintenance

The paper also frames graph maintenance as a control problem. Phase 2a hooks are shipped now; connect, split, and merge actions are future work, to be gated behind conservative guardrails.

Read next

What is proven now

Why the product is framed this way

Docs, figures, and repository references

OpenClaw integration
