Routing, learning, and proof for OpenClawBrain.
This blog explains OpenClawBrain as it actually ships: tight OpenClaw integration, fast local startup, continuous background learning, and a learned runtime route_fn trained with Ultimate Policy Gradient on scanner/harvester labels, async teacher labels, and user feedback. The proof boundary is stated explicitly throughout: simulations prove mechanism, not full product performance.
Read first
The shortest path from first impression to actual rollout and verification.
What is proven now vs what still needs stronger evidence
One operator-facing hub for mechanism proof, the evidence ladder, and the artifact checklist behind each claim.
The current paper, easy to find
The canonical paper route now packages the current PDF, paper metadata, and the supporting links that keep it aligned with the live proof boundary.
OpenClaw integration
Brain-first mode, fail-open wiring, bounded prompt context, and the continuous training loop.
Use the brain first, train it second
Start with a workspace build and daemon query, then add replay, harvest, and maintenance in the background.
One OpenClaw turn, fully traced
Inbound summary, stable chat_id, bounded [BRAIN_CONTEXT], same-turn correction capture, and what only changes after replay and redeploy.
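The stable chat_id and bounded [BRAIN_CONTEXT] pieces of that turn can be sketched roughly as follows. This is an illustrative assumption, not the shipped implementation: the hash scheme, the field names, and the character budget are all invented for the example; only the `chat_id` and `[BRAIN_CONTEXT]` names come from the post.

```python
import hashlib

def stable_chat_id(channel: str, thread: str) -> str:
    # Deterministic id (assumed scheme): every turn in the same thread
    # maps to the same brain session, across restarts.
    return hashlib.sha256(f"{channel}:{thread}".encode()).hexdigest()[:16]

def bounded_context(snippets, budget_chars=400):
    # Pack retrieved snippets into a [BRAIN_CONTEXT] block, stopping
    # before the character budget is exceeded so the prompt stays bounded.
    block, used = [], 0
    for s in snippets:
        if used + len(s) > budget_chars:
            break
        block.append(s)
        used += len(s)
    return "[BRAIN_CONTEXT]\n" + "\n".join(block) + "\n[/BRAIN_CONTEXT]"
```

The key property is that the context injection is budgeted up front, so a large memory store can never blow out a single turn's prompt.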
Reproduce-eval command path
The underlying markdown contract for simulations, fixed-session eval, shadow comparisons, and figure generation.
Current series: v12.2.6+
The current series explains the product direction end to end, with the learned runtime router and proof boundary stated explicitly.
Shadow routing + Ultimate Policy Gradient
Why the live path stays local while the learning path collects labels and updates the router asynchronously.
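The split described above can be sketched with a minimal queue-and-worker pattern. All names here are assumptions for illustration: the live path answers synchronously from the local router, while label collection for the learner happens on a background thread that the hot path never waits on.

```python
import queue
import threading

label_queue: "queue.Queue" = queue.Queue()
collected = []  # stand-in for the async teacher-labeling sink

def live_route(query: str) -> str:
    route = "local"  # fast local decision; the live path stays local
    # Hand the shadow label off without blocking; if the queue were
    # bounded and full, we would drop the label rather than slow the turn.
    try:
        label_queue.put_nowait({"query": query, "route": route})
    except queue.Full:
        pass
    return route

def label_worker():
    # Drains labels off the hot path; in the real system this is where
    # teacher labeling and router updates would run asynchronously.
    while True:
        item = label_queue.get()
        if item is None:
            break
        collected.append(item)

worker = threading.Thread(target=label_worker, daemon=True)
worker.start()
live_route("hello")
label_queue.put(None)  # shutdown sentinel
worker.join()
```

The design point is the direction of coupling: the learning path observes the live path, never the reverse.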
The learned runtime route_fn
How `graph_prior` and query-conditioned signals combine into the policy that actually runs during a live query.
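One plausible shape for such a route_fn is a softmax policy whose logits add a log of `graph_prior` to a query-conditioned score. This is a hedged sketch under assumed names: the feature weighting, temperature, and route labels are invented here; only `graph_prior` and the idea of combining it with query-conditioned signals come from the post.

```python
import math

def route_fn(graph_prior, query_features, weights, temperature=1.0):
    """Combine a per-route prior with query-conditioned signals into a
    softmax policy over routes (illustrative, not the shipped policy)."""
    scores = {}
    for route, prior in graph_prior.items():
        # Query-conditioned signal: a simple weighted feature sum.
        signal = sum(weights[route].get(f, 0.0) * v
                     for f, v in query_features.items())
        scores[route] = math.log(prior) + signal / temperature
    # Numerically stable softmax over the route scores.
    z = max(scores.values())
    exp = {r: math.exp(s - z) for r, s in scores.items()}
    total = sum(exp.values())
    return {r: e / total for r, e in exp.items()}

policy = route_fn(
    graph_prior={"local": 0.7, "teacher": 0.3},
    query_features={"len": 0.5},
    weights={"local": {"len": -1.0}, "teacher": {"len": 1.0}},
)
```

With this shape, the prior dominates when query signals are weak, and a strong query-conditioned signal can override it, which is the behavior the post attributes to the learned runtime router.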
Evaluation: prove it without pretending
What simulations prove, what they do not, and what stronger evidence should look like.
Local-first rollout
Default stack, fast startup, and how to keep replay and teacher labeling off the hot path.
Baselines and the comparison contract
The right baseline set for OpenClawBrain and the artifact discipline required for any claim.
Brain-first OpenClaw integration
What changes when the brain is on, what stays fail-open, and how feedback ties back to the fired route.
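The fail-open wiring and the feedback-to-fired-route link can be sketched together. Everything here is an assumed interface for illustration: the real integration's function names and storage are not shown in the post; the sketch only demonstrates the two properties it names, namely that a brain failure never blocks the turn, and that later feedback is attributed to the route that actually fired.

```python
fired = {}  # turn_id -> route that actually ran (assumed in-memory store)

def brain_turn(turn_id: str, router, fallback: str = "no_brain") -> str:
    # Fail open: if the brain-side router raises, the turn proceeds
    # on the fallback path instead of surfacing an error to the user.
    try:
        route = router()
    except Exception:
        route = fallback
    fired[turn_id] = route  # remember what fired for this turn
    return route

def feedback(turn_id: str, score: float) -> dict:
    # Tie user feedback back to the route that fired on that turn,
    # producing a label the background learner can consume.
    return {"route": fired.get(turn_id), "score": score}
```

The point of recording the fired route at turn time is that feedback often arrives much later; without that link, the label cannot credit or penalize the right routing decision.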
Legacy notes
Earlier posts are still useful for release history, but the current positioning is the v12.2.6+ series plus the operator docs.