Don’t trust our numbers — run them yourself. This page walks you through regenerating every benchmark, figure, and proof artifact from scratch.
Read CLAIMS.md alongside this page for exact boundaries on what each test proves.
cd /path/to/openclawbrain
corepack enable
pnpm install --frozen-lockfile
pnpm check
pnpm release:status
pnpm release:pack
pnpm release:proofs:status
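If you would rather run the setup commands above as one unit and stop at the first failure, a small wrapper can do that. This is a minimal sketch: `run_steps` is a hypothetical helper that is not shipped in the repo, and it assumes you are already in the repo root.

```shell
# Hypothetical helper (not part of the repo): run each command string in
# order and stop at the first one that exits non-zero.
run_steps() {
  local step
  for step in "$@"; do
    echo "==> $step"
    sh -c "$step" || { echo "FAILED: $step" >&2; return 1; }
  done
  echo "all steps passed"
}

# Usage from the repo root, with the commands listed above:
#   run_steps "corepack enable" "pnpm install --frozen-lockfile" "pnpm check" \
#     "pnpm release:status" "pnpm release:pack" "pnpm release:proofs:status"
```

Stopping early keeps a failing install or check from being buried under the output of later steps.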
These are the proofs implemented directly in the public package workspace today.
pnpm lifecycle:smoke
This proves the learning lifecycle across:
pnpm observability:smoke
pnpm observability:report
This proves the operator-facing diagnostics surface for:
- route_fn freshness/version
- route_fn evidence

pnpm observability:report prints the local JSON report for those proofs. It only claims what is materialized inside the repo fixture lane; it does not claim live production telemetry coverage.
Those proofs keep learned route_fn evidence central, but they do not yet prove full live runtime plasticity on the active pack or per-query learned route_fn mutation. Current structural ops are verified as pack-build metadata plus promoted-artifact freshness.
pnpm recorded-session-replay:smoke
pnpm recorded-session-replay:report
This proves the recorded-session closure lane for the checked-in sanitized replay fixture and scored bundle:
- no_brain, seed_pack, and learned_replay
- traceHash, fixtureHash, scoreHash, and bundleHash stability

See recorded-session-replay.md for the fixture, bundle, and refresh workflow.
pnpm eval:ocb-native
pnpm eval:ocb-native:smoke
This proof lane runs the required comparative modes directly through the public OpenClaw activation + promoted-pack compile path:
- no_brain
- vector_only
- graph_prior_only
- learned_route

It emits deterministic proof artifacts under .artifacts/ocb-native-comparative-eval/:

- summary_table.{md,csv,json}
- pairwise_delta.{md,csv,json}
- win_rate_matrix.{md,csv,json}
- per_seed_breakdowns.{md,csv,json}
- worked_traces.{md,json}

The quality/context/correction numbers are explicit compile-path proxies over gold context coverage. The latency and cost columns are deterministic proxy units derived from the compile surface itself, not hand-waved prose claims and not noisy wall-clock timing.
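One way to check the artifact list and the determinism claim yourself is sketched below. Both helpers are hypothetical and not part of the repo. The file names are taken from the list above; the sketch assumes that rerunning the eval rewrites the same directory and that `mktemp`, `cp -R`, and `diff -r` are available.

```shell
# Hypothetical helpers (not part of the repo).

# Print every expected artifact file that is missing from directory $1.
missing_artifacts() {
  local dir="$1" name ext
  for name in summary_table pairwise_delta win_rate_matrix per_seed_breakdowns; do
    for ext in md csv json; do
      [ -f "$dir/$name.$ext" ] || echo "$name.$ext"
    done
  done
  for ext in md json; do
    [ -f "$dir/worked_traces.$ext" ] || echo "worked_traces.$ext"
  done
}

# Snapshot directory $1, run the remaining arguments as a command, then
# compare the directory against the snapshot.
# Returns 0 if nothing changed, 1 if any file differs, 2 on setup,
# command, or diff failure.
rerun_is_stable() {
  local dir="$1" snap status
  shift
  snap="$(mktemp -d)" || return 2
  cp -R "$dir/." "$snap/" || { rm -rf "$snap"; return 2; }
  "$@" >/dev/null || { rm -rf "$snap"; return 2; }
  status=0
  diff -r "$snap" "$dir" >/dev/null || status=$?
  rm -rf "$snap"
  return "$status"
}

# Usage, after pnpm eval:ocb-native has run once:
#   missing_artifacts .artifacts/ocb-native-comparative-eval
#   rerun_is_stable .artifacts/ocb-native-comparative-eval pnpm eval:ocb-native
```

An empty result from `missing_artifacts` plus a zero exit status from `rerun_is_stable` is consistent with the deterministic-artifact claim for that run. It does not say anything about the proxy metrics themselves.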
Use the checked-in outside-consumer smoke after pnpm release:pack creates .release/ tarballs. The shortest lane is:
pnpm fresh-env:smoke
If you want the underlying manual outside-consumer commands instead, use:
repo_root="$(pwd)"
tmpdir="$(mktemp -d)"
cp examples/npm-consumer/package.json "$tmpdir/package.json"
cp examples/npm-consumer/smoke.mjs "$tmpdir/smoke.mjs"
cp examples/npm-consumer/attach-smoke.mjs "$tmpdir/attach-smoke.mjs"
cd "$tmpdir"
npm install "$repo_root"/.release/*.tgz
npm run smoke
npm run attach-smoke
This is the truthful outside-consumer proof for the current repo-only wave. It proves the current tarballs install cleanly with plain npm, the attach lane promotes a fresher learned pack from live-style supervision, and rollback can restore the prior active pack without claiming that the registry has already been updated.
After a matching v0.1.2 tag has shipped and post-publish checks succeed, you can rerun the same smoke using registry versions instead of local tarball paths.
Broader comparative benchmark families and the route-function / QTsim proof story that go beyond this repo’s OCB-native compile/runtime proof surface still live in the separate public proof repo brain-ground-zero.
Use that repo’s own instructions for benchmark reproduction.
When real BGZ proof bundle ids and digests are available, link them into proofs/release-proof-input.json and regenerate .release/release-proof-manifest.json with pnpm release:proofs:manifest.
route_fn updates on the active pack