openclawbrain-site

Reproduce Evaluation + Proofs

Don’t trust our numbers: run them yourself. This page walks you through regenerating the proof artifacts in this repo from scratch. The broader benchmark families live in a separate repo, which is covered under Claim boundary.

Read CLAIMS.md alongside this page for exact boundaries on what each test proves.

1) Bootstrap the workspace

cd /path/to/openclawbrain
corepack enable
pnpm install --frozen-lockfile
pnpm check
pnpm release:status
pnpm release:pack
pnpm release:proofs:status
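
If you script the bootstrap, it helps to stop at the first failing step so it is obvious which command broke. The helper below is a hypothetical sketch and is not shipped by the repo. It assumes a POSIX shell, and it runs each step in its own `sh -c`, so a `cd` or an exported variable does not carry over to later steps.

```shell
# Hypothetical helper (not shipped by the repo): run each step in order,
# announce it, and stop at the first step that exits non-zero.
# Each step runs in its own `sh -c`, so `cd` or exported variables
# do not carry over from one step to the next.
run_lane() {
  local step
  for step in "$@"; do
    echo "==> $step"
    if ! sh -c "$step"; then
      echo "FAILED: $step" >&2
      return 1
    fi
  done
  echo "all $# steps passed"
}

# Usage from the repo root, mirroring the bootstrap commands:
# run_lane "corepack enable" "pnpm install --frozen-lockfile" "pnpm check" \
#   "pnpm release:status" "pnpm release:pack" "pnpm release:proofs:status"
```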

2) Reproduce the mechanism proofs in this repo

These are the proofs implemented directly in the public package workspace today.

Lifecycle proof

pnpm lifecycle:smoke

This proves the learning lifecycle across:

Observability proof

pnpm observability:smoke
pnpm observability:report

This proves the operator-facing diagnostics surface for:

pnpm observability:report prints the local JSON report for those proofs. It only claims what is materialized inside the repo fixture lane; it does not claim live production telemetry coverage.

Those proofs keep learned route_fn evidence central, but they do not yet prove full live runtime plasticity on the active pack or per-query learned route_fn mutation. Current structural ops are verified as pack-build metadata plus promoted-artifact freshness.

Recorded-session replay proof

pnpm recorded-session-replay:smoke
pnpm recorded-session-replay:report

This proves the recorded-session closure lane for the checked-in sanitized replay fixture and scored bundle:

See recorded-session-replay.md for the fixture, bundle, and refresh workflow.

OCB-native comparative eval proof

pnpm eval:ocb-native
pnpm eval:ocb-native:smoke

This proof lane runs the required comparative modes directly through the public OpenClaw activation + promoted-pack compile path:

It emits deterministic proof artifacts under .artifacts/ocb-native-comparative-eval/:

The quality, context, and correction numbers are explicit compile-path proxies for gold-context coverage. The latency and cost columns are deterministic proxy units derived from the compile surface itself, not prose estimates and not noisy wall-clock timings.
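Because these artifacts are described as deterministic, you can check that claim directly: run the lane twice and compare a single digest computed over the whole output directory. The helpers below are a hypothetical sketch and are not shipped by the repo. They assume a POSIX shell with `sha256sum` (or `shasum` on macOS), file names without embedded newlines, and that a rerun rewrites the same artifact files instead of adding new timestamped ones.

```shell
# Hypothetical helpers (not shipped by the repo). sha256_stdin prints the
# SHA-256 hex digest of stdin, using sha256sum (GNU) or shasum (macOS).
sha256_stdin() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum | cut -d' ' -f1
  else
    shasum -a 256 | cut -d' ' -f1
  fi
}

# digest_dir prints one digest covering every file under a directory:
# files are visited in byte-sorted path order, each file is hashed, and the
# resulting "digest  ./relative/path" listing is hashed again, so both
# content changes and renames change the result.
# Assumes file names contain no newlines.
digest_dir() {
  [ -d "$1" ] || { echo "no such directory: $1" >&2; return 1; }
  (
    cd "$1" || exit 1
    find . -type f | LC_ALL=C sort | while IFS= read -r f; do
      printf '%s  %s\n' "$(sha256_stdin < "$f")" "$f"
    done
  ) | sha256_stdin
}

# Usage from the repo root: run the lane twice and compare the digests.
# pnpm eval:ocb-native && first="$(digest_dir .artifacts/ocb-native-comparative-eval)"
# pnpm eval:ocb-native && second="$(digest_dir .artifacts/ocb-native-comparative-eval)"
# [ "$first" = "$second" ] && echo "artifacts are byte-identical across runs"
```

Hashing a sorted listing of per-file digests, rather than concatenating file contents, keeps the result independent of filesystem traversal order. It also means that a moved or renamed artifact changes the digest.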

3) Reproduce the outside-consumer proof from local release tarballs

Run the checked-in outside-consumer smoke test once pnpm release:pack has created the tarballs in .release/. The shortest lane is:

pnpm fresh-env:smoke

To run the underlying outside-consumer commands manually instead, use:

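# Run from the repo root, after pnpm release:pack
repo_root="$(pwd)"
tmpdir="$(mktemp -d)"
# Copy the example consumer into a scratch directory outside the workspace
cp examples/npm-consumer/package.json "$tmpdir/package.json"
cp examples/npm-consumer/smoke.mjs "$tmpdir/smoke.mjs"
cp examples/npm-consumer/attach-smoke.mjs "$tmpdir/attach-smoke.mjs"
cd "$tmpdir"
# Install the local release tarballs with plain npm, then run both smokes
npm install "$repo_root"/.release/*.tgz
npm run smoke
npm run attach-smoke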

This is the truthful outside-consumer proof for the current repo-only wave. It proves that the current tarballs install cleanly with plain npm, that the attach lane promotes a fresher learned pack from live-style supervision, and that rollback can restore the prior active pack. It does not claim that the registry has already been updated.
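Both outside-consumer lanes depend on pnpm release:pack having already written tarballs to .release/. If it has not, the unquoted glob in the manual lane stays literal, so npm install receives a path ending in *.tgz and fails. The hypothetical check_release_tarballs helper below, which the repo does not ship, catches that before you install. Its optional version check assumes that release:pack keeps npm's usual <name>-<version>.tgz file naming.

```shell
# Hypothetical helper (not shipped by the repo): check that a release
# directory holds at least one .tgz before handing it to npm install.
# With an optional version argument, also require every tarball name to end
# in -<version>.tgz, assuming release:pack keeps npm's <name>-<version>.tgz naming.
check_release_tarballs() {
  local dir="$1" version="${2:-}" count=0 f
  for f in "$dir"/*.tgz; do
    [ -e "$f" ] || continue   # the glob matched nothing and stayed literal
    count=$((count + 1))
    if [ -n "$version" ]; then
      case "$f" in
        *"-$version.tgz") ;;
        *) echo "unexpected version: $f" >&2; return 1 ;;
      esac
    fi
  done
  if [ "$count" -eq 0 ]; then
    echo "no .tgz files in $dir; run pnpm release:pack first" >&2
    return 1
  fi
  echo "$count tarball(s) ready in $dir"
}

# Usage from the repo root:
# check_release_tarballs .release && pnpm fresh-env:smoke
# check_release_tarballs .release 0.1.2
```

Passing 0.1.2 is useful before the optional post-publish registry proof, because it confirms that the local tarballs carry the same version as the tag you will compare against.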

4) Optional post-publish registry proof

After a matching v0.1.2 tag has shipped and the post-publish checks have succeeded, you can rerun the same smoke tests against the published registry versions instead of the local tarball paths.

Claim boundary

Proven directly in this repo today

Maintained separately

Broader comparative benchmark families and the route-function / QTsim proof story that go beyond this repo’s OCB-native compile/runtime proof surface still live in the separate public proof repo brain-ground-zero.

Use that repo’s own instructions for benchmark reproduction.

When real brain-ground-zero (BGZ) proof bundle IDs and digests are available, link them into proofs/release-proof-input.json and regenerate .release/release-proof-manifest.json with pnpm release:proofs:manifest.

Not claimed here