Why useful agent memory is about judgment, not recall.
OpenClawBrain began as a graph-first memory idea. The hard lesson was that retrieval is not enough. The system became a memory authority layer: local evidence, scoped memories, learned routing, stale-memory handling, and proof that explains why memory was used or kept quiet.
Memory quality is an authority problem before it is a storage problem.
The graph matters. Search matters. But the product behavior comes from deciding whether remembered context still has permission to guide the current turn.
A local learning loop around an AI agent.
OpenClawBrain records scoped evidence, builds a local graph, learns compact route policies, and injects only the memories that clear routing, relevance, and safety gates.
Local evidence graph
SQLite stores memory nodes, typed edges, full-text search, injection rows, route decisions, proof events, and route-policy-v3 learning tables.
Learned route function
For each turn, the active route_fn decides whether memory should be searched, injected, or omitted, or whether the system should abstain entirely.
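A minimal sketch of what such a route function might look like. The field names, thresholds, and decision labels here are illustrative assumptions, not OpenClawBrain's actual API; the point is that the gate is cheap, local code, not a model call.

```python
from dataclasses import dataclass

# Decision labels follow the article; the scoring model is a guess.
ROUTE_SEARCH, ROUTE_INJECT, ROUTE_OMIT, ROUTE_ABSTAIN = (
    "search", "inject", "omit", "abstain")

@dataclass
class RouteFrame:
    relevance: float   # learned relevance score for the current turn
    staleness: float   # how stale the matched memory looks (0 = fresh)
    scope_match: bool  # memory scope matches the current project/session

def route_fn(frame: RouteFrame,
             inject_threshold: float = 0.75,
             search_threshold: float = 0.4) -> str:
    """Decide what memory is allowed to do on this turn."""
    if not frame.scope_match:
        return ROUTE_ABSTAIN   # wrong scope: correct silence
    if frame.staleness > 0.8:
        return ROUTE_OMIT      # found, but no longer trusted
    if frame.relevance >= inject_threshold:
        return ROUTE_INJECT    # clears routing, relevance, and safety gates
    if frame.relevance >= search_threshold:
        return ROUTE_SEARCH    # worth searching, not yet worth injecting
    return ROUTE_ABSTAIN
```

Note that abstention is a first-class return value, not an error path.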
Proof-first operations
Captures, rejections, retrievals, injections, teacher critiques, candidate reports, and rollbacks leave inspectable evidence.
The architecture changed because the first versions taught the wrong lessons.
The repo history moved from Python simulation and eval scaffolding into a native OpenClaw plugin, then into SQLite graph memory, route learning, and route-policy-v3 production serving.
Brain Ground Zero tested graph memory, route functions, recurring workflows, relational drift, sparse feedback, and RAG baselines.
Trace admission, judges, ledgers, thresholds, and result pages made the evidence rigorous, but the system was not yet installable.
OpenClaw hooks, status routes, and static injection proved the runtime path. A classifier plus notes was not enough.
Memory nodes, edges, FTS, route decisions, injection rows, audits, and proof events gave the system a durable substrate.
Shadow decisions, replay, calibration, candidate reports, champion/challenger promotion, abstention, and rollback became production routing.
The v3 loop separates runtime serving from slower learning.
Prompt-time behavior stays compact. Heavier semantic work happens after the turn, where it can be audited, replayed, and promoted without making every prompt slower.
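The split between serving and learning can be sketched as a deferral pattern: the prompt path only enqueues work, and a background worker picks it up later. This is an assumed shape, not the real implementation; `on_turn` and `drain_once` are hypothetical names.

```python
import queue

# Background work queue standing in for distillation, teacher critique,
# replay, and calibration jobs.
background: "queue.Queue[tuple[str, str]]" = queue.Queue()

def on_turn(turn_id: str, decision: str) -> str:
    """Prompt-time path: record the turn for later analysis, return fast."""
    background.put((turn_id, decision))  # defer all heavy semantic work
    return decision                      # serving stays compact

def drain_once() -> list[tuple[str, str]]:
    """Background path: process queued turns where they can be audited."""
    processed = []
    while not background.empty():
        processed.append(background.get())
    return processed
```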
The rejected mental models are the real guide.
OpenClawBrain improved when the design stopped treating memory as a bigger notes file and started treating it as a gated, measured route decision.
The graph is the product
Wrong shape. The graph is the evidence substrate. The route function creates product behavior by deciding when the graph should matter.
More context is safer
Wrong incentive. More context can distract, go stale, add latency, or conflict with the current task. The best memory injection is tiny and well timed.
Capture every useful thing
Too loose. Automatic capture needs source authority, scope, redaction, dedupe, proof rows, and rejection records.
The LLM can own memory
Too risky. The LLM proposes meaning; code validates, scopes, stores, replays, promotes, and rolls back.
A clean plan is enough
Not in production. The system only advanced when each plan produced a route, table, proof, test, package, install, endpoint, or live page.
A benchmark ships the product
No. Simulation proved the mechanism. The native plugin still had to solve hooks, latency, packaging, scanning, install paths, and inspection.
SQLite stores the graph and the evidence needed to debug it.
The local database is not a hosted graph service. It is an inspectable runtime file with tables for memory, search, route decisions, proof, and learning.
| Plane | Artifacts | Why it exists |
|---|---|---|
| Memory graph | memory_nodes, memory_edges, memory_search | Scoped facts, corrections, workflows, supersession, typed relationships, and FTS retrieval. |
| Serving proof | memory_injections, route_decisions, proof_events | Records what was retrieved, injected, omitted, accepted, or rejected, and the turns where the system abstained. |
| Distillation | distillation_runs, audit rows, validated operations | Keeps LLM proposals separate from code-owned memory writes. |
| v3 learning | route frames, shadow decisions, calibration examples, eval cases, family stats, candidate reports | Lets candidate policies learn from positive examples, negative examples, and correct silence before promotion. |
| Rollback | policy snapshots, lineage, fallback status | Allows v3 to serve first while v2 and heuristics remain rollback paths. |
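To make the planes above concrete, here is an illustrative-only SQLite sketch. The table names come from the article, but every column is a guess at what such a schema could look like, not OpenClawBrain's real DDL.

```python
import sqlite3

SCHEMA = """
CREATE TABLE memory_nodes (
    id            INTEGER PRIMARY KEY,
    scope         TEXT NOT NULL,   -- e.g. project path or session id
    kind          TEXT NOT NULL,   -- fact, correction, workflow, ...
    body          TEXT NOT NULL,
    superseded_by INTEGER REFERENCES memory_nodes(id)
);
CREATE TABLE memory_edges (
    src  INTEGER REFERENCES memory_nodes(id),
    dst  INTEGER REFERENCES memory_nodes(id),
    kind TEXT NOT NULL               -- typed relationship
);
CREATE VIRTUAL TABLE memory_search USING fts5(body);  -- FTS retrieval
CREATE TABLE route_decisions (
    turn_id  TEXT NOT NULL,
    action   TEXT NOT NULL,          -- search / inject / omit / abstain
    node_ids TEXT                    -- selected memory ids
);
CREATE TABLE proof_events (
    turn_id TEXT NOT NULL,
    event   TEXT NOT NULL,           -- capture, rejection, injection, ...
    detail  TEXT
);
"""

conn = sqlite3.connect(":memory:")   # the real system uses a runtime file
conn.executescript(SCHEMA)
```

Because it is a single inspectable file, any of these tables can be queried directly with the `sqlite3` CLI while debugging.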
The visible memory should be small. The evidence behind it should be rich.
A durable correction should become a scoped memory, not a transcript paste. Later it should appear only when the route function expects it to help.
User correction
Actually, use pnpm in this repo.
OpenClawBrain detects a high-confidence user correction, redacts and scopes it to the project, stores a memory node, updates FTS, and records proof.
Later prompt-time injection
```
<openclawbrain_context>
Relevant memory:
- Must follow: Use pnpm instead of npm in this repo.
</openclawbrain_context>
```
The user sees only the useful part. The system keeps route decisions, selected IDs, omitted IDs, proof rows, outcome resolution, and learning examples.
Prompt-time memory stays boring on purpose.
Memory should not tax every turn. The route function keeps serving compact while background work handles deeper labeling, replay, and promotion.
Tier 0
Local route decision only. No model call. Often the answer is no memory.
Tier 1
Cached route plus SQLite search. Still local and bounded.
Tier 2
One limited planner call for ambiguous high-signal turns.
Tier 3
Background distillation, teacher critique, replay, calibration, and promotion.
Fallback
v3 serves first, v2 can roll back, heuristics are last resort.
Abstention
Correct silence is a production behavior, not a crash path.
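The fallback chain can be sketched as ordered policies with abstention as the floor. This is an assumed control flow, not the shipped code; `serve_turn` and the policy callables are illustrative.

```python
from typing import Callable, Optional

Policy = Callable[[str], Optional[str]]

def serve_turn(turn: str, route_v3: Policy, route_v2: Policy,
               heuristic: Policy) -> str:
    """Try v3 first, then v2, then heuristics; abstention is always valid."""
    for policy in (route_v3, route_v2, heuristic):
        try:
            decision = policy(turn)
        except Exception:
            continue              # roll back to the next policy in line
        if decision is not None:
            return decision       # may legitimately be "abstain"
    return "abstain"              # correct silence, never a crash path
```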
How to ask agents to build systems like this.
The project made progress when prompts demanded evidence surfaces, install-path proof, and explicit failure handling instead of abstract smartness.
Ask for the first proof row
Do not start with "ultimate memory." Start with the first hook, route, schema, proof row, test, install, and live verification.
Preserve one invariant
For OpenClawBrain: LLM decides semantic meaning; code enforces trust boundaries; SQLite stores graph and evidence.
Request failure tables
Name over-capture, under-capture, stale retrieval, scope leak, prompt pollution, latency, scan, install, and rollback failure.
Separate public claim from ambition
The public claim should lag the private dream until install, proof, and rollback are real.
Force repo-history honesty
Use git log, tags, closeouts, tests, package versions, and live endpoints. Separate what shipped from what was planned.
Define done as live proof
Done means tests pass, package works, temp install succeeds, runtime loads, site deploys, and live URLs contain the new copy.
Current public status is deliberately narrow.
OpenClawBrain is published as package openclawbrain, version 0.2.33, with route-policy-v3 as the production route brain, Memory Authority as the resolver between retrieval and injection, Memory Graph Maintenance as the long-term curator, and an OpenClawBrain-owned Codex Telegram bridge for recent messages, watches, handoffs, trusted bound-thread replies, and active-turn steering.
```shell
openclaw plugins install clawhub:openclawbrain@0.2.33 --force
openclaw plugins enable openclawbrain
openclaw gateway restart
openclaw plugins inspect openclawbrain --runtime
openclaw doctor
```

```
/brain graph health
/brain graph dry-run
/brain graph proposals
```
