OpenClawBrain gives agents a simple habit: carry useful context forward, but do not let old memory silently overpower the current user, current project, or current environment.
Plain English: find candidate memory, decide whether it has authority, use a tiny helpful slice, then leave proof.
The top-level behavior is intentionally small. The agent should not feel like it is dragging a giant notes file into every turn.
user turn
↓
route-policy-v3: should memory help?
↓
search: SQLite FTS + memory graph
↓
authority: current, scoped, safe, non-superseded?
↓
select: choose a tiny relevant evidence slice
↓
inject: bounded XML context or abstain
↓
learn: update evidence after the turn
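The flow above can be sketched as one small loop. Everything in this sketch is illustrative: the function names, the keyword stand-ins for route-policy-v3 and FTS search, and the record shape are assumptions, not OpenClawBrain's real API.

```python
# Minimal sketch of the per-turn flow; all names and logic are
# illustrative stand-ins, not OpenClawBrain's real implementation.

def should_use_memory(turn):
    # route-policy-v3 stand-in: only route when the turn looks actionable
    return "repo" in turn or "reply" in turn

def search_memory(store, turn):
    # SQLite FTS + memory-graph stand-in: naive substring match
    return [m for m in store if any(w in m["text"] for w in turn.split())]

def has_authority(memory):
    # authority check: current, non-superseded records only
    return memory["current"] and not memory["superseded"]

def route(turn, store, max_items=2):
    if not should_use_memory(turn):
        return None  # abstain: keep the prompt clean
    candidates = [m for m in search_memory(store, turn) if has_authority(m)]
    return candidates[:max_items] or None  # tiny evidence slice, or nothing

store = [
    {"text": "Use pnpm instead of npm in this repo.",
     "current": True, "superseded": False},
    {"text": "Old rule: use npm.",
     "current": False, "superseded": True},
]
print(route("install deps in this repo", store))
```

The key property is the final `or None`: an empty candidate set becomes abstention, not a padded prompt.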
Prompt-time work stays small. Heavier teaching, replay, and policy updates are deferred until after the turn whenever possible.
OpenClawBrain stores memory and route-learning evidence as local SQLite records, not as a blob of transcript text. The important objects are small and inspectable.
Memories: durable corrections, preferences, workflows, project rules, and contextual facts, each carrying scope and freshness metadata.
Authority events: validity state and event history that explain why relevant memories were injected, weakened, verified, confirmed, suppressed, or withheld.
Route examples: redacted records of turns where memory helped, missed, abstained, or should have routed differently.
Shadow candidates: candidate policies that score real turns without taking over, so promotion is based on evidence instead of hope.
Eval artifacts: replayable cases, labels, family stats, and candidate reports that keep policy updates auditable.
The warehouse is designed around evidence, not private transcript replay. Route-policy-v3 stores compact objects that explain routing behavior without making raw chat the source of truth.
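One way to picture the warehouse is a handful of small, inspectable tables. This schema is a hypothetical sketch inferred from the object descriptions above, not OpenClawBrain's actual schema; table and column names are assumptions.

```python
# Hypothetical warehouse shape (illustrative schema, not the real one).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE memories (
  id INTEGER PRIMARY KEY,
  kind TEXT,          -- correction, preference, workflow, rule, fact
  text TEXT,
  scope TEXT,         -- user / project / environment
  fresh_until TEXT    -- freshness metadata
);
CREATE TABLE authority_events (
  memory_id INTEGER REFERENCES memories(id),
  event TEXT,         -- injected, weakened, verified, confirmed,
                      -- suppressed, withheld
  at TEXT
);
CREATE TABLE route_examples (
  id INTEGER PRIMARY KEY,
  outcome TEXT,       -- helped, missed, abstained, misrouted
  redacted TEXT       -- redacted summary; no raw transcript stored
);
""")
conn.execute("INSERT INTO memories VALUES "
             "(1, 'rule', 'Use pnpm in this repo.', 'project', '2026-01-01')")
conn.execute("INSERT INTO authority_events VALUES (1, 'injected', '2025-06-01')")
row = conn.execute(
    "SELECT event FROM authority_events WHERE memory_id = 1").fetchone()
print(row[0])  # injected
```

Note the design point it encodes: authority is an event log attached to a memory, so "why was this injected?" is a query, not a guess.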
The LLM is used for semantic judgment. It can notice that a messy correction is actually durable feedback, or that a failed route is a training example.
But the LLM does not directly own memory or production routing. Deterministic code redacts, validates, scopes, dedupes, applies thresholds, stores, replays, calibrates, and promotes only what passes.
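That division of labor can be made concrete with a gate: the model may propose a memory, but only code decides what persists. The redaction pattern, length budget, and dedupe set below are illustrative assumptions, not OpenClawBrain's real rules.

```python
# Hypothetical persistence gate: the model proposes, code disposes.
import re

def gate_candidate(text, existing, max_len=200):
    # redact obvious identifiers before anything is stored (toy pattern)
    text = re.sub(r"\b\S+@\S+\b", "[redacted-email]", text)
    if len(text) > max_len:
        return None          # validate: enforce a size budget
    if text in existing:
        return None          # dedupe: already known
    return text              # only what passes is stored

existing = {"Use pnpm in this repo."}
print(gate_candidate("Email me at a@b.com when CI fails.", existing))
```

The model's semantic judgment happens upstream; nothing it says reaches storage without surviving every deterministic check.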
The active production route brain is route-policy-v3. It is a compact learned route_fn compiled from evidence, not a vague prompt that asks the model to “remember better.”
active v3 match with calibrated confidence
-> route memory and record proof
v3 abstains or has no safe match
-> fall back to v2 policy
v2 misses or rollback is needed
-> use legacy heuristics as last resort
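The fallback ladder above reads naturally as ordered short-circuiting: try v3, then v2, then heuristics. The three routing functions below are trivial stand-ins for the real policies, shown only to make the ordering explicit.

```python
# Hypothetical fallback chain: v3 -> v2 -> legacy heuristics.
# Each policy stand-in returns a decision or None to abstain.

def route_v3(turn):
    # calibrated match stand-in; None means v3 abstains
    return ("v3", "memory-route") if "repo" in turn else None

def route_v2(turn):
    return ("v2", "memory-route") if "reply" in turn else None

def route_legacy(turn):
    # last resort: always answers, with the most conservative action
    return ("legacy", "no-memory")

def route(turn):
    # `or` short-circuits, so lower rungs run only on abstention
    return route_v3(turn) or route_v2(turn) or route_legacy(turn)

print(route("summarize this reply"))  # ('v2', 'memory-route')
```

Because each rung returns `None` to abstain, adding or removing a policy never changes the ones below it.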
Abstention is a feature. If the system does not have enough support, it stays quiet instead of polluting the prompt.
Turns produce signals: corrections, accepted help, rejected memory, route misses, tool outcomes, and operator handoffs.
The teacher converts those signals into route frames with action family, scope, canonical action key, risk flags, and evidence ids.
Candidate snapshots score live traffic without serving it; the system records what each candidate would have done.
Stored eval cases measure whether the candidate improves useful routing without causing prompt noise or unsafe sync work.
Family-aware thresholds and confidence floors decide when a route is strong enough to serve.
Promotion creates an active v3 snapshot with lineage. Rollback keeps the previous policy path available.
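The shadow-then-promote step can be sketched as scoring a candidate against stored eval cases and comparing the result to a confidence floor. The case format, the lambda candidate, and the floor value are all illustrative assumptions.

```python
# Hypothetical shadow scoring: a candidate is graded on stored eval
# cases without ever serving production traffic.

def shadow_score(candidate, cases):
    # fraction of cases where the candidate would have routed correctly
    hits = sum(1 for c in cases if candidate(c["turn"]) == c["label"])
    return hits / len(cases)

cases = [
    {"turn": "install deps in repo", "label": "inject"},
    {"turn": "hello", "label": "abstain"},
]
candidate = lambda t: "inject" if "repo" in t else "abstain"

FLOOR = 0.9  # illustrative family confidence floor
score = shadow_score(candidate, cases)
promote = score >= FLOOR  # promotion is a comparison, not a judgment call
print(score, promote)
```

Promotion becomes a mechanical comparison against recorded evidence, which is what makes it auditable.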
OpenClawBrain 0.2.33 defaults policy-v3 updates to gated_active. That means the system can collect, distill, shadow, replay, and activate a validated snapshot, but each step has explicit gates.
Candidate snapshots must parse, match schema, avoid unsafe broad retrieval, respect sync budget limits, and carry rollback lineage.
Cold-start action families remain under-supported until enough evidence exists; they can inform shadowing but cannot drive active routing too early.
Confidence is judged by action family. A policy can be strong for repo workflow routing while still abstaining on personal preference routing.
Activation cooldowns prevent rapid policy churn. If a candidate is weaker or risky, the current active snapshot stays in place.
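The gates listed above compose into a single pre-activation check. The snapshot keys, the evidence-count threshold, and the gate order here are hypothetical; they only illustrate that each gate is an explicit, inspectable predicate.

```python
# Hypothetical pre-activation gates for a candidate snapshot.

REQUIRED_KEYS = {"version", "families", "lineage"}

def passes_gates(snapshot, evidence_counts, min_family_evidence=25):
    if not REQUIRED_KEYS <= snapshot.keys():
        return False                          # schema gate
    if not snapshot["lineage"]:
        return False                          # rollback-lineage gate
    for fam in snapshot["families"]:
        if evidence_counts.get(fam, 0) < min_family_evidence:
            return False                      # cold-start family gate
    return True

snap = {"version": 3, "families": ["repo-workflow"], "lineage": ["v3.0"]}
print(passes_gates(snap, {"repo-workflow": 40}))  # True
```

A snapshot that fails any gate simply never activates; the current active snapshot stays in place, which is the conservative default the text describes.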
The runtime is intentionally conservative. If the learned route_fn cannot make a supported decision, it does less.
policy snapshot missing or invalid
-> skip v3 and use fallback path
family confidence below floor
-> abstain and keep prompt clean
candidate shows harm in replay
-> do not promote
active policy regresses
-> roll back to prior snapshot, then v2 or heuristics
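Keeping rollback cheap follows directly from keeping lineage: if active snapshots form a stack, recovering from a regression is one pop. This class is a sketch of that idea, not OpenClawBrain's actual rollback mechanism.

```python
# Hypothetical snapshot stack: rollback is a pop, and running out of
# snapshots lands on the v2/heuristics fallback path.

class PolicyStack:
    def __init__(self):
        self.snapshots = []

    def activate(self, snap):
        # promotion pushes; prior snapshots stay available for rollback
        self.snapshots.append(snap)

    def rollback(self):
        # drop the regressing snapshot; the prior one takes over
        if self.snapshots:
            self.snapshots.pop()
        return self.active()

    def active(self):
        return self.snapshots[-1] if self.snapshots else "v2-or-heuristics"

stack = PolicyStack()
stack.activate("v3.1")
stack.activate("v3.2")
print(stack.rollback())  # v3.1
```

Rolling back past the last v3 snapshot degrades to the fallback path rather than failing, matching the ladder above.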
Only the selected evidence slice enters the prompt. It is bounded, scoped, and formatted as operator context, not hidden transcript mass.
<openclawbrain_context>
Relevant memory:
- Must follow: Use pnpm instead of npm in this repo.
- Routing rule: Keep Telegram replies to operator summaries.
</openclawbrain_context>
The rest stays local in SQLite and proof surfaces.
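Producing the bounded block shown above is mostly string assembly with a hard cap. This renderer is a minimal sketch under assumed names; the item cap and the abstain-on-empty behavior mirror the text, but the function is not OpenClawBrain's real formatter.

```python
# Hypothetical renderer for the bounded context block.

def render_context(evidence, max_items=5):
    if not evidence:
        return None  # abstain: inject nothing rather than an empty block
    lines = "\n".join(f"- {e}" for e in evidence[:max_items])  # hard bound
    return (
        "<openclawbrain_context>\n"
        "Relevant memory:\n"
        f"{lines}\n"
        "</openclawbrain_context>"
    )

print(render_context([
    "Must follow: Use pnpm instead of npm in this repo.",
]))
```

The bound is enforced at render time, so no upstream bug can turn the evidence slice into hidden transcript mass.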