OpenClawBrain
How it works

Remember what matters. Check whether it still applies.

OpenClawBrain gives agents a simple habit: carry useful context forward, but do not let old memory silently overpower the current user, current project, or current environment.

Plain English: find candidate memory, decide whether it has authority, use a tiny helpful slice, then leave proof.

The runtime loop

The top-level behavior is intentionally small. The agent should not feel like it is dragging a giant notes file into every turn.

user turn
 ↓
route-policy-v3: should memory help?
 ↓
search: SQLite FTS + memory graph
 ↓
authority: current, scoped, safe, non-superseded?
 ↓
select: choose a tiny relevant evidence slice
 ↓
inject: bounded XML context or abstain
 ↓
learn: update evidence after the turn

Prompt-time work stays small. Heavier teaching, replay, and policy updates happen after the turn whenever possible.
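
A minimal TypeScript sketch of that loop. The interface and the names (shouldRoute, search, inject, and so on) are illustrative assumptions, not the real API:

interface Evidence { id: string; text: string; scope: string }

interface Router {
  shouldRoute(turn: string): boolean;            // route-policy-v3 gate
  search(turn: string): Evidence[];              // SQLite FTS + memory graph
  hasAuthority(e: Evidence): boolean;            // current, scoped, safe, non-superseded
  select(candidates: Evidence[]): Evidence[];    // tiny relevant slice only
  inject(slice: Evidence[]): string | null;      // bounded XML, or null to abstain
  learn(turn: string, slice: Evidence[]): void;  // post-turn evidence update
}

function handleTurn(router: Router, turn: string): string | null {
  if (!router.shouldRoute(turn)) return null;          // abstain early
  const candidates = router.search(turn);
  const authorized = candidates.filter(e => router.hasAuthority(e));
  const slice = router.select(authorized);
  const context = slice.length > 0 ? router.inject(slice) : null;
  queueMicrotask(() => router.learn(turn, slice));     // heavy work off the hot path
  return context;                                      // null keeps the prompt clean
}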

The learning warehouse

OpenClawBrain stores memory and route-learning evidence as local SQLite records, not as a blob of transcript text. The important objects are small and inspectable.

Memory nodes

Durable corrections, preferences, workflows, project rules, and contextual facts with scope and freshness metadata.

Authority rows

Validity state and authority events explain why relevant memories were injected, weakened, verified, confirmed, suppressed, or withheld.

Route frames

Redacted examples of when memory helped, missed, abstained, or should have routed differently.

Shadow decisions

Candidate policies score real turns without taking over, so promotion is based on evidence instead of hope.

Eval cases

Replayable cases, labels, family stats, and candidate reports keep policy updates auditable.

What gets stored

The warehouse is designed around evidence, not private transcript replay. Route-policy-v3 stores compact objects that explain routing behavior without making raw chat the source of truth.
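
As a sketch only, the five objects above could look like these record shapes. Every field name here is an assumption inferred from the descriptions, not the actual schema:

interface MemoryNode {
  id: string;
  kind: 'correction' | 'preference' | 'workflow' | 'rule' | 'fact';
  text: string;
  scope: string;           // e.g. user, project, environment
  freshness: string;       // e.g. timestamp of last confirmation
}

interface AuthorityRow {
  memoryId: string;
  validity: 'current' | 'weakened' | 'suppressed' | 'superseded';
  event: 'injected' | 'weakened' | 'verified' | 'confirmed' | 'suppressed' | 'withheld';
  reason: string;          // why this memory was or was not used
}

interface RouteFrame {
  turnHash: string;        // redacted: no raw transcript text
  outcome: 'helped' | 'missed' | 'abstained' | 'misrouted';
  actionFamily: string;
  evidenceIds: string[];
}

interface ShadowDecision {
  candidateId: string;     // candidate policy scoring without serving
  turnHash: string;
  wouldHaveRouted: boolean;
  confidence: number;
}

interface EvalCase {
  id: string;
  frame: RouteFrame;
  label: 'should_route' | 'should_abstain';
  family: string;          // for family-level stats and reports
}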

Why the LLM boundary matters

The LLM is used for semantic judgment. It can notice that a messy correction is actually durable feedback, or that a failed route is a training example.

But the LLM does not directly own memory or production routing. Code redacts, validates, scopes, dedupes, thresholds, stores, replays, calibrates, and promotes only what passes.
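
A minimal sketch of that boundary, with hypothetical helpers and an assumed threshold: the LLM proposes a memory, and plain code decides whether it is ever stored.

interface Proposal { text: string; scope: string; confidence: number }

function acceptProposal(
  proposal: Proposal,
  existing: Set<string>,
  minConfidence = 0.8,                 // assumed threshold, not a real default
): Proposal | null {
  const redacted = proposal.text.replace(/\b\S+@\S+\b/g, '[redacted]');  // e.g. strip emails
  if (redacted.length === 0 || redacted.length > 500) return null;       // validate size
  if (!['user', 'project', 'environment'].includes(proposal.scope)) return null;  // scope check
  if (existing.has(redacted)) return null;                               // dedupe
  if (proposal.confidence < minConfidence) return null;                  // threshold
  return { ...proposal, text: redacted };  // only now is it eligible for storage
}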

route-policy-v3

The active production route brain is route-policy-v3. It is a compact learned route_fn compiled from evidence, not a vague prompt that asks the model to “remember better.”

active v3 match with calibrated confidence
  -> route memory and record proof

v3 abstains or has no safe match
  -> fall back to v2 policy

v2 misses or rollback is needed
  -> use legacy heuristics as last resort

Abstention is a feature. If the system does not have enough support, it stays quiet instead of polluting the prompt.
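
The chain above, as a sketch. The Policy interface and the names are assumptions; the point is that every layer is allowed to abstain:

type RouteResult = { memoryIds: string[]; proof: string } | 'abstain';

interface Policy { route(turn: string): RouteResult }

function serve(
  v3: Policy | null,       // active learned snapshot, if valid
  v2: Policy,
  heuristics: Policy,
  turn: string,
): RouteResult {
  if (v3) {
    const r3 = v3.route(turn);         // calibrated confidence inside
    if (r3 !== 'abstain') return r3;   // route memory and record proof
  }
  const r2 = v2.route(turn);           // fall back to v2 policy
  if (r2 !== 'abstain') return r2;
  return heuristics.route(turn);       // legacy heuristics as last resort
}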

How policies improve

1. Harvest

Turns produce signals: corrections, accepted help, rejected memory, route misses, tool outcomes, and operator handoffs.

2. Distill

The teacher converts those signals into route frames with action family, scope, canonical action key, risk flags, and evidence ids.

3. Shadow

Candidate snapshots score traffic without serving. The system records what they would have done.

4. Replay

Stored eval cases measure whether the candidate improves useful routing without causing prompt noise or unsafe sync work; a sketch follows this list.

5. Calibrate

Family-aware thresholds and confidence floors decide when a route is strong enough to serve.

6. Promote or roll back

Promotion creates an active v3 snapshot with lineage. Rollback keeps the previous policy path available.
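
A minimal sketch of steps 4 and 6, under assumed metrics: replay a candidate against stored cases, then promote only when the evidence says it helps without adding noise.

interface Case { turn: string; label: 'route' | 'abstain' }
interface Candidate { decide(turn: string): 'route' | 'abstain' }
interface Report { accuracy: number; noise: number }

function replayReport(candidate: Candidate, cases: Case[]): Report {
  if (cases.length === 0) return { accuracy: 0, noise: 0 };  // no evidence, no support
  let correct = 0;
  let noisy = 0;
  for (const c of cases) {
    const decision = candidate.decide(c.turn);
    if (decision === c.label) correct++;
    else if (decision === 'route') noisy++;  // routed when it should have abstained
  }
  return { accuracy: correct / cases.length, noise: noisy / cases.length };
}

function shouldPromote(candidate: Report, active: Report): boolean {
  // Promote only if the candidate routes better without more prompt noise.
  return candidate.accuracy > active.accuracy && candidate.noise <= active.noise;
}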

Promotion gates

OpenClawBrain 0.2.33 defaults policy-v3 updates to gated_active. That means the system can collect, distill, shadow, replay, and activate a validated snapshot, but each step has explicit gates.

Validation

Candidate snapshots must parse, match schema, avoid unsafe broad retrieval, respect sync budget limits, and carry rollback lineage.

Support

Cold-start action families remain under-supported until enough evidence exists; they can inform shadowing but cannot drive active routing until support accumulates.

Calibration

Confidence is judged by action family. A policy can be strong for repo workflow routing while still abstaining on personal preference routing.

Cooldown

Activation cooldowns prevent rapid policy churn. If a candidate is weaker or risky, the current active snapshot stays in place.
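
A sketch of those gates as one check. The fields, the sync budget, the cooldown, and the support floor are all assumed numbers, not shipped defaults:

interface Snapshot {
  parses: boolean;
  schemaOk: boolean;
  broadRetrieval: boolean;             // unsafe wide-net retrieval detected
  syncBudgetMs: number;                // worst-case synchronous work
  lineage: string | null;              // rollback pointer to the prior snapshot
  familySupport: Map<string, number>;  // evidence count per action family
  lastActivationMs: number;            // epoch ms of the last activation
}

function passesGates(s: Snapshot, nowMs: number): boolean {
  const COOLDOWN_MS = 24 * 60 * 60 * 1000;                     // assumed cooldown
  if (!s.parses || !s.schemaOk) return false;                  // validation
  if (s.broadRetrieval) return false;                          // no unsafe broad retrieval
  if (s.syncBudgetMs > 50) return false;                       // assumed sync budget limit
  if (s.lineage === null) return false;                        // must carry rollback lineage
  if (nowMs - s.lastActivationMs < COOLDOWN_MS) return false;  // cooldown
  return true;
}

function familyCanServe(s: Snapshot, family: string, minSupport = 25): boolean {
  return (s.familySupport.get(family) ?? 0) >= minSupport;     // cold-start guard
}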

Failure behavior

The runtime is intentionally conservative. If the learned route_fn cannot make a supported decision, it does less.

policy snapshot missing or invalid
  -> skip v3 and use fallback path

family confidence below floor
  -> abstain and keep prompt clean

candidate shows harm in replay
  -> do not promote

active policy regresses
  -> roll back to prior snapshot, then v2 or heuristics

What gets injected

Only the selected evidence slice enters the prompt. It is bounded, scoped, and formatted as operator context, not hidden transcript mass.

<openclawbrain_context>
Relevant memory:
- Must follow: Use pnpm instead of npm in this repo.
- Routing rule: Keep Telegram replies to operator summaries.
</openclawbrain_context>
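
A sketch of how that block could be assembled, with an assumed character budget. If the slice is empty or too large, nothing is injected:

function buildContext(lines: string[], maxChars = 600): string | null {
  if (lines.length === 0) return null;                   // abstain: inject nothing
  const body = lines.map(line => `- ${line}`).join('\n');
  const block = `<openclawbrain_context>\nRelevant memory:\n${body}\n</openclawbrain_context>`;
  return block.length <= maxChars ? block : null;        // bounded, or abstain
}

Called with the two memory bullets shown above, this returns exactly that block.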

The rest stays local, in SQLite and in proof surfaces.