# Embedding retrieval pilot (vector search vs `memory.list`)

Goal: replace wide `memory.list` scans with top-k semantic retrieval for one workflow at a time.
## Prerequisites
- SQLite memory (`AINL_MEMORY_DB`) with rows in the namespace you index (`AINL_EMBEDDING_INDEX_NAMESPACE`, default `workflow`).
- Sidecar DB: `AINL_EMBEDDING_MEMORY_DB` (default `/tmp/ainl_embedding_memory.sqlite3`).
- `AINL_EMBEDDING_MODE`: start with `stub` for CI/dry runs; use `openai` for real vectors; use `local` for dependency-free offline top-k (hashing-based; rough relevance, not model-semantic).
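The knobs above can be sketched as a small config resolver. The variable names and defaults come from this doc; the helper itself is hypothetical, and treating `stub` as the fallback mode is an assumption based on the "start with stub" guidance:

```python
import os

def embedding_pilot_config() -> dict:
    """Hypothetical resolver for the pilot's env knobs (names from this doc)."""
    return {
        "memory_db": os.environ.get("AINL_MEMORY_DB", ""),
        "namespace": os.environ.get("AINL_EMBEDDING_INDEX_NAMESPACE", "workflow"),
        "sidecar_db": os.environ.get(
            "AINL_EMBEDDING_MEMORY_DB", "/tmp/ainl_embedding_memory.sqlite3"
        ),
        # stub: CI/dry runs; openai: real vectors; local: hashing-based top-k.
        # "stub" as the fallback is an assumption, not a documented default.
        "mode": os.environ.get("AINL_EMBEDDING_MODE", "stub"),
    }
```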
## What gets indexed (so vector search actually returns text)
`embedding_workflow_index` embeds the SQLite memory record payload (not just metadata).
For session bootstrap vector retrieval, this repo’s proactive session summarizer writes the actual bullet summary text into `payload.summary` for `workflow.session_summary` records, so `embedding_workflow_search` can return `payload_snapshot.summary` for use in token-aware startup.
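The `payload.summary` / `payload_snapshot.summary` relationship above can be illustrated with a stand-in record. Those two field names come from this doc; the surrounding record shape and the `summary_text` helper are assumptions for illustration only:

```python
# Illustrative workflow.session_summary record, as this doc describes it.
# Only payload.summary / payload_snapshot.summary are documented names.
record = {
    "namespace": "workflow",
    "kind": "workflow.session_summary",
    "payload": {
        "summary": "- fixed flaky retrieval test\n- indexed session summaries",
    },
}

def summary_text(hit: dict) -> str:
    # A search hit exposes the indexed payload via payload_snapshot;
    # return its summary text, or "" when absent.
    return hit.get("payload_snapshot", {}).get("summary", "")

hit = {"payload_snapshot": record["payload"]}
```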
## Operator commands
```shell
# Dry-run wrapper (compiles + exercises bridge path)
python3 openclaw/bridge/run_wrapper_ainl.py embedding-memory-pilot --dry-run
```
Bridge verbs (from `BridgeTokenBudgetAdapter`): `embedding_workflow_index`, `embedding_workflow_search`.
## Enable vector search for session bootstrap (optional; safe fallback)
- Set real embedding mode: `AINL_EMBEDDING_MODE=openai`
- Run the pilot indexer at least once (so the embedding sidecar has refs):
  `python3 openclaw/bridge/run_wrapper_ainl.py embedding-memory-pilot`
- Ensure the profile enables the startup embedding path: `AINL_STARTUP_USE_EMBEDDINGS=1` (already set in the `openclaw-default` and `cost-tight` profiles)
When real vectors are enabled, `token_aware_startup_context.lang` tries embedding top-k first; if hits are empty, it falls back to reading MEMORY.md, so it should not break chat sessions.
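The fallback described above is the standard try-primary-then-fallback retrieval pattern. A minimal sketch, assuming hypothetical `vector_topk` and `read_memory_md` callables (not this repo's actual functions):

```python
from typing import Callable, List

def startup_context(
    query: str,
    vector_topk: Callable[[str, int], List[str]],
    read_memory_md: Callable[[], List[str]],
    k: int = 5,
) -> List[str]:
    # Try embedding retrieval first; an empty hit list (or a failing
    # backend) falls back to the flat MEMORY.md read, so startup never
    # depends on the vector sidecar being healthy.
    try:
        hits = vector_topk(query, k)
    except Exception:
        hits = []
    return hits if hits else read_memory_md()
```

The `except Exception` guard is deliberately broad: for a startup path, any retrieval failure should degrade to the flat file rather than surface an error.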
## Safe rollout
- Index a bounded batch (`embedding_workflow_index` limit defaults apply).
- Query with a fixed prompt template; inspect `hits` payloads before feeding an LLM.
- Swap one call site from `memory.list` → search + `memory.get` for returned refs only.
- Measure prompt token delta before expanding scope.
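The call-site swap in the third step can be sketched against stand-in functions. `memory.list`, `embedding_workflow_search`, and `memory.get` are this doc's verbs; the Python signatures and hit shape here are assumptions:

```python
from typing import Callable, List

def context_via_list(memory_list: Callable[[str], List[dict]], ns: str) -> List[dict]:
    # Before: a wide scan pulls every record in the namespace into the prompt.
    return memory_list(ns)

def context_via_search(
    search: Callable[[str, int], List[dict]],
    memory_get: Callable[[str], dict],
    query: str,
    k: int = 5,
) -> List[dict]:
    # After: top-k search returns refs; fetch full records only for those,
    # so prompt size is bounded by k instead of namespace size.
    refs = [hit["ref"] for hit in search(query, k)]
    return [memory_get(ref) for ref in refs]
```

Comparing the lengths of the two results on the same store is exactly the "prompt token delta" the last rollout step asks you to measure.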
## “Embed on write” (in-lane pattern for high-signal kinds)
AINL does not rewrite workflows at runtime, but you can keep retrieval quality high by indexing right after writes for specific record kinds:
- When `proactive_session_summarizer.lang` writes `workflow.session_summary` (with `payload.summary`), ensure `embedding_workflow_index` runs on a cadence (weekly is the pilot default; increase frequency if you need fresher retrieval).
- For other high-signal records (decisions, preferences, settings), write them into SQLite memory under a stable kind/id, then index that namespace/kind on a schedule using the same bridge verb.
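The embed-on-write idea reduces to wrapping the memory write so that high-signal kinds are flagged for the next index run. Everything named below is a stand-in sketch of the pattern, not this repo's API; the kind set is illustrative:

```python
from typing import Callable, Set

# Illustrative set; workflow.session_summary is the one kind this doc names.
HIGH_SIGNAL_KINDS: Set[str] = {"workflow.session_summary", "decision", "preference"}

def write_and_maybe_index(
    memory_set: Callable[[str, dict], None],
    enqueue_index: Callable[[str], None],
    kind: str,
    record: dict,
) -> None:
    # Always persist to SQLite memory; only high-signal kinds get queued
    # for the next embedding_workflow_index pass, keeping the index lean.
    memory_set(kind, record)
    if kind in HIGH_SIGNAL_KINDS:
        enqueue_index(kind)
```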
If you want a reusable include for “search → lines,” use `modules/common/vector_retrieval.ainl` (call `vec/VEC_SEARCH` or `vec/VEC_LINES`) and keep k=3–5 unless evidence says otherwise.
See `TOKEN_AND_USAGE_OBSERVABILITY.md` and `openclaw/bridge/README.md`.
