Permanent memory for AI agents. Single binary, zero dependencies, MCP native.
ICM gives your AI agent a real memory — not a note-taking tool, not a context manager, a memory.
```
ICM (Infinite Context Memory)
┌──────────────────────┬─────────────────────────┐
│ MEMORIES (Topics)    │ MEMOIRS (Knowledge)     │
│                      │                         │
│ Episodic, temporal   │ Permanent, structured   │
│                      │                         │
│ ┌───┐ ┌───┐ ┌───┐    │ ┌───┐                   │
│ │ m │ │ m │ │ m │    │ │ C │──depends_on──┐    │
│ └─┬─┘ └─┬─┘ └─┬─┘    │ └───┘              │    │
│   │decay│     │      │   │ refines      ┌─▼─┐  │
│   ▼     ▼     ▼      │ ┌─▼─┐            │ C │  │
│ weight decreases     │ │ C │──part_of──>└───┘  │
│ over time unless     │ └───┘                   │
│ accessed/critical    │ Concepts + Relations    │
├──────────────────────┴─────────────────────────┤
│           SQLite + FTS5 + sqlite-vec           │
│    Hybrid search: BM25 (30%) + cosine (70%)    │
└────────────────────────────────────────────────┘
```

Two memory models, plus a feedback loop:
- Memories: store/recall with temporal decay by importance. Critical memories never fade; low-importance ones decay naturally. Filter by topic or keyword.
- Memoirs: permanent knowledge graphs. Concepts linked by typed relations (`depends_on`, `contradicts`, `superseded_by`, ...). Filter by label.
- Feedback: record corrections when AI predictions are wrong. Search past mistakes before making new predictions. Closed-loop learning.
```bash
# Homebrew (macOS / Linux)
brew tap rtk-ai/tap && brew install icm

# Quick install
curl -fsSL https://raw.githubusercontent.com/rtk-ai/icm/main/install.sh | sh

# From source
cargo install --path crates/icm-cli
```

```bash
# Auto-detect and configure all supported tools
icm init
```

Configures 14 tools in one command:
| Tool | Config file | Format |
|---|---|---|
| Claude Code | ~/.claude.json | JSON |
| Claude Desktop | ~/Library/.../claude_desktop_config.json | JSON |
| Cursor | ~/.cursor/mcp.json | JSON |
| Windsurf | ~/.codeium/windsurf/mcp_config.json | JSON |
| VS Code / Copilot | ~/Library/.../Code/User/mcp.json | JSON |
| Gemini Code Assist | ~/.gemini/settings.json | JSON |
| Zed | ~/.zed/settings.json | JSON |
| Amp | ~/.config/amp/settings.json | JSON |
| Amazon Q | ~/.aws/amazonq/mcp.json | JSON |
| Cline | VS Code globalStorage | JSON |
| Roo Code | VS Code globalStorage | JSON |
| Kilo Code | VS Code globalStorage | JSON |
| OpenAI Codex CLI | ~/.codex/config.toml | TOML |
| OpenCode | ~/.config/opencode/opencode.json | JSON |
Or manually:
```bash
# Claude Code
claude mcp add icm -- icm serve

# Compact mode (shorter responses, saves tokens)
claude mcp add icm -- icm serve --compact

# Any MCP client: command = "icm", args = ["serve"]
```

```bash
icm init --mode skill
```

Installs slash commands and rules for Claude Code (`/recall`, `/remember`), Cursor (`.mdc` rule), Roo Code (`.md` rule), and Amp (`/icm-recall`, `/icm-remember`).
```bash
icm init --mode hook
```

Installs all 3 extraction layers.

Claude Code hooks:
| Hook | Event | What it does |
|---|---|---|
| `icm hook pre` | PreToolUse | Auto-allow `icm` CLI commands (no permission prompt) |
| `icm hook post` | PostToolUse | Extract facts from tool output every 15 calls |
| `icm hook compact` | PreCompact | Extract memories from transcript before context compression |
| `icm hook prompt` | UserPromptSubmit | Inject recalled context at the start of each prompt |
OpenCode plugin (auto-installed to ~/.config/opencode/plugins/icm.js):
| OpenCode event | ICM Layer | What it does |
|---|---|---|
| `tool.execute.after` | Layer 0 | Extract facts from tool output |
| `experimental.session.compacting` | Layer 1 | Extract from conversation before compaction |
| `session.created` | Layer 2 | Recall context at session start |
ICM can be used via CLI (icm commands) or MCP server (icm serve). Both access the same database.
| | CLI | MCP |
|---|---|---|
| Latency | ~30ms (direct binary) | ~50ms (JSON-RPC stdio) |
| Token cost | 0 (hook-based, invisible) | ~20-50 tokens/call (tool schema) |
| Setup | icm init --mode hook | icm init --mode mcp |
| Works with | Claude Code, OpenCode (via hooks/plugins) | All 14 MCP-compatible tools |
| Auto-extraction | Yes (hooks trigger icm extract) | Yes (MCP tools call store) |
| Best for | Power users, token savings | Universal compatibility |
```bash
icm dashboard    # or: icm tui
```

Interactive TUI with 5 tabs: Overview, Topics, Memories, Health, Memoirs. Keyboard navigation (vim-style: j/k, g/G, Tab, 1-5), live search (/), auto-refresh.

Requires the `tui` feature (enabled by default). To build without it: `cargo install --path crates/icm-cli --no-default-features --features embeddings`.
```bash
# Store
icm store -t "my-project" -c "Use PostgreSQL for the main DB" -i high -k "db,postgres"

# Recall
icm recall "database choice"
icm recall "auth setup" --topic "my-project" --limit 10
icm recall "architecture" --keyword "postgres"

# Manage
icm forget <memory-id>
icm consolidate --topic "my-project"
icm topics
icm stats

# Extract facts from text (rule-based, zero LLM cost)
echo "The parser uses Pratt algorithm" | icm extract -p my-project
```

```bash
# Create a memoir
icm memoir create -n "system-architecture" -d "System design decisions"

# Add concepts with labels
icm memoir add-concept -m "system-architecture" -n "auth-service" \
  -d "Handles JWT tokens and OAuth2 flows" -l "domain:auth,type:service"

# Link concepts
icm memoir link -m "system-architecture" --from "api-gateway" --to "auth-service" -r depends-on

# Search with label filter
icm memoir search -m "system-architecture" "authentication"
icm memoir search -m "system-architecture" "service" --label "domain:auth"

# Inspect neighborhood
icm memoir inspect -m "system-architecture" "auth-service" -D 2

# Export graph (formats: json, dot, ascii, ai)
icm memoir export -m "system-architecture" -f ascii   # Box-drawing with confidence bars
icm memoir export -m "system-architecture" -f dot     # Graphviz DOT (color = confidence level)
icm memoir export -m "system-architecture" -f ai      # Markdown optimized for LLM context
icm memoir export -m "system-architecture" -f json    # Structured JSON with all metadata

# Generate SVG visualization
icm memoir export -m "system-architecture" -f dot | dot -Tsvg > graph.svg
```

| Tool | Description |
|---|---|
| `icm_memory_store` | Store with auto-dedup (>85% similarity → update instead of duplicate) |
| `icm_memory_recall` | Search by query, filter by topic and/or keyword |
| `icm_memory_update` | Edit a memory in-place (content, importance, keywords) |
| `icm_memory_forget` | Delete a memory by ID |
| `icm_memory_consolidate` | Merge all memories of a topic into one summary |
| `icm_memory_list_topics` | List all topics with counts |
| `icm_memory_stats` | Global memory statistics |
| `icm_memory_health` | Per-topic hygiene audit (staleness, consolidation needs) |
| `icm_memory_embed_all` | Backfill embeddings for vector search |

| Tool | Description |
|---|---|
| `icm_memoir_create` | Create a new memoir (knowledge container) |
| `icm_memoir_list` | List all memoirs |
| `icm_memoir_show` | Show memoir details and all concepts |
| `icm_memoir_add_concept` | Add a concept with labels |
| `icm_memoir_refine` | Update a concept's definition |
| `icm_memoir_search` | Full-text search, optionally filtered by label |
| `icm_memoir_search_all` | Search across all memoirs |
| `icm_memoir_link` | Create typed relation between concepts |
| `icm_memoir_inspect` | Inspect concept and graph neighborhood (BFS) |
| `icm_memoir_export` | Export graph (json, dot, ascii, ai) with confidence levels |

| Tool | Description |
|---|---|
| `icm_feedback_record` | Record a correction when an AI prediction was wrong |
| `icm_feedback_search` | Search past corrections to inform future predictions |
| `icm_feedback_stats` | Feedback statistics: total count, breakdown by topic, most applied |

part_of · depends_on · related_to · contradicts · refines · alternative_to · caused_by · instance_of · superseded_by
Episodic memory (Topics) captures decisions, errors, preferences. Each memory has a weight that decays over time based on importance:
| Importance | Decay | Prune | Behavior |
|---|---|---|---|
| `critical` | none | never | Never forgotten, never pruned |
| `high` | slow (0.5x rate) | never | Fades slowly, never auto-deleted |
| `medium` | normal | yes | Standard decay, pruned when weight < threshold |
| `low` | fast (2x rate) | yes | Quickly forgotten |

Decay is access-aware: frequently recalled memories decay more slowly (`decay / (1 + access_count × 0.1)`). It is applied automatically on recall, if more than 24 hours have passed since the last decay pass.
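
The decay rule above can be sketched in a few lines of Python. This is a minimal illustration, not ICM's implementation: the multipliers come from the importance table and the dampening factor from the formula, but the base rate, the order in which the two are combined, and the subtract-and-floor update in `apply_decay` are assumptions.

```python
# Rate multipliers from the importance table: critical never decays,
# high decays at 0.5x, medium at 1x, low at 2x.
IMPORTANCE_MULTIPLIER = {"critical": 0.0, "high": 0.5, "medium": 1.0, "low": 2.0}


def effective_decay(base_decay: float, importance: str, access_count: int) -> float:
    """Decay for one pass: importance multiplier, then access dampening.

    Assumption: the multiplier is applied before the documented
    decay / (1 + access_count * 0.1) dampening.
    """
    scaled = base_decay * IMPORTANCE_MULTIPLIER[importance]
    return scaled / (1 + access_count * 0.1)


def apply_decay(weight: float, base_decay: float, importance: str, access_count: int) -> float:
    """Assumption: decay is subtracted from the weight and floored at zero."""
    return max(0.0, weight - effective_decay(base_decay, importance, access_count))
```

With a hypothetical base rate of 0.1, a medium memory recalled 10 times decays at half the rate of one that was never recalled, because `1 + 10 × 0.1 = 2`.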
Memory hygiene is built in:
- Auto-dedup: storing content >85% similar to an existing memory in the same topic updates it instead of creating a duplicate
- Consolidation hints: when a topic exceeds 7 entries, `icm_memory_store` warns the caller to consolidate
- Health audit: `icm_memory_health` reports per-topic entry count, average weight, stale entries, and consolidation needs
- No silent data loss: critical and high-importance memories are never auto-pruned
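
To make the dedup and consolidation rules concrete, here is a small Python sketch. The 85% and 7-entry thresholds come from the list above; everything else is assumed for illustration. In particular, `difflib` is only a stand-in similarity measure (how ICM measures similarity is not described here), and the `store` function and its return strings are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List

DEDUP_THRESHOLD = 0.85     # from the list above: >85% similar means update
CONSOLIDATION_LIMIT = 7    # from the list above: warn when a topic exceeds 7 entries


def similarity(a: str, b: str) -> float:
    """Stand-in text similarity; ICM's actual measure may differ."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def store(topics: Dict[str, List[str]], topic: str, content: str) -> str:
    """Store content in a topic, updating a near-duplicate if one exists."""
    entries = topics.setdefault(topic, [])
    for index, existing in enumerate(entries):
        if similarity(existing, content) > DEDUP_THRESHOLD:
            entries[index] = content  # update in place instead of duplicating
            return "updated"
    entries.append(content)
    if len(entries) > CONSOLIDATION_LIMIT:
        return "stored (consider consolidating this topic)"
    return "stored"
```

Storing "Use PostgreSQL for the main DB" and then "Use PostgreSQL for the main database" in the same topic leaves a single entry, since the two strings are well above the threshold under this stand-in measure.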
Semantic memory (Memoirs) captures structured knowledge as a graph. Concepts are permanent — they get refined, never decayed. Use superseded_by to mark obsolete facts instead of deleting them.
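
A memoir can be pictured as a list of typed edges between concept names. The Python sketch below shows a breadth-first neighborhood lookup (the kind of traversal `icm_memoir_inspect` is described as doing) and a way to find concepts marked obsolete via `superseded_by`. The data layout, the function names, and treating relations as undirected during traversal are all assumptions made for illustration.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# (source, relation, target) triples; concept names are illustrative.
Edge = Tuple[str, str, str]


def neighborhood(edges: List[Edge], start: str, depth: int) -> Set[str]:
    """Breadth-first walk over relations in either direction, up to `depth` hops."""
    adjacency: Dict[str, Set[str]] = {}
    for source, _relation, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == depth:
            continue  # do not expand past the requested depth
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return seen


def obsolete(edges: List[Edge]) -> Set[str]:
    """Concepts that have an outgoing superseded_by relation."""
    return {source for source, relation, _target in edges if relation == "superseded_by"}


edges = [
    ("api-gateway", "depends_on", "auth-service"),
    ("auth-service", "depends_on", "token-store"),
    ("session-cookies", "superseded_by", "auth-service"),
]
```

Here `neighborhood(edges, "api-gateway", 1)` contains only the gateway and `auth-service`, while a depth of 2 also reaches `token-store` and `session-cookies`. The superseded concept stays in the graph and is simply flagged by `obsolete(edges)`.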
With embeddings enabled, ICM uses hybrid search:
- FTS5 BM25 (30%) — full-text keyword matching
- Cosine similarity (70%) — semantic vector search via sqlite-vec
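
As a rough illustration of the 30/70 blend, the Python sketch below combines two score maps into one ranking. Only the weights come from this README. The min-max normalization of keyword scores, the "higher is better" convention for both inputs, and the function names are assumptions; ICM's actual scoring pipeline may normalize differently.

```python
from typing import Dict, List, Tuple

BM25_WEIGHT = 0.3    # README: BM25 contributes 30%
COSINE_WEIGHT = 0.7  # README: cosine similarity contributes 70%


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; an assumed normalization step."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 1.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def hybrid_rank(bm25: Dict[str, float], cosine: Dict[str, float]) -> List[Tuple[str, float]]:
    """Blend keyword and vector scores; a missing score counts as 0."""
    bm25_norm = min_max_normalize(bm25)
    ids = set(bm25) | set(cosine)
    blended = {
        doc_id: BM25_WEIGHT * bm25_norm.get(doc_id, 0.0)
        + COSINE_WEIGHT * cosine.get(doc_id, 0.0)
        for doc_id in ids
    }
    return sorted(blended.items(), key=lambda item: item[1], reverse=True)


ranked = hybrid_rank(
    bm25={"m1": 4.0, "m2": 1.0},                 # hypothetical keyword scores
    cosine={"m1": 0.2, "m2": 0.9, "m3": 0.5},    # hypothetical cosine similarities
)
```

In this made-up example `m2` ranks first even though `m1` has the best keyword score, because the semantic component carries more weight.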
Default model: intfloat/multilingual-e5-base (768d, 100+ languages). Configurable in your config file:
```toml
[embeddings]
# enabled = false                                  # Disable entirely (no model download)
model = "intfloat/multilingual-e5-base"            # 768d, multilingual (default)
# model = "intfloat/multilingual-e5-small"         # 384d, multilingual (lighter)
# model = "intfloat/multilingual-e5-large"         # 1024d, multilingual (best accuracy)
# model = "Xenova/bge-small-en-v1.5"               # 384d, English-only (fastest)
# model = "jinaai/jina-embeddings-v2-base-code"    # 768d, code-optimized
```

To skip the embedding model download entirely, use any of these:
```bash
icm --no-embeddings serve         # CLI flag
ICM_NO_EMBEDDINGS=1 icm serve     # Environment variable
```

Or set `enabled = false` in your config file. ICM will fall back to FTS5 keyword search (still works, just no semantic matching).
Changing the model automatically re-creates the vector index (existing embeddings are cleared and can be regenerated with icm_memory_embed_all).
Single SQLite file. No external services, no network dependency.
```
~/Library/Application Support/dev.icm.icm/memories.db     # macOS
~/.local/share/dev.icm.icm/memories.db                    # Linux
C:\Users\<user>\AppData\Local\icm\icm\data\memories.db    # Windows
```

```bash
icm config    # Show active config
```

Config file location (platform-specific, or `$ICM_CONFIG`):

```
~/Library/Application Support/dev.icm.icm/config.toml         # macOS
~/.config/icm/config.toml                                     # Linux
C:\Users\<user>\AppData\Roaming\icm\icm\config\config.toml    # Windows
```

See `config/default.toml` for all options.
ICM extracts memories automatically via three layers:
```
Layer 0: Pattern hooks     Layer 1: PreCompact        Layer 2: UserPromptSubmit
(zero LLM cost)            (zero LLM cost)            (zero LLM cost)
┌──────────────────────┐   ┌──────────────────────┐   ┌──────────────────────┐
│ PostToolUse hook     │   │ PreCompact hook      │   │ UserPromptSubmit     │
│                      │   │                      │   │                      │
│ • Bash errors        │   │ Context about to     │   │ User sends prompt    │
│ • git commits        │   │ be compressed →      │   │ → icm recall         │
│ • config changes     │   │ extract memories     │   │ → inject context     │
│ • decisions          │   │ from transcript      │   │                      │
│ • preferences        │   │ before they're       │   │ Agent starts with    │
│ • learnings          │   │ lost forever         │   │ relevant memories    │
│ • constraints        │   │                      │   │ already loaded       │
│                      │   │ Same patterns +      │   │                      │
│ Rule-based, no LLM   │   │ --store-raw fallback │   │                      │
└──────────────────────┘   └──────────────────────┘   └──────────────────────┘
```

| Layer | Status | LLM cost | Hook command | Description |
|---|---|---|---|---|
| Layer 0 | Implemented | 0 | icm hook post | Rule-based keyword extraction from tool output |
| Layer 1 | Implemented | 0 | icm hook compact | Extract from transcript before context compression |
| Layer 2 | Implemented | 0 | icm hook prompt | Inject recalled memories on each user prompt |
All 3 layers are installed automatically by icm init --mode hook.
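
For a feel of what rule-based extraction from tool output can look like, here is a minimal Python sketch. It is not ICM's extractor: the regular expressions, category names, and the `extract_facts` function are hypothetical, chosen only to mirror a few of the categories listed in the Layer 0 box (errors, decisions, preferences, constraints).

```python
import re
from typing import List, Tuple

# Hypothetical patterns, one per category; ICM's real rules may differ.
PATTERNS = [
    ("error", re.compile(r"^(?:error|fatal)\b.*", re.IGNORECASE)),
    ("decision", re.compile(r"\b(?:we decided|decided to|we will use)\b.*", re.IGNORECASE)),
    ("preference", re.compile(r"\b(?:i prefer|always use|never use)\b.*", re.IGNORECASE)),
    ("constraint", re.compile(r"\b(?:must not|must|cannot)\b.*", re.IGNORECASE)),
]


def extract_facts(text: str) -> List[Tuple[str, str]]:
    """Return (category, line) pairs for lines matching any pattern; first match wins."""
    facts = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        for category, pattern in PATTERNS:
            if pattern.search(line):
                facts.append((category, line))
                break
    return facts


sample = "\n".join([
    "error: linker `cc` not found",
    "Compiling foo v0.1.0",
    "We decided to use Pratt parsing",
    "The API must not block",
])
facts = extract_facts(sample)
```

Because the matching is pure pattern work, it costs no LLM tokens, which is the property the table above attributes to all three layers.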
```
ICM Benchmark (1000 memories, 384d embeddings)
──────────────────────────────────────────────────────────
Store (no embeddings)      1000 ops     34.2 ms     34.2 µs/op
Store (with embeddings)    1000 ops     51.6 ms     51.6 µs/op
FTS5 search                 100 ops      4.7 ms     46.6 µs/op
Vector search (KNN)         100 ops     59.0 ms    590.0 µs/op
Hybrid search               100 ops     95.1 ms    951.1 µs/op
Decay (batch)                 1 ops      5.8 ms      5.8 ms/op
──────────────────────────────────────────────────────────
```

Apple M1 Pro, in-memory SQLite, single-threaded. Command: `icm bench --count 1000`
Multi-session workflow with a real Rust project (12 files, ~550 lines). Sessions 2+ show the biggest gains as ICM recalls instead of re-reading files.
```
ICM Agent Benchmark (10 sessions, model: haiku, 3 runs averaged)
══════════════════════════════════════════════════════════════════
                        Without ICM    With ICM     Delta
Session 2 (recall)
  Turns                     5.7           4.0        -29%
  Context (input)          99.9k         67.5k       -32%
  Cost                    $0.0298       $0.0249      -17%

Session 3 (recall)
  Turns                     3.3           2.0        -40%
  Context (input)          74.7k         41.6k       -44%
  Cost                    $0.0249       $0.0194      -22%
══════════════════════════════════════════════════════════════════
```

Command: `icm bench-agent --sessions 10 --model haiku`
Agent recalls specific facts from a dense technical document across sessions. Session 1 reads and memorizes; sessions 2+ answer 10 factual questions without the source text.
```
ICM Recall Benchmark (10 questions, model: haiku, 5 runs averaged)
══════════════════════════════════════════════════════════════════════
                              No ICM      With ICM
──────────────────────────────────────────────────────────────────────
Average score                   5%          68%
Questions passed               0/10        5/10
══════════════════════════════════════════════════════════════════════
```

Command: `icm bench-recall --model haiku`
Same test with local models — pure context injection, no tool use needed.
```
Model          Params   No ICM   With ICM   Delta
─────────────────────────────────────────────────────────
qwen2.5:14b    14B        4%       97%      +93%
mistral:7b     7B         4%       93%      +89%
llama3.1:8b    8B         4%       93%      +89%
qwen2.5:7b     7B         4%       90%      +86%
phi4:14b       14B        6%       79%      +73%
llama3.2:3b    3B         0%       76%      +76%
gemma2:9b      9B         4%       76%      +72%
qwen2.5:3b     3B         2%       58%      +56%
─────────────────────────────────────────────────────────
```

Command: `scripts/bench-ollama.sh qwen2.5:14b`
Standard academic benchmark — 500 questions across 6 memory abilities, from the LongMemEval paper (ICLR 2025).
```
LongMemEval Results: ICM (oracle variant, 500 questions)
════════════════════════════════════════════════════════════════
Category                      Retrieval    Answer (Sonnet)
────────────────────────────────────────────────────────────────
single-session-user            100.0%         91.4%
temporal-reasoning             100.0%         85.0%
single-session-assistant       100.0%         83.9%
multi-session                  100.0%         81.2%
knowledge-update               100.0%         80.8%
single-session-preference      100.0%         50.0%
────────────────────────────────────────────────────────────────
OVERALL                        100.0%         82.0%
════════════════════════════════════════════════════════════════
```

- Retrieval = does ICM find the right information? 100% across all categories.
- Answer = can the LLM produce the correct answer from retrieved context? Depends on the LLM, not ICM.
- The retrieval score is the ICM benchmark. The answer score reflects the downstream LLM capability.

Command: `scripts/bench-longmemeval.py --judge claude --workers 8`
All benchmarks use real API calls — no mocks, no simulated responses, no cached answers.
- Agent benchmark: Creates a real Rust project in a tempdir. Runs N sessions with `claude -p --output-format json`. Without ICM: empty MCP config. With ICM: real MCP server + auto-extraction + context injection.
- Knowledge retention: Uses a fictional technical document (the "Meridian Protocol"). Scores answers by keyword matching against expected facts. 120s timeout per invocation.
- Isolation: Each run uses its own tempdir and fresh SQLite DB. No session persistence.
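
The keyword-matching scoring mentioned for the knowledge retention benchmark can be approximated as below. This is an assumed shape, not the benchmark's code: the function name, the case-insensitive substring match, and the example keywords are hypothetical, and no pass threshold is implied.

```python
from typing import List


def keyword_score(answer: str, expected_keywords: List[str]) -> float:
    """Fraction of expected keywords found in the answer (case-insensitive substring match)."""
    if not expected_keywords:
        return 0.0
    text = answer.lower()
    hits = sum(1 for keyword in expected_keywords if keyword.lower() in text)
    return hits / len(expected_keywords)


# Hypothetical question: two of the three expected facts appear in the answer.
score = keyword_score("The handshake uses port 7443 over TLS", ["7443", "tls", "udp"])
```

Scoring by keyword rather than by an LLM judge keeps the check deterministic, at the price of missing correct answers that are phrased with different terms.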
| Document | Description |
|---|---|
| Technical Architecture | Crate structure, search pipeline, decay model, sqlite-vec integration, testing |
| User Guide | Installation, topic organization, consolidation, extraction, troubleshooting |
| Product Overview | Use cases, benchmarks, comparison with alternatives |
