Lightweight, model-agnostic chat history compression utilities for AI assistants. Bring Your Own Model (BYOM) and use simple strategies to keep conversations concise while preserving context.
Simple token-based compression that removes the oldest messages when your conversation exceeds the token threshold. Always preserves system messages and the most recent messages to maintain context continuity.
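The core idea can be sketched as follows. This is a simplified illustration, not the library's actual implementation; the `Message` type and the len/4 `estimateTokens` heuristic are assumptions standing in for slimcontext's internals:

```typescript
type Message = { role: 'system' | 'user' | 'assistant'; content: string };

// Rough token estimate: ~4 characters per token (assumed default heuristic).
const estimateTokens = (m: Message): number => Math.ceil(m.content.length / 4);

// Drop the oldest non-system messages until the history fits the budget,
// always keeping system messages and the most recent `minRecent` messages.
function trim(history: Message[], maxTokens: number, minRecent: number): Message[] {
  const total = (msgs: Message[]) => msgs.reduce((n, m) => n + estimateTokens(m), 0);
  const result = [...history];
  while (total(result) > maxTokens) {
    // First removable message: not a system message, not in the protected tail.
    const idx = result.findIndex(
      (m, i) => m.role !== 'system' && i < result.length - minRecent,
    );
    if (idx === -1) break; // nothing left that is safe to remove
    result.splice(idx, 1);
  }
  return result;
}
```

The real `TrimCompressor` (shown below) wraps this kind of logic behind a single `compress()` call.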
AI-powered compression that uses your own chat model to create concise summaries of older conversation segments. The summary is injected as a system message, preserving the conversation flow while drastically reducing token usage.
- OpenAI: see examples/OPENAI_EXAMPLE.md (copy-paste snippet; BYOM, no deps added here).
- LangChain: see examples/LANGCHAIN_COMPRESS_HISTORY.md.
- Trim strategy: token-aware trimming based on your model's max tokens and a threshold.
- Summarize strategy: token-aware summarization of older messages using your own chat model.
- Framework agnostic: plug in any model wrapper implementing a minimal `invoke()` interface.
- Optional LangChain adapter with a one-call helper for compressing histories.
```bash
npm install slimcontext
```

Upgrading from an earlier version? See the Migration notes in the changelog:
- CHANGELOG: ./CHANGELOG.md#migration
Provide a model that implements:
```ts
interface SlimContextMessage {
  role: 'system' | 'user' | 'assistant' | 'tool' | 'human';
  content: string;
}

interface SlimContextModelResponse {
  content: string;
}

interface SlimContextChatModel {
  invoke(messages: SlimContextMessage[]): Promise<SlimContextModelResponse>;
}
```

slimcontext handles message arrays shaped as:
```ts
interface SlimContextMessage {
  role: 'system' | 'user' | 'assistant' | 'tool' | 'human';
  content: string;
}
```

```ts
import { TrimCompressor, SlimContextMessage } from 'slimcontext';

// Configure token-aware trimming
const compressor = new TrimCompressor({
  // Optional: defaults shown
  maxModelTokens: 8192, // your model's context window
  thresholdPercent: 0.7, // start trimming after 70% of maxModelTokens
  minRecentMessages: 2, // always keep at least last 2 messages
  // Optional estimator; default is a len/4 heuristic
  // estimateTokens: (m) => yourCustomTokenCounter(m),
});

let history: SlimContextMessage[] = [
  { role: 'system', content: 'You are a helpful assistant.' },
  // ... conversation grows
];

history = await compressor.compress(history);
```

```ts
import {
  SummarizeCompressor,
  SlimContextMessage,
  SlimContextChatModel,
  SlimContextModelResponse,
} from 'slimcontext';

class MyModel implements SlimContextChatModel {
  async invoke(messages: SlimContextMessage[]): Promise<SlimContextModelResponse> {
    // Call out to your LLM provider (OpenAI, Anthropic, etc.)
    const userContent = messages.find((m) => m.role === 'user')?.content || '';
    return { content: 'Summary: ' + userContent.slice(0, 100) };
  }
}

const model = new MyModel();

const compressor = new SummarizeCompressor({
  model,
  // Optional: defaults shown
  maxModelTokens: 8192,
  thresholdPercent: 0.7, // summarize once total tokens exceed 70%
  minRecentMessages: 4, // keep at least last 4 messages verbatim
  // estimateTokens: (m) => yourCustomTokenCounter(m),
  // prompt: '...custom summarization instructions...'
});

let history: SlimContextMessage[] = [
  { role: 'system', content: 'You are a helpful assistant.' },
  // ... conversation grows
];

history = await compressor.compress(history);
```

Notes about summarization behavior:
- When the estimated total tokens exceed the threshold, the oldest portion (excluding a leading system message) is summarized into a single system message inserted before the recent tail.
- The most recent `minRecentMessages` are always preserved verbatim.
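The partitioning described above can be made concrete with a small sketch. This is illustrative only, not the library's code; `summarize` stands in for a real model call:

```typescript
type Msg = { role: 'system' | 'user' | 'assistant'; content: string };

// Illustrative split: keep a leading system message, summarize the middle,
// and preserve the most recent `minRecent` messages verbatim.
function compact(
  history: Msg[],
  minRecent: number,
  summarize: (msgs: Msg[]) => string,
): Msg[] {
  const head = history[0]?.role === 'system' ? [history[0]] : [];
  const tail = history.slice(-minRecent);
  const middle = history.slice(head.length, history.length - minRecent);
  if (middle.length === 0) return history; // nothing old enough to summarize
  const summary: Msg = { role: 'system', content: summarize(middle) };
  return [...head, summary, ...tail];
}
```

In the real `SummarizeCompressor`, the summarization only triggers once the estimated token total crosses the threshold.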
You can chain strategies depending on token thresholds or other heuristics.
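For instance, you might apply a cheap trim first and escalate to summarization only when the result is still too large. A sketch under assumed shapes; the `Compressor` interface here mirrors the library's `compress()` contract but is defined locally for illustration:

```typescript
// Minimal compressor shape (assumed; mirrors the library's compress() contract).
interface Compressor<M> {
  compress(history: M[]): Promise<M[]>;
}

// Try the cheaper strategy first; escalate only if still over budget.
async function chain<M>(
  history: M[],
  trimmer: Compressor<M>,
  summarizer: Compressor<M>,
  estimate: (h: M[]) => number,
  budget: number,
): Promise<M[]> {
  const trimmed = await trimmer.compress(history);
  return estimate(trimmed) > budget ? summarizer.compress(trimmed) : trimmed;
}
```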
- See examples/OPENAI_EXAMPLE.md for an OpenAI copy-paste snippet.
- See examples/LANGCHAIN_COMPRESS_HISTORY.md for a one-call LangChain history compression helper.
If you already use LangChain chat models, you can use the built-in adapter. It’s exported in two ways:
- Namespaced: `import { langchain } from 'slimcontext'`
- Direct path: `import * as langchain from 'slimcontext/adapters/langchain'`
Common helpers:
- `compressLangChainHistory(history, options)` – one-call compression for LangChain `BaseMessage[]`.
- `toSlimModel(llm)` – wrap a LangChain `BaseChatModel` for use with `SummarizeCompressor`.
Example (one-call history compression):
```ts
import { AIMessage, HumanMessage, SystemMessage } from '@langchain/core/messages';
import { ChatOpenAI } from '@langchain/openai';
import { langchain } from 'slimcontext';

const lc = new ChatOpenAI({ model: 'gpt-5-mini', temperature: 0 });

const history = [
  new SystemMessage('You are helpful.'),
  new HumanMessage('Please summarize the discussion so far.'),
  new AIMessage('Certainly!'),
  // ...more messages
];

const compact = await langchain.compressLangChainHistory(history, {
  strategy: 'summarize',
  llm: lc, // BaseChatModel
  maxModelTokens: 8192,
  thresholdPercent: 0.8, // summarize beyond 80% of context window
  minRecentMessages: 4,
});
```

See examples/LANGCHAIN_COMPRESS_HISTORY.md for a fuller copy-paste example.
- `TrimCompressor({ maxModelTokens?, thresholdPercent?, estimateTokens?, minRecentMessages? })`
- `SummarizeCompressor({ model, maxModelTokens?, thresholdPercent?, estimateTokens?, minRecentMessages?, prompt? })`
- `SlimContextMessage`
- `SlimContextChatModel`
- `SlimContextCompressor`
- `SlimContextModelResponse`
MIT

