
feat: cache tool schema token counts #12382

Draft

danny-avila wants to merge 1 commit into dev from claude/wizardly-lichterman

Conversation

@danny-avila
Owner

Summary

  • Adds time-based caching (30min TTL) for tool schema token counts using the existing Keyv/Redis infrastructure, avoiding expensive recalculation on every agent run
  • Cache is keyed by {provider}:{fingerprint} where fingerprint = sorted tool names + count, so agents sharing the same tools share the cached value
  • New reusable utility module (packages/api/src/agents/toolTokens.ts) with getToolFingerprint, computeToolSchemaTokens, and getOrComputeToolTokens
  • buildAgentContext in createRun is now async with Promise.all for parallel cache lookups in multi-agent runs

Companion PR

Requires danny-avila/agents@8e0ff93 (@librechat/agents changes: toolSchemaTokens on AgentInputs, exported multiplier constants, fromConfig() short-circuit)

Test plan

  • Verify first agent run computes + caches tool tokens, second run hits cache
  • Verify tool set change (add/remove tool) causes cache miss and recomputation
  • Verify two agents sharing the same tools share the cached entry
  • Verify Anthropic and non-Anthropic providers cache independently (different multipliers)
  • Existing tests pass with no regressions
Add time-based caching (30min TTL) for tool schema token counts using the existing Keyv/Redis infrastructure. The cache is keyed by provider and a lightweight fingerprint (sorted tool names + count), so agents sharing the same tool set share the cached value.

The new utility module (toolTokens.ts) provides reusable functions:

  • getToolFingerprint: stable fingerprint from tool names
  • computeToolSchemaTokens: mirrors AgentContext.calculateInstructionTokens
  • getOrComputeToolTokens: cache lookup with compute-on-miss

In createRun, buildAgentContext is now async with Promise.all for parallel cache lookups in multi-agent runs. Pre-computed tokens are passed via AgentInputs.toolSchemaTokens, skipping calculateInstructionTokens in @librechat/agents entirely on a cache hit.
Copilot AI review requested due to automatic review settings March 24, 2026 17:36
Contributor

Copilot AI left a comment


Pull request overview

This PR introduces caching for tool schema token counts in the agents run pipeline to avoid repeated expensive token counting across runs/agents, using the existing Keyv/Redis cache infrastructure.

Changes:

  • Add a new cache namespace (CacheKeys.TOOL_TOKENS) with a 30-minute TTL.
  • Introduce packages/api/src/agents/toolTokens.ts to fingerprint tools, compute tool-schema token counts, and read/write cached values.
  • Make buildAgentContext async and resolve multi-agent inputs in parallel via Promise.all, passing toolSchemaTokens into AgentInputs.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File Description
packages/data-provider/src/config.ts Adds CacheKeys.TOOL_TOKENS constant for the new cache namespace.
packages/api/src/agents/toolTokens.ts New utility for tool fingerprinting, token computation, and Keyv-backed caching.
packages/api/src/agents/run.ts Computes (or fetches) toolSchemaTokens during agent context building; parallelizes context building.
api/cache/getLogStores.js Registers a TOOL_TOKENS cache store in backend cache namespaces with 30-min TTL.


Comment on lines +145 to +149

```typescript
const cacheKey = `${provider}:${fingerprint}`;
const cache = getCache();

const cached = (await cache.get(cacheKey)) as number | undefined;
if (cached != null && cached > 0) {
```

Copilot AI Mar 24, 2026


Cache key only includes {provider}:{fingerprint}, but the computed token count also depends on getToolTokenMultiplier() which can vary based on clientOptions.model (e.g., OpenRouter Claude vs OpenRouter non-Claude). This can cause incorrect cache hits across different models under the same provider. Include the effective multiplier (or an isAnthropic/model discriminator) in the cache key so cached values can’t be reused with a different multiplier.

```typescript
const cacheKey = `${provider}:${fingerprint}`;
const cache = getCache();

const cached = (await cache.get(cacheKey)) as number | undefined;
```

Copilot AI Mar 24, 2026


cache.get(cacheKey) is awaited without a try/catch. Since this cache is an optimization, a transient Keyv/Redis read failure should not fail the run; instead, fall back to computing tokens and optionally log at debug level. Wrap the read in a try/catch and treat read errors as cache misses.

Suggested change

```diff
-const cached = (await cache.get(cacheKey)) as number | undefined;
+let cached: number | undefined;
+try {
+  cached = (await cache.get(cacheKey)) as number | undefined;
+} catch {
+  /** Swallow cache read errors and treat as a cache miss */
+  cached = undefined;
+}
```
Comment on lines 398 to 403

```typescript
const agentInput: AgentInputs = {
  provider,
  reasoningKey,
  toolDefinitions,
  toolSchemaTokens,
  agentId: agent.id,
```

Copilot AI Mar 24, 2026


This change adds toolSchemaTokens to AgentInputs, which requires the companion @librechat/agents update mentioned in the PR description. Since this PR doesn’t bump the @librechat/agents version / lockfile, it’s likely to break type-check/build until that dependency is updated. Consider updating the dependency version here (or gating usage) so the PR is self-contained.

Comment on lines +127 to +139

```typescript
export async function getOrComputeToolTokens({
  tools,
  toolDefinitions,
  provider,
  clientOptions,
  tokenCounter,
}: {
  tools?: GenericTool[];
  toolDefinitions?: LCTool[];
  provider: Providers;
  clientOptions?: ClientOptions;
  tokenCounter: TokenCounter;
}): Promise<number> {
```

Copilot AI Mar 24, 2026


New caching/token-counting logic is introduced here but there are no unit tests covering fingerprinting, multiplier selection, cache hit/miss behavior, or error fallback. Since packages/api/src/agents already has Jest coverage, add targeted tests for getToolFingerprint, computeToolSchemaTokens, and getOrComputeToolTokens (including the provider/model multiplier split and cache read/write failure scenarios).

```typescript
    CacheKeys.ADMIN_OAUTH_EXCHANGE,
    Time.THIRTY_SECONDS,
  ),
  [CacheKeys.TOOL_TOKENS]: standardCache(CacheKeys.TOOL_TOKENS, Time.THIRTY_MINUTES),
```

Copilot AI Mar 24, 2026


This adds a TOOL_TOKENS store to getLogStores, but the new tool-token caching logic in packages/api/src/agents/toolTokens.ts instantiates its own standardCache(CacheKeys.TOOL_TOKENS, ...) instance and nothing in api/ currently calls getLogStores(CacheKeys.TOOL_TOKENS). In non-Redis mode this means the periodic in-memory TTL cleanup here won’t touch the cache instance actually used for tool token caching. Consider either removing this entry if unused, or centralizing TOOL_TOKENS cache construction so both paths share the same Keyv instance when USE_REDIS=false.

Suggested change

```diff
-[CacheKeys.TOOL_TOKENS]: standardCache(CacheKeys.TOOL_TOKENS, Time.THIRTY_MINUTES),
```
@danny-avila danny-avila marked this pull request as draft March 24, 2026 17:44
