Official implementation of "SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching" (COLM 2025). A novel KV cache compression method that organizes cache at sentence level using semantic similarity.
This app leverages Semantic Caching to minimize inference latency and reduce API costs by reusing semantically similar prompt responses.
Semantic caching demo with real-time streaming and a cost & sizing calculator, powered by Azure Managed Redis and Azure OpenAI.
Rust Local Token Compression Proxy for coding agents, built solo for GenAI Genesis 2026. 🏆 1st Google Sustainability Hack
Evaluate how a semantic cache performs on your dataset by computing key KPIs over a threshold sweep and producing plots/CSVs.
LLM cost monitoring and optimization toolkit
LLMOps API Gateway in Go. Optimizes GenAI workloads with Qdrant semantic caching, Redis rate-limiting, and OpenTelemetry metrics.
Semantic caching for LLM responses using Redis Vector DB, LangChain, and HuggingFace embeddings; parses PDFs, generates FAQs with Groq, and serves similarity-based answers without redundant LLM calls.
Semantic LLM Gateway featuring intelligent prompt routing (basic MoE), L1/L2 semantic caching (Redis + pgvector), fault-tolerant model fallbacks, and real-time streaming telemetry. Built to reduce AI inference latency and optimize API compute costs.
Simple RAG implementation with semantic caching using Redis and Langchain
Intelligent LLM agent cost optimization runtime.
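The pattern these projects share (reuse a stored response when a new query's embedding is close enough to a previously cached one) can be sketched in plain Python. The `SemanticCache` class, the 0.9 similarity threshold, and the in-memory entry list below are illustrative assumptions, not any listed project's API; production gateways back this with a vector store such as Redis or pgvector.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class SemanticCache:
    """Toy semantic cache: return a stored response when a new
    query embedding is within `threshold` cosine similarity of a
    cached one; otherwise signal a miss so the caller invokes the LLM."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def get(self, embedding):
        best_resp, best_sim = None, -1.0
        for cached_emb, resp in self.entries:
            sim = cosine(embedding, cached_emb)
            if sim > best_sim:
                best_resp, best_sim = resp, sim
        return best_resp if best_sim >= self.threshold else None

    def put(self, embedding, response):
        self.entries.append((embedding, response))
```

A near-duplicate query (an embedding pointing almost the same direction as a cached one) is served from the cache, while an unrelated query misses and would fall through to the model, which is how these gateways cut latency and API spend.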