turboquant

Here are 16 public repositories matching this topic...

artalis-io / bitnet.c

Minimal, zero-dependency LLM inference in pure C11. CPU-first with NEON/AVX2 SIMD. Flash MoE (pread + LRU expert cache). TurboQuant 3-bit KV compression (8.9x less memory per session). 20+ GGUF quant formats. Compiles to WASM.

c neon wasm inference simd moe avx2 quantization kv-cache cpu-inference llm gguf turboquant

Updated Mar 28, 2026
C

jjang-ai / exploitbot

No bs theatricals. Real automated pentesting. Mac only.

Updated Mar 26, 2026
Python

back2matching / turboquant

First open-source TurboQuant KV cache compression for LLM inference. Drop-in for HuggingFace. pip install turboquant.

machine-learning compression gpu transformers inference pytorch quantization vram huggingface kv-cache llm turboquant

Updated Mar 27, 2026
Python

LostBeard / SpawnDev.ILGPU.ML

Hardware-agnostic machine learning infrastructure for .NET. Implements high-performance neural network layers in C# that are transpiled to run on WebGPU, CUDA, OpenCL, WebGL, CPU, and Wasm via SpawnDev.ILGPU. Optimized for Blazor WebAssembly and native GPU execution.

Updated Mar 28, 2026
WGSL

tushu1232 / turboquant-server

Turbo Index

google hpc gpu information-theory pytorch nearest-neighbor-search quantization vector-quantization kv-cache large-language-models llm-inference turboquant turboindexer

Updated Mar 25, 2026
Python

amitshekhariitbhu / turboquant-experiment

KV cache with PagedAttention vs. PagedAttention + TurboQuant: experiments across token sizes comparing memory, latency, and accuracy.

inference large-language-models llm llms llm-inference kvcache kvcache-optimization kvcache-compression turboquant

Updated Mar 26, 2026
Python

yzamari / turboQuantPlayground

TurboQuant (ICLR 2026) ported to Apple Silicon — KV cache compression with MLX Metal kernels + PyTorch CPU

machine-learning deep-learning metal transformers inference pytorch attention quantization mlx iclr kv-cache apple-silicon llm llm-inference turboquant

Updated Mar 27, 2026
Python

wjddusrb03 / commitmind

CommitMind: Semantic search for Git commit history powered by TurboQuant vector compression (ICLR 2026). Search commits by meaning, not just keywords.

python git nlp machine-learning embeddings code-search developer-tools quantization semantic-search git-history cli-tool sentence-embeddings commit-history commit-search iclr2026 natural-language-search turboquant vector-compression

Updated Mar 28, 2026
Python

GenauraApp / TurboQuant

Near-optimal vector quantization with zero metadata overhead — PyTorch SDK based on Google Research ICLR 2026

Updated Mar 25, 2026
Python

back2matching / turboquant-vectors

Compress embeddings 6x instantly with TurboQuant. First pip package using Google's TurboQuant (ICLR 2026) for vector search. 71.9% recall vs. 13.3% for FAISS PQ.

machine-learning compression numpy embeddings quantization faiss rag vector-search turboquant

Updated Mar 26, 2026
Python

devYRPauli / turboquant-m1pro-evaluation

TurboQuant KV cache compression evaluation on an Apple M1 Pro with 16 GB. Two-round study: MLX path (100% needle-in-a-haystack retrieval at 16K context) and llama.cpp Metal path. Five implementation bugs found and fixed.

quantization kv-cache apple-silicon llm turboquant

Updated Mar 27, 2026
Python

wjddusrb03 / chatmind

ChatMind: Semantic search for Discord & KakaoTalk chat messages. Search by meaning, not keywords. Powered by TurboQuant compression (ICLR 2026).

multilingual python nlp discord embeddings developer-tools semantic-search kakaotalk cli-tool sentence-embeddings chat-export iclr2026 natural-language-search turboquant vector-compression chat-search message-search

Updated Mar 28, 2026
Python

ShipItAndPray / mcp-turboquant

MCP server for LLM quantization. Compress any model to GGUF/GPTQ/AWQ in one tool call. First MCP server for model compression.

mcp quantization llm gguf mcp-server turboquant

Updated Mar 25, 2026
JavaScript

wjddusrb03 / diffmind

AI code review memory: learns from your team's bug history and warns when similar patterns appear.

python git ai developer-tools code-review semantic-search bug-detection turboquant

Updated Mar 28, 2026
Python

Sunnyztj / turboquant-memory

TurboQuant (ICLR 2026) vector quantization for memory/RAG embedding compression | 5-8x compression, 98%+ recall | NumPy only, no GPU

numpy vector-quantization rag embedding-compression memory-search iclr2026 openclaw hadamard-transform turboquant

Updated Mar 27, 2026
Python

wjddusrb03 / langchain-turboquant

LangChain VectorStore with TurboQuant compression (ICLR 2026) - 6x memory reduction, training-free, no GPU required. The first LangChain integration for Google Research's TurboQuant algorithm.

python machine-learning compression quantization embedding similarity-search rag memory-optimization vector-database kv-cache google-research llm langchain vector-store turboquant

Updated Mar 27, 2026
Python
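Several of these projects describe TurboQuant as compressing vectors with a fixed codebook instead of per-block scales, and the turboquant-memory tags mention a Hadamard transform. The pure-Python sketch below illustrates that general "rotate, then scalar-quantize" idea under simplifying assumptions: a randomized Walsh-Hadamard rotation, a plain uniform b-bit grid clipped at 3 standard deviations, and one stored norm per vector. It is not the published TurboQuant algorithm and not the API of any listed repository; the function names (`fwht`, `random_signs`, `rotate`, `unrotate`, `quantize`, `dequantize`) and the `clip_sigmas` parameter are hypothetical.

```python
import math
import random


def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two.

    With 1/sqrt(n) scaling the transform is orthogonal and is its own inverse.
    """
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    y = list(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in y]


def random_signs(d, seed):
    """Random +/-1 diagonal, reproducible from a seed shared by encoder and decoder (hypothetical helper)."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(d)]


def rotate(x, signs):
    # y = H D x: flip signs, then apply the orthonormal Hadamard transform.
    return fwht([v * s for v, s in zip(x, signs)])


def unrotate(y, signs):
    # x = D H y: exact inverse, because H and D are both self-inverse.
    return [v * s for v, s in zip(fwht(y), signs)]


def _grid(d, bits, clip_sigmas):
    # Coordinates of a rotated unit vector have std of roughly 1/sqrt(d),
    # so the clip range is set in those units (an assumption, not a tuned value).
    clip = clip_sigmas / math.sqrt(d)
    levels = 1 << bits
    return clip, levels, 2.0 * clip / levels


def quantize(x, signs, bits=3, clip_sigmas=3.0):
    """Return (norm, codes): one float per vector plus one b-bit index per coordinate."""
    d = len(x)
    norm = math.sqrt(sum(v * v for v in x))
    unit = [v / norm for v in x] if norm > 0 else [0.0] * d
    clip, levels, step = _grid(d, bits, clip_sigmas)
    codes = []
    for v in rotate(unit, signs):
        idx = math.floor((v + clip) / step)
        codes.append(min(levels - 1, max(0, idx)))  # clamp values outside the clip range
    return norm, codes


def dequantize(norm, codes, signs, bits=3, clip_sigmas=3.0):
    """Map each code to its bin midpoint, undo the rotation, and rescale by the norm."""
    clip, _, step = _grid(len(codes), bits, clip_sigmas)
    y = [-clip + (c + 0.5) * step for c in codes]
    return [norm * v for v in unrotate(y, signs)]


if __name__ == "__main__":
    d = 64
    signs = random_signs(d, seed=42)
    rng = random.Random(0)
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm, codes = quantize(x, signs, bits=3)
    x_hat = dequantize(norm, codes, signs, bits=3)
    err = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat))) / norm
    print(f"first codes: {codes[:8]}  relative L2 error: {err:.3f}")
```

The rotation spreads each vector's energy evenly across coordinates, which is what lets a single fixed grid serve every vector; the price in this sketch is one float (the norm) per vector. Raising `bits` shrinks the reconstruction error at the cost of memory, and a real implementation would pack the codes into bit fields rather than keep Python ints.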
