  • Hong Kong (UTC +08:00)

Highlights

  • Pro

ambitiousCC/README.md

Hi there 👋

Pinned Loading

  1. fastllm Public

    Forked from ztxz16/fastllm

    fastllm is a high-performance LLM inference library with no backend dependencies. It supports tensor-parallel inference for dense models and mixed-mode inference for MoE models; any GPU with 10 GB or more of VRAM can run the full DeepSeek model. On a dual-socket 9004/9005 server with a single GPU, the original full-precision DeepSeek model runs at 20 tps with a single concurrent request; the INT4-quantized model reaches 30 tps at single concurrency and 60+ tps under multiple concurrent requests.

    C++

  2. kvcache-ai/ktransformers Public

    A flexible framework for experimenting with heterogeneous LLM inference and fine-tuning optimizations

    Python 16.8k 1.2k

  3. chitu Public

    Forked from thu-pacman/chitu

    High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.

    Python

  4. sglang Public

    Forked from sgl-project/sglang

    SGLang is a fast serving framework for large language models and vision language models.

    Python

  5. vllm Public

    Forked from vllm-project/vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python

  6. vllm-ascend Public

    Forked from vllm-project/vllm-ascend

    Community-maintained hardware plugin for running vLLM on Ascend

    Python