Stars
GitHub Copilot CLI brings the power of Copilot coding agent directly to your terminal.
ROCm / cupy
Forked from cupy/cupy: a NumPy-compatible array library accelerated by CUDA
MSCCL++: A GPU-driven communication stack for scalable AI applications
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
KV cache store for distributed LLM inference
Distributed Compiler based on Triton for Parallel Systems
Efficient and easy multi-instance LLM serving
A PTX tutorial written purely by AIs (OpenAI Deep Research and Claude 3.7)
A Datacenter Scale Distributed Inference Serving Framework
A list of ICs and IPs for AI, Machine Learning and Deep Learning.
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
Code for paper: [ICLR2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
DeepEP: an efficient expert-parallel communication library
FlashMLA: Efficient Multi-head Latent Attention Kernels
A PyTorch native platform for training generative AI models
Dynamic Memory Management for Serving LLMs without PagedAttention
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
DashInfer is a native LLM inference engine that aims to deliver industry-leading performance across hardware architectures, including CUDA, x86, and ARMv9.
Puzzles for learning Triton
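For context on what those puzzles build toward, here is a minimal sketch of an element-wise add kernel in Triton's Python DSL; the function names (`add_kernel`, `add`) and the block size are my own choices for illustration, not taken from the puzzles repo.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the ragged final block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)  # one program per 1024-element block
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

Running this requires a CUDA GPU and `pip install triton`; the puzzles themselves work up from exactly this load/mask/store pattern.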
Helpful tools and examples for working with flex-attention
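For orientation, a minimal sketch of the `flex_attention` API (in PyTorch 2.5+ under `torch.nn.attention.flex_attention`) that these tools target: a `score_mod` callable rewrites each attention score, here expressing a plain causal mask. The tensor shapes and the `causal` helper name are illustrative assumptions.

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

def causal(score, b, h, q_idx, kv_idx):
    # Keep the score where the query may attend to the key; mask out the rest.
    return torch.where(q_idx >= kv_idx, score, float("-inf"))

# (batch, heads, seq_len, head_dim) — arbitrary example sizes.
q, k, v = (torch.randn(1, 8, 128, 64, device="cuda") for _ in range(3))
out = flex_attention(q, k, v, score_mod=causal)
```

In practice you would wrap the call with `torch.compile` so the score modification is fused into a single attention kernel rather than run eagerly.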
How to optimize common algorithms in CUDA.

