RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing …

Cuda 989 227 Updated Mar 19, 2026

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

C++ 1,240 130 Updated Mar 21, 2026

Tile primitives for speedy kernels

Cuda 3,242 261 Updated Mar 17, 2026

Machine Learning Engineering Open Book

Python 17,480 1,108 Updated Mar 16, 2026

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 9,474 1,742 Updated Mar 18, 2026