- Santa Clara
- https://www.linkedin.com/in/rdspring1
- @ryanspring13
Stars
PyTorch native quantization and sparsity for training and inference
MSLK (Meta Superintelligence Labs Kernels) is a collection of PyTorch GPU operator libraries that are designed and optimized for GenAI training and inference, such as FP8 row-wise quantization and …
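The row-wise quantization idea named above can be illustrated with a small pure-Python sketch. For simplicity this uses symmetric int8 rounding rather than an actual FP8 cast, and all function names here are hypothetical, not MSLK APIs:

```python
def quantize_rowwise(mat, qmax=127):
    # Row-wise quantization: each row gets its own scale, chosen so the
    # row's absolute maximum maps to the top of the quantized range.
    # (Shown with int8 for simplicity; FP8 row-wise works the same way
    # but casts to an 8-bit float format instead of rounding to ints.)
    quantized, scales = [], []
    for row in mat:
        scale = max(abs(v) for v in row) / qmax or 1.0  # avoid div-by-zero on all-zero rows
        scales.append(scale)
        quantized.append([round(v / scale) for v in row])
    return quantized, scales

def dequantize_rowwise(quantized, scales):
    # Recover approximate values: multiply each row by its stored scale.
    return [[v * s for v in row] for row, s in zip(quantized, scales)]
```

Per-row scales keep a row of small values from being crushed by a large outlier in a different row, which is the main motivation for row-wise over per-tensor scaling.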
Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard!
Companion code for Grokking Megakernels: fuse an entire LLM forward pass into a single CUDA kernel
Helpful kernel tutorials and examples for tile-based GPU programming
FlashInfer: Kernel Library for LLM Serving
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…
Triton-based Symmetric Memory operators and examples
CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
High-performance MinHash implementation in Rust with Python bindings for efficient similarity estimation and deduplication of large datasets
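The MinHash technique behind the repo above can be sketched in a few lines of pure Python. This is an illustrative toy, not the Rust implementation, and the function names are made up here:

```python
import hashlib

def minhash_signature(tokens, num_hashes=64):
    # Simulate num_hashes independent hash functions by salting blake2b,
    # then keep the minimum hash value per function over all tokens.
    signature = []
    for i in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(t.encode(), salt=i.to_bytes(2, "big"),
                                digest_size=8).digest(), "big")
            for t in tokens
        ))
    return signature

def estimated_jaccard(sig_a, sig_b):
    # The fraction of matching signature slots is an unbiased estimate
    # of the Jaccard similarity of the two underlying token sets.
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

Because signatures are fixed-length, near-duplicate detection over a large corpus reduces to comparing (or banding, for LSH) short integer vectors instead of full token sets.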
Distributed Compiler based on Triton for Parallel Systems
Helpful tools and examples for working with flex-attention
A high-throughput and memory-efficient inference and serving engine for LLMs
Customized matrix multiplication kernels
DFloat11 [NeurIPS '25]: Lossless Compression of LLMs and DiTs for Efficient GPU Inference
Agent2Agent (A2A) is an open protocol enabling communication and interoperability between opaque agentic applications.
A Python framework for accelerated simulation, data generation and spatial computing.
Open source implementation of AlphaFold3