Young768 (menelaus) / Starred

Stars

Lightning-AI / lightning-thunder

PyTorch compiler that accelerates training and inference. Get built-in optimizations for performance, memory, parallelism, and easily write your own.

Python · 1,449 stars · 109 forks · Updated Mar 17, 2026

iree-org / iree-nvgpu

MLIR · 48 stars · 16 forks · Updated Mar 5, 2024

tensorflow / tensorflow

An Open Source Machine Learning Framework for Everyone

C++ · 194,314 stars · 75,252 forks · Updated Mar 22, 2026

WongHuLin / Raptor-T

Sparse block-attention optimization

Python · 1 star · 1 fork · Updated May 9, 2025

facebookincubator / AITemplate

AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.

Python · 4,710 stars · 382 forks · Updated Mar 16, 2026

stuntgoat / kmeans

K Means Clustering with Python

Python · 234 stars · 140 forks · Updated Jun 6, 2023

flame / how-to-optimize-gemm

C · 2,001 stars · 364 forks · Updated Jul 29, 2023

jax-ml / jax

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more

Python · 35,180 stars · 3,484 forks · Updated Mar 22, 2026

colesbury / nogil

Multithreaded Python without the GIL

Python · 2,917 stars · 103 forks · Updated May 20, 2025

Tony-Tan / CUDA_Freshman

Cuda · 2,716 stars · 506 forks · Updated Jan 16, 2024

kaiyuyue / torchshard

Slicing a PyTorch Tensor Into Parallel Shards

Python · 300 stars · 15 forks · Updated Jun 7, 2025

bindog / pytorch-model-parallel

A memory balanced and communication efficient FullyConnected layer with CrossEntropyLoss model parallel implementation in PyTorch

Python · 84 stars · 20 forks · Updated Jun 11, 2020
