Young768
🏠 Working from home


PyTorch compiler that accelerates training and inference. Get built-in optimizations for performance, memory, parallelism, and easily write your own.

Python 1,449 109 Updated Mar 17, 2026
MLIR 48 16 Updated Mar 5, 2024

An Open Source Machine Learning Framework for Everyone

C++ 194,314 75,252 Updated Mar 22, 2026

sparse block-attention optimization

Python 1 1 Updated May 9, 2025

AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.

Python 4,710 382 Updated Mar 16, 2026

K Means Clustering with Python

Python 234 140 Updated Jun 6, 2023

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more

Python 35,180 3,484 Updated Mar 22, 2026

Multithreaded Python without the GIL

Python 2,917 103 Updated May 20, 2025

Slicing a PyTorch Tensor Into Parallel Shards

Python 300 15 Updated Jun 7, 2025

A memory balanced and communication efficient FullyConnected layer with CrossEntropyLoss model parallel implementation in PyTorch

Python 84 20 Updated Jun 11, 2020