The best ChatGPT that $100 can buy.

Python 49,955 6,542 Updated Mar 17, 2026

Fast and memory-efficient exact attention

Python 22,895 2,541 Updated Mar 22, 2026

WhatsApp MCP server

Go 5,441 941 Updated Jul 13, 2025

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 9,478 1,743 Updated Mar 18, 2026

Fastest kernels written from scratch

Cuda 561 69 Updated Sep 18, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 6,273 840 Updated Mar 22, 2026

A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS

255 13 Updated May 6, 2025

[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Python 726 34 Updated Dec 2, 2024

The code powering searchthearxiv.com, a simple semantic search engine for more than 300,000 ML papers on arXiv.

Python 171 15 Updated Apr 21, 2025

Intel CPU undervolting and throttling configuration tool

C 1,057 71 Updated Aug 24, 2023

Guide to Linux undervolting for Haswell and newer Intel CPUs

394 13 Updated Apr 4, 2018

[NeurIPS 2024] Simple and Effective Masked Diffusion Language Model

Python 660 92 Updated Sep 29, 2025

Exploring Hacker News by mapping and analyzing 40 million posts and comments for fun

TypeScript 211 9 Updated May 14, 2025

A JAX research toolkit for building, editing, and visualizing neural networks.

Python 1,872 70 Updated Jun 22, 2025

LLM training in simple, raw C/CUDA

Cuda 29,234 3,441 Updated Jun 26, 2025

This repository contains integer operators on GPUs for PyTorch.

Python 237 56 Updated Sep 29, 2023

PyTorch compiler that accelerates training and inference. Get built-in optimizations for performance, memory, parallelism, and easily write your own.

Python 1,449 109 Updated Mar 17, 2026

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Python 1,041 86 Updated Sep 4, 2024

Open-Sora: Democratizing Efficient Video Production for All

Python 28,734 2,917 Updated Apr 30, 2025

Grok open release

Python 51,529 8,475 Updated Aug 30, 2024

Flash Attention in ~100 lines of CUDA (forward pass only)

Cuda 1,098 110 Updated Dec 30, 2024

Optimized Stable Diffusion modified to run on lower GPU VRAM

Jupyter Notebook 3,099 455 Updated Sep 20, 2023

AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. Includes AI personas, AGI functions, world-class Beam multi-model chats, text-to-image, voice, response streamin…

TypeScript 6,898 1,569 Updated Mar 22, 2026

Stop messing around with finicky sampling parameters and just use DRµGS!

HTML 360 22 Updated Jun 1, 2024

#1 PDF Application on GitHub that lets you edit PDFs on any device anywhere

TypeScript 75,673 6,441 Updated Mar 21, 2026

Turn (almost) any Python command line program into a full GUI application with one line

Python 22,025 1,044 Updated Mar 12, 2026

Simple, free and efficient ad-blocker and privacy guard for Windows, macOS and Linux.

Go 3,912 121 Updated Mar 22, 2026

Distribute and run LLMs with a single file.

C++ 23,865 1,282 Updated Mar 19, 2026

Fast, collaborative live terminal sharing over the web

Rust 7,418 280 Updated Jun 19, 2025

Display and control your Android device

C 137,380 12,805 Updated Mar 20, 2026