Stars
DeepEP: an efficient expert-parallel communication library
🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
Ongoing research training transformer models at scale
This is a text generation method which returns a generator, streaming out each token in real time during inference, based on Huggingface/Transformers.
Large Language Model Text Generation Inference
A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.
Language-agnostic persistent background job server
A GPU performance profiling tool for PyTorch models
mtbrandy / pytorch
Forked from pytorch/pytorch. Please visit https://github.com/IBM/pytorch-large-model-support for the latest information on PyTorch LMS.
CUDA Templates and Python DSLs for High-Performance Linear Algebra
Minimal PyPI server for uploading & downloading packages with pip/easy_install
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
Implementation of https://arxiv.org/abs/1904.00962
Tutorial code on how to build your own Deep Learning System in 2k Lines
🧠 Laws, Theories, Principles and Patterns for developers and technologists.
[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
Repo for counting stars and contributing. Press F to pay respect to glorious developers.
