Tabrizian

Organizations

@NVIDIA @nuxt-community @kubeflow @triton-inference-server


Pinned

  1. NVIDIA/TensorRT-LLM (Public)

    TensorRT-LLM provides an easy-to-use Python API for defining Large Language Models (LLMs) and supports state-of-the-art optimizations for efficient inference on NVIDIA GPUs. Tensor…

    Python 12.3k 1.9k

  2. triton-inference-server/server (Public)

    The Triton Inference Server provides an optimized inference solution for cloud and edge deployments.

    Python 10.1k 1.7k

  3. triton-inference-server/python_backend (Public)

    A Triton backend that enables pre-processing, post-processing, and other logic to be implemented in Python.

    C++ 660 187

  4. learning-to-quantize (Public)

    Code for "Adaptive Gradient Quantization for Data-Parallel SGD", published at NeurIPS 2020.

    Jupyter Notebook 30 5

  5. triton-inference-server/model_analyzer (Public)

    Triton Model Analyzer is a CLI tool that helps users understand the compute and memory requirements of Triton Inference Server models.

    Python 499 80