Tabrizian

Organizations

@NVIDIA @nuxt-community @kubeflow @triton-inference-server


Pinned

  1. NVIDIA/TensorRT-LLM (Public)

    TensorRT-LLM provides an easy-to-use Python API for defining Large Language Models (LLMs) and supports state-of-the-art optimizations for efficient inference on NVIDIA GPUs. Tensor…

    Python 12.3k 1.9k

  2. triton-inference-server/server (Public)

    The Triton Inference Server provides an optimized inference solution for cloud and edge deployments.

    Python 10.1k 1.7k

  3. triton-inference-server/python_backend (Public)

    A Triton backend that enables pre-processing, post-processing, and other logic to be implemented in Python.

    C++ 660 187

  4. learning-to-quantize (Public)

    Code for "Adaptive Gradient Quantization for Data-Parallel SGD", published at NeurIPS 2020.

    Jupyter Notebook 30 5

  5. triton-inference-server/model_analyzer (Public)

    Triton Model Analyzer is a CLI tool that helps users understand the compute and memory requirements of Triton Inference Server models.

    Python 499 80