

CUDA SETUP ERROR: Missing dependency: libnvJitLink.so.13 - Google Colab

#1905 · tanvircr7 opened on Mar 25, 2026

Params4bit.__getattr__ breaks torch.compile - use @property instead

#1904 · kbabiuchx opened on Mar 23, 2026

Question: intentional FP16-only path for int8_vectorwise_quant / LLM.int8 activation quant? (BF16 support + removing casts)

#1868 · sanghyunna opened on Feb 16, 2026

Default LLM.int8() mixed-precision decomposition causes 17-147% energy overhead across consumer and datacenter GPUs

#1867 · hongping-zh opened on Feb 15, 2026

gemv_4bit silently produces wrong results when weight is quantized in (in_features, out_features) layout

#1862 · TimDettmers opened on Feb 14, 2026

[Feature Gap] CUDA compared to other backends like XPU/CPU

#1852 · jiqing-feng opened on Jan 30, 2026

[Performance/Energy] 4-bit NF4 shows significant energy efficiency penalty on Blackwell (RTX 5090) for small models

#1851 · hongping-zh opened on Jan 29, 2026

Failed to quant MoE models with fused expert weights in transformers v5

#1849 · ITcarrot opened on Jan 25, 2026 · Label: Hugging Face Integration

70B 4-bit LLM decode bottlenecked by HIP kernel (`kgemm_4bit_inference_naive`) efficiency - 49% vs 91% memory bandwidth on ROCm/gfx1151

#1842 · BellaDoggie opened on Jan 19, 2026

Support quantizing tensors when numel() > INT_MAX

#1785 · matthewdouglas opened on Oct 22, 2025

Reduce CUDA build matrix

#1778 · matthewdouglas opened on Oct 3, 2025

Can't get llm_int8_skip_modules to work: 'Parameter' object has no attribute 'SCB'

#1634 · redbrain opened on May 12, 2025