Pull requests: huggingface/text-generation-inference

Pull requests list

Set uv UV_PYTHON_INSTALL_DIR explicitly
#3197 opened Apr 27, 2025 by sebastianliebscher updated Nov 27, 2025
1 of 5 tasks
feat: prefer latest outlines core and compile grammar in router
#3340 opened Nov 14, 2025 by drbh Draft updated Nov 19, 2025
fix: bump flake and update grammar logit processor
#3338 opened Nov 6, 2025 by drbh Draft updated Nov 6, 2025
Trtllm backend improvements
#3231 opened May 17, 2025 by leejuyuu updated Oct 12, 2025
1 of 5 tasks
feat: expose GPU energy consumption (mJ) in responses
#3315 opened Aug 28, 2025 by JulienDelavande updated Aug 28, 2025
2 of 5 tasks
feat: support logit bias in chat request
#3186 opened Apr 22, 2025 by drbh updated Aug 14, 2025
**Add dedicated CPU-only Dockerfile and update documentation for CPU/…
#3310 opened Aug 7, 2025 by jakubgajski updated Aug 7, 2025
2 of 5 tasks
support qwen3 on nvidia
#3302 opened Jul 23, 2025 by icyxp updated Jul 29, 2025
Update links Inferentia refer docs
#3154 opened Apr 9, 2025 by guspan-tanadi updated Jul 21, 2025
1 of 5 tasks
Retrieve the correct cached model batch size in Neuron config checker for Neuron Backend
#3300 opened Jul 19, 2025 by jimburtoft updated Jul 19, 2025
3 tasks
Update quantization kernels
#3288 opened Jul 7, 2025 by danieldk Draft updated Jul 18, 2025
5 tasks
Attempt to fix CI errors
#3292 opened Jul 8, 2025 by danieldk updated Jul 8, 2025
5 tasks
feat: allow json_schema in response format and add test
#3276 opened Jun 25, 2025 by drbh updated Jul 7, 2025
fix: enable defs references in tool calls
#3291 opened Jul 7, 2025 by drbh updated Jul 7, 2025
wip: comment out prepend full_text
#3079 opened Mar 7, 2025 by jrc2139 Draft updated Jun 18, 2025
1 of 5 tasks
Disable mamba in CPU platform
#3266 opened Jun 13, 2025 by casassg updated Jun 13, 2025
3 of 5 tasks
feat: improve llava next pooling for granite vision
#3255 opened Jun 4, 2025 by drbh updated Jun 6, 2025
display available cached versions in TGI server error message of Neuron backend
#3063 opened Feb 26, 2025 by jimburtoft updated May 21, 2025
4 tasks
Fix typos
#3210 opened May 6, 2025 by omahs updated May 6, 2025
1 of 5 tasks
Kvrouter that will increase the kv-cache hits in case of multiple routing strategy
#2965 opened Jan 29, 2025 by Narsil updated Apr 30, 2025
5 tasks
feat: lock updated kernel versions
#3201 opened Apr 29, 2025 by drbh updated Apr 29, 2025
README: minimum Python version is 3.10
#3194 opened Apr 25, 2025 by Frenzie updated Apr 25, 2025
1 of 5 tasks
Fix flashinfer plan call to use positional arguments for #3165
#3166 opened Apr 11, 2025 by ruckc updated Apr 17, 2025
2 of 5 tasks
Add chunked attn for L4
#3162 opened Apr 10, 2025 by mht-sharma Draft updated Apr 11, 2025
2 of 7 tasks