fix(router): use max_completion_tokens for OpenAI GPT-5+ validation#575

Merged
johntmyers merged 2 commits into NVIDIA:main from cluster2600:fix/517-max-completion-tokens
Mar 25, 2026
Conversation

@cluster2600
Contributor

Summary

Resolves #517: openshell inference set fails for OpenAI GPT-5 models because the validation probe sends the deprecated max_tokens parameter, which GPT-5+ rejects with HTTP 400.

  • Send max_completion_tokens as the primary parameter in the OpenAI chat completions validation probe
  • Automatically fall back to max_tokens when the backend returns HTTP 400 (for legacy or self-hosted backends)
  • Extract try_validation_request() helper to avoid duplicating the request/response classification logic
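The probe-with-fallback flow described above can be sketched roughly as follows. This is a simplified illustration, not the actual crates/openshell-router code: the names `ProbeOutcome`, `send_probe`, and the boolean flags are assumptions, and the HTTP layer is stubbed out so the control flow stands alone.

```rust
// Hypothetical sketch of the validation fallback (names are assumptions,
// not the real openshell-router API).

#[derive(Debug, PartialEq)]
enum ProbeOutcome {
    Validated,
    Failure(u16),
}

// Stubbed HTTP call: returns the status code a backend would give for a
// probe body using the named token-limit parameter.
fn send_probe(backend_supports_new_param: bool, param: &str) -> u16 {
    match (backend_supports_new_param, param) {
        (true, "max_completion_tokens") => 200,
        (false, "max_completion_tokens") => 400, // legacy backend rejects it
        (_, "max_tokens") => 200,
        _ => 400,
    }
}

// Try the modern parameter first; on HTTP 400, retry once with the legacy
// parameter, but only when a fallback body is configured for the protocol.
fn try_validation_request(backend_supports_new_param: bool, has_fallback: bool) -> ProbeOutcome {
    match send_probe(backend_supports_new_param, "max_completion_tokens") {
        200 => ProbeOutcome::Validated,
        400 if has_fallback => match send_probe(backend_supports_new_param, "max_tokens") {
            200 => ProbeOutcome::Validated,
            status => ProbeOutcome::Failure(status),
        },
        status => ProbeOutcome::Failure(status),
    }
}

fn main() {
    // GPT-5+ backend: primary probe succeeds.
    println!("{:?}", try_validation_request(true, true));
    // Legacy backend: 400 on primary, fallback succeeds.
    println!("{:?}", try_validation_request(false, true));
    // Protocol with no fallback body: 400 is terminal.
    println!("{:?}", try_validation_request(false, false));
}
```

The key design point is that the fallback is opt-in per protocol: only probes that define a fallback body retry on 400, so a genuine 400 from a non-chat protocol still surfaces as a failure.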

Root Cause

OpenAI introduced max_completion_tokens as a replacement for max_tokens starting with the o1 series. GPT-5 and later models reject max_tokens entirely, returning HTTP 400. The validation probe only sent max_tokens, so inference setup would fail for any GPT-5+ model even though the endpoint was perfectly healthy.
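Concretely, the only difference between the two probe bodies is the name of the token-limit field. A minimal sketch, assuming a trivial "ping" message and a limit of 32 (the field values are illustrative; the real probe body lives in crates/openshell-router):

```rust
// Illustrative only: how the two probe bodies differ.

fn probe_body(param: &str) -> String {
    format!(
        r#"{{"messages":[{{"role":"user","content":"ping"}}],"{}":32}}"#,
        param
    )
}

fn main() {
    // Sent first; the form GPT-5+ (and the o1 series) accepts.
    println!("{}", probe_body("max_completion_tokens"));
    // Sent only after an HTTP 400, for legacy/self-hosted backends.
    println!("{}", probe_body("max_tokens"));
}
```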

```mermaid
graph TD
  subgraph "Before (broken)"
    A["validation_probe()"] -->|"max_tokens: 32"| B[OpenAI API]
    B -->|"HTTP 400: unsupported parameter"| C["ValidationFailure ❌"]
  end
  subgraph "After (fixed)"
    D["validation_probe()"] -->|"max_completion_tokens: 32"| E[OpenAI API]
    E -->|"HTTP 200"| F["ValidatedEndpoint ✅"]
    E -->|"HTTP 400"| G{fallback_body?}
    G -->|"yes"| H["retry with max_tokens: 32"]
    H -->|"HTTP 200"| I["ValidatedEndpoint ✅"]
    G -->|"no"| J["ValidationFailure ❌"]
  end
```

Changes

| File | Change |
| --- | --- |
| crates/openshell-router/src/backend.rs | Add fallback_body field to ValidationProbe; update openai_chat_completions probe to use max_completion_tokens with a max_tokens fallback; extract try_validation_request() helper; add 3 new tests |
| crates/openshell-server/src/inference.rs | Update existing test expectation from max_tokens to max_completion_tokens |

Test Plan

  • cargo test -p openshell-router — 11 passed, 0 failed
  • New test: verify_openai_chat_uses_max_completion_tokens — primary probe succeeds with max_completion_tokens
  • New test: verify_openai_chat_falls_back_to_max_tokens — HTTP 400 on primary triggers retry with max_tokens
  • New test: verify_non_chat_completions_no_fallback — non-chat protocols (e.g. anthropic_messages) do not retry on 400
```mermaid
sequenceDiagram
  participant CLI as openshell inference set
  participant Router as Privacy Router
  participant Backend as OpenAI API
  CLI->>Router: verify_backend_endpoint()
  Router->>Backend: POST /v1/chat/completions<br/>{"max_completion_tokens": 32}
  alt GPT-5+ model
    Backend->>Router: HTTP 200
    Router->>CLI: ValidatedEndpoint ✅
  else Legacy backend
    Backend->>Router: HTTP 400 (unknown param)
    Router->>Backend: POST /v1/chat/completions<br/>{"max_tokens": 32}
    Backend->>Router: HTTP 200
    Router->>CLI: ValidatedEndpoint ✅
  end
```
@cluster2600 cluster2600 requested a review from a team as a code owner March 24, 2026 20:57
github-actions bot commented Mar 24, 2026

All contributors have signed the DCO ✍️ ✅
Posted by the DCO Assistant Lite bot.

@github-actions

Thank you for your interest in contributing to OpenShell, @cluster2600.

This project uses a vouch system for first-time contributors. Before submitting a pull request, you need to be vouched by a maintainer.

To get vouched:

  1. Open a Vouch Request discussion.
  2. Describe what you want to change and why.
  3. Write in your own words — do not have an AI generate the request.
  4. A maintainer will comment /vouch if approved.
  5. Once vouched, open a new PR (preferred) or reopen this one after a few minutes.

See CONTRIBUTING.md for details.

@github-actions github-actions bot closed this Mar 24, 2026
@pimlock pimlock reopened this Mar 25, 2026
…robe

OpenAI GPT-5 models reject the legacy max_tokens parameter and require max_completion_tokens. The inference validation probe now sends max_completion_tokens as the primary parameter, with an automatic fallback to max_tokens when the backend returns HTTP 400 (for legacy/self-hosted backends that only support the older parameter).

Closes NVIDIA#517

Signed-off-by: Maxime Grenu <maxime.grenu@gmail.com>
@cluster2600 cluster2600 force-pushed the fix/517-max-completion-tokens branch from 3c89e9b to 44217f7 Compare March 25, 2026 18:19
@johntmyers johntmyers merged commit 0e5ebb6 into NVIDIA:main Mar 25, 2026
9 checks passed

3 participants