fix(router): use max_completion_tokens for OpenAI GPT-5+ validation#575

Merged
johntmyers merged 2 commits into NVIDIA:main from cluster2600:fix/517-max-completion-tokens
Mar 25, 2026
Conversation

@cluster2600
Contributor

Summary

Resolves #517: openshell inference set fails for OpenAI GPT-5 models because the validation probe sends the deprecated max_tokens parameter, which GPT-5+ rejects with HTTP 400.

  • Send max_completion_tokens as the primary parameter in the OpenAI chat completions validation probe
  • Automatically fall back to max_tokens when the backend returns HTTP 400 (for legacy or self-hosted backends)
  • Extract try_validation_request() helper to avoid duplicating the request/response classification logic
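The probe-with-fallback flow described above can be sketched roughly as follows. This is a simplified illustration, not the actual crates/openshell-router code: the names `ProbeOutcome`, `send_probe`, and the boolean flags are assumptions, and the HTTP layer is stubbed out so the control flow stands alone.

```rust
// Hypothetical sketch of the validation fallback (names are assumptions,
// not the real openshell-router API).

#[derive(Debug, PartialEq)]
enum ProbeOutcome {
    Validated,
    Failure(u16),
}

// Stubbed HTTP call: returns the status code a backend would give for a
// probe body using the named token-limit parameter.
fn send_probe(backend_supports_new_param: bool, param: &str) -> u16 {
    match (backend_supports_new_param, param) {
        (true, "max_completion_tokens") => 200,
        (false, "max_completion_tokens") => 400, // legacy backend rejects it
        (_, "max_tokens") => 200,
        _ => 400,
    }
}

// Try the modern parameter first; on HTTP 400, retry once with the legacy
// parameter, but only when a fallback body is configured for the protocol.
fn try_validation_request(backend_supports_new_param: bool, has_fallback: bool) -> ProbeOutcome {
    match send_probe(backend_supports_new_param, "max_completion_tokens") {
        200 => ProbeOutcome::Validated,
        400 if has_fallback => match send_probe(backend_supports_new_param, "max_tokens") {
            200 => ProbeOutcome::Validated,
            status => ProbeOutcome::Failure(status),
        },
        status => ProbeOutcome::Failure(status),
    }
}

fn main() {
    // GPT-5+ backend: primary probe succeeds.
    println!("{:?}", try_validation_request(true, true));
    // Legacy backend: 400 on primary, fallback succeeds.
    println!("{:?}", try_validation_request(false, true));
    // Protocol with no fallback body: 400 is terminal.
    println!("{:?}", try_validation_request(false, false));
}
```

The key design point is that the fallback is opt-in per protocol: only probes that define a fallback body retry on 400, so a genuine 400 from a non-chat protocol still surfaces as a failure.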

Root Cause

OpenAI introduced max_completion_tokens as a replacement for max_tokens starting with the o1 series. GPT-5 and later models reject max_tokens entirely, returning HTTP 400. The validation probe only sent max_tokens, so inference setup would fail for any GPT-5+ model even though the endpoint was perfectly healthy.
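Concretely, the only difference between the two probe bodies is the name of the token-limit field. A minimal sketch, assuming a trivial "ping" message and a limit of 32 (the field values are illustrative; the real probe body lives in crates/openshell-router):

```rust
// Illustrative only: how the two probe bodies differ.

fn probe_body(param: &str) -> String {
    format!(
        r#"{{"messages":[{{"role":"user","content":"ping"}}],"{}":32}}"#,
        param
    )
}

fn main() {
    // Sent first; the form GPT-5+ (and the o1 series) accepts.
    println!("{}", probe_body("max_completion_tokens"));
    // Sent only after an HTTP 400, for legacy/self-hosted backends.
    println!("{}", probe_body("max_tokens"));
}
```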

```mermaid
graph TD
  subgraph "Before (broken)"
    A["validation_probe()"] -->|"max_tokens: 32"| B[OpenAI API]
    B -->|"HTTP 400: unsupported parameter"| C["ValidationFailure ❌"]
  end
  subgraph "After (fixed)"
    D["validation_probe()"] -->|"max_completion_tokens: 32"| E[OpenAI API]
    E -->|"HTTP 200"| F["ValidatedEndpoint ✅"]
    E -->|"HTTP 400"| G{fallback_body?}
    G -->|"yes"| H["retry with max_tokens: 32"]
    H -->|"HTTP 200"| I["ValidatedEndpoint ✅"]
    G -->|"no"| J["ValidationFailure ❌"]
  end
```

Changes

| File | Change |
| --- | --- |
| crates/openshell-router/src/backend.rs | Add fallback_body field to ValidationProbe; update openai_chat_completions probe to use max_completion_tokens with a max_tokens fallback; extract try_validation_request() helper; add 3 new tests |
| crates/openshell-server/src/inference.rs | Update existing test expectation from max_tokens to max_completion_tokens |

Test Plan

  • cargo test -p openshell-router — 11 passed, 0 failed
  • New test: verify_openai_chat_uses_max_completion_tokens — primary probe succeeds with max_completion_tokens
  • New test: verify_openai_chat_falls_back_to_max_tokens — HTTP 400 on primary triggers retry with max_tokens
  • New test: verify_non_chat_completions_no_fallback — non-chat protocols (e.g. anthropic_messages) do not retry on 400
```mermaid
sequenceDiagram
  participant CLI as openshell inference set
  participant Router as Privacy Router
  participant Backend as OpenAI API
  CLI->>Router: verify_backend_endpoint()
  Router->>Backend: POST /v1/chat/completions<br/>{"max_completion_tokens": 32}
  alt GPT-5+ model
    Backend->>Router: HTTP 200
    Router->>CLI: ValidatedEndpoint ✅
  else Legacy backend
    Backend->>Router: HTTP 400 (unknown param)
    Router->>Backend: POST /v1/chat/completions<br/>{"max_tokens": 32}
    Backend->>Router: HTTP 200
    Router->>CLI: ValidatedEndpoint ✅
  end
```
@cluster2600 cluster2600 requested a review from a team as a code owner March 24, 2026 20:57
github-actions bot commented Mar 24, 2026

All contributors have signed the DCO ✍️ ✅
Posted by the DCO Assistant Lite bot.

@github-actions

Thank you for your interest in contributing to OpenShell, @cluster2600.

This project uses a vouch system for first-time contributors. Before submitting a pull request, you need to be vouched by a maintainer.

To get vouched:

  1. Open a Vouch Request discussion.
  2. Describe what you want to change and why.
  3. Write in your own words — do not have an AI generate the request.
  4. A maintainer will comment /vouch if approved.
  5. Once vouched, open a new PR (preferred) or reopen this one after a few minutes.

See CONTRIBUTING.md for details.

@github-actions github-actions bot closed this Mar 24, 2026
@pimlock pimlock reopened this Mar 25, 2026
…robe

OpenAI GPT-5 models reject the legacy max_tokens parameter and require max_completion_tokens. The inference validation probe now sends max_completion_tokens as the primary parameter, with an automatic fallback to max_tokens when the backend returns HTTP 400 (for legacy/self-hosted backends that only support the older parameter).

Closes NVIDIA#517

Signed-off-by: Maxime Grenu <maxime.grenu@gmail.com>
@cluster2600 cluster2600 force-pushed the fix/517-max-completion-tokens branch from 3c89e9b to 44217f7 Compare March 25, 2026 18:19
@johntmyers johntmyers merged commit 0e5ebb6 into NVIDIA:main Mar 25, 2026
9 checks passed

3 participants