Fix split_predict generating invalid ONNX models with missing elem_type #244

Draft
Copilot wants to merge 8 commits into main from copilot/fix-split-predict-bug

Conversation


Copilot AI commented Dec 31, 2025

The split_predict pass generates invalid ONNX models when intermediate values lack explicit value_info. Graph inputs end up with UNDEFINED elem_type (value 0), causing validation failures in onnx.checker and ONNX Runtime load errors.
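
For reference, the broken state is easy to spot by inspecting the optimized model directly; a minimal sketch (the model path is illustrative):

    import onnx
    from onnx import TensorProto

    model = onnx.load("model.opt.onnx")  # illustrative path to an affected model
    for inp in model.graph.input:
        # TensorProto.UNDEFINED == 0: this is the "Invalid tensor data type 0"
        # that ONNX Runtime reports at load time.
        if inp.type.tensor_type.elem_type == TensorProto.UNDEFINED:
            print(f"graph input {inp.name!r} has UNDEFINED elem_type")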

Changes

Core fix in onnxoptimizer/passes/split.h:

  • Added inferElemType() helper that infers type from producing node's inputs when elem_type is UNDEFINED
  • Modified split_predict to use type inference instead of blindly copying UNDEFINED elem_type via copyMetadata()
  • Modified split_init to ensure output values have valid elem_type before registration

Type inference heuristic:

    // For operators where output type matches input type (Add, Sub, Mul, etc.),
    // check the producing node's inputs for a known elem_type.
    for (const Value* input : producer->inputs()) {
      if (input->elemType() != TensorProto_DataType_UNDEFINED) {
        return input->elemType();
      }
    }

Tests:

  • Added Python test: test_split_predict_preserves_elem_type()
  • Added C++ test: SplitPredictPreservesElemType

Both tests verify optimized models pass ONNX validation and have valid elem_type for all inputs.
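
A minimal sketch of the kind of check both tests perform; the toy model below is illustrative, not the actual test fixture (whether split_predict rewrites this particular graph is beside the point, the assertions must hold either way):

    import onnx
    import onnxoptimizer
    from onnx import TensorProto, helper

    def test_split_predict_preserves_elem_type():
        # Tiny float32 model: (A + B) * B, with C as an untyped intermediate.
        a = helper.make_tensor_value_info("A", TensorProto.FLOAT, [2])
        b = helper.make_tensor_value_info("B", TensorProto.FLOAT, [2])
        y = helper.make_tensor_value_info("Y", TensorProto.FLOAT, [2])
        graph = helper.make_graph(
            [helper.make_node("Add", ["A", "B"], ["C"]),
             helper.make_node("Mul", ["C", "B"], ["Y"])],
            "split_predict_test", [a, b], [y])
        model = helper.make_model(graph)
        optimized = onnxoptimizer.optimize(model, ["split_predict"])
        # Every graph input must carry a concrete elem_type, and the model
        # must still satisfy the ONNX checker.
        for inp in optimized.graph.input:
            assert inp.type.tensor_type.elem_type != TensorProto.UNDEFINED
        onnx.checker.check_model(optimized)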

Limitations

The type inference heuristic works for common operators whose output type matches their input type. Operators that change the type (e.g. Shape, which always produces int64, or Cast) still require explicit value_info in the original model.
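
As a workaround for such operators, explicit value_info can be added to the original model before optimization; a hedged sketch (the tensor name and model paths are assumptions):

    import onnx
    from onnx import TensorProto, helper

    model = onnx.load("model.onnx")  # illustrative path
    # Declare the intermediate produced by a Shape node as int64 so the pass
    # has a concrete type to copy. "shape_out" is a hypothetical tensor name;
    # passing None for the shape leaves the dimensions unspecified.
    vi = helper.make_tensor_value_info("shape_out", TensorProto.INT64, None)
    model.graph.value_info.append(vi)
    onnx.save(model, "model_with_value_info.onnx")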

Original prompt

This section details the original issue you should resolve

<issue_title>[BUG] The pass "split_predict" generates an invalid optimized ONNX model</issue_title>
<issue_description>split_predict pass generates an invalid ONNX model (missing tensor elem_type) — fails onnx.checker and cannot be loaded by ONNX Runtime

Describe the bug
Hi ONNX Optimizer maintainers, thanks for the project!

When optimizing a valid ONNX model named model.onnx using only the split_predict pass, the resulting model.opt.onnx becomes invalid ONNX. It fails onnx.checker.check_model() with:

onnx.onnx_cpp2py_export.checker.ValidationError: Field 'elem_type' of 'type' is required but missing.

and ONNX Runtime fails to load it:

onnxruntime.capi.onnxruntime_pybind11_state.Fail: ... failed: Invalid tensor data type 0.

This suggests the pass may be producing incomplete/incorrect type information (elem_type) for some tensor(s) in the optimized graph.

Environment

  • OS: Ubuntu 20.04 LTS
  • Python: 3.9.6
  • onnx: 1.19.0
  • onnxruntime: 1.19.2
  • onnxoptimizer: 0.3.13

To Reproduce

  1. Download and unzip the attached archive, then cd into the extracted directory

split_predict_repro.tar.gz

tar -xzvf split_predict_repro.tar.gz
cd split_predict_repro

  2. Create a Python environment (Python 3.9.6) and install dependencies:

python3.9 -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -r requirements.txt
  3. Run the split_predict pass on the case model and save model.opt.onnx:

python optimize_model.py --case ./case_00051_seed20654888

This script loads model.onnx and runs optimizer.optimize(model, ["split_predict"]), then saves model.opt.onnx.
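
A minimal sketch of what optimize_model.py presumably does (the --case argument handling is elided):

    import onnx
    import onnxoptimizer

    case = "./case_00051_seed20654888"
    model = onnx.load(f"{case}/model.onnx")
    optimized = onnxoptimizer.optimize(model, ["split_predict"])
    onnx.save(optimized, f"{case}/model.opt.onnx")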

  4. Validate the optimized model with ONNX checker:

python ./check_model.py

check_model.py loads ./case_00051_seed20654888/model.opt.onnx and runs checker.check_model(m).
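
A sketch of the checker script as described (path as given in the repro):

    import onnx
    from onnx import checker

    m = onnx.load("./case_00051_seed20654888/model.opt.onnx")
    checker.check_model(m)  # raises ValidationError: missing elem_type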

  5. Try to run the optimized model with ONNX Runtime using stored oracle inputs:

python ./run_optimized_model.py

This script creates an InferenceSession from ./case_00051_seed20654888/model.opt.onnx and feeds inputs from oracle.pkl.
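
And a sketch of the runner, assuming oracle.pkl stores a dict mapping input names to numpy arrays (the pickle format is an assumption):

    import pickle
    import onnxruntime as ort

    with open("./case_00051_seed20654888/oracle.pkl", "rb") as f:
        inputs = pickle.load(f)  # assumed format: {input_name: ndarray}

    # On the broken model this fails at load time with "Invalid tensor data type 0".
    sess = ort.InferenceSession("./case_00051_seed20654888/model.opt.onnx",
                                providers=["CPUExecutionProvider"])
    outputs = sess.run(None, inputs)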

Expected behavior

  • model.opt.onnx should remain a valid ONNX model after split_predict optimization.
  • onnx.checker.check_model() should pass.
  • ONNX Runtime should be able to create an InferenceSession for model.opt.onnx.

Actual behavior

ONNX checker fails:

onnx.onnx_cpp2py_export.checker.ValidationError: Field 'elem_type' of 'type' is required but missing.

ONNX Runtime fails to load:

onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model ... failed: Invalid tensor data type 0.

Attachments

I’m attaching:

  • requirements.txt (dependency versions)
  • optimize_model.py (runs split_predict and saves model.opt.onnx)
  • check_model.py (runs ONNX checker on model.opt.onnx)
  • run_optimized_model.py (loads model.opt.onnx via ORT and runs with oracle.pkl)
  • the minimal failing folder ./case_00051_seed20654888 containing model.onnx, oracle.pkl, and the generated model.opt.onnx.

Thanks for your help!</issue_description>

Comments on the Issue (you are @copilot in this section)

@andife I wonder, could you try the newest release, v0.3.19 (Mar 3, 2024), or even the current GitHub version?


Copilot AI and others added 7 commits December 31, 2025 21:09
  • Added an inferElemType() helper to infer missing elem_type from inputs; modified split_predict to infer elem_type when creating new graph inputs/outputs. This fixes the bug where split_predict generates invalid ONNX models with missing elem_type.
  • Two follow-up commits carrying only co-author metadata.
  • Added a null check for the producer node; improved the documentation to explain limitations and supported operators; clarified that this is a heuristic for common cases.
  • Clarified what happens when elem_type inference fails; used the TensorProto.UNDEFINED constant in the Python test instead of a magic number; added a comment explaining the UNDEFINED value in the C++ test.
  • Used the named constant throughout split.h instead of the magic number 0; updated the C++ test to use ONNX_NAMESPACE::TensorProto_DataType_UNDEFINED, improving readability and maintainability.
  • Added a null pointer check for the Value parameter to prevent segfaults; clarified the comment in the Python test about TensorProto.UNDEFINED.

All commits: Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix invalid optimized ONNX model generation by split_predict pass" to "Fix split_predict generating invalid ONNX models with missing elem_type" on Dec 31, 2025
Copilot AI requested a review from justinchuby December 31, 2025 21:22
