@veralauee

Converting a quantization-aware trained (QAT) model from TF to ONNX has several issues:

  1. QuantizeLinear and DequantizeLinear are fused into the conv layer, but the downstream compiler (e.g., TensorRT) needs the Q/DQ layers to decide whether to run the layer in int8. See issue "QDQ node for weight tensor of Con2D undergoes Constant folding (enabled for node using tf type=FakeQuantWithMinMaxVarsPerChannel)" #1972. We need to keep the Q/DQ layers unfused. QuantizeLinear and DequantizeLinear correspond to FakeQuantWithMinMaxVars in TensorFlow, so excluding that op from can_fold in tf_utils.py solves it (see the sketch after this list).
  2. We need to allow narrow_range in quantized nodes. TensorRT maps [min, max] to [-127, 127] (see Page 12), which requires 0 in fp32 to map exactly to 0 in int8. Also see narrow_range=True in TensorRT/tools/tensorflow-quantization here. A small numeric illustration follows the list.
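
For point 1, here is a minimal, self-contained sketch of the kind of guard this suggests for the can_fold check in tf_utils.py. The `Node` structure and `_QUANT_OPS_TO_KEEP` set below are illustrative assumptions, not the actual tf2onnx internals, which operate on TensorFlow NodeDef objects:

```python
# Sketch: exclude FakeQuant ops from constant folding so explicit
# QuantizeLinear/DequantizeLinear nodes survive for TensorRT.
from collections import namedtuple

# Illustrative stand-in for a TF graph node (real code uses NodeDef).
Node = namedtuple("Node", ["name", "type"])

# Ops that must not be folded into Conv weights.
_QUANT_OPS_TO_KEEP = {
    "FakeQuantWithMinMaxVars",
    "FakeQuantWithMinMaxVarsPerChannel",
}

def can_fold(node):
    """Return False for quantization ops so they are kept as Q/DQ layers."""
    if node.type in _QUANT_OPS_TO_KEEP:
        return False
    # ... the remaining (original) foldability checks would go here ...
    return True

if __name__ == "__main__":
    print(can_fold(Node("conv1/weight_quant", "FakeQuantWithMinMaxVarsPerChannel")))  # False
    print(can_fold(Node("conv1/weights", "Const")))  # True
```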
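For point 2, the snippet below is a numeric illustration (not tf2onnx code) of why narrow_range matters: symmetric quantization over [-127, 127] keeps the zero point at exactly 0, so fp32 0.0 maps to int8 0, which is what TensorRT expects. Function names here are made up for the example:

```python
# Symmetric (narrow-range) int8 quantization: [-amax, amax] -> [-127, 127], zero point 0.
import numpy as np

def quantize_narrow_range(x, amax):
    """Quantize fp32 values to int8 with a symmetric range and zero point 0."""
    scale = amax / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
    q, scale = quantize_narrow_range(x, amax=1.0)
    print(q)                     # [-127  -64    0   64  127] -> fp32 0.0 maps to int8 0
    print(dequantize(q, scale))  # values recovered up to quantization error
```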