Reproducing results from paper #4

Hi,

I would like to reproduce the results from your paper "Integer-only Zero-shot Quantization for Efficient Speech Recognition" for int8 (or even int4, if possible) QuartzNet 15x5 on NVIDIA A10 and A100 GPUs, and additionally measure the throughput.

I tried to use the Q-ASR repo for this, but I cannot find the TensorRT export. Is it published somewhere else? If I understand the code in the repo correctly, the execution in inference.py does not make use of the GPU's tensor cores. Am I overlooking something here?

Kind regards
