Reproducing results from paper #4

@auphelia

Hi,

I would like to reproduce the results from your paper "Integer-only Zero-shot Quantization for Efficient Speech Recognition" for int8 (or, if possible, int4) QuartzNet 15x5 on NVIDIA A10 and A100 GPUs, and additionally measure the throughput.

I tried to use the Q-ASR repo for this, but I cannot find the TensorRT export. Is it published somewhere else? If I understand the code in the repo correctly, the execution in inference.py does not make use of the GPU's tensor cores. Am I overlooking something?
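
For the throughput measurements, a timing loop along the following lines is roughly what I have in mind. This is only a sketch under assumptions: `measure_throughput` and `run_batch` are hypothetical names and not part of the Q-ASR repo, and the stand-in workload does no inference. On a GPU, the callable would presumably need to synchronize the device before returning (for example with `torch.cuda.synchronize()`), so that the timer does not stop before the asynchronously launched kernels have finished.

```python
import time
from typing import Callable


def measure_throughput(run_batch: Callable[[], None], batch_size: int,
                       n_warmup: int = 3, n_iters: int = 10) -> float:
    """Return samples per second for a hypothetical `run_batch` callable.

    `run_batch` is assumed to process exactly one batch of `batch_size`
    samples and to return only once that work has actually completed.
    """
    if batch_size <= 0 or n_iters <= 0:
        raise ValueError("batch_size and n_iters must be positive")

    # Warm-up iterations are excluded from timing (lazy init, caches, etc.).
    for _ in range(n_warmup):
        run_batch()

    start = time.perf_counter()
    for _ in range(n_iters):
        run_batch()
    elapsed = time.perf_counter() - start

    return batch_size * n_iters / elapsed


if __name__ == "__main__":
    # Stand-in workload; a real run would execute the model on one batch here.
    def dummy_batch() -> None:
        time.sleep(0.001)

    print(f"{measure_throughput(dummy_batch, batch_size=8):.1f} samples/s")
```

Reporting samples per second together with the batch size used would make the numbers comparable between the A10 and the A100.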

Kind regards

Labels: question (further information is requested)
