Thanks for your reply. I understand that the GPU is underutilized. The question is, can I increase the throughput in any way? Right now there seems to be a lower bound on processing time for Gemma 3 on an A100 GPU, and it's quite high.