Conversation

@JoeZijunZhou
Contributor

  • The number of RPC handler worker threads should be at least equal to the decoding batch size, so that the decoding queue can be fully saturated.
  • Default the thread count to the total number of concurrent decodes allowed, to make sure we can fully saturate the model.
  • Set the default minimum thread count to 64.
  • Add error handling for when the queue is out of capacity.
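
For reviewers, the sizing rule and the capacity check described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code from this PR: the names `default_worker_threads`, `QueueAtCapacityError`, `enqueue_request`, and `make_handler_pool` are hypothetical, and the bounded `queue.Queue` is an assumption standing in for the real decoding queue.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

# Default minimum stated in the PR description.
MIN_WORKER_THREADS = 64


def default_worker_threads(concurrent_decodes: int,
                           minimum: int = MIN_WORKER_THREADS) -> int:
    """Return a handler thread count that can keep every decode slot busy.

    Hypothetical helper: uses the number of concurrent decodes allowed,
    but never goes below the default minimum.
    """
    return max(minimum, concurrent_decodes)


class QueueAtCapacityError(RuntimeError):
    """Hypothetical error raised when the request queue is full."""


def enqueue_request(request_queue: queue.Queue, request: object) -> None:
    """Enqueue without blocking; raise a clear error if the queue is full."""
    try:
        request_queue.put_nowait(request)
    except queue.Full as exc:
        raise QueueAtCapacityError(
            "request queue is out of capacity; retry later"
        ) from exc


def make_handler_pool(concurrent_decodes: int) -> ThreadPoolExecutor:
    """Build the worker pool that would back the RPC handlers (assumed setup)."""
    return ThreadPoolExecutor(
        max_workers=default_worker_threads(concurrent_decodes)
    )
```

With this sketch, a server allowing 256 concurrent decodes would get 256 handler threads, while one allowing only 8 would still get the 64-thread minimum. Failing fast on a full queue, rather than blocking a handler thread, is one way to keep the handlers available for requests that can still be served.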
@JoeZijunZhou JoeZijunZhou merged commit ccdb782 into main Mar 5, 2024
@JoeZijunZhou JoeZijunZhou deleted the zijun/optimize-thread branch March 5, 2024 10:08