Conversation

@JoeZijunZhou JoeZijunZhou commented Mar 10, 2024

  • The decode thread sends the completion signal once it has generated all tokens, but possibly before the gRPC channel has consumed every token in the return_channel queue of the ActiveRequest. As a result, the gRPC server can exit response streaming before it has streamed back all generated tokens, leaving some tokens in the return_channel queue of the ActiveRequest.
  • Adding a check that return_channel is empty before exiting resolves the issue.
  • Use generated_token_list directly in the benchmark script; re-tokenizing the joined list produced a slightly different token count.
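
The race described above can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: `ActiveRequestSketch`, `decode`, `stream_buggy`, and `stream_fixed` are hypothetical names, and a plain `queue.Queue` plus a `threading.Event` stand in for the real return_channel and completion signal.

```python
import queue
import threading


class ActiveRequestSketch:
    """Hypothetical stand-in for ActiveRequest (assumed shape, not the real class)."""

    def __init__(self):
        self.return_channel = queue.Queue()  # tokens waiting to be streamed back
        self.complete = threading.Event()    # set by the decode side when done


def decode(request, tokens):
    """Producer: enqueue every token, then signal completion."""
    for token in tokens:
        request.return_channel.put(token)
    request.complete.set()


def stream_buggy(request):
    """Consumer that stops as soon as completion is signalled (may drop tokens)."""
    streamed = []
    while not request.complete.is_set():
        try:
            streamed.append(request.return_channel.get(timeout=0.01))
        except queue.Empty:
            pass
    return streamed


def stream_fixed(request):
    """Consumer that stops only when completion is signalled AND the queue is empty."""
    streamed = []
    while not (request.complete.is_set() and request.return_channel.empty()):
        try:
            streamed.append(request.return_channel.get(timeout=0.01))
        except queue.Empty:
            pass
    return streamed
```

If `decode` finishes before the consumer loop starts, `stream_buggy` returns without reading anything, while `stream_fixed` drains the queue first. The empty check is safe here because the producer enqueues all tokens before setting the completion flag, and the sketch assumes a single consumer per request.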
@JoeZijunZhou JoeZijunZhou requested a review from rwitten March 10, 2024 07:55
@FanhaiLu1 FanhaiLu1 self-requested a review March 11, 2024 02:48
@JoeZijunZhou JoeZijunZhou merged commit 2b9db52 into main Mar 14, 2024
@JoeZijunZhou JoeZijunZhou deleted the zijun/token-drop branch March 14, 2024 21:04
@JoeZijunZhou JoeZijunZhou restored the zijun/token-drop branch March 14, 2024 21:04
2 participants