Fix output token drop issue #9

JoeZijunZhou · 2024-03-10T07:55:09Z

The decode thread sends the completion signal once it has generated all tokens, but this can happen before the gRPC channel has consumed every token in the ActiveRequest's return_channel queue. As a result, the gRPC server can exit response streaming before it has streamed back all generated tokens, leaving some of them in the return_channel queue.
Adding a check that the return_channel is empty before exiting resolves the issue.
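A minimal sketch of the race and the fix. It assumes a simplified server in which `ActiveRequest` holds a `queue.Queue` as `return_channel` and a `threading.Event` as the completion signal. The class layout, `decode`, and `stream_response` are hypothetical stand-ins, not the project's actual code. The point is the exit condition: streaming stops only when generation is complete *and* the queue has been drained.

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class ActiveRequest:
    """Hypothetical stand-in for the server's ActiveRequest."""
    # Tokens produced by the decode thread, consumed by the streaming loop.
    return_channel: queue.Queue = field(default_factory=queue.Queue)
    # Set by the decode thread once generation is finished.
    complete: threading.Event = field(default_factory=threading.Event)


def decode(request: ActiveRequest, tokens: list) -> None:
    """Decode thread: enqueue every token, then signal completion."""
    for tok in tokens:
        request.return_channel.put(tok)
    request.complete.set()


def stream_response(request: ActiveRequest):
    """Streaming loop: exit only when generation is complete AND
    the return_channel has been fully drained."""
    while True:
        try:
            yield request.return_channel.get(timeout=0.01)
        except queue.Empty:
            # Exiting on `complete` alone could drop tokens still queued;
            # also requiring an empty queue guarantees everything was sent.
            if request.complete.is_set() and request.return_channel.empty():
                return


req = ActiveRequest()
worker = threading.Thread(target=decode, args=(req, list(range(100))))
worker.start()
out = list(stream_response(req))
worker.join()
```

Because the decode thread enqueues every token before setting the event, seeing the event set together with an empty queue means nothing is left to stream.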
Use generated_token_list directly in the benchmark script; re-tokenizing the joined output caused a minor difference in the token count.
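A toy illustration of why re-tokenizing joined output can change the count. It assumes a hypothetical three-entry vocabulary with greedy longest-match tokenization (real tokenizers differ). A model can emit a token sequence that is not the canonical tokenization of its own decoded text, so counting `generated_token_list` directly counts exactly the tokens that were generated.

```python
# Hypothetical toy vocabulary; "ab" is a merged piece.
VOCAB = {"a": 0, "b": 1, "ab": 2}
ID_TO_PIECE = {i: p for p, i in VOCAB.items()}


def tokenize(text: str) -> list:
    """Greedy longest-match tokenization against VOCAB."""
    ids = []
    i = 0
    while i < len(text):
        for length in (2, 1):  # try the longer piece first
            piece = text[i:i + length]
            if len(piece) == length and piece in VOCAB:
                ids.append(VOCAB[piece])
                i += length
                break
        else:
            raise ValueError(f"unknown character {text[i]!r}")
    return ids


# Tokens as emitted by the model: "a", "b", "a", "b".
generated_token_list = [0, 1, 0, 1]
text = "".join(ID_TO_PIECE[t] for t in generated_token_list)  # "abab"
retokenized = tokenize(text)  # [2, 2]

print(len(generated_token_list), len(retokenized))  # 4 2
```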

Fix output token drop issue

57b6414

JoeZijunZhou requested a review from rwitten March 10, 2024 07:55

Add comments and format

41fcaa6

FanhaiLu1 self-requested a review March 11, 2024 02:48

Fix benchmarks

d1f09c7

rwitten approved these changes Mar 13, 2024

JoeZijunZhou merged commit 2b9db52 into main Mar 14, 2024

JoeZijunZhou deleted the zijun/token-drop branch March 14, 2024 21:04

JoeZijunZhou restored the zijun/token-drop branch March 14, 2024 21:04
