Why don't you calculate the real output tokens for throughput? #493
-
vllm/benchmarks/benchmark_throughput.py Lines 177 to 180 in c894836 — you can see that you just use the token lengths of the requests in the dataset, not the real output length that the model actually generates during evaluation.
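For illustration, here is a minimal sketch of what counting the real output tokens could look like, assuming the vLLM offline `LLM.generate` API where each `RequestOutput` carries `CompletionOutput.token_ids` (the model name and prompts below are placeholders, not from the benchmark script):

```python
# Sketch: compute throughput from the tokens the model actually generated,
# instead of the requested output lengths taken from the dataset.
import time

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # small model, for illustration only
prompts = ["Hello, my name is", "The capital of France is"]
params = SamplingParams(max_tokens=128)

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

# token_ids holds the tokens really produced; a sequence may stop early,
# so this can be shorter than the requested max_tokens.
real_output_tokens = sum(
    len(completion.token_ids)
    for request_output in outputs
    for completion in request_output.outputs
)
print(f"{real_output_tokens / elapsed:.2f} output tokens/s")
```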
-
The returned text also contains the input prompt, so I don't think counting the output tokens alone will solve your problem. vllm/vllm/entrypoints/api_server.py Lines 64 to 67 in c6dfc3c
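As a sketch of what would be needed, assuming the demo API server response format referenced above (each returned string is the prompt concatenated with the completion), you would have to strip the echoed prompt before counting; the tokenizer name here is a placeholder:

```python
# Sketch: strip the echoed prompt so only newly generated tokens are counted.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")  # assumed model

def count_generated_tokens(prompt: str, returned_text: str) -> int:
    # The demo server returns prompt + completion in one string;
    # drop the prompt prefix and tokenize only the generated part.
    generated = returned_text[len(prompt):]
    return len(tokenizer.encode(generated, add_special_tokens=False))
```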
-
I believe the main reason is that prompt tokens also take compute. There are different ways to measure throughput for LLMs, but I believe the trends between systems should be similar under different metrics. Moving this issue to discussions for future questions.
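To make the two metrics concrete, here is a small sketch (function name and numbers are illustrative) contrasting total-token throughput, which credits the prompt tokens since they also take compute, with output-only throughput:

```python
# Sketch: the two common throughput metrics for LLM serving.
def throughput_metrics(prompt_tokens: int, output_tokens: int, elapsed_s: float) -> dict:
    return {
        # Counts prompt + generated tokens: prompt tokens also take compute.
        "total_tokens_per_s": (prompt_tokens + output_tokens) / elapsed_s,
        # Counts only what the model generated.
        "output_tokens_per_s": output_tokens / elapsed_s,
    }

# Example: 2,000 prompt tokens and 1,000 generated tokens in 4 seconds.
print(throughput_metrics(2000, 1000, 4.0))
# {'total_tokens_per_s': 750.0, 'output_tokens_per_s': 250.0}
```

Either metric can be reported; the ranking between systems should generally stay similar, which is the point made above.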