Tutorial: measuring time to first token (TTFT) and time between tokens (TBT) #14115
immaixq started this conversation in Show and tell
This is a demonstration of how to estimate the time to first token (TTFT) and the time between tokens (TBT) using `llama-bench`.

Setting Up
The `llama-bench` tool is built by default when you compile the llama.cpp project.

`llama-bench` can perform three types of tests using the following arguments:

- `-p`: prompt processing
- `-n`: token generation
- `-pg`: prompt processing and token generation

Each test can be repeated a desired number of times using the `-r` flag. The result will be the average of all tests performed.

What happens before a token is chosen as a response from LLM
Sampling (not measured by `llama-bench`): a sampling algorithm picks the next most likely token based on probabilities derived from the scores (logits).

Time to First Token (TTFT)
To get the time to first token, we can use the `-p` test, which covers the time taken to process the prompt and compute the logits for the potential first token.

The `avg_ns` value returned from the execution can be used as the TTFT. This latency is the average time (in ns) taken to process the n prompt tokens and prepare the logits for the first potential output token, excluding sampling time.

To get the TTFT in milliseconds, we convert this `avg_ns`: 163327916 / 1000000 ≈ 163.33 ms.

Therefore, for this specific test with a 100-token prompt, the TTFT, that is, the time until the model has prepared the logits for the 101st token and is ready for sampling, is approximately 163.33 ms.
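The conversion above can be sketched in Python. This is a minimal sketch, assuming the result was produced by a run along the lines of `llama-bench -m <model.gguf> -p 100 -n 0 -r 5 -o json`, where the model path and repetition count are placeholders. The record below is trimmed and illustrative: only the `avg_ns` value comes from the example above, and the helper name `ttft_ms` is hypothetical.

```python
import json

# Trimmed, illustrative record in the shape printed by `llama-bench -o json`.
# Only avg_ns is taken from the example in the text; the rest is assumed.
SAMPLE_OUTPUT = """
[
  {"n_prompt": 100, "n_gen": 0, "avg_ns": 163327916}
]
"""


def ttft_ms(record: dict) -> float:
    """Return the average prompt-processing latency in milliseconds.

    This excludes sampling time, as the benchmark does not sample.
    """
    if record["n_prompt"] <= 0 or record["n_gen"] != 0:
        raise ValueError("expected a prompt-only (-p) result")
    return record["avg_ns"] / 1_000_000  # ns -> ms


records = json.loads(SAMPLE_OUTPUT)
for rec in records:
    print(f"prompt tokens: {rec['n_prompt']}, TTFT ~ {ttft_ms(rec):.2f} ms")
```

The guard on `n_prompt` and `n_gen` is a design choice to avoid accidentally treating a token-generation or mixed (`-pg`) result as a TTFT estimate.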
Time Between Tokens (TBT)
The time between tokens (TBT) measures the average time it takes to generate each subsequent token after the first token.

To measure the time between tokens, we can use the `-n` test, which measures the time taken to generate x tokens, and then either take the reciprocal of `avg_ts` or compute `avg_ns / n_gen` to get the average time to generate a single token.

For example, `-n 10` performs a test that generates 10 tokens:

- Using `avg_ts`, TBT = (1 / 25.660775 tokens/second) * 1000 ≈ 38.97 ms/token.
- Using `avg_ns`, TBT = (389,868,091 ns / 10 tokens) / 1000000 ≈ 38.99 ms/token.
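Both TBT calculations can be sketched in Python as follows. This is an illustrative sketch, not part of `llama-bench`: the function names are hypothetical, and the total `avg_ns` for the 10-token run is assumed to be 389,868,091 ns, the value consistent with the reported `avg_ts`.

```python
def tbt_ms_from_ts(avg_ts: float) -> float:
    """TBT in ms/token from throughput in tokens/second (reciprocal of avg_ts)."""
    return 1_000.0 / avg_ts


def tbt_ms_from_ns(avg_ns: float, n_gen: int) -> float:
    """TBT in ms/token from the average run time in ns and the tokens generated."""
    return avg_ns / n_gen / 1_000_000


# Values from the `-n 10` example; avg_ns is an assumed total for the run.
avg_ts = 25.660775
avg_ns = 389_868_091
n_gen = 10

print(f"TBT from avg_ts: {tbt_ms_from_ts(avg_ts):.2f} ms/token")
print(f"TBT from avg_ns: {tbt_ms_from_ns(avg_ns, n_gen):.2f} ms/token")
```

The two estimates agree to within about 0.02 ms/token, so either field can be used.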