Tutorial: measuring time to first token (TTFT) and time between tokens (TBT) #14115

immaixq · 2025-06-11T03:16:39Z

This is a demonstration of how to estimate the time to first token (TTFT) and the time between tokens (TBT) using llama-bench.

Setting Up

The llama-bench tool is built by default when you compile the llama.cpp project.

llama-bench -h

# If llama-bench was built successfully, you should see output like the following:
usage: llama-bench [options]

options:
  -h, --help
  -m, --model <filename>                    (default: models/7B/ggml-model-q4_0.gguf)
  -p, --n-prompt <n>                        (default: 512)
  -n, --n-gen <n>                           (default: 128)
  -pg <pp,tg>                               (default: )
  -b, --batch-size <n>                      (default: 2048)
  -ub, --ubatch-size <n>                    (default: 512)
  -ctk, --cache-type-k <t>                  (default: f16)
  -ctv, --cache-type-v <t>                  (default: f16)
  -t, --threads <n>                         (default: 4)
  -C, --cpu-mask <hex,hex>                  (default: 0x0)
...

llama-bench can perform three types of tests using the following arguments:

-p: prompt processing
-n: token generation
-pg: prompt processing and token generation

Each test can be repeated a chosen number of times with the -r flag. The reported result is the average over all repetitions.

What happens before a token is chosen as the LLM's response

Prompt processing: the LLM runs a forward pass over all tokens of the input prompt
Next token prediction: the LLM predicts what the next token should be
- Logit computation: a forward pass calculates a score (logit) for every token in the model's vocabulary
- Sampling (not measured by llama-bench): a sampling algorithm picks the next token based on probabilities derived from the scores
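
As a rough illustration of the split between the two sub-steps of next token prediction, here is a minimal Python sketch. The four-token vocabulary and the logit values are made up for the example, and the softmax plus random draw is only one possible sampling scheme; it is not how llama.cpp implements sampling.

```python
import math
import random

def softmax(logits):
    """Turn raw scores (logits) into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical output of the forward pass for a toy 4-token vocabulary.
vocab = ["the", "cat", "sat", "mat"]
logits = [2.0, 0.5, 1.0, -1.0]

# Sampling step: this part is what llama-bench does NOT time.
probs = softmax(logits)
greedy_token = vocab[probs.index(max(probs))]       # always the top-scoring token
rng = random.Random(0)                              # fixed seed for repeatability
sampled_token = rng.choices(vocab, weights=probs, k=1)[0]

print("greedy:", greedy_token)
print("sampled:", sampled_token)
```

The forward pass that produces `logits` dominates the cost in practice, which is why timing only that part still gives a useful estimate.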

Time to First Token (TTFT)

To estimate the time to first token, we can use the -p test, which measures the time taken to process the prompt, including the forward pass that produces the logits for the first output token.

llama-bench -m <model_path> -p <prompt_length> -o json

# output json
{
    "n_prompt": 100,
    "n_gen": 0,
    "test_time": "2025-06-05T09:31:09Z",
    "avg_ns": 163327916,
    "stddev_ns": 6844086,
    "avg_ts": 613.081377,
    "stddev_ts": 24.347449,
    "samples_ns": [ 175529000, 160741208, 159486875, 160000792, 160881709 ],
    "samples_ts": [ 569.706, 622.118, 627.011, 624.997, 621.575 ]
}

The avg_ns value in the output serves as the TTFT estimate. It is the average time, in nanoseconds, taken to process the n_prompt prompt tokens and prepare the logits for the first output token, excluding sampling time.

To express the TTFT in milliseconds, divide avg_ns by 1,000,000: 163327916 / 1000000 ≈ 163.33 ms.

Therefore, for this specific test with a 100-token prompt, the TTFT, that is, the time until the model has prepared the logits for the 101st token and is ready for sampling, is approximately 163.33 ms.
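
The same conversion can be scripted. The sketch below parses an excerpt of the result object shown above and converts the nanosecond fields to milliseconds. The helper name `ttft_ms` is made up for this example, and the script assumes a single result object; if your llama-bench output is a list of results, apply the helper to each entry.

```python
import json

# Excerpt of the -p result shown above (fields copied verbatim).
result_json = """
{
    "n_prompt": 100,
    "n_gen": 0,
    "avg_ns": 163327916,
    "stddev_ns": 6844086,
    "samples_ns": [175529000, 160741208, 159486875, 160000792, 160881709]
}
"""

NS_PER_MS = 1_000_000

def ttft_ms(result):
    """Estimate TTFT in milliseconds from a prompt-processing result."""
    return result["avg_ns"] / NS_PER_MS

result = json.loads(result_json)
ttft = ttft_ms(result)
stddev = result["stddev_ns"] / NS_PER_MS
per_run = [ns / NS_PER_MS for ns in result["samples_ns"]]

print(f"TTFT: {ttft:.2f} ms (stddev {stddev:.2f} ms)")
print("per-run ms:", [round(x, 2) for x in per_run])
```

Looking at the per-run values alongside the average helps spot a single slow repetition that pulls the mean up.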

Time Between Tokens (TBT)

The time between tokens (TBT) measures the average time it takes to generate each subsequent token after the first token.

To measure the time between tokens, we can use the -n test, which measures the time taken to generate n_gen tokens. The average time per token is then either the reciprocal of avg_ts or avg_ns / n_gen.

For example, -n 10 performs a test that generates 10 tokens:

llama-bench -m <model_path> -n <number of tokens to generate> -o json

# output json (abridged)
{
    "n_gen": 10,
    "avg_ns": 389868091,
    "stddev_ns": 9010942,
    "avg_ts": 25.660775
}

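Using avg_ts, TBT = (1 / 25.660775 tokens/second) * 1000 ≈ 38.97 ms/token.
Using avg_ns, TBT = (389,868,091 ns / 10 tokens) / 1000000 ≈ 38.99 ms/token.

The two values differ slightly because avg_ts is the mean of the per-repetition rates, not the reciprocal of the mean time.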
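
Both calculations can be checked with a few lines of Python. This is a small sketch using the values from the -n 10 result above; the function names are made up for the example.

```python
NS_PER_MS = 1_000_000

def tbt_from_avg_ts(avg_ts):
    """TBT in ms/token from throughput in tokens/second."""
    return 1000.0 / avg_ts

def tbt_from_avg_ns(avg_ns, n_gen):
    """TBT in ms/token from the average total generation time in ns."""
    return avg_ns / n_gen / NS_PER_MS

# Values copied from the -n 10 result above.
result = {"n_gen": 10, "avg_ns": 389868091, "avg_ts": 25.660775}

tbt_ts = tbt_from_avg_ts(result["avg_ts"])
tbt_ns = tbt_from_avg_ns(result["avg_ns"], result["n_gen"])

print(f"TBT from avg_ts: {tbt_ts:.2f} ms/token")
print(f"TBT from avg_ns: {tbt_ns:.2f} ms/token")
```

Either figure is a reasonable estimate; the avg_ns route is the more direct one if you also want to report the spread using stddev_ns.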
