
add vllm support for token ids as input #3280

Open
wybryan wants to merge 2 commits into main

Conversation

wybryan

@wybryan wybryan commented Apr 11, 2025

What does this PR do?

This PR adds support for the vLLM server & client to accept token ids as input.

Under the hood, the vLLM engine already supports token ids as input instead of text; this PR makes that feature available to end users. The rationale is that in certain training use cases the user wants precise control over the exact input tokens passed to vLLM, so this capability lets the user fully control the input tokens before they are sent to vLLM.
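
For reference, here is a minimal sketch of the underlying vLLM capability this PR exposes, using `TokensPrompt` to pass pre-tokenized input to the engine (the model name is only an example, and the TRL-side API added by this PR may look different):

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams
from vllm.inputs import TokensPrompt

# Tokenize in user code so we control the exact input ids (example model).
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
prompt_ids = tokenizer("Hello, world!", add_special_tokens=False)["input_ids"]

llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")
sampling_params = SamplingParams(max_tokens=32)

# The engine accepts token ids directly instead of a text prompt.
outputs = llm.generate(TokensPrompt(prompt_token_ids=prompt_ids), sampling_params)
print(outputs[0].outputs[0].text)
```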

Who can review?

Anyone in the community is free to review the PR once the tests have passed.

@wybryan wybryan marked this pull request as draft April 12, 2025 04:37
@wybryan wybryan marked this pull request as ready for review April 12, 2025 04:37
@wybryan
Author

wybryan commented Apr 12, 2025

Hi @qgallouedec, is it possible for you to review this PR please?

@qgallouedec
Member

Hi, so sorry for the late review. We are trying to mimic the vLLM server as closely as possible. Do you know if it supports this? If so, how?

@wybryan
Author

wybryan commented May 24, 2025

> Hi, so sorry for the late review. We are trying to mimic the vLLM server as closely as possible. Do you know if it supports this? If so, how?

The vLLM engine natively supports input token ids instead of input strings. My PR just exposes this feature through the wrapper in TRL.

@wybryan
Author

wybryan commented May 24, 2025

The rationale is that sometimes we want the training code to take care of tokenization, i.e., we may manipulate the token ids directly and want the vLLM rollout generation to take those manipulated token ids as-is, rather than taking the original input string and doing standard tokenization inside vLLM, which would cause an inconsistency between training and rollout generation.

That's what this PR is about: feeding raw token ids directly to vLLM (the vLLM engine already supports this, but it is not accessible without this PR).
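
To make the consistency concern concrete, here is a small illustration (model name chosen only as an example) of how decoding manipulated ids back to text and letting vLLM re-tokenize them can produce different ids than the ones the training code built:

```python
from transformers import AutoTokenizer

# Example model; any BPE-style tokenizer shows the same effect.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")

# Suppose the training code builds the prompt at the token-id level,
# e.g. by concatenating ids from separately tokenized pieces.
ids = (
    tokenizer("Hel", add_special_tokens=False)["input_ids"]
    + tokenizer("lo world", add_special_tokens=False)["input_ids"]
)

# If we instead send the decoded text, vLLM re-tokenizes it and will
# generally not reproduce the exact ids the training code used.
text = tokenizer.decode(ids)
retokenized = tokenizer(text, add_special_tokens=False)["input_ids"]

print(ids)
print(retokenized)  # typically differs, e.g. "Hello" becomes a single token
```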
