Always pass add_generation_prompt=True to apply_chat_template #1416
This PR passes `add_generation_prompt=True` to `apply_chat_template`, regardless of whether `tools` are provided or not. Note that before this PR, we already passed `add_generation_prompt=True` regardless of `tools` in several call sites; this PR makes that behavior consistent everywhere. The goal is to serve as a basis to continue the discussion on how we want to handle prompt formatting. See the previous discussion: #993 (comment)
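For illustration, a minimal sketch of the call pattern this PR standardizes (the `render_prompt` helper is hypothetical, not the actual code touched by this PR):

```python
# Hypothetical helper, for illustration only.
def render_prompt(tokenizer, messages, tools=None):
    # tools may be None; either way, always request the generation prompt.
    return tokenizer.apply_chat_template(
        messages,
        tools=tools,
        add_generation_prompt=True,  # always True after this PR
        tokenize=False,
    )
```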
Based on my own (non-expert) understanding of the transformers documentation, setting `add_generation_prompt=True` appends a generation prompt, which prepares the input so the model knows it should generate a response from the assistant. This seems to apply broadly in conversational contexts, which is the case when working with agents, regardless of whether tool calls are used. I would say (see the sketch after this list):

- pass `add_generation_prompt=True` whenever we expect the model to generate an assistant response;
- `add_generation_prompt=True` does not require `tools=...`;
- even when `tools` are provided, `add_generation_prompt=True` is required.
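To make this concrete, a minimal sketch (the checkpoint is just an illustrative choice; the exact tokens appended depend on each model's chat template):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
messages = [{"role": "user", "content": "What is the capital of France?"}]

# Without the generation prompt: the rendered text ends right after the user turn.
base = tokenizer.apply_chat_template(messages, tokenize=False)

# With the generation prompt: the assistant header is appended, cueing the model
# to answer as the assistant instead of continuing the conversation arbitrarily.
prompted = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

print(prompted[len(base):])  # for this template, the appended "<|assistant|>" header
```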
In a discussion with @molbap, he mentioned that in conversational contexts, regardless of whether tools are passed to `model.generate`, it is generally recommended to include the generation prompt. This is because the model's output is conditioned on special tokens (like `<|assistant|>`, `<|bos|>`, or `<|eos|>`, defined in its config), and omitting them can often lead to suboptimal generation results.
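Following that recommendation end to end, a hedged sketch (the checkpoint and the `get_weather` tool are placeholders picked for illustration; any chat model whose template supports tools should behave the same way):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def get_weather(city: str) -> str:
    """
    Get the current weather for a city.

    Args:
        city: The city to get the weather for.
    """
    return "sunny"  # placeholder tool implementation

checkpoint = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder; any tool-capable chat model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
input_ids = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],         # with tools...
    add_generation_prompt=True,  # ...or without, the generation prompt stays on
    return_tensors="pt",
)
output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0][input_ids.shape[-1]:]))
```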
Fix #993.