[fix] Add 1 and draft_token_num to seq_len when overlap scheduling is enabled during memory estimation #5343

Open
wants to merge 4 commits into
base: main


HuiGao-NV
Collaborator

When estimating memory consumption, we need to create a temporary KV cache. With overlap scheduling enabled, we need to leave space for one extra token so that the forward pass can finish.
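
As a rough illustration of the adjustment named in the PR title, the sketch below pads the sequence length used for the estimation pass. The helper name `estimation_seq_len` and its parameters are hypothetical and are not taken from the TensorRT-LLM code base; only the "+ 1 + draft_token_num" arithmetic comes from the title.

```python
def estimation_seq_len(max_seq_len: int,
                       overlap_enabled: bool,
                       draft_token_num: int = 0) -> int:
    """Return the sequence length to reserve in the temporary KV cache.

    Hypothetical helper for illustration only. With overlap scheduling
    enabled, one extra token slot plus room for the draft tokens is
    reserved, as described in the PR title.
    """
    if not overlap_enabled:
        # No overlap: the nominal sequence length is enough.
        return max_seq_len
    # Overlap: reserve one more token plus the draft tokens.
    return max_seq_len + 1 + draft_token_num


print(estimation_seq_len(128, overlap_enabled=False, draft_token_num=3))  # 128
print(estimation_seq_len(128, overlap_enabled=True))                      # 129
print(estimation_seq_len(128, overlap_enabled=True, draft_token_num=3))   # 132
```

Without the padding, the estimation pass would size the temporary cache for fewer tokens than an overlapped forward step may write.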

@HuiGao-NV HuiGao-NV requested review from a team as code owners June 19, 2025 00:17
@HuiGao-NV
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #9423 [ run ] triggered by Bot

@QiJune QiJune requested a review from yweng0828 June 19, 2025 01:38
@tensorrt-cicd
Collaborator

PR_Github #9423 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6916 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

@HuiGao-NV
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #9502 [ run ] triggered by Bot

@HuiGao-NV HuiGao-NV changed the title [fix] Add one to seq_len for overlap during memory estimation [fix] Add 1 and draft_token_num to seq_len when overlap scheduling is enabled during memory estimation Jun 19, 2025
@tensorrt-cicd
Collaborator

PR_Github #9502 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6971 completed with status: 'FAILURE'

@HuiGao-NV HuiGao-NV force-pushed the extra_token_for_overlap branch from 6be9345 to d7cf3d9 Compare June 20, 2025 01:53
@HuiGao-NV
Collaborator Author

The previous CI run failed while collecting test cases for RTX:
"[2025-06-19T11:50:47.135Z] ===================== 15797 deselected, 1 warning in 7.26s ====================="
The pipeline needs to be rerun.

@HuiGao-NV
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #9542 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #9542 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #7002 completed with status: 'SUCCESS'

@HuiGao-NV HuiGao-NV requested a review from QiJune June 20, 2025 09:45
@HuiGao-NV HuiGao-NV enabled auto-merge (squash) June 20, 2025 09:45
4 participants