Improve fmha_bwd tests performance #2376
Draft
+29
−33
Proposed changes
`tile_example_fmha_bwd` takes far more time than `tile_example_fmha_fwd` for the same parameters, even considering that bwd does more work. This makes it practical to run only on very small seqlens.
The main bottleneck is the computation of `ds_hp_host_ref`. First I optimized its inner loop by avoiding allocation and copying of indices (`std::vector`). Then I optimized its outer loop by using `ParallelTensorFunctor` instead of `ForEach`. After that, the remaining bottlenecks are copies and conversions of several large tensors `{nhead, real_seqlen_q, real_seqlen_k}` that are implemented with `ForEach`; I replaced them with `CopyAsType`.

Before:
After:
I.e. from 7 min to 7 sec. Now bwd's runtime is comparable to fwd.
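To illustrate the three kinds of change described above, here is a minimal standalone sketch. It does not use CK's `ForEach`, `ParallelTensorFunctor`, or `CopyAsType`; the names `Tensor3`, `fill_slow`, `fill_parallel`, and `copy_as_type` are hypothetical and only model the idea: avoid a per-element `std::vector` index, split the outer loop across threads, and convert whole buffers in one linear pass.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical dense row-major tensor with shape {n0, n1, n2},
// standing in for a {nhead, real_seqlen_q, real_seqlen_k} host tensor.
template <typename T>
struct Tensor3
{
    std::size_t n0, n1, n2;
    std::vector<T> data;

    Tensor3(std::size_t a, std::size_t b, std::size_t c)
        : n0(a), n1(b), n2(c), data(a * b * c) {}

    // Slow access: the index arrives as a heap-allocated vector.
    T& at(const std::vector<std::size_t>& idx)
    {
        assert(idx.size() == 3);
        return data[(idx[0] * n1 + idx[1]) * n2 + idx[2]];
    }

    // Fast access: plain integer arguments, no allocation.
    T& operator()(std::size_t i, std::size_t j, std::size_t k)
    {
        return data[(i * n1 + j) * n2 + k];
    }
    const T& operator()(std::size_t i, std::size_t j, std::size_t k) const
    {
        return data[(i * n1 + j) * n2 + k];
    }
};

// Baseline: serial loop that builds a new index vector for every element.
template <typename T, typename F>
void fill_slow(Tensor3<T>& t, F f)
{
    for(std::size_t i = 0; i < t.n0; ++i)
        for(std::size_t j = 0; j < t.n1; ++j)
            for(std::size_t k = 0; k < t.n2; ++k)
            {
                std::vector<std::size_t> idx{i, j, k}; // one allocation per element
                t.at(idx) = f(i, j, k);
            }
}

// Optimized: the (i, j) rows are split into contiguous chunks, one per thread,
// and the inner loop over k uses direct indexing. f must be safe to call
// concurrently, and each thread writes a disjoint range of rows.
template <typename T, typename F>
void fill_parallel(Tensor3<T>& t, F f,
                   std::size_t num_threads = std::thread::hardware_concurrency())
{
    const std::size_t rows = t.n0 * t.n1;
    num_threads = std::max<std::size_t>(1, std::min(num_threads, rows));
    const std::size_t chunk = (rows + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    for(std::size_t w = 0; w < num_threads; ++w)
    {
        const std::size_t begin = w * chunk;
        const std::size_t end   = std::min(rows, begin + chunk);
        if(begin >= end)
            break;
        workers.emplace_back([&t, f, begin, end] {
            for(std::size_t r = begin; r < end; ++r)
            {
                const std::size_t i = r / t.n1;
                const std::size_t j = r % t.n1;
                for(std::size_t k = 0; k < t.n2; ++k)
                    t(i, j, k) = f(i, j, k);
            }
        });
    }
    for(auto& th : workers)
        th.join();
}

// Bulk copy with element type conversion: one linear pass over the flat
// storage instead of a per-element multi-index visit.
template <typename Dst, typename Src>
Tensor3<Dst> copy_as_type(const Tensor3<Src>& src)
{
    Tensor3<Dst> dst(src.n0, src.n1, src.n2);
    std::transform(src.data.begin(), src.data.end(), dst.data.begin(),
                   [](const Src& v) { return static_cast<Dst>(v); });
    return dst;
}
```

In this sketch `fill_slow` and `fill_parallel` produce identical contents for a pure `f`; the difference is only in allocation and threading overhead. The actual speedup in the PR comes from the CK utilities named above, not from this code.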
The `Run CK_TILE_FMHA Tests` CI job takes about 5 hours. Let's see if this change decreases its duration as well...

Checklist
Please put an `x` into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

`clang-format` on all changed files

Discussion
If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered