-
Notifications
You must be signed in to change notification settings - Fork 13.7k
[AMDGPU] Fix machine verification failure from INIT_EXEC lowering #98333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Fix machine verification failure from INIT_EXEC lowering since it was moved from SILowerControlFlow to SIWholeQuadMode in llvm#94452.
@llvm/pr-subscribers-backend-amdgpu Author: Jay Foad (jayfoad) ChangesFix machine verification failure from INIT_EXEC lowering since it was Full diff: https://github.com/llvm/llvm-project/pull/98333.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp b/llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
index b39f3ba970600..08460070d5f1a 100644
--- a/llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
+++ b/llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
@@ -1676,6 +1676,8 @@ bool SIWholeQuadMode::runOnMachineFunction(MachineFunction &MF) {
if (!(GlobalFlags & (StateWQM | StateStrict)) && LowerToCopyInstrs.empty() &&
LowerToMovInstrs.empty() && KillInstrs.empty()) {
lowerLiveMaskQueries();
+ if (!InitExecInstrs.empty())
+ LIS->removeAllRegUnitsForPhysReg(AMDGPU::EXEC);
return !InitExecInstrs.empty() || !LiveMaskQueries.empty();
}
diff --git a/llvm/test/CodeGen/AMDGPU/wqm.ll b/llvm/test/CodeGen/AMDGPU/wqm.ll
index c621904ff727b..40d95e47f2410 100644
--- a/llvm/test/CodeGen/AMDGPU/wqm.ll
+++ b/llvm/test/CodeGen/AMDGPU/wqm.ll
@@ -3463,6 +3463,38 @@ bb:
ret void
}
+; Test a case that failed machine verification.
+define amdgpu_gs void @wqm_init_exec_switch(i32 %arg) {
+; GFX9-W64-LABEL: wqm_init_exec_switch:
+; GFX9-W64: ; %bb.0:
+; GFX9-W64-NEXT: s_mov_b64 exec, 0
+; GFX9-W64-NEXT: v_cmp_lt_i32_e32 vcc, 0, v0
+; GFX9-W64-NEXT: s_and_saveexec_b64 s[0:1], vcc
+; GFX9-W64-NEXT: s_xor_b64 s[0:1], exec, s[0:1]
+; GFX9-W64-NEXT: s_andn2_saveexec_b64 s[0:1], s[0:1]
+; GFX9-W64-NEXT: s_endpgm
+;
+; GFX10-W32-LABEL: wqm_init_exec_switch:
+; GFX10-W32: ; %bb.0:
+; GFX10-W32-NEXT: s_mov_b32 exec_lo, 0
+; GFX10-W32-NEXT: s_mov_b32 s0, exec_lo
+; GFX10-W32-NEXT: v_cmpx_lt_i32_e32 0, v0
+; GFX10-W32-NEXT: s_xor_b32 s0, exec_lo, s0
+; GFX10-W32-NEXT: s_andn2_saveexec_b32 s0, s0
+; GFX10-W32-NEXT: s_endpgm
+ call void @llvm.amdgcn.init.exec(i64 0)
+ switch i32 %arg, label %bb1 [
+ i32 0, label %bb3
+ i32 1, label %bb2
+ ]
+bb1:
+ ret void
+bb2:
+ ret void
+bb3:
+ ret void
+}
+
declare void @llvm.amdgcn.exp.f32(i32, i32, float, float, float, float, i1, i1) #1
declare void @llvm.amdgcn.image.store.1d.v4f32.i32(<4 x float>, i32, i32, <8 x i32>, i32, i32) #1
|
@@ -1676,6 +1676,8 @@ bool SIWholeQuadMode::runOnMachineFunction(MachineFunction &MF) { | |||
if (!(GlobalFlags & (StateWQM | StateStrict)) && LowerToCopyInstrs.empty() && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am now even more convinced that we should not bother with this fast path. We should just make sure that the general path is fast when there is no work to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll run compile time testing on a version without it -- it's possible there really is little difference already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only thing I can see that we might be able to skip over in some cases is the two for (auto BII : Blocks)
loops below.
@@ -1717,7 +1719,7 @@ bool SIWholeQuadMode::runOnMachineFunction(MachineFunction &MF) { | |||
LIS->removeAllRegUnitsForPhysReg(AMDGPU::SCC); | |||
|
|||
// If we performed any kills then recompute EXEC | |||
if (!KillInstrs.empty()) | |||
if (!KillInstrs.empty() || !InitExecInstrs.empty()) | |||
LIS->removeAllRegUnitsForPhysReg(AMDGPU::EXEC); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be necessary in the first place. Reserved registers, like exec, do not have liveness tracked
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well I see stuff like this in the dumps:
# After SI Whole Quad Mode
********** INTERVALS **********
EXEC_LO_LO16 [16r,16d:0) 0@16r
EXEC_LO_HI16 [16r,16d:0) 0@16r
EXEC_HI_LO16 [16r,16d:0) 0@16r
EXEC_HI_HI16 [16r,16d:0) 0@16r
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those shouldn't have been created in the first place. Where did they come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this test case they are formed in Register Coalescer when it attempts to merge a copy of EXEC with a virtual and asks for an interval on a EXEC reg unit.
Reserved registers still have intervals, they are a sequence of a dead defs, one for each assignment to the register.
You can see this in LiveIntervals::computeRegUnitRange
.
I do not know the use for this, but assume it is for interference calculations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…vm#98333) Fix machine verification failure from INIT_EXEC lowering since it was moved from SILowerControlFlow to SIWholeQuadMode in llvm#94452.
Fix machine verification failure from INIT_EXEC lowering since it was
moved from SILowerControlFlow to SIWholeQuadMode in #94452.