Skip to content

[AMDGPU] Fix machine verification failure from INIT_EXEC lowering #98333

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jul 11, 2024

Conversation

jayfoad
Copy link
Contributor

@jayfoad jayfoad commented Jul 10, 2024

Fix machine verification failure from INIT_EXEC lowering since it was
moved from SILowerControlFlow to SIWholeQuadMode in #94452.

Fix machine verification failure from INIT_EXEC lowering since it was
moved from SILowerControlFlow to SIWholeQuadMode in llvm#94452.
@llvmbot
Copy link
Member

llvmbot commented Jul 10, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: Jay Foad (jayfoad)

Changes

Fix machine verification failure from INIT_EXEC lowering since it was
moved from SILowerControlFlow to SIWholeQuadMode in #94452.


Full diff: https://github.com/llvm/llvm-project/pull/98333.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp (+2)
  • (modified) llvm/test/CodeGen/AMDGPU/wqm.ll (+32)
diff --git a/llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp b/llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
index b39f3ba970600..08460070d5f1a 100644
--- a/llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
+++ b/llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
@@ -1676,6 +1676,8 @@ bool SIWholeQuadMode::runOnMachineFunction(MachineFunction &MF) {
   if (!(GlobalFlags & (StateWQM | StateStrict)) && LowerToCopyInstrs.empty() &&
       LowerToMovInstrs.empty() && KillInstrs.empty()) {
     lowerLiveMaskQueries();
+    if (!InitExecInstrs.empty())
+      LIS->removeAllRegUnitsForPhysReg(AMDGPU::EXEC);
     return !InitExecInstrs.empty() || !LiveMaskQueries.empty();
   }
 
diff --git a/llvm/test/CodeGen/AMDGPU/wqm.ll b/llvm/test/CodeGen/AMDGPU/wqm.ll
index c621904ff727b..40d95e47f2410 100644
--- a/llvm/test/CodeGen/AMDGPU/wqm.ll
+++ b/llvm/test/CodeGen/AMDGPU/wqm.ll
@@ -3463,6 +3463,38 @@ bb:
   ret void
 }
 
+; Test a case that failed machine verification.
+define amdgpu_gs void @wqm_init_exec_switch(i32 %arg) {
+; GFX9-W64-LABEL: wqm_init_exec_switch:
+; GFX9-W64:       ; %bb.0:
+; GFX9-W64-NEXT:    s_mov_b64 exec, 0
+; GFX9-W64-NEXT:    v_cmp_lt_i32_e32 vcc, 0, v0
+; GFX9-W64-NEXT:    s_and_saveexec_b64 s[0:1], vcc
+; GFX9-W64-NEXT:    s_xor_b64 s[0:1], exec, s[0:1]
+; GFX9-W64-NEXT:    s_andn2_saveexec_b64 s[0:1], s[0:1]
+; GFX9-W64-NEXT:    s_endpgm
+;
+; GFX10-W32-LABEL: wqm_init_exec_switch:
+; GFX10-W32:       ; %bb.0:
+; GFX10-W32-NEXT:    s_mov_b32 exec_lo, 0
+; GFX10-W32-NEXT:    s_mov_b32 s0, exec_lo
+; GFX10-W32-NEXT:    v_cmpx_lt_i32_e32 0, v0
+; GFX10-W32-NEXT:    s_xor_b32 s0, exec_lo, s0
+; GFX10-W32-NEXT:    s_andn2_saveexec_b32 s0, s0
+; GFX10-W32-NEXT:    s_endpgm
+  call void @llvm.amdgcn.init.exec(i64 0)
+  switch i32 %arg, label %bb1 [
+    i32 0, label %bb3
+    i32 1, label %bb2
+  ]
+bb1:
+  ret void
+bb2:
+  ret void
+bb3:
+  ret void
+}
+
 declare void @llvm.amdgcn.exp.f32(i32, i32, float, float, float, float, i1, i1) #1
 declare void @llvm.amdgcn.image.store.1d.v4f32.i32(<4 x float>, i32, i32, <8 x i32>, i32, i32) #1
 

@@ -1676,6 +1676,8 @@ bool SIWholeQuadMode::runOnMachineFunction(MachineFunction &MF) {
if (!(GlobalFlags & (StateWQM | StateStrict)) && LowerToCopyInstrs.empty() &&
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am now even more convinced that we should not bother with this fast path. We should just make sure that the general path is fast when there is no work to do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll run compile time testing on a version without it -- it's possible there really is little difference already.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing I can see that we might be able to skip over in some cases is the two for (auto BII : Blocks) loops below.

@@ -1717,7 +1719,7 @@ bool SIWholeQuadMode::runOnMachineFunction(MachineFunction &MF) {
LIS->removeAllRegUnitsForPhysReg(AMDGPU::SCC);

// If we performed any kills then recompute EXEC
if (!KillInstrs.empty())
if (!KillInstrs.empty() || !InitExecInstrs.empty())
LIS->removeAllRegUnitsForPhysReg(AMDGPU::EXEC);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be necessary in the first place. Reserved registers, like exec, do not have liveness tracked

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I see stuff like this in the dumps:

# After SI Whole Quad Mode
********** INTERVALS **********
EXEC_LO_LO16 [16r,16d:0) 0@16r
EXEC_LO_HI16 [16r,16d:0) 0@16r
EXEC_HI_LO16 [16r,16d:0) 0@16r
EXEC_HI_HI16 [16r,16d:0) 0@16r

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those shouldn't have been created in the first place. Where did they come from?

Copy link
Contributor

@perlfu perlfu Jul 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this test case they are formed in Register Coalescer when it attempts to merge a copy of EXEC with a virtual and asks for an interval on a EXEC reg unit.

Reserved registers still have intervals, they are a sequence of a dead defs, one for each assignment to the register.
You can see this in LiveIntervals::computeRegUnitRange.
I do not know the use for this, but assume it is for interference calculations.

Copy link
Contributor

@perlfu perlfu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jayfoad jayfoad merged commit d4e46f0 into llvm:main Jul 11, 2024
7 checks passed
@jayfoad jayfoad deleted the wqm-init-exec-switch branch July 11, 2024 08:18
aaryanshukla pushed a commit to aaryanshukla/llvm-project that referenced this pull request Jul 14, 2024
…vm#98333)

Fix machine verification failure from INIT_EXEC lowering since it was
moved from SILowerControlFlow to SIWholeQuadMode in llvm#94452.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants