Commit ea88384

mshockwave and lukel97 authored
[RISCV][InsertVSETVLI] Remove redundant vsetvli by coalescing blocks from bottom up (#141298)
I ran into a relatively rare case in RISCVInsertVSETVLIPass, where right after the `emitVSETVLI` phase but before the `coalesceVSETVLIs` phase, we have two blocks that look like this:

```
bb.first:
  %46:gprnox0 = PseudoVSETIVLI %30:gprnox0, 199 /* e8, mf2, ta, ma */, implicit-def $vl, implicit-def $vtype
  %76:gpr = PseudoVSETVLIX0 killed $x0, ..., implicit-def $vl, implicit-def $vtype
  $v10m2 = PseudoVMV_V_I_M2 undef renamable $v10m2, 0, -1, 5 /* e32 */, 0 /* tu, mu */, implicit $vl, implicit $vtype
  ...
bb.second:
  $x0 = PseudoVSETVLI %46, 209 /* e32, m2, ta, ma */, implicit-def $vl, implicit-def $vtype
  $v10 = PseudoVMV_S_X undef $v10(tied-def 0), undef %53:gpr, $noreg, 5, implicit $vl, implicit $vtype
  $x0 = PseudoVSETVLI %30, 209 /* e32, m2, ta, ma */, implicit-def $vl, implicit-def $vtype
  $v8 = PseudoVREDSUM_VS_M2_E32 undef $v8(tied-def 0), killed $v8m2, killed $v10, $noreg, 5, 0, implicit $vl, implicit $vtype
```

After the `coalesceVSETVLIs` phase, it turns into:

```diff
 bb.first:
-  %46:gprnox0 = PseudoVSETIVLI %30:gprnox0, 199 /* e8, mf2, ta, ma */, implicit-def $vl, implicit-def $vtype
+  dead %46:gprnox0 = PseudoVSETIVLI %30:gprnox0, 199 /* e8, mf2, ta, ma */, implicit-def $vl, implicit-def $vtype
   %76:gpr = PseudoVSETVLIX0 killed $x0, ..., implicit-def $vl, implicit-def $vtype
   $v10m2 = PseudoVMV_V_I_M2 undef renamable $v10m2, 0, -1, 5 /* e32 */, 0 /* tu, mu */, implicit $vl, implicit $vtype
   ...
 bb.second:
-  $x0 = PseudoVSETVLI %46, 209 /* e32, m2, ta, ma */, implicit-def $vl, implicit-def $vtype
+  $x0 = PseudoVSETVLI %30, 209 /* e32, m2, ta, ma */, implicit-def $vl, implicit-def $vtype
   $v10 = PseudoVMV_S_X undef $v10(tied-def 0), undef %53:gpr, $noreg, 5, implicit $vl, implicit $vtype
-  $x0 = PseudoVSETVLI %30, 209 /* e32, m2, ta, ma */, implicit-def $vl, implicit-def $vtype
   $v8 = PseudoVREDSUM_VS_M2_E32 undef $v8(tied-def 0), killed $v8m2, killed $v10, $noreg, 5, 0, implicit $vl, implicit $vtype
```

We forwarded `%30` to any use of `%46` and further reduced the number of VSETVLI we need in `bb.second`. But the problem is, if `bb.first` is processed before `bb.second` -- which is the majority of the cases -- then we're not able to remove the vsetvli which defines the now-dead `%46` in `bb.first` after coalescing `bb.second`. This will produce assembly code like this:

```
vsetvli zero, s0, e8, mf2, ta, ma
vsetvli a0, zero, e32, m2, ta, ma
vmv.v.i v10, 0
```

This patch fixes this issue by coalescing the blocks from bottom up such that we can account for dead VSETVLI in the earlier blocks after its uses are eliminated in later blocks.

---------

Co-authored-by: Luke Lau <[email protected]>
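The ordering problem described in the commit message can be reproduced with a small toy model. The sketch below is not LLVM code: `ToyVSet`, `makeToyExample`, and `countAfterCoalescing` are hypothetical names, and the two "coalescing" steps (AVL forwarding and dead-definition removal) are heavily simplified stand-ins for what `coalesceVSETVLIs` does under additional conditions.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for a vsetvli; none of these names come from LLVM.
struct ToyVSet {
  std::string Def;  // GPR result, empty when the result is $x0
  std::string Avl;  // register used as the AVL operand, empty for x0
  bool StateDead;   // true if VL/VTYPE are overwritten before any vector use
};
using ToyBlock = std::vector<ToyVSet>;

// The situation from the commit message, reduced to the vsetvlis that matter.
std::vector<ToyBlock> makeToyExample() {
  ToyBlock First = {{"%46", "%30", true},  // kept alive only by the use of %46
                    {"%76", "", false}};   // sets the state used by the vmv.v.i
  ToyBlock Second = {{"", "%46", false}};  // its AVL can be forwarded to %30
  return {First, Second};
}

// Coalesces blocks in the given order; returns how many vsetvlis remain.
std::size_t countAfterCoalescing(std::vector<ToyBlock> Blocks,
                                 const std::vector<std::size_t> &Order) {
  std::map<std::string, std::string> AvlOfDef;  // vsetvli result -> its AVL
  std::map<std::string, int> Uses;              // register -> number of AVL uses
  for (const ToyBlock &B : Blocks)
    for (const ToyVSet &I : B) {
      if (!I.Def.empty())
        AvlOfDef[I.Def] = I.Avl;
      if (!I.Avl.empty())
        ++Uses[I.Avl];
    }

  for (std::size_t Idx : Order) {
    ToyBlock Kept;
    for (ToyVSet I : Blocks[Idx]) {
      // Simplified forwarding: if the AVL is the result of another vsetvli,
      // use that vsetvli's own AVL instead.
      auto It = AvlOfDef.find(I.Avl);
      if (It != AvlOfDef.end()) {
        --Uses[I.Avl];
        I.Avl = It->second;
        if (!I.Avl.empty())
          ++Uses[I.Avl];
      }
      // Simplified deletion: drop a vsetvli whose VL/VTYPE are dead and whose
      // GPR result has no remaining uses at the time this block is visited.
      if (I.StateDead && (I.Def.empty() || Uses[I.Def] == 0)) {
        if (!I.Avl.empty())
          --Uses[I.Avl];
        continue;
      }
      Kept.push_back(I);
    }
    Blocks[Idx] = Kept;
  }

  std::size_t Remaining = 0;
  for (const ToyBlock &B : Blocks)
    Remaining += B.size();
  return Remaining;
}
```

In this model, visiting the blocks in the order `{0, 1}` (top down) leaves the definition of `%46` in place, because its use still exists when `First` is processed. Visiting them in the order `{1, 0}` (bottom up) removes the use first, so the definition can be dropped as well.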
1 parent 8a21e0f commit ea88384

2 files changed: +75 -2 lines changed


llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp

Lines changed: 6 additions & 2 deletions
@@ -26,6 +26,7 @@

 #include "RISCV.h"
 #include "RISCVSubtarget.h"
+#include "llvm/ADT/PostOrderIterator.h"
 #include "llvm/ADT/Statistic.h"
 #include "llvm/CodeGen/LiveDebugVariables.h"
 #include "llvm/CodeGen/LiveIntervals.h"
@@ -1840,8 +1841,11 @@ bool RISCVInsertVSETVLI::runOnMachineFunction(MachineFunction &MF) {
   // any cross block analysis within the dataflow. We can't have both
   // demanded fields based mutation and non-local analysis in the
   // dataflow at the same time without introducing inconsistencies.
-  for (MachineBasicBlock &MBB : MF)
-    coalesceVSETVLIs(MBB);
+  // We're visiting blocks from the bottom up because a VSETVLI in the
+  // earlier block might become dead when its uses in later blocks are
+  // optimized away.
+  for (MachineBasicBlock *MBB : post_order(&MF))
+    coalesceVSETVLIs(*MBB);

   // Insert PseudoReadVL after VLEFF/VLSEGFF and replace it with the vl output
   // of VLEFF/VLSEGFF.
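To see what "bottom up" means for the control-flow graph used in the test of this commit, here is a minimal sketch of a post-order walk over a toy CFG. It is an illustration only: `toyPostOrder` and the successor-list representation are hypothetical and do not use LLVM's `post_order` or `MachineBasicBlock`. The sketch assumes a plain depth-first search from the entry block that follows successors in listed order.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy CFG: Succs[i] lists the successor indices of block i.
using ToyCFG = std::vector<std::vector<int>>;

// Depth-first search that records a block only after all of its not yet
// visited successors have been recorded.
static void visitPostOrder(const ToyCFG &Succs, int Block,
                           std::vector<bool> &Seen, std::vector<int> &Out) {
  Seen[Block] = true;
  for (int S : Succs[Block])
    if (!Seen[S])
      visitPostOrder(Succs, S, Seen, Out);
  Out.push_back(Block);
}

// Returns the blocks reachable from Entry in post-order.
std::vector<int> toyPostOrder(const ToyCFG &Succs, int Entry) {
  std::vector<bool> Seen(Succs.size(), false);
  std::vector<int> Out;
  visitPostOrder(Succs, Entry, Seen, Out);
  return Out;
}
```

For the test's CFG (bb.0 to bb.1, bb.1 to bb.2, bb.2 to bb.3 or itself, bb.3 to bb.1 or bb.4), this walk yields bb.4, bb.3, bb.2, bb.1, bb.0. The block containing the uses of `%46` (bb.3) is therefore handled before the block containing its definition (bb.1), even though the loop back edge makes bb.1 a successor of bb.3.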
Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=riscv64 -mattr=+v -run-pass=liveintervals,riscv-insert-vsetvli %s -o - | FileCheck %s
+
+---
+name: coalesce
+tracksRegLiveness: true
+noPhis: true
+body: |
+  ; CHECK-LABEL: name: coalesce
+  ; CHECK: bb.0:
+  ; CHECK-NEXT: successors: %bb.1(0x80000000)
+  ; CHECK-NEXT: {{ $}}
+  ; CHECK-NEXT: [[DEF:%[0-9]+]]:gprnox0 = IMPLICIT_DEF
+  ; CHECK-NEXT: {{ $}}
+  ; CHECK-NEXT: bb.1:
+  ; CHECK-NEXT: successors: %bb.2(0x80000000)
+  ; CHECK-NEXT: {{ $}}
+  ; CHECK-NEXT: dead [[PseudoVSETVLIX0_:%[0-9]+]]:gpr = PseudoVSETVLIX0 killed $x0, 209 /* e32, m2, ta, ma */, implicit-def $vl, implicit-def $vtype
+  ; CHECK-NEXT: renamable $v10m2 = PseudoVMV_V_I_M2 undef renamable $v10m2, 0, -1, 5 /* e32 */, 0 /* tu, mu */, implicit $vl, implicit $vtype
+  ; CHECK-NEXT: {{ $}}
+  ; CHECK-NEXT: bb.2:
+  ; CHECK-NEXT: successors: %bb.3(0x04000000), %bb.2(0x7c000000)
+  ; CHECK-NEXT: liveins: $v10m2, $v12m2
+  ; CHECK-NEXT: {{ $}}
+  ; CHECK-NEXT: BEQ undef %2:gpr, $x0, %bb.2
+  ; CHECK-NEXT: PseudoBR %bb.3
+  ; CHECK-NEXT: {{ $}}
+  ; CHECK-NEXT: bb.3:
+  ; CHECK-NEXT: successors: %bb.1(0x7c000000), %bb.4(0x04000000)
+  ; CHECK-NEXT: liveins: $v8m2
+  ; CHECK-NEXT: {{ $}}
+  ; CHECK-NEXT: $x0 = PseudoVSETVLI [[DEF]], 209 /* e32, m2, ta, ma */, implicit-def $vl, implicit-def $vtype
+  ; CHECK-NEXT: renamable $v10 = PseudoVMV_S_X undef renamable $v10, undef %2:gpr, $noreg, 5 /* e32 */, implicit $vl, implicit $vtype
+  ; CHECK-NEXT: dead renamable $v8 = PseudoVREDSUM_VS_M2_E32 undef renamable $v8, killed undef renamable $v8m2, killed undef renamable $v10, $noreg, 5 /* e32 */, 0 /* tu, mu */, implicit $vl, implicit $vtype
+  ; CHECK-NEXT: BNE undef %3:gpr, $x0, %bb.1
+  ; CHECK-NEXT: PseudoBR %bb.4
+  ; CHECK-NEXT: {{ $}}
+  ; CHECK-NEXT: bb.4:
+  ; CHECK-NEXT: PseudoRET
+  bb.0:
+    successors: %bb.1(0x80000000)
+
+    %78:gprnox0 = IMPLICIT_DEF
+
+  bb.1:
+    successors: %bb.2(0x80000000)
+
+    %46:gprnox0 = PseudoVSETVLI %78, 199 /* e8, mf2, ta, ma */, implicit-def dead $vl, implicit-def dead $vtype
+    renamable $v10m2 = PseudoVMV_V_I_M2 undef renamable $v10m2, 0, -1, 5 /* e32 */, 0 /* tu, mu */
+
+  bb.2:
+    successors: %bb.3(0x04000000), %bb.2(0x7c000000)
+    liveins: $v10m2, $v12m2
+
+    BEQ undef %54:gpr, $x0, %bb.2
+    PseudoBR %bb.3
+
+  bb.3:
+    successors: %bb.1(0x7c000000), %bb.4(0x04000000)
+    liveins: $v8m2
+
+    renamable $v10 = PseudoVMV_S_X undef renamable $v10, undef %54:gpr, %46, 5 /* e32 */
+    dead renamable $v8 = PseudoVREDSUM_VS_M2_E32 undef renamable $v8, killed undef renamable $v8m2, killed undef renamable $v10, %46, 5 /* e32 */, 0 /* tu, mu */
+    BNE undef %29:gpr, $x0, %bb.1
+    PseudoBR %bb.4
+
+  bb.4:
+    PseudoRET
+...
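The `vsetvli` immediates 199 and 209 that appear in the MIR are annotated with comments such as `/* e8, mf2, ta, ma */`. As a reading aid, the following sketch decodes such an immediate using the `vtype` field layout from the RISC-V vector specification (vlmul in bits 2:0, vsew in bits 5:3, vta in bit 6, vma in bit 7). The function name `decodeVType` is hypothetical and not part of LLVM; the output format simply mirrors the MIR comments.

```cpp
#include <cassert>
#include <string>

// Decodes the low 8 bits of a vtype immediate into "eSEW, LMUL, ta|tu, ma|mu".
// Hypothetical helper for illustration; vsew values 4..7 are reserved in the
// specification and are not treated specially here.
std::string decodeVType(unsigned VType) {
  static const char *const LMulNames[8] = {"m1",       "m2",  "m4",  "m8",
                                           "reserved", "mf8", "mf4", "mf2"};
  unsigned VLMul = VType & 0x7;        // bits 2:0
  unsigned VSew = (VType >> 3) & 0x7;  // bits 5:3, SEW = 8 << vsew
  bool TailAgnostic = (VType >> 6) & 1;
  bool MaskAgnostic = (VType >> 7) & 1;

  std::string Result = "e" + std::to_string(8u << VSew);
  Result += ", ";
  Result += LMulNames[VLMul];
  Result += TailAgnostic ? ", ta" : ", tu";
  Result += MaskAgnostic ? ", ma" : ", mu";
  return Result;
}
```

With this layout, 199 (binary 11000111) decodes to `e8, mf2, ta, ma` and 209 (binary 11010001) decodes to `e32, m2, ta, ma`, matching the comments in the test.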
