IIUC, our RVV scheduling models already account for LMUL/SEW, but not for chaining, so we may overestimate the latency of dependent RVV instructions. For example:
vsetvli a0, zero, e8, m8, ta, ma
vadd.vv v3, v2, v1
vmul.vv v5, v4, v3
Part of `v3` can be read by `vmul.vv` before `vadd.vv` commits, so the overall latency is lower than the sum of the latencies of the two instructions.
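To make the gap concrete, here is a rough toy model (not how any real core or the LLVM scheduler computes it). It assumes each instruction is split into LMUL_1-sized chunks, one chunk issues per cycle, and a consumer chunk can start as soon as the matching producer chunk is done. The function names and the latencies 4 and 6 are made up for illustration.

```python
def unchained_latency(lat_producer, lat_consumer, chunks):
    """Consumer starts only after the producer's last chunk has completed."""
    producer_done = lat_producer + (chunks - 1)
    return producer_done + lat_consumer + (chunks - 1)


def chained_latency(lat_producer, lat_consumer, chunks):
    """Consumer chunk i starts once producer chunk i is ready (toy chaining)."""
    prev_issue = -1
    done = 0
    for i in range(chunks):
        # Producer chunk i issues at cycle i and is ready lat_producer later.
        ready = i + lat_producer
        # The consumer issues at most one chunk per cycle, in order.
        issue = max(ready, prev_issue + 1)
        prev_issue = issue
        done = issue + lat_consumer
    return done


# Hypothetical numbers: vadd.vv takes 4 cycles per chunk, vmul.vv takes 6, LMUL=8.
print(unchained_latency(4, 6, 8))  # sum of both full latencies
print(chained_latency(4, 6, 8))    # lower, because the two instructions overlap
```

Under these assumptions the chained total is `lat_producer + lat_consumer + (chunks - 1)`, while the unchained total pays the `(chunks - 1)` term twice.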
We have `ReadAdvance` to bypass some cycles, but we currently don't apply this mechanism in the existing models. Instead, we set `Latency` to the LMUL_1 cycle count and keep the vector unit occupied for the whole execution by setting `AcquireAtCycles` and `ReleaseAtCycles` (correct me if I've misunderstood). I'm not sure whether this approach is OK or good enough.
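As I understand it, the effect of the current approach can be sketched like this. This is a simplified toy, not LLVM's actual scheduling algorithm: `consumer_issue_cycle` and its parameters are hypothetical names, and the numbers are made up. The dependent instruction can issue once its operand is considered ready (producer issue plus `Latency`, minus any `ReadAdvance`) and the unit it needs is free again (producer issue plus `ReleaseAtCycles` if both use the same unit).

```python
def consumer_issue_cycle(producer_issue, latency, release_at,
                         read_advance=0, same_unit=True):
    """Earliest issue cycle of a dependent instruction in a toy in-order model."""
    # Operand readiness: Latency reduced by ReadAdvance, never below zero.
    operand_ready = producer_issue + max(latency - read_advance, 0)
    # Resource availability: the producer holds its unit until release_at.
    unit_free = producer_issue + release_at if same_unit else producer_issue
    return max(operand_ready, unit_free)


# Hypothetical m8 case: Latency = 4 (LMUL_1 cycles), ReleaseAtCycles = 8.
print(consumer_issue_cycle(0, 4, 8))                   # same unit: occupancy dominates
print(consumer_issue_cycle(0, 4, 8, same_unit=False))  # other unit: starts after LMUL_1 latency
print(consumer_issue_cycle(0, 4, 8, read_advance=2, same_unit=False))
```

In this toy, a consumer on a different unit overlaps with the producer after only the LMUL_1 latency, which roughly approximates chaining, while a consumer on the same unit is serialized by the resource occupancy.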
So some questions here:
- Is the current approach good enough?
- Any other thoughts about modeling chaining?