Description
Problem
The relayPackets action in e2e is non-deterministic and signed_blocks_window is set too low, which causes test results to be misinterpreted.
Closing criteria
- Make the relayPackets function wait for at least 1 block to be produced to ensure all Txs have been included in a block.
- Make signed_blocks_window at least 10 (instead of 2) so validators have some room to breathe.
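To illustrate why a window of 2 leaves so little room, here is a rough sketch of the downtime arithmetic. It assumes the Cosmos SDK slashing rule that a validator is treated as down once its missed blocks exceed signed_blocks_window minus the minimum number of blocks it must sign in that window. The function name maxMissedBlocks is hypothetical, the min_signed_per_window value of 0.5 is an assumption (this issue does not state the value used in e2e), and the rounding may differ slightly from the SDK's decimal arithmetic.

```go
package main

import "math"

// maxMissedBlocks returns how many blocks a validator may miss inside the
// sliding window before it would be considered down (hypothetical helper).
// minSignedPerWindow is the fraction of the window that must be signed;
// 0.5 is an assumed value, not one taken from the e2e configuration.
func maxMissedBlocks(signedBlocksWindow int64, minSignedPerWindow float64) int64 {
	// Minimum number of blocks that must be signed within the window.
	minSigned := int64(math.Round(minSignedPerWindow * float64(signedBlocksWindow)))
	return signedBlocksWindow - minSigned
}
```

Under these assumptions, maxMissedBlocks(2, 0.5) is 1 and maxMissedBlocks(10, 0.5) is 5, so with a window of 2 a validator has almost no tolerance for a slow or late block during a test run.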
Problem details
At the time of writing, the relayPackets helper func in e2e testing looks like this:
func (tr TestRun) relayPackets(
	action relayPacketsAction,
	verbose bool,
) {
	// hermes clear packets ibc0 transfer channel-13
	//#nosec G204 -- Bypass linter warning for spawning subprocess with cmd arguments.
	cmd := exec.Command("docker", "exec", tr.containerConfig.instanceName, "hermes", "clear", "packets",
		"--chain", string(tr.chainConfigs[action.chain].chainId),
		"--port", action.port,
		"--channel", "channel-"+fmt.Sprint(action.channel),
	)
	if verbose {
		log.Println("relayPackets cmd:", cmd.String())
	}
	bz, err := cmd.CombinedOutput()
	if err != nil {
		log.Fatal(err, "\n", string(bz))
	}
}
Note that the function simply invokes hermes clear packets and has no way of confirming that the packet send Txs were included in a block.
This can lead to confusing situations where the packets are relayed but the state is unmodified because not enough time has elapsed (the Tx was not yet included in a block, so no state was modified).
This was especially confusing during Downtime tests for soft opt-out.
In this scenario we have the following steps:
- redelegate stake from a validator so it is in bottom 5% of validator power
- relay info about the redelegation
- initiate downtime by excluding the validator from the network
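Here, step 3 (initiating downtime) would begin before the results of step 2 (relaying the redelegation) were committed to state. The validator therefore still held its pre-redelegation power, so when the downtime was initiated, > 2/3 of validator power would be excluded from the network, causing the chain to halt.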
The solution
Wait at least one block after relaying to ensure all operations are completed.
func (tr TestRun) relayPackets(
	action relayPacketsAction,
	verbose bool,
) {
	// hermes clear packets ibc0 transfer channel-13
	//#nosec G204 -- Bypass linter warning for spawning subprocess with cmd arguments.
	cmd := exec.Command("docker", "exec", tr.containerConfig.instanceName, "hermes", "clear", "packets",
		"--chain", string(tr.chainConfigs[action.chain].chainId),
		"--port", action.port,
		"--channel", "channel-"+fmt.Sprint(action.channel),
	)
	if verbose {
		log.Println("relayPackets cmd:", cmd.String())
	}
	bz, err := cmd.CombinedOutput()
	if err != nil {
		log.Fatal(err, "\n", string(bz))
	}
	tr.waitBlocks(action.chain, 1, 10*time.Second) // wait for block inclusion
}