
refactor packet relaying and increase signed_blocks_window in e2e tests #967

Closed
@MSalopek

Description

Problem

The relayPackets action in the e2e tests is non-deterministic, and signed_blocks_window is set too low. Together these cause test results to be misinterpreted.

Closing criteria

Make the relayPackets function wait for at least 1 block to be produced, so that all Txs have been included in a block.
Make signed_blocks_window at least 10 (instead of 2) so validators have some room to breathe.
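
As a rough illustration of the second criterion, the sketch below patches signed_blocks_window in a genesis document. It assumes the parameter lives at app_state.slashing.params.signed_blocks_window, as in a standard Cosmos SDK genesis file; setSignedBlocksWindow and descend are hypothetical helpers for this example and are not part of the e2e framework, which may apply genesis changes differently.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strconv"
)

// setSignedBlocksWindow is a hypothetical helper: it returns the genesis JSON
// with app_state.slashing.params.signed_blocks_window set to window.
func setSignedBlocksWindow(genesis []byte, window int64) ([]byte, error) {
	var doc map[string]interface{}
	dec := json.NewDecoder(bytes.NewReader(genesis))
	dec.UseNumber() // keep numeric literals exact instead of converting to float64
	if err := dec.Decode(&doc); err != nil {
		return nil, fmt.Errorf("parse genesis: %w", err)
	}
	params, err := descend(doc, "app_state", "slashing", "params")
	if err != nil {
		return nil, err
	}
	// Cosmos SDK genesis files encode int64 values as JSON strings.
	params["signed_blocks_window"] = strconv.FormatInt(window, 10)
	return json.Marshal(doc)
}

// descend walks nested JSON objects along keys and returns the innermost one.
func descend(doc map[string]interface{}, keys ...string) (map[string]interface{}, error) {
	cur := doc
	for _, k := range keys {
		next, ok := cur[k].(map[string]interface{})
		if !ok {
			return nil, fmt.Errorf("genesis is missing object %q", k)
		}
		cur = next
	}
	return cur, nil
}

func main() {
	// Minimal stand-in for a real genesis file, for demonstration only.
	genesis := []byte(`{"app_state":{"slashing":{"params":{"signed_blocks_window":"2"}}}}`)
	out, err := setSignedBlocksWindow(genesis, 10)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

With a window of 2, missing a single block already puts a validator at half of its window, so a larger window makes brief, unintended gaps less likely to be read as downtime.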

Problem details

At the time of writing, the relayPackets helper function in the e2e tests looks like this:

func (tr TestRun) relayPackets(
	action relayPacketsAction,
	verbose bool,
) {
	// hermes clear packets ibc0 transfer channel-13
	//#nosec G204 -- Bypass linter warning for spawning subprocess with cmd arguments.
	cmd := exec.Command("docker", "exec", tr.containerConfig.instanceName, "hermes", "clear", "packets",
		"--chain", string(tr.chainConfigs[action.chain].chainId),
		"--port", action.port,
		"--channel", "channel-"+fmt.Sprint(action.channel),
	)
	if verbose {
		log.Println("relayPackets cmd:", cmd.String())
	}

	bz, err := cmd.CombinedOutput()
	if err != nil {
		log.Fatal(err, "\n", string(bz))
	}
}

Please notice that the function simply invokes hermes clear packets and does not have a way of confirming that packet send Txs were included in a block.

This can lead to confusing situations where the packets are relayed but the state is unmodified because not enough time has elapsed (the Tx was not yet included in a block, so no state was modified).

This was especially confusing during Downtime tests for soft opt-out.
In this scenario we have the following steps:

  1. redelegate stake from a validator so it is in bottom 5% of validator power
  2. relay info about the redelegation
  3. initiate downtime by excluding the validator from the network

Here, step 3 would begin before the results of step 2 were committed to state. When the downtime was initiated, > 2/3 of validator power would be excluded from the network, causing the chain to halt.

The solution
Wait for at least one block after relaying so that the relayed Txs are included in a block before the test continues.

func (tr TestRun) relayPackets(
	action relayPacketsAction,
	verbose bool,
) {
	// hermes clear packets ibc0 transfer channel-13
	//#nosec G204 -- Bypass linter warning for spawning subprocess with cmd arguments.
	cmd := exec.Command("docker", "exec", tr.containerConfig.instanceName, "hermes", "clear", "packets",
		"--chain", string(tr.chainConfigs[action.chain].chainId),
		"--port", action.port,
		"--channel", "channel-"+fmt.Sprint(action.channel),
	)
	if verbose {
		log.Println("relayPackets cmd:", cmd.String())
	}

	bz, err := cmd.CombinedOutput()
	if err != nil {
		log.Fatal(err, "\n", string(bz))
	}

	tr.waitBlocks(action.chain, 1, 10*time.Second) // wait for block inclusion
}

Labels

scope: testing (Code review, testing, making sure the code is following the specification.)
type: bug (Issues that need priority attention -- something isn't working)
type: tech-debt (Slows down development in the long run)