Data corruption with 519851122b1703b8 ("ZFS_IOC_COUNT_FILLED does unnecessary txg_wait_synced()") #14753

Closed
@mjguzik


This is on FreeBSD, but it should not matter.

I don't have a trivial testcase. The workload which runs into it is rather I/O heavy (package building), and the issue pops up randomly. It mostly manifests as strip(1) complaining about an invalid file format in various ports; building the same port a second time works just fine.

Note that things are a little murky since there are 2 unrelated data corruption bugs, the other one being in block_cloning (fixed since).

That said, grabbing the tree as of 5198511 ("ZFS_IOC_COUNT_FILLED does unnecessary txg_wait_synced()") results in non-deterministic corruption. Going one commit back makes it go away. I verified this a bunch of times by running the workload for ~1h; with the problematic commit, the first problems show up within ~15 minutes or so. Unfortunately, due to the nature of the machinery doing the build, it is not easy to grab the corrupted data, but this can be worked out.

At the same time, if you can't repro the problem, access to the affected machine can be arranged.

In the meantime, seeing as the commit was supposed to be just an optimization, it should be reverted.

The pool uses a single drive with no special setup, created with a plain 'zpool create'.

Labels: Type: Defect (incorrect behavior, e.g. crash, hang)
