
PANIC: zfs: adding existent segment to range tree (offset=11f694000 size=7000) and pool is corrupted after reboot #15619

Closed
@mtippmann


System information

Type                  Version/Name
Distribution Name     Arch Linux
Distribution Version  rolling
Kernel Version        6.6.3-arch1-1
Architecture          amd64
OpenZFS Version       zfs-2.2.99-241_g3e4bef52b0 / zfs-kmod-2.2.99-241_g3e4bef52b0 - git as of 01.12.23

Describe the problem you're observing

I get this oops when compiling OpenWrt on a pool running current git with these module parameters set:

zfs_bclone_enabled=1
zfs_dmu_offset_next_sync=1

Dec 01 12:50:47 futro2 kernel: PANIC: zfs: adding existent segment to range tree (offset=11f694000 size=7000)
Dec 01 12:50:47 futro2 kernel: Showing stack for process 288
Dec 01 12:50:47 futro2 kernel: CPU: 3 PID: 288 Comm: txg_sync Tainted: P     U     OE      6.6.3-arch1-1 #1 6156c717f7d423f5954ce718462aaaaa43b9110d
Dec 01 12:50:47 futro2 kernel: Hardware name: FUJITSU FUTRO S740/D3544-A1, BIOS V5.0.0.13 R1.13.0 for D3544-A1x                    09/23/2022
Dec 01 12:50:47 futro2 kernel: Call Trace:
Dec 01 12:50:47 futro2 kernel:  <TASK>
Dec 01 12:50:47 futro2 kernel:  dump_stack_lvl+0x47/0x60
Dec 01 12:50:47 futro2 kernel:  vcmn_err+0xdf/0x120 [spl 8e72ae35b64a0f5a2b6fea420c9c9e09f33fc00d]
Dec 01 12:50:47 futro2 kernel:  zfs_panic_recover+0x79/0xa0 [zfs 90d504f36e61841082f23aea7ae276b260ab21d6]
Dec 01 12:50:47 futro2 kernel:  range_tree_add_impl+0x28f/0xea0 [zfs 90d504f36e61841082f23aea7ae276b260ab21d6]
Dec 01 12:50:47 futro2 kernel:  ? __pfx_range_tree_add+0x10/0x10 [zfs 90d504f36e61841082f23aea7ae276b260ab21d6]
Dec 01 12:50:47 futro2 kernel:  range_tree_vacate+0x85/0x230 [zfs 90d504f36e61841082f23aea7ae276b260ab21d6]
Dec 01 12:50:47 futro2 kernel:  metaslab_sync_done+0x149/0x540 [zfs 90d504f36e61841082f23aea7ae276b260ab21d6]
Dec 01 12:50:47 futro2 kernel:  vdev_sync_done+0x3a/0x90 [zfs 90d504f36e61841082f23aea7ae276b260ab21d6]
Dec 01 12:50:47 futro2 kernel:  spa_sync+0x893/0x1070 [zfs 90d504f36e61841082f23aea7ae276b260ab21d6]
Dec 01 12:50:47 futro2 kernel:  txg_sync_thread+0x1fe/0x3a0 [zfs 90d504f36e61841082f23aea7ae276b260ab21d6]
Dec 01 12:50:47 futro2 kernel:  ? __pfx_txg_sync_thread+0x10/0x10 [zfs 90d504f36e61841082f23aea7ae276b260ab21d6]
Dec 01 12:50:47 futro2 kernel:  ? __pfx_thread_generic_wrapper+0x10/0x10 [spl 8e72ae35b64a0f5a2b6fea420c9c9e09f33fc00d]
Dec 01 12:50:47 futro2 kernel:  thread_generic_wrapper+0x5b/0x70 [spl 8e72ae35b64a0f5a2b6fea420c9c9e09f33fc00d]
Dec 01 12:50:47 futro2 kernel:  kthread+0xe5/0x120
Dec 01 12:50:47 futro2 kernel:  ? __pfx_kthread+0x10/0x10
Dec 01 12:50:47 futro2 kernel:  ret_from_fork+0x31/0x50
Dec 01 12:50:47 futro2 kernel:  ? __pfx_kthread+0x10/0x10
Dec 01 12:50:47 futro2 kernel:  ret_from_fork_asm+0x1b/0x30
Dec 01 12:50:47 futro2 kernel:  </TASK>
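
For readers comparing this against other reports, the segment named in the panic message can be converted to decimal byte values. This is only a sketch: it assumes both `offset` and `size` are printed in hexadecimal without a `0x` prefix, which the value `11f694000` suggests. `hex_to_dec` is a helper name made up for this example.

```shell
#!/bin/sh
# Convert a hex value without 0x prefix (as printed in the panic) to decimal.
# hex_to_dec is a hypothetical helper, not part of any ZFS tooling.
hex_to_dec() {
    printf '%d\n' "$((0x$1))"
}

# Values copied from the panic message; assumed to be hexadecimal.
offset=11f694000
size=7000

start=$(hex_to_dec "$offset")
length=$(hex_to_dec "$size")
end=$((start + length))

echo "segment start: $start bytes"
echo "segment size:  $length bytes"
printf 'segment end:   %d bytes (0x%x)\n' "$end" "$end"
```

Under that assumption the segment is 28672 bytes long, starting at byte 4821958656.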

I/O hangs, and after a reboot the pool can no longer be imported:

[image: IMG_20231201_130329]

Describe how to reproduce the problem

Unfortunately this is somewhat tricky. It happens during a kernel build when the vDSO library is generated, which is done by a C program. I've already detailed all the steps in #15513 (comment), but this appears to be a slightly different bug. #15485 might also be similar.

I can reproduce it reliably by building OpenWrt:

$ git clone https://github.com/openwrt/openwrt
$ cd openwrt 
$ ./scripts/feeds update -a && ./scripts/feeds install -a 
$ make defconfig
$ make -j$(nproc) 
...
machine hangs 
...

Unfortunately I still haven't figured out how to isolate the vDSO generation, but building OpenWrt until the bug is triggered doesn't take that long. The requirements for the build are documented here: https://openwrt.org/docs/guide-developer/toolchain/install-buildsystem#linux_gnu-linux_distributions
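
Since the report depends on `zfs_bclone_enabled=1` and `zfs_dmu_offset_next_sync=1`, it may help to confirm both values before starting the build. A minimal sketch, assuming the loaded module exposes its parameters under `/sys/module/zfs/parameters`; the function name `show_zfs_params` and the `ZFS_PARAM_DIR` override are hypothetical and exist only for this example.

```shell
#!/bin/sh
# Print the current value of each named ZFS module parameter.
# ZFS_PARAM_DIR is a hypothetical override for this sketch; by default the
# usual sysfs location for module parameters is assumed.
show_zfs_params() {
    param_dir=${ZFS_PARAM_DIR:-/sys/module/zfs/parameters}
    for name in "$@"; do
        if [ -r "$param_dir/$name" ]; then
            printf '%s=%s\n' "$name" "$(cat "$param_dir/$name")"
        else
            # Module not loaded, or the parameter does not exist in this build.
            printf '%s=<not available>\n' "$name"
        fi
    done
}

# The two parameters named in the description.
show_zfs_params zfs_bclone_enabled zfs_dmu_offset_next_sync
```

If either line does not show `1`, the setup differs from the one described here.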

Labels: Type: Defect (incorrect behavior, e.g. crash, hang)
