
PANIC at range_tree.c:368:range_tree_remove_impl() #11893

Closed
@olidal

Description


System information

Type                  Version/Name
Distribution Name     Debian/Proxmox
Distribution Version  6.3.1
Linux Kernel          5.4.106-1-pve
Architecture          amd64 (QEMU)
ZFS Version           0.8.6-1
SPL Version           0.8.6-1

Describe the problem you're observing

After a panic message is printed, all ZFS operations hang and the machine has to be rebooted.
The panic message is included below.

Note:
This is NOT the ZFS version that ships with the distribution.
It is the zfs-0.8-release tagged version, compiled locally without errors using the default build settings.

Describe how to reproduce the problem

It is hard to tell exactly when and why, but it seems to happen under significant load, most probably when destroying snapshots.
After a reboot it happens again, not immediately but unpredictably, and the stack trace is identical each time.

I haven't yet tried locally compiled earlier versions, but the earlier version that came with Proxmox kernel 5.4.78 (ZFS 0.8.5) did not seem to produce this error (I believe that was a fresh Proxmox 6.1.1 or 6.2.1 install without an apt update).

Note that it MAY have started after I once had to destroy 13k snapshots that had accumulated in a dataset. I don't see why that would matter, and I am not entirely sure about the timing, so it may be completely unrelated.

A scrub run after the panic completed without errors, and the panic happened again after the scrub.

Note that most datasets are encrypted. This is a backup server that mostly receives encrypted snapshots sent as raw incremental (differential) replication streams. The source streams are also produced with a locally recompiled 0.8 version (I had to downgrade to this version because of another issue with ZFS 2.0.x, reported separately, that completely broke our backup scripts).

Include any warning/errors/backtraces from the system logs

[63926.387058] VERIFY3(size != 0) failed (0 != 0)
[63926.387114] PANIC at range_tree.c:368:range_tree_remove_impl()
[63926.387144] Showing stack for process 602
[63926.387156] CPU: 2 PID: 602 Comm: z_wr_iss Tainted: P           O      5.4.106-1-pve #1
[63926.387158] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[63926.387163] Call Trace:
[63926.387188]  dump_stack+0x6d/0x8b
[63926.387227]  spl_dumpstack+0x29/0x2b [spl]
[63926.387235]  spl_panic+0xd3/0xfb [spl]
[63926.387241]  ? __blk_mq_try_issue_directly+0x177/0x1c0
[63926.387264]  ? fletcher_4_incremental_byteswap+0x130/0x130 [zcommon]
[63926.387333]  ? abd_iterate_func+0x7f/0x120 [zfs]
[63926.387338]  ? update_sd_lb_stats+0x104/0x790
[63926.387350]  ? avl_find+0x5f/0x90 [zavl]
[63926.387423]  range_tree_remove_impl+0x310/0x3d0 [zfs]
[63926.387426]  ? _cond_resched+0x19/0x30
[63926.387430]  ? __kmalloc_node+0x1e0/0x330
[63926.387500]  ? metaslab_df_alloc+0x131/0x1d0 [zfs]
[63926.387571]  range_tree_remove+0x10/0x20 [zfs]
[63926.387642]  metaslab_alloc_dva+0x273/0x1100 [zfs]
[63926.387714]  metaslab_alloc+0xb4/0x240 [zfs]
[63926.387790]  zio_dva_allocate+0xd2/0x810 [zfs]
[63926.387793]  ? _cond_resched+0x19/0x30
[63926.387796]  ? mutex_lock+0x12/0x30
[63926.387866]  ? metaslab_class_throttle_reserve+0xd8/0xf0 [zfs]
[63926.387875]  ? tsd_hash_search.isra.5+0x72/0xa0 [spl]
[63926.387883]  ? tsd_get_by_thread+0x2e/0x40 [spl]
[63926.387890]  ? taskq_member+0x18/0x30 [spl]
[63926.387967]  zio_execute+0x99/0xf0 [zfs]
[63926.387975]  taskq_thread+0x2ec/0x4d0 [spl]
[63926.387980]  ? wake_up_q+0x80/0x80
[63926.388057]  ? zio_taskq_member.isra.12.constprop.17+0x70/0x70 [zfs]
[63926.388061]  kthread+0x120/0x140
[63926.388070]  ? task_done+0xb0/0xb0 [spl]
[63926.388072]  ? kthread_park+0x90/0x90
[63926.388075]  ret_from_fork+0x35/0x40
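The first log line is easier to read with the assertion's message format in mind: `VERIFY3(size != 0)` is the condition that was checked, and `(0 != 0)` shows the evaluated values of both sides, so range_tree_remove_impl() was reached with a segment size of 0 during metaslab allocation. The user-space C sketch below reproduces only that message format. SKETCH_VERIFY3U, sketch_verify3u_impl and sketch_remove_segment are hypothetical names; this is not the actual SPL macro, which calls spl_panic() instead of returning a status.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical user-space stand-in for an SPL-style VERIFY3U check.
 * Instead of panicking, it writes the failure message into buf and
 * returns 0; it returns 1 (and an empty buf) when the condition holds.
 * Note: LEFT and RIGHT are evaluated twice, which is fine for a sketch.
 */
#define SKETCH_VERIFY3U(buf, len, LEFT, OP, RIGHT)                      \
    sketch_verify3u_impl((buf), (len),                                  \
        (uint64_t)(LEFT) OP (uint64_t)(RIGHT),                          \
        #LEFT, #OP, #RIGHT, (uint64_t)(LEFT), (uint64_t)(RIGHT))

static int sketch_verify3u_impl(char *buf, size_t len, int ok,
    const char *l, const char *op, const char *r,
    uint64_t lv, uint64_t rv)
{
    if (ok) {
        buf[0] = '\0';
        return 1;
    }
    /* Same shape as the log line: expression text, then both evaluated values. */
    snprintf(buf, len, "VERIFY3(%s %s %s) failed (%llu %s %llu)",
        l, op, r, (unsigned long long)lv, op, (unsigned long long)rv);
    return 0;
}

/* Hypothetical caller: asking to remove a zero-length segment trips the check. */
static int sketch_remove_segment(char *buf, size_t len, uint64_t size)
{
    return SKETCH_VERIFY3U(buf, len, size, !=, 0);
}
```

Calling sketch_remove_segment(msg, sizeof msg, 0) fills msg with "VERIFY3(size != 0) failed (0 != 0)", the same text as the first log line, while a non-zero size passes the check.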

Labels: Status: Triage Needed (new issue which needs to be triaged); Type: Defect (incorrect behavior, e.g. crash, hang)