Description
System information
Type | Version/Name |
---|---|
Distribution Name | Debian/proxmox |
Distribution Version | 6.3.1 |
Linux Kernel | 5.4.106-1-pve |
Architecture | amd64 (qemu) |
ZFS Version | 0.8.6-1 |
SPL Version | 0.8.6-1 |
Describe the problem you're observing
After a panic message is printed, all ZFS operations hang and the machine has to be rebooted.
See below for panic message.
Note:
This is NOT the ZFS version that ships with the distribution.
It is a local build of the zfs-0.8-release tagged version, compiled without errors using the default build settings.
Describe how to reproduce the problem
Hard to tell exactly when and why, but it seems to happen under significant load, most probably while destroying snapshots.
After a reboot it happens again, not immediately but at unpredictable times. The stack trace is identical each time it reproduces.
I haven't yet tried locally compiled earlier versions, but the earlier version that shipped with Proxmox kernel 5.4.78 (ZFS 0.8.5) did not seem to produce this error (I guess that corresponds to a fresh install of Proxmox 6.1.1 or 6.2.1 without apt update).
Note that it MAY have started after I once had to destroy 13k snapshots that had accumulated in a dataset. But I don't see why that would cause it, and I am not totally sure about the timing. It may be completely unrelated.
A scrub run after one occurrence completed without errors, and the panic happened again after the scrub.
Note that most datasets are encrypted. This is a backup server, mostly receiving encrypted snapshots sent as raw differential replication streams. The source streams are also produced using a locally recompiled 0.8 version (I had to downgrade to this version because of another issue in ZFS 2.0.X, reported separately, that totally broke our backup scripts).
Include any warning/errors/backtraces from the system logs
[63926.387058] VERIFY3(size != 0) failed (0 != 0)
[63926.387114] PANIC at range_tree.c:368:range_tree_remove_impl()
[63926.387144] Showing stack for process 602
[63926.387156] CPU: 2 PID: 602 Comm: z_wr_iss Tainted: P O 5.4.106-1-pve #1
[63926.387158] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[63926.387163] Call Trace:
[63926.387188] dump_stack+0x6d/0x8b
[63926.387227] spl_dumpstack+0x29/0x2b [spl]
[63926.387235] spl_panic+0xd3/0xfb [spl]
[63926.387241] ? __blk_mq_try_issue_directly+0x177/0x1c0
[63926.387264] ? fletcher_4_incremental_byteswap+0x130/0x130 [zcommon]
[63926.387333] ? abd_iterate_func+0x7f/0x120 [zfs]
[63926.387338] ? update_sd_lb_stats+0x104/0x790
[63926.387350] ? avl_find+0x5f/0x90 [zavl]
[63926.387423] range_tree_remove_impl+0x310/0x3d0 [zfs]
[63926.387426] ? _cond_resched+0x19/0x30
[63926.387430] ? __kmalloc_node+0x1e0/0x330
[63926.387500] ? metaslab_df_alloc+0x131/0x1d0 [zfs]
[63926.387571] range_tree_remove+0x10/0x20 [zfs]
[63926.387642] metaslab_alloc_dva+0x273/0x1100 [zfs]
[63926.387714] metaslab_alloc+0xb4/0x240 [zfs]
[63926.387790] zio_dva_allocate+0xd2/0x810 [zfs]
[63926.387793] ? _cond_resched+0x19/0x30
[63926.387796] ? mutex_lock+0x12/0x30
[63926.387866] ? metaslab_class_throttle_reserve+0xd8/0xf0 [zfs]
[63926.387875] ? tsd_hash_search.isra.5+0x72/0xa0 [spl]
[63926.387883] ? tsd_get_by_thread+0x2e/0x40 [spl]
[63926.387890] ? taskq_member+0x18/0x30 [spl]
[63926.387967] zio_execute+0x99/0xf0 [zfs]
[63926.387975] taskq_thread+0x2ec/0x4d0 [spl]
[63926.387980] ? wake_up_q+0x80/0x80
[63926.388057] ? zio_taskq_member.isra.12.constprop.17+0x70/0x70 [zfs]
[63926.388061] kthread+0x120/0x140
[63926.388070] ? task_done+0xb0/0xb0 [spl]
[63926.388072] ? kthread_park+0x90/0x90
[63926.388075] ret_from_fork+0x35/0x40
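
For readers triaging the trace above, here is a minimal Python sketch that extracts the panic location and the reliable call chain from the log. It is only a reading aid, not part of ZFS or any existing tool: the names `parse_panic`, `PANIC_RE`, and `FRAME_RE` are hypothetical, and the embedded log is an abbreviated copy of the lines above. It relies on the Linux kernel convention that frames prefixed with `?` are addresses found on the stack that the unwinder could not confirm, so they are skipped.

```python
import re

# Abbreviated copy of the kernel log from this report.
LOG = """\
[63926.387058] VERIFY3(size != 0) failed (0 != 0)
[63926.387114] PANIC at range_tree.c:368:range_tree_remove_impl()
[63926.387163] Call Trace:
[63926.387188] dump_stack+0x6d/0x8b
[63926.387227] spl_dumpstack+0x29/0x2b [spl]
[63926.387235] spl_panic+0xd3/0xfb [spl]
[63926.387350] ? avl_find+0x5f/0x90 [zavl]
[63926.387423] range_tree_remove_impl+0x310/0x3d0 [zfs]
[63926.387500] ? metaslab_df_alloc+0x131/0x1d0 [zfs]
[63926.387571] range_tree_remove+0x10/0x20 [zfs]
[63926.387642] metaslab_alloc_dva+0x273/0x1100 [zfs]
[63926.387714] metaslab_alloc+0xb4/0x240 [zfs]
[63926.387790] zio_dva_allocate+0xd2/0x810 [zfs]
[63926.387866] ? metaslab_class_throttle_reserve+0xd8/0xf0 [zfs]
[63926.387967] zio_execute+0x99/0xf0 [zfs]
[63926.387975] taskq_thread+0x2ec/0x4d0 [spl]
[63926.388061] kthread+0x120/0x140
[63926.388075] ret_from_fork+0x35/0x40
"""

# Matches the SPL panic banner: "PANIC at <file>:<line>:<function>()".
PANIC_RE = re.compile(r"PANIC at (?P<file>[^:]+):(?P<line>\d+):(?P<func>\w+)\(\)")

# Matches one stack frame: optional "? " marker, symbol+offset/size, optional [module].
FRAME_RE = re.compile(
    r"^\[\s*[\d.]+\]\s+(?P<guess>\? )?(?P<sym>[\w.]+)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<mod>\w+)\])?\s*$"
)


def parse_panic(log):
    """Return ((file, line, function), [(symbol, module), ...]) for reliable frames."""
    panic = None
    frames = []
    for line in log.splitlines():
        m = PANIC_RE.search(line)
        if m:
            panic = (m["file"], int(m["line"]), m["func"])
            continue
        m = FRAME_RE.match(line)
        if m and not m["guess"]:  # drop unverified "?" frames
            frames.append((m["sym"], m["mod"]))
    return panic, frames


if __name__ == "__main__":
    panic, frames = parse_panic(LOG)
    print("panic at:", panic)
    # Innermost frame first, restricted to the zfs module.
    print(" <- ".join(sym for sym, mod in frames if mod == "zfs"))
```

Restricted to the `zfs` module, the reliable chain is `range_tree_remove_impl`, called from `range_tree_remove`, `metaslab_alloc_dva`, `metaslab_alloc`, `zio_dva_allocate`, and `zio_execute`. In other words, the assertion fires on the block allocation path of a write-issue thread (`z_wr_iss`).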