### System information
Type | Version/Name
--- | ---
Distribution Name | NixOS
Distribution Version | 21.05
Kernel Version | 5.10.50
Architecture | amd64
OpenZFS Version | 2.0.5-1
### Describe the problem you're observing
Issue #11531 identified a kernel panic triggered by the refactor in 13fac09 (Feb 2020). It was subsequently fixed in a81b812 (June 2021).
The fix was applied only to master and subsequently to the 2.1 branch. From git log spelunking, the bug is present in every release on the 2.0.x track, which, given 2.1's recent arrival, is likely the version in use by many downstream distros that don't follow bleeding-edge rolling releases.
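If anyone wants to re-check that spelunking, a couple of git commands along these lines will do it (tag names assumed to follow the repo's zfs-X.Y.Z convention):

```sh
# List every release tag whose history already contains the fix; per the
# spelunking above, no zfs-2.0.* tag should show up in the output.
git tag --contains a81b812

# Or ask directly whether the fix is an ancestor of the newest 2.0 release.
# Exit status 1 means it is not, i.e. 2.0.5 still carries the bug.
git merge-base --is-ancestor a81b812 zfs-2.0.5; echo $?
```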
a81b812 notes that the revert wasn't trivially clean, so I'd like to request that the OpenZFS maintainers backport the fix to the 2.0 branch and cut a 2.0.6 release, rather than gamble on downstream distro maintainers applying the patch themselves.
I'm unsure whether OpenZFS maintains the latest and previous stable release tracks in parallel for a while, or whether the project expects downstream consumers to upgrade to 2.1 (a much larger delta) as soon as it releases, on the basis that 2.0.x is then EOL. I looked for a backporting or "support lifetime" policy but couldn't find one.
### Describe how to reproduce the problem
Various reproduction steps are described in #11531. In my case they amounted to: run a NAS on NixOS 21.05, run a bunch of software on it that does reasonably heavy I/O, and get a kernel panic within a few days.
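I don't have a minimal reproducer. As a hedged sketch of the kind of sustained read load involved (fio standing in for transmission's pread-heavy access pattern; the path, sizes, and job count are illustrative placeholders, not a calibrated trigger):

```sh
# Sustained pread()-style random reads against a file on the ZFS dataset.
# psync is fio's pread/pwrite engine, matching the pread64 in the trace below.
fio --name=zfs-read-load --directory=/tank/scratch \
    --ioengine=psync --rw=randread --bs=16k \
    --size=8g --numjobs=4 --time_based --runtime=3600
```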
### Include any warning/errors/backtraces from the system logs
Stack trace from my panicked server, which matches the traces in #11531:
```
VERIFY3(c < SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT) failed (36028797018963967 < 32768)
PANIC at zio.c:341:zio_data_buf_alloc()
Showing stack for process 2235123
CPU: 11 PID: 2235123 Comm: transmission-da Tainted: P O 5.10.50 #1-NixOS
Hardware name: Supermicro SSG-5028R-E1CR12LA-CE010/X10SRH-CLN4F, BIOS 3.2 11/22/2019
Call Trace:
dump_stack+0x6b/0x83
spl_panic+0xd4/0xfc [spl]
? spl_kmem_cache_alloc+0x75/0x790 [spl]
? kmem_cache_alloc+0xda/0x1d0
? spl_kmem_cache_alloc+0x98/0x790 [spl]
? aggsum_add+0x175/0x190 [zfs]
? mutex_lock+0xe/0x30
? aggsum_add+0x175/0x190 [zfs]
zio_data_buf_alloc+0x55/0x60 [zfs]
abd_alloc_linear+0x8a/0xc0 [zfs]
arc_hdr_alloc_abd+0xdf/0x200 [zfs]
arc_hdr_alloc+0x104/0x170 [zfs]
arc_alloc_buf+0x46/0x150 [zfs]
dbuf_hold_copy.constprop.0+0x31/0xa0 [zfs]
dbuf_hold_impl+0x476/0x660 [zfs]
dbuf_hold+0x2c/0x60 [zfs]
dmu_buf_hold_array_by_dnode+0xdd/0x570 [zfs]
dmu_read_uio_dnode+0x49/0x140 [zfs]
? zfs_rangelock_enter_impl+0x269/0x650 [zfs]
dmu_read_uio_dbuf+0x42/0x60 [zfs]
zfs_read+0x130/0x3a0 [zfs]
zpl_iter_read+0xe4/0x190 [zfs]
new_sync_read+0x115/0x1a0
vfs_read+0x14b/0x1a0
__x64_sys_pread64+0x8d/0xc0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f494c8b5fcf
f 77 35 44 89 c7 48 89 44 24 08 e8 7c f4 ff ff 48
RSP: 002b:00007f494b35c8b0 EFLAGS: 00000293 ORIG_RAX: 0000000000000011
RAX: ffffffffffffffda RBX: 0000000000000018 RCX: 00007f494c8b5fcf
RDX: 0000000000004000 RSI: 00007f494574c000 RDI: 0000000000000018
RBP: 00007f494574c000 R08: 0000000000000000 R09: 00007f494b35c990
R10: 00000000ceff0000 R11: 0000000000000293 R12: 00000000ceff0000
R13: 0000000000004000 R14: 00007f49043d6000 R15: 0000000000000000
```
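For the curious, both numbers in the VERIFY3 line decode cleanly (constants per the OpenZFS headers: SPA_MAXBLOCKSIZE is 1 << 24, SPA_MINBLOCKSHIFT is 9). The failing value looks consistent with zio_data_buf_alloc() being handed a zero-byte size, since its chunk count is computed as (size - 1) >> SPA_MINBLOCKSHIFT on an unsigned 64-bit size_t:

```sh
# Right-hand side of the VERIFY3: SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT
echo $(( (1 << 24) >> 9 ))    # 32768, matching the panic message

# Left-hand side if size == 0: (2^64 - 1) >> 9, which is exactly 2^55 - 1
echo $(( (1 << 55) - 1 ))     # 36028797018963967
```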