Description
System information
Type | Version/Name |
---|---|
Distribution Name | CentOS |
Distribution Version | 7.9.2009 (Core) |
Linux Kernel | 3.10.0-1160.11.1 |
Architecture | x86_64 |
ZFS Version | 2.0.1 (kmod) |
SPL Version | 2.0.1 |
Describe the problem you're observing
After updating my NAS from ZFS 0.8.5 to 2.0.1 and performing a zpool upgrade, I started encountering an issue where my Docker containers (running locally on the NAS) that mount ZFS volumes would deadlock at 100% CPU as soon as they tried to perform any heavy write operations to the pool. I observed this behavior with both the linuxserver.io Sabnzbd and qBittorrent containers. The containers appeared to function normally until I tried to download a Linux ISO; then the download would get stuck, the container would lock at 100% CPU, and nothing I tried could kill or stop the container short of rebooting.
I was able to work around this issue by downgrading the ZFS packages to 2.0.0, and everything is working correctly again.
Describe how to reproduce the problem
Create a RAID-Z2 pool using OpenZFS 2.0.1 on CentOS 7.9 (my pool has both an L2ARC and a SLOG device)
Install Docker CE 20.10 (the problem also occurs with 19.03)
Launch a linuxserver.io Sabnzbd container, passing a ZFS volume to /config and /downloads
Attempt to download an NZB
The download begins and then immediately deadlocks
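To check whether sustained writes alone, without Docker or Sabnzbd, are enough to trigger the hang, something like the following could be run directly on the host. This is a minimal sketch that has not been tested against this bug, assuming Python 3.6+ on the NAS; the function name write_test_file, the output file name, the target directory, and the sizes are hypothetical placeholders, not part of the original report.

```python
import os
import time

CHUNK_SIZE = 1 << 20  # 1 MiB per write() call


def write_test_file(target_dir: str, total_mib: int, fsync_every: int = 64) -> str:
    """Write total_mib MiB of random data into target_dir (hypothetical helper).

    Fresh random data is generated for every chunk so that compression or
    dedup on the dataset cannot shrink the actual amount written. Every
    `fsync_every` chunks the file is fsync()'d, forcing data to stable
    storage instead of leaving it buffered in memory, and a progress line
    is printed. If the progress lines stop appearing, the writer is stuck.
    """
    path = os.path.join(target_dir, "zfs_write_test.bin")
    start = time.monotonic()
    with open(path, "wb") as f:
        for i in range(total_mib):
            f.write(os.urandom(CHUNK_SIZE))
            if (i + 1) % fsync_every == 0:
                f.flush()
                os.fsync(f.fileno())
                elapsed = time.monotonic() - start
                print(f"{i + 1} MiB written, {elapsed:.1f}s elapsed", flush=True)
        # Final flush + fsync for any chunks written since the last checkpoint.
        f.flush()
        os.fsync(f.fileno())
    return path
```

For example, write_test_file("/tank/downloads", 4096), where /tank/downloads stands in for whatever dataset backs the container's /downloads, writes 4 GiB and prints a progress line every 64 MiB. Comparing its behavior on 2.0.1 and 2.0.0 might show whether the container layer is needed to reproduce the deadlock.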
Include any warning/errors/backtraces from the system logs
There was no relevant log output from Docker or in syslog. However, I did strace the process while it was locked at 100% CPU, and it was repeating this system call over and over:
strace: Process 20733 attached
select(0, NULL, NULL, NULL, {tv_sec=2, tv_usec=562986}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=3, tv_usec=0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=3, tv_usec=0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=3, tv_usec=0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=3, tv_usec=0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=3, tv_usec=0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=3, tv_usec=0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=3, tv_usec=0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=3, tv_usec=0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=3, tv_usec=0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=3, tv_usec=0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=3, tv_usec=0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=3, tv_usec=0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=3, tv_usec=0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=3, tv_usec=0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=3, tv_usec=0}) = 0 (Timeout)