Skip to content

Deadlock when writing to ZFS 2.0.1 filesystem from Docker container #11476

Closed
@Do-while-Bacon

Description

@Do-while-Bacon

System information

Type Version/Name
Distribution Name CentOS
Distribution Version 7.9.2009 (Core)
Linux Kernel 3.10.0-1160.11.1
Architecture x86_64
ZFS Version 2.0.1 (kmod)
SPL Version 2.0.1

Describe the problem you're observing

After updating my NAS from ZFS 0.8.5 to 2.0.1 and performing a zpool upgrade, I started encountering an issue where my docker containers (running local on the NAS) that mounted ZFS volumes would deadlock at 100% CPU as soon as they tried to perform any heavy write operations to the pool. I observed this behavior with both the linuxserver.io Sabnzbd and qBittorrent containers. The containers would appear to function normally until I tried to download a Linux ISO, then, the download would get stuck, container would lock at 100% CPU, and nothing would work to kill or stop the container until I rebooted.

I was able to work around this issue by downgrading ZFS packages to 2.0.0. Everything is working correctly again.

Describe how to reproduce the problem

Create a RAID-Z2 pool using OpenZFS 2.0.1 on CentOS 7.9 (my pool has a both an L2ARC and a SLOG device)
Install Docker CE 20.10 (problem occurs with 19.03 too)
Launch a linuxserver.io Sabnzbd container, passing a ZFS volume to /config and /downloads
Attempt to download a NZB
Download will begin and then immediately deadlock

Include any warning/errors/backtraces from the system logs

There was no relevant log output from the docker application or in syslog, however, I did strace the process while it was locked at 100% CPU and it was repeating this system call over and over:

strace: Process 20733 attached
select(0, NULL, NULL, NULL, {tv_sec=2, tv_usec=562986}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=3, tv_usec=0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=3, tv_usec=0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=3, tv_usec=0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=3, tv_usec=0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=3, tv_usec=0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=3, tv_usec=0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=3, tv_usec=0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=3, tv_usec=0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=3, tv_usec=0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=3, tv_usec=0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=3, tv_usec=0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=3, tv_usec=0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=3, tv_usec=0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=3, tv_usec=0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=3, tv_usec=0}) = 0 (Timeout)

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type: DefectIncorrect behavior (e.g. crash, hang)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions