Cache-unfriendly filesystem usage, memory fragmentation and ARC #16978

Open
@runderwo

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Ubuntu/Debian |
| Distribution Version | LTS/latest |
| Kernel Version | 6.8-6.11 |
| Architecture | x86_64 |
| OpenZFS Version | 2.2.2-2.2.6 |

Describe the problem you're observing

After moderate uptime of a few weeks, when a program tries to read or index the whole filesystem or a large chunk of it, the system seizes up and becomes unresponsive to input and network traffic for 15-20 minutes. Eventually it recovers to a sluggish but usable state, with the offending process still running and consuming CPU time and disk I/O. In that state a tool like atop shows lingering heavy free page scan activity, despite up to 10GiB of free/available memory! (The Linux page cache has dropped to zero by this time.)

ARC is maxed out, at 97% of its limit (almost 50% of system RAM with the default settings).
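For anyone wanting to check the same figure on their own machine, here is a minimal sketch that reads the ARC kstats. It assumes the usual Linux location `/proc/spl/kstat/zfs/arcstats` and the `size` and `c_max` rows; the helper names `parse_arcstats` and `arc_fill_ratio` are mine, not part of any ZFS tooling.

```python
from pathlib import Path

# Assumed location of the ARC kstats on Linux OpenZFS.
ARCSTATS = Path("/proc/spl/kstat/zfs/arcstats")


def parse_arcstats(text):
    """Return {name: value} for every 'name type data' row with an integer value."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skips the leading kstat header line
        try:
            stats[parts[0]] = int(parts[2])
        except ValueError:
            continue  # skips the 'name type data' column header
    return stats


def arc_fill_ratio(stats):
    """Current ARC size as a fraction of its configured maximum (c_max)."""
    return stats["size"] / stats["c_max"]


if __name__ == "__main__" and ARCSTATS.exists():
    stats = parse_arcstats(ARCSTATS.read_text())
    print(f"ARC size {stats['size']} of c_max {stats['c_max']} "
          f"({arc_fill_ratio(stats):.0%})")
```

On a system without ZFS loaded the file does not exist and the script prints nothing.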

/proc/buddyinfo shows no free contiguous blocks >= 1MiB in the steady state, and it can be even worse right after the "seizure", with no free blocks >= 128KiB.
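The buddyinfo check can be made repeatable with a small script. This is a rough sketch, assuming the standard `/proc/buddyinfo` layout (one row per node/zone, followed by free block counts for order 0 upward) and a 4KiB page size unless told otherwise. The function names are hypothetical helpers, not an existing tool.

```python
import os
from pathlib import Path

BUDDYINFO = Path("/proc/buddyinfo")


def parse_buddyinfo(text):
    """Map (node, zone) to a list of free block counts indexed by order."""
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected row shape: Node 0, zone Normal <count order 0> <count order 1> ...
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        node = int(parts[1].rstrip(","))
        zones[(node, parts[3])] = [int(n) for n in parts[4:]]
    return zones


def free_blocks_at_least(counts, min_bytes, page_size=4096):
    """Count free blocks whose size (page_size << order) is at least min_bytes."""
    return sum(n for order, n in enumerate(counts)
               if (page_size << order) >= min_bytes)


if __name__ == "__main__" and BUDDYINFO.exists():
    page_size = os.sysconf("SC_PAGE_SIZE")
    for (node, zone), counts in parse_buddyinfo(BUDDYINFO.read_text()).items():
        big = free_blocks_at_least(counts, 1024 * 1024, page_size)
        mid = free_blocks_at_least(counts, 128 * 1024, page_size)
        print(f"node {node} zone {zone}: >=128KiB: {mid}, >=1MiB: {big}")
```

Running it periodically (for example from `watch`) while the indexing job runs should show the higher-order counts draining to zero if the same fragmentation is occurring.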

I suspect the partial recovery is thanks to kcompactd activity. I think ZFS should drop cached file blocks from the ARC not only when the kernel low watermark is reached, but also when higher-order free pages become exhausted.
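To make the suggested condition concrete, here is a toy model of the two triggers. It is purely illustrative and does not reflect how OpenZFS or the kernel actually decide to shrink the ARC; `MemoryState`, `should_evict`, and the order threshold of 5 (128KiB with 4KiB pages) are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MemoryState:
    """Hypothetical snapshot of the inputs to the eviction decision."""
    free_bytes: int
    low_watermark_bytes: int
    free_blocks_by_order: List[int]  # index = buddy allocator order


def should_evict(state, min_order=5, min_blocks=1):
    """Return True if either trigger fires.

    Trigger 1 (watermark): free memory is at or below the low watermark.
    Trigger 2 (proposed): fewer than min_blocks free blocks exist at
    min_order or above, i.e. higher-order pages are exhausted.
    """
    below_watermark = state.free_bytes <= state.low_watermark_bytes
    high_order_free = sum(state.free_blocks_by_order[min_order:])
    fragmented = high_order_free < min_blocks
    return below_watermark or fragmented
```

With this model, the situation described above (about 10GiB free but nothing at order 5 or higher) would trigger eviction through the second condition even though the watermark check alone would not.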

Describe how to reproduce the problem

Simulate normal memory fragmentation on a host, including multiple hibernate/resume cycles, then run duplicity, tracker3-miner, or similar programs which ingest the whole filesystem in a cache-unfriendly and ZFS-unfriendly way while monitoring the situation with atop.

Include any warning/errors/backtraces from the system logs

Dec 22 16:09:02 desktop kernel: zfs: module license 'CDDL' taints kernel.
Dec 22 16:09:02 desktop kernel: Disabling lock debugging due to kernel taint
Dec 22 16:09:02 desktop kernel: zfs: module license taints kernel.
Dec 22 16:09:02 desktop kernel: calling  openzfs_init+0x0/0xce0 [zfs] @ 428
Dec 22 16:09:02 desktop kernel: ZFS: Loaded module v2.2.2-0ubuntu9.1, ZFS pool version 5000, ZFS filesystem version 5
[..]
Jan 22 14:05:06 desktop systemd-journald[935]: Under memory pressure, flushing caches.
[..]
Jan 22 14:16:31 desktop kernel: INFO: task chrome:3547537 blocked for more than 122 seconds.
Jan 22 14:16:31 desktop kernel:       Tainted: P           OE      6.8.0-51-generic #52-Ubuntu
Jan 22 14:16:31 desktop kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[etc, etc]

    Labels

    Type: Defect (Incorrect behavior, e.g. crash, hang)
