Scrubbing exhausts all available memory #11574

Closed
@arthurfabre

System information

| Type                 | Version/Name |
|----------------------|--------------|
| Distribution Name    | Debian       |
| Distribution Version | Sid          |
| Linux Kernel         | 5.10.13      |
| Architecture         | ppc64le      |
| ZFS Version          | 2.0.2        |
| SPL Version          | 2.0.2        |

I can also reproduce this using:

  • kernel 5.10.13 with ZFS 2.0.1
  • kernel 5.9.11 with ZFS 2.0.1, 2.0.2

But not with ZFS 0.8.6. I can't reproduce it at all on a similar x86 system.

Describe the problem you're observing

When scrubbing a pool (4-drive raidz2), memory usage rises until all system memory is exhausted and the kernel panics.
If the scrub is stopped before the kernel panics (zpool scrub -s), memory usage drops back to the same level as before the scrub was started.
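Since stopping the scrub releases the memory, a small watchdog can serve as a stopgap while the bug is investigated. This is only a minimal sketch: the function names (`mem_available_kb`, `below_floor`, `watch_scrub`), the use of `MemAvailable` as the trigger, and the one-second polling interval are all assumptions for illustration, not part of ZFS or of the original report.

```bash
#!/bin/bash

# Print the MemAvailable value (in kB) from a meminfo-format file.
# Defaults to /proc/meminfo when no file is given.
mem_available_kb() {
    awk '$1 == "MemAvailable:" { print $2 }' "${1:-/proc/meminfo}"
}

# Succeed (exit 0) when the available memory is below the given floor, both in kB.
below_floor() {
    local avail_kb=$1 floor_kb=$2
    [ "$avail_kb" -lt "$floor_kb" ]
}

# Hypothetical watchdog: poll once per second and cancel the scrub on the
# given pool as soon as available memory drops below the floor.
# Nothing runs until this function is called explicitly.
watch_scrub() {
    local pool=$1 floor_kb=$2
    while sleep 1; do
        if below_floor "$(mem_available_kb)" "$floor_kb"; then
            sudo zpool scrub -s "$pool"
            break
        fi
    done
}
```

For example, `watch_scrub data 8388608` would stop the scrub on the pool `data` once less than roughly 8 GB remains available. The floor is an arbitrary illustrative value and would need tuning to how fast memory usage climbs.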

Describe how to reproduce the problem

This script reproduces the problem:

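#!/bin/bash

# Snapshot memory and slab statistics into files tagged with "$1".
function dump {
    free -m > free."$1".txt
    cat /proc/spl/kmem/slab > spl-slab."$1".txt
    sudo slabtop -o > slabtop."$1".txt
    sudo cat /proc/slabinfo > slabinfo."$1".txt
    cat /proc/meminfo > meminfo."$1".txt
}

dump before
sudo zpool scrub data     # start scrubbing the pool "data"
sleep 30                  # let memory usage climb
dump during
sudo zpool scrub -s data  # stop the scrub before memory runs out
dump after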

Used memory increases from 8 GB to 72 GB in 30 seconds, and returns to 8 GB after the scrub is stopped. vmalloc appears to account for most of this:

| Stage  | VmallocUsed          |
|--------|----------------------|
| Before | 2.4 GB (2389248 KB)  |
| During | 68 GB (68183296 KB)  |
| After  | 2.4 GB (2408192 KB)  |
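The figures above can be pulled out of the `meminfo.*.txt` dumps written by the reproduction script. The following is a rough sketch: the helper names `vmalloc_used_kb` and `vmalloc_report` are made up for illustration, and the conversion simply divides by 10^6 to line up with the GB values quoted above.

```bash
#!/bin/bash

# Print the VmallocUsed value (in kB) from a meminfo-format file.
vmalloc_used_kb() {
    awk '$1 == "VmallocUsed:" { print $2 }' "$1"
}

# Summarise VmallocUsed for the three dumps in the current directory.
vmalloc_report() {
    local stage kb
    for stage in before during after; do
        kb=$(vmalloc_used_kb "meminfo.$stage.txt")
        # Divide by 10^6 to match the decimal GB figures in the table.
        awk -v s="$stage" -v kb="$kb" \
            'BEGIN { printf "%-7s %d KB (%.1f GB)\n", s, kb, kb / 1e6 }'
    done
}
```

Running `vmalloc_report` in the directory that holds the dumps prints one line per stage, which makes it easy to compare runs across kernel or ZFS versions.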

meminfo.before.txt
slabinfo.before.txt
slabtop.before.txt
spl-slab.before.txt
free.before.txt

meminfo.during.txt
slabinfo.during.txt
slabtop.during.txt
spl-slab.during.txt
free.during.txt

meminfo.after.txt
slabinfo.after.txt
slabtop.after.txt
spl-slab.after.txt
free.after.txt

Include any warning/errors/backtraces from the system logs

Last kernel logs (including the OOM killer running) before the kernel panics (unfortunately, the panic itself does not get logged to disk):
oom.txt

Labels: Status: Triage Needed; Type: Defect