
kernel slab high memory usage during scrub OOM kill other applications #11429

Closed
@ufou

System information

Type                  Version/Name
Distribution Name     Ubuntu
Distribution Version  18.04
Linux Kernel          5.4.0-58-generic
Architecture          amd64
ZFS Version           0.8.3-1ubuntu12.5
SPL Version           0.8.3-1ubuntu12.5

Describe the problem you're observing

We run the Ubuntu HWE kernel, which means we get the 0.8.* version of zfs/spl. Our issue is probably the same as:
#8662

We run MySQL (MariaDB, actually) using zfs volumes for data and backup space (separate volumes). A scrub runs from cron every 4 weeks and takes ~4 hours. On our replicas the scrub generally completes without issue, but on the primary we have seen MySQL crash (OOM killed on the last crash).

The servers are Intel Xeon Gold with 512 GB RAM; the disks are 6 x Intel S4510 3.8 TB SSDs in 3 mirrored sets.

Describe how to reproduce the problem

Start a scrub on the data volume, then watch the SUnreclaim value in /proc/meminfo:

zpool scrub mysqldata
Every 2.0s: cat /proc/meminfo | grep claim    Mon Jan  4 19:04:27 2021

KReclaimable:    2442512 kB
SReclaimable:    2442512 kB
SUnreclaim:      1932272 kB 

About 30 seconds later:

Every 2.0s: cat /proc/meminfo | grep claim    Mon Jan  4 19:05:02 2021

KReclaimable:    2442976 kB
SReclaimable:    2442976 kB
SUnreclaim:      7637196 kB

Then stop the scrub:

zpool scrub -s mysqldata

Check again:

Every 2.0s: cat /proc/meminfo | grep claim    Mon Jan  4 19:06:05 2021

KReclaimable:    2442976 kB
SReclaimable:    2442976 kB
SUnreclaim:      1970984 kB
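
To put numbers on the three samples above, here is a small sketch that extracts SUnreclaim and computes the growth and the amount released after the stop. The helper name `sunreclaim_kb` is made up for this illustration, and the sample values are hard-coded from the output quoted above so the arithmetic is reproducible.

```shell
#!/bin/sh
# Hypothetical helper: print the SUnreclaim value (in kB) from
# /proc/meminfo-style text on stdin.
sunreclaim_kb() {
    awk '$1 == "SUnreclaim:" { print $2 }'
}

# The three samples quoted in this report.
before=$(printf 'SUnreclaim:      1932272 kB\n' | sunreclaim_kb)
during=$(printf 'SUnreclaim:      7637196 kB\n' | sunreclaim_kb)
after=$(printf 'SUnreclaim:      1970984 kB\n' | sunreclaim_kb)

elapsed_s=35                      # 19:04:27 -> 19:05:02
growth_kb=$((during - before))    # growth while the scrub was running
released_kb=$((during - after))   # freed once the scrub was stopped

echo "growth:   ${growth_kb} kB ($((growth_kb / 1024)) MiB) in ${elapsed_s}s"
echo "rate:     $((growth_kb / elapsed_s / 1024)) MiB/s"
echo "released: ${released_kb} kB"
```

With these samples the growth works out to roughly 5.4 GiB in 35 seconds, and almost all of it is released after the scrub is stopped. On a live system the same helper can be fed from the real file: `sunreclaim_kb < /proc/meminfo`.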

Changing /sys/module/zfs/parameters/zfs_scan_mem_lim_fact or /sys/module/zfs/parameters/zfs_scan_mem_lim_soft_fact had no effect on the SUnreclaim value. Trying to add /sys/module/zfs/parameters/zfs_scrub_delay failed with "permission denied", even as root.
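
For anyone checking the same tunables: sysfs only exposes parameters that the loaded module actually defines, and new files cannot be created there, so a "permission denied" when adding a parameter may simply mean this module version does not have it. A minimal sketch for listing what is present follows; `show_param` and the `ZFS_PARAM_DIR` override are hypothetical names used only in this example.

```shell
#!/bin/sh
# Directory holding the zfs module parameters. ZFS_PARAM_DIR is a
# hypothetical override for this sketch (handy for testing).
ZFS_PARAM_DIR=${ZFS_PARAM_DIR:-/sys/module/zfs/parameters}

# Print "name=value" if the parameter exists and is readable,
# otherwise "name: not present".
show_param() {
    if [ -r "$ZFS_PARAM_DIR/$1" ]; then
        printf '%s=%s\n' "$1" "$(cat "$ZFS_PARAM_DIR/$1")"
    else
        printf '%s: not present\n' "$1"
    fi
}

# The three parameters mentioned in this report.
for p in zfs_scan_mem_lim_fact zfs_scan_mem_lim_soft_fact zfs_scrub_delay; do
    show_param "$p"
done
```

A parameter reported as "not present" cannot be set at runtime on that module build, whatever the user's privileges.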

Include any warning/errors/backtraces from the system logs

cat /proc/meminfo | grep claim
KReclaimable:    2453676 kB
SReclaimable:    2453676 kB
SUnreclaim:     16378036 kB
cat /proc/slabinfo  | grep sio_cache
sio_cache_2       2310396 2310528    168   48    2 : tunables    0    0    0 : slabdata  48136  48136      0
sio_cache_1       237122 237122    152   53    2 : tunables    0    0    0 : slabdata   4474   4474      0
sio_cache_0       106508040 106508040    136   30    1 : tunables    0    0    0 : slabdata 3550268 3550268      0
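
As a rough cross-check of the numbers above, the sketch below estimates how much memory the sio_cache_* slabs hold by multiplying num_slabs by pagesperslab for each cache. It assumes a 4 kB page size (the usual amd64 default) and that these caches are accounted under SUnreclaim; `slab_kb` is a made-up helper name, and the sample lines are copied from the slabinfo output above.

```shell
#!/bin/sh
# Hypothetical helper: sum the memory (in kB) held by every cache whose
# name starts with the given prefix, reading /proc/slabinfo-style lines
# on stdin. Field 6 is pagesperslab, field 15 is num_slabs.
# Assumption: 4 kB pages.
slab_kb() {
    awk -v prefix="$1" -v page_kb=4 '
        index($1, prefix) == 1 { total += $15 * $6 * page_kb }
        END { printf "%d\n", total }'
}

# Sample lines copied from the slabinfo output in this report.
sample='sio_cache_2       2310396 2310528    168   48    2 : tunables    0    0    0 : slabdata  48136  48136      0
sio_cache_1       237122 237122    152   53    2 : tunables    0    0    0 : slabdata   4474   4474      0
sio_cache_0       106508040 106508040    136   30    1 : tunables    0    0    0 : slabdata 3550268 3550268      0'

sio_kb=$(printf '%s\n' "$sample" | slab_kb sio_cache)
total_unreclaim_kb=16378036       # SUnreclaim from the same capture
pct=$((sio_kb * 100 / total_unreclaim_kb))

echo "sio_cache_*: ${sio_kb} kB of ${total_unreclaim_kb} kB SUnreclaim (${pct}%)"
```

Under those assumptions the three sio caches account for about 14 GiB, close to 90% of the SUnreclaim figure, with sio_cache_0 alone making up nearly all of it. On a live system, `slab_kb sio_cache < /proc/slabinfo` (as root) gives the current value.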

Labels: Status: Triage Needed, Type: Defect