Description
System information
Type | Version/Name |
---|---|
Distribution Name | Ubuntu |
Distribution Version | 18.04 |
Linux Kernel | 5.4.0-58-generic |
Architecture | amd64 |
ZFS Version | 0.8.3-1ubuntu12.5 |
SPL Version | 0.8.3-1ubuntu12.5 |
Describe the problem you're observing
We run the Ubuntu HWE kernel, which means we get the 0.8.x version of ZFS/SPL. Our issue is probably the same as:
#8662
We run MySQL (MariaDB, actually) on ZFS volumes for data and backup space (separate volumes). A scrub runs from cron every 4 weeks and takes ~4 hours. On our replicas the scrub generally completes without issue, but on the primary we have seen MySQL crash (it was OOM-killed in the most recent crash).
The servers are Intel Xeon Gold with 512 GB RAM; the disks are 6 x Intel S4510 3.8 TB SSDs in 3 mirrored pairs.
Describe how to reproduce the problem
Start a scrub on the data volume, then watch the SUnreclaim value in /proc/meminfo:
zpool scrub mysqldata
Every 2.0s: cat /proc/meminfo | grep claim
Mon Jan 4 19:04:27 2021
KReclaimable: 2442512 kB
SReclaimable: 2442512 kB
SUnreclaim: 1932272 kB
About 30 seconds later:
Every 2.0s: cat /proc/meminfo | grep claim
Mon Jan 4 19:05:02 2021
KReclaimable: 2442976 kB
SReclaimable: 2442976 kB
SUnreclaim: 7637196 kB
Then issue the stop:
zpool scrub -s mysqldata
Check again:
Every 2.0s: cat /proc/meminfo | grep claim
Mon Jan 4 19:06:05 2021
KReclaimable: 2442976 kB
SReclaimable: 2442976 kB
SUnreclaim: 1970984 kB
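For reference, the growth rate implied by the first two samples can be worked out as below. This is only a sketch: `growth_kb_per_s` is a helper name made up for illustration, and the 35-second interval is taken from the two watch timestamps above (19:04:27 and 19:05:02).

```shell
# Hypothetical helper: average growth in kB/s between two SUnreclaim readings.
# Arguments: before_kb after_kb elapsed_seconds (integer division).
growth_kb_per_s() {
    echo $(( ($2 - $1) / $3 ))
}

# Values copied from the first two samples above.
growth_kb_per_s 1932272 7637196 35
```

That is on the order of 160 MB of unreclaimable slab added per second while the scrub is running, and it is released again shortly after the scrub is stopped.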
I was unable to alter the behaviour of the SUnreclaim meminfo value by changing either /sys/module/zfs/parameters/zfs_scan_mem_lim_fact or /sys/module/zfs/parameters/zfs_scan_mem_lim_soft_fact, or by adding /sys/module/zfs/parameters/zfs_scrub_delay (permission denied as root).
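To make it easier to compare with other systems, something like the following could be used to show which of these tunables the loaded module actually exposes and what they are set to. It is a minimal sketch: `show_params` is a hypothetical helper, and it only reads the files under the parameters directory, it does not change anything.

```shell
# Hypothetical helper: print each named module parameter, or note that it is absent.
# $1 is the parameters directory; the remaining arguments are parameter names.
show_params() {
    dir=$1
    shift
    for name in "$@"; do
        if [ -e "$dir/$name" ]; then
            printf '%s=%s\n' "$name" "$(cat "$dir/$name")"
        else
            printf '%s: not present\n' "$name"
        fi
    done
}

# The three tunables mentioned above.
show_params /sys/module/zfs/parameters \
    zfs_scan_mem_lim_fact zfs_scan_mem_lim_soft_fact zfs_scrub_delay
```

A parameter reported as not present is one this module build does not provide, which would fit with the failure to create zfs_scrub_delay by hand.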
Include any warning/errors/backtraces from the system logs
cat /proc/meminfo | grep claim
KReclaimable: 2453676 kB
SReclaimable: 2453676 kB
SUnreclaim: 16378036 kB
cat /proc/slabinfo | grep sio_cache
sio_cache_2 2310396 2310528 168 48 2 : tunables 0 0 0 : slabdata 48136 48136 0
sio_cache_1 237122 237122 152 53 2 : tunables 0 0 0 : slabdata 4474 4474 0
sio_cache_0 106508040 106508040 136 30 1 : tunables 0 0 0 : slabdata 3550268 3550268 0
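To put the slabinfo numbers in the same units as meminfo, the object counts can be multiplied by the object sizes. The sketch below assumes the usual slabinfo column order (name, active_objs, num_objs, objsize, ...); `slab_mib` is a hypothetical helper, and the sample lines are copied from the output above so it runs without root.

```shell
# Hypothetical helper: total MiB held by slab caches whose name starts with a prefix.
# Reads slabinfo-format lines on stdin; uses num_objs ($3) times objsize ($4).
slab_mib() {
    awk -v prefix="$1" '
        index($1, prefix) == 1 { bytes += $3 * $4 }
        END { printf "%.0f\n", bytes / 1048576 }
    '
}

# Sample lines from this report; on a live system, pipe /proc/slabinfo in instead.
slab_mib sio_cache <<'EOF'
sio_cache_2 2310396 2310528 168 48 2 : tunables 0 0 0 : slabdata 48136 48136 0
sio_cache_1 237122 237122 152 53 2 : tunables 0 0 0 : slabdata 4474 4474 0
sio_cache_0 106508040 106508040 136 30 1 : tunables 0 0 0 : slabdata 3550268 3550268 0
EOF
```

On these figures the sio_cache slabs come to roughly 14,200 MiB, almost all of it in sio_cache_0, which accounts for most of the ~16 GB reported as SUnreclaim.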