MADV_RANDOM causes significant increase in major page faults in Linux 6.4 and later

Prior to https://github.com/torvalds/linux/commit/8788f6781486769d9598dcaedc3fe0eb12fc3e59, MADV_RANDOM had the documented, explicit behavior of preventing the kernel from using read-ahead (see the posix_madvise(2) man page). It also implicitly hinted to the kernel to retain those pages in the page cache. Since this commit, which is present in 6.4 and later, that implicit behavior is inverted: the kernel frees these pages aggressively, because the hint short-circuits the kernel's second-chance LRU mechanism.

In moderately to heavily loaded OpenShift clusters, the net outcome is that an etcd compaction, which the Kube API Server triggers every five minutes and which normally takes no more than 900 ms, could take up to 20 s. Hosts that normally had near-zero major page faults were seeing upwards of 600 major page faults per second.

In our testing, removing the MADV_RANDOM hint restored previous performance and caused no observable increase in overall memory usage. We recommend removing this mmap hint on all versions of Linux, as doing so appears to have no negative impact on 6.3 and earlier kernels. An alternative would be to keep MADV_RANDOM and also enable mlock(2).

The hint was originally added in https://github.com/etcd-io/bbolt/commit/88f777f332022ad2b92be5ceccf1863e9fb4d53f.

CC @dusk125 
