Description
Hello,
On our platform (LGI boxes, ARM / Broadcom SoC, e.g. BCM72180, and other Broadcom platforms we have) we are experiencing an issue where the wpe GC sometimes seems unable to reclaim memory after exiting some of the heavier web apps. Some examples are the Sky apps:
https://stv.prd.sky.ch/store/
https://stv.prd.sky.ch/show/
but a similar effect can be observed e.g. with Apple TV+ (https://atve.tv.apple.com/94819831-5404-4438-810e-afb648d6a826/tvw_0e7f0489b969451a9f79941a0d18fad1/). These apps normally need more than 200MB to work, and we have a container cgroup limit of 550MB, so when the memory is not reclaimed quickly enough, wpe gets killed.
Often when these applications exit, I can see that even after a GC Full Collection (seen with JSC_logGC=1) the memory is still not reclaimed. Like here - GC ran, spent quite a lot of time, visited a lot of memory, but still wasn't able to release much ('Full sweep: 142348kb => 141164kb'):
Jul 09 10:00:19 E0B7B1-APLSTB-300037462300 wpe.sh[13290]: [GC<0x8aff5050>: START M 142134kb => FullCollection, v=0kb (C:0 M:0 P1:0) o=0 b=15866 i#1:N<CsMsrShDMsm(0)> 1+0 v=109974kb (C:55860 M:0 P1:54113) o=11 b=15866 i#2:N 366+0 v=109990kb (C:55872 M:0 P1:54118) o=13 b=15866 i#3:P 0+0 v=109990kb (C:55872 M:0 P1:54118) o=13 b=15866 i#4:P<WsOJwMsrShCsMsm(0)DDomoCb> => 120237kb, p=2353.949000ms (max 2353.949000), cycle 2353.843000ms END]
Jul 09 10:00:19 E0B7B1-APLSTB-300037462300 wpe.sh[13290]: GC END!
Jul 09 10:00:19 E0B7B1-APLSTB-300037462300 wpe.sh[13290]: [GC<0x8aff5050>: finalize 49.610000ms]
1720512019 209
Jul 09 10:00:19 E0B7B1-APLSTB-300037462300 wpe.sh[13290]: [GC<0x8aff5050>: Full sweep: 142348kb => 141164kb, 78.841000ms]
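To make it easier to compare runs, the 'Full sweep' lines can be pulled out of the journal and turned into a "reclaimed" number. Below is a minimal sketch, assuming the log line format is exactly as in the excerpt above; the helper name parse_full_sweep is hypothetical and not part of WebKit.

```python
import re

# Matches the "Full sweep" summary printed with JSC_logGC=1, e.g.
# "[GC<0x8aff5050>: Full sweep: 142348kb => 141164kb, 78.841000ms]".
# Assumption: the format is the one seen in the log excerpt in this report.
FULL_SWEEP_RE = re.compile(r"Full sweep: (\d+)kb => (\d+)kb, ([\d.]+)ms")


def parse_full_sweep(line):
    """Return a dict with before/after/reclaimed (kB) and duration (ms), or None."""
    match = FULL_SWEEP_RE.search(line)
    if match is None:
        return None
    before_kb = int(match.group(1))
    after_kb = int(match.group(2))
    return {
        "before_kb": before_kb,
        "after_kb": after_kb,
        "reclaimed_kb": before_kb - after_kb,
        "duration_ms": float(match.group(3)),
    }


if __name__ == "__main__":
    sample = ("Jul 09 10:00:19 E0B7B1-APLSTB-300037462300 wpe.sh[13290]: "
              "[GC<0x8aff5050>: Full sweep: 142348kb => 141164kb, 78.841000ms]")
    print(parse_full_sweep(sample))
```

For the sweep above this gives 1184 kB reclaimed out of 142348 kB, i.e. less than 1% of the heap capacity.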
In our setup, each time we leave any app, we return to metro '#boot' (https://widgets.metrological.com/lightning/liberty/2e3c4fc22f0d35e3eb7fdb47eb7d4658#boot). This 'application' is pretty small, and normally uses maybe 20MB of RAM. But even when we wait for some time and then browse to some other webapp, the memory often is not released and keeps piling up, so we run into OOM after several iterations.
I've tried with some upstream changes, like this one:
Also tried some additional modifications to GC process, like here:
(for example, I've modified Source/bmalloc/bmalloc/AvailableMemory.cpp to check for WPE_RAM_SIZE when determining available memory; we are running wpe in an lxc container, so I believe the actual limits are different from what sysinfo returns. But this doesn't seem to fix the problem by itself. I've also enabled some bmalloc verbose flags, and from what I've seen it doesn't seem that bmalloc is to blame here)
I was also checking out many JSC options, like:
JSC_forceRAMSize=576716800
JSC_mediumHeapGrowthFactor=1.1
JSC_smallHeapGrowthFactor=1.1
JSC_largeHeapGrowthFactor=1.1
JSC_smallHeapRAMFraction=0.25
JSC_mediumHeapRAMFraction=0.5
JSC_customFullGCCallbackBailThreshold=1.0
JSC_maximumMutatorUtilization=0.6
JSC_minimumGCPauseMS=1
JSC_useStochasticMutatorScheduler=false
JSC_gcIncrementScale=1
JSC_criticalGCMemoryThreshold=0.5
JSC_forceDidDeferGCWork=true
JSC_useGlobalGC=true
Some of these sometimes seem to help for a while, but then - the app runs into the same problem.
I've also added some logic (on the thunder plugin side) to force GC at regular intervals after the application returns to the #boot app (via webkit_web_context_garbage_collect_javascript_objects, which ends up calling WebProcess::garbageCollectJavaScriptObjects). This generally helps, but even this is not 100% reliable: at times I still run into the scenario where the app memory is not reclaimed, even when GC is invoked manually from the inspector (via $vm.gc(), if enabled via JSC_enableDollarVM=true). This behavior seems a bit random - for example, it might start working fine after a reboot. I've checked wpe 2.22 and didn't observe these issues; memory is reclaimed pretty quickly when I leave the app.
Looking into web inspector / Timelines / memory tab, I see that most of this 'unreclaimable' memory is still counted as 'JavaScript' - in line with what I can see in the GC run results.
So for example, on our platform, the memory graph for 2.22 versus 2.38 compares like this:
for 2.22: see attached wpe-2.22-graph.png
for 2.38: see attached wpe-2.38-graph.png
So while 2.22 was running the test scenario just fine, 2.38 OOMed pretty quickly.
I've tried a similar test with wpe 2.38 on a different platform, the 'Video Accelerator' (https://rdkcentral.com/rdk-video-accelerator). There are no containers here, but when I check the WPEWebProcess Rss memory after exiting these apps, I see something like this (checked after each iteration of 'enter the app - exit to #boot'):
RssAnon: 14580 kB
RssAnon: 14580 kB
RssAnon: 211136 kB
RssAnon: 215216 kB
RssAnon: 67124 kB
RssAnon: 228000 kB
RssAnon: 228052 kB
RssAnon: 236304 kB
RssAnon: 381584 kB
RssAnon: 234400 kB
RssAnon: 234400 kB
RssAnon: 235620 kB
So sometimes the memory gets cleaned up, and sometimes it keeps piling up. With our 550MB memory limit, 381584 kB in idle state might be enough to run into OOM when we try to start e.g. the Sky Store app.
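The headroom arithmetic behind that statement can be sketched as follows. This is only an illustration under stated assumptions: 550MB is taken as 550 * 1024 kB (consistent with the JSC_forceRAMSize=576716800 value tried above), the app footprint is taken as the "more than 200MB" mentioned earlier, and the helper names are hypothetical. It also only counts RssAnon of a single process, while a cgroup limit covers more than that, so the real headroom would be smaller.

```python
import re

LIMIT_KB = 550 * 1024     # container cgroup limit from this report (550MB)
APP_NEED_KB = 200 * 1024  # assumed footprint: apps need "more than 200MB"


def rss_anon_kb(status_text):
    """Extract RssAnon (kB) from the text of /proc/<pid>/status, or None."""
    match = re.search(r"^RssAnon:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def headroom_kb(rss_kb, limit_kb=LIMIT_KB):
    """Memory left under the limit, counting only this process's RssAnon."""
    return limit_kb - rss_kb


def can_start_app(rss_kb, need_kb=APP_NEED_KB, limit_kb=LIMIT_KB):
    """True if the assumed app footprint still fits under the limit."""
    return headroom_kb(rss_kb, limit_kb) >= need_kb


if __name__ == "__main__":
    # Idle values taken from the RssAnon list above.
    for idle_kb in (14580, 234400, 381584):
        print(idle_kb, headroom_kb(idle_kb), can_start_app(idle_kb))
```

With 381584 kB held in the idle state, only 181616 kB remain under the limit, which is below the roughly 204800 kB these apps need.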
Do you have any clues as to what could cause the difference in GC behaviour in 2.38? Maybe there are some other config options we are not aware of?