Closed
Description
The SudoG recycling code is very sloppy about zeroing fields that are no longer needed. We have already fixed elem and selectdone, but next and waitlink are basically never cleared either. In real programs this leads to an arbitrarily large number of SudoGs leaking.

Specifically, the waitlink field is set during select but not cleared until the SudoG is next reused for a channel send/recv or select. Uses of a SudoG for sync.Cond don't set or use waitlink at all. Similarly, the next field is left intact on return from releaseSudog, and worse, it is left intact when the cache head is nil'ed out. (A sketch of the kind of zeroing that would avoid this appears after the program below.)

The effect of this is that after a select, the entries added to the SudoG cache have a .waitlink list pointing in the opposite direction of the cache's .next list. Operations by sync.Cond can reorder entries on the cache's .next list without affecting the .waitlink list, creating the possibility of a .waitlink pointing at basically the rest of the list. Then, if that entry is being used by a sync.Cond.Wait when a GC happens and c.sudogcache is set to nil, nothing is collected, because the entry blocked in sync.Cond.Wait has a waitlink pointing at the whole list that was just "discarded". Even worse, when that sync.Cond.Wait finishes and puts that SudoG into c.sudogcache, the .waitlink will still be dangling back to the earlier list, pinning the whole thing in memory.

Programs with the right mix of non-trivial selects and sync.Cond operations end up with allocated SudoGs that are heavily cross-linked by all these dangling pointers. Setting c.sudogcache = nil does make sure that the SudoGs stop being used, but chances are very good that there is a dangling .waitlink (probably many) pointing into the list, keeping a large fraction of it from being collected. And then on the next collection, chances are very good that the SudoG keeping the previous generation alive itself finds a way not to get collected, causing the leaks to pile up arbitrarily.

Below is a simple program that leaks basically all of its SudoGs in each iteration of the for { } loop. Attached is a picture of the SudoGs for the first iteration, showing the next and waitlink links and how nothing gets collected. This is an abstracted version of something I observed in a Google production server testing Go 1.4.

```go
package main

import (
	"runtime"
	"runtime/debug"
	"sync"
	"time"
)

func main() {
	debug.SetGCPercent(1000000) // only GC when we ask for GC

	release := func() {}
	for {
		c := make(chan int)
		for i := 0; i < 1000; i++ {
			go func() {
				select {
				case <-c:
				case <-c:
				case <-c:
				}
			}()
		}
		time.Sleep(10 * time.Millisecond)

		release()

		close(c) // let select put its sudog's into the cache
		time.Sleep(10 * time.Millisecond)

		// pick up top sudog
		var cond1 sync.Cond
		var mu1 sync.Mutex
		cond1.L = &mu1
		go func() {
			mu1.Lock()
			cond1.Wait()
			mu1.Unlock()
		}()
		time.Sleep(1 * time.Millisecond)

		// pick up next sudog
		var cond2 sync.Cond
		var mu2 sync.Mutex
		cond2.L = &mu2
		go func() {
			mu2.Lock()
			cond2.Wait()
			mu2.Unlock()
		}()
		time.Sleep(1 * time.Millisecond)

		// put top sudog back
		cond1.Broadcast()
		time.Sleep(1 * time.Millisecond)

		// drop cache on floor
		runtime.GC()

		// release cond2 after select has gotten to run
		release = func() {
			cond2.Broadcast()
			time.Sleep(1 * time.Millisecond)
		}
	}
}
```
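For anyone running the repro, one optional way to watch the growth without taking a heap dump is to log coarse heap statistics at the top of each iteration of the outer for loop. The helper below is an illustrative addition, not part of the original report; it needs the fmt import in addition to the program's existing imports.

```go
// logHeap prints coarse heap statistics via runtime.ReadMemStats.
// Calling it at the top of each iteration of the repro's outer loop
// makes the growth from the leaked SudoGs visible.
// This helper is an illustrative addition, not part of the original report.
func logHeap() {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	fmt.Printf("heap objects: %d, heap bytes: %d\n", ms.HeapObjects, ms.HeapAlloc)
}
```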
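To make the failure mode concrete, here is a minimal, self-contained sketch of a free-list cache whose nodes carry two independent link fields, analogous to next and waitlink. The node/acquire/release names are hypothetical and this is not the runtime's actual SudoG code; it only illustrates why zeroing the links when a node is returned to (or taken from) the cache keeps a stale pointer from pinning the rest of a discarded list.

```go
package main

import "fmt"

// node stands in for a SudoG-like structure with two independent link
// fields: next threads the free-list cache, waitlink is used by waiters.
// All names here are hypothetical, for illustration only.
type node struct {
	next     *node
	waitlink *node
	id       int
}

var cache *node // head of the free list (analogous to a per-g sudog cache)

// release returns a node to the cache. Clearing waitlink ensures no stale
// pointer into an old chain survives; leaving it set is what lets a
// "discarded" cache stay reachable from a node still in use elsewhere.
func release(n *node) {
	n.waitlink = nil // without this, n can pin whatever it last waited on
	n.next = cache
	cache = n
}

// acquire takes a node from the cache, detaching it from the list so the
// caller does not accidentally keep a pointer to the rest of the cache.
func acquire() *node {
	if cache == nil {
		return &node{}
	}
	n := cache
	cache = n.next
	n.next = nil // detach from the free list before handing it out
	return n
}

func main() {
	a, b := &node{id: 1}, &node{id: 2}
	a.waitlink = b // a was last used as a waiter linked to b
	release(a)     // zeroing waitlink here drops the reference to b
	n := acquire()
	fmt.Println(n.id, n.next == nil, n.waitlink == nil) // 1 true true
}
```

In the repro above, it is precisely an uncleared waitlink on a SudoG held by sync.Cond.Wait that keeps the dropped cache reachable across the runtime.GC call.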
Attachments:
- IMG_1907.JPG (377479 bytes)