Skip to content

runtime: sudog leak #9110

Closed
Closed
@rsc

Description

@rsc
The SudoG recycling code is very sloppy about zeroing fields that are no longer needed.
We have already fixed elem and selectdone, but next and waitlink are basically never
cleared either. This leads in real programs to an arbitrarily large number of SudoG's
leaking. 

Specifically, the waitlink field is set during select but not cleared until the SudoG is
next reused for a channel send/recv or select. Uses of a SudoG for sync.Cond don't set
or use waitlink at all. Similarly, the next field is left intact on return from
releaseSudog, and worse it is left intact when the cache head is nil'ed out.

The effect of this is that after a select the entries added to the SudoG cache have a
.waitlink list pointing in the opposite direction of the .next list in the cache.
Operations by sync.Cond can reorder entries on the cache .next list without affecting
the .waitlink list, creating the possibility of a .waitlink pointing at basically the
rest of the list. Then if that entry is being used by a sync.Cond.Wait when a GC happens
and c.sudogcache is set to nil, nothing is collected, because the entry blocked in
sync.Cond.Wait has a waitlist pointing at the whole list that was just
"discarded". Even worse, when that sync.Cond.Wait finishes and puts that SudoG
into c.sudogcache, the .waitlink will still be dangling back to the earlier list,
pinning the whole thing in memory. 

In programs with the right mix of non-trivial selects and sync.Cond operations end up
with allocated SudoGs that are heavily cross-linked by all these dangling pointers.
Setting c.sudogcache = nil does make sure that the SudoGs stop being used, but chances
are very good that there is a dangling .waitlink (probably many) pointing into the list
that keeps a large fraction of it from being collected. And then on the next collection
chances are very good that that SudoG keeping the previous generation from being
collected itself finds a way not to get collected, causing the leaks to pile up
aribtrarily.

Below is a simple program that leaks basically all of its SudoG in each iteration of the
for { }.

Attached is a picture of the SudoGs for the first iteration, showing the next and
waitlink links and how nothing gets collected.

This is an abstracted version of something I observed in a Google production server
testing Go 1.4.

package main

import (
    "runtime"
    "runtime/debug"
    "sync"
    "time"
)

func main() {
    debug.SetGCPercent(1000000) // only GC when we ask for GC

    release := func() {}
    for {
        c := make(chan int)
        for i := 0; i < 1000; i++ {
            go func() {
                select {
                case <-c:
                case <-c:
                case <-c:
                }
            }()
        }
        time.Sleep(10 * time.Millisecond)
        release()

        close(c) // let select put its sudog's into the cache
        time.Sleep(10 * time.Millisecond)

        // pick up top sudog
        var cond1 sync.Cond
        var mu1 sync.Mutex
        cond1.L = &mu1
        go func() {
            mu1.Lock()
            cond1.Wait()
            mu1.Unlock()
        }()
        time.Sleep(1 * time.Millisecond)

        // pick up next sudog
        var cond2 sync.Cond
        var mu2 sync.Mutex
        cond2.L = &mu2
        go func() {
            mu2.Lock()
            cond2.Wait()
            mu2.Unlock()
        }()
        time.Sleep(1 * time.Millisecond)

        // put top sudog back
        cond1.Broadcast()
        time.Sleep(1 * time.Millisecond)

        // drop cache on floor
        runtime.GC()

        // release cond2 after select has gotten to run
        release = func() {
            cond2.Broadcast()
            time.Sleep(1 * time.Millisecond)
        }
    }
}

Attachments:

  1. IMG_1907.JPG (377479 bytes)

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions