Description
What version of Go are you using (go version)?
go version go1.8.1 windows/amd64
What operating system and processor architecture are you using (go env)?
set GOARCH=amd64
set GOBIN=
set GOEXE=.exe
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOOS=windows
set GOPATH=C:\Users\tmm1\go
set GORACE=
set GOROOT=c:\go
set GOTOOLDIR=c:\go\pkg\tool\windows_amd64
set GCCGO=gccgo
set CC=gcc
set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0
set CXX=g++
set CGO_ENABLED=1
set PKG_CONFIG=pkg-config
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
What did you do?
I have a complex application that deals with live video streams that come from a hardware TV tuner device on my home network. The application itself runs on mac, linux, windows and freebsd, but this particular issue only occurs on windows.
The device that produces the video stream is commercially available, produces large uncompressed video streams (~2MB/s), and is very sensitive to latency. It has limited on-board memory, and aggressively drops packets which are not received by the client in a timely fashion.
I've managed to whittle down my app into a simple repro using only stdlib, containing an http server with two routes:
- /network, which opens an http connection to the hardware tuner and proxies back the response data
- /file, which uses http.ServeFile to serve up a large video file from disk.
package main

import (
	"io"
	"log"
	"net/http"
)

func main() {
	// /file serves a large video file from disk via the stdlib helper.
	http.HandleFunc("/file", func(w http.ResponseWriter, req *http.Request) {
		http.ServeFile(w, req, "video.mpg")
	})

	// /network proxies the live stream from the hardware tuner back to the client.
	http.HandleFunc("/network", func(w http.ResponseWriter, req *http.Request) {
		resp, err := http.Get("http://10.0.1.4:5004/auto/v704")
		if err != nil {
			log.Fatal(err)
		}
		if resp.StatusCode != 200 {
			resp.Body.Close()
			log.Fatalf("got http response code %v", resp.StatusCode)
		}
		defer resp.Body.Close()
		w.Header().Set("Transfer-Encoding", "identity")
		w.Header().Set("Content-Type", "video/mpeg")
		io.Copy(w, resp.Body)
	})

	http.HandleFunc("/", func(w http.ResponseWriter, req *http.Request) {
		w.WriteHeader(404)
	})

	log.Fatal(http.ListenAndServe(":9987", nil))
}
What did you expect to see?
I expect to be able to use curl against both http endpoints concurrently, to receive data from both the tuner and the file on disk at the same time.
On linux, darwin and freebsd, this code works as expected and there are no problems.
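For reference, here is roughly how I drive both endpoints at once to observe the problem. This is a minimal client sketch, not part of the repro itself; the port and routes are the ones from the server above, and the per-chunk logging is just for illustration:

package main

import (
	"io"
	"io/ioutil"
	"log"
	"net/http"
	"time"
)

// fetch streams one endpoint and logs how long each 2MB chunk takes to
// arrive, which makes any starvation easy to spot.
func fetch(name, url string) {
	resp, err := http.Get(url)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	for {
		start := time.Now()
		n, err := io.CopyN(ioutil.Discard, resp.Body, 2<<20)
		log.Printf("%s: %d bytes in %v", name, n, time.Since(start))
		if err != nil {
			return
		}
	}
}

func main() {
	go fetch("network", "http://localhost:9987/network")
	go fetch("file", "http://localhost:9987/file")
	select {} // block while both streams run
}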
What did you see instead?
On windows, as soon as a request is made to /file, it starves out the other running goroutine. The end effect is that the goroutine serving the /network request does not read from its connection to the hardware tuner fast enough, causing the tuner to drop packets and thus impacting the watchability of the video stream.
I have confirmed this behavior with at least half a dozen other users of my app, who are all using different hardware, different versions of windows (all 64-bit, but a mix of 7, 8 and 10), and different home network setups.
Further investigation
In an effort to narrow down what's going on, I tried adding several other http handlers and observed the following behaviors:
async file i/o
I tried using the Microsoft/go-winio library to perform async disk i/o when reading the file from disk. This setup still reproduces the behavior, so the scheduler bug triggers with both sync and async i/o apis.
http.HandleFunc("/file-async", func(w http.ResponseWriter, req *http.Request) {
pathp, err := syscall.UTF16PtrFromString("video.mpg")
if err != nil {
log.Fatalf("path failed: %v", err)
}
h, err := syscall.CreateFile(pathp, syscall.GENERIC_READ|syscall.GENERIC_WRITE, 0, nil, syscall.OPEN_EXISTING, syscall.FILE_FLAG_OVERLAPPED, 0)
if err != nil {
log.Fatalf("createfile failed: %v", err)
}
fp, err := winio.MakeOpenFile(h)
if err != nil {
log.Fatalf("makeopenfile failed: %v", err)
}
defer fp.Close()
w.Header().Set("Content-Type", "video/mpeg")
io.Copy(w, fp)
})
disk i/o only (no network)
I thought maybe it was the extra network i/o of serving up the file that was impacting the network connection on the other goroutine, so I swapped in ioutil.Discard. I could not reproduce the issue with this setup, so disk i/o alone is not enough to trigger the scheduler bug.
http.HandleFunc("/file-discard", func(w http.ResponseWriter, req *http.Request) {
fp, err := os.Open("video.mpg")
if err != nil {
log.Fatal(err)
}
defer fp.Close()
w.Header().Set("Content-Type", "video/mpeg")
io.Copy(ioutil.Discard, fp)
})
network i/o only (no disk)
Since disk i/o alone didn't trigger the issue, I thought it must be the network. I added this route to send a flood of zero bytes over the network, but it also did not reproduce the issue. So network i/o alone is also not enough to trigger the scheduler bug.
http.HandleFunc("/zeros", func(w http.ResponseWriter, req *http.Request) {
data := make([]byte, 32768)
for {
w.Write(data)
}
})
rate limited disk i/o
So it appears the scheduler bug only occurs when a goroutine does both disk and network i/o, as with http.ServeFile.
Since at this point it seemed like the /file goroutine was hogging the scheduler and moving bytes too quickly, I tried inserting a rate limiter from fujiwara/shapeio. This worked, and is an effective work-around for the bug. With the rate limit in place, the scheduler does not starve out the other goroutines.
http.HandleFunc("/file-ratelimit", func(w http.ResponseWriter, req *http.Request) {
fp, err := os.Open("video.mpg")
if err != nil {
log.Fatal(err)
}
defer fp.Close()
reader := shapeio.NewReader(fp)
reader.SetRateLimit(4 << 20)
w.Header().Set("Content-Type", "video/mpeg")
io.Copy(w, reader)
})
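(The 4MB/s cap is an arbitrary choice on my part; it just needs to be comfortably above the tuner's ~2MB/s stream rate, presumably so the /file goroutine can't monopolize the scheduler while the file transfer still stays reasonably fast.)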
tracer
I collected a 10s trace from the reproduction. During the first 5s only /network was active, and then during the last half the /file goroutine was also running.
http.HandleFunc("/trace", func(w http.ResponseWriter, req *http.Request) {
fp, err := os.OpenFile("trace.out", os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0644)
if err != nil {
log.Fatal(err)
}
trace.Start(fp)
time.Sleep(10 * time.Second)
trace.Stop()
})
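(The resulting trace.out can be inspected with go tool trace trace.out.)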
I'm not terribly familiar with the trace output, but here it is in case someone else can make more sense of it: