Description
Proposal: runtime/trace flight recording
Updated: 23 September 2023
Background
"Flight recording" is a technique in which trace data is kept in a conceptual circular buffer, flushed upon request. The purpose of this technique is to capture traces of interesting program behavior, even when one does not know ahead of time when that will happen. For instance, if the web service fails a health check, or the web service takes an abnormally long time to handle a request. Specifically, the web service can identify such conditions when they happen, but the programmer setting up the environment can't predict when exactly they will occur. Starting tracing after something interesting happens also tends not to be useful, because the program has already executed the interesting part.
The Java ecosystem has had this for years through Java's flight recorder. Once the JVM's flight recorder is enabled, the JVM can obtain a trace representing the last few seconds of time. This trace can come from triggers set up in JMX, or by passing a flag to the JVM that dumps a trace on exit.
With the implementation of #60773 now approaching a stable point, hopefully in Go 1.22 we'll have all traces become series of self-contained partitions. This implementation change presents an opportunity to easily add something similar to the Go execution tracer by always retaining at least one partition that can be snapshotted at any time.
This is also enabled by work in the Go 1.21 release to make traces dramatically cheaper. Because flight recording relies on waiting until something interesting happens, tracing needs to be enabled for a much longer period of time. Enabling flight recording across, for example, a small portion of a production fleet, becomes much more palatable when the tracing itself isn't too expensive.
Design
The core of the design is a new API in the runtime/trace
package to enable flight recording. This means that programs can be instrumented with their own triggers.
package trace
// FlightRecorder represents a flight recording configuration.
//
// Flight recording can be thought of as a moving window over
// the trace data, with the window always containing the most
// recent trace data.
//
// Only one flight recording may be active at any given time.
// This restriction may be removed in the future.
type FlightRecorder struct {
...
}
// NewFlightRecorder creates a new flight recording configuration.
func NewFlightRecorder() *FlightRecorder
// SetMinAge sets a lower bound on the age of an event in the flight recorder's window.
//
// The flight recorder will strive to promptly discard events older than the minimum age,
// but older events may appear in the window snapshot. The age setting will always be
// overridden by SetMaxSize.
//
// The initial minimum age is implementation defined, but can be assumed to be on the order
// of seconds.
//
// Adjustments to this value will not apply to an active flight recorder.
func (*FlightRecorder) SetMinAge(d time.Duration)
// MinAge returns the current MinAge setting.
func (*FlightRecorder) MinAge() time.Duration
// SetMaxSize sets an upper bound on the size of the window in bytes.
//
// This setting takes precedence over SetMinAge.
// However, it does not make any guarantees on the size of the data WriteTo will write,
// nor does it guarantee memory overheads will always stay below MaxBytes. Treat it
// as a hint.
//
// The initial size is implementation defined.
//
// Adjustments to this value will not apply to an active flight recorder.
func (*FlightRecorder) SetMaxBytes(bytes uint64)
// MaxBytes returns the current MaxBytes setting.
func (*FlightRecorder) MaxBytes() uint64
// Start begins flight recording. Only one flight recorder and one call to trace.Start may be active
// at any given time. Returns an error if starting the flight recorder would violate this rule.
func (*FlightRecorder) Start() error
// Stop ends flight recording. It waits until any concurrent WriteTo calls exit.
// Returns an error if the flight recorder is inactive.
func (*FlightRecorder) Stop() error
// Enabled returns true if the flight recorder is active. Specifically, it will return true if Start did
// not return an error, and Stop has not yet been called.
// It is safe to call from multiple goroutines simultaneously.
func (*FlightRecorder) Enabled() bool
// WriteTo takes a snapshots of the circular buffer's contents and writes the execution data to w.
// Returns the number of bytes written and an error. Only one goroutine may execute WriteTo at a time.
// An error is returned upon failure to write to w, if another WriteTo call is already in-progress,
// or if the flight recorder is inactive.
func (*FlightRecorder) WriteTo(w io.Writer) (n int64, err error)
Implementation
Most of the implementation work has already been done. I think it would be OK to ship the implementation in golang.org/x/exp/trace, even though it has some efficiencies (like, having to copy buffers outside the runtime). We could make this more efficient by reference-counting the runtime buffers to avoid a copy.
The main thing that definitely needs to change, however, is that the flight recorder needs to be able to run simultaneously with a call to trace.Start
, which is currently not possible, since golang.org/x/exp/trace
uses trace.Start
itself. This can be implemented by having the trace reader goroutine (misnomer) write buffers to multiple writers. All we need to do is call traceAdvance
simultaneously with adding a new writer, so that the new writer always starts receiving data on a generation boundary.
Discussion
SetMinAge
and SetMaxBytes
could give more rigid guarantees, but it's both complex to implement and not terribly useful. The primary use-case for SetMinAge
is to allow users to ask for longer traces (for example, if a web service's "long request" means something much longer than a handful of seconds). Meanwhile the primary use-case of SetMaxBytes
is to control memory overheads and limit the damage caused by a large SetMinAge
.
WriteTo
could allow multiple goroutines to call it since it could easily serialize them internally. However, this can create some bad situations. For instance, consider some snapshot trigger condition that causes multiple goroutines to call WriteTo
. The call is heavyweight and they'll queue up behind each other; the longest one will likely take quite a while to resolve, and the application will be significantly disrupted. It'll also produce traces that aren't very useful (consisting of short partitions corresponding approximately to the duration of the last WriteTo
call) unless we also allow for multiple goroutines to read the same partition's buffers. However, that's going to be fairly complicated to implement, and also doesn't really add much value either, since it's just duplicate data. The current design encourages callers reduces the risk of run-time panics while also side-stepping these issues by returning an error in this case.
Alternatives considered
External circular buffer
@dominikh has suggested adding a similar feature to gotraceui. Because the partitioning is actually baked into the trace's structure, it's possible for trace consumers to implement something similar themselves. The only thing missing is a standard streaming endpoint for execution traces (see follow-up work).
However, there are a few advantages to adding support directly to the standard library and runtime.
- It opens the door to taking trace snapshots before a program crashes, which is going to be tricky to make work in general from outside the runtime.
- It's more efficient in that no trace data needs to be processed (or sent to another process, or over the network) until it's actually needed.
- Any external circular buffer has to make decisions solely on trace content. There's a minor ease-of-use improvement for those doing ad-hoc debugging, since they don't have to model their conditions in the
runtime/trace
package's annotations, but can instead decide when to grab a snapshot directly. - The possibility for more control over trace partitioning (via
SetPeriod
andSetSize
). Any external solution will be at the mercy of the runtime's defaults.
Despite these advantages, it's likely worth pursuing support for such a use-case even if the API described in this proposal is made available. A shortcoming of this document's proposal is that there's no way to trigger a snapshot explicitly against trace data, only program logic. Handling traces externally also means the ability to perform ad-hoc analyses without the need for additional instrumentation.
Follow-up work
Add support for trace streaming endpoints to net/http/pprof
As stated in the discussion of the "external circular buffer" alternative, we could support that alternative easily and well by just adding a standard debugging endpoint for streaming trace data. It probably makes the most sense to just add new query parameters to the existing trace
endpoint; the details of that can be hashed out elsewhere.
Metadata
Metadata
Assignees
Type
Projects
Status