Description
It looks like the immediate caller of runtime.memmove
is reported correctly in runtime/pprof
profiles, but is missing in perf
's framepointer-based unwinding. I don't know whether this affects callers of other assembly functions, or if it's specific to memmove
. CC @golang/runtime , and probably also of interest to @felixge
What version of Go are you using (go version
)?
$ go1.20 version go version go1.20.1 darwin/arm64
Does this issue reproduce with the latest release?
Yes, go1.20.1 is the latest stable release. I first identified this behavior in production use with go1.19.6.
What operating system and processor architecture are you using (go env
)?
For the reproducer, I'm cross-compiling from darwin/arm64 to linux/amd64. The production behavior I'm reproducing was with linux/amd64 builds for linux/amd64.
go env
Output
$ go1.20 env GO111MODULE="" GOARCH="arm64" GOBIN="" GOCACHE="/Users/rhys/Library/Caches/go-build" GOENV="/Users/rhys/Library/Application Support/go/env" GOEXE="" GOEXPERIMENT="" GOFLAGS="" GOHOSTARCH="arm64" GOHOSTOS="darwin" GOINSECURE="" GOMODCACHE="/Users/rhys/go/pkg/mod" GONOPROXY="*" GONOSUMDB="*" GOOS="darwin" GOPATH="/Users/rhys/go" GOPRIVATE="*" GOPROXY="direct" GOROOT="/Users/rhys/go/version/go1.20" GOSUMDB="sum.golang.org" GOTMPDIR="" GOTOOLDIR="/Users/rhys/go/version/go1.20/pkg/tool/darwin_arm64" GOVCS="" GOVERSION="go1.20.1" GCCGO="gccgo" AR="ar" CC="clang" CXX="clang++" CGO_ENABLED="1" GOMOD="/dev/null" GOWORK="" CGO_CFLAGS="-O2 -g" CGO_CPPFLAGS="" CGO_CXXFLAGS="-O2 -g" CGO_FFLAGS="-O2 -g" CGO_LDFLAGS="-O2 -g" PKG_CONFIG="pkg-config" GOGCCFLAGS="-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/lf/n4ry0qv5639_3z0nhyhy3z7c0000gs/T/go-build3223373974=/tmp/go-build -gno-record-gcc-switches -fno-common"
What did you do?
I used Linux's perf
to report collect data on an app's performance using the -g
flag to collect stack traces using the frame pointer.
$ cat /tmp/frames_test.go
package main
import "testing"
func BenchmarkMemmove(b *testing.B) {
src := make([]byte, 1<<15)
dst := make([]byte, len(src))
for i := 0; i < b.N; i++ {
root(dst, src)
}
}
//go:noinline
func root(dst, src []byte) {
leaf(dst, src)
}
//go:noinline
func leaf(dst, src []byte) {
copy(dst, src)
}
$ GOOS=linux GOARCH=amd64 go1.20 test -c -o /tmp/memmove-fp /tmp/frames_test.go
# [scp to linux/amd64 machine, run the following there]
$ sudo perf record -o /tmp/mm.perf.data -g -F 99 -- ./memmove-fp -test.bench=. -test.cpuprofile=/tmp/mm.pb.gz
goos: linux
goarch: amd64
cpu: Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz
BenchmarkMemmove-4 1511648 795.6 ns/op
PASS
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.028 MB /tmp/mm.perf.data (197 samples) ]
What did you expect to see?
I expected go tool pprof
and perf record -g
+perf script
to agree on the call stacks, including the identity of the immediate caller of runtime.memmove
. I expected the function named command-line-arguments.leaf
to show up in the call stack, with an immediate caller of command-line-arguments.root
and an immediate callee of runtime.memmove
.
What did you see instead?
With go tool pprof
, I see reports of CPU time spent in runtime.memmove
, called by command-line-arguments.leaf
, called by command-line-arguments.root
, as I expect.
With perf record -g
and perf script
, I see the on-CPU time attributed to runtime.memmove
as called by command-line-arguments.root
. The ...leaf
frame is missing, which I consider to be a bug.
$ perf script -i /tmp/mm.perf.data | head -n 26
memmove-fp 1265 1124104.950646: 10101010 cpu-clock:pppH:
468641 runtime.memmove+0x421 (/home/ubuntu/memmove-fp)
4f6d85 command-line-arguments.root+0x25 (/home/ubuntu/memmove-fp)
4f6d05 command-line-arguments.BenchmarkMemmove+0x85 (/home/ubuntu/memmove-fp)
4ada02 testing.(*B).runN+0x102 (/home/ubuntu/memmove-fp)
4ae685 testing.(*B).launch+0x1c5 (/home/ubuntu/memmove-fp)
4ae486 testing.(*B).doBench.func1+0x26 (/home/ubuntu/memmove-fp)
467381 runtime.goexit.abi0+0x1 (/home/ubuntu/memmove-fp)
memmove-fp 1265 1124104.960746: 10101010 cpu-clock:pppH:
468641 runtime.memmove+0x421 (/home/ubuntu/memmove-fp)
4f6d85 command-line-arguments.root+0x25 (/home/ubuntu/memmove-fp)
4f6d05 command-line-arguments.BenchmarkMemmove+0x85 (/home/ubuntu/memmove-fp)
4ada02 testing.(*B).runN+0x102 (/home/ubuntu/memmove-fp)
4ae685 testing.(*B).launch+0x1c5 (/home/ubuntu/memmove-fp)
4ae486 testing.(*B).doBench.func1+0x26 (/home/ubuntu/memmove-fp)
467381 runtime.goexit.abi0+0x1 (/home/ubuntu/memmove-fp)
memmove-fp 1265 1124104.970847: 10101010 cpu-clock:pppH:
46864b runtime.memmove+0x42b (/home/ubuntu/memmove-fp)
4f6d85 command-line-arguments.root+0x25 (/home/ubuntu/memmove-fp)
4f6d05 command-line-arguments.BenchmarkMemmove+0x85 (/home/ubuntu/memmove-fp)
4ada02 testing.(*B).runN+0x102 (/home/ubuntu/memmove-fp)
4ae685 testing.(*B).launch+0x1c5 (/home/ubuntu/memmove-fp)
4ae486 testing.(*B).doBench.func1+0x26 (/home/ubuntu/memmove-fp)
467381 runtime.goexit.abi0+0x1 (/home/ubuntu/memmove-fp)
$ go1.20 tool pprof -peek=runtime.memmove /tmp/mmpbgz
File: memmove-fp
Type: cpu
Time: Mar 2, 2023 at 11:40am (PST)
Duration: 2.12s, Total samples = 2010ms (94.82%)
Showing nodes accounting for 2010ms, 100% of 2010ms total
----------------------------------------------------------+-------------
flat flat% sum% cum cum% calls calls% + context
----------------------------------------------------------+-------------
1960ms 100% | command-line-arguments.leaf
1960ms 97.51% 97.51% 1960ms 97.51% | runtime.memmove
----------------------------------------------------------+-------------
$ go1.20 tool pprof -raw /tmp/mmpbgz
PeriodType: cpu nanoseconds
Period: 10000000
Time: 2023-03-02 11:40:52.391789789 -0800 PST
Duration: 2.11
Samples:
samples/count cpu/nanoseconds
47 470000000: 1 2 3 4 5 6
100 1000000000: 7 2 3 4 5 6
24 240000000: 8 2 3 4 5 6
3 30000000: 9 5 6
6 60000000: 10 2 3 4 5 6
12 120000000: 11 2 3 4 5 6
3 30000000: 12 2 3 4 5 6
1 10000000: 13 2 3 4 5 6
2 20000000: 14 5 6
3 30000000: 15 2 3 4 5 6
Locations
1: 0x468641 M=1 runtime.memmove /usr/local/go/src/runtime/memmove_amd64.s:369 s=35
2: 0x4f6e14 M=1 command-line-arguments.leaf /tmp/frames_test.go:20 s=19
3: 0x4f6d84 M=1 command-line-arguments.root /tmp/frames_test.go:15 s=14
4: 0x4f6d04 M=1 command-line-arguments.BenchmarkMemmove /tmp/frames_test.go:9 s=5
5: 0x4ada01 M=1 testing.(*B).runN /usr/local/go/src/testing/benchmark.go:193 s=181
6: 0x4ae684 M=1 testing.(*B).launch /usr/local/go/src/testing/benchmark.go:334 s=293
7: 0x46864b M=1 runtime.memmove /usr/local/go/src/runtime/memmove_amd64.s:371 s=35
8: 0x468661 M=1 runtime.memmove /usr/local/go/src/runtime/memmove_amd64.s:376 s=35
9: 0x4f6ce8 M=1 command-line-arguments.BenchmarkMemmove /tmp/frames_test.go:8 s=5
10: 0x468657 M=1 runtime.memmove /usr/local/go/src/runtime/memmove_amd64.s:374 s=35
11: 0x46865c M=1 runtime.memmove /usr/local/go/src/runtime/memmove_amd64.s:375 s=35
12: 0x468638 M=1 runtime.memmove /usr/local/go/src/runtime/memmove_amd64.s:367 s=35
13: 0x4685cc M=1 runtime.memmove /usr/local/go/src/runtime/memmove_amd64.s:302 s=35
14: 0x4f6d1d M=1 command-line-arguments.BenchmarkMemmove /tmp/frames_test.go:8 s=5
15: 0x468652 M=1 runtime.memmove /usr/local/go/src/runtime/memmove_amd64.s:373 s=35
Mappings
1: 0x400000/0x4f8000/0x0 /home/ubuntu/memmove-fp [FN]
2: 0x7fff85d4c000/0x7fff85d4e000/0x0 [vdso]
3: 0xffffffffff600000/0xffffffffff601000/0x0 [vsyscall]
$ go1.20 tool objdump -s '\.root$' /tmp/memmove-fp
TEXT command-line-arguments.root(SB) /tmp/frames_test.go
frames_test.go:14 0x4f6d60 493b6610 CMPQ 0x10(R14), SP
frames_test.go:14 0x4f6d64 7629 JBE 0x4f6d8f
frames_test.go:14 0x4f6d66 4883ec38 SUBQ $0x38, SP
frames_test.go:14 0x4f6d6a 48896c2430 MOVQ BP, 0x30(SP)
frames_test.go:14 0x4f6d6f 488d6c2430 LEAQ 0x30(SP), BP
frames_test.go:14 0x4f6d74 4889442440 MOVQ AX, 0x40(SP)
frames_test.go:14 0x4f6d79 48897c2458 MOVQ DI, 0x58(SP)
frames_test.go:15 0x4f6d7e 6690 NOPW
frames_test.go:15 0x4f6d80 e85b000000 CALL command-line-arguments.leaf(SB)
frames_test.go:16 0x4f6d85 488b6c2430 MOVQ 0x30(SP), BP
frames_test.go:16 0x4f6d8a 4883c438 ADDQ $0x38, SP
frames_test.go:16 0x4f6d8e c3 RET
frames_test.go:14 0x4f6d8f 4889442408 MOVQ AX, 0x8(SP)
frames_test.go:14 0x4f6d94 48895c2410 MOVQ BX, 0x10(SP)
frames_test.go:14 0x4f6d99 48894c2418 MOVQ CX, 0x18(SP)
frames_test.go:14 0x4f6d9e 48897c2420 MOVQ DI, 0x20(SP)
frames_test.go:14 0x4f6da3 4889742428 MOVQ SI, 0x28(SP)
frames_test.go:14 0x4f6da8 4c89442430 MOVQ R8, 0x30(SP)
frames_test.go:14 0x4f6dad e84ee6f6ff CALL runtime.morestack_noctxt.abi0(SB)
frames_test.go:14 0x4f6db2 488b442408 MOVQ 0x8(SP), AX
frames_test.go:14 0x4f6db7 488b5c2410 MOVQ 0x10(SP), BX
frames_test.go:14 0x4f6dbc 488b4c2418 MOVQ 0x18(SP), CX
frames_test.go:14 0x4f6dc1 488b7c2420 MOVQ 0x20(SP), DI
frames_test.go:14 0x4f6dc6 488b742428 MOVQ 0x28(SP), SI
frames_test.go:14 0x4f6dcb 4c8b442430 MOVQ 0x30(SP), R8
frames_test.go:14 0x4f6dd0 eb8e JMP command-line-arguments.root(SB)
$ go1.20 tool objdump -s '\.leaf$' /tmp/memmove-fp
TEXT command-line-arguments.leaf(SB) /tmp/frames_test.go
frames_test.go:19 0x4f6de0 493b6610 CMPQ 0x10(R14), SP
frames_test.go:19 0x4f6de4 7639 JBE 0x4f6e1f
frames_test.go:19 0x4f6de6 4883ec20 SUBQ $0x20, SP
frames_test.go:19 0x4f6dea 48896c2418 MOVQ BP, 0x18(SP)
frames_test.go:19 0x4f6def 488d6c2418 LEAQ 0x18(SP), BP
frames_test.go:19 0x4f6df4 4889442428 MOVQ AX, 0x28(SP)
frames_test.go:19 0x4f6df9 48897c2440 MOVQ DI, 0x40(SP)
frames_test.go:20 0x4f6dfe 4839de CMPQ BX, SI
frames_test.go:20 0x4f6e01 480f4cde CMOVL SI, BX
frames_test.go:20 0x4f6e05 4839f8 CMPQ DI, AX
frames_test.go:20 0x4f6e08 740b JE 0x4f6e15
frames_test.go:20 0x4f6e0a 4889d9 MOVQ BX, CX
frames_test.go:20 0x4f6e0d 4889fb MOVQ DI, BX
frames_test.go:20 0x4f6e10 e80b14f7ff CALL runtime.memmove(SB)
frames_test.go:21 0x4f6e15 488b6c2418 MOVQ 0x18(SP), BP
frames_test.go:21 0x4f6e1a 4883c420 ADDQ $0x20, SP
frames_test.go:21 0x4f6e1e c3 RET
frames_test.go:19 0x4f6e1f 4889442408 MOVQ AX, 0x8(SP)
frames_test.go:19 0x4f6e24 48895c2410 MOVQ BX, 0x10(SP)
frames_test.go:19 0x4f6e29 48894c2418 MOVQ CX, 0x18(SP)
frames_test.go:19 0x4f6e2e 48897c2420 MOVQ DI, 0x20(SP)
frames_test.go:19 0x4f6e33 4889742428 MOVQ SI, 0x28(SP)
frames_test.go:19 0x4f6e38 4c89442430 MOVQ R8, 0x30(SP)
frames_test.go:19 0x4f6e3d 0f1f00 NOPL 0(AX)
frames_test.go:19 0x4f6e40 e8bbe5f6ff CALL runtime.morestack_noctxt.abi0(SB)
frames_test.go:19 0x4f6e45 488b442408 MOVQ 0x8(SP), AX
frames_test.go:19 0x4f6e4a 488b5c2410 MOVQ 0x10(SP), BX
frames_test.go:19 0x4f6e4f 488b4c2418 MOVQ 0x18(SP), CX
frames_test.go:19 0x4f6e54 488b7c2420 MOVQ 0x20(SP), DI
frames_test.go:19 0x4f6e59 488b742428 MOVQ 0x28(SP), SI
frames_test.go:19 0x4f6e5e 4c8b442430 MOVQ 0x30(SP), R8
frames_test.go:19 0x4f6e63 e978ffffff JMP command-line-arguments.leaf(SB)
$ go1.20 tool objdump -s '^runtime\.memmove$' /tmp/memmove-fp | head -n 30
TEXT runtime.memmove(SB) /usr/local/go/src/runtime/memmove_amd64.s
memmove_amd64.s:39 0x468220 4889c7 MOVQ AX, DI
memmove_amd64.s:40 0x468223 4889de MOVQ BX, SI
memmove_amd64.s:41 0x468226 4889cb MOVQ CX, BX
memmove_amd64.s:54 0x468229 4885db TESTQ BX, BX
memmove_amd64.s:55 0x46822c 0f84f7000000 JE 0x468329
memmove_amd64.s:56 0x468232 4883fb02 CMPQ $0x2, BX
memmove_amd64.s:57 0x468236 0f86e0000000 JBE 0x46831c
memmove_amd64.s:58 0x46823c 4883fb04 CMPQ $0x4, BX
memmove_amd64.s:59 0x468240 0f82e9000000 JB 0x46832f
memmove_amd64.s:60 0x468246 0f86de000000 JBE 0x46832a
memmove_amd64.s:61 0x46824c 4883fb08 CMPQ $0x8, BX
memmove_amd64.s:62 0x468250 0f82e6000000 JB 0x46833c
memmove_amd64.s:63 0x468256 0f84ed000000 JE 0x468349
memmove_amd64.s:64 0x46825c 4883fb10 CMPQ $0x10, BX
memmove_amd64.s:65 0x468260 0f86ea000000 JBE 0x468350
memmove_amd64.s:66 0x468266 4883fb20 CMPQ $0x20, BX
memmove_amd64.s:67 0x46826a 0f86f1000000 JBE 0x468361
memmove_amd64.s:68 0x468270 4883fb40 CMPQ $0x40, BX
memmove_amd64.s:69 0x468274 0f86fc000000 JBE 0x468376
memmove_amd64.s:70 0x46827a 4881fb80000000 CMPQ $0x80, BX
memmove_amd64.s:71 0x468281 0f861a010000 JBE 0x4683a1
memmove_amd64.s:72 0x468287 4881fb00010000 CMPQ $0x100, BX
memmove_amd64.s:73 0x46828e 0f8664010000 JBE 0x4683f8
memmove_amd64.s:75 0x468294 f60558331d0001 TESTB $0x1, runtime.useAVXmemmove(SB)
memmove_amd64.s:76 0x46829b 0f8525030000 JNE 0x4685c6
memmove_amd64.s:81 0x4682a1 4839fe CMPQ DI, SI
memmove_amd64.s:82 0x4682a4 763a JBE 0x4682e0
memmove_amd64.s:88 0x4682a6 4881fb00080000 CMPQ $0x800, BX
memmove_amd64.s:89 0x4682ad 0f8609020000 JBE 0x4684bc
Metadata
Metadata
Assignees
Labels
Type
Projects
Status