Skip to content

runtime: framepointer-based stack unwinding omits memmove's callee #58835

Open
@rhysh

Description

@rhysh

It looks like the immediate caller of runtime.memmove is reported correctly in runtime/pprof profiles, but is missing in perf's framepointer-based unwinding. I don't know whether this affects callers of other assembly functions, or if it's specific to memmove. CC @golang/runtime , and probably also of interest to @felixge

What version of Go are you using (go version)?

$ go1.20 version
go version go1.20.1 darwin/arm64

Does this issue reproduce with the latest release?

Yes, go1.20.1 is the latest stable release. I first identified this behavior in production use with go1.19.6.

What operating system and processor architecture are you using (go env)?

For the reproducer, I'm cross-compiling from darwin/arm64 to linux/amd64. The production behavior I'm reproducing was with linux/amd64 builds for linux/amd64.

go env Output
$ go1.20 env
GO111MODULE=""
GOARCH="arm64"
GOBIN=""
GOCACHE="/Users/rhys/Library/Caches/go-build"
GOENV="/Users/rhys/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="arm64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/rhys/go/pkg/mod"
GONOPROXY="*"
GONOSUMDB="*"
GOOS="darwin"
GOPATH="/Users/rhys/go"
GOPRIVATE="*"
GOPROXY="direct"
GOROOT="/Users/rhys/go/version/go1.20"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/Users/rhys/go/version/go1.20/pkg/tool/darwin_arm64"
GOVCS=""
GOVERSION="go1.20.1"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/lf/n4ry0qv5639_3z0nhyhy3z7c0000gs/T/go-build3223373974=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

I used Linux's perf to report collect data on an app's performance using the -g flag to collect stack traces using the frame pointer.

$ cat /tmp/frames_test.go                     
package main

import "testing"

func BenchmarkMemmove(b *testing.B) {
        src := make([]byte, 1<<15)
        dst := make([]byte, len(src))
        for i := 0; i < b.N; i++ {
                root(dst, src)
        }
}

//go:noinline
func root(dst, src []byte) {
        leaf(dst, src)
}

//go:noinline
func leaf(dst, src []byte) {
        copy(dst, src)
}
$ GOOS=linux GOARCH=amd64 go1.20 test -c -o /tmp/memmove-fp /tmp/frames_test.go
# [scp to linux/amd64 machine, run the following there]
$ sudo perf record -o /tmp/mm.perf.data -g -F 99 -- ./memmove-fp -test.bench=. -test.cpuprofile=/tmp/mm.pb.gz
goos: linux
goarch: amd64
cpu: Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz
BenchmarkMemmove-4       1511648               795.6 ns/op
PASS
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.028 MB /tmp/mm.perf.data (197 samples) ]

What did you expect to see?

I expected go tool pprof and perf record -g+perf script to agree on the call stacks, including the identity of the immediate caller of runtime.memmove. I expected the function named command-line-arguments.leaf to show up in the call stack, with an immediate caller of command-line-arguments.root and an immediate callee of runtime.memmove.

What did you see instead?

With go tool pprof, I see reports of CPU time spent in runtime.memmove, called by command-line-arguments.leaf, called by command-line-arguments.root, as I expect.

With perf record -g and perf script, I see the on-CPU time attributed to runtime.memmove as called by command-line-arguments.root. The ...leaf frame is missing, which I consider to be a bug.

$ perf script -i /tmp/mm.perf.data | head -n 26
memmove-fp  1265 1124104.950646:   10101010 cpu-clock:pppH: 
                  468641 runtime.memmove+0x421 (/home/ubuntu/memmove-fp)
                  4f6d85 command-line-arguments.root+0x25 (/home/ubuntu/memmove-fp)
                  4f6d05 command-line-arguments.BenchmarkMemmove+0x85 (/home/ubuntu/memmove-fp)
                  4ada02 testing.(*B).runN+0x102 (/home/ubuntu/memmove-fp)
                  4ae685 testing.(*B).launch+0x1c5 (/home/ubuntu/memmove-fp)
                  4ae486 testing.(*B).doBench.func1+0x26 (/home/ubuntu/memmove-fp)
                  467381 runtime.goexit.abi0+0x1 (/home/ubuntu/memmove-fp)

memmove-fp  1265 1124104.960746:   10101010 cpu-clock:pppH: 
                  468641 runtime.memmove+0x421 (/home/ubuntu/memmove-fp)
                  4f6d85 command-line-arguments.root+0x25 (/home/ubuntu/memmove-fp)
                  4f6d05 command-line-arguments.BenchmarkMemmove+0x85 (/home/ubuntu/memmove-fp)
                  4ada02 testing.(*B).runN+0x102 (/home/ubuntu/memmove-fp)
                  4ae685 testing.(*B).launch+0x1c5 (/home/ubuntu/memmove-fp)
                  4ae486 testing.(*B).doBench.func1+0x26 (/home/ubuntu/memmove-fp)
                  467381 runtime.goexit.abi0+0x1 (/home/ubuntu/memmove-fp)

memmove-fp  1265 1124104.970847:   10101010 cpu-clock:pppH: 
                  46864b runtime.memmove+0x42b (/home/ubuntu/memmove-fp)
                  4f6d85 command-line-arguments.root+0x25 (/home/ubuntu/memmove-fp)
                  4f6d05 command-line-arguments.BenchmarkMemmove+0x85 (/home/ubuntu/memmove-fp)
                  4ada02 testing.(*B).runN+0x102 (/home/ubuntu/memmove-fp)
                  4ae685 testing.(*B).launch+0x1c5 (/home/ubuntu/memmove-fp)
                  4ae486 testing.(*B).doBench.func1+0x26 (/home/ubuntu/memmove-fp)
                  467381 runtime.goexit.abi0+0x1 (/home/ubuntu/memmove-fp)
$ go1.20 tool pprof -peek=runtime.memmove /tmp/mmpbgz           
File: memmove-fp
Type: cpu
Time: Mar 2, 2023 at 11:40am (PST)
Duration: 2.12s, Total samples = 2010ms (94.82%)
Showing nodes accounting for 2010ms, 100% of 2010ms total
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context              
----------------------------------------------------------+-------------
                                            1960ms   100% |   command-line-arguments.leaf
    1960ms 97.51% 97.51%     1960ms 97.51%                | runtime.memmove
----------------------------------------------------------+-------------

$ go1.20 tool pprof -raw /tmp/mmpbgz   
PeriodType: cpu nanoseconds
Period: 10000000
Time: 2023-03-02 11:40:52.391789789 -0800 PST
Duration: 2.11
Samples:
samples/count cpu/nanoseconds
         47  470000000: 1 2 3 4 5 6 
        100 1000000000: 7 2 3 4 5 6 
         24  240000000: 8 2 3 4 5 6 
          3   30000000: 9 5 6 
          6   60000000: 10 2 3 4 5 6 
         12  120000000: 11 2 3 4 5 6 
          3   30000000: 12 2 3 4 5 6 
          1   10000000: 13 2 3 4 5 6 
          2   20000000: 14 5 6 
          3   30000000: 15 2 3 4 5 6 
Locations
     1: 0x468641 M=1 runtime.memmove /usr/local/go/src/runtime/memmove_amd64.s:369 s=35
     2: 0x4f6e14 M=1 command-line-arguments.leaf /tmp/frames_test.go:20 s=19
     3: 0x4f6d84 M=1 command-line-arguments.root /tmp/frames_test.go:15 s=14
     4: 0x4f6d04 M=1 command-line-arguments.BenchmarkMemmove /tmp/frames_test.go:9 s=5
     5: 0x4ada01 M=1 testing.(*B).runN /usr/local/go/src/testing/benchmark.go:193 s=181
     6: 0x4ae684 M=1 testing.(*B).launch /usr/local/go/src/testing/benchmark.go:334 s=293
     7: 0x46864b M=1 runtime.memmove /usr/local/go/src/runtime/memmove_amd64.s:371 s=35
     8: 0x468661 M=1 runtime.memmove /usr/local/go/src/runtime/memmove_amd64.s:376 s=35
     9: 0x4f6ce8 M=1 command-line-arguments.BenchmarkMemmove /tmp/frames_test.go:8 s=5
    10: 0x468657 M=1 runtime.memmove /usr/local/go/src/runtime/memmove_amd64.s:374 s=35
    11: 0x46865c M=1 runtime.memmove /usr/local/go/src/runtime/memmove_amd64.s:375 s=35
    12: 0x468638 M=1 runtime.memmove /usr/local/go/src/runtime/memmove_amd64.s:367 s=35
    13: 0x4685cc M=1 runtime.memmove /usr/local/go/src/runtime/memmove_amd64.s:302 s=35
    14: 0x4f6d1d M=1 command-line-arguments.BenchmarkMemmove /tmp/frames_test.go:8 s=5
    15: 0x468652 M=1 runtime.memmove /usr/local/go/src/runtime/memmove_amd64.s:373 s=35
Mappings
1: 0x400000/0x4f8000/0x0 /home/ubuntu/memmove-fp  [FN]
2: 0x7fff85d4c000/0x7fff85d4e000/0x0 [vdso]  
3: 0xffffffffff600000/0xffffffffff601000/0x0 [vsyscall]
$ go1.20 tool objdump -s '\.root$' /tmp/memmove-fp 
TEXT command-line-arguments.root(SB) /tmp/frames_test.go
  frames_test.go:14     0x4f6d60                493b6610                CMPQ 0x10(R14), SP                      
  frames_test.go:14     0x4f6d64                7629                    JBE 0x4f6d8f                            
  frames_test.go:14     0x4f6d66                4883ec38                SUBQ $0x38, SP                          
  frames_test.go:14     0x4f6d6a                48896c2430              MOVQ BP, 0x30(SP)                       
  frames_test.go:14     0x4f6d6f                488d6c2430              LEAQ 0x30(SP), BP                       
  frames_test.go:14     0x4f6d74                4889442440              MOVQ AX, 0x40(SP)                       
  frames_test.go:14     0x4f6d79                48897c2458              MOVQ DI, 0x58(SP)                       
  frames_test.go:15     0x4f6d7e                6690                    NOPW                                    
  frames_test.go:15     0x4f6d80                e85b000000              CALL command-line-arguments.leaf(SB)    
  frames_test.go:16     0x4f6d85                488b6c2430              MOVQ 0x30(SP), BP                       
  frames_test.go:16     0x4f6d8a                4883c438                ADDQ $0x38, SP                          
  frames_test.go:16     0x4f6d8e                c3                      RET                                     
  frames_test.go:14     0x4f6d8f                4889442408              MOVQ AX, 0x8(SP)                        
  frames_test.go:14     0x4f6d94                48895c2410              MOVQ BX, 0x10(SP)                       
  frames_test.go:14     0x4f6d99                48894c2418              MOVQ CX, 0x18(SP)                       
  frames_test.go:14     0x4f6d9e                48897c2420              MOVQ DI, 0x20(SP)                       
  frames_test.go:14     0x4f6da3                4889742428              MOVQ SI, 0x28(SP)                       
  frames_test.go:14     0x4f6da8                4c89442430              MOVQ R8, 0x30(SP)                       
  frames_test.go:14     0x4f6dad                e84ee6f6ff              CALL runtime.morestack_noctxt.abi0(SB)  
  frames_test.go:14     0x4f6db2                488b442408              MOVQ 0x8(SP), AX                        
  frames_test.go:14     0x4f6db7                488b5c2410              MOVQ 0x10(SP), BX                       
  frames_test.go:14     0x4f6dbc                488b4c2418              MOVQ 0x18(SP), CX                       
  frames_test.go:14     0x4f6dc1                488b7c2420              MOVQ 0x20(SP), DI                       
  frames_test.go:14     0x4f6dc6                488b742428              MOVQ 0x28(SP), SI                       
  frames_test.go:14     0x4f6dcb                4c8b442430              MOVQ 0x30(SP), R8                       
  frames_test.go:14     0x4f6dd0                eb8e                    JMP command-line-arguments.root(SB)

$ go1.20 tool objdump -s '\.leaf$' /tmp/memmove-fp 
TEXT command-line-arguments.leaf(SB) /tmp/frames_test.go
  frames_test.go:19     0x4f6de0                493b6610                CMPQ 0x10(R14), SP                      
  frames_test.go:19     0x4f6de4                7639                    JBE 0x4f6e1f                            
  frames_test.go:19     0x4f6de6                4883ec20                SUBQ $0x20, SP                          
  frames_test.go:19     0x4f6dea                48896c2418              MOVQ BP, 0x18(SP)                       
  frames_test.go:19     0x4f6def                488d6c2418              LEAQ 0x18(SP), BP                       
  frames_test.go:19     0x4f6df4                4889442428              MOVQ AX, 0x28(SP)                       
  frames_test.go:19     0x4f6df9                48897c2440              MOVQ DI, 0x40(SP)                       
  frames_test.go:20     0x4f6dfe                4839de                  CMPQ BX, SI                             
  frames_test.go:20     0x4f6e01                480f4cde                CMOVL SI, BX                            
  frames_test.go:20     0x4f6e05                4839f8                  CMPQ DI, AX                             
  frames_test.go:20     0x4f6e08                740b                    JE 0x4f6e15                             
  frames_test.go:20     0x4f6e0a                4889d9                  MOVQ BX, CX                             
  frames_test.go:20     0x4f6e0d                4889fb                  MOVQ DI, BX                             
  frames_test.go:20     0x4f6e10                e80b14f7ff              CALL runtime.memmove(SB)                
  frames_test.go:21     0x4f6e15                488b6c2418              MOVQ 0x18(SP), BP                       
  frames_test.go:21     0x4f6e1a                4883c420                ADDQ $0x20, SP                          
  frames_test.go:21     0x4f6e1e                c3                      RET                                     
  frames_test.go:19     0x4f6e1f                4889442408              MOVQ AX, 0x8(SP)                        
  frames_test.go:19     0x4f6e24                48895c2410              MOVQ BX, 0x10(SP)                       
  frames_test.go:19     0x4f6e29                48894c2418              MOVQ CX, 0x18(SP)                       
  frames_test.go:19     0x4f6e2e                48897c2420              MOVQ DI, 0x20(SP)                       
  frames_test.go:19     0x4f6e33                4889742428              MOVQ SI, 0x28(SP)                       
  frames_test.go:19     0x4f6e38                4c89442430              MOVQ R8, 0x30(SP)                       
  frames_test.go:19     0x4f6e3d                0f1f00                  NOPL 0(AX)                              
  frames_test.go:19     0x4f6e40                e8bbe5f6ff              CALL runtime.morestack_noctxt.abi0(SB)  
  frames_test.go:19     0x4f6e45                488b442408              MOVQ 0x8(SP), AX                        
  frames_test.go:19     0x4f6e4a                488b5c2410              MOVQ 0x10(SP), BX                       
  frames_test.go:19     0x4f6e4f                488b4c2418              MOVQ 0x18(SP), CX                       
  frames_test.go:19     0x4f6e54                488b7c2420              MOVQ 0x20(SP), DI                       
  frames_test.go:19     0x4f6e59                488b742428              MOVQ 0x28(SP), SI                       
  frames_test.go:19     0x4f6e5e                4c8b442430              MOVQ 0x30(SP), R8                       
  frames_test.go:19     0x4f6e63                e978ffffff              JMP command-line-arguments.leaf(SB)

$ go1.20 tool objdump -s '^runtime\.memmove$' /tmp/memmove-fp | head -n 30
TEXT runtime.memmove(SB) /usr/local/go/src/runtime/memmove_amd64.s
  memmove_amd64.s:39    0x468220                4889c7                  MOVQ AX, DI                             
  memmove_amd64.s:40    0x468223                4889de                  MOVQ BX, SI                             
  memmove_amd64.s:41    0x468226                4889cb                  MOVQ CX, BX                             
  memmove_amd64.s:54    0x468229                4885db                  TESTQ BX, BX                            
  memmove_amd64.s:55    0x46822c                0f84f7000000            JE 0x468329                             
  memmove_amd64.s:56    0x468232                4883fb02                CMPQ $0x2, BX                           
  memmove_amd64.s:57    0x468236                0f86e0000000            JBE 0x46831c                            
  memmove_amd64.s:58    0x46823c                4883fb04                CMPQ $0x4, BX                           
  memmove_amd64.s:59    0x468240                0f82e9000000            JB 0x46832f                             
  memmove_amd64.s:60    0x468246                0f86de000000            JBE 0x46832a                            
  memmove_amd64.s:61    0x46824c                4883fb08                CMPQ $0x8, BX                           
  memmove_amd64.s:62    0x468250                0f82e6000000            JB 0x46833c                             
  memmove_amd64.s:63    0x468256                0f84ed000000            JE 0x468349                             
  memmove_amd64.s:64    0x46825c                4883fb10                CMPQ $0x10, BX                          
  memmove_amd64.s:65    0x468260                0f86ea000000            JBE 0x468350                            
  memmove_amd64.s:66    0x468266                4883fb20                CMPQ $0x20, BX                          
  memmove_amd64.s:67    0x46826a                0f86f1000000            JBE 0x468361                            
  memmove_amd64.s:68    0x468270                4883fb40                CMPQ $0x40, BX                          
  memmove_amd64.s:69    0x468274                0f86fc000000            JBE 0x468376                            
  memmove_amd64.s:70    0x46827a                4881fb80000000          CMPQ $0x80, BX                          
  memmove_amd64.s:71    0x468281                0f861a010000            JBE 0x4683a1                            
  memmove_amd64.s:72    0x468287                4881fb00010000          CMPQ $0x100, BX                         
  memmove_amd64.s:73    0x46828e                0f8664010000            JBE 0x4683f8                            
  memmove_amd64.s:75    0x468294                f60558331d0001          TESTB $0x1, runtime.useAVXmemmove(SB)   
  memmove_amd64.s:76    0x46829b                0f8525030000            JNE 0x4685c6                            
  memmove_amd64.s:81    0x4682a1                4839fe                  CMPQ DI, SI                             
  memmove_amd64.s:82    0x4682a4                763a                    JBE 0x4682e0                            
  memmove_amd64.s:88    0x4682a6                4881fb00080000          CMPQ $0x800, BX                         
  memmove_amd64.s:89    0x4682ad                0f8609020000            JBE 0x4684bc

Metadata

Metadata

Assignees

No one assigned

    Labels

    NeedsInvestigationSomeone must examine and confirm this is a valid issue and not a duplicate of an existing one.compiler/runtimeIssues related to the Go compiler and/or runtime.

    Type

    No type

    Projects

    Status

    Todo

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions