Perl benchmark shows performance decline after upgrading from perl 5.28.2 to perl 5.30.1 #17558
The phoronix-test-suite profile - https://github.com/phoronix-test-suite/phoronix-test-suite/tree/master/ob-cache/test-profiles/pts/perl-benchmark-1.0.1 - looks like a wrapper around Gisle Aas' https://github.com/gisle/perlbench. It also appears to run each test only 3 times, so there's potential for system noise to make a big difference. I'm having a look at what the tests are doing to see if there's an easy route to minimal test cases. |
Notes:
The Pod2html test (benchmarks/app/podhtml.b) essentially uses static copies of Pod::Html & perlfunc.pod to run this in a loop:
The Interpreter test (./benchmarks/startup/noprog.t) runs this in a loop:
Next, I'm going to:
|
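For reference, here is a rough standalone approximation of the Pod2html workload, based on the pod2html() call used in the perf experiments later in this thread rather than on perlbench's own harness code; the path and iteration count are illustrative. (The Interpreter test, by contrast, essentially just measures bare interpreter startup in a similar loop.)

    # Sketch only: approximates what the Pod2html benchmark exercises.
    use strict;
    use warnings;
    use Pod::Html;       # the real test uses a static copy of Pod::Html
    use File::Spec;

    my $pod = 'perlfunc.pod';    # perlbench ships its own static copy

    for (1 .. 100) {             # iteration count is arbitrary here
        pod2html(
            "--infile"  => $pod,
            "--outfile" => File::Spec->devnull,
            "--quiet",
        );
    }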
[snip]
Thank you for undertaking this systematic investigation. |
It's a relaxingly nerdy way to wind down. ;-) |
phoronix-test-suite looks to set up a shell script that calls:
(This will execute the benchmarks using whichever perl is in $^X.) phoronix-test-suite then reads the Avg: time out of the resulting .pb file. That seems legit. I don't know why phoronix-test-suite only uses 2 of the ~40 tests in Perlbench, though! |
I did a few runs using the latest Perlbench in a Debian VM against fresh 5.28.2 and 5.30.1 builds. The relative rankings of the two builds across the ~40 tests were pretty much all over the place. On one run, 5.28.2 might be ahead on a test by 1-10%, but on a different run 5.30.1 was ahead instead. IIRC this is not unusual behaviour for Perlbench, because of how it takes its measurements (reading the clock, rather than using e.g. cachegrind or perf). The "Interpreter" test did perform better on 5.28.2 across all runs, but only by 1-3%. That:
In summary, the phoronix-test-suite results by themselves don't seem to clearly point to a regression. Is either test case suitable for running with Porting/bench.pl? |
By default, phoronix-test-suite runs each indicator three times; if the deviation exceeds the default threshold, it continues running additional iterations until the deviation falls within the threshold. |
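In other words, the harness keeps sampling until the run-to-run spread is small enough. The logic amounts to something like this sketch, where the 3.5% threshold, the minimum run count, and the timed command are illustrative assumptions rather than phoronix-test-suite's actual defaults:

    use strict;
    use warnings;
    use Time::HiRes qw(time);
    use List::Util qw(sum);

    my $THRESHOLD = 0.035;    # assumed relative standard deviation limit
    my @samples;

    while (1) {
        my $t0 = time;
        system($^X, '-e', '0') == 0 or die "benchmark command failed";
        push @samples, time - $t0;

        next if @samples < 3;    # always take a minimum number of runs
        my $mean = sum(@samples) / @samples;
        my $var  = sum(map { ($_ - $mean) ** 2 } @samples) / (@samples - 1);
        last if sqrt($var) / $mean <= $THRESHOLD;
    }
    printf "Avg: %.4fs over %d runs\n", sum(@samples) / @samples, scalar @samples;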
Thanks. Approximately how long does phoronix-test-suite run for? |
On my side, Pod2html generally needs about 15 runs to reach the default deviation threshold and takes about 15 minutes to finish the test. The same configuration and compile options were used to build both perls. Here is the output for perl 5.28.2. Platform: Characteristics of this binary (from libperl): |
@dapeng-mi Thanks for that detail. @ everyone: Suggestions on how to move forward with this? Results from some quick perf experiments below. (Disclaimer: I've never used perf before.) I don't mind building every 5.29.x release and running the same perf, if that would be worthwhile...
/5282# perf stat /gh17558/5282/perl5/perl -I/gh17558/5282/perl5/lib -MPod::Html -MFile::Spec -e 'pod2html("--infile" => "/gh17558/perlfunc.pod", "--outfile" => File::Spec->devnull, "--quiet" )'
/5282# perf stat /gh17558/5282/perl5/perl -I/gh17558/5282/perl5/lib -MPod::Html -MFile::Spec -e 'pod2html("--infile" => "/gh17558/perlfunc.pod", "--outfile" => File::Spec->devnull, "--quiet" ) for ( 0 .. 999 )'
/5301# perf stat /gh17558/5301/perl5/perl -I/gh17558/5301/perl5/lib -MPod::Html -MFile::Spec -e 'pod2html("--infile" => "/gh17558/perlfunc.pod", "--outfile" => File::Spec->devnull, "--quiet" )'
/5301# perf stat /gh17558/5301/perl5/perl -I/gh17558/5301/perl5/lib -MPod::Html -MFile::Spec -e 'pod2html("--infile" => "/gh17558/perlfunc.pod", "--outfile" => File::Spec->devnull, "--quiet" ) for (0 .. 999)'
|
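One way to reduce the run-to-run noise in measurements like these (just a suggestion, not something used above) is to have perf repeat the command itself and report the spread: perf stat -r N runs the command N times and prints the mean plus a variance percentage for each counter. For example (with the -e body abbreviated):

/5301# perf stat -r 10 /gh17558/5301/perl5/perl -I/gh17558/5301/perl5/lib -MPod::Html -MFile::Spec -e '...'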
On Tue, Feb 18, 2020 at 12:35:00AM -0800, Richard Leach wrote:
> @ everyone: Suggestions on how to move forward with this? Results from
> some quick perf experiments below. (Disclaimer: I've never used perf
> before.) I don't mind building every 5.29.x release and running the same
> perf, if that would be worthwhile...
I think we are mostly measuring noise in this thread. It would be useful
to run the benchmarks against several perls to see if there is a
consistent slowdown or random variations across perl versions.
--
A major Starfleet emergency breaks out near the Enterprise, but
fortunately some other ships in the area are able to deal with it to
everyone's satisfaction.
-- Things That Never Happen in "Star Trek" #13
|
pod2html_misc_perf.txt - some numbers for the past two release cycles, plus the .0 stable releases back to 5.20. Cycles, instructions, branches all vary. Haven't had time to look in any detail. One thing that jumps out though is a massive jump in cycles, instructions, and branches from v5.31.9 to v5.31.10. |
On Fri, Mar 20, 2020 at 09:11:27PM -0700, Richard Leach wrote:
> [pod2html_misc_perf.txt](https://github.com/Perl/perl5/files/4362417/pod2html_misc_perf.txt) - some numbers for the past two release cycles, plus the .0 stable releases back to 5.20.
> Cycles, instructions, branches all vary. Haven't had time to look in any detail.
> One thing that jumps out though is a massive jump in cycles, instructions, and branches from v5.31.9 to v5.31.10.
I see almost identical (to each other) results for 5.31.9 and 5.31.10.
Are you sure your .9 and .10 perl binaries are built in the same way
as each other as regards to threading, DEBUGGING, compiler etc?
--
I before E. Except when it isn't.
|
On 3/21/20 9:27 AM, iabyn wrote:
> On Fri, Mar 20, 2020 at 09:11:27PM -0700, Richard Leach wrote:
>> [pod2html_misc_perf.txt](https://github.com/Perl/perl5/files/4362417/pod2html_misc_perf.txt) - some numbers for the past two release cycles, plus the .0 stable releases back to 5.20.
>>
>> Cycles, instructions, branches all vary. Haven't had time to look in any detail.
>>
>> One thing that jumps out though is a massive jump in cycles, instructions, and branches from v5.31.9 to v5.31.10.
>
> I see almost identical (to each other) results for 5.31.9 and 5.31.10.
> Are you sure your .9 and .10 perl binaries are built in the same way
> as each other as regards to threading, DEBUGGING, compiler etc?
--
I do know that the test suite takes about 25% longer to complete on my
Linux box on recent bleads than before, with threads and debugging
enabled. No change on unthreaded, non-debugging builds.
I bisected it enough to rule out a couple of my commits that I worried
might be the cause, but I didn't try a comprehensive bisect.
|
On Sat, Mar 21, 2020 at 08:50:29AM -0700, Karl Williamson wrote:
> I do know that the test suite takes about 25% longer to complete on my
> Linux box on recent bleads than before, with threads and debugging
> enabled. No change on unthreaded, non-debugging builds.
> I bisected it enough to rule out a couple of my commits that I worried
> might be the cause, but I didn't try a comprehensive bisect.
I generally find that test suite timings are all over the place. Here is a
summary of the 'CPU' line following the 'all tests successful' line from
my debugging+threads snapshots I've built for each devel release:
         Files    Tests  wall     usr    sys     cusr    csys      CPU
5.31.0    2656  1230625   252  220.17  12.88  1399.45   71.41  1703.91
5.31.1    2645  1230238   286  244.83  15.82  1570.64   87.71  1919.00
5.31.2    2646  1230207   266  234.14  12.98  1480.00   72.09  1799.21
5.31.3    2647  1230330   333  282.49  18.22  1738.91  108.91  2148.53
5.31.4    2648  1230464   268  234.23  13.92  1492.55   77.10  1817.80
5.31.5    2657  1178989   253  219.82  13.10  1511.16   73.66  1817.74
5.31.6    2665  1205732   286  221.77  13.38  1550.83   75.66  1861.64
5.31.7    2666  1205925   304  259.22  15.36  1708.78   96.28  2079.64
5.31.8    2666  1206215   624  522.32  35.43  2940.73  258.56  3757.04
5.31.9    2670  1217780   267  226.32  13.35  1583.50   76.27  1899.44
5.31.10   2675  1221414   346  273.41  14.56  2036.65   91.18  2415.80
The only thing I've ever been able to conclude from test suite timings is
that I can't conclude anything from them. I suspect it's more to do with
whether my laptop was overheating and CPU throttling at the time.
Running Porting/bench.pl on the full t/perf/benchmarks suite
(currently 432 benchmarks) shows virtually no change between 5.30.0 and
5.31.10.
--
More than any other time in history, mankind faces a crossroads. One path
leads to despair and utter hopelessness. The other, to total extinction.
Let us pray we have the wisdom to choose correctly.
-- Woody Allen
|
Yeah, I built them all in the same way (./Configure -des -Dusedevel && make, with a make clean in-between), but just reran for blead and it didn't show that jump:
Nothing like repeatability, eh? ;-) |
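For what it's worth, the per-release procedure described above could be scripted roughly like this (a sketch only: the tag list is abbreviated, the -e body is elided, and the paths assume an in-tree build):

for tag in v5.31.9 v5.31.10; do git checkout $tag && ./Configure -des -Dusedevel && make && perf stat ./perl -Ilib -MPod::Html -MFile::Spec -e '...'; make clean; done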
And again for v5.31.10, back in the same ballpark:
This seems very true. It might be useful if we could produce box and whisker plots for benchmarks, so we're not misled by outliers. (Using counter-based data from perlbench or perf.) Wonder if Perl::Formance can/could do this? |
On Sat, Mar 21, 2020 at 09:33:28AM -0700, Richard Leach wrote:
> It might be useful if we could produce box and whisker plots for
> benchmarks, so we're not misled by outliers. (Using counter-based data
> from perlbench or perf.) Wonder if Perl::Formance can/could do this?
I rely mainly on Porting/bench.pl, which is built upon cachegrind and
thus gives 99.9% consistent results.
--
This is a great day for France!
-- Nixon at Charles De Gaulle's funeral
|
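For anyone wanting to try the same approach here, an invocation along these lines should compare two builds over the standard benchmark file (a sketch: the labels and binary paths are illustrative, and the exact options are documented in the POD of Porting/bench.pl):

Porting/bench.pl --benchfile=t/perf/benchmarks -- /gh17558/5282/perl5/perl=5.28.2 /gh17558/5301/perl5/perl=5.30.1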
1,000x pod2html iterations for 5.28.2, 5.30.1, 5.31.10 below. Similar story.
|
@dapeng-mi If you're able to identify a specific commit as the cause of a pod2html regression, that would give something specific to examine. Otherwise it looks like test noise is swamping any signal. |
On 3/21/20 10:11 AM, iabyn wrote:
> On Sat, Mar 21, 2020 at 08:50:29AM -0700, Karl Williamson wrote:
>> I do know that the test suite takes about 25% longer to complete on my
>> Linux box on recent bleads than before, with threads and debugging
>> enabled. No change on unthreaded, non-debugging builds.
>>
>> I bisected it enough to rule out a couple of my commits that I worried
>> might be the cause, but I didn't try a comprehensive bisect.
> [snip]
--
For timing information to be consistent, the runs need to be done consistently,
with the same number of parallel jobs. Doing it with one job is the most
reliable. Otherwise, succeeding runs after cleaning the workspace learn
from the previous run to schedule tests more intelligently in parallel.
And there is a bug in TAP::Harness which yields inaccurate information
for that scheduling algorithm. It doesn't start the timers until the
first output of a test is seen, typically after the buffer gets filled.
Thus, tests which do a lot of computation before outputting their
results are marked as taking much less time to complete than they
actually do. This throws off the scheduling. That same test writing a
diagnostic to stderr early (unbuffered) will show a significant
difference in timings.
I'm submitting a PR to fix this. But it remains that the wall time will
often be significantly shorter than the claimed CPU usage for a test
running in a single thread.
|
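To illustrate the kind of test that the behaviour described above would mis-time (a contrived sketch, not the harness fix itself):

    use strict;
    use warnings;
    use Test::More;

    my $n = 0;
    $n += $_ for 1 .. 20_000_000;   # lengthy work before any TAP output

    ok($n > 0, 'did some work before producing output');
    done_testing();

Under the described behaviour, a test like this appears to start "late", so its measured duration under-reports its real cost; per the comment above, adding an early unbuffered diagnostic to stderr (e.g. a diag() call before the loop) changes the apparent timing significantly.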
This issue hasn't turned up a minimally reproducible test case, or even a general suspicion of "something in X part of core slowed down". It seems likely that noise and/or the usual small percentage variability in performance across builds, due to factors outside of our control, is what was being observed. I'll close this ticket now. If there are any specific slowdowns that people want looked at, please raise a new issue and include your detailed test case(s) there. |
Module:
Description
We upgraded perl from 5.28.2 to 5.30.1 on Clear Linux, but we found a performance decline (3% ~ 10%) in the Perl benchmark (Pod2html and Interpreter tests). Here are the detailed test results.
Steps to Reproduce
Expected behavior
Perl configuration
Perl Info