Perl benchmark shows performance decline after upgrading from perl 5.28.2 to perl 5.30.1 #17558
The phoronix-test-suite profile - https://github.com/phoronix-test-suite/phoronix-test-suite/tree/master/ob-cache/test-profiles/pts/perl-benchmark-1.0.1 - looks like a wrapper around Gisle Aas' https://github.com/gisle/perlbench. It also appears to run each test only 3 times, so there's potential for system noise to make a big difference. I'm having a look at what the tests are doing to see if there's an easy route to minimal test cases. |
Notes:
The Pod2html test (benchmarks/app/podhtml.b) essentially uses static copies of Pod::Html & perlfunc.pod to run this in a loop:
The Interpreter test (./benchmarks/startup/noprog.t) runs this in a loop:
Next, I'm going to:
|
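For reference, here is a rough standalone approximation of the Pod2html workload, based on the pod2html() call used in the perf experiments later in this thread rather than on perlbench's own harness code; the path and iteration count are illustrative. (The Interpreter test, by contrast, essentially just measures bare interpreter startup in a similar loop.)

    # Sketch only: approximates what the Pod2html benchmark exercises.
    use strict;
    use warnings;
    use Pod::Html;       # the real test uses a static copy of Pod::Html
    use File::Spec;

    my $pod = 'perlfunc.pod';    # perlbench ships its own static copy

    for (1 .. 100) {             # iteration count is arbitrary here
        pod2html(
            "--infile"  => $pod,
            "--outfile" => File::Spec->devnull,
            "--quiet",
        );
    }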
[snip]
Thank you for undertaking this systematic investigation. |
It's a relaxingly nerdy way to wind down. ;-) |
phoronix-test-suite looks to set up a shell script that calls:
(This will execute the benchmarks using whichever perl is in $^X.) phoronix-test-suite then reads the Avg: time out of the resulting .pb file. That seems legit. I don't know why phoronix-test-suite only uses 2 of the ~40 tests in Perlbench, though! |
I did a few runs using the latest Perlbench in a Debian VM against fresh 5.28.2 and 5.30.1 builds. The relative rankings of the two builds across the ~40 tests were pretty much all over the place. On one run, 5.28.2 might be ahead on a test by 1-10%, but on a different run 5.30.1 was ahead instead. IIRC this is not unusual behaviour for Perlbench, because of how it takes its measurements (reading the clock, rather than using e.g. cachegrind or perf). The "Interpreter" test did perform better on 5.28.2 across all runs, but only by 1-3%. That:
In summary, the phoronix-test-suite results by themselves don't seem to clearly point to a regression. Is either test case suitable for running with Porting/bench.pl? |
By default, phoronix-test-suite runs each indicator three times; if the deviation exceeds the default threshold, it continues running additional iterations until the deviation falls within the threshold. |
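In other words, the harness keeps sampling until the run-to-run spread is small enough. The logic amounts to something like this sketch, where the 3.5% threshold, the minimum run count, and the timed command are illustrative assumptions rather than phoronix-test-suite's actual defaults:

    use strict;
    use warnings;
    use Time::HiRes qw(time);
    use List::Util qw(sum);

    my $THRESHOLD = 0.035;    # assumed relative standard deviation limit
    my @samples;

    while (1) {
        my $t0 = time;
        system($^X, '-e', '0') == 0 or die "benchmark command failed";
        push @samples, time - $t0;

        next if @samples < 3;    # always take a minimum number of runs
        my $mean = sum(@samples) / @samples;
        my $var  = sum(map { ($_ - $mean) ** 2 } @samples) / (@samples - 1);
        last if sqrt($var) / $mean <= $THRESHOLD;
    }
    printf "Avg: %.4fs over %d runs\n", sum(@samples) / @samples, scalar @samples;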
Thanks. Approximately how long does phoronix-test-suite run for? |
On my side, Pod2html generally needs about 15 runs to reach the default deviation threshold and takes about 15 minutes to finish the test. The same configuration and compile options were used to build both perls. Here is the output for perl 5.28.2. Platform: Characteristics of this binary (from libperl): |
@dapeng-mi Thanks for that detail. @ everyone: Suggestions on how to move forward with this? Results from some quick perf experiments below. (Disclaimer: I've never used perf before.) I don't mind building every 5.29.x release and running the same perf, if that would be worthwhile...
/5282# perf stat /gh17558/5282/perl5/perl -I/gh17558/5282/perl5/lib -MPod::Html -MFile::Spec -e 'pod2html("--infile" => "/gh17558/perlfunc.pod", "--outfile" => File::Spec->devnull, "--quiet" )'
/5282# perf stat /gh17558/5282/perl5/perl -I/gh17558/5282/perl5/lib -MPod::Html -MFile::Spec -e 'pod2html("--infile" => "/gh17558/perlfunc.pod", "--outfile" => File::Spec->devnull, "--quiet" ) for ( 0 .. 999 )'
/5301# perf stat /gh17558/5301/perl5/perl -I/gh17558/5301/perl5/lib -MPod::Html -MFile::Spec -e 'pod2html("--infile" => "/gh17558/perlfunc.pod", "--outfile" => File::Spec->devnull, "--quiet" )'
/5301# perf stat /gh17558/5301/perl5/perl -I/gh17558/5301/perl5/lib -MPod::Html -MFile::Spec -e 'pod2html("--infile" => "/gh17558/perlfunc.pod", "--outfile" => File::Spec->devnull, "--quiet" ) for (0 .. 999)'
|
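One way to reduce the run-to-run noise in measurements like these (just a suggestion, not something used above) is to have perf repeat the command itself and report the spread: perf stat -r N runs the command N times and prints the mean plus a variance percentage for each counter. For example (with the -e body abbreviated):

/5301# perf stat -r 10 /gh17558/5301/perl5/perl -I/gh17558/5301/perl5/lib -MPod::Html -MFile::Spec -e '...'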
On Tue, Feb 18, 2020 at 12:35:00AM -0800, Richard Leach wrote:
> @ everyone: Suggestions on how to move forward with this? Results from
> some quick perf experiments below. (Disclaimer: I've never used perf
> before.) I don't mind building every 5.29.x release and running the same
> perf, if that would be worthwhile...
I think we are mostly measuring noise in this thread. It would be useful
to run the benchmarks against several perls to see if there is a
consistent slowdown or random variations across perl versions.
--
A major Starfleet emergency breaks out near the Enterprise, but
fortunately some other ships in the area are able to deal with it to
everyone's satisfaction.
-- Things That Never Happen in "Star Trek" #13
|
pod2html_misc_perf.txt - some numbers for the past two release cycles, plus the .0 stable releases back to 5.20. Cycles, instructions, branches all vary. Haven't had time to look in any detail. One thing that jumps out though is a massive jump in cycles, instructions, and branches from v5.31.9 to v5.31.10. |
On Fri, Mar 20, 2020 at 09:11:27PM -0700, Richard Leach wrote:
> [pod2html_misc_perf.txt](https://github.com/Perl/perl5/files/4362417/pod2html_misc_perf.txt) - some numbers for the past two release cycles, plus the .0 stable releases back to 5.20.
> Cycles, instructions, branches all vary. Haven't had time to look in any detail.
> One thing that jumps out though is a massive jump in cycles, instructions, and branches from v5.31.9 to v5.31.10.
I see almost identical (to each other) results for 5.31.9 and 5.31.10.
Are you sure your .9 and .10 perl binaries are built in the same way
as each other as regards to threading, DEBUGGING, compiler etc?
--
I before E. Except when it isn't.
|
On 3/21/20 9:27 AM, iabyn wrote:
> On Fri, Mar 20, 2020 at 09:11:27PM -0700, Richard Leach wrote:
>> [pod2html_misc_perf.txt](https://github.com/Perl/perl5/files/4362417/pod2html_misc_perf.txt) - some numbers for the past two release cycles, plus the .0 stable releases back to 5.20.
>>
>> Cycles, instructions, branches all vary. Haven't had time to look in any detail.
>>
>> One thing that jumps out though is a massive jump in cycles, instructions, and branches from v5.31.9 to v5.31.10.
>
> I see almost identical (to each other) results for 5.31.9 and 5.31.10.
> Are you sure your .9 and .10 perl binaries are built in the same way
> as each other as regards to threading, DEBUGGING, compiler etc?
--
I do know that the test suite takes about 25% longer to complete on my
Linux box on recent bleads than before, with threads and debugging
enabled. No change on unthreaded, non-debugging builds.
I bisected it enough to rule out a couple of my commits that I worried
might be the cause, but I didn't try a comprehensive bisect.
|
On Sat, Mar 21, 2020 at 08:50:29AM -0700, Karl Williamson wrote:
> I do know that the test suite takes about 25% longer to complete on my
> Linux box on recent bleads than before, with threads and debugging
> enabled. No change on unthreaded, non-debugging builds.
> I bisected it enough to rule out a couple of my commits that I worried
> might be the cause, but I didn't try a comprehensive bisect.
I generally find that test suite timings are all over the place. Here is a
summary of the 'CPU' line following the 'all tests successful' line from
my debugging+threads snapshots I've built for each devel release:
         Files    Tests  wall     usr    sys     cusr    csys      CPU
5.31.0    2656  1230625   252  220.17  12.88  1399.45   71.41  1703.91
5.31.1    2645  1230238   286  244.83  15.82  1570.64   87.71  1919.00
5.31.2    2646  1230207   266  234.14  12.98  1480.00   72.09  1799.21
5.31.3    2647  1230330   333  282.49  18.22  1738.91  108.91  2148.53
5.31.4    2648  1230464   268  234.23  13.92  1492.55   77.10  1817.80
5.31.5    2657  1178989   253  219.82  13.10  1511.16   73.66  1817.74
5.31.6    2665  1205732   286  221.77  13.38  1550.83   75.66  1861.64
5.31.7    2666  1205925   304  259.22  15.36  1708.78   96.28  2079.64
5.31.8    2666  1206215   624  522.32  35.43  2940.73  258.56  3757.04
5.31.9    2670  1217780   267  226.32  13.35  1583.50   76.27  1899.44
5.31.10   2675  1221414   346  273.41  14.56  2036.65   91.18  2415.80
The only thing I've ever been able to conclude from test suite timings is
that I can't conclude anything from them. I suspect it's more to do with
whether my laptop was overheating and CPU throttling at the time.
Running Porting/bench.pl on the full t/perf/benchmarks suite
(currently 432 benchmarks) shows virtually no change between 5.30.0 and
5.31.10.
--
More than any other time in history, mankind faces a crossroads. One path
leads to despair and utter hopelessness. The other, to total extinction.
Let us pray we have the wisdom to choose correctly.
-- Woody Allen
|
Yeah, I built them all in the same way (./Configure -des -Dusedevel && make, with a make clean in-between), but just reran for blead and it didn't show that jump:
Nothing like repeatability, eh? ;-) |
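For what it's worth, the per-release procedure described above could be scripted roughly like this (a sketch only: the tag list is abbreviated, the -e body is elided, and the paths assume an in-tree build):

for tag in v5.31.9 v5.31.10; do git checkout $tag && ./Configure -des -Dusedevel && make && perf stat ./perl -Ilib -MPod::Html -MFile::Spec -e '...'; make clean; done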
And again for v5.31.10, back in the same ballpark:
This seems very true. It might be useful if we could produce box and whisker plots for benchmarks, so we're not misled by outliers. (Using counter-based data from perlbench or perf.) Wonder if Perl::Formance can/could do this? |
On Sat, Mar 21, 2020 at 09:33:28AM -0700, Richard Leach wrote:
> It might be useful if we could produce box and whisker plots for
> benchmarks, so we're not misled by outliers. (Using counter-based data
> from perlbench or perf.) Wonder if Perl::Formance can/could do this?
I rely mainly on Porting/bench.pl, which is built upon cachegrind and
thus gives 99.9% consistent results.
--
This is a great day for France!
-- Nixon at Charles De Gaulle's funeral
|
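For anyone wanting to try the same approach here, an invocation along these lines should compare two builds over the standard benchmark file (a sketch: the labels and binary paths are illustrative, and the exact options are documented in the POD of Porting/bench.pl):

Porting/bench.pl --benchfile=t/perf/benchmarks -- /gh17558/5282/perl5/perl=5.28.2 /gh17558/5301/perl5/perl=5.30.1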
1,000x pod2html iterations for 5.28.2, 5.30.1, 5.31.10 below. Similar story.
|
@dapeng-mi If you're able to identify a specific commit as the cause of a pod2html regression, that would give something specific to examine. Otherwise it looks like test noise is swamping any signal. |
On 3/21/20 10:11 AM, iabyn wrote:
> On Sat, Mar 21, 2020 at 08:50:29AM -0700, Karl Williamson wrote:
>> I do know that the test suite takes about 25% longer to complete on my
>> Linux box on recent bleads than before, with threads and debugging
>> enabled. No change on unthreaded, non-debugging builds.
>>
>> I bisected it enough to rule out a couple of my commits that I worried
>> might be the cause, but I didn't try a comprehensive bisect.
> [snip]
--
For timing information to be consistent, the runs need to be done consistently,
with the same number of parallel jobs. Doing it with one job is the most
reliable. Otherwise, succeeding runs after cleaning the workspace learn
from the previous run to schedule tests more intelligently in parallel.
And there is a bug in TAP::Harness which yields inaccurate information
for that scheduling algorithm. It doesn't start the timers until the
first output of a test is seen, typically after the buffer gets filled.
Thus, tests which do a lot of computation before outputting their
results are marked as taking much less time to complete than they
actually do. This throws off the scheduling. That same test writing a
diagnostic to stderr early (unbuffered) will show a significant
difference in timings.
I'm submitting a PR to fix this. But it remains that the wall time will
often be significantly shorter than the claimed CPU usage for a test
running in a single thread.
|
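To illustrate the kind of test that the behaviour described above would mis-time (a contrived sketch, not the harness fix itself):

    use strict;
    use warnings;
    use Test::More;

    my $n = 0;
    $n += $_ for 1 .. 20_000_000;   # lengthy work before any TAP output

    ok($n > 0, 'did some work before producing output');
    done_testing();

Under the described behaviour, a test like this appears to start "late", so its measured duration under-reports its real cost; per the comment above, adding an early unbuffered diagnostic to stderr (e.g. a diag() call before the loop) changes the apparent timing significantly.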
This issue hasn't turned up a minimally reproducible test case, or even a general suspicion of "something in X part of core slowed down". It seems likely that noise and/or the usual small percentage variability in performance across builds, due to factors outside of our control, is what was being observed. I'll close this ticket now. If there are any specific slowdowns that people want looked at, please raise a new issue and include your detailed test case(s) there. |
Module:
Description
We upgraded perl from 5.28.2 to 5.30.1 on Clear Linux, but we found a performance decline (3% ~ 10%) in the Perl benchmark (Pod2html and Interpreter tests). Here are the detailed test results.
Steps to Reproduce
Expected behavior
Perl configuration
Perl Info