Skip to content

rework tests for the PL_check change #17351

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Dec 12, 2019
Merged

Conversation

tonycoz
Copy link
Contributor

@tonycoz tonycoz commented Dec 9, 2019

This re-works #17240 while retaining the original commit and functional change.

Nicholas has an alternative fix that's more back-portable (currently in #17343), but he also says in Re-implement IO::Handle getline and getlines as XS code:

PL_check itself should not need this level of
protection because it should be part of the interpreter's state, not global
state.

So this patch is still needed to fix the thread-safety issue.

If Nicholas's patch is applied the original io/getlines.t test is no longer meaningful, so I added similar code to XS::APItest, a test case for it and removed io/getlines.t.

@jkeenan
Copy link
Contributor

jkeenan commented Dec 9, 2019

I was about to push a smoke-me branch for this, but decided to first test it locally. One test failure:

$ cd t;./perl harness -v ../ext/XS-APItest/t/pl_check.t;cd -
This Perl not built to support threads
Compilation failed in require at t/pl_check.t line 10.
BEGIN failed--compilation aborted at t/pl_check.t line 10.
Dubious, test returned 255 (wstat 65280, 0xff00)
No subtests run 

Test Summary Report
-------------------
../ext/XS-APItest/t/pl_check.t (Wstat: 65280 Tests: 0 Failed: 0)
  Non-zero exit status: 255
  Parse errors: No plan found in TAP output

Thank you very much.
Jim Keenan

@jkeenan
Copy link
Contributor

jkeenan commented Dec 9, 2019

I was about to push a smoke-me branch for this, but decided to first test it locally. One test failure:

$ cd t;./perl harness -v ../ext/XS-APItest/t/pl_check.t;cd -
This Perl not built to support threads
Compilation failed in require at t/pl_check.t line 10.
BEGIN failed--compilation aborted at t/pl_check.t line 10.
Dubious, test returned 255 (wstat 65280, 0xff00)
No subtests run 

Test Summary Report
-------------------
../ext/XS-APItest/t/pl_check.t (Wstat: 65280 Tests: 0 Failed: 0)
  Non-zero exit status: 255
  Parse errors: No plan found in TAP output

Thank you very much.
Jim Keenan

Try this:

diff --git a/ext/XS-APItest/t/pl_check.t b/ext/XS-APItest/t/pl_check.t
index 60bd94852a..e359ab8700 100644
--- a/ext/XS-APItest/t/pl_check.t
+++ b/ext/XS-APItest/t/pl_check.t
@@ -6,13 +6,10 @@ use Config;
 BEGIN {
     require '../../t/test.pl';
 }
+BEGIN { plan skip_all => 'no threads' unless $Config{useithreads} }
 
 use threads;
 
-
-$Config{usethreads}
-  or plan skip_all => "test requires threads";
-
 # do not use XS::APItest in this test
 
 use constant thread_count => 20;

@tonycoz tonycoz force-pushed the 17240-rework-tests branch from ef894f4 to bf9eb61 Compare December 9, 2019 02:51
@jkeenan
Copy link
Contributor

jkeenan commented Dec 9, 2019

@tonycoz I think these code changes have performance problems and are not ready for merging.

I corrected the smoke-me branch I had created from your p.r. and proceeded to smoke it. On FreeBSD-11, running my most comprehensive configuration, the test suite as a whole aborted at the 9 hour max I have set in smokecurrent_config. I got failures in these individual test files:

$ grep -E '(Failed test|FAILED)' smokecurrent.log
# Failed test 4 -    right time (waited 6 secs for 3-sec alarm) at op/alarm.t line 54
[2019-12-09 04:10:58+0000] ../t/op/alarm.t.............................................FAILED
# Failed test 4 -    right time (waited 6 secs for 3-sec alarm) at op/alarm.t line 54
[2019-12-09 04:30:32+0000] ../t/op/alarm.t.............................................FAILED
# Failed test 955 - Test Perl 73464 at ./test.pl line 1062
[2019-12-09 05:11:30+0000] ../t/re/pat_thr.t...........................................FAILED
# Failed test 58 - should not COW on long string with substr and m//g at ./re/speed.t line 156
[2019-12-09 06:10:16+0000] ../t/re/speed_thr.t.........................................FAILED
# Failed test 4 -    right time (waited 16 secs for 3-sec alarm) at op/alarm.t line 54
[2019-12-09 07:00:31+0000] ../t/op/alarm.t.............................................FAILED
# Failed test 4 -    right time (waited 6 secs for 3-sec alarm) at op/alarm.t line 54
[2019-12-09 08:14:20+0000] ../t/op/alarm.t.............................................FAILED
# Failed test 4 -    right time (waited 17 secs for 3-sec alarm) at op/alarm.t line 54
[2019-12-09 09:07:47+0000] ../t/op/alarm.t.............................................FAILED
# Failed test 955 - Test Perl 73464 at ./test.pl line 1062
[2019-12-09 09:54:34+0000] ../t/re/pat_thr.t...........................................FAILED
# Failed test 955 - Test Perl 73464 at ./test.pl line 1062
# Failed test 4 -    right time (waited 6 secs for 3-sec alarm) at op/alarm.t line 54
[2019-12-09 10:42:06+0000] ../t/re/pat_thr.t...........................................FAILED
[2019-12-09 10:42:06+0000] ../t/op/alarm.t.............................................FAILED
# Failed test 955 - Test Perl 73464 at ./test.pl line 1062
# Failed test 4 -    right time (waited 16 secs for 3-sec alarm) at op/alarm.t line 54

Here is an example of the failure in t/re/pat_thr.t:

# Failed test 955 - Test Perl 73464 at ./test.pl line 1062
#      got "# Test process timed out - terminating"
# expected "Timeout"
# PROG:
#
#                 BEGIN{require q(test.pl);}
#                 watchdog(3);
#                 $SIG{ALRM} = sub {print "Timeout\n"; exit(1)};
#                 alarm 1;
#                 $_ = "a" x 1000 . "b" x 1000 . "c" x 1000;
#                 /.*a.*b.*c.*[de]/;
# 
# STATUS: 9

While it's not unusual to get failures during smoke-testing in t/re/pat_thr.t and t/op/alarm.t in some of the resource-constrained VMs sitting on this host, it is rare to get timeout failures on the host itself. And it's even rarer for the smoke-test run to exhaust the allocated 9 hours.

(Because the run aborted, no report was generated, but I can gzip the log and send it to you if you want. See http://perl5.test-smoke.org/report/101038 for a more typical smoke-test run on this host with this configuration.)

Granted, these failures could be due to transitory conditions on the host. But this needs more investigation before proceeding.

Thank you very much.
Jim Keenan

@tonycoz
Copy link
Contributor Author

tonycoz commented Dec 9, 2019

@tonycoz I think these code changes have performance problems and are not ready for merging.

I corrected the smoke-me branch I had created from your p.r. and proceeded to smoke it. On FreeBSD-11, running my most comprehensive configuration, the test suite as a whole aborted at the 9 hour max I have set in smokecurrent_config. I got failures in these individual test files:

$ grep -E '(Failed test|FAILED)' smokecurrent.log
# Failed test 4 -    right time (waited 6 secs for 3-sec alarm) at op/alarm.t line 54
[2019-12-09 04:10:58+0000] ../t/op/alarm.t.............................................FAILED
# Failed test 4 -    right time (waited 6 secs for 3-sec alarm) at op/alarm.t line 54
[2019-12-09 04:30:32+0000] ../t/op/alarm.t.............................................FAILED
# Failed test 955 - Test Perl 73464 at ./test.pl line 1062
[2019-12-09 05:11:30+0000] ../t/re/pat_thr.t...........................................FAILED
# Failed test 58 - should not COW on long string with substr and m//g at ./re/speed.t line 156
[2019-12-09 06:10:16+0000] ../t/re/speed_thr.t.........................................FAILED
# Failed test 4 -    right time (waited 16 secs for 3-sec alarm) at op/alarm.t line 54
[2019-12-09 07:00:31+0000] ../t/op/alarm.t.............................................FAILED
# Failed test 4 -    right time (waited 6 secs for 3-sec alarm) at op/alarm.t line 54
[2019-12-09 08:14:20+0000] ../t/op/alarm.t.............................................FAILED
# Failed test 4 -    right time (waited 17 secs for 3-sec alarm) at op/alarm.t line 54
[2019-12-09 09:07:47+0000] ../t/op/alarm.t.............................................FAILED
# Failed test 955 - Test Perl 73464 at ./test.pl line 1062
[2019-12-09 09:54:34+0000] ../t/re/pat_thr.t...........................................FAILED
# Failed test 955 - Test Perl 73464 at ./test.pl line 1062
# Failed test 4 -    right time (waited 6 secs for 3-sec alarm) at op/alarm.t line 54
[2019-12-09 10:42:06+0000] ../t/re/pat_thr.t...........................................FAILED
[2019-12-09 10:42:06+0000] ../t/op/alarm.t.............................................FAILED
# Failed test 955 - Test Perl 73464 at ./test.pl line 1062
# Failed test 4 -    right time (waited 16 secs for 3-sec alarm) at op/alarm.t line 54

Here is an example of the failure in t/re/pat_thr.t:

# Failed test 955 - Test Perl 73464 at ./test.pl line 1062
#      got "# Test process timed out - terminating"
# expected "Timeout"
# PROG:
#
#                 BEGIN{require q(test.pl);}
#                 watchdog(3);
#                 $SIG{ALRM} = sub {print "Timeout\n"; exit(1)};
#                 alarm 1;
#                 $_ = "a" x 1000 . "b" x 1000 . "c" x 1000;
#                 /.*a.*b.*c.*[de]/;
# 
# STATUS: 9

While it's not unusual to get failures during smoke-testing in t/re/pat_thr.t and t/op/alarm.t in some of the resource-constrained VMs sitting on this host, it is rare to get timeout failures on the host itself. And it's even rarer for the smoke-test run to exhaust the allocated 9 hours.

(Because the run aborted, no report was generated, but I can gzip the log and send it to you if you want. See http://perl5.test-smoke.org/report/101038 for a more typical smoke-test run on this host with this configuration.)

Yes please.

The underlying change is pretty simple, I'd expect it to increase memory usage per interpreter by 3200 bytes, and start up time per interpreter is increased by copying that from the parent thread or the global version (which shouldn't be noticeable unless you're specifically benchmarking.

At compile-time (for perl code, not the perl build itself) we're accessing an interpreter variable instead of a global, which shouldn't be any more expensive if my_perl (aTHX) ends up in a register.

The tests should only have an effect on those tests, or any tests running in parallel with them, the new test in the first commit spawns 20 threads and tries to load XS::APItest in each thread, and that's moderately expensive, but my old Q6600 has no problem with it.

Granted, these failures could be due to transitory conditions on the host. But this needs more investigation before proceeding.

It might be useful to record the results of vmstat 1 5 before and after each run to get some idea of system load, and maybe uptime for the load average report.

Recording vmstat for the entire job would be better, but that's a little more difficult to automate.

@jkeenan
Copy link
Contributor

jkeenan commented Dec 11, 2019

@tonycoz I think these code changes have performance problems and are not ready for merging.
[snip]

Granted, these failures could be due to transitory conditions on the host. But this needs more investigation before proceeding.

[snip]

This turned out to be the case. In the last week, instead of having only 1 or 2 virtual box VMs running on the host, I had 4. This sapped memory. For example, when I had 4 VMs running, I observed this line in top on the host:

Mem: 74M Active, 2448K Inact, 248K Laundry, 7675M Wired, 151M Free

But when I shut down 3 of those 4 VMs, I observed this:

Mem: 58M Active, 244M Inact, 20K Laundry, 3421M Wired, 4179M Free

I then ran a new test of the branch I had created from Tony's p.r. using the same configuration. The result is here: smoke-report 101513. All tests PASS and the running time is actually slightly shorter than usual (though probably not to a statistically significant degree).

Upshot: I no longer object to p.r. 17351.

Thank you very much.
Jim Keenan

tonycoz and others added 3 commits December 12, 2019 09:50
Nicholas Clark's fix for IO makes the test in niner's patch
meaningless, so test it separately.
io/handle.t depends on IO::Handle using the PL_check hack, but
Nicholas's back portable fix no longer uses that

Fix threaded perl detection, thanks to James Keenan.
tonycoz added a commit to tonycoz/perl5 that referenced this pull request Dec 12, 2019
This failed on Win32 like in the Perl#17351 CI checks with:

../dist/if/t/if.t .................................................. ok
Can't call method "sockdomain" on an undefined value at t/cachepropagate-tcp.t line 46.
# Looks like your test exited with 9 just after 5.
../dist/IO/t/cachepropagate-tcp.t ..................................
Dubious, test returned 9 (wstat 2304, 0x900)
Failed 3/8 subtests

I suspect what happened is there was a race between the parent
accepting the connection and the child exiting and closing the
connection.

The Microsoft documentation for accept() indicates one possible
reason for failure is:

WSAECONNRESET
	An incoming connection was indicated, but was subsequently
        terminated by the remote peer prior to accepting the call.

which I suspect happened here.

So I've:

- added a basic error check for the result of accept()
- made the child to wait for the parent to close the socket
- the parent explicitly closes the socket
@tonycoz tonycoz merged commit c8b3a65 into Perl:blead Dec 12, 2019
tonycoz added a commit that referenced this pull request Dec 16, 2019
This failed on Win32 like in the #17351 CI checks with:

../dist/if/t/if.t .................................................. ok
Can't call method "sockdomain" on an undefined value at t/cachepropagate-tcp.t line 46.
# Looks like your test exited with 9 just after 5.
../dist/IO/t/cachepropagate-tcp.t ..................................
Dubious, test returned 9 (wstat 2304, 0x900)
Failed 3/8 subtests

I suspect what happened is there was a race between the parent
accepting the connection and the child exiting and closing the
connection.

The Microsoft documentation for accept() indicates one possible
reason for failure is:

WSAECONNRESET
	An incoming connection was indicated, but was subsequently
        terminated by the remote peer prior to accepting the call.

which I suspect happened here.

So I've:

- added a basic error check for the result of accept()
- made the child to wait for the parent to close the socket
- the parent explicitly closes the socket
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants