Nicholas Clark's fix for the IO::getline problem #17343

Closed
wants to merge 2 commits into from

tonycoz
Contributor

@tonycoz tonycoz commented Dec 4, 2019

as described in #16554

This fixes that problem, but not the general problem that PL_check is not thread-safe.

This was smoke-me'd and seems to have been forgotten after that.

nwc10 added 2 commits December 4, 2019 13:37
Extend the tests for <> and the open pragma to verify that the behaviour
changes with/without the open pragma.
…rl#14816.

Re-implement getline() and getlines() as XS code.

The underlying problem that we're trying to solve here is making
getline() and getlines() in IO::Handle respect the open pragma.

That bug was first addressed in Sept 2011 by commit 986a805:
    Make IO::Handle::getline(s) respect the open pragma

However, that fix introduced a more subtle bug, hence this reworking.
Including the entirety of the rest of that commit message because it
explains both the bug and the previous approach:

    See <https://rt.cpan.org/Ticket/Display.html?id=66474>.  Also, this
    came up in <https://rt.perl.org/rt3/Ticket/Display.html?id=92728>.

    The <> operator, when reading from the magic ARGV handle, automatic-
    ally opens the next file.  Layers set by the lexical open pragma are
    applied, if they are in scope at the point where <> is used.

    This works almost all the time, because the common convention is:

        use open ":utf8";

        while(<>) {
            ...
        }

    IO::Handle’s getline and getlines methods are Perl subroutines
    that call <> themselves.  But that happens within the scope of
    IO/Handle.pm, so the caller’s I/O layer settings are ignored.  That
    means that these two expressions are not equivalent within a
    ‘use open’ scope:

        <>
        *ARGV->getline

    The latter will open the next file with no layers applied.

    This commit solves that by putting PL_check hooks in place in
    IO::Handle before compiling the getline and getlines subroutines.
    Those hooks cause every state op (nextstate, or dbstate under the
    debugger) to have a custom pp function that saves the previous value
    of PL_curcop, calls the default pp function, and then restores
    PL_curcop.

    That means that getline and getlines run with the caller’s compile-
    time hints.  Another way to see it is that getline and getlines’s own
    lexical hints are never activated.

    (A state op carries all the lexical pragmata.  Every statement
    has one.  When any op executes, its ‘pp’ function is called.
    pp_nextstate and pp_dbstate both set PL_curcop to the op itself.  Any
    code that checks hints looks at PL_curcop, which contains the current
    run-time hints.)
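
To make the save/restore idea above concrete, here is a toy model in C. It is not perl's source: `toy_cop`, `toy_curcop` and every `toy_*` function are hypothetical stand-ins for COP, PL_curcop and the pp functions, and only the ordering of "save, call default, restore" is taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a COP: a statement marker carrying lexical hints. */
typedef struct toy_cop {
    const char *layers;   /* e.g. ":utf8" from `use open`, or NULL for none */
} toy_cop;

/* Toy stand-in for PL_curcop (hypothetical name). */
static const toy_cop *toy_curcop = NULL;

/* Default behaviour of a state op: it makes itself the current cop. */
static void toy_pp_nextstate(const toy_cop *self) {
    assert(self != NULL);
    toy_curcop = self;
}

/* Hooked variant: save the current cop, call the default, then restore. */
static void toy_pp_nextstate_keep_caller(const toy_cop *self) {
    const toy_cop *saved = toy_curcop;
    toy_pp_nextstate(self);
    toy_curcop = saved;
}

/* What a readline-like op would consult when opening the next file. */
static const char *toy_layers_in_effect(void) {
    return (toy_curcop && toy_curcop->layers) ? toy_curcop->layers : "";
}

/* Simulate a caller statement, one statement inside getline, then the read. */
static const char *toy_call_getline(int hooked) {
    static const toy_cop caller_cop  = { ":utf8" }; /* caller has `use open` */
    static const toy_cop getline_cop = { NULL };    /* IO/Handle.pm has none */

    toy_pp_nextstate(&caller_cop);
    if (hooked)
        toy_pp_nextstate_keep_caller(&getline_cop);
    else
        toy_pp_nextstate(&getline_cop);
    return toy_layers_in_effect();
}
```

In this model `toy_call_getline(0)` reports no layers, because the statement inside getline replaced the current cop, while `toy_call_getline(1)` still reports the caller's `:utf8`.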

The problem with this approach is that the (current) design and implementation
of PL_check hooks is actually not threadsafe. There's one array (as a global),
which is used by all interpreters in the process. But as the code added to
IO.xs demonstrates, realistically it needs to be possible to change the hook
just for this interpreter.
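
The hazard can be sketched without real threads. The following toy model in C assumes a process-wide hook table and, for contrast, a hypothetical per-interpreter copy; all `toy_*` names are invented for illustration, and the per-interpreter table is not a description of how the linked fix works.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*toy_check_fn)(int op);

enum { TOY_OP_NEXTSTATE, TOY_OP_MAX };

static int toy_default_check(int op) { return op; }
static int toy_custom_check(int op)  { return op + 100; }

/* One table for the whole process, as the text describes PL_check. */
static toy_check_fn toy_global_check[TOY_OP_MAX] = { toy_default_check };

/* Hypothetical alternative: each interpreter owns its own table. */
typedef struct toy_interp {
    toy_check_fn check[TOY_OP_MAX];
} toy_interp;

static void toy_interp_init(toy_interp *in) {
    assert(in != NULL);
    for (size_t i = 0; i < TOY_OP_MAX; i++)
        in->check[i] = toy_default_check;
}

/* What interpreter B observes if it compiles while A's hook is installed. */
static int toy_b_sees_with_global_table(void) {
    toy_check_fn saved = toy_global_check[TOY_OP_NEXTSTATE];
    int seen_by_b;

    toy_global_check[TOY_OP_NEXTSTATE] = toy_custom_check;  /* A installs */
    seen_by_b = toy_global_check[TOY_OP_NEXTSTATE](TOY_OP_NEXTSTATE); /* B */
    toy_global_check[TOY_OP_NEXTSTATE] = saved;             /* A restores */
    return seen_by_b;
}

/* Same interleaving, but A only changes its own table. */
static int toy_b_sees_with_local_tables(void) {
    toy_interp a, b;

    toy_interp_init(&a);
    toy_interp_init(&b);
    a.check[TOY_OP_NEXTSTATE] = toy_custom_check;           /* A installs */
    return b.check[TOY_OP_NEXTSTATE](TOY_OP_NEXTSTATE);     /* B unaffected */
}
```

With the shared table, B runs A's hook during the window between install and restore; with separate tables it does not. A real interleaving would depend on thread scheduling, which this sequential sketch deliberately fixes.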

GH Perl#14816 has a fix for that bug for blead. However, it will be tricky (to
impossible) to backport to earlier perl versions.

Hence it's also worthwhile to change IO.xs to use a different approach to
solve the original bug. As described above, the bug is fixed by having the
readline OP (that implements getline() and getlines()) see the caller's
lexical state, not their "own". Unlike Perl subroutines, XS subroutines don't
have any lexical hints of their own. getline() and getlines() are very
simple, mostly parameter checking, ending with a single line that maps to
a single core OP, whose values are directly returned.

Hence "all" we need to do is re-implement the Perl code as XS. This might look
easy, but turns out to be trickier than expected. There isn't any API to be
called for the OP in question, pp_readline(). The body of the OP inspects
interpreter state. It directly calls pp_rv2gv(), which also inspects state,
and then it tail calls Perl_do_readline(), which inspects state too.

The easiest approach seems to be to set up enough state, and then call
pp_readline() directly. This leaves us very tightly coupled to the
internals, but so do all other approaches to try to tackle this bug.
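
As a rough illustration of what "set up enough state, then call the pp function" means, here is a toy model in C. It is not the IO.xs code from this PR. The only fact borrowed from perl is that pp functions take their operands from interpreter state rather than from C parameters; the stack, `toy_pp_readline()` and `toy_xs_getline()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { TOY_STACK_MAX = 8 };

/* Toy stand-in for the interpreter's argument stack. */
static const char *toy_stack[TOY_STACK_MAX];
static size_t toy_sp = 0;

static void toy_push(const char *v) {
    assert(toy_sp < TOY_STACK_MAX);
    toy_stack[toy_sp++] = v;
}

static const char *toy_pop(void) {
    assert(toy_sp > 0);
    return toy_stack[--toy_sp];
}

/* pp-style function: no C parameters; pops a handle name, pushes a line. */
static void toy_pp_readline(void) {
    const char *handle = toy_pop();
    toy_push(strcmp(handle, "ARGV") == 0 ? "line from ARGV"
                                         : "line from other handle");
}

/* XS-style wrapper: arrange the state the op expects, call it, collect. */
static const char *toy_xs_getline(const char *handle) {
    size_t base = toy_sp;
    const char *line;

    toy_push(handle);        /* set up the state the op will inspect */
    toy_pp_readline();       /* call the op body directly */
    line = toy_pop();        /* take its result back off the stack */
    assert(toy_sp == base);  /* leave the stack as we found it */
    return line;
}
```

The coupling mentioned above shows up even here: the wrapper is only correct as long as it matches exactly what the op body pops and pushes.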

The current implementation of PL_check (and possibly other arrays) still
needs to be addressed.
@tonycoz
Contributor Author

tonycoz commented Jan 19, 2020

This was merged in 7a992cc

@tonycoz tonycoz closed this Jan 19, 2020