Skip to content

(Mac)Debug ALL assertion fails when instancing embedded code before a named capture followed by named back reference #19350

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
AnFunctionArray opened this issue Jan 19, 2022 · 22 comments

Comments

@AnFunctionArray
Copy link
Contributor

AnFunctionArray commented Jan 19, 2022

Module: re

NOTE: Just seems to be a Mac only issue please post your version if you are reporting as a false positive.

I just tested on a Mac 12.1 virtual machine (in case my original installation was corrupt) and is still reproducible).

Description
Assertion failed when I use qw(Debug ALL) in my program (on my default built-in perl mac os 12.1).

(_svpvx) & SVt_MASK]), function my_regprop, file re_comp.c, line 20441.
zsh: abort      perl test.pl

Steps to Reproduce

use re qw(Debug ALL);

qr{(?{callfunc})(?<name>\g{absqualifsbasic}}

Expected behavior
Not assert fail.

Perl configuration

Summary of my perl5 (revision 5 version 30 subversion 3) configuration:
   
  Platform:
    osname=darwin
    osvers=21.0
    archname=darwin-thread-multi-2level
    uname='darwin bb-g9-pdb59.ta10.sd.apple.com 21.0 darwin kernel version 20.1.0: tue jun 29 22:08:19 pdt 2021; root:xnu-7195.41.8.100.7~1development_x86_64 x86_64 '
    config_args='-ds -e -Dprefix=/usr -Dccflags=-g  -pipe  -Dldflags= -Dman3ext=3pm -Duseithreads -Duseshrplib -Dinc_version_list=none -Dcc=cc'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=define
    usemultiplicity=define
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
    bincompat5005=undef
  Compiler:
    cc='cc'
    ccflags =' -g -pipe -fno-strict-aliasing -fstack-protector-strong -DPERL_USE_SAFE_PUTENV'
    optimize='-Os'
    cppflags='-g -pipe -fno-strict-aliasing -fstack-protector-strong'
    ccversion=''
    gccversion='Apple LLVM 13.0.0 (clang-1300.0.29.10) [+internal-os, ptrauth-isa=deployment-target-based]'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='cc'
    ldflags =' -fstack-protector-strong'
    libpth=/System/Volumes/Data/SWE/macOS/BuildRoots/5b2e67f8af/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX12.1.Internal.sdk/usr/local/lib /System/Volumes/Data/SWE/macOS/BuildRoots/5b2e67f8af/Applications/Xcode.app/Contents/Developer/Toolchains/OSX12.1.xctoolchain/usr/lib/clang/13.0.0/lib /System/Volumes/Data/SWE/macOS/BuildRoots/5b2e67f8af/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX12.1.Internal.sdk/usr/lib /System/Volumes/Data/SWE/macOS/BuildRoots/5b2e67f8af/Applications/Xcode.app/Contents/Developer/Toolchains/OSX12.1.xctoolchain/usr/lib /System/Volumes/Data/SWE/macOS/BuildRoots/5b2e67f8af/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib /usr/lib /usr/local/lib
    libs= 
    perllibs=
    libc=
    so=dylib
    useshrplib=true
    libperl=libperl.dylib
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=bundle
    d_dlsymun=undef
    ccdlflags=' '
    cccdlflags=' '
    lddlflags=' -bundle -undefined dynamic_lookup -fstack-protector-strong'


Characteristics of this binary (from libperl): 
  Compile-time options:
    HAS_TIMES
    MULTIPLICITY
    PERLIO_LAYERS
    PERL_COPY_ON_WRITE
    PERL_DONT_CREATE_GVSV
    PERL_IMPLICIT_CONTEXT
    PERL_MALLOC_WRAP
    PERL_OP_PARENT
    PERL_PRESERVE_IVUV
    PERL_USE_SAFE_PUTENV
    USE_64_BIT_ALL
    USE_64_BIT_INT
    USE_ITHREADS
    USE_LARGE_FILES
    USE_LOCALE
    USE_LOCALE_COLLATE
    USE_LOCALE_CTYPE
    USE_LOCALE_NUMERIC
    USE_LOCALE_TIME
    USE_PERLIO
    USE_PERL_ATOF
    USE_REENTRANT_API
    USE_THREAD_SAFE_LOCALE
  Locally applied patches:
    /Library/Perl/Updates/<version> comes before system perl directories
    installprivlib and installarchlib points to the Updates directory
  Built under darwin
  Compiled at Nov 13 2021 00:39:53
  @INC:
    /Library/Perl/5.30/darwin-thread-multi-2level
    /Library/Perl/5.30
    /Network/Library/Perl/5.30/darwin-thread-multi-2level
    /Network/Library/Perl/5.30
    /Library/Perl/Updates/5.30.3
    /System/Library/Perl/5.30/darwin-thread-multi-2level
    /System/Library/Perl/5.30
    /System/Library/Perl/Extras/5.30/darwin-thread-multi-2level
    /System/Library/Perl/Extras/5.30
@AnFunctionArray
Copy link
Contributor Author

I closed since the example worked on online perl but maybe this is a macos exclusive issue.

@demerphq
Copy link
Collaborator

demerphq commented Jan 19, 2022 via email

@AnFunctionArray
Copy link
Contributor Author

On Wed, 19 Jan 2022 at 01:38, 6a4h8 @.***> wrote: Module: re Description Assertion failed when I use qw(Debug ALL) in my program. Steps to Reproduce use re qw(Debug ALL); qr{(?{callfunc})(?\g{absqualifsbasic}} Expected behavior Not assert fail. Perl configuration Summary of my perl5 (revision 5 version 30 subversion 3) configuration: No assert fail in 5.28.1:
$ perl -Mre=Debug,ALL -e'qr{(?{callfunc})(?\g{absqualifsbasic}}' Assembling pattern from 2 elements Compiling REx "(?{callfunc})(?\g{absqualifsbasic}" Starting first pass (sizing) <(?{callfunc>...| 1| reg | | brnc | | piec | | atom <?{callfunc}>...| | reg <(?\g{>...| 4| piec | | atom <?\g{a>...| | reg <\g{absquali>...| 6| brnc | | piec | | atom Unmatched ( in regex; marked by <-- HERE in m/(?{callfunc})( <-- HERE ?\g{absqualifsbasic}/ at -e line 1.

-- perl -Mre=debug -e "/just|another|perl|hacker/"

Did you try on a mac? what is your perl version?

@AnFunctionArray AnFunctionArray changed the title Debug ALL assertion fails when instancing embedded code before a named capture followed by named back reference (Mac)Debug ALL assertion fails when instancing embedded code before a named capture followed by named back reference Jan 19, 2022
@demerphq
Copy link
Collaborator

demerphq commented Jan 19, 2022 via email

@AnFunctionArray
Copy link
Contributor Author

On Wed, 19 Jan 2022 at 06:15, 6a4h8 @.> wrote: On Wed, 19 Jan 2022 at 01:38, 6a4h8 @.> wrote: Module: re Description Assertion failed when I use qw(Debug ALL) in my program. Steps to Reproduce use re qw(Debug ALL); qr{(?{callfunc})(?\g{absqualifsbasic}} Expected behavior Not assert fail. Perl configuration Summary of my perl5 (revision 5 version 30 subversion 3) configuration: No assert fail in 5.28.1: $ perl -Mre=Debug,ALL -e'qr{(?{callfunc})(?\g{absqualifsbasic}}' Assembling pattern from 2 elements Compiling REx "(?{callfunc})(?\g{absqualifsbasic}" Starting first pass (sizing) <(?{callfunc>...| 1| reg | | brnc | | piec | | atom <?{callfunc}>...| | reg <(?\g{>...| 4| piec | | atom <?\g{a>...| | reg <\g{absquali>...| 6| brnc | | piec | | atom Unmatched ( in regex; marked by <-- HERE in m/(?{callfunc})( <-- HERE ?\g{absqualifsbasic}/ at -e line 1. … <#m_-5255632312895881260_> -- perl -Mre=debug -e "/just|another|perl|hacker/" Did you try on a mac?
Nope. I have a linux box.
what is your perl version?
Stated above 5.28.1 yves.

Well it seems to be an issue only with Mac - (tested on mac mini m1 with both bult-in version and blead and also on a vm inside the mac).

@choroba
Copy link
Contributor

choroba commented Jan 22, 2022

Fails with the latest blead perl on Linux:

$ blead perl -e'use re Debug=>"ALL";qr{(?{a})(?<b>\g{c})}'
Assembling pattern from 2 elements
Compiling REx "(?{a})(?<b>\g{c})"
Starting parse and generation
<(?{a})(?<b>>...|   1|  reg    
                |    |    brnc   
                |    |      piec   
                |    |        atom   
<?{a})(?<b>\>...|    |          reg    
<(?<b>\g{c})>   |   4|      piec   
                |    |        atom   
<?<b>\g{c})>    |    |          reg    
                |    |            Setting open paren #1 to 4
<\g{c})>        |   6|            brnc   
                |    |              piec   
                |    |                atom   
<)>             |   8|            tail~ OPEN1 'b' (4) -> REFN
                |    |            Setting close paren #1 to 8
                |  10|          lsbrperl5.35.9: re_comp.c:21443: my_regprop: Assertion `PL_valid_types_PVX[SvTYPE(_svpvx) & SVt_MASK]' failed.
/home/choroba/bin/blead: line 26: 31743 Aborted                 (core dumped) "${exec[0]}" "$@"
$ blead -v

This is perl 5, version 35, subversion 9 (v5.35.9 (v5.35.8-7-g8030c9e253)) built for x86_64-linux-thread-multi

@demerphq
Copy link
Collaborator

demerphq commented Jan 23, 2022 via email

demerphq added a commit that referenced this issue Jan 23, 2022
This reverts the op.c change from
"newSVpvn_flags().. is more efficient than sv_2mortal(newSVpvn(..))"
full commit id 2a98b8c

For some reason the op.c change tickles a compiler bug. Various minor
changes to the text result in this working as expected, and then other
changes make it fail, etc. I was not able to work out why, the code as
presented is not wrong, although it hides some stuff underneath.

The expression is:

    newSVpvn_flags( PadnamePV(name)+1,PadnameLEN(name)-1,
               (PadnameUTF8(name)) ? SVf_UTF8|SVs_TEMP : SVs_TEMP
              );

Which should be the same as:

    sv_2mortal(newSVpvn_utf8(
                PadnamePV(name)+1,PadnameLEN(name)-1, PadnameUTF8(name)
              ));

However for some reason the compiler does something different. An
interesting subtlety is that PadnameUTF8() always returns 1. So you
would think that it could be replaced by

    newSVpvn_flags( PadnamePV(name)+1,PadnameLEN(name)-1, SVf_UTF8|SVs_TEMP);

but when I compile that it seems to randomly produce the right result or
not. I tried various formulations of parenthesizing the arguments,
reducing the ternary to its natural result (above) etc, and the compiler
seemed to produce random results. The only formulation that reliably
produced the correct results is the original one.
demerphq added a commit that referenced this issue Jan 23, 2022
This reverts the op.c change from
"newSVpvn_flags().. is more efficient than sv_2mortal(newSVpvn(..))"
full commit id 2a98b8c

For some reason the op.c change tickles a compiler bug. Various minor
changes to the text result in this working as expected, and then other
changes make it fail, etc. I was not able to work out why, the code as
presented is not wrong, although it hides some stuff underneath.

The expression is:

    newSVpvn_flags( PadnamePV(name)+1,PadnameLEN(name)-1,
               (PadnameUTF8(name)) ? SVf_UTF8|SVs_TEMP : SVs_TEMP
              );

Which should be the same as:

    sv_2mortal(newSVpvn_utf8(
                PadnamePV(name)+1,PadnameLEN(name)-1, PadnameUTF8(name)
              ));

However for some reason the compiler does something different. An
interesting subtlety is that PadnameUTF8() always returns 1. So you
would think that it could be replaced by

    newSVpvn_flags( PadnamePV(name)+1,PadnameLEN(name)-1, SVf_UTF8|SVs_TEMP);

but when I compile that it seems to randomly produce the right result or
not. I tried various formulations of parenthesizing the arguments,
reducing the ternary to its natural result (above) etc, and the compiler
seemed to produce random results. The only formulation that reliably
produced the correct results is the original one.
@demerphq
Copy link
Collaborator

demerphq commented Jan 23, 2022 via email

@demerphq
Copy link
Collaborator

demerphq commented Jan 23, 2022 via email

@demerphq
Copy link
Collaborator

demerphq commented Jan 23, 2022 via email

@demerphq
Copy link
Collaborator

demerphq commented Jan 23, 2022 via email

@demerphq
Copy link
Collaborator

demerphq commented Jan 23, 2022 via email

demerphq added a commit that referenced this issue Jan 23, 2022
In 7c932d0 karl changed the regex parser
to not do two passes always, and instead do one if it could. However in
some edge cases it still must do a second past and some of the info needed
for the Debug => "All" would not be available during the first pass.

This was compounded by S_add_data() validly returning a 0 index for
data that was stored in the data array, which meant that there was no
way to tell the difference between "we dont have the data to call S_add_data()
at all" and "we called S_add_data() and it returned 0", this in turn
would cause the dumping logic to try to access data that was not there,
and misinterpret it as a valid SV, with ensure assert fails or worse
segfaults.

This patch changes S_add_data() so it always returns a non-zero
return, so that the regex debug logic can tell that it shouldnt do
a lookup into the data array. This means create a new "what" type
of "%", which is used SOLELY to populate the ->data[0] and ->what[0]
with a dummy entry.

With this done we can add guard statements to places that lookup
things in the data array, if the index is 0 it can not be valid.
@demerphq
Copy link
Collaborator

demerphq commented Jan 23, 2022 via email

demerphq added a commit that referenced this issue Jan 23, 2022
This test should be run multiple times, and we should add more
tests for the segfaulting cases. Will do it tomorrow.
@khwilliamson
Copy link
Contributor

I'm examining this in detail. I think that clang asan and valgrind don't have the heisenbug problem. asan revealed undefined behavior in a 32 bit int word which I've fixed in blead.

@demerphq
Copy link
Collaborator

demerphq commented Jan 24, 2022 via email

demerphq added a commit that referenced this issue Jan 24, 2022
…SE")

In 7c932d0 karl changed the regex parser
to not do two passes always, and instead do one if it could. However in
some edge cases it still must do a second past and some of the info needed
for the Debug => "All" would not be available during the first pass.

This was compounded by S_add_data() validly returning a 0 index for
data that was stored in the data array, which meant that there was no
way to tell the difference between "we dont have the data to call S_add_data()
at all" and "we called S_add_data() and it returned 0", this in turn
would cause the dumping logic to try to access data that was not there,
and misinterpret it as a valid SV, with ensure assert fails or worse
segfaults.

This patch changes S_add_data() so it always returns a non-zero
return, so that the regex debug logic can tell that it shouldnt do
a lookup into the data array. This means create a new "what" type
of "%", which is used SOLELY to populate the ->data[0] and ->what[0]
with a dummy entry.

With this done we can add guard statements to places that lookup
things in the data array, if the index is 0 it can not be valid.
demerphq added a commit that referenced this issue Jan 24, 2022
Note that 'use re Debug => "ALL"' and 'use re Debug => "PARSE"'
are basically the same as far as this issue goes. ALL implies PARSE
along with other things, the fault is in how the PARSE part works,
and the omitting the full output of ALL and just testing PARSE is
sufficient to test this issue, and avoids any need to deal with
memory addresses or word size issues that are irrelevant to the
issue anyway.
@demerphq
Copy link
Collaborator

demerphq commented Jan 24, 2022 via email

@khwilliamson
Copy link
Contributor

I'm examining this in detail. I think that clang asan and valgrind don't have the heisenbug problem. asan revealed undefined behavior in a 32 bit int word which I've fixed in blead.

@khwilliamson
Copy link
Contributor

I have no problems with the patch, besides the trivial ones of a typo in a comment, and in the commit message. And there are new trailing white spaces that git complains about. github seems to hide these. Do we have an official position on these?

Thanks for the patch and analysis.

I wasn't convinced that there was a one-off error, but adding one as a precaution seems fine to me.

My real concern is that the test feels brittle. It would fail if we make even trivial changes in the output of Debug, which becomes a future maintenance time waster. Isn't the segfault a legitimate reduction for the initial problem? If so why not use it. If not, I think the fact that valgrind and asan don't exhibit heisenbug issues means we don't have to run the test x number of times

@demerphq
Copy link
Collaborator

demerphq commented Jan 25, 2022 via email

khwilliamson pushed a commit that referenced this issue Feb 1, 2022
In 7c932d0 karl changed the regex
parser to not do two passes always, and instead do one if it could.
However in some edge cases it still must do a second past and some of
the info needed for the Debug => "All" would not be available during the
first pass.

This was compounded by S_add_data() validly returning a 0 index for data
that was stored in the data array, which meant that there was no way to
tell the difference between "we dont have the data to call S_add_data()
at all" and "we called S_add_data() and it returned 0", this in turn
would cause the dumping logic to try to access data that was not there,
and misinterpret it as a valid SV, with ensure assert fails or worse
segfaults.

This patch changes S_add_data() so it always returns a non-zero return,
so that the regex debug logic can tell that it shouldnt do a lookup into
the data array. This means create a new "what" type of "%", which is
used SOLELY to populate the ->data[0] and ->what[0] with a dummy entry.

With this done we can add guard statements to places that lookup things
in the data array, if the index is 0 it can not be valid.
@khwilliamson
Copy link
Contributor

You persuaded me. I fixed one typo, reflowed overlong comments, and pushed otherwise unchanged. This issue should now be fixed by a963d6d

@demerphq
Copy link
Collaborator

demerphq commented Feb 1, 2022 via email

@demerphq
Copy link
Collaborator

demerphq commented Feb 2, 2022 via email

steve-m-hay pushed a commit that referenced this issue Feb 26, 2022
In 7c932d0 karl changed the regex
parser to not do two passes always, and instead do one if it could.
However in some edge cases it still must do a second past and some of
the info needed for the Debug => "All" would not be available during the
first pass.

This was compounded by S_add_data() validly returning a 0 index for data
that was stored in the data array, which meant that there was no way to
tell the difference between "we dont have the data to call S_add_data()
at all" and "we called S_add_data() and it returned 0", this in turn
would cause the dumping logic to try to access data that was not there,
and misinterpret it as a valid SV, with ensure assert fails or worse
segfaults.

This patch changes S_add_data() so it always returns a non-zero return,
so that the regex debug logic can tell that it shouldnt do a lookup into
the data array. This means create a new "what" type of "%", which is
used SOLELY to populate the ->data[0] and ->what[0] with a dummy entry.

With this done we can add guard statements to places that lookup things
in the data array, if the index is 0 it can not be valid.

(cherry picked from commit a963d6d)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants