Perl SEGV #23326

ancientwizard · 2025-05-25T19:40:13Z

Module:

Description
Perl SEGV's during SIGCHLD.

BTW I opened #23241 before but that issue has been fixed by a patch and pull request submitted the DBD::Oracle myself. This isn't an Oracle only issue. Apparently and dynamic library that creates threads could find their way into Perl land trying to service an interrupt.

Steps to Reproduce
The steps to reproduce are reported within perl5-dbi/DBD-Oracle#192
I have added a test that reproduces the SEGV t/92-segv-fork.t

Expected behavior
Perl should not SEGV

Perl configuration

CFLAGS="-fsanitize=address -g -fno-omit-frame-pointer" LDFLAGS="-fsanitize=address" \
./Configure -des -Dusethreads -Duse64bitint -Duse64bitall -Dprefix=/your-install-path \
      -Alddlflags="-shared -g -O0 -L/usr/local/lib -fstack-protector-strong" \
      -Accflags="-g -O0"

The text was updated successfully, but these errors were encountered:

tonycoz · 2025-05-25T23:00:36Z

Possibly fixed by 85e9706

ancientwizard · 2025-05-26T00:48:04Z

Possibly fixed by 85e9706

So you're suggesting that the Oracle instant client lib starts one or more additional non Perl threads that may leak into Perl land causing mayhem? That wouldn't seem like the typical M.O. of a dynamic library.

tonycoz · 2025-05-26T00:57:14Z

It's only a problem when you mix library worker threads with signals.

Your backtrace:

#6  0x00007fffeeff1ec9 in kpucpincrtime () from /usr/lib/oracle/23/client64/lib/libclntsh.so.23.1
#7  0x00007ffff7bb51ca in start_thread (arg=<optimized out>) at pthread_create.c:479
#8  0x00007ffff6e5d8d3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

shows this isn't a perl thread.

ancientwizard · 2025-05-26T18:41:30Z

It's only a problem when you mix library worker threads with signals.

Your backtrace:

#6  0x00007fffeeff1ec9 in kpucpincrtime () from /usr/lib/oracle/23/client64/lib/libclntsh.so.23.1
#7  0x00007ffff7bb51ca in start_thread (arg=<optimized out>) at pthread_create.c:479
#8  0x00007ffff6e5d8d3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

shows this isn't a perl thread.

Thanks for the observation. I thought the GDB output seemed to suggest this wasn't a main thread; but I wasn't exactly ready to believe because I'm not creating any additional Perl threads. I have confirmed that once the Perl has connected and then disconnected from the DB the process does have an additional thread. I can easily see it by inspecting /proc/$$/status. Once created my app is extremely suspectable to SEGV's.

I'm in no position to coerce or convince the Perl maintainers to make change to Perl's SIG handling. ALSO: I have yet to find a way to tell the OCI to
A) not create threads
B) keep its threads on a leash!

I suppose there is a reason Oracle didn't have a built-in solution to funnel signals to the main thread; starting with they know a whole lot more than I do on the subject.

xenu · 2025-05-26T19:48:10Z

So does this still happen on blead or not? It's not clear which Perl version you're using.

bulk88 · 2025-05-26T19:54:16Z

Possibly fixed by 85e9706

I read somewhere online (gossip), pthreads wants to or did actually deploy to prod a change, that their pthread_t type is now a 16 byte struct and not a 8 byte size_t opaque pointer. I forgot the rational but either it removes 1 level of ptr derefs, or a pthread_t object has no destructor and no memory behind it anymore, its not ref counted, its not malloced, its pass by copy, with a benefit of infinite CPU core count scaling because there is no more user mode global lock for getter/setter methods for pthread metadata/TLS.

Upgrading to 16 byte structs did cause widespread generic common POSIX app breakage, because suddenly == C operator syntax errors.

Google brings up https://forums.oracle.com/ords/apexds/post/issue-with-system-call-with-libclntsh-so-19-1-0673 and SIGCHILD drama.

bulk88 · 2025-05-26T20:33:00Z

Thanks for the observation. I thought the GDB output seemed to suggest this wasn't a main thread; but I wasn't exactly ready to believe because I'm not creating any additional Perl threads. I have confirmed that once the Perl has connected and then disconnected from the DB the process does have an additional thread. I can easily see it by inspecting /proc/$$/status. Once created my app is extremely suspectable to SEGV's.

I'm in no position to coerce or convince the Perl maintainers to make change to Perl's SIG handling. ALSO: I have yet to find a way to tell the OCI to A) not create threads B) keep its threads on a leash!

I suppose there is a reason Oracle didn't have a built-in solution to funnel signals to the main thread; starting with they know a whole lot more than I do on the subject.

SP my API names before using them. dTHX;/Perl_get_context();/ENTER/SAVETEMPS/PUSHMARK/Perl_call_sv() executing on a random OS thread, from a OS/LibC thread pool, is always user error. And a very common recurrent mousetrap bug for all Perl CPAN XS code running on Windows. Without detaching a my_perl ptr from 1 OS thread, pause/sleeping/idling the old OS thread, and attach the my_perl ptr to a new OS thread with Perl_set_context(), any other behavior is illegal/invalid.

1 my_perl ptr can be bound to exactly 1 OS TID at any given time.

WinPerl has an my_perl ptr of last resort for signal handlers and a very small % of all possible permutations of thread pool accidents (most will just SEGV on WinPerl). See PL_curinterp specifically. IDK what Perl on other OSes do. Perl also has a "safe signals" API that packetizes/async (0.25-30 ms away) resends the unix signal event back to the root my_perl, to execute at a safe control flow location inside the run loop. its called PERL_ASYNC_CHECK().

interpreter * perl_alloc_using(IPerlMem **ipM, IPerlMem **ipMS, IPerlMem **ipMP, IPerlEnv **ipE, IPerlStdIO **ipStd, IPerlLIO **ipLIO, IPerlDir **ipD, IPerlSock **ipS, IPerlProc **ipP)
{
  interpreter *my_perl;
  _PerlIO **fd;
  interpreter *result;

  my_perl = (interpreter *)(*ipM)->pCalloc(ipM, 1ui64, 4816ui64);
  if ( PL_curinterp )
  {
    Perl_set_context(my_perl);
  }
  else
  {
    PL_curinterp = my_perl;
    if ( my_perl && !PL_veto_switch_non_tTHX_context )
      Perl_switch_locale_context(my_perl);
    PL_thr_key = TlsAlloc();
    if ( PL_thr_key == -1 )
    {
      fd = Perl_PerlIO_stderr(my_perl);
      PerlIO_printf(fd, "panic: TlsAlloc");
      exit(1);
    }
    Perl_set_context(my_perl);
    InitializeCriticalSection(&PL_op_mutex);
    InitializeCriticalSection(&PL_check_mutex);
    InitializeCriticalSection(&PL_keyword_plugin_mutex);
    InitializeCriticalSection(&PL_hints_mutex);
    InitializeCriticalSection(&PL_locale_mutex.lock);

jkeenan · 2025-05-26T21:20:31Z

@tonycoz, is this a perl problem or a DBD-Oracle problem? If the latter, then we should direct the OP to a more appropriate place to file this bug report. Thanks.

ancientwizard · 2025-05-26T21:57:15Z

@tonycoz, is this a perl problem or a DBD-Oracle problem? If the latter, then we should direct the OP to a more appropriate place to file this bug report. Thanks.

This is a Perl vs libraries that create non Perl worker threads. PERL can't cope with non Perl threads entering its domain.

tonycoz · 2025-05-26T22:28:18Z

@tonycoz, is this a perl problem or a DBD-Oracle problem? If the latter, then we should direct the OP to a more appropriate place to file this bug report. Thanks.

It appears to be the same as #22487 which is fixed in blead.

We won't know if it fixes it for OP until they test it, though they might run into other DBD::Oracle issues along the way.

Edit: if you're using a vendor perl you might be able to convince them to backport 85e9706

tonycoz · 2025-05-26T22:39:18Z

I read somewhere online (gossip), pthreads wants to or did actually deploy to prod a change, that their pthread_t type is now a 16 byte struct and not a 8 byte size_t opaque pointer

pthreads isn't a specific implementation, but a specification. That specification does allow for non-scalar pthread_t which is why the test code in 85e9706 uses pthread_equal().

Though I kind of doubt any implementation does use non-scalar pthread_t - it would break too much code.

I thought the GDB output seemed to suggest this wasn't a main thread; but I wasn't exactly ready to believe because I'm not creating any additional Perl threads.

The thread start function for the thread is in libclntsh.so.23.1 rather than in the perl binary, libperl, or threads.so (the loadable object that does perl threads)

bulk88 · 2025-05-28T05:54:32Z

Though I kind of doubt any implementation does use non-scalar pthread_t - it would break too much code.

https://www.ibm.com/docs/en/open-xl-c-cpp-zos/2.1.0?topic=guide-binary-compatibility

Internal type went from char thd [8] to void * I guess.

https://github.com/nginx/unit/blob/master/src/nxt_thread_id.h#L171 good reference.

It seems from my opinion, to say, certain pthread impls, pack a ptr and a 0 based kernel tid or a certain readable or unreadable memory address that the kernel returns as a "thread obj ref", together in 1 void * or into 8 bytes. The bitfield space is gained by extreme alignment of the master pthread struct, at 4096, or 10s of KBs to free up space in the void * for the kernel TID. The platform/CPU/OS might be lacking a dedicated TLS register, and having to ask the kernel each time for an abstract TID, and somehow convert the abstract number back into their large pthread internal C struct.

ancientwizard · 2025-05-31T19:48:54Z

I have tested the patch placed in "bleed" (mentioned above) that forces Perl "WOOL' on to the non-Perl thread and then allows it to continue to handle the interrupt. I've also tested an alternative (though a real Perl internals developer should make it conform to Perl standards); Both work well to eliminate the SEGV when a non-Perl worker thread is called upon to enter Perl land to handle an interrupt. For the time being I'll apply said patch to any Perl I build within my organization as I can't use bleed there.

The alternative is shown here: perl5-dbi/DBD-Oracle#192

tonycoz · 2025-06-01T01:45:27Z

The patch you show as applied to blead in perl5-dbi/DBD-Oracle#192 isn't what was applied to blead. blead (5.42 to be) currently uses pthread_kill() to forward the signal to the main thread.

ancientwizard · 2025-06-01T03:05:24Z

The example I used was something I made up (the alternative) simply to convey the intent not the final code. I'll have another look and if there is a bleed fix that passes the interrupt along that's a better solution and why I offered the example SU-DO code.

As for the the original approach that someone claimed to put in bleed is on them. I saw the code and the comment they made and I tested it. I don't really know anything about bleed. I just tried to connect the dots so others could follow If they wished.

ancientwizard added the Needs Triage label May 25, 2025

ancientwizard mentioned this issue May 26, 2025

Test cleanups & New SEGV forking test reproduction perl5-dbi/DBD-Oracle#193

Merged

ancientwizard mentioned this issue May 26, 2025

Segfault when used within a forking application perl5-dbi/DBD-Oracle#179

Open

jkeenan removed the Needs Triage label May 26, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Perl SEGV #23326

Perl SEGV #23326

ancientwizard commented May 25, 2025 •

edited

Loading

tonycoz commented May 25, 2025

Uh oh!

ancientwizard commented May 26, 2025

Uh oh!

tonycoz commented May 26, 2025

Uh oh!

ancientwizard commented May 26, 2025

Uh oh!

xenu commented May 26, 2025

Uh oh!

bulk88 commented May 26, 2025 •

edited

Loading

Uh oh!

bulk88 commented May 26, 2025

Uh oh!

jkeenan commented May 26, 2025

Uh oh!

ancientwizard commented May 26, 2025 •

edited

Loading

Uh oh!

tonycoz commented May 26, 2025 •

edited

Loading

Uh oh!

tonycoz commented May 26, 2025

Uh oh!

bulk88 commented May 28, 2025

Uh oh!

ancientwizard commented May 31, 2025

Uh oh!

tonycoz commented Jun 1, 2025

Uh oh!

ancientwizard commented Jun 1, 2025

Uh oh!

Perl SEGV #23326

Perl SEGV #23326

Comments

ancientwizard commented May 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

tonycoz commented May 25, 2025

Uh oh!

ancientwizard commented May 26, 2025

Uh oh!

tonycoz commented May 26, 2025

Uh oh!

ancientwizard commented May 26, 2025

Uh oh!

xenu commented May 26, 2025

Uh oh!

bulk88 commented May 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bulk88 commented May 26, 2025

Uh oh!

jkeenan commented May 26, 2025

Uh oh!

ancientwizard commented May 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tonycoz commented May 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tonycoz commented May 26, 2025

Uh oh!

bulk88 commented May 28, 2025

Uh oh!

ancientwizard commented May 31, 2025

Uh oh!

tonycoz commented Jun 1, 2025

Uh oh!

ancientwizard commented Jun 1, 2025

Uh oh!

ancientwizard commented May 25, 2025 •

edited

Loading

bulk88 commented May 26, 2025 •

edited

Loading

ancientwizard commented May 26, 2025 •

edited

Loading

tonycoz commented May 26, 2025 •

edited

Loading