Commit e9bc6d6

Add thread-safe locale handling
This (large) commit allows locales to be used in threaded perls on platforms that support it. This includes recent Windows and POSIX 2008 ones.
1 parent ddd5ebe commit e9bc6d6

21 files changed: +1377 -108 lines changed

dist/ExtUtils-ParseXS/lib/perlxs.pod

Lines changed: 102 additions & 4 deletions
@@ -2195,7 +2195,7 @@ To summarize, here's what to expect and how to handle locales in XS code:
21952195
=item Non-locale-aware XS code
21962196

21972197
Keep in mind that even if you think your code is not locale-aware, it
2198-
may call a C library function that is. Hopefully the man page for such
2198+
may call a library function that is. Hopefully the man page for such
21992199
a function will indicate that dependency, but the documentation is
22002200
imperfect.
22012201

@@ -2231,9 +2231,107 @@ L<perlapi/STORE_LC_NUMERIC_FORCE_TO_UNDERLYING>, and
22312231
L<perlapi/RESTORE_LC_NUMERIC> should be used to affect any needed
22322232
change.
22332233

2234-
However, some alien libraries that may be called do set it, such as
2235-
C<Gtk>. This can cause problems for the perl core and other modules.
2236-
Starting in v5.20.1, calling the function
2234+
But, starting with Perl v5.28, locales are thread-safe on platforms that
2235+
support this functionality. Windows has this starting with Visual
2236+
Studio 2005. Many other modern platforms support the thread-safe POSIX
2237+
2008 functions. The C C<#define> C<USE_THREAD_SAFE_LOCALE> will be
2238+
defined iff this build is using these. From Perl-space, the read-only
2239+
variable C<${SAFE_LOCALES}> is 1 if either the build is not threaded, or
2240+
if C<USE_THREAD_SAFE_LOCALE> is defined; otherwise it is 0.
2241+
2242+
The way this works under the hood is that every thread has a choice of
2243+
using a locale specific to it (this is the Windows and POSIX 2008
2244+
functionality), or the global locale that is accessible to all threads
2245+
(this is the functionality that has always been there). The
2246+
implementations for Windows and POSIX are completely different. On
2247+
Windows, the runtime can be set up so that the standard
2248+
L<C<setlocale(3)>> function either only knows about the global locale or
2249+
the locale for this thread. On POSIX, C<setlocale> always deals with
2250+
the global locale, and other functions have been created to handle
2251+
per-thread locales. Perl makes this transparent to perl-space code. It
2252+
continues to use C<POSIX::setlocale()>, and the interpreter translates
2253+
that into the per-thread functions.
2254+
2255+
All other locale-sensitive functions automatically use the per-thread
2256+
locale, if that is turned on, and failing that, the global locale. Thus
2257+
calls to C<setlocale> are ineffective on POSIX systems for the current
2258+
thread if that thread is using a per-thread locale. If perl is compiled
2259+
for single-thread operation, it does not use the per-thread functions,
2260+
so C<setlocale> does work as expected.
2261+
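Editorial note: the POSIX side of the per-thread versus global locale split described above can be sketched in plain C, outside of Perl. This is only an illustration, not the code this commit adds. The helper names (`thread_uses_global_locale`, `enter_private_c_locale`, `leave_private_locale`) are hypothetical, and the sketch assumes a glibc-style system where `<locale.h>` declares the POSIX 2008 functions `newlocale`, `uselocale` and `freelocale` (some platforms need an extra header such as `<xlocale.h>`).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* request the POSIX 2008 locale API */
#endif
#include <locale.h>

/* Hypothetical helper: returns 1 if the calling thread currently uses the
 * global locale, 0 if it has a per-thread locale installed.
 * uselocale((locale_t)0) only queries; it changes nothing. */
int thread_uses_global_locale(void)
{
    return uselocale((locale_t)0) == LC_GLOBAL_LOCALE;
}

/* Hypothetical helper: give the calling thread its own "C" locale.
 * While this is in effect, setlocale() still changes only the global
 * locale, so it has no visible effect on this thread.
 * Returns the new locale object, or (locale_t)0 on failure. */
locale_t enter_private_c_locale(void)
{
    locale_t loc = newlocale(LC_ALL_MASK, "C", (locale_t)0);
    if (loc != (locale_t)0)
        uselocale(loc);
    return loc;
}

/* Hypothetical helper: switch the calling thread back to the global
 * locale and release the per-thread locale object. */
void leave_private_locale(locale_t loc)
{
    uselocale(LC_GLOBAL_LOCALE);
    freelocale(loc);
}
```

A thread starts out on the global locale, so `thread_uses_global_locale()` is true until `enter_private_c_locale()` succeeds, and true again after `leave_private_locale()`.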
2262+
If you have loaded the L<C<POSIX>> module you can use the methods given
2263+
in L<perlcall> to call L<C<POSIX::setlocale>|POSIX/setlocale> to safely
2264+
change or query the locale (on systems where it is safe to do so), or
2265+
you can use the new 5.28 function L<perlapi/Perl_setlocale> instead,
2266+
which is a drop-in replacement for the system L<C<setlocale(3)>>, and
2267+
handles single-threaded and multi-threaded applications transparently.
2268+
2269+
There are some locale-related library calls that still aren't
2270+
thread-safe because they return data in a buffer global to all threads.
2271+
In the past, these didn't matter as locales weren't thread-safe at all.
2272+
But now you have to be aware of them in case your module is called in a
2273+
multi-threaded application. The known ones are
2274+
2275+
asctime()
2276+
ctime()
2277+
gcvt() [POSIX.1-2001 only (function removed in POSIX.1-2008)]
2278+
getdate()
2279+
wcrtomb() if its final argument is NULL
2280+
wcsrtombs() if its final argument is NULL
2281+
wcstombs()
2282+
wctomb()
2283+
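Editorial note: for the `wcrtomb()` entry in the list above, the unsafe case is the one where the final argument is NULL, because the C library then falls back to a conversion state shared by all callers. A minimal sketch of the safe form, with a caller-owned state, follows. The function name `wide_char_to_bytes` is hypothetical and is not part of this commit.

```c
#include <limits.h>   /* MB_LEN_MAX */
#include <string.h>   /* memset */
#include <wchar.h>    /* wcrtomb, mbstate_t */

/* Hypothetical helper: convert one wide character to its multibyte form
 * using a conversion state local to this call, so no state is shared
 * between threads. Returns the number of bytes written to out, or
 * (size_t)-1 if wc cannot be represented in the current locale. */
size_t wide_char_to_bytes(wchar_t wc, char out[MB_LEN_MAX])
{
    mbstate_t state;
    memset(&state, 0, sizeof state);   /* zeroed state is the initial state */
    return wcrtomb(out, wc, &state);   /* non-NULL final argument */
}
```

The result still depends on the locale in effect for the calling thread; only the shared-state hazard is avoided.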
2284+
Some of these shouldn't really be called in a Perl application, and for
2285+
others there are thread-safe versions of these already implemented:
2286+
2287+
asctime_r()
2288+
ctime_r()
2289+
Perl_langinfo()
2290+
2291+
The C<_r> forms are automatically used, starting in Perl 5.28, if you
2292+
compile your code with
2293+
2294+
#define PERL_REENTRANT
2295+
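Editorial note: independent of Perl's `PERL_REENTRANT` remapping, the `_r` forms can be called directly from C. The sketch below shows why they are safer: the result goes into a buffer supplied by the caller instead of one shared by all threads. The function name `format_utc` is hypothetical, and the sketch assumes a POSIX system that provides `gmtime_r` and `asctime_r`.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* request gmtime_r() and asctime_r() */
#endif
#include <string.h>
#include <time.h>

/* Hypothetical helper: format a timestamp as UTC text in the fixed
 * asctime layout. buf is owned by the caller and must hold at least
 * 26 bytes. Returns buf on success, NULL on failure. */
char *format_utc(time_t when, char buf[26])
{
    struct tm tm_utc;

    if (gmtime_r(&when, &tm_utc) == NULL)   /* fills caller-owned struct */
        return NULL;
    return asctime_r(&tm_utc, buf);         /* writes into caller-owned buf */
}
```

For example, `format_utc((time_t)0, buf)` fills `buf` with `"Thu Jan  1 00:00:00 1970\n"`.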
2296+
See also L<perlapi/Perl_langinfo>.
2297+
You can use the methods given in L<perlcall> to get the best available
2298+
locale-safe versions of these
2299+
2300+
POSIX::localeconv()
2301+
POSIX::wcstombs()
2302+
POSIX::wctomb()
2303+
2304+
And note that some items returned by C<localeconv> are available
2305+
through L<perlapi/Perl_langinfo>.
2306+
2307+
The others shouldn't be used in a threaded application.
2308+
2309+
Some modules may call a non-perl library that is locale-aware. This is
2310+
fine as long as it doesn't try to query or change the locale using the
2311+
system C<setlocale>. But if these do call the system C<setlocale>,
2312+
those calls may be ineffective. Instead,
2313+
L<C<Perl_setlocale>|perlapi/Perl_setlocale> works in all circumstances.
2314+
Plain setlocale is ineffective on multi-threaded POSIX 2008 systems. It
2315+
operates only on the global locale, whereas each thread has its own
2316+
locale, paying no attention to the global one. Since converting
2317+
these non-Perl libraries to C<Perl_setlocale> is out of the question,
2318+
there is a new function in v5.28
2319+
C<switch_to_global_locale> that will
2320+
switch the thread it is called from so that any system C<setlocale>
2321+
calls will have their desired effect. The function
2322+
L<C<sync_locale>|perlapi/sync_locale> must be called before returning to
2323+
perl.
2324+
2325+
This thread can change the locale all it wants and it won't affect any
2326+
other thread, except any that also have been switched to the global
2327+
locale. This means that a multi-threaded application can have a single
2328+
thread using an alien library without a problem; but no more than a
2329+
single thread can be so occupied; otherwise bad results are likely.
2330+
2331+
In perls without multi-thread locale support, some alien libraries,
2332+
such as C<Gtk>, change locales. This can cause problems for the Perl
2333+
core and other modules. For these, before control is returned to
2334+
perl, starting in v5.20.1, calling the function
22372335
L<sync_locale()|perlapi/sync_locale> from XS should be sufficient to
22382336
avoid most of these problems. Prior to this, you need a pure Perl
22392337
statement that does this:

dist/threads/lib/threads.pm

Lines changed: 28 additions & 1 deletion
@@ -5,7 +5,7 @@ use 5.008;
55
use strict;
66
use warnings;
77

8-
our $VERSION = '2.21'; # remember to update version in POD!
8+
our $VERSION = '2.22'; # remember to update version in POD!
99
my $XS_VERSION = $VERSION;
1010
$VERSION = eval $VERSION;
1111

@@ -937,6 +937,33 @@ C<chdir()>) will affect all the threads in the application.
937937
On MSWin32, each thread maintains its own current working directory
938938
setting.
939939
940+
=item Locales
941+
942+
Prior to Perl 5.28, locales could not be used with threads, due to various
943+
race conditions. Starting in that release, on systems that implement
944+
thread-safe locale functions, locales and threads can be used together, with some caveats.
945+
This includes Windows starting with Visual Studio 2005, and systems compatible
946+
with POSIX 2008. See L<perllocale/Multi-threaded operation>.
947+
948+
Each thread (except the main thread) is started using the C locale. The main
949+
thread is started like all other Perl programs; see L<perllocale/ENVIRONMENT>.
950+
You can switch locales in any thread as often as you like.
951+
952+
If you want to inherit the parent thread's locale, you can, in the parent, set
953+
a variable like so:
954+
955+
$foo = POSIX::setlocale(LC_ALL);
956+
957+
and then pass to threads->create() a sub that closes over C<$foo>. Then, in
958+
the child, you say
959+
960+
POSIX::setlocale(LC_ALL, $foo);
961+
962+
Or you can use the facilities in L<threads::shared> to pass C<$foo>;
963+
or if the environment hasn't changed, in the child, do
964+
965+
POSIX::setlocale(LC_ALL, "");
966+
940967
=item Environment variables
941968
942969
Currently, on all platforms except MSWin32, all I<system> calls (e.g., using

dist/threads/threads.xs

Lines changed: 4 additions & 0 deletions
@@ -580,6 +580,8 @@ S_ithread_run(void * arg)
580580
S_set_sigmask(&thread->initial_sigmask);
581581
#endif
582582

583+
thread_locale_init();
584+
583585
PL_perl_destruct_level = 2;
584586

585587
{
@@ -665,6 +667,8 @@ S_ithread_run(void * arg)
665667
MUTEX_UNLOCK(&thread->mutex);
666668
MUTEX_UNLOCK(&MY_POOL.create_destruct_mutex);
667669

670+
thread_locale_term();
671+
668672
/* Exit application if required */
669673
if (exit_app) {
670674
(void)S_jmpenv_run(aTHX_ 2, thread, NULL, &exit_app, &exit_code);

embed.fnc

Lines changed: 8 additions & 0 deletions
@@ -1310,6 +1310,8 @@ Xp |void |set_numeric_underlying
13101310
Xp |void |set_numeric_standard
13111311
Xp |bool |_is_in_locale_category|const bool compiling|const int category
13121312
Apd |void |sync_locale
1313+
ApMn |void |thread_locale_init
1314+
ApMn |void |thread_locale_term
13131315
ApdO |void |require_pv |NN const char* pv
13141316
Apd |void |pack_cat |NN SV *cat|NN const char *pat|NN const char *patend \
13151317
|NN SV **beglist|NN SV **endlist|NN SV ***next_in_list|U32 flags
@@ -2796,6 +2798,12 @@ s |void |new_collate |NULLOK const char* newcoll
27962798
s |void |new_ctype |NN const char* newctype
27972799
s |void |set_numeric_radix|const bool use_locale
27982800
s |void |new_numeric |NULLOK const char* newnum
2801+
# ifdef USE_POSIX_2008_LOCALE
2802+
sn |const char*|emulate_setlocale|const int category \
2803+
|NULLOK const char* locale \
2804+
|unsigned int index \
2805+
|const bool is_index_valid
2806+
# endif
27992807
# ifdef WIN32
28002808
s |char* |win32_setlocale|int category|NULLOK const char* locale
28012809
# endif

embed.h

Lines changed: 5 additions & 0 deletions
@@ -717,6 +717,8 @@
717717
#define sync_locale() Perl_sync_locale(aTHX)
718718
#define taint_env() Perl_taint_env(aTHX)
719719
#define taint_proper(a,b) Perl_taint_proper(aTHX_ a,b)
720+
#define thread_locale_init Perl_thread_locale_init
721+
#define thread_locale_term Perl_thread_locale_term
720722
#define to_uni_lower(a,b,c) Perl_to_uni_lower(aTHX_ a,b,c)
721723
#define to_uni_lower_lc(a) Perl_to_uni_lower_lc(aTHX_ a)
722724
#define to_uni_title(a,b,c) Perl_to_uni_title(aTHX_ a,b,c)
@@ -1635,6 +1637,9 @@
16351637
#define new_numeric(a) S_new_numeric(aTHX_ a)
16361638
#define set_numeric_radix(a) S_set_numeric_radix(aTHX_ a)
16371639
#define stdize_locale(a) S_stdize_locale(aTHX_ a)
1640+
# if defined(USE_POSIX_2008_LOCALE)
1641+
#define emulate_setlocale S_emulate_setlocale
1642+
# endif
16381643
# if defined(WIN32)
16391644
#define win32_setlocale(a,b) S_win32_setlocale(aTHX_ a,b)
16401645
# endif

embedvar.h

Lines changed: 1 addition & 0 deletions
@@ -106,6 +106,7 @@
106106
#define PL_cryptseen (vTHX->Icryptseen)
107107
#define PL_curcop (vTHX->Icurcop)
108108
#define PL_curcopdb (vTHX->Icurcopdb)
109+
#define PL_curlocales (vTHX->Icurlocales)
109110
#define PL_curpad (vTHX->Icurpad)
110111
#define PL_curpm (vTHX->Icurpm)
111112
#define PL_curpm_under (vTHX->Icurpm_under)

ext/POSIX/lib/POSIX.pod

Lines changed: 1 addition & 0 deletions
@@ -939,6 +939,7 @@ containing the current underlying locale's formatting values. Users of this fun
939939
should also read L<perllocale>, which provides a comprehensive
940940
discussion of Perl locale handling, including
941941
L<a section devoted to this function|perllocale/The localeconv function>.
942+
Prior to Perl 5.28, or when operating in a non thread-safe environment,
942943
it should not be used in a threaded application unless it's certain that
943944
the underlying locale is C or POSIX. This is because it otherwise
944945
changes the locale, which globally affects all threads simultaneously.

intrpvar.h

Lines changed: 8 additions & 0 deletions
@@ -576,7 +576,15 @@ PERLVAR(I, constpadix, PADOFFSET) /* lowest unused for constants */
576576

577577
PERLVAR(I, padix_floor, PADOFFSET) /* how low may inner block reset padix */
578578

579+
#if defined(USE_POSIX_2008_LOCALE) \
580+
&& defined(USE_THREAD_SAFE_LOCALE) \
581+
&& ! defined(HAS_QUERYLOCALE)
582+
583+
PERLVARA(I, curlocales, 12, char *)
584+
585+
#endif
579586
#ifdef USE_LOCALE_COLLATE
587+
580588
PERLVAR(I, collation_name, char *) /* Name of current collation */
581589
PERLVAR(I, collxfrm_base, Size_t) /* Basic overhead in *xfrm() */
582590
PERLVARI(I, collxfrm_mult,Size_t, 2) /* Expansion factor in *xfrm() */
