82 changes: 64 additions & 18 deletions Doc/using/cmdline.rst
@@ -806,23 +806,23 @@ conflict.

.. envvar:: PYTHONCOERCECLOCALE

If set to the value ``0``, causes the main Python command line application
to skip coercing the legacy ASCII-based C and POSIX locales to a more
capable UTF-8 based alternative.

If this variable is *not* set (or is set to a value other than ``0``), the
``LC_ALL`` locale override environment variable is also not set, and the
current locale reported for the ``LC_CTYPE`` category is either the default
``C`` locale, or else the explicitly ASCII-based ``POSIX`` locale, then the
Python CLI will attempt to configure the following locales for the
``LC_CTYPE`` category in the order listed before loading the interpreter
runtime:
If set to the value ``0`` (encoded as ASCII bytes), causes the main Python
command line application to skip coercing the legacy ASCII-based C and
POSIX locales to a more capable UTF-8 based alternative. Locale coercion
is also skipped if the ``LC_ALL`` locale override environment variable is
set (as setting ``LC_CTYPE`` will have no effect in that case).

Otherwise, when the current locale reported for the ``LC_CTYPE`` category is
either the default ``C`` locale, or else the explicitly ASCII-based ``POSIX``
locale, then the Python CLI will attempt to configure the following locales
for the ``LC_CTYPE`` category in the order listed before beginning to
initialise the interpreter runtime:

* ``C.UTF-8``
* ``C.utf8``
* ``UTF-8``

If setting one of these locale categories succeeds, then the ``LC_CTYPE``
If enabling one of these locale categories succeeds, then the ``LC_CTYPE``
environment variable will also be set accordingly in the current process
environment before the Python runtime is initialized. This ensures that in
addition to being seen by both the interpreter itself and other locale-aware
@@ -842,18 +842,64 @@ conflict.
For debugging purposes, setting ``PYTHONCOERCECLOCALE=warn`` will cause
Python to emit warning messages on ``stderr`` if either the locale coercion
activates, or else if a locale that *would* have triggered coercion is
still active when the Python runtime is initialized.
still active when the Python runtime is initialized. (Note: as with the
locale coercion disabling marker, both the environment variable name and
the ``warn`` value must be encoded as ASCII text in order to have the
described effect.)

Also note that even when locale coercion is disabled, or when it fails to
find a suitable target locale, :envvar:`PYTHONUTF8` will still activate by
default in legacy ASCII-based locales. Both features must be disabled in
order to force the interpreter to use ``ASCII`` instead of ``UTF-8`` for
system interfaces.
find a suitable target locale, UTF-8 mode (as described under
:envvar:`PYTHONUTF8`) will still activate by default in these legacy
ASCII-based locales. Both features must be explicitly disabled in order to
force the interpreter to use ``ASCII`` instead of ``UTF-8`` for system
interfaces.

.. availability:: \*nix.

.. note::

While UTF-8 mode is able to handle many of the issues that otherwise
arise, running in the legacy C locale is `not formally supported`_, as it
almost inevitably leads to text encoding and decoding problems at either
operating system interfaces or else between the interpreter runtime and
locale-aware extension modules. As such, the ability to disable locale
coercion or emit warnings when it occurs is provided solely as a debugging
aid when behavioural differences are encountered across different platforms
(where some platforms may not define a suitable coercion target), or
embedding applications (where some applications, including older versions
of CPython, will manage the locale differently from the way the CPython
3.7+ CLI does).

.. _not formally supported: https://www.python.org/dev/peps/pep-0011/#legacy-c-locale

.. note::

As a locale configuration variable, ``PYTHONCOERCECLOCALE`` is processed
in a way that is similar to the C level locale configuration variables
(``LANG``, ``LC_CTYPE``, and ``LC_ALL``): to have any effect, both the
key and the value must be encoded as ASCII bytes, and the setting is
processed even when the :option:`-E` or :option:`-I` option is passed on the
command line.

.. versionadded:: 3.7
See :pep:`538` for more details.
See :pep:`538` for more details. Note that the 3.7 implementation differs
from the PEP in a few key aspects due to some unintended interactions
with the :pep:`540` implementation: 1) ``LC_ALL=C`` must be used to
disable locale coercion when the :option:`-E` or :option:`-I` option is
specified (``PYTHONCOERCECLOCALE=0`` won't work); 2) calling
:c:func:`Py_Initialize` in an embedding application will trigger locale
coercion when running in the C locale; 3) calling :c:func:`Py_Main` in
an embedding application will trigger locale coercion when running in the
C locale.

.. versionchanged:: 3.8
Locale coercion is now triggered by the POSIX locale, even on platforms
where that isn't a simple alias for the C locale.
The locale coercion implementation has also been brought back into line
with the approach described in :pep:`538`: calling :c:func:`Py_Initialize`
and :c:func:`Py_Main` will never implicitly trigger locale coercion, and
``PYTHONCOERCECLOCALE=0`` and ``PYTHONCOERCECLOCALE=warn`` take effect
even when the :option:`-E` or :option:`-I` option is specified.


.. envvar:: PYTHONDEVMODE
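The decision rule described by the revised ``PYTHONCOERCECLOCALE`` text can be sketched in Python. This is only an illustrative model of the documented behaviour, not the C implementation in this PR: the function name `pick_coercion_target` and its parameters are hypothetical, the set of locales available on the platform is passed in rather than probed with `setlocale`, and an empty `LC_ALL` is assumed to count as unset.

```python
# Illustrative model only: pick_coercion_target and its parameters are
# hypothetical names, not part of CPython.
TARGET_LOCALES = ("C.UTF-8", "C.utf8", "UTF-8")  # order listed in the docs
LEGACY_LOCALES = ("C", "POSIX")


def pick_coercion_target(env, current_ctype, available):
    """Return the locale LC_CTYPE would be coerced to, or None.

    env: mapping of environment variables
    current_ctype: locale currently reported for the LC_CTYPE category
    available: collection of locale names the platform can configure
    """
    # PYTHONCOERCECLOCALE=0 disables coercion outright.
    if env.get("PYTHONCOERCECLOCALE") == "0":
        return None
    # A non-empty LC_ALL overrides LC_CTYPE, so coercion would be pointless.
    # (Assumption: an empty value is treated the same as unset.)
    if env.get("LC_ALL"):
        return None
    # Only the legacy ASCII-based locales trigger coercion.
    if current_ctype not in LEGACY_LOCALES:
        return None
    # Try the candidate targets in the documented order.
    for target in TARGET_LOCALES:
        if target in available:
            return target
    return None


if __name__ == "__main__":
    print(pick_coercion_target({}, "C", {"C.UTF-8", "UTF-8"}))
    print(pick_coercion_target({"LC_ALL": "C"}, "C", {"C.UTF-8"}))
```

Passing the available locales in explicitly keeps the sketch deterministic. The real CLI finds out whether a target works by attempting to configure it.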
5 changes: 2 additions & 3 deletions Include/coreconfig.h
@@ -63,8 +63,7 @@ typedef struct {
int show_alloc_count; /* -X showalloccount */
int dump_refs; /* PYTHONDUMPREFS */
int malloc_stats; /* PYTHONMALLOCSTATS */
int coerce_c_locale; /* PYTHONCOERCECLOCALE, -1 means unknown */
int coerce_c_locale_warn; /* PYTHONCOERCECLOCALE=warn */
int warn_on_c_locale; /* PYTHONCOERCECLOCALE=warn */

/* Python filesystem encoding and error handler:
sys.getfilesystemencoding() and sys.getfilesystemencodeerrors().
@@ -316,7 +315,7 @@ typedef struct {
.use_hash_seed = -1, \
.faulthandler = -1, \
.tracemalloc = -1, \
.coerce_c_locale = -1, \
.warn_on_c_locale = -1, \
.utf8_mode = -1, \
.argc = -1, \
.nmodule_search_path = -1, \
5 changes: 4 additions & 1 deletion Include/cpython/pylifecycle.h
@@ -90,10 +90,13 @@ PyAPI_FUNC(int) _PyOS_URandom(void *buffer, Py_ssize_t size);
PyAPI_FUNC(int) _PyOS_URandomNonblock(void *buffer, Py_ssize_t size);

/* Legacy locale support */
PyAPI_FUNC(void) _Py_CoerceLegacyLocale(int warn);
PyAPI_FUNC(int) _Py_LegacyLocaleDetected(void);
PyAPI_FUNC(int) _Py_LegacyLocaleCoercionEnabled(int argc, char **argv);
PyAPI_FUNC(int) _Py_CoerceLegacyLocale(const char **coercion_target,
const char **coercion_warning);
PyAPI_FUNC(char *) _Py_SetLocaleFromEnv(int category);


#ifdef __cplusplus
}
#endif
74 changes: 52 additions & 22 deletions Lib/test/test_c_locale_coercion.py
@@ -16,7 +16,7 @@
)

# Set the list of ways we expect to be able to ask for the "C" locale
EXPECTED_C_LOCALE_EQUIVALENTS = ["C", "invalid.ascii"]
EXPECTED_C_LOCALE_EQUIVALENTS = ["C", "invalid.ascii", "POSIX"]

# Set our expectation for the default encoding used in the C locale
# for the filesystem encoding and the standard streams
@@ -34,12 +34,6 @@
# Android defaults to using UTF-8 for all system interfaces
EXPECTED_C_LOCALE_STREAM_ENCODING = "utf-8"
EXPECTED_C_LOCALE_FS_ENCODING = "utf-8"
else:
# Linux distros typically alias the POSIX locale directly to the C
# locale.
# TODO: Once https://bugs.python.org/issue30672 is addressed, we'll be
# able to check this case unconditionally
EXPECTED_C_LOCALE_EQUIVALENTS.append("POSIX")
elif sys.platform.startswith("aix"):
# AIX uses iso8859-1 in the C locale, other *nix platforms use ASCII
EXPECTED_C_LOCALE_STREAM_ENCODING = "iso8859-1"
@@ -56,13 +50,14 @@
# Note that the above expectations are still wrong in some cases, such as:
# * Windows when PYTHONLEGACYWINDOWSFSENCODING is set
# * Any platform other than AIX that uses latin-1 in the C locale
# * Any Linux distro where POSIX isn't a simple alias for the C locale
# * Any Linux distro where the default locale is something other than "C"
#
# Options for dealing with this:
# * Don't set the PY_COERCE_C_LOCALE preprocessor definition on
# such platforms (e.g. it isn't set on Windows)
# * Fix the test expectations to match the actual platform behaviour
# * Change the tests to be simply "Don't crash" tests when run on systems
# where we're less certain about the expected locale coercion behaviour

# In order to get the warning messages to match up as expected, the candidate
# order here must match the target locale order in Python/pylifecycle.c
@@ -89,7 +84,7 @@ def _set_locale_in_subprocess(locale_name):
# If there's no valid CODESET, we expect coercion to be skipped
cmd_fmt += "; import sys; sys.exit(not locale.nl_langinfo(locale.CODESET))"
cmd = cmd_fmt.format(locale_name)
result, py_cmd = run_python_until_end("-c", cmd, PYTHONCOERCECLOCALE='')
result, py_cmd = run_python_until_end("-c", cmd, __isolated=True)
return result.rc == 0


@@ -142,7 +137,7 @@ def _handle_output_variations(data):
return data

@classmethod
def get_child_details(cls, env_vars):
def get_child_details(cls, env_vars, isolated):
"""Retrieves fsencoding and standard stream details from a child process

Returns (encoding_details, stderr_lines):
@@ -155,7 +150,7 @@ def get_child_details(cls, env_vars):
"""
result, py_cmd = run_python_until_end(
"-X", "utf8=0", "-c", cls.CHILD_PROCESS_SCRIPT,
**env_vars
__isolated=isolated, **env_vars
)
if not result.rc == 0:
result.fail(py_cmd)
@@ -169,15 +164,15 @@ def get_child_details(cls, env_vars):

# Details of the shared library warning emitted at runtime
LEGACY_LOCALE_WARNING = (
"Python runtime initialized with LC_CTYPE=C (a locale with default ASCII "
"encoding), which may cause Unicode compatibility problems. Using C.UTF-8, "
"Python runtime initialized with a legacy locale that uses a default ASCII "
"encoding, which may cause Unicode compatibility problems. Using C.UTF-8, "
"C.utf8, or UTF-8 (if available) as alternative Unicode-compatible "
"locales is recommended."
)

# Details of the CLI locale coercion warning emitted at runtime
CLI_COERCION_WARNING_FMT = (
"Python detected LC_CTYPE=C: LC_CTYPE coerced to {} (set another locale "
"Python detected a legacy locale: LC_CTYPE coerced to {} (set LC_ALL "
"or PYTHONCOERCECLOCALE=0 to disable this locale coercion behavior)."
)

@@ -224,15 +219,16 @@ def _check_child_encoding_details(self,
expected_fs_encoding,
expected_stream_encoding,
expected_warnings,
coercion_expected):
coercion_expected,
isolated):
"""Check the C locale handling for the given process environment

Parameters:
expected_fs_encoding: expected sys.getfilesystemencoding() result
expected_stream_encoding: expected encoding for standard streams
expected_warnings: stderr output to expect (if any)
"""
result = EncodingDetails.get_child_details(env_vars)
result = EncodingDetails.get_child_details(env_vars, isolated=isolated)
encoding_details, stderr_lines = result
expected_details = EncodingDetails.get_expected_details(
coercion_expected,
@@ -269,7 +265,6 @@ def test_external_target_locale_configuration(self):
"LANG": "",
"LC_CTYPE": "",
"LC_ALL": "",
"PYTHONCOERCECLOCALE": "",
}
for env_var in ("LANG", "LC_CTYPE"):
for locale_to_set in AVAILABLE_TARGETS:
@@ -287,7 +282,14 @@ def test_external_target_locale_configuration(self):
expected_fs_encoding,
expected_stream_encoding,
expected_warnings=None,
coercion_expected=False)
coercion_expected=False,
isolated=False)
self._check_child_encoding_details(var_dict,
expected_fs_encoding,
expected_stream_encoding,
expected_warnings=None,
coercion_expected=False,
isolated=True)



@@ -302,6 +304,7 @@ def _check_c_locale_coercion(self,
coerce_c_locale,
expected_warnings=None,
coercion_expected=True,
isolated=False,
**extra_vars):
"""Check the C locale handling for various configurations

@@ -354,7 +357,8 @@ def _check_c_locale_coercion(self,
fs_encoding,
stream_encoding,
_expected_warnings,
_coercion_expected)
_coercion_expected,
isolated)

# Check behaviour for explicitly configured locales
for locale_to_set in EXPECTED_C_LOCALE_EQUIVALENTS:
@@ -369,7 +373,8 @@ def _check_c_locale_coercion(self,
fs_encoding,
stream_encoding,
expected_warnings,
coercion_expected)
coercion_expected,
isolated)

def test_PYTHONCOERCECLOCALE_not_set(self):
# This should coerce to the first available target locale by default
@@ -387,6 +392,18 @@ def test_PYTHONCOERCECLOCALE_set_to_warn(self):
coerce_c_locale="warn",
expected_warnings=[CLI_COERCION_WARNING])

def test_PYTHONCOERCECLOCALE_set_to_warn_when_isolated(self):
# PYTHONCOERCECLOCALE=warn should be ignored by the -I switch
self._check_c_locale_coercion("utf-8", "utf-8",
coerce_c_locale="warn",
isolated=True)
# Setting LC_ALL=C should still render the locale coercion ineffective
self._check_c_locale_coercion(EXPECTED_C_LOCALE_FS_ENCODING,
EXPECTED_C_LOCALE_STREAM_ENCODING,
coerce_c_locale="warn",
LC_ALL="C",
isolated=True,
coercion_expected=False)

def test_PYTHONCOERCECLOCALE_set_to_zero(self):
# The setting "0" should result in the locale coercion being disabled
@@ -401,6 +418,19 @@ def test_PYTHONCOERCECLOCALE_set_to_zero(self):
LC_ALL="C",
coercion_expected=False)

def test_PYTHONCOERCECLOCALE_set_to_zero_when_isolated(self):
# The setting "0" should be ignored by the -I switch
self._check_c_locale_coercion("utf-8", "utf-8",
coerce_c_locale="0",
isolated=True)
# Setting LC_ALL=C should still render the locale coercion ineffective
self._check_c_locale_coercion(EXPECTED_C_LOCALE_FS_ENCODING,
EXPECTED_C_LOCALE_STREAM_ENCODING,
coerce_c_locale="0",
LC_ALL="C",
isolated=True,
coercion_expected=False)

def test_LC_ALL_set_to_C(self):
# Setting LC_ALL should render the locale coercion ineffective
self._check_c_locale_coercion(EXPECTED_C_LOCALE_FS_ENCODING,
@@ -421,13 +451,13 @@ def test_PYTHONCOERCECLOCALE_set_to_one(self):
old_loc = locale.setlocale(locale.LC_CTYPE, None)
self.addCleanup(locale.setlocale, locale.LC_CTYPE, old_loc)
loc = locale.setlocale(locale.LC_CTYPE, "")
if loc == "C":
if loc in ("C", "POSIX"):
self.skipTest("test requires LC_CTYPE locale different than C")
if loc in TARGET_LOCALES:
self.skipTest("coerced LC_CTYPE locale: %s" % loc)

# bpo-35336: PYTHONCOERCECLOCALE=1 must not coerce the LC_CTYPE locale
# if it's not equal to "C"
# if it's not equal to "C" or "POSIX"
code = 'import locale; print(locale.setlocale(locale.LC_CTYPE, None))'
env = dict(os.environ, PYTHONCOERCECLOCALE='1')
cmd = subprocess.run([sys.executable, '-c', code],
13 changes: 11 additions & 2 deletions Lib/test/test_embed.py
@@ -287,8 +287,7 @@ class InitConfigTests(EmbeddingTestsMixin, unittest.TestCase):
'filesystem_errors': GET_DEFAULT_CONFIG,

'utf8_mode': 0,
'coerce_c_locale': 0,
'coerce_c_locale_warn': 0,
'warn_on_c_locale': 0,

'pycache_prefix': None,
'program_name': './_testembed',
@@ -597,6 +596,16 @@ def test_init_isolated(self):
}
self.check_config("init_isolated", config)

class NoImplicitLocaleCoercion(EmbeddingTestsMixin, unittest.TestCase):

def test_no_implicit_locale_coercion_in_Py_Initialize(self):
out, err = self.run_embedded_interpreter("init_api_does_not_coerce_c_locale")
self.assertEqual(err, '')

def test_no_implicit_locale_coercion_in_Py_Main(self):
out, err = self.run_embedded_interpreter("main_api_does_not_coerce_c_locale")
self.assertTrue(out.startswith('Python '))
self.assertEqual(err, '')

if __name__ == "__main__":
unittest.main()
@@ -0,0 +1,7 @@
The locale coercion implementation has been adjusted back to being closer to
the design documented in PEP 538. This means locale coercion is once again
solely the responsibility of the application embedding the CPython runtime
(in CPython's case, the CLI wrapper), with the runtime only providing
support to detect legacy locales, actually apply the changes needed to
coerce the locale to a UTF-8 based one, and to emit warnings when requested
via PYTHONCOERCECLOCALE=warn.
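
To see from the outside which ``LC_CTYPE`` locale the CLI wrapper ends up with, a child interpreter can be launched under a controlled environment. The sketch below reuses the same one-liner that the bpo-35336 test in this PR runs. The helper name `child_lc_ctype` is hypothetical, and the reported locale and any warning text depend on the platform and Python version, so no expected output is shown.

```python
import os
import subprocess
import sys

# Same probe the test suite uses: report the active LC_CTYPE locale.
CODE = "import locale; print(locale.setlocale(locale.LC_CTYPE, None))"


def child_lc_ctype(**overrides):
    """Run a child interpreter; return (reported LC_CTYPE, stderr text).

    child_lc_ctype is a hypothetical helper for illustration only.
    """
    env = dict(os.environ)
    # Drop inherited locale settings so they cannot mask the result.
    for name in ("LANG", "LC_ALL", "LC_CTYPE", "PYTHONCOERCECLOCALE"):
        env.pop(name, None)
    env.update(overrides)
    proc = subprocess.run(
        [sys.executable, "-c", CODE],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.stdout.strip(), proc.stderr.strip()


if __name__ == "__main__":
    # Compare the default behaviour with the two documented opt-outs.
    for extra in ({}, {"PYTHONCOERCECLOCALE": "0"}, {"LC_ALL": "C"}):
        print(extra, child_lc_ctype(LANG="C", **extra))
```

Adding ``PYTHONCOERCECLOCALE="warn"`` to the overrides should surface the coercion or legacy-locale warning on ``stderr`` on platforms where coercion is compiled in.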