
This seems like a weird problem, and it's causing me some heartburn, because I'm using a library that stashes the current locale and then tries to set it back to what it stashed.

    $ docker run --rm -it python:3.6 bash
    root@bcee8785c2e1:/# locale
    LANG=C.UTF-8
    LANGUAGE=
    LC_CTYPE="C.UTF-8"
    LC_NUMERIC="C.UTF-8"
    LC_TIME="C.UTF-8"
    LC_COLLATE="C.UTF-8"
    LC_MONETARY="C.UTF-8"
    LC_MESSAGES="C.UTF-8"
    LC_PAPER="C.UTF-8"
    LC_NAME="C.UTF-8"
    LC_ADDRESS="C.UTF-8"
    LC_TELEPHONE="C.UTF-8"
    LC_MEASUREMENT="C.UTF-8"
    LC_IDENTIFICATION="C.UTF-8"
    LC_ALL=
    root@bcee8785c2e1:/# locale -a
    C
    C.UTF-8
    POSIX
    root@bcee8785c2e1:/# python
    Python 3.6.9 (default, Jul 13 2019, 14:51:44)
    [GCC 8.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import locale
    >>> curr = locale.getlocale()
    >>> curr
    ('en_US', 'UTF-8')
    >>> locale.setlocale(locale.LC_ALL, curr)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python3.6/locale.py", line 598, in setlocale
        return _setlocale(category, locale)
    locale.Error: unsupported locale setting
    >>>

I'm not sure why getlocale is returning en_US? It's not anywhere in my environment vars (and I'm not sure where else it could be in my shell?).

In any case, I can't setlocale with the value from getlocale, which seems weird to me.

Does anyone have any guidance here?

Much appreciated!

  • I don't think C.UTF-8 is a valid locale. C (synonym POSIX) is and is intended to be strictly a byte=char fall back [On phone so can't check] Commented Jan 7, 2020 at 14:07
  • It's a bit more complicated😈 Followup of above comment in answer below Commented Jan 10, 2020 at 7:25

2 Answers


For the first part: does it matter? As far as I know, you never see any difference until you call setlocale(), so we are on to the second part:

You should use:

    import locale
    curr = locale.getdefaultlocale()
    locale.setlocale(locale.LC_ALL, curr)

That is, use getdefaultlocale() and not just getlocale(). I do not fully understand the reason for having both. It is possible that this is a Python bug that fails to recognize C.xxx locales.
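A minimal sketch of stash-and-restore that avoids parsed tuples altogether, assuming the only goal is to put the locale back as it was found. Calling locale.setlocale(category) with no second argument only queries the current setting and returns the raw string used by the C library, and that string can be passed straight back to setlocale(). The helper names stash_locale and restore_locale are hypothetical, chosen for illustration.

```python
import locale


def stash_locale(category=locale.LC_ALL):
    """Return the raw locale string currently set for `category`."""
    # With no second argument, setlocale() only queries; it changes nothing.
    return locale.setlocale(category)


def restore_locale(saved, category=locale.LC_ALL):
    """Restore a string previously returned by stash_locale()."""
    return locale.setlocale(category, saved)


saved = stash_locale()
try:
    # Temporary change for the duration of some work; "C" always exists.
    locale.setlocale(locale.LC_ALL, "C")
finally:
    restore_locale(saved)
```

The saved value is never interpreted by Python here, so it does not matter whether the name is C.UTF-8 or something else the system accepts.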


4 Comments

It's crazy, same result : ('en_US', 'UTF-8') ... and same error...
But it doesn't crash. Note: I'm not sure whether there are testable differences (in the output), or whether C.utf-8 just includes en-US.utf-8. [Python 3.7.3 here, with `LC_ALL='C.utf8' python3 /tmp/b.py`]
Well whether the bug is python, debian, docker or libc is arguable. C.UTF-8 sure is a bug-magnet — see my answer below, particularly the Haskell and Redhat bug-reports.
@Rusi: but your answer is not an answer. C.UTF-8 may be bad, but I looked at the standard and it should be correct, also because the two parts are different: the first is about how a program should prepare things [user dependent], the second about how to display them [terminal dependent]. But the problem here is that the OP used the wrong function: he asked for a locale string which cannot be used to set the locale. That string is an interpretation of the locale (useful for other purposes), but it is not a system locale, so it cannot be used safely. C.UTF-8 is one case, but there are many more [locale semantics are complex].

C.UTF-8 — A recent non-portable debianism

The intention of C.UTF-8 is good, but the implementation is not quite there yet. For now, avoid it until it stabilizes.

Some discussion of context

A Red Hat discussion around including it, which means it's not quite there (at the time of writing, at least). Note in particular that Nick Coghlan, a core Python developer, suggests that Python doesn't get locales right in some contexts like this one.

A haskell discussion showing that portable cross-platform stuff — in this case haskell-stack but by implication also docker — becomes harder and less reliable with C.UTF-8 usage.

The Intention

Debian (also) initiated C.UTF-8 and the intention is correct.

Today's Linux systems are intensively localized: a slew of locales, fine-grained LC_* settings, and so on. But all this is not on by default: if the locale system is broken, the system is broken. The reason a broken locale system is not as drastic in its effects as, say, a broken kernel or fstab or grub is...

The C locale

The C locale (synonym POSIX) is guaranteed to always be available as a fallback if other things break. So for example you won't see localized errors but English — not mojibake or empty rectangles or question-marks!

By and large you get these kinds of warnings, not errors, and otherwise things keep working.

But C = POSIX implies the legacy ASCII not UTF-8 everywhere — an undesired side-effect of legacy.

Towards making that legacy less and less necessary even as a fallback, Debian introduced the always available C.UTF-8 locale.

The catch? It's always available...

Only in Debian

That means recent Debian and recent derivatives like Ubuntu, but not (yet) other systems.
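Because availability differs between systems, a program can probe for a locale at runtime instead of assuming it exists. The following is a rough sketch, assuming that setlocale() raises locale.Error for names the system does not provide (as in the traceback in the question). The function name first_available_locale and the candidate names are illustrative only.

```python
import locale


def first_available_locale(candidates, category=locale.LC_ALL):
    """Return the first name in `candidates` that setlocale() accepts.

    Falls back to "C", which is always available. The locale that was
    active before the call is restored afterwards.
    """
    saved = locale.setlocale(category)  # query only; remember current setting
    try:
        for name in candidates:
            try:
                locale.setlocale(category, name)
                return name
            except locale.Error:
                continue  # not installed on this system; try the next one
        return "C"
    finally:
        locale.setlocale(category, saved)


chosen = first_available_locale(["C.UTF-8", "en_US.UTF-8"])
print("usable locale:", chosen)
```

On the container shown in the question, where `locale -a` lists only C, C.UTF-8 and POSIX, en_US.UTF-8 would be skipped unless it has been installed.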

In short C.UTF-8 is not universal, not portable, fragile and therefore avoidable... at least for now, at least on client-server, virtualized (containerized) etc systems like docker. The....

Practical Upshot

You need to explicitly install old-fashioned locales like en_US.UTF-8. (People wanting a reasonable international English locale and not wanting en_US may wish to check out en_DK.UTF-8).

Yeah that involves some amount of

Getting your hands dirty

Here is a collection of references on docker oriented locale setup

I don't approve of one anti-pattern that repeats in the above, but expanding on it would go too far afield (from this question), so very briefly:

Setting the locale should usually only involve setting LANG. Setting LC_ALL, especially along with LANG, is a no-no.

From Debian wiki

⚠️ WARNING

Using LC_ALL is strongly discouraged as it overrides everything. Please use it only when testing and never set it in a startup file.
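To make "overrides everything" concrete, here is a small sketch of the POSIX lookup order for one category: LC_ALL first, then the category's own variable, then LANG. The function name effective_locale is hypothetical, the final "C" default is an assumption (POSIX leaves it implementation-defined), and the glibc-specific LANGUAGE variable is ignored.

```python
import os


def effective_locale(category, env=None):
    """Resolve the locale name for `category` (e.g. "LC_TIME") from `env`."""
    env = os.environ if env is None else env
    # POSIX order: LC_ALL beats the per-category variable, which beats LANG.
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:  # unset and empty are treated the same way
            return value
    return "C"  # assumed default when nothing is set


env = {"LANG": "en_US.UTF-8", "LC_TIME": "de_DE.UTF-8"}
print(effective_locale("LC_TIME", env))     # per-category setting wins over LANG
print(effective_locale("LC_NUMERIC", env))  # falls through to LANG

env["LC_ALL"] = "C"
print(effective_locale("LC_TIME", env))     # LC_ALL masks both of the above
```

With only LANG set, individual LC_* variables can still refine single categories; once LC_ALL is set, none of them has any effect.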

2 Comments

en_DK is a crazy aberration which would perhaps be defensible if it was available everywhere by default; but, alas, it is not. Also, IIRC, it has less than ideal values for dates and currency. Perhaps see also unix.stackexchange.com/questions/62316/…
@tripleee I've weakened the en_DK reference. Can remove it if you prefer (It's hardly relevant to the q or a!)
