I have found a number of questions on various forums where Mac users report locale errors when they log in to Linux systems over SSH; the errors complain that the setting LC_CTYPE=UTF-8 is invalid.

In more detail, the shell on macOS seems to set this value, and then (if you have the option enabled in Terminal or a similar application) your local LC_* variables get exported to the remote system when you SSH in.

Linux insists that LC_CTYPE needs to be set to a valid locale (sometimes you can fix this by running locale-gen as admin on the Linux system), but UTF-8 is not a locale in the first place.

My primary question is, is this a bug in MacOS? Or is Linux wrong in insisting that the variable needs to be set to a fully specified locale name?

Secondarily, in order to be able to argue which one is correct and why, where is this specified?

Tertiarily, is there something these Mac users (myself included) could or should do differently?

The obvious workaround is to put something like

LC_CTYPE=en_US.UTF-8 

in your .bash_profile, but this obviously only solves it for your personal account, and hardcodes a value which may or may not agree with your other locale settings.

2 Answers

I didn't get into the details of who's "right or wrong" - but was equally annoyed by the issue. Some solutions to this:

  • Server-side:
    • change/disable AcceptEnv LC_* in /etc/ssh/sshd_config
      • cons: the remote session then uses the server's system-default locale
    • edit .profile
      • cons: single user
    • edit /etc/bash* or /etc/profile
      • cons: may be reversed in updates
  • Client-side:
    • alias ssh="LC_CTYPE=\"${LANG}\" ssh" in .bashrc/.profile/wherever
      • cons: single user
    • same as server-side in .bashrc/.profile...
    • change/add settings in Terminal
      • con: entire session, be it local or remote
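
As a sketch of the client-side alias idea, a small wrapper function can do the same job. The name with_lang_ctype is made up for this example. Unlike the double-quoted alias, which expands ${LANG} once when the alias is defined, the function reads LANG each time it runs. If LANG is unset, LC_CTYPE ends up empty, which POSIX treats the same as unset.

```shell
# Hypothetical wrapper (name invented for this sketch): run a command
# with LC_CTYPE copied from LANG, for that one command only.
with_lang_ctype() {
    # ${LANG-} expands to the empty string when LANG is unset.
    LC_CTYPE="${LANG-}" "$@"
}

# Possible use (user@remote-host is a placeholder):
#   with_lang_ctype ssh user@remote-host
```

Like the alias, this only helps the one account that defines it.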

In the end I created mac-locale-fix.sh in /etc/profile.d on the server (Raspbian in my case) with this line in it:

[ "A${LC_CTYPE}" = "AUTF-8" ] && export LC_CTYPE="${LANG}"
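
The same idea can be written a little more defensively. This is only a sketch: the function name fixed_lc_ctype is invented here, and it assumes that falling back to LANG (or leaving LC_CTYPE unset when LANG is empty) is what you want on the server. It sticks to POSIX sh syntax, since files in /etc/profile.d may be read by a shell other than bash.

```shell
# Hypothetical helper: print the value LC_CTYPE should end up with.
#   $1 = current LC_CTYPE, $2 = current LANG
fixed_lc_ctype() {
    if [ "$1" = "UTF-8" ]; then
        # Bare codeset as sent by the Mac client: use LANG instead.
        printf '%s\n' "$2"
    else
        # Anything else is passed through untouched.
        printf '%s\n' "$1"
    fi
}

new_ctype=$(fixed_lc_ctype "${LC_CTYPE-}" "${LANG-}")
if [ -n "$new_ctype" ]; then
    export LC_CTYPE="$new_ctype"
else
    # Nothing sensible to set, so let LANG / the system default apply.
    unset LC_CTYPE
fi
unset new_ctype
```

Splitting the decision into a function makes it easy to check on its own, for example fixed_lc_ctype UTF-8 en_GB.UTF-8 prints en_GB.UTF-8.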

Hope this helps others...

  • Thanks, I'm upvoting this but still looking for authoritative answers. Commented Feb 3, 2021 at 10:25

The basic question is

My primary question is, is this a bug in MacOS? Or is Linux wrong in insisting that the variable needs to be set to a fully specified locale name?

and the POSIX page for environment variables shows the reason why others view the macOS configuration as incorrect:

[XSI] If the locale value has the form:

language[_territory][.codeset] 

it refers to an implementation-provided locale, where settings of language, territory, and codeset are implementation-defined.

LC_COLLATE, LC_CTYPE, LC_MESSAGES, LC_MONETARY, LC_NUMERIC, and LC_TIME are defined to accept an additional field @ modifier, which allows the user to select a specific instance of localization data within a single category (for example, for selecting the dictionary as opposed to the character ordering of data). The syntax for these environment variables is thus defined as:

[language[_territory][.codeset][@modifier]] 

For example, if a user wanted to interact with the system in French, but required to sort German text files, LANG and LC_COLLATE could be defined as:

LANG=Fr_FR LC_COLLATE=De_DE 

This could be extended to select dictionary collation (say) by use of the @ modifier field; for example:

LC_COLLATE=De_DE@dict 

An implementation may support other formats.

If the locale value is not recognized by the implementation, the behavior is unspecified.

That is, they assume that POSIX prescribes a syntax for the locale settings. An unwary reader would assume that POSIX defines the permissible forms for the environment variables such that the codeset is an optional suffix, not something that can stand in for the language. But the closing "An implementation may support other formats" opens up a can of worms, in effect blessing this difference in interpretation. Apple can do whatever it wants if it chooses to provide valid locales which don't follow that pattern exactly.

@tripleee suggested that the page on Locale gives better information, but that is almost entirely a discussion of the locale definitions rather than providing guidance for interoperability (i.e., POSIX's ostensible goal).

Neither page addresses differences in the available locale settings (such as ".utf8" versus ".UTF-8"). Those are implementation-dependent, as noted on the POSIX page. That leaves users with a single option: determine for themselves which locale settings are supported on the local and remote systems and, given how ssh forwards the variables, work out how to set them on the remote system "compatibly".
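
One way to do that check by hand is sketched below. The helper names normalize_locale and locale_available are invented for this example, and the normalization rule (lower-case the codeset and drop hyphens, so that ".UTF-8" and ".utf8" compare equal) is an assumption about how the two spellings relate, not something POSIX specifies. It relies on the standard locale -a command to list what the system offers.

```shell
# Hypothetical helper: rewrite language[_territory].codeset[@modifier]
# so the codeset is lower-case with hyphens removed.
# Names without a codeset are printed unchanged.
normalize_locale() {
    case "$1" in
        *.*) loc_lang=${1%%.*}; rest=${1#*.} ;;
        *)   printf '%s\n' "$1"; return 0 ;;
    esac
    case "$rest" in
        *@*) codeset=${rest%%@*}; modifier="@${rest#*@}" ;;
        *)   codeset=$rest; modifier= ;;
    esac
    codeset=$(printf '%s' "$codeset" | tr '[:upper:]' '[:lower:]' | sed 's/-//g')
    printf '%s.%s%s\n' "$loc_lang" "$codeset" "$modifier"
}

# Hypothetical helper: succeed if the named locale appears in `locale -a`
# on this machine, comparing normalized names.
locale_available() {
    want=$(normalize_locale "$1")
    for have in $(locale -a 2>/dev/null); do
        if [ "$(normalize_locale "$have")" = "$want" ]; then
            return 0
        fi
    done
    return 1
}
```

Running locale_available "$LANG" on both the client and the server shows whether the value that ssh forwards is one the remote side can actually use.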
