Python3.3: .format() with unicode format_spec

Question

I have a datetime object, and my users provide their own format string to format the time the way they like.

One way I found is to use '{:...}'.format(mydatetime).

    import datetime
    import time

    lt = time.localtime(time.time())
    d = datetime.datetime.fromtimestamp(time.mktime(lt))
    print(userString.format(datetime=d))  # userString is supplied by the user

English users may provide '{datetime:%B %d, %Y}', which produces December 24, 2013.

Chinese users may provide '{datetime:%Y年%m月%d日}' (in YYYYMMDD format, 年=Year, 月=Month, 日=Day).

But when executing '{datetime:%Y年%m月%d日}'.format(datetime=d), Python raises UnicodeEncodeError: 'locale' codec can't encode character '\u5e74' in position 2: Illegal byte sequence

I know of a workaround: I can tell my Chinese users to give a format string like '{datetime:%Y}年{datetime:%m}月{datetime:%d}日'. But can't Unicode characters appear in a format_spec? How can I solve this problem?

I'm using Windows.

Thanks
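The workaround mentioned in the question can be automated so users keep writing '{datetime:%Y年%m月%d日}'. A minimal sketch, assuming only single-letter % directives (plus %%) are used: the hypothetical helper strftime_literal_safe sends each directive to strftime on its own and copies every other character through untouched, so non-ASCII text never reaches the C runtime. The hypothetical SafeDatetime subclass plugs this into str.format. Platform-specific flags such as %#d (Windows) or %-d (glibc) are not handled.

```python
import datetime
import re

# One strftime directive: '%' followed by a single letter, or '%%'.
# Assumption: no platform-specific flags such as '%#d' or '%-d' appear.
_DIRECTIVE = re.compile(r"%[A-Za-z%]")


def strftime_literal_safe(dt, fmt):
    """Hypothetical helper: pass only the %-directives to strftime and keep
    all other characters (for example 年, 月, 日) exactly as written."""
    return _DIRECTIVE.sub(lambda m: dt.strftime(m.group(0)), fmt)


class SafeDatetime(datetime.datetime):
    """Hypothetical datetime subclass whose format spec goes through the helper."""

    def __format__(self, spec):
        # Mirror datetime.__format__: an empty spec falls back to str().
        return strftime_literal_safe(self, spec) if spec else str(self)


d = SafeDatetime(2013, 12, 24)
result = "{datetime:%Y年%m月%d日}".format(datetime=d)  # '2013年12月24日'
```

The trade-off is one strftime call per directive instead of one per format string, which is negligible for user-facing timestamps.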

What is the output of import sys; sys.getdefaultencoding()? – Simeon Visser, Commented Dec 24, 2013 at 14:27
In Python 3, sys.getdefaultencoding() is always UTF-8. Use locale.getlocale() to get the current locale for the LC_CTYPE category, which is what wcstombs uses. – Eryk Sun, Commented Dec 24, 2013 at 15:00
@eryksun I'm new to Python. Do you mean locale.getlocale(locale.LC_CTYPE)? It returns (None, None). – user746461, Commented Dec 25, 2013 at 2:32
Yes, I use Windows. After locale.setlocale(locale.LC_CTYPE, 'chinese'), '{datetime:%Y年%m月%d日}' runs well. Besides, I can also put Japanese characters in format_spec. Thank you so much! – user746461, Commented Dec 25, 2013 at 5:15

Eryk Sun · Accepted Answer · 2013-12-25 07:22:50Z

datetime.__format__ calls datetime.strftime, which does some preprocessing and then calls time.strftime (CPython 3.3.3 source).
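The delegation can be checked directly: for a datetime, a non-empty format spec in str.format produces the same text as calling strftime with that spec, and an empty spec falls back to str(). A small sketch with a fixed date, chosen only so the result is reproducible:

```python
import datetime

d = datetime.datetime(2013, 12, 24, 14, 27)
spec = "%Y-%m-%d %H:%M"

# '{name:spec}'.format(...) calls d.__format__(spec),
# which for datetime objects delegates to d.strftime(spec).
via_format = "{datetime:%Y-%m-%d %H:%M}".format(datetime=d)
via_builtin = format(d, spec)
via_strftime = d.strftime(spec)

assert via_format == via_builtin == via_strftime == "2013-12-24 14:27"
# An empty format spec does not call strftime; it returns str(d).
assert format(d, "") == str(d) == "2013-12-24 14:27:00"
```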

On Windows, time.strftime uses the C runtime's multibyte-string function strftime instead of the wide-character string function wcsftime. First it has to encode the format string according to the current locale by calling PyUnicode_EncodeLocale. This in turn calls the CRT function wcstombs (MSDN), which uses the currently configured locale for the LC_CTYPE category. If the process is currently using the default "C" locale, wcstombs converts Latin-1 (codes < 256) directly to bytes, and anything else is an EILSEQ error, i.e. "Illegal byte sequence".
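The "C"-locale rule described above can be modelled in pure Python to see which character of a format string would trip it. This is a sketch, not the CRT itself: it assumes exactly the behaviour stated above (code points below 256 pass through, anything else fails), which is the same mapping as the Latin-1 codec. Both helpers are hypothetical; for the question's format string they report position 2, the same position as in the reported error.

```python
def first_unencodable_in_c_locale(fmt):
    """Hypothetical helper: index of the first character the 'C' locale
    conversion described above would reject, or None if every one passes.
    Models the rule 'code points < 256 map directly to bytes'."""
    for index, char in enumerate(fmt):
        if ord(char) >= 256:
            return index
    return None


def latin1_error_position(fmt):
    """Same check via the latin-1 codec, which maps code points 0-255 to bytes."""
    try:
        fmt.encode("latin-1")
    except UnicodeEncodeError as exc:
        return exc.start
    return None


print(first_unencodable_in_c_locale("%B %d, %Y"))     # None
print(first_unencodable_in_c_locale("%Y年%m月%d日"))  # 2
```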

Use the locale module to set a new locale. The actual locale names vary by platform, but with Microsoft's setlocale you should be able to set just a language string and get the default codepage for that language. Generally a library shouldn't change the locale; an application should configure it at startup. For example:

    >>> import datetime, locale
    >>> oldlocale = locale.setlocale(locale.LC_CTYPE, None)
    >>> oldlocale
    'C'
    >>> newlocale = locale.setlocale(locale.LC_CTYPE, 'chinese')
    >>> d = datetime.datetime.now()
    >>> '{datetime:%Y\u5e74%m\u6708%d\u65e5}'.format(datetime=d)
    '2013\u5e7412\u670825\u65e5'
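The session saves oldlocale but never puts it back. When a temporary switch is unavoidable (for example in a script or a test), a context manager can restore the previous setting even if an exception is raised. A minimal sketch; temporary_locale is a hypothetical name, and because the locale is process-wide this is not thread-safe, so setting the locale once at startup remains the better choice for an application. 'chinese' is the Windows name used in the answer; other platforms typically use names like 'zh_CN.UTF-8', if installed.

```python
import contextlib
import locale


@contextlib.contextmanager
def temporary_locale(category, name):
    """Hypothetical helper: switch one locale category inside a with-block,
    then restore the previous setting. Not thread-safe (process-wide state)."""
    saved = locale.setlocale(category)  # query only; nothing is changed
    try:
        yield locale.setlocale(category, name)  # returns the new setting
    finally:
        locale.setlocale(category, saved)


if __name__ == "__main__":
    import datetime

    d = datetime.datetime(2013, 12, 25)
    try:
        with temporary_locale(locale.LC_CTYPE, "chinese"):
            print("{datetime:%Y年%m月%d日}".format(datetime=d))
    except locale.Error:
        print("locale 'chinese' is not available on this platform")
```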

If you want the formatted time to use locale-specific names (e.g. month and day), then also set the LC_TIME category:

    >>> newlocale = locale.setlocale(locale.LC_TIME, 'chinese')
    >>> '{datetime:%B %d, %Y}'.format(datetime=d)
    '\u5341\u4e8c\u6708 25, 2013'
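Conversely, because %B follows LC_TIME, pinning that category to "C" (the one locale name every platform accepts) gives English month names whatever the user's locale is, which is the output the question expects from '{datetime:%B %d, %Y}'. A short sketch with a fixed date that restores the previous LC_TIME setting afterwards:

```python
import datetime
import locale

d = datetime.datetime(2013, 12, 25)
saved = locale.setlocale(locale.LC_TIME)  # query the current LC_TIME setting
try:
    # In the "C" locale, %B yields English month names.
    locale.setlocale(locale.LC_TIME, "C")
    english = "{datetime:%B %d, %Y}".format(datetime=d)  # 'December 25, 2013'
finally:
    locale.setlocale(locale.LC_TIME, saved)
```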
