Timeline for Why does [A-Z] match lowercase letters in bash?

Current License: CC BY-SA 4.0

26 events

when	what		by	license	comment
Jun 28, 2018 at 7:43	history	edited	Stéphane Chazelas	CC BY-SA 4.0	added 768 characters in body
May 22, 2017 at 20:43	comment	added	dragon788		This was a little easier to see using `printf '%s\n' {{0..99},{A..Z},{a..z}} | sort` and `printf '%s\n' {{0..99},{A..Z},{a..z}} | LANG=C sort`. It also helped me confirm that my language setting is causing the collating behavior I'm seeing.
Sep 24, 2015 at 2:37	comment	added	user79743		@schily Yes, Bash behavior is controllable via LC_* variables. They just have to be active in the running environment to work their magic. Start a new bash like this: `LC_COLLATE="C" bash` and try `echo [a-z]` again. Or, more to the point, try: `LC_COLLATE="C" bash -c 'echo [a-z]'`.
Sep 9, 2015 at 10:31	comment	added	Stéphane Chazelas		@schily, there are other ways. See how `zsh` is now doing it.
Sep 3, 2015 at 11:09	comment	added	schily		The only other way to handle this seems to be to drop such characters, and that does not look like a better solution.
Sep 3, 2015 at 11:02	comment	added	Stéphane Chazelas		@schily, yes, the shell has to decide what to do with those bytes that do not form part of valid characters. What I'm saying is that treating them as if they were the characters whose code point has the same value as that byte is not the best approach IMO. Hence my starting the discussion on the zsh ML (and on the Austin group 2 months ago).
Sep 3, 2015 at 10:59	comment	added	schily		How characters are converted from a multibyte locale depends on `mbtowc()`. If there is a character that is officially an impossible multibyte value, `mbtowc()` returns -1 and the string converter advances by one and the output is still what the first `wchar_t *` parameter returns.
Sep 3, 2015 at 10:39	comment	added	Stéphane Chazelas		@schily, in that case, the \xFF was expanded to the 0xFF byte by my shell (zsh) before being passed to schily-sh. schily-sh internally wrongly identified it as U+00FF. You may want to have a look at the current discussion on the zsh ML. Note that ksh93 is a bit broken as well in that `b=$'\xff' ksh -c $'[[ $b = [\uff] ]]'` returns false but `b=$'\xff' ksh -c $'[[ $b = [[:alpha:]] ]]'` returns true.
Sep 3, 2015 at 10:19	comment	added	schily		@Stéphane - The way \xFF is handled depends on the shell internals. The fact that \x1234 is possible should explain that the shell has parts where characters are handled as wide characters and others where it uses multibyte characters. `gmatch()` (which handles `case`) lives in a part of the shell that uses multibyte characters, but inside `gmatch()` everything is temporarily converted into wide characters for processing.
Sep 2, 2015 at 23:14	comment	added	Stéphane Chazelas		@schily, I've raised the question on the zsh-workers mailing list
Sep 2, 2015 at 23:08	history	edited	Stéphane Chazelas	CC BY-SA 3.0	deleted 95 characters in body
Sep 2, 2015 at 22:56	comment	added	Stéphane Chazelas		@schily, having said that, it seems that `zsh` behaves the same. `yash` won't allow invalid characters.
Sep 2, 2015 at 22:51	comment	added	Stéphane Chazelas		@schily, note that `\xFF` there is the byte 0xFF, not the character U+00FF (`ÿ` itself encoded as 0xC3 0xBF). `\xFF` alone doesn't form a valid character so I can't see why it should be matched by `[É-Ź]`.
Sep 2, 2015 at 22:45	comment	added	schily		@Stéphane - What you get with the case statement in the Bourne Shell is expected behavior assuming that you use UTF-8. The Bourne Shell uses gmatch() for case statements and this supports wide characters but does not use strcoll() but a plain value compare for the range. 0xFF is perfectly inside the range you specified.
Sep 2, 2015 at 22:08	history	edited	Stéphane Chazelas	CC BY-SA 3.0	added 2811 characters in body
Sep 2, 2015 at 17:14	history	edited	Stéphane Chazelas	CC BY-SA 3.0	added 204 characters in body
Sep 2, 2015 at 17:12	comment	added	schily		Let me mention again: zsh, POSIX-ksh88, ksh93t+, and the Bourne Shell all behave the way I expect. Bash is the only shell that behaves differently, and bash is not controllable via the locale in this case.
Sep 2, 2015 at 17:07	comment	added	Stéphane Chazelas		@schily, I mention `sort` because `bash` globs are based on character sort order. I don't currently have access to such an old version of `bash`, but I can check later. Was it different then?
Sep 2, 2015 at 17:06	history	edited	Stéphane Chazelas	CC BY-SA 3.0	added 75 characters in body
Sep 2, 2015 at 17:04	comment	added	schily		BTW: My question was against `bash`, but your reply is related to `sort`. Did you try to check file name globbing with bash-3?
Sep 2, 2015 at 17:03	comment	added	Stéphane Chazelas		@cuonglm, more like `mksh` (both derived from pdksh). `posh -c $'case Ó in [É-Ź]) echo yes; esac'` returns nothing.
Sep 2, 2015 at 17:00	history	edited	Stéphane Chazelas	CC BY-SA 3.0	deleted 8 characters in body
Sep 2, 2015 at 16:59	comment	added	cuonglm		`posh` also behaves like `zsh` and `yash`.
Sep 2, 2015 at 16:53	history	edited	Stéphane Chazelas	CC BY-SA 3.0	added 663 characters in body
Sep 2, 2015 at 16:50	comment	added	schily		If you were right, this could be controlled via LC_* variables. There seems to be a different reason.
Sep 2, 2015 at 16:39	history	answered	Stéphane Chazelas	CC BY-SA 3.0
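
The locale point made in the comments can be checked with a small sketch. It assumes `bash` is on the `PATH`; the helper name `matches_upper_in_c_locale` is hypothetical and used only for illustration. It runs the `case` match in a fresh bash started under the C locale, where ranges are compared by byte value, so `[A-Z]` should not match a lowercase letter.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the thread): report whether $1 matches
# the glob [A-Z] when a new bash is started with LC_ALL=C.
matches_upper_in_c_locale() {
  # The locale must be set in the environment of the bash that does the
  # matching, so it is passed on the command line of a child shell.
  LC_ALL=C bash -c 'case $1 in [A-Z]) echo "match" ;; *) echo "no match" ;; esac' _ "$1"
}

matches_upper_in_c_locale b   # prints "no match"
matches_upper_in_c_locale B   # prints "match"
```

Running the same `case` statement without `LC_ALL=C` may give a different answer for `b`, depending on the bash version and the collation rules of the active locale, which is the behavior the question asks about.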