Timeline for Why does [A-Z] match lowercase letters in bash?
Current License: CC BY-SA 4.0
26 events
| when | what | | by | license | comment |
|---|---|---|---|---|---|
| Jun 28, 2018 at 7:43 | history | edited | Stéphane Chazelas | CC BY-SA 4.0 | added 768 characters in body |
| May 22, 2017 at 20:43 | comment | added | dragon788 | This was a little easier to see using `printf '%s\n' {{0..99},{A..Z},{a..z}} \| sort` and `printf '%s\n' {{0..99},{A..Z},{a..z}} \| LANG=C sort`, and it also helped me confirm that my language setting is causing the collating behavior I'm seeing. | |
| Sep 24, 2015 at 2:37 | comment | added | user79743 | @schily Yes, Bash behavior is controllable via LC_* variables. It's just that they must be active in the running environment to work their magic. Start a new bash like this: `LC_COLLATE="C" bash` and try `echo [a-z]*` again. Or, more to the point, try: `LC_COLLATE="C" bash -c 'echo [a-z]*'`. | |
| Sep 9, 2015 at 10:31 | comment | added | Stéphane Chazelas | @schily, there are other ways. See how zsh is now doing it. | |
| Sep 3, 2015 at 11:09 | comment | added | schily | The only other way to handle this seems to be to drop such characters, and that does not look like a better solution. | |
| Sep 3, 2015 at 11:02 | comment | added | Stéphane Chazelas | @schily, yes, the shell has to decide what to do with those bytes that do not form part of valid characters. What I'm saying is that treating them as if they were the characters whose code point has the same value as that byte value is not the best approach IMO. Hence my starting the discussion on the zsh ML (and on the Austin Group 2 months ago). | |
| Sep 3, 2015 at 10:59 | comment | added | schily | How characters are converted from a multibyte locale depends on `mbtowc()`. If there is a character that is officially an impossible multibyte value, `mbtowc()` returns -1, the string converter advances by one, and the output is still what the first `wchar_t *` parameter returns. | |
| Sep 3, 2015 at 10:39 | comment | added | Stéphane Chazelas | @schily, in that case, the `\xFF` was expanded to the 0xFF byte by my shell (zsh) before being passed to schily-sh. schily-sh internally wrongly identified it as U+00FF. You may want to have a look at the current discussion on the zsh ML. Note that ksh93 is a bit broken as well, in that `b=$'\xff' ksh -c $'[[ $b = [\uff] ]]'` returns false but `b=$'\xff' ksh -c $'[[ $b = [[:alpha:]] ]]'` returns true. | |
| Sep 3, 2015 at 10:19 | comment | added | schily | @Stéphane - The way `\xFF` is handled depends on the shell internals. The fact that `\x1234` is possible should explain that the shell has parts where characters are handled as wide characters and others where it uses multibyte characters. `gmatch()` (which handles `case`) lives in a part where the shell uses multibyte characters, but inside `gmatch()` everything is temporarily converted into wide characters for processing. | |
| Sep 2, 2015 at 23:14 | comment | added | Stéphane Chazelas | @schily, I've raised the question on the zsh-workers mailing list | |
| Sep 2, 2015 at 23:08 | history | edited | Stéphane Chazelas | CC BY-SA 3.0 | deleted 95 characters in body |
| Sep 2, 2015 at 22:56 | comment | added | Stéphane Chazelas | @schily, having said that, it seems that zsh behaves the same. yash won't allow invalid characters. | |
| Sep 2, 2015 at 22:51 | comment | added | Stéphane Chazelas | @schily, note that `\xFF` there is the byte 0xFF, not the character U+00FF (ÿ, itself encoded as 0xC3 0xBF). `\xFF` alone doesn't form a valid character, so I can't see why it should be matched by `[É-Ź]`. | |
| Sep 2, 2015 at 22:45 | comment | added | schily | @Stéphane - What you get with the `case` statement in the Bourne Shell is expected behavior, assuming that you use UTF-8. The Bourne Shell uses `gmatch()` for `case` statements; this supports wide characters, but it uses a plain value comparison for the range rather than `strcoll()`. 0xFF is perfectly inside the range you specified. | |
| Sep 2, 2015 at 22:08 | history | edited | Stéphane Chazelas | CC BY-SA 3.0 | added 2811 characters in body |
| Sep 2, 2015 at 17:14 | history | edited | Stéphane Chazelas | CC BY-SA 3.0 | added 204 characters in body |
| Sep 2, 2015 at 17:12 | comment | added | schily | Let me mention again: zsh, POSIX-ksh88, ksh93t+, and the Bourne Shell all behave the same way, as I expect. Bash is the only shell that behaves differently, and bash is not controllable via the locale in this case. | |
| Sep 2, 2015 at 17:07 | comment | added | Stéphane Chazelas | @schily, I mention sort because bash globs are based on character sort order. I don't currently have access to such an old version of bash, but I can check later. Was it different then? | |
| Sep 2, 2015 at 17:06 | history | edited | Stéphane Chazelas | CC BY-SA 3.0 | added 75 characters in body |
| Sep 2, 2015 at 17:04 | comment | added | schily | BTW: My question was about bash, but your reply is related to sort. Did you try to check file name globbing with bash-3? | |
| Sep 2, 2015 at 17:03 | comment | added | Stéphane Chazelas | @cuonglm, more like mksh (both derived from pdksh). posh -c $'case Ó in [É-Ź]) echo yes; esac' returns nothing. | |
| Sep 2, 2015 at 17:00 | history | edited | Stéphane Chazelas | CC BY-SA 3.0 | deleted 8 characters in body |
| Sep 2, 2015 at 16:59 | comment | added | cuonglm | posh also behaves like zsh and yash. | |
| Sep 2, 2015 at 16:53 | history | edited | Stéphane Chazelas | CC BY-SA 3.0 | added 663 characters in body |
| Sep 2, 2015 at 16:50 | comment | added | schily | If you were right, this could be controlled via LC_* variables. There seems to be a different reason. | |
| Sep 2, 2015 at 16:39 | history | answered | Stéphane Chazelas | CC BY-SA 3.0 | |
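
Several comments above turn on whether the range `[A-Z]` follows the locale's collation order or plain byte values. As a minimal sketch of the C-locale baseline that user79743's comment points to, the snippet below starts a fresh bash with `LC_ALL=C` and tests one character against `[A-Z]`. The helper name `in_upper_range_c` is hypothetical and not from the thread, and `LC_ALL` is used instead of `LC_COLLATE` only so that no other locale variable can override it.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): report whether a single
# character matches the range [A-Z] when a new bash runs in the C locale.
in_upper_range_c() {
    # LC_ALL=C applies only to this child bash, so its range comparison
    # uses byte values instead of the caller's collation order.
    LC_ALL=C bash -c 'case $1 in [A-Z]) echo yes ;; *) echo no ;; esac' bash "$1"
}

in_upper_range_c B   # prints "yes"
in_upper_range_c b   # prints "no"
```

Running the same `case` test without the `LC_ALL=C` prefix shows what the current locale does; that result depends on the locale and the bash version, so no output is claimed for it here.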