6 events
when what by license comment
Dec 4, 2022 at 21:28 comment added Stéphane Chazelas That's u with two combining characters. See also u\u0304\u0308 and u\u0308\u0304, which actually have two different precomposed forms (U+1E7B LATIN SMALL LETTER U WITH MACRON AND DIAERESIS and U+01D6 LATIN SMALL LETTER U WITH DIAERESIS AND MACRON), while both C\u0301\u0327 and C\u0327\u0301 give U+1E08 LATIN CAPITAL LETTER C WITH CEDILLA AND ACUTE after canonical composition...
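The behaviour described in this comment can be checked with a minimal Python sketch that uses only the standard `unicodedata` module. The variable names are illustrative and do not come from the thread. The two marks on `u` share the same combining class, so their order is significant, whereas the acute and cedilla on `C` have different classes and get reordered during normalization.

```python
import unicodedata

def nfc(text: str) -> str:
    """Return the canonical composition (NFC) of text."""
    return unicodedata.normalize("NFC", text)

# Macron (U+0304) and diaeresis (U+0308) both have combining class 230,
# so normalization keeps their relative order and the results differ.
composed_macron_first = nfc("u\u0304\u0308")
composed_diaeresis_first = nfc("u\u0308\u0304")

# Acute (U+0301, class 230) and cedilla (U+0327, class 202) are reordered
# canonically, so both input orders compose to the same character.
composed_acute_first = nfc("C\u0301\u0327")
composed_cedilla_first = nfc("C\u0327\u0301")

for label, value in [
    ("u + macron + diaeresis", composed_macron_first),
    ("u + diaeresis + macron", composed_diaeresis_first),
    ("C + acute + cedilla", composed_acute_first),
    ("C + cedilla + acute", composed_cedilla_first),
]:
    print(label, "->", " ".join(f"U+{ord(ch):04X}" for ch in value))
```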
Dec 3, 2022 at 14:21 comment added Thomas Tempelmann Since this is not a code-related site, I think explaining the method is more helpful here. I had only thought of adding the precomposed and decomposed spellings of the entire search string as regex alternatives; your case is new to me. Is that just an underlined variant of the letter ü, or something else?
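The approach described in this comment (offering the precomposed and decomposed spellings of the search string as regex alternatives) might look roughly like the following Python sketch. This is an assumption about the general idea, not the answerer's actual code, and `equivalence_pattern` is a hypothetical helper name.

```python
import re
import unicodedata

def equivalence_pattern(needle: str) -> re.Pattern:
    """Build a regex matching the needle as typed, in NFC, or in NFD."""
    forms = {
        needle,
        unicodedata.normalize("NFC", needle),
        unicodedata.normalize("NFD", needle),
    }
    # Longest alternative first, so a longer spelling is tried before a shorter one.
    alternatives = sorted((re.escape(f) for f in forms), key=lambda s: (-len(s), s))
    return re.compile("|".join(alternatives))

pattern = equivalence_pattern("f\u00fcr")  # "für" with precomposed ü

# Both the precomposed and the decomposed spelling are found.
found_precomposed = pattern.search("gut f\u00fcr dich") is not None
found_decomposed = pattern.search("gut fu\u0308r dich") is not None

# Limitation: a spelling that is neither NFC nor NFD is missed.
# Here the haystack has the marks in non-canonical order.
two_marks = equivalence_pattern("\u016b\u0333")
missed = two_marks.search("u\u0304\u0333") is None
```

The last check illustrates why listing only two spellings is not enough once a base letter carries more than one combining mark.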
Dec 2, 2022 at 10:32 comment added Stéphane Chazelas Unless you share your code, that answer is not going to be very useful to anyone. Are you also taking into account the order in which characters are combined? Like ū̳ as u U+0304 U+0333 vs u U+0333 U+0304 vs U+016B U+0333?
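One way to handle the ordering question raised here is to normalize both the search string and the text to the same form before comparing. The sketch below is only an illustration in Python with the standard `unicodedata` module, and `canonically_equal` is a hypothetical helper name. The three spellings from the comment differ as raw strings but share a single NFD form, because U+0333 (combining class 220) sorts before U+0304 (class 230).

```python
import unicodedata

def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# The three spellings mentioned in the comment.
variants = [
    "u\u0304\u0333",  # u + macron + double low line
    "u\u0333\u0304",  # u + double low line + macron
    "\u016b\u0333",   # precomposed u-macron + double low line
]

# Raw comparison treats them as three different strings.
distinct_raw = len(set(variants))

# After NFD they collapse to one canonical sequence.
nfd_forms = {unicodedata.normalize("NFD", v) for v in variants}

all_equal = all(canonically_equal(variants[0], v) for v in variants)
print(distinct_raw, len(nfd_forms), all_equal)
```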
Dec 2, 2022 at 10:10 history edited Thomas Tempelmann CC BY-SA 4.0
added 191 characters in body
Dec 2, 2022 at 10:05 vote accept Thomas Tempelmann
Dec 2, 2022 at 10:05 history answered Thomas Tempelmann CC BY-SA 4.0