How to make a regex to detect Unicode characters?

Question

I am working on an application in which I have to detect Unicode characters. For example, my text is:

Suzana R°u˘zi˘ckova and Viktor Kalabis, Yvonne Sebastaková, Linda Servitová, Sandra Stevenson.

I have written a regex for it, "[^\u0000-\u0080]+", but it does not detect all characters. Also, the word R°u˘zi˘ckova is not displayed correctly in C#, because the combining characters are on top of the letters, not separate characters.

How can I make a regex that detects all combined characters? I am working in C#.

Which characters would you like to detect? And “R°u˘zi˘ckova” is not a word; it is apparently the name “Růžičkova” written in a special Asciification. Does your data contain such strings, and how should they be handled?
– Jukka K. Korpela, Commented Apr 3, 2014 at 10:55
Yes, my data contains such words, and the characters in these words are ignored as if they were whitespace when fonts are applied to them.
– mck, Commented Apr 3, 2014 at 10:58

Kent · Accepted Answer · 2014-04-03 11:55:15Z

'[\x00-\x7f]' matches the ASCII range.

'[^\x00-\x7f]' matches any character outside the ASCII range.

I have no idea about the regex engine of ASP.NET, but you can give it a try.

Here is a test with my grep:

kent$ (US-2998|✔) echo "Suzana R°u˘zi˘ckova and Viktor Kalabis, Yvonne Sebastaková, Linda Servitová, Sandra Stevenson." | grep -oP '[^\x00-\x7f]'
°
˘
˘
á
á
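
Since the question is about C#, here is a minimal sketch of the same check with .NET's System.Text.RegularExpressions. It makes two assumptions that are not confirmed by the question: that the marks in the sample are the spacing characters U+00B0 (degree sign) and U+02D8 (breve), and that "á" is the precomposed U+00E1. The code uses C# 9 top-level statements; on an older compiler the statements would go inside a Main method. Note that the pattern from the question, [^\u0000-\u0080], also excludes U+0080, so the upper bound here is \u007F.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample text from the question. The escapes are an ASSUMPTION about the
// actual code points: \u00B0 (degree sign), \u02D8 (breve), \u00E1 (a-acute).
string text = "Suzana R\u00B0u\u02D8zi\u02D8ckova and Viktor Kalabis, "
            + "Yvonne Sebastakov\u00E1, Linda Servitov\u00E1, Sandra Stevenson.";

// Matches any single UTF-16 code unit outside the ASCII range U+0000..U+007F.
var nonAscii = new Regex(@"[^\u0000-\u007F]");

// Matches combining marks only (Unicode general categories Mn, Mc, Me).
var combiningMarks = new Regex(@"\p{M}");

// Collect every non-ASCII match as a string.
string[] found = nonAscii.Matches(text)
                         .Cast<Match>()
                         .Select(m => m.Value)
                         .ToArray();

// Lists the non-ASCII characters, separated by spaces.
Console.WriteLine(string.Join(" ", found));

// Under the assumptions above this prints False: the sample holds
// spacing symbols and a precomposed letter, but no combining marks.
Console.WriteLine(combiningMarks.IsMatch(text));
```

With these code points, found holds the same five characters that grep reports above. If the data instead contained real combining marks (for example "u" followed by U+030A), \p{M} would match them, and string.Normalize(NormalizationForm.FormC) could compose them into single characters before matching.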
