
I am working on an application in which I have to detect Unicode characters. For example, my text is:

Suzana R°u˘zi˘ckova and Viktor Kalabis, Yvonne Sebastaková, Linda Servitová, Sandra Stevenson. 

I have written a regex for it, "[^\u0000-\u0080]+", but it does not detect all the characters. Also, the word R°u˘zi˘ckova is not displayed correctly in C#, because the combining characters belong on top of the letters, not beside them as separate characters.

How can I write a regex that detects all combining characters? I am working in C#.

  • Which characters would you like to detect? And “R°u˘zi˘ckova” is not a word; it is apparently the name “Růžičkova” written in a special ASCIIfication. Does your data contain such strings, and how should they be handled? Commented Apr 3, 2014 at 10:55
  • Yes, my data contains such words, and the characters in these words are treated as whitespace when fonts are applied to the text. Commented Apr 3, 2014 at 10:58

1 Answer

'[\x00-\x7f]' is the ASCII range.

'[^\x00-\x7f]' matches any non-ASCII character.

I don't know the regex engine of ASP.NET, but you can give it a try.

Here is a test with my grep:

kent$ (US-2998|✔) echo "Suzana R°u˘zi˘ckova and Viktor Kalabis, Yvonne Sebastaková, Linda Servitová, Sandra Stevenson."|grep -oP '[^\x00-\x7f]'
°
˘
˘
á
á
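
Since the question is about C#, here is a minimal sketch of the same test using .NET's Regex class. It is only an illustration, not tested against the asker's data: the helper name FindNonAscii is hypothetical, and the code assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the original post): returns every character
// outside the ASCII range U+0000..U+007F, in order of appearance.
static List<string> FindNonAscii(string text)
{
    var found = new List<string>();
    foreach (Match m in Regex.Matches(text, @"[^\u0000-\u007F]"))
    {
        found.Add(m.Value);
    }
    return found;
}

string sample = "Suzana R°u˘zi˘ckova and Viktor Kalabis, Yvonne Sebastaková, " +
                "Linda Servitová, Sandra Stevenson.";

foreach (string ch in FindNonAscii(sample))
{
    // Print the code point too, so the result stays readable even if the
    // console cannot render the character itself.
    Console.WriteLine($"{ch} U+{(int)ch[0]:X4}");
}
```

On the sample sentence this should find the same five characters as the grep run above. Note that .NET regexes work on UTF-16 code units, so a character outside the Basic Multilingual Plane would show up as two matches. If the data held true combining marks rather than spacing characters like these, the \p{M} category might be the class to try, but whether that applies here is an assumption.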