
I have a text written with all kinds of unusual characters, like ŸŞşȘș€ÀÈÉÌÒÓÙàèéìòóùºª«»€, and I am trying to convert them to their plain equivalents: SAEIOUaeiou etc. I have tried this in a number of ways, but I keep getting mixed results: some characters convert, some don't. This is what I've done so far:

    byteArray1 = UnicodeEncoding.GetEncoding(1250).GetBytes(charArray);
    byteArray2 = UnicodeEncoding.GetEncoding(852).GetBytes(charArray);
    byteArray3 = UnicodeEncoding.GetEncoding(737).GetBytes(charArray);
    resultArray1 = UTF7Encoding.GetEncoding(1250).GetChars(byteArray1);
    resultArray2 = UTF7Encoding.GetEncoding(852).GetChars(byteArray2);
    resultArray3 = UTF7Encoding.GetEncoding(737).GetChars(byteArray3);

Is there something simple and obvious (I doubt it) that I'm missing? Also, if I'm going about this completely the wrong way, do tell.

  • Why are you creating encodings from specific subclasses? This will only confuse a reader. Just use Encoding.GetEncoding(). Commented Feb 3, 2012 at 15:24
  • I've tried in lots of ways, and this was the only one that partially worked. Commented Feb 3, 2012 at 15:41

1 Answer


If all you want to do is remove the diacritic marks from characters, I recommend you take a look at this blog post, which describes how to do so.

That approach does nothing about characters such as ºª«»€, but you can strip those out after removing the diacritics with a simple regular expression if you want:

    var noDiac = RemoveDiacritics("ŸŞşȘș€ÀÈÉÌÒÓÙàèéìòóùºª«»€");
    var cleanTxt = Regex.Replace(noDiac, "[^A-Z]", string.Empty, RegexOptions.IgnoreCase);
    // outputs: YSsSsAEEIOOUaeeioou
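
The blog post's code is not reproduced here, so the following is only a sketch of how a RemoveDiacritics helper is commonly written, not necessarily what the post does. It assumes the usual approach: decompose the string with Unicode normalization form D, then drop the combining marks. The class name TextCleaner and the ToAsciiLetters wrapper are hypothetical names made up for this example.

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

public static class TextCleaner
{
    // Assumed implementation; the linked post may differ in detail.
    public static string RemoveDiacritics(string text)
    {
        // FormD splits each accented letter into its base letter
        // followed by one or more combining marks.
        string decomposed = text.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);

        foreach (char c in decomposed)
        {
            // Keep everything except the combining (non-spacing) marks.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
            }
        }

        // Recompose whatever is left into the usual composed form.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    // Hypothetical convenience wrapper: strip diacritics, then drop
    // every character that is not a plain ASCII letter.
    public static string ToAsciiLetters(string text)
    {
        return Regex.Replace(RemoveDiacritics(text), "[^A-Za-z]", string.Empty);
    }
}
```

Calling TextCleaner.ToAsciiLetters("ŸŞşȘș€ÀÈÉÌÒÓÙàèéìòóùºª«»€") should give the same YSsSsAEEIOOUaeeioou as the snippet above. Letters that have no canonical decomposition, such as ø or ł, pass through RemoveDiacritics unchanged and are then removed by the regular expression rather than mapped to o or l, so they would need an explicit replacement table if you want to keep them.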