5

I need the kind of conversion/mapping that the CLCL clipboard manager performs, for example.

Here is what it does:

I copy the following Unicode text: ūī
And CLCL converts it to: ui

Is there a technique for this kind of conversion? Or are there mapping tables that could be used, so that, say, the symbol ū is mapped to u?

UPDATE

Thanks to all for the help. Here is what I came up with, a hybrid of two solutions: one posted by Erik Schierboom and one taken from http://blogs.infosupport.com/normalizing-unicode-strings-in-c/#comment-8984

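    // Requires: using System.Globalization; using System.Linq; using System.Text;
    public static string ConvertUnicodeToAscii(string unicodeStr, bool skipNonConvertibleChars = false)
    {
        if (string.IsNullOrWhiteSpace(unicodeStr))
        {
            return unicodeStr;
        }

        // Decompose each character into its base character plus combining marks.
        var normalizedStr = unicodeStr.Normalize(NormalizationForm.FormD);

        if (skipNonConvertibleChars)
        {
            // Keep only ASCII characters; anything that did not decompose to ASCII is dropped.
            return new string(normalizedStr.ToCharArray().Where(c => (int) c <= 127).ToArray());
        }

        // Remove only the combining marks; other non-ASCII characters stay in the output.
        return new string(
            normalizedStr.Where(
                c =>
                {
                    UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory(c);
                    return category != UnicodeCategory.NonSpacingMark;
                }).ToArray());
    }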
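To see why the two branches behave differently, it can help to look at what FormD actually produces. The following is only a minimal sketch and not part of either posted solution; the class name DecompositionDemo and the sample strings are chosen for illustration. It prints the code point and Unicode category of every character after decomposition.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical demo class; not part of either posted solution.
class DecompositionDemo
{
    static void Main()
    {
        // "ūī" and "Łukasz", written with escapes so the source file encoding does not matter.
        string[] samples = { "\u016B\u012B", "\u0141ukasz" };

        foreach (string sample in samples)
        {
            // Canonical decomposition: base characters followed by their combining marks.
            string decomposed = sample.Normalize(NormalizationForm.FormD);
            Console.WriteLine($"{sample.Length} UTF-16 code units before, {decomposed.Length} after FormD:");

            foreach (char c in decomposed)
            {
                Console.WriteLine($"  U+{(int)c:X4}  {CharUnicodeInfo.GetUnicodeCategory(c)}");
            }
        }
    }
}
```

For ū and ī the decomposition yields a base letter followed by U+0304 (combining macron), whose category is NonSpacingMark, so either filter removes it. Ł (U+0141) has no canonical decomposition, so it stays a single non-ASCII letter: the category filter keeps it, while the c <= 127 filter drops it.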
  • What, several questions that say that this is impossible? Which are those questions? They are wrong and need proper answers. There are also several questions which show how this works. Commented Mar 28, 2013 at 13:56
  • How about creating your own mapping? Commented Mar 28, 2013 at 13:56
  • By Unicode, do you mean UTF-16? Commented Mar 28, 2013 at 13:56
  • Possible duplicate of "Converting ê to e and etc in .net c#". Commented Mar 28, 2013 at 13:58

2 Answers

3

I have used the following code for some time:

    private static string NormalizeDiacriticalCharacters(string value)
    {
        if (value == null)
        {
            throw new ArgumentNullException("value");
        }

        var normalised = value.Normalize(NormalizationForm.FormD).ToCharArray();

        return new string(normalised.Where(c => (int)c <= 127).ToArray());
    }

7 Comments

I dislike the c <= 127 hack, it’s unnecessary. But yes, that’s the gist of it.
Well, otherwise you could have returned a string that contains characters that fall outside the ASCII range, right?
Look at the question I marked this one as a duplicate of. The “right” way is to look at the Unicode category of each character and drop only the non-spacing (combining) marks, keeping everything else. But to be honest that’s probably way less efficient and, in my (admittedly limited) understanding of Unicode, your answer always yields the correct result.
Sorry, I missed the duplicate question part. You are right of course.
It works, but note that characters which cannot be mapped are dropped. For example, "Łukasz" becomes "ukasz". The method used in the "duplicate of" question leaves such characters in the output. So it is probably a good idea to combine the two methods and add a bool parameter that controls whether such characters are kept or skipped.
0

In general, it is not possible to convert Unicode to ASCII because ASCII is a subset of Unicode.

That being said, it is possible to convert the characters that fall within the ASCII subset of Unicode to ASCII.

In C#, generally there's no need to do the conversion, since all strings are Unicode by default anyway, and all components are Unicode-aware, but if you must do the conversion, use the following:

    string myString = "SomeString";
    byte[] asciiString = System.Text.Encoding.ASCII.GetBytes(myString);
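Note that this encoding step does not strip diacritics the way the question asks. As a small sketch (the class name AsciiEncodingDemo is only illustrative), Encoding.ASCII with its default fallback substitutes a question mark for every character it cannot represent:

```csharp
using System;
using System.Text;

// Hypothetical demo class for illustration only.
class AsciiEncodingDemo
{
    static void Main()
    {
        string input = "\u016B\u012B"; // "ūī", written with escapes to avoid source-encoding issues

        // Each non-ASCII character is replaced by the byte 0x3F ('?').
        byte[] asciiBytes = Encoding.ASCII.GetBytes(input);
        string roundTripped = Encoding.ASCII.GetString(asciiBytes);

        Console.WriteLine(roundTripped); // prints "??"
    }
}
```

So for the input from the question the result is "??" rather than "ui"; the normalization-based approaches are what produce the base letters.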

4 Comments

This is not what OP meant.
@DavinTryon: Can you think of any ASCII characters that aren't contained in, say, UTF-8? I can think of many characters in UTF-8 that aren't in ASCII. For example the character 字 cannot be represented in US-ASCII.
Yes, but saying that it is a subset is not correct. UTF-8 (only one of the Unicode encodings) was explicitly created to be "backwards compatible" with ASCII.
@DavinTryon: What definition of subset are you using? Every codepoint in ASCII is contained in Unicode. ASCII is therefore completely contained within Unicode, or in other words, ASCII is a subset of Unicode. That's not to say Unicode predates ASCII, merely that it contains every element in ASCII (after all, that's what subset means).
