How can I check whether a string consists only of characters that can be encoded in ISO 8859-1? In other words, how do I find "illegal" (not ISO 8859-1 compatible) characters in a string?
3 Answers
Try this:
    // Requires: using System; using System.Text;
    private static bool IsValidISO(string input)
    {
        // Encode to ISO 8859-1, then decode again. Any character that cannot be
        // represented comes back as something else, so the round trip differs.
        byte[] bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(input);
        String result = Encoding.GetEncoding("ISO-8859-1").GetString(bytes);
        return String.Equals(input, result);
    }

This answer is based on an answer to this Java question (my code is the C# equivalent): http://www.velocityreviews.com/forums/t137810-checking-whether-a-string-contains-only-iso-8859-1-chars.html
3 Comments
netblognet
This looks better than my idea. Thanks for your answer!
user3094403
@netblognet You're welcome! I also looked at your code, but it looks "dangerous" because you can't be 100% sure that non-ISO chars will give a question mark. My code is also faster.
Oliver Voutat
Great answer. A small suggestion, even though this is a really old post: instead of Encoding.GetEncoding("ISO-8859-1"), use Encoding.Latin1 (available since .NET 5).
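The concern raised in the comments, that you can't be sure what an unencodable character is replaced with, can also be sidestepped by asking the encoder to throw instead of substituting. The following is only a sketch of that variation, not the answer's original method. The class name StrictIsoCheck and the field StrictIso are made up for illustration; the Encoding.GetEncoding overload that takes fallback objects and EncoderFallbackException are standard System.Text members.

```csharp
using System;
using System.Text;

// Hypothetical helper class, named for illustration only.
static class StrictIsoCheck
{
    // ISO 8859-1 encoding configured to throw on unencodable input
    // instead of silently substituting a replacement character.
    private static readonly Encoding StrictIso = Encoding.GetEncoding(
        "ISO-8859-1",
        EncoderFallback.ExceptionFallback,
        DecoderFallback.ExceptionFallback);

    public static bool IsValidISO(string input)
    {
        try
        {
            // The resulting bytes are discarded; only success or failure matters.
            StrictIso.GetBytes(input);
            return true;
        }
        catch (EncoderFallbackException)
        {
            // At least one character has no ISO 8859-1 representation.
            return false;
        }
    }
}
```

With this version a literal question mark in the input needs no special handling, because the result does not depend on any replacement character. If you also need the position of the first offending character, EncoderFallbackException exposes it through its Index property.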
I came up with this idea. Would this work?
    private static bool IsValidISO(string input)
    {
        foreach (char c in input)
        {
            Encoding iso = Encoding.GetEncoding("ISO-8859-1");
            Encoding utf8 = Encoding.UTF8;

            byte[] isoBytes = iso.GetBytes(c.ToString());
            byte[] utfBytes = Encoding.Convert(iso, utf8, isoBytes);
            string convertedC = utf8.GetString(utfBytes);

            // A '?' that was not in the input is taken to mean the character was replaced.
            if (c != '?' && convertedC == "?")
                return false;
        }
        return true;
    }
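For the second half of the question (finding the offending characters rather than just getting a yes/no), a per-character scan does not need an encoder at all. This is a rough sketch under one assumption: that "ISO 8859-1" means the charset as .NET implements it, where every byte 0x00 to 0xFF maps to the Unicode code point with the same value, so anything above U+00FF cannot be encoded. The names IsoScan and FindNonIsoChars are hypothetical.

```csharp
using System.Collections.Generic;

// Hypothetical helper class, named for illustration only.
static class IsoScan
{
    // Yields the index and value of every char above U+00FF,
    // which is assumed here to be the upper bound of ISO 8859-1.
    public static IEnumerable<(int Index, char Character)> FindNonIsoChars(string input)
    {
        for (int i = 0; i < input.Length; i++)
        {
            if (input[i] > '\u00FF')
                yield return (i, input[i]);
        }
    }
}
```

A string is ISO 8859-1 compatible under this assumption when the sequence is empty. Note that characters outside the Basic Multilingual Plane are stored as two surrogate chars, so they are reported as two entries.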