Looking for some black magic that will match any string with "weird" characters in it. Standard ASCII characters are fine. Everything else isn't.
This is for sanitizing various web forms.
This matches any character outside the ASCII range:
[^\x00-\x7F]
It still lets through some "weird" characters such as \x00 (NUL), because they are valid ASCII.
For reference, see the ASCII table
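As a quick illustration of the ASCII-range pattern, here is a minimal Python sketch. The helper name has_non_ascii is made up for this example; the pattern itself should behave the same in most regex flavors.

```python
import re

# Matches any single character outside the 7-bit ASCII range (U+0000 to U+007F).
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def has_non_ascii(text: str) -> bool:
    """Hypothetical helper: True if text contains at least one non-ASCII character."""
    return NON_ASCII.search(text) is not None


print(has_non_ascii("plain text"))    # False
print(has_non_ascii("naïve"))         # True
print(has_non_ascii("null\x00byte"))  # False: NUL is still valid ASCII
```

The last call shows the caveat: control characters such as NUL pass this check because they sit inside the ASCII range.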
[^\x20-\x7E]. That cuts out the control characters 0x00 through 0x1F as well as the 0x7F control character. Alternatively, use [^\x20-\x7E\r\n\t], which lets the common line-ending characters and tabs through again; that may or may not be desirable.
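To see the difference between the two printable-ASCII patterns, here is a small Python sketch. The names STRICT, LENIENT and find_disallowed are invented for the example, and the sample string is made up.

```python
import re

# Flags everything except printable ASCII (space through tilde).
STRICT = re.compile(r"[^\x20-\x7E]")
# Same, but carriage return, line feed and tab are also allowed.
LENIENT = re.compile(r"[^\x20-\x7E\r\n\t]")


def find_disallowed(text, pattern):
    """Hypothetical helper: list every character the pattern rejects."""
    return pattern.findall(text)


sample = "line one\r\n\tline two\x7f caf\u00e9"
print(find_disallowed(sample, STRICT))   # ['\r', '\n', '\t', '\x7f', 'é']
print(find_disallowed(sample, LENIENT))  # ['\x7f', 'é']
```

For multi-line fields such as a textarea, the lenient form is probably the one you want; for single-line fields the strict form is likely the safer choice.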
\p{C}. For example, new Regex(@"\p{C}").Replace(suspect, string.Empty) will clear out both ASCII and non-ASCII controls and formatters, while not damaging normal text that a more naïve (or as you would have it, nave) approach would mangle. This matters particularly if you have names of people or places appearing anywhere (proper names being both places where non-ASCII letters crop up a lot in English, and places where users get particularly upset if you mangle them).
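For comparison, here is a rough Python equivalent of the \p{C} approach. Python's built-in re module does not support \p{...} classes, so this sketch swaps in unicodedata.category and drops every character whose general category starts with "C" (control, format, surrogate, private use, unassigned). The function name strip_other_category is made up for the example.

```python
import unicodedata


def strip_other_category(text: str) -> str:
    """Hypothetical helper: drop characters in Unicode general category C*.

    This approximates what replacing \\p{C} with the empty string does in
    regex flavors that support Unicode categories.
    """
    return "".join(
        ch for ch in text
        if not unicodedata.category(ch).startswith("C")
    )


# A zero-width space (U+200B, category Cf) plus NUL and BEL (category Cc).
suspect = "Zo\u00eb\u200b from Malm\u00f6\x00\x07"
print(strip_other_category(suspect))  # Zoë from Malmö
```

The accented letters survive while the invisible characters are removed. Note that tabs and line breaks are also in category Cc, so they are stripped too; exempt them explicitly if the field is multi-line. Which code points count as unassigned depends on the Unicode database version bundled with the Python build.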