I'm looking for a quick way (in C#) to determine if a string is a valid variable name. My first intuition is to whip up some regex to do it, but I'm wondering if there's a better way to do it. Like maybe some kind of a secret method hidden deep somewhere called IsThisAValidVariableName(string name), or some other slick way to do it that is not prone to errors that might arise due to lack of regex prowess.
- Do you mean a C# variable name? I think regex is your best bet unless you roll your own little parser (which is overkill for such a small check). – Earlz, Dec 1, 2009 at 23:30
- One thing to be careful of if you're using a regex is that there are several Unicode character classes that you might need to take into account: msdn.microsoft.com/en-us/library/aa664670%28VS.71%29.aspx – Michael Burr, Dec 1, 2009 at 23:41
- Do you care about whether the variable is valid in a specific context or only about whether it can ever be a valid identifier in any context? – Mark Byers, Dec 1, 2009 at 23:45
- It would be helpful to know WHY you want this information. Are you writing a C# compiler? Are you generating code based on a user-supplied string? – Eric Lippert, Dec 2, 2009 at 7:09
- Dupe: stackoverflow.com/questions/1904252/… – RCIX, Dec 15, 2009 at 0:07
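To make the regex idea and the Unicode character classes mentioned in the comments concrete, here is a minimal sketch. It assumes the usual C# identifier rules (a letter, letter-number, or underscore first, then letters, digits, connectors, combining marks, and formatting characters, with an optional leading @). The names IdentifierRegexSketch and LooksLikeIdentifier are hypothetical. It does not reject keywords, does not understand \u escape sequences, and may misjudge characters outside the Basic Multilingual Plane, because .NET regexes match UTF-16 code units.

```csharp
using System.Text.RegularExpressions;

public static class IdentifierRegexSketch
{
    // Optional '@', then one start character (underscore, any letter category,
    // or letter-number), then zero or more part characters.
    // \z anchors at the very end of the input (unlike $, which also matches
    // before a trailing newline).
    private static readonly Regex IdentifierPattern = new Regex(
        @"^@?[_\p{L}\p{Nl}][\p{L}\p{Nl}\p{Mn}\p{Mc}\p{Nd}\p{Pc}\p{Cf}]*\z",
        RegexOptions.CultureInvariant);

    // Purely lexical check: keywords such as "public" still pass here.
    public static bool LooksLikeIdentifier(string candidate)
    {
        return !string.IsNullOrEmpty(candidate) && IdentifierPattern.IsMatch(candidate);
    }
}
```

Because the check is only lexical, a caller that needs to reject reserved words would have to combine it with a separate keyword lookup.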
7 Answers
Try this:
    // using System.CodeDom.Compiler;
    CodeDomProvider provider = CodeDomProvider.CreateProvider("C#");
    if (provider.IsValidIdentifier(YOUR_VARIABLE_NAME))
    {
        // Valid
    }
    else
    {
        // Not valid
    }
- System.CodeDom.Compiler namespace for that :-)
- CodeDomProvider?
- new Microsoft.CSharp.CSharpCodeProvider().IsValidIdentifier(...) directly requires significantly less overhead; no parsing to find the provider

A slight improvement over romfir's answer using Microsoft.CodeAnalysis.CSharp, to also treat reserved keywords as invalid member names:
    // using Microsoft.CodeAnalysis.CSharp;
    public static bool IsValidMemberName(string name)
    {
        // Valid identifier syntax, and not a reserved keyword
        return SyntaxFacts.IsValidIdentifier(name)
            && SyntaxFacts.GetKeywordKind(name) == SyntaxKind.None;
    }
There are a couple of special cases around the @ character that are easy to forget to check - namely, '@' by itself is not a valid identifier, and neither is "@1foo". To catch these, you can first check if the string is a keyword, then remove @ from the start of the string, and then check if what's left is a valid identifier (disallowing @ characters).
Here I've combined this with a method to parse Unicode escape sequences in identifiers, and hopefully complete C# (5.0) Unicode character checking. To use it, first call TryParseRawIdentifier() to handle keywords, escape sequences, formatting characters (which are removed), and verbatim identifiers. Next, pass the result to IsValidParsedIdentifier() to check if the first and subsequent characters are valid. Note that the strings returned from TryParseRawIdentifier() are equal if and only if the identifiers are considered identical by C#.
    // using System.Collections.Generic;
    // using System.Globalization;
    // using System.Text;
    public static class CSharpIdentifiers
    {
        private static HashSet<string> _keywords = new HashSet<string>
        {
            "abstract", "as", "base", "bool", "break", "byte", "case", "catch", "char", "checked",
            "class", "const", "continue", "decimal", "default", "delegate", "do", "double", "else",
            "enum", "event", "explicit", "extern", "false", "finally", "fixed", "float", "for",
            "foreach", "goto", "if", "implicit", "in", "int", "interface", "internal", "is", "lock",
            "long", "namespace", "new", "null", "object", "operator", "out", "override", "params",
            "private", "protected", "public", "readonly", "ref", "return", "sbyte", "sealed", "short",
            "sizeof", "stackalloc", "static", "string", "struct", "switch", "this", "throw", "true",
            "try", "typeof", "uint", "ulong", "unchecked", "unsafe", "ushort", "using", "virtual",
            "void", "volatile", "while"
        };

        public static IReadOnlyCollection<string> Keywords { get { return _keywords; } }

        public static bool TryParseRawIdentifier(string str, out string parsed)
        {
            if (string.IsNullOrEmpty(str) || _keywords.Contains(str)) { parsed = null; return false; }

            StringBuilder sb = new StringBuilder(str.Length);
            int verbatimCharWidth = str[0] == '@' ? 1 : 0;

            for (int i = verbatimCharWidth; i < str.Length; ) //Manual increment
            {
                char c = str[i];
                if (c == '\\')
                {
                    //A trailing backslash cannot start an escape sequence
                    if (i + 1 >= str.Length) { parsed = null; return false; }

                    char next = str[i + 1];
                    int charCodeLength;
                    if (next == 'u') charCodeLength = 4;
                    else if (next == 'U') charCodeLength = 8;
                    else { parsed = null; return false; }
                    //No need to check for escaped backslashes or special sequences like \n,
                    //as they are not valid identifier characters

                    //Reject truncated escape sequences instead of letting Substring() throw
                    if (i + 2 + charCodeLength > str.Length) { parsed = null; return false; }

                    int charCode;
                    if (!TryParseHex(str.Substring(i + 2, charCodeLength), out charCode))
                    {
                        parsed = null;
                        return false;
                    }

                    //ConvertFromUtf32() throws for surrogate code points and values outside 0..0x10FFFF
                    if (charCode < 0 || charCode > 0x10FFFF || (charCode >= 0xD800 && charCode <= 0xDFFF))
                    {
                        parsed = null;
                        return false;
                    }

                    //Append the parsed code point (not the escape length);
                    //characters above 2^16 are converted to a surrogate pair
                    sb.Append(char.ConvertFromUtf32(charCode));
                    i += 2 + charCodeLength;
                }
                else if (char.GetUnicodeCategory(str, i) == UnicodeCategory.Format)
                {
                    //Use (string, index) in order to handle surrogate pairs
                    //Skip this character
                    if (char.IsSurrogatePair(str, i)) i += 2;
                    else i += 1;
                }
                else
                {
                    sb.Append(c);
                    i++;
                }
            }

            parsed = sb.ToString();
            return true;
        }

        private static bool TryParseHex(string str, out int result)
        {
            //NumberStyles.AllowHexSpecifier forces all characters to be hex digits
            return int.TryParse(str, NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture, out result);
        }

        public static bool IsValidParsedIdentifier(string str)
        {
            if (string.IsNullOrEmpty(str)) return false;
            if (!IsValidParsedIdentifierStart(str, 0)) return false;

            int firstCharWidth = char.IsSurrogatePair(str, 0) ? 2 : 1;

            for (int i = firstCharWidth; i < str.Length; ) //Manual increment
            {
                if (!IsValidParsedIdentifierPart(str, i)) return false;
                if (char.IsSurrogatePair(str, i)) i += 2;
                else i += 1;
            }

            return true;
        }

        //(String, index) pairs are used instead of chars in order to support surrogate pairs
        //(Unicode code-points above 2^16 represented using two 16-bit characters)
        public static bool IsValidParsedIdentifierStart(string s, int index)
        {
            return s[index] == '_'
                || char.IsLetter(s, index)
                || char.GetUnicodeCategory(s, index) == UnicodeCategory.LetterNumber;
        }

        public static bool IsValidParsedIdentifierPart(string s, int index)
        {
            if (s[index] == '_' || (s[index] >= '0' && s[index] <= '9') || char.IsLetter(s, index))
                return true;

            switch (char.GetUnicodeCategory(s, index))
            {
                case UnicodeCategory.LetterNumber: //Eg. Special Roman numeral characters (not covered by IsLetter())
                case UnicodeCategory.DecimalDigitNumber: //Includes decimal digits in other cultures
                case UnicodeCategory.ConnectorPunctuation:
                case UnicodeCategory.NonSpacingMark:
                case UnicodeCategory.SpacingCombiningMark:
                //UnicodeCategory.Format handled in TryParseRawIdentifier()
                    return true;
                default:
                    return false;
            }
        }
    }
A more recent solution is to use the Roslyn APIs from Microsoft.CodeAnalysis.CSharp:

    SyntaxFacts.IsValidIdentifier("identifierToCheck")
A longer and much slower way is to use reflection to iterate over the members of a class or namespace and check whether a reflected member's ToString() matches the input string. This requires the assembly to be loaded beforehand.
Another way (a much longer route that avoids regex by using an existing ANTLR scanner/parser) is to lex and parse the C# code, scan it for member names (i.e. variables), and compare them with the input string. For example, given the input string 'fooBar', specify the source (such as an assembly or C# code) and scan it, looking specifically for member declarations such as:
private int fooBar;
Yes, it is complex, but you gain a real understanding of what compiler writers do, and it deepens your knowledge of the C# language to the point where you become quite familiar with the syntax and its peculiarities.
In WPF, this can be used to check whether a string is a valid variable name. It does not recognize reserved words like "public", however.

    // Works only in WPF!
    // using System;
    // using System.Windows.Controls;
    public static bool CheckIfStringIsValidVarName(string stringToCheck)
    {
        if (string.IsNullOrWhiteSpace(stringToCheck))
            return false;

        TextBox textBox = new TextBox();
        try
        {
            // stringToCheck == "";                // !!! does NOT throw !!!
            // stringToCheck == "Name$";           // throws
            // stringToCheck == "0";               // throws
            // stringToCheck == "name with blank"; // throws
            // stringToCheck == "public";          // does NOT throw
            // stringToCheck == "ValidName";
            textBox.Name = stringToCheck;
        }
        catch (ArgumentException)
        {
            // Setting Name rejected the string, so treat it as invalid
            return false;
        }

        return true;
    }