27

In Java, there are methods called isJavaIdentifierStart and isJavaIdentifierPart on the Character class that may be used to tell if a string is a valid Java identifier, like so:

    public boolean isJavaIdentifier(String s) {
        int n = s.length();
        if (n == 0) return false;
        if (!Character.isJavaIdentifierStart(s.charAt(0)))
            return false;
        for (int i = 1; i < n; i++)
            if (!Character.isJavaIdentifierPart(s.charAt(i)))
                return false;
        return true;
    }

Is there something like this for C#?

0

8 Answers

39

Yes:

    // using System.CodeDom.Compiler;
    CodeDomProvider provider = CodeDomProvider.CreateProvider("C#");
    if (provider.IsValidIdentifier(YOUR_VARIABLE_NAME)) {
        // Valid
    } else {
        // Not valid
    }

From here: How to determine if a string is a valid variable name?


1 Comment

This does have some perf. implications you should be aware of. Please see my post for more info.
10

I would be wary of the other solutions offered here. Calling CodeDomProvider.CreateProvider requires finding and parsing the machine.config file, as well as your app.config file. That's likely to be several times slower than just checking the string yourself.

Instead I would advocate you make one of the following changes:

  1. Cache the provider in a static variable.

    This will cause you to take the hit of creating it only once, but it will slow down type loading.

  2. Create the provider directly, by creating a Microsoft.CSharp.CSharpCodeProvider instance yourself.

    This will skip the config file parsing altogether.

  3. Write the code to implement the check yourself.

    If you do this, you get the greatest control over how it's implemented, which can help you optimize performance if you need to. See section 2.2.4 of the C# language spec for the complete lexical grammar for C# identifiers.
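As a rough illustration of option 3, here is a minimal sketch of a hand-written check built on the Unicode categories that the identifier grammar is defined in terms of. The class and method names (IdentifierCheck, IsIdentifierShape) are made up for this example. It only tests the shape of the string: it does not reject keywords, and it does not handle the @ prefix, Unicode escape sequences, or characters outside the Basic Multilingual Plane.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of any library: checks only whether the
// characters of a string fit the identifier grammar.
public static class IdentifierCheck
{
    // identifier-start: underscore or a letter-character (Lu, Ll, Lt, Lm, Lo, Nl)
    private static bool IsStart(char c)
    {
        if (c == '_') return true;
        switch (char.GetUnicodeCategory(c))
        {
            case UnicodeCategory.UppercaseLetter:
            case UnicodeCategory.LowercaseLetter:
            case UnicodeCategory.TitlecaseLetter:
            case UnicodeCategory.ModifierLetter:
            case UnicodeCategory.OtherLetter:
            case UnicodeCategory.LetterNumber:
                return true;
            default:
                return false;
        }
    }

    // identifier-part: anything allowed at the start, plus Mn, Mc, Nd, Pc, Cf
    private static bool IsPart(char c)
    {
        if (IsStart(c)) return true;
        switch (char.GetUnicodeCategory(c))
        {
            case UnicodeCategory.NonSpacingMark:
            case UnicodeCategory.SpacingCombiningMark:
            case UnicodeCategory.DecimalDigitNumber:
            case UnicodeCategory.ConnectorPunctuation:
            case UnicodeCategory.Format:
                return true;
            default:
                return false;
        }
    }

    // True when s is non-empty, starts with a start character, and
    // continues with part characters only. Keywords are NOT rejected.
    public static bool IsIdentifierShape(string s)
    {
        if (string.IsNullOrEmpty(s) || !IsStart(s[0])) return false;
        for (int i = 1; i < s.Length; i++)
            if (!IsPart(s[i])) return false;
        return true;
    }
}
```

Because nothing here touches configuration files or allocates beyond the call itself, the cost is a single pass over the string. A keyword lookup would have to be layered on top if names such as "class" must be rejected.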


9

With Roslyn being open source, code analysis tools are right at your fingertips, and they're written for performance. (Right now they're in pre-release).

However, I can't speak to the performance cost of loading the assembly.

Install the tools using nuget:

Install-Package Microsoft.CodeAnalysis -Pre 

Ask your question:

    var isValid = Microsoft.CodeAnalysis.CSharp.SyntaxFacts.IsValidIdentifier("I'mNotValid");
    Console.WriteLine(isValid); // False
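If reserved words also need to be rejected, a keyword check can be combined with the call above. The following is a minimal sketch, assuming the Microsoft.CodeAnalysis.CSharp package is referenced; the wrapper name RoslynIdentifierCheck.IsUsableIdentifier is made up for this example. As far as I can tell, IsValidIdentifier looks only at the characters, so a keyword such as "class" may pass it on its own.

```csharp
using Microsoft.CodeAnalysis.CSharp;

// Hypothetical wrapper around Roslyn's SyntaxFacts; not part of Roslyn itself.
public static class RoslynIdentifierCheck
{
    public static bool IsUsableIdentifier(string name)
    {
        // Character-level check on the string.
        if (!SyntaxFacts.IsValidIdentifier(name)) return false;

        // GetKeywordKind returns SyntaxKind.None when the text is not a
        // reserved keyword, so anything else is rejected here.
        return SyntaxFacts.GetKeywordKind(name) == SyntaxKind.None;
    }
}
```

Contextual keywords are not covered by this sketch; SyntaxFacts.GetContextualKeywordKind can be checked the same way if they matter for your use case.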


6

Basically something like:

    const string start = @"(\p{Lu}|\p{Ll}|\p{Lt}|\p{Lm}|\p{Lo}|\p{Nl})";
    const string extend = @"(\p{Mn}|\p{Mc}|\p{Nd}|\p{Pc}|\p{Cf})";
    // Caution: the pattern has no ^...$ anchors, so IsMatch succeeds whenever
    // any substring of s matches.
    Regex ident = new Regex(string.Format("{0}({0}|{1})*", start, extend));
    s = s.Normalize();
    return ident.IsMatch(s);

2 Comments

OMG 7 upvotes, and it doesn't even work, and didn't even compile until I fixed the code...
The original source had been archived before it went offline.
5

Necromancing here.

In .NET Core/DNX, you can do it with Roslyn's SyntaxFacts:

    Microsoft.CodeAnalysis.CSharp.SyntaxFacts.IsReservedKeyword(
        Microsoft.CodeAnalysis.CSharp.SyntaxFacts.GetKeywordKind("protected")
    );

    foreach (ColumnDefinition cl in tableColumns)
    {
        sb.Append(@" public ");
        sb.Append(cl.DOTNET_TYPE);
        sb.Append(" ");

        // for keywords
        //if (!Microsoft.CodeAnalysis.CSharp.SyntaxFacts.IsValidIdentifier(cl.COLUMN_NAME))
        if (Microsoft.CodeAnalysis.CSharp.SyntaxFacts.IsReservedKeyword(
            Microsoft.CodeAnalysis.CSharp.SyntaxFacts.GetKeywordKind(cl.COLUMN_NAME)
        ))
            sb.Append("@");

        sb.Append(cl.COLUMN_NAME);
        sb.Append("; // ");
        sb.AppendLine(cl.SQL_TYPE);
    } // Next cl


Or, in the old variant with CodeDom, after a look at the Mono source code:

CodeDomProvider.cs

    public virtual bool IsValidIdentifier (string value)
    {
        ICodeGenerator cg = CreateGenerator ();
        if (cg == null)
            throw GetNotImplemented ();
        return cg.IsValidIdentifier (value);
    }

Then CSharpCodeProvider.cs

    public override ICodeGenerator CreateGenerator()
    {
    #if NET_2_0
        if (providerOptions != null && providerOptions.Count > 0)
            return new Mono.CSharp.CSharpCodeGenerator (providerOptions);
    #endif
        return new Mono.CSharp.CSharpCodeGenerator();
    }

Then CSharpCodeGenerator.cs

    protected override bool IsValidIdentifier (string identifier)
    {
        if (identifier == null || identifier.Length == 0)
            return false;

        if (keywordsTable == null)
            FillKeywordTable ();

        if (keywordsTable.Contains (identifier))
            return false;

        if (!is_identifier_start_character (identifier [0]))
            return false;

        for (int i = 1; i < identifier.Length; i ++)
            if (! is_identifier_part_character (identifier [i]))
                return false;

        return true;
    }

    private static System.Collections.Hashtable keywordsTable;
    private static string[] keywords = new string[] {
        "abstract","event","new","struct","as","explicit","null","switch","base","extern",
        "this","false","operator","throw","break","finally","out","true",
        "fixed","override","try","case","params","typeof","catch","for",
        "private","foreach","protected","checked","goto","public",
        "unchecked","class","if","readonly","unsafe","const","implicit","ref",
        "continue","in","return","using","virtual","default",
        "interface","sealed","volatile","delegate","internal","do","is",
        "sizeof","while","lock","stackalloc","else","static","enum",
        "namespace",
        "object","bool","byte","float","uint","char","ulong","ushort",
        "decimal","int","sbyte","short","double","long","string","void",
        "partial", "yield", "where"
    };

    static void FillKeywordTable ()
    {
        lock (keywords) {
            if (keywordsTable == null) {
                keywordsTable = new Hashtable ();
                foreach (string keyword in keywords) {
                    keywordsTable.Add (keyword, keyword);
                }
            }
        }
    }

    static bool is_identifier_start_character (char c)
    {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_' || c == '@' || Char.IsLetter (c);
    }

    static bool is_identifier_part_character (char c)
    {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_' || (c >= '0' && c <= '9') || Char.IsLetter (c);
    }

You get this code:

    public static bool IsValidIdentifier (string identifier)
    {
        if (identifier == null || identifier.Length == 0)
            return false;

        if (keywordsTable == null)
            FillKeywordTable();

        if (keywordsTable.Contains(identifier))
            return false;

        if (!is_identifier_start_character(identifier[0]))
            return false;

        for (int i = 1; i < identifier.Length; i++)
            if (!is_identifier_part_character(identifier[i]))
                return false;

        return true;
    }

    internal static bool is_identifier_start_character(char c)
    {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_' || c == '@' || char.IsLetter(c);
    }

    internal static bool is_identifier_part_character(char c)
    {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_' || (c >= '0' && c <= '9') || char.IsLetter(c);
    }

    private static System.Collections.Hashtable keywordsTable;
    private static string[] keywords = new string[] {
        "abstract","event","new","struct","as","explicit","null","switch","base","extern",
        "this","false","operator","throw","break","finally","out","true",
        "fixed","override","try","case","params","typeof","catch","for",
        "private","foreach","protected","checked","goto","public",
        "unchecked","class","if","readonly","unsafe","const","implicit","ref",
        "continue","in","return","using","virtual","default",
        "interface","sealed","volatile","delegate","internal","do","is",
        "sizeof","while","lock","stackalloc","else","static","enum",
        "namespace",
        "object","bool","byte","float","uint","char","ulong","ushort",
        "decimal","int","sbyte","short","double","long","string","void",
        "partial", "yield", "where"
    };

    internal static void FillKeywordTable()
    {
        lock (keywords)
        {
            if (keywordsTable == null)
            {
                keywordsTable = new System.Collections.Hashtable();
                foreach (string keyword in keywords)
                {
                    keywordsTable.Add(keyword, keyword);
                }
            }
        }
    }

1 Comment

It's probably a good idea to also check for "contextual keywords". You can check that a string is not a reserved keyword or contextual keyword with SyntaxFacts.GetKeywordKind(keyword) == SyntaxKind.None && SyntaxFacts.GetContextualKeywordKind(keyword) == SyntaxKind.None or get the full list with SyntaxFacts.GetKeywordKinds().Select(SyntaxFacts.GetText)
4

Recently, I wrote an extension method that validates a string as a valid C# identifier.

You can find a gist with the implementation here: https://gist.github.com/FabienDehopre/5245476

It's based on the MSDN documentation of Identifier (http://msdn.microsoft.com/en-us/library/aa664670(v=vs.71).aspx)

    // Requires: using System.Linq; using System.Text.RegularExpressions;
    public static bool IsValidIdentifier(this string identifier)
    {
        if (String.IsNullOrEmpty(identifier)) return false;

        // C# keywords: http://msdn.microsoft.com/en-us/library/x53a06bb(v=vs.71).aspx
        var keywords = new[]
        {
            "abstract", "event", "new", "struct", "as", "explicit", "null", "switch",
            "base", "extern", "object", "this", "bool", "false", "operator", "throw",
            "break", "finally", "out", "true", "byte", "fixed", "override", "try",
            "case", "float", "params", "typeof", "catch", "for", "private", "uint",
            "char", "foreach", "protected", "ulong", "checked", "goto", "public", "unchecked",
            "class", "if", "readonly", "unsafe", "const", "implicit", "ref", "ushort",
            "continue", "in", "return", "using", "decimal", "int", "sbyte", "virtual",
            "default", "interface", "sealed", "volatile", "delegate", "internal", "short", "void",
            "do", "is", "sizeof", "while", "double", "lock", "stackalloc",
            "else", "long", "static", "enum", "namespace", "string"
        };

        // definition of a valid C# identifier: http://msdn.microsoft.com/en-us/library/aa664670(v=vs.71).aspx
        const string formattingCharacter = @"\p{Cf}";
        const string connectingCharacter = @"\p{Pc}";
        const string decimalDigitCharacter = @"\p{Nd}";
        const string combiningCharacter = @"\p{Mn}|\p{Mc}";
        const string letterCharacter = @"\p{Lu}|\p{Ll}|\p{Lt}|\p{Lm}|\p{Lo}|\p{Nl}";
        const string identifierPartCharacter = letterCharacter + "|" +
                                               decimalDigitCharacter + "|" +
                                               connectingCharacter + "|" +
                                               combiningCharacter + "|" +
                                               formattingCharacter;
        const string identifierPartCharacters = "(" + identifierPartCharacter + ")+";
        const string identifierStartCharacter = "(" + letterCharacter + "|_)";
        const string identifierOrKeyword = identifierStartCharacter + "(" +
                                           identifierPartCharacters + ")*";
        var validIdentifierRegex = new Regex("^" + identifierOrKeyword + "$", RegexOptions.Compiled);
        var normalizedIdentifier = identifier.Normalize();

        // 1. check that the identifier matches the validIdentifier regex and is not a C# keyword
        if (validIdentifierRegex.IsMatch(normalizedIdentifier) && !keywords.Contains(normalizedIdentifier))
        {
            return true;
        }

        // 2. check whether the identifier starts with @
        if (normalizedIdentifier.StartsWith("@") && validIdentifierRegex.IsMatch(normalizedIdentifier.Substring(1)))
        {
            return true;
        }

        // 3. it's not a valid identifier
        return false;
    }


2

The now-released Roslyn project provides Microsoft.CodeAnalysis.CSharp.SyntaxFacts, with SyntaxFacts.IsIdentifierStartCharacter(char) and SyntaxFacts.IsIdentifierPartCharacter(char) methods just like Java.

Here it is in use, in a simple function I use to turn noun phrases (e.g. "Start Date") into C# identifiers (e.g. "StartDate"). N.B. I'm using Humanizer to do the camel-case conversion, and Roslyn to check whether a character is valid.

    public static string Identifier(string name)
    {
        Check.IsNotNullOrWhitespace(name, nameof(name));

        // trim off leading and trailing whitespace
        name = name.Trim();

        // should deal with spaces => camel casing;
        name = name.Dehumanize();

        var sb = new StringBuilder();
        if (!SyntaxFacts.IsIdentifierStartCharacter(name[0]))
        {
            // the first characters
            sb.Append("_");
        }

        foreach (var ch in name)
        {
            if (SyntaxFacts.IsIdentifierPartCharacter(ch))
            {
                sb.Append(ch);
            }
        }

        var result = sb.ToString();
        if (SyntaxFacts.GetKeywordKind(result) != SyntaxKind.None)
        {
            result = @"@" + result;
        }

        return result;
    }

Tests:

    [TestCase("Start Date", "StartDate")]
    [TestCase("Bad*chars", "BadChars")]
    [TestCase(" leading ws", "LeadingWs")]
    [TestCase("trailing ws ", "TrailingWs")]
    [TestCase("class", "Class")]
    [TestCase("int", "Int")]
    [Test]
    public void CSharp_GeneratesDecentIdentifiers(string input, string expected)
    {
        Assert.AreEqual(expected, CSharp.Identifier(input));
    }

3 Comments

Useful fact, but not helpful in that you didn't explain how to utilize this. I can't seem to locate a "Microsoft.CodeAnalysis" NuGet package, nor can I seem to locate an official page explaining where the library can be obtained.
I provided the link in the first sentence: github.com/dotnet/roslyn. It notes: nuget install Microsoft.CodeAnalysis # Install Language APIs and Services
You should install Microsoft.CodeAnalysis.CSharp to get the C# rules, too.
1

This can be done using reflection - see How to determine if a string is a valid variable name?
