11

Which regular expression can I use to match (allow) any kind of letter from any language?

I need to match any letter including any diacritics (e.g., á, ü, ñ) and exclude any kind of symbol (math symbols, currency signs, dingbats, box-drawing characters, etc.) and punctuation characters.

I'm using ASP.NET MVC 2 with .NET 4. I’ve tried this annotation in my view model

[RegularExpression(@"\p{L}*", ... 

and this one

[RegularExpression(@"\p{L}\p{M}*", ... 

but client-side validation rejects accented characters.

UPDATE: Thank you for all your answers. Your suggestions work, but only in .NET; the problem is that the same regex is also used for client-side validation in JavaScript.

I had to go with

[^0-9_\|°¬!#\$%/\\\(\)\?¡¿\+\{\}\[\]:\.\,;@ª^\*<>=&] 

which is very ugly and does not cover all scenarios but is the closest thing to what I need.

0

6 Answers

5

You can use Char.IsLetter:

Indicates whether the specified Unicode character is categorized as a Unicode letter.

With .NET 4.0:

string onlyLetters = String.Concat(str.Where(Char.IsLetter)); 

On 3.5, String.Concat only accepts an array, so you should also call ToArray.
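As a rough sketch of the same idea that should compile on both 3.5 and 4.0, the filtered characters can be turned into an array and passed to the string constructor. The class and method names (LetterFilter, OnlyLetters, IsAllLetters) are made up for illustration; only Char.IsLetter and the LINQ calls come from the framework.

```csharp
using System;
using System.Linq;

public static class LetterFilter
{
    // Keeps only the characters that Char.IsLetter classifies as Unicode letters.
    // new string(char[]) avoids relying on the String.Concat overload added in .NET 4.0.
    public static string OnlyLetters(string input)
    {
        return new string(input.Where(Char.IsLetter).ToArray());
    }

    // True when the input is non-empty and every UTF-16 code unit is a letter.
    // Assumption: an empty string should be rejected; adjust if it should pass.
    public static bool IsAllLetters(string input)
    {
        return !string.IsNullOrEmpty(input) && input.All(Char.IsLetter);
    }
}
```

Note that Char.IsLetter looks at one UTF-16 code unit at a time and does not treat combining marks as letters, so decomposed text (a base letter followed by a combining accent) would lose its accents in OnlyLetters and fail IsAllLetters.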

2 Comments

This doesn't answer the question. The question isn't necessarily about solving a problem; maybe it was asked to learn regex, I don't know. OK, it may be a problem, but he specifically asks how to do that with regex (in the question, a tag, and even the title), which is clearly achievable. +1 for solving the 'problem', -1 for not answering the question. Neutral.
@Marcelo - Looking more closely at the question, you are probably right. The [ suggests this is used as an attribute, and possibly cannot be replaced by code.
3

\p{L}* should match "any kind of letter from any language". It should work; I used it in an i18n-proof uppercase/lowercase recognition regex in .NET.

1 Comment

Then the problem might be more specific than I thought; I'll update the question.
3

Your problem is more likely due to the fact that the input only has to contain one alphabetic character, because the regex will match anything that has at least one such character.

By adding ^ as a prefix and $ as a suffix, the whole string must comply with your regex. So this probably works:

^\p{L}*$ 

RegexBuddy explains:

  1. ^ Assert position at beginning of the string
  2. \p{L} A character with the Unicode property 'letter' (any kind of letter from any kind of language)
     a. * Between zero and unlimited times, as many as possible (greedy)
  3. $ Assert position at the end of the string
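The effect of the anchors can be checked directly with Regex.IsMatch. This is only a sketch: the class and method names (AnchorDemo, Unanchored, Anchored) are hypothetical, and it tests the patterns on their own rather than through the RegularExpression attribute.

```csharp
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Unanchored: succeeds if the pattern matches anywhere in the input.
    // Because * allows zero repetitions, an empty match is always available.
    public static bool Unanchored(string s)
    {
        return Regex.IsMatch(s, @"\p{L}*");
    }

    // Anchored: the match has to start at the beginning of the string
    // and run to the end, so every character must be a letter.
    public static bool Anchored(string s)
    {
        return Regex.IsMatch(s, @"^\p{L}*$");
    }
}
```

With these helpers, Unanchored("1234") returns true while Anchored("1234") returns false, and Anchored("mañana") returns true. The anchored pattern still accepts an empty string; replacing * with + would require at least one letter.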

1 Comment

\p{L} is the winner = "Matches any kind of letter from any language"
2

I’ve just had to validate a URL and I chose this regular expression in .NET.

^[\p{L}\p{M}-]*$ 

This matches a string made up entirely of letters and combining marks from any language, plus hyphens.

1

One thing to watch out for is the client-side regex. Validation uses JavaScript regex on the client side and .NET regex on the server side. JavaScript won't support this scenario.

0

\w - matches any alphanumeric character (including numbers)

In my tests it has matched:

  • ã
  • à
  • ç
  • 8
  • z

and hasn't matched:

  • ;
  • ,
  • \
  • :

If you know exactly what you want to exclude (such as a short list), you can do the following:

[^;,\\`.]

which matches a single character that isn't any of:

  • ;
  • ,
  • \
  • `
  • .

Hope it helps!
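To see how \w differs from a letters-only class in .NET, the two can be compared side by side. This is a minimal sketch with made-up names (WordCharDemo, IsWordChars, IsLetters); the anchors and the + quantifier are additions so that the whole input is checked.

```csharp
using System.Text.RegularExpressions;

public static class WordCharDemo
{
    // \w in .NET covers Unicode letters, decimal digits and the underscore
    // (among a few other categories), so digits and "_" pass this check.
    public static bool IsWordChars(string s)
    {
        return Regex.IsMatch(s, @"^\w+$");
    }

    // Letters plus combining marks only: digits and underscores are rejected.
    public static bool IsLetters(string s)
    {
        return Regex.IsMatch(s, @"^[\p{L}\p{M}]+$");
    }
}
```

For example, IsWordChars("ã8_z") returns true but IsLetters("ã8_z") returns false, while both accept "ãz" and both reject "a;b".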

7 Comments

@eagle Hmm, you're right; at least I've given an alternative. I'm going to check it out, though.
\w - stands for Word. Not letter.
It also matches numbers which the OP does not want.
@Lukas: This is misleading. \w matches a single character, not a word. It will match letters, numbers and the underscore. Whether it matches only ASCII letters or Unicode letters varies between regex flavors - in .NET it's Unicode.
@Tim_Pietzcker I'm actually just learning REGEX, thank you, this was useful even for me =)