
Is it possible to write a regular expression that allows non-ASCII letters, such as Chinese or Greek characters, in addition to Latin letters (e.g. A汉语AbN漢語 should be allowed)?

My current pattern, ^[\w\d][\w\d_\-\.\s]*$, only allows Latin letters.
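For reference, here is a minimal TypeScript sketch of how that pattern behaves when a JavaScript engine evaluates it, as a browser-side validator would. In JavaScript, \w is ASCII-only, so the Chinese characters are rejected. The helper name isAllowedOriginal is hypothetical.

```typescript
// The pattern from the question, copied unchanged. In JavaScript, \w is
// ASCII-only (exactly [A-Za-z0-9_]), so non-Latin letters never match.
const original = /^[\w\d][\w\d_\-\.\s]*$/;

function isAllowedOriginal(input: string): boolean {
  return original.test(input);
}

console.log(isAllowedOriginal("AbN_1 x.y-z")); // true
console.log(isAllowedOriginal("A汉语AbN漢語")); // false: 汉 is not a \w character
```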

  • Use Chinese characters in a regular expression: stackoverflow.com/questions/9576384/… Commented Oct 25, 2012 at 13:07
  • I want to allow all non-Latin characters. Commented Oct 25, 2012 at 13:10
  • Which language/regex flavor are you using? This is crucial information. Commented Oct 25, 2012 at 13:12
  • By the way, \w already includes \d and _, so you don't need to list them separately. Commented Oct 25, 2012 at 13:15
  • "All non-latin chars (in addition to latin chars)" - so basically, anything? Commented Oct 25, 2012 at 13:21

1 Answer


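In .NET, the following pattern is equivalent to your regex but additionally allows letters from other scripts:

^[\p{L}\d_][\p{L}\d_.\s-]*$

Inside a character class, . needs no escaping, and - is literal when it comes last, which is why those backslashes are gone.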

Explanation:

\p{L} matches any character in the Unicode general category "Letter" (L), so Latin, Greek and Chinese letters are all covered by the same class.
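As a quick illustration of what \p{L} covers, here is a minimal TypeScript sketch. Note that it runs JavaScript's regex engine, not .NET's, and assumes an engine that implements ES2018 Unicode property escapes, where \p{...} only works with the u flag. The helper name isLetter is hypothetical.

```typescript
// \p{L} matches any code point in the Unicode general category "Letter".
// JavaScript only understands \p{...} when the regex has the u flag (ES2018+).
// Building the pattern with the RegExp constructor sidesteps any compile-time
// regex checks TypeScript may apply to literals for older targets.
const letterOnly = new RegExp("^\\p{L}$", "u");

function isLetter(ch: string): boolean {
  return letterOnly.test(ch);
}

for (const ch of ["A", "b", "汉", "漢", "Ω", "λ", "_", "7", "-"]) {
  console.log(ch, isLetter(ch)); // true for the six letters, false for _, 7 and -
}
```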

Caveat: I suspect you meant to disallow the underscore as the first character (it appears explicitly only in the second character class). Since \w includes the underscore, your regex allowed it anyway. If that was unintended, remove _ from the first character class in my solution (\p{L} does not include it, of course).
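To make the caveat concrete, here is a minimal TypeScript sketch of both versions, evaluated by JavaScript's engine with the u flag (ES2018+, required for \p{L}). The names permissive and strict are illustrative only; the only difference between them is the underscore in the first character class.

```typescript
// Both patterns mirror the .NET answer; \p{L} requires the u flag in JavaScript.
// permissive: a letter, digit or underscore may start the string (what \w allowed).
const permissive = new RegExp("^[\\p{L}\\d_][\\p{L}\\d_.\\s-]*$", "u");
// strict: the underscore is removed from the first character class only.
const strict = new RegExp("^[\\p{L}\\d][\\p{L}\\d_.\\s-]*$", "u");

for (const input of ["A汉语AbN漢語", "αβγ δ-1.x", "_draft", "-abc", ""]) {
  // Only "_draft" is treated differently: accepted by permissive, rejected by strict.
  console.log(JSON.stringify(input), permissive.test(input), strict.test(input));
}
```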

In ECMAScript, things are not so easy. You would have to define your own Unicode character ranges. Fortunately, a fellow Stack Overflow user has already risen to the occasion and written a JavaScript regex converter:

https://stackoverflow.com/a/8933546/20670
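For engines without \p{L} support (anything predating ES2018 Unicode property escapes), the hand-built approach looks roughly like the TypeScript sketch below. It is an illustration under stated assumptions, not the output of the linked converter: it covers only ASCII letters, the Latin-1 and Latin Extended-A letters, the basic Greek alphabet, and the main CJK Unified Ideographs block, so Cyrillic, Arabic and many other scripts are still rejected. The names LETTERS and rangeBased are hypothetical.

```typescript
// Hand-picked ranges for engines without \p{L}. ASSUMPTION: this covers only a
// few scripts; Cyrillic, Arabic, Hangul, etc. are still rejected.
const LETTERS =
  "A-Za-z" + // ASCII letters
  "\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u017F" + // Latin-1 letters, Latin Extended-A
  "\\u0391-\\u03A9\\u03B1-\\u03C9" + // basic Greek alphabet (no accented forms)
  "\\u4E00-\\u9FFF"; // CJK Unified Ideographs (main block)

// Same shape as the .NET pattern, with \p{L} replaced by the explicit ranges.
// No u flag is needed, so this also works in pre-ES2015 engines.
const rangeBased = new RegExp("^[" + LETTERS + "\\d_][" + LETTERS + "\\d_.\\s-]*$");

for (const input of ["A汉语AbN漢語", "αβγ δ", "Über-1", "Привет"]) {
  console.log(JSON.stringify(input), rangeBased.test(input));
}
```

Covering more scripts means adding more ranges to LETTERS, which is the bookkeeping the linked converter is meant to save you.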


4 Comments

Hi, it doesn't seem to work: it only allows digits and _. Does it depend on the .NET Framework version?
It's retrieved from a resource file and inserted into validationexpression="<%$ H:VT.DimensionNameNoneAscii %>"
Oh, is that something that runs client-side in the browser? Then it can only use ECMAScript regexes, and those don't support Unicode properties.
So, is there a way to get the same functionality using ECMAScript regexes?
