20

I want to use the \w regex to allow alphanumeric characters, but I don't want the underscore _ to match, and _ is included in \w. I have coded it like this, but it doesn't work. What is my mistake?

(/^roger\w{2,3}[0-9a-z]/i) 

I expect any character other than A-Z or 0-9 to be excluded, so these should be rejected:

roger3_2 or roger46_ or roger2_

but

roger54 or roger4a or roger455 or rogerAAA

should be accepted.

  • How doesn't it work? Please give more detail. Commented Mar 28, 2012 at 15:05
  • You should probably add input and expected output... Commented Mar 28, 2012 at 15:11

5 Answers

45

You could try something like:

[^_\W]+ 
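This works by double negation: \W means "not a word character", so [^_\W] means "neither an underscore nor a non-word character", which is \w minus _. Here is a minimal sketch in Python, chosen only for illustration since the question does not name a language; the helper name `is_alnum_only` is hypothetical:

```python
import re

# \w minus the underscore: a negated class that excludes both _ and \W.
ALNUM_NO_UNDERSCORE = re.compile(r"[^_\W]+")

def is_alnum_only(text: str) -> bool:
    """Return True if text is non-empty and contains only letters and digits."""
    return ALNUM_NO_UNDERSCORE.fullmatch(text) is not None

print(is_alnum_only("roger54"))   # True
print(is_alnum_only("roger3_2"))  # False
```

Note that in Python \w is Unicode-aware for `str` patterns, so this also accepts non-ASCII letters and digits.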

8
  • A numeric code point is \pN or \p{Number}.
  • A digit code point is \d, \p{digit}, \p{Nd}, \p{Decimal_Number}, or \p{Numeric_Type=Decimal}.
  • An alphabetic code point is \p{alpha} or \p{Alphabetic}. It includes all \p{Letter} and \p{Letter_Number} code points, as well as certain \p{Mark} and \p{Symbol} code points. It does not include decimal digits.
  • A programming-word code point is \w, or [\p{Alphabetic}\p{Digit}\p{Mark}\p{Connector_Punctuation}].

An alphanumeric code point by the strictest definition is consequently and necessarily [\p{Alphabetic}\p{Number}], typically abbreviated [\p{alpha}\pN].
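To see the difference between a "number" and a "digit" code point, here is a rough sketch in Python. Python's standard `re` module has no \p{...} syntax, so this approximates \pN and \p{Nd} with general categories from `unicodedata`; the two helper names are made up for the example:

```python
import unicodedata

def is_number(ch: str) -> bool:
    # Approximates \pN: general category Nd, Nl, or No.
    return unicodedata.category(ch).startswith("N")

def is_decimal_digit(ch: str) -> bool:
    # Approximates \p{Nd}: decimal digits only.
    return unicodedata.category(ch) == "Nd"

# ASCII five, ARABIC-INDIC DIGIT THREE, VULGAR FRACTION ONE HALF, ROMAN NUMERAL EIGHT
for ch in ["5", "\u0663", "\u00bd", "\u2167"]:
    print(repr(ch), unicodedata.category(ch), is_number(ch), is_decimal_digit(ch))
```

The fraction and the Roman numeral count as numbers but not as decimal digits, which is why \pN is the broader class.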


3

Assuming the identifier must begin with an alpha character, and then may contain any number of alpha or numeric, I would do this:

my $string = 'roger54a';
print "Match\n" if $string =~ m/\A\p{alpha}[\p{alpha}\p{Number}]*\z/;

That anchors to the start and end of the string, precluding any characters that don't match the specific set of a single alpha followed by any quantity of alpha and numerics.

Update: I see tchrist just gave a great explanation of the Unicode properties. This answer provides the context of a full regexp.

If you wanted two or three leading alphas followed by alphanumerics, just add the appropriate quantifier:

$string =~ m/\A\p{alpha}{2,3}[\p{alpha}\p{Number}]*\z/

Update2: I see a stronger definition of what you're looking for in a comment to one of the answers here. Here's my take on it after seeing your clarification:

m/\Aroger[\p{alpha}\p{Number}]{2,3}\z/


2

Your proposed solution:

(/^roger\w{2,3}[0-9a-z]/i) 

Means:

\w{2,3} -- 2 or 3 alphanumeric, including the _

[0-9a-z] (with the /i) -- a single character that is alphanumeric, not including the _

I didn't see any mention of the acceptable 3 alphanumerics at the beginning. Does that belong?

Both "roger54" and "roger4a" should fail this because the above regex requires at least three characters following "roger." Likewise, "roger___a" would succeed because "___" passes \w{2,3} (specifically \w{3}).
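The behaviour of the original pattern is easy to check directly. This is a small sketch in Python (the question's language is not stated, so this is only an illustration, and `original_accepts` is a hypothetical helper):

```python
import re

# The pattern from the question: 2-3 word characters (underscore included),
# then one more ASCII letter or digit, with nothing anchoring the end.
ORIGINAL = re.compile(r"^roger\w{2,3}[0-9a-z]", re.IGNORECASE)

def original_accepts(text: str) -> bool:
    return ORIGINAL.search(text) is not None

for s in ["roger54", "roger4a", "roger455", "roger3_2", "roger___a"]:
    print(s, original_accepts(s))
```

The two-character suffixes are rejected, while suffixes containing underscores get through as long as the final character is a letter or digit.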

Your request sounded like you wanted more of one of these:

/^roger[0-9a-z]+/i
/^roger[0-9a-z]*/i

that is, "roger" (case insensitive) followed by one or more (+) or zero or more (*) letters and/or numbers.

3 Comments

You nailed my question. With \w{2,3} I assumed 2 or 3 alphanumerics would be OK after roger. Your solution works, but I want to allow only 2 or 3 alphanumerics, no more and no less. How can I limit that? As I said, my code works fine except that it allows _, and I don't want that to be allowed.
Ah, I get it: /^roger[0-9a-z]{2,3}/i works. Thank you so very much!
I missed the part about only 2 or 3 alphanumerics following "roger." I'm glad you got it. By the way, I was assuming you were doing Perl. I'm no longer sure, but it sure looks like Perl.
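One caveat on /^roger[0-9a-z]{2,3}/i: it has no end anchor, so it only checks the first two or three characters after "roger" and ignores whatever follows. A sketch in Python comparing it with an end-anchored variant (the anchored form is my assumption about what "not more, not less" requires; \Z is Python's end-of-string anchor, like Perl's \z):

```python
import re

# Pattern from the comment thread, and the same pattern anchored at the end.
UNANCHORED = re.compile(r"^roger[0-9a-z]{2,3}", re.IGNORECASE)
ANCHORED = re.compile(r"^roger[0-9a-z]{2,3}\Z", re.IGNORECASE)

def accepts(pattern: re.Pattern, text: str) -> bool:
    return pattern.search(text) is not None

for s in ["roger54", "rogerAAA", "roger3_2", "roger46_", "roger4567"]:
    print(s, accepts(UNANCHORED, s), accepts(ANCHORED, s))
```

With the end anchor, "roger46_" and "roger4567" are rejected; without it, both are accepted because the prefix already satisfies the pattern.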
0

I was trying to find a solution to this also and this solution did not work for me in C# when trying to do a regex replace. In case someone else is searching:

c# Regex.Replace [^\w ] that also removes underscores?

This is what I use in C#:

cleaned_string = Regex.Replace(input_string, @"[_]+|[^\w]+", "");

If you want to keep spaces:

cleaned_string = Regex.Replace(input_string, @"[_]+|[^\w\s]+", "");
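For comparison, the same two alternations can be tried outside C#. This is a sketch in Python using `re.sub`, under the assumption that the goal is to delete underscores and every other non-alphanumeric character; the function name and the `keep_spaces` flag are made up for the example:

```python
import re

def strip_non_alnum(text: str, keep_spaces: bool = False) -> str:
    # Remove runs of underscores, plus runs of non-word characters
    # (optionally sparing whitespace).
    pattern = r"[_]+|[^\w\s]+" if keep_spaces else r"[_]+|[^\w]+"
    return re.sub(pattern, "", text)

print(strip_non_alnum("foo_bar! baz"))                    # foobarbaz
print(strip_non_alnum("foo_bar! baz", keep_spaces=True))  # foobar baz
```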
