exclude underscore from alpha numeric regex

Question

I want to use \w in a regex to allow alphanumeric characters, but I don't want the underscore _ to match, and _ is included in \w. I have coded it like this, but it doesn't work. What is my mistake?

(/^roger\w{2,3}[0-9a-z]/i)

I expect any character other than A-Z or 0-9 to be excluded.

For example, roger3_2, roger46_ and roger2_ should be rejected,

but

roger54, roger4a, roger455 and rogerAAA

should be accepted.

How doesn't it work? Please give more detail.

– Bojangles, Commented Mar 28, 2012 at 15:05
You should probably add input and expected output...

– Stefan, Commented Mar 28, 2012 at 15:11

Bogdan Emil Mariesan · Accepted Answer · 2012-03-28 15:06:39Z

You could try something like:

[^_\W]+
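
The class works because \W matches any non-word character, so the negated class [^_\W] matches whatever is neither a non-word character nor an underscore, which is \w minus _. A minimal sketch in Python (an assumption, since the question does not name its language; the function name and sample strings are made up for illustration):

```python
import re

# [^_\W] reads as "not an underscore and not a non-word character",
# which leaves the characters \w matches, minus "_".
ALNUM_RUN = re.compile(r"[^_\W]+")

def alnum_runs(text):
    """Return every maximal run of letters/digits found in text."""
    return ALNUM_RUN.findall(text)

print(alnum_runs("roger3_2"))  # the underscore splits the string into two runs
print(alnum_runs("roger455"))  # one run, no underscore involved
```

On its own the class only describes which characters are allowed; to validate a whole string it still needs anchors and a quantifier around it.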


tchrist · Accepted Answer · 2012-03-28 15:23:21Z

A numeric code point is \pN or \p{Number}.
A digit code point is \d, \p{digit}, \p{Nd}, \p{Decimal_Number}, or \p{Numeric_Type=Decimal}.
An alphabetic code point is \p{alpha} or \p{Alphabetic}. It includes all \p{Digit}, \p{Letter}, and \p{Letter_Number} code points, as well as certain \p{Mark} and \p{Symbol} code points.
A programming-word code point is \w, or [\p{Alphabetic}\p{Digit}\p{Mark}\p{Connector_Punctuation}].

An alphanumeric code point by the strictest definition is consequently and necessarily [\p{Alphabetic}\p{Number}], typically abbreviated [\p{alpha}\pN].
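
To make the difference between a digit and a number concrete, here is a small sketch in Python. Python's built-in re module does not support \p{...} properties, so this uses the standard unicodedata module instead; it exposes general categories only, not the derived Alphabetic property. The helper names and sample characters are chosen for illustration.

```python
import unicodedata

def general_category(ch):
    """Return the two-letter Unicode general category of one character."""
    return unicodedata.category(ch)

def is_number(ch):
    # \p{Number} covers Nd (decimal), Nl (letter number) and No (other number).
    return general_category(ch).startswith("N")

def is_decimal_digit(ch):
    # \p{Nd} covers decimal digits only.
    return general_category(ch) == "Nd"

# "5", ROMAN NUMERAL EIGHT, VULGAR FRACTION ONE HALF, underscore
for ch in ["5", "\u2167", "\u00bd", "_"]:
    print(repr(ch), general_category(ch), is_number(ch), is_decimal_digit(ch))
```

The underscore comes out as Pc (Connector_Punctuation), which is the part of \w the question wants to leave out.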

DavidO · Accepted Answer · 2012-03-28 15:59:34Z

Assuming the identifier must begin with an alpha character, and then may contain any number of alpha or numeric, I would do this:

my $string = 'roger54a';
print "Match\n" if $string =~ m/\A\p{alpha}[\p{alpha}\p{Number}]*\z/;

That anchors to the start and end of the string, precluding any characters that don't match the specific set of a single alpha followed by any quantity of alpha and numerics.

Update: I see tchrist just gave a great explanation of the Unicode properties. This answer provides the context of a full regexp.

If you wanted two or three leading alphas followed by alphanumerics, just add the appropriate quantifier:

$string =~ m/\A\p{alpha}{2,3}[\p{alpha}\p{Number}]*\z/

Update2: I see a stronger definition of what you're looking for in a comment to one of the answers here. Here's my take on it after seeing your clarification:

m/\Aroger[\p{alpha}\p{Number}]{2,3}\z/
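
For readers not using Perl, a rough Python equivalent of that last pattern might look like the sketch below. It rests on two assumptions: Python's re has no \p{...}, so [^\W_] stands in for [\p{alpha}\p{Number}] (close, but not identical), and re.fullmatch plays the role of the \A ... \z anchors. The function name is hypothetical.

```python
import re

# "roger" followed by exactly two or three letters/digits, nothing else.
ROGER_ID = re.compile(r"roger[^\W_]{2,3}")

def is_valid(name):
    """True if the whole string is 'roger' plus 2-3 alphanumerics."""
    return ROGER_ID.fullmatch(name) is not None

# Sample strings taken from the question.
for name in ["roger54", "roger4a", "roger455", "rogerAAA",
             "roger3_2", "roger46_", "roger2_"]:
    print(name, is_valid(name))
```

Like the Perl pattern above it, this is case-sensitive for the literal "roger"; pass re.IGNORECASE to re.compile if that is not wanted.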

Hambone · Accepted Answer · 2012-03-28 15:29:22Z

Your proposed solution:

(/^roger\w{2,3}[0-9a-z]/i)

Means:

\w{2,3} -- 2 or 3 alphanumeric, including the _

[0-9a-z] (with the /i) -- a single character that is alphanumeric, not including the _

I didn't see any mention of the acceptable 3 alphanumerics at the beginning. Does that belong?

Both "roger54" and "roger4a" should fail this because the above regex requires at least three characters following "roger." Likewise, "roger__a" would succeed because "_" passes \w (here \w{2} consumes "__" and [0-9a-z] matches "a").
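
A quick way to check this reading of the original pattern is to run it against a few inputs. The sketch below uses Python, which is an assumption since the question's language is unclear; the pattern is copied unchanged and the helper name is made up.

```python
import re

# The pattern from the question, unchanged: 2-3 word characters
# (underscore included) followed by one letter or digit.
ORIGINAL = re.compile(r"^roger\w{2,3}[0-9a-z]", re.IGNORECASE)

def matches(text):
    """True if the original pattern matches at the start of text."""
    return ORIGINAL.search(text) is not None

for text in ["roger54", "roger4a", "roger455", "roger3_2", "roger__a"]:
    print(text, matches(text))
```

The first two inputs are too short to satisfy both parts of the pattern, while the last two get through because \w accepts the underscore.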

Your request sounded like you wanted more of one of these:

/^roger[0-9a-z]+/i
/^roger[0-9a-z]*/i

that is, "roger" (case insensitive) followed by one or more (+) or zero or more (*) letters and/or numbers.

You nailed my question. With \w{2,3} I assumed 2 or 3 alphanumerics would be OK after roger. Your solution is OK, but I want to allow only 2 or 3 alphanumerics, no more and no less. How can I limit that? As I said, my code works fine except that it allows _, and I don't want that to be allowed.
Ahh, I got you: /^roger[0-9a-z]{2,3}/i works. Thank you so very much!
I missed the part about only 2 or 3 alphanumerics following "roger." I'm glad you got it. By the way, I was assuming you were doing Perl. I'm no longer sure, but it sure looks like Perl.
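
One caveat on /^roger[0-9a-z]{2,3}/i: without an end anchor it only checks the first two or three characters after "roger", so anything that follows them is ignored. A Python sketch of the difference (Python is assumed, and the trailing $ is an addition that does not appear in the thread):

```python
import re

# As written in the comment: anchored at the start only.
UNANCHORED = re.compile(r"^roger[0-9a-z]{2,3}", re.IGNORECASE)
# With an end anchor added, so nothing may follow the 2-3 characters.
ANCHORED = re.compile(r"^roger[0-9a-z]{2,3}$", re.IGNORECASE)

def check(pattern, text):
    """True if pattern matches text."""
    return pattern.search(text) is not None

for text in ["roger54", "roger46_", "roger4567", "roger2_"]:
    print(text, check(UNANCHORED, text), check(ANCHORED, text))
```

With the end anchor, "roger46_" and "roger4567" are rejected, which matches the "not more, not less" requirement stated above.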

David Bentley · Accepted Answer · 2017-10-05 20:41:09Z

I was trying to find a solution to this also and this solution did not work for me in C# when trying to do a regex replace. In case someone else is searching:

c# Regex.Replace [^\w ] that also removes underscores?

This is what I use in C#:

cleaned_string = Regex.Replace(input_string, @"[_]+|[^\w]+", "");

If you want to keep spaces:

cleaned_string = Regex.Replace(input_string, @"[_]+|[^\w\s]+", "");
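
The same clean-up idea, sketched in Python for comparison (an assumption, not part of this answer): removing underscores alongside non-word characters is the mirror image of matching [^\W_]. The function and parameter names are hypothetical.

```python
import re

def strip_non_alnum(text, keep_spaces=False):
    """Remove underscores and non-word characters from text.

    With keep_spaces=True, whitespace is left in place.
    """
    if keep_spaces:
        # Underscores, or runs of characters that are neither word nor space.
        return re.sub(r"_+|[^\w\s]+", "", text)
    # Runs of non-word characters or underscores.
    return re.sub(r"[\W_]+", "", text)

print(strip_non_alnum("foo_bar, baz!"))
print(strip_non_alnum("foo_bar, baz!", keep_spaces=True))
```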
