Thanks for taking a look.

My goal is to come up with a regexp that will match input that contains no digits, whitespace or the symbols !@£$%^&*()+= or any other symbol I may choose.

I am however struggling to grasp precisely how regular expressions work.

I started out with the simple pattern /\D/, which from my understanding will match the first non-digit character it can find. This would match the string 'James' which is correct but also 'James1' which I don't want.

So, my understanding is that if I want to ensure that a pattern is not found anywhere in a given string, I use the ^ and $ characters, as in /^\D$/. Now because this will only match a single character that is not a digit, I needed to use + to specify that 1 or more digits should not be found in the entire string, giving me the expression /^\D+$/. Brilliant, it no longer matches 'James1'.

Question 1

Is my reasoning up to this point correct?

The next requirement was to ensure no whitespace is in the given string. \s will match a single whitespace character and [^\s] will match the first non-whitespace character. So, from my understanding I just had to add this to what I have already to match strings that contain no digits and no whitespace. Again, because [^\s] will only match a single non-whitespace character, I used + to match one or more non-whitespace characters, giving the new regexp of /^\D+[^\s]+$/.

This is where I got lost, as the expression now matches 'James1' or even 'James Smith25'. What? Massively confused at this point.

Question 2

Why is /^\D+[^\s]+$/ matching strings that contain spaces?

Question 3

How would I go about writing the regular expression I'm trying to solve?

While I am keen to solve the problem I am more interested in figuring where my understanding of regular expressions is lacking, so any explanations would be helpful.

3 Answers

4
  1. Not quite; ^ and $ are "anchors": they mean "start" and "end". The full story is a little more complicated, but for now you can treat them as the start and end of a line; look up the various regular expression modifiers if you want to learn more. Unfortunately ^ has an overloaded meaning: as the first character inside square brackets it means "not", which is the meaning you are already acquainted with. It's very important to understand the difference between these two meanings, and that the "not" definition in your head applies only inside a character class!

    Contributing further to your confusion is that \d means "a numerical digit" and \D means "not a numerical digit". Similarly \s means "a whitespace (space/tab/newline/etc.) character" and \S means "not a whitespace character."

    It's worth noting that \d is effectively a shortcut for [0-9] (note that - has a special meaning inside square brackets), and \D is a shortcut for [^0-9].

  2. The reason it's matching strings that contain spaces is that you've asked for "1+ non-digit characters followed by 1+ non-space characters", so it'll match lots of strings! I think perhaps you don't realise that regular expressions match bits of strings: you're not adding constraints as you go, but rather building up bits of matchers that each match the corresponding bits of the string.

  3. /^[^\d\s!@£$%^&*()+=]+$/ is the answer you're looking for - I'd look at it like this:

    i. [] - match a range of characters

    ii. []+ - match one or more of that range of characters

    iii. [^\d\s]+ - match one or more characters that do not match \d (numerical digit) or \s (whitespace)

    iv. [^\d\s!@£$%^&*()+=]+ - here's a bunch of other characters I don't want you to match

    v. ^[^\d\s!@£$%^&*()+=]+$ - now there are anchors applied, so this matcher has to apply to the whole line otherwise it fails to match
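The steps above can be checked stage by stage in JavaScript. This is only a minimal sketch: the helper name `isAllowed` and the sample inputs are made up for illustration and do not come from the question.

```javascript
// Steps iii to v from the list above, one pattern per stage.
const noDigitsOrSpace = /[^\d\s]+/;          // iii: no anchors yet
const withSymbols = /[^\d\s!@£$%^&*()+=]+/;  // iv: more excluded characters
const anchored = /^[^\d\s!@£$%^&*()+=]+$/;   // v: must cover the whole input

// Hypothetical helper: true only if every character is allowed.
function isAllowed(input) {
  return anchored.test(input);
}

// Without anchors the pattern only needs one allowed run somewhere.
console.log(noDigitsOrSpace.test('James 1')); // true: 'James' is an allowed run
console.log(withSymbols.test('James1'));      // true: 'James' is an allowed run

// With anchors, a single excluded character makes the match fail.
console.log(isAllowed('James1'));      // false: contains a digit
console.log(isAllowed('James Smith')); // false: contains a space
console.log(isAllowed('James!'));      // false: contains '!'
console.log(isAllowed('James'));       // true
```

The unanchored patterns succeed on input that should be rejected, which is why the anchors in step v are what turn "find an allowed run" into "the whole input is allowed".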

A useful website to explore regexes is http://regexr.com/3b9h7, which I supply with my suggested solution as an example. Edit: Pruthvi Raj's link to Debuggex is awesome!


2

Is my reasoning up to this point correct?

Almost. /\D/ matches any character other than a digit, but not just the first one (if you use g option).

and [^\s] will match the first non-whitespace character

Almost, [^\s] will match any non-whitespace character, not just the first one (if you use g option).
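As a rough illustration of the g option mentioned above, here is a small JavaScript sketch. The variable names are arbitrary and the input is the 'James1' string from the question.

```javascript
const input = 'James1';

// Without g, match() stops at the first non-digit it finds.
const first = input.match(/\D/);
console.log(first[0], first.index); // J 0

// With g, match() returns every non-digit in the string.
const all = input.match(/\D/g);
console.log(all); // [ 'J', 'a', 'm', 'e', 's' ]

// Either way, test() is true as soon as one non-digit exists,
// which is why /\D/ accepts 'James1'.
console.log(/\D/.test(input)); // true
```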

/^\D+[^\s]+$/ matching strings that contain spaces?

Yes, it does, because \D matches a space (space is not a digit).

Why is /^\D+[^\s]+$/ matching strings that contain spaces?

Because \D+ in /^\D+[^\s]+$/ can match spaces.
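One way to see this is to wrap each part of the pattern in a capture group and look at what it consumed. The sketch below assumes JavaScript; the helper `showParts` and its property names are invented for the example.

```javascript
// Same pattern as /^\D+[^\s]+$/, with each part in a capture group.
const pattern = /^(\D+)([^\s]+)$/;

// Hypothetical helper: what each part consumed, or null if no match.
function showParts(input) {
  const m = pattern.exec(input);
  return m ? { nonDigits: m[1], nonSpaces: m[2] } : null;
}

console.log(showParts('James Smith25'));
// { nonDigits: 'James Smith', nonSpaces: '25' }
console.log(showParts('James1'));
// { nonDigits: 'James', nonSpaces: '1' }
console.log(showParts('1James'));
// null: \D cannot match the leading '1'
```

The space ends up inside the \D+ part and the digits inside the [^\s]+ part, so neither requirement is ever applied to the whole string.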

Conclusion:

Use

^[^\d\s!@£$%^&*()+=]+$ 

It will match strings that contain no digits, no whitespace, and none of the symbols you do not allow.

Mind that to match a literal -, ] or [ inside a character class, you either need to escape them or place them at the start or end of the class. To play it safe, escape them.
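A small sketch of the escaped form, assuming JavaScript and assuming that -, [ and ] are among the characters to reject (the question does not list them, so this is only an illustration):

```javascript
// Excludes digits, whitespace, and the literal characters - [ ]
// Each of the three is escaped so none can be read as class syntax.
const noBracketsOrHyphen = /^[^\d\s\-\[\]]+$/;

console.log(noBracketsOrHyphen.test('James'));       // true
console.log(noBracketsOrHyphen.test('James-Smith')); // false: contains '-'
console.log(noBracketsOrHyphen.test('James[x]'));    // false: contains '[' and ']'
```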

1 Comment

Looks like I overlooked the !@£$%^&*()+= requirement, just added them to the final solution. I also hope my final note will be of value for you.
2

Just insert every character you don't want to include in a negated character class as follows:

^[^\s\d!@£$%^&*()+=]*$ 

Debuggex Demo (regular expression visualization)

^ - start of the string

[^...] - matches one character that is not in `...`

\s - matches a whitespace character (space, newline, tab)

\d - matches a digit from 0 to 9

* - a quantifier that repeats the immediately preceding element 0 or more times

so the regex matches any string that

1. has a beginning,

2. contains 0 or more characters that are not whitespace, digits, or any of the symbols in the character class (in this example !@£$%^&*()+=), i.e. characters that are not listed in the character class `[...]`,

3. has an ending.
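Because * allows zero repetitions, this pattern also accepts the empty string, whereas the + variant does not. A minimal JavaScript sketch of the difference (the variable names are arbitrary):

```javascript
const zeroOrMore = /^[^\s\d!@£$%^&*()+=]*$/;
const oneOrMore = /^[^\s\d!@£$%^&*()+=]+$/;

// The two only disagree on the empty string.
console.log(zeroOrMore.test('')); // true
console.log(oneOrMore.test(''));  // false

console.log(zeroOrMore.test('James'), oneOrMore.test('James'));   // true true
console.log(zeroOrMore.test('James1'), oneOrMore.test('James1')); // false false
```

Which quantifier is appropriate depends on whether empty input should count as valid.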

NOTE:

If the symbols you don't want it to have also include a hyphen (-), don't put it between other characters, because there it is a metacharacter that defines a range; put it at the end of the character class.
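To see why the hyphen's position matters, compare a class with the hyphen in the middle against one with the hyphen last. This is a JavaScript sketch with made-up variable names:

```javascript
// Hyphen between two characters: a range from '+' (U+002B) to '=' (U+003D).
const accidentalRange = /[+-=]/;
// Hyphen last: three literal characters +, = and -.
const literalHyphen = /[+=-]/;

console.log(accidentalRange.test('5')); // true: digits fall inside the range
console.log(literalHyphen.test('5'));   // false
console.log(literalHyphen.test('-'));   // true
console.log(accidentalRange.test('-')); // true, but only because '-' (U+002D) is in the range
```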
