I need a regex to match tags that look like <A>, <BB>, <CCC>, but not <ABC>, <aaa>, or <>. That is, the tag must consist of a single uppercase letter, repeated one or more times. I've tried <[A-Z]+>, but that doesn't work because it also matches tags such as <ABC>. Of course I could write something like <(A+|B+|C+|...)>, but I wonder if there's a more elegant solution.

1 Answer

You can use something like this (see this on rubular.com):

<([A-Z])\1*> 

This uses a capturing group and a backreference. Basically:

  • You use (pattern) to "capture" a match
  • You can then use \n in your pattern, where n is the group number, to "refer back" to what that group matched
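To make the capture-and-refer-back idea concrete, here is a minimal sketch in Python's `re` module (the question does not name a language, so Python is an assumption, as is the variable name `doubled`). It finds an accidentally doubled word: group 1 captures a word, and `\1` must then match that exact same text again.

```python
import re

# Group 1 captures a word; \1 must then match exactly the same text again.
doubled = re.compile(r"\b(\w+) \1\b")

m = doubled.search("this is is a test")
first_repeat = m.group(1) if m else None
print(first_repeat)
```

Note that `\1` matches the text the group actually captured, not the group's pattern, so `(\w+) \1` does not match two different words.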

So in this case:

  • Group 1, ([A-Z]), captures a single uppercase letter immediately following <
  • Then \1* matches zero or more further occurrences of that same letter, and > closes the tag
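As a quick check of the pattern against the examples from the question, here is a small sketch, again assuming Python's `re` module. The helper name `is_repeated_letter_tag` is made up for illustration; `fullmatch` is used so that the whole string has to be the tag.

```python
import re

# The pattern from the answer: one uppercase letter, then zero or more copies of it.
TAG = re.compile(r"<([A-Z])\1*>")

def is_repeated_letter_tag(s: str) -> bool:
    """Return True if the whole string is a tag like <A>, <BB> or <CCC>."""
    return TAG.fullmatch(s) is not None

# Examples taken from the question: the first three should match, the rest should not.
for candidate in ["<A>", "<BB>", "<CCC>", "<ABC>", "<aaa>", "<>"]:
    print(candidate, is_repeated_letter_tag(candidate))
```

To find such tags inside a longer text instead, `TAG.finditer(text)` should work with the same pattern.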

1 Comment

@John: indeed, why not start with "what is regular expression".
