I'm looking for a regular expression to grep whole words, including words that are delimited by digits or underscores. \\b treats digits and underscores as parts of words, not as boundaries.
For example, I'd like to catch MOUSE in "DOG MOUSE CAT" and in "DOG MOUSE:CAT", but also in "DOG_MOUSE9CAT", and at the beginning or end of an expression, as in "MOUSE9CAT" and "DOG_MOUSE". Basically, the boundary I'm looking for is any character that is not an uppercase letter, plus the beginning and end of the line/expression (I may be missing some other cases that \\b catches).
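To make this reproducible, here is a minimal sketch of the test cases. The vector name `targettext` is the one used further down; the last string, "DOGMOUSECAT", is a negative case added here as an assumption about what should not match.

```r
# Test strings from the examples above, plus one assumed negative case
# where MOUSE is embedded in other uppercase letters.
targettext <- c("DOG MOUSE CAT", "DOG MOUSE:CAT", "DOG_MOUSE9CAT",
                "MOUSE9CAT", "DOG_MOUSE", "DOGMOUSECAT")

# \\b treats digits and underscores as word characters, so it only finds
# MOUSE in the first two strings.
b_hits <- grepl("\\bMOUSE\\b", targettext)

# What I would like to get instead: every string except the last one.
wanted <- c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE)
```

Comparing `b_hits` with `wanted` shows the gap: the three strings where MOUSE touches a digit or an underscore are missed.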
I've tried:
"[[0-9_]\\b]MOUSE[[0-9_]\\b]"
"[[0-9_]|\\b]MOUSE[[0-9_]|\\b]"
"[$|[^A-Z]]MOUSE[^|[^A-Z]]"
"[?<=^|[^A-Z]]MOUSE[?=$|[^A-Z]]"
None of them works.
I'm actually looking for several words (based on a long vector of values), so the final result should look something like
grep(paste0("\\b", paste(searchwords, collapse = "\\b|\\b"), "\\b"), targettext) (but with a different delimiter, because \\b is too restrictive for me).
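For concreteness, this is roughly how the multi-word pattern would be assembled. The values in `searchwords` and `targettext` below are made up for illustration, and the pattern still uses \\b, so it has the limitation described above.

```r
# Hypothetical search words, for illustration only.
searchwords <- c("MOUSE", "CAT")

# paste0() is used for the outer call because paste() would insert spaces
# (its default is sep = " ") between \\b and the words.
pattern <- paste0("\\b", paste(searchwords, collapse = "\\b|\\b"), "\\b")

# Two of the example strings: one with spaces, one with digit/underscore
# delimiters.
targettext <- c("DOG MOUSE CAT", "DOG_MOUSE9CAT")
hits <- grep(pattern, targettext)
```

With \\b as the delimiter, `hits` contains only the index of the first string; the goal is a delimiter that would also return the second.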
(This is a similar question to the one asked by user Nick Sabbe in a comment here: Using grep in R to find strings as whole words (but not strings as part of words))