In a very long text, I'd like to search for strings with the following properties:

  • Starts with (let's say) #.
  • Ends with (let's say) @.
  • Does not contain # or @.
  • Does not contain the substring (let's say) blah.

I'm aware that this is a "negative lookahead" problem and that Emacs regexps cannot exclude a whole string like blah. Some answers on Stack Exchange (e.g. this and various answers to this) say that one could use elisp to match and then complement, but they don't say how.

Any ideas on how I could accomplish the regexp search above?

1 Answer

Does not contain # or @.

[^#@]

Does not contain the substring (let's say) blah.

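Note that you can do this without zero-width look-aheads; it's just cumbersome. As a first approximation (it still accepts a b immediately before blah, as in bblah, and has to be combined with the [^#@] restriction above):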

\([^b]\|b[^l]\|bl[^a]\|bla[^h]\) 
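One way to cover the cases the simple alternation misses (such as bblah, where b[^l] consumes the b that starts blah) is to track how much of blah has just been read, the way a small state machine would. The following is only a sketch under that assumption: the pattern was derived for this note rather than taken from the answer, it folds the [^#@] restriction into every character class, and the names my-hash-at-without-blah-rx and my-matches-in-string are hypothetical.

```elisp
;; Hypothetical constant: "#", then text without "#", "@" or "blah", then "@".
(defconst my-hash-at-without-blah-rx
  (rx "#"
      (* (or (not (any "b#@"))               ; no partial "blah" in progress
             (seq "b"
                  (* (or "b" "lb" "lab"))    ; another "b" restarts the prefix
                  (or (not (any "bl#@"))     ; "b" not followed by "l"
                      (seq "l" (not (any "ab#@")))      ; "bl" without "a"
                      (seq "la" (not (any "bh#@")))))))  ; "bla" without "h"
      ;; The text may also end in a partial prefix: "b", "bl" or "bla".
      (? "b" (* (or "b" "lb" "lab")) (? "l" (? "a")))
      "@"))

;; Hypothetical helper; assumes REGEXP never matches the empty string.
(defun my-matches-in-string (regexp string)
  "Return all non-overlapping matches of REGEXP in STRING, case-sensitively."
  (let ((case-fold-search nil)
        (start 0)
        (found '()))
    (while (string-match regexp string start)
      (push (match-string 0 string) found)
      (setq start (match-end 0)))
    (nreverse found)))
```

The size of this pattern for a four-letter word is a fair illustration of why filtering the matches in elisp is usually the more practical route.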

Some answers on Stack Exchange (e.g. this and various answers to this) say that one could use elisp to match and then complement, but they don't say how.

A typical search loop would be:

(while (re-search-forward "#[^#@]*@" nil t)
  (unless (string-match-p "blah" (match-string 0))
    (message "%s" (match-string 0))))
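To act on the matches rather than just echo them, the same loop can be wrapped in a function that returns them as a list. This is a minimal sketch, not part of the original answer: the name my-collect-delimited-without is hypothetical, the unwanted text is treated as a literal string, and the search is made case-sensitive by binding case-fold-search to nil (both the search and string-match-p otherwise follow the buffer's setting).

```elisp
;; Hypothetical helper; the name and argument are illustrative only.
(defun my-collect-delimited-without (unwanted)
  "Return matches of \"#[^#@]*@\" after point that do not contain UNWANTED.
UNWANTED is a literal string, not a regexp."
  (let ((case-fold-search nil)                  ; match case-sensitively
        (unwanted-re (regexp-quote unwanted))   ; escape regexp specials
        (results '()))
    (save-excursion
      (while (re-search-forward "#[^#@]*@" nil t)
        (let ((candidate (match-string-no-properties 0)))
          ;; Keep the candidate only if the unwanted text is absent.
          (unless (string-match-p unwanted-re candidate)
            (push candidate results)))))
    (nreverse results)))
```

With point at the start of the buffer, evaluating (my-collect-delimited-without "blah") returns the surviving matches in buffer order and leaves point where it was.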
  • In other cases you might need to do a bit of manual backtracking (e.g. repositioning point just after the match-beginning if you detected the unwanted value within the matched string). In this example, though, neither of the match delimiters is permitted to appear elsewhere within the match. That means it's not possible to have valid sub-matches inside or overlapping a "blah"-containing match, so you don't need to backtrack (and you also don't need to specify non-greedy matching, which might be useful in other scenarios). Commented 8 hours ago
  • Thank you! I understand the logic of that regexp, very interesting. Extremely useful to see the elisp code! Commented 7 hours ago
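The manual backtracking mentioned in the first comment could look roughly like the sketch below. It assumes a different, hypothetical pattern such as "#[^@]*@", where # may reappear inside a match, so a rejected match can hide a valid shorter one. The function name my-collect-with-backtracking is made up for illustration, and the regexp is assumed never to match the empty string.

```elisp
;; Hypothetical helper illustrating repositioning point after a rejected match.
(defun my-collect-with-backtracking (regexp unwanted)
  "Return matches of REGEXP in the buffer that do not contain UNWANTED.
UNWANTED is a literal string.  After a rejected match, searching resumes
one character past the start of that match instead of at its end."
  (let ((case-fold-search nil)
        (unwanted-re (regexp-quote unwanted))
        (results '()))
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward regexp nil t)
        (let ((candidate (match-string-no-properties 0))
              (start (match-beginning 0)))
          (if (string-match-p unwanted-re candidate)
              ;; Rejected: retry from just after the match beginning so
              ;; that a valid match starting inside it is not skipped.
              (goto-char (1+ start))
            (push candidate results)))))
    (nreverse results)))
```

For buffer text such as "#a blah #b@ #c@", calling (my-collect-with-backtracking "#[^@]*@" "blah") rejects the long first match and then still finds the shorter matches that follow.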
