Why doesn't grep -E work as I expect for negative whitespace? i.e. [^\s]+
I wrote a regex to parse my .ssh/config
grep -Ei '^host\s+[^*\s]+\s*$' ~/.ssh/config
# cat ~/.ssh/config
Host opengrok-01-Eight
Hostname opengrok-01.company.com
Host opengrok-02-SIX
Hostname opengrok-02.company.com
Host opengrok-03-forMe
Hostname opengrok-03.company.com
Host opengrok-04-ForSam
Hostname opengrok-04.company.com
Host opengrok-05-Okay
Hostname opengrok-05.company.com
Host opengrok-05-Okay opengrok-03-forMe
IdentityFile /path/to/file
Host opengrok-*
User root

What I got was
Host opengrok-01-Eight
Host opengrok-03-forMe
Host opengrok-05-Okay
Host opengrok-05-Okay opengrok-03-forMe

Where are SIX and Sam?
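It took me some time to realise that [^\s*]+, which I meant as "match anything that isn't whitespace or *, one or more times", was actually being read as "match anything that isn't \, s or *, one or more times"! Combined with -i, that also excludes S, which is why SIX and ForSam went missing.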
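To see the difference in isolation, here is a minimal sketch, assuming GNU grep. The `sample` variable is a made-up stand-in for the Host lines of the config above, and the variable names are only for illustration. It compares the original bracket expression with the POSIX character class `[[:space:]]`, which is one way to write "whitespace" inside brackets under -E.

```shell
# Hypothetical sample input standing in for the Host lines of ~/.ssh/config.
sample='Host opengrok-01-Eight
Host opengrok-02-SIX
Host opengrok-04-ForSam
Host opengrok-05-Okay opengrok-03-forMe
Host opengrok-*'

# Inside a POSIX bracket expression, \s is just the two characters \ and s,
# so [^*\s] rejects *, \ and s (and S, because of -i) but accepts spaces.
posix_bracket=$(printf '%s\n' "$sample" | grep -Ei '^host\s+[^*\s]+\s*$')

# [[:space:]] is the POSIX class for whitespace and is valid inside brackets.
posix_class=$(printf '%s\n' "$sample" | grep -Ei '^host[[:space:]]+[^*[:space:]]+[[:space:]]*$')

printf 'with [^*\\s]:\n%s\n\n' "$posix_bracket"
printf 'with [^*[:space:]]:\n%s\n' "$posix_class"
```

On this sample the first pattern keeps the two-name Host line and drops SIX and ForSam, while the second keeps exactly the single-name Host lines without a wildcard.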
The fix is surprisingly easy, because that regex works on regex101.com (which uses Perl-style regexes): switch -E for -P.
# grep -Pi '^host\s+[^*\s]+\s*$' ~/.ssh/config
Host opengrok-01-Eight
Host opengrok-02-SIX
Host opengrok-03-forMe
Host opengrok-04-ForSam
Host opengrok-05-Okay

What scares me is I have been using grep -E for years in lots of scripts and not spotted that before. Maybe I've just got lucky but more likely my test cases have missed that edge case!
Questions:
- Other than changing to use grep -P for all my extended regex, how should I be writing my grep -E for this case?
- Are there any other nasty gotchas that I have been missing with -E, or that will bite me if I use -P?
grep (GNU grep) 3.1
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Mike Haertel and others, see <http://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.

Running on Windows 10, WSL running Ubuntu 18.04 (bash) ... but I got the same from a proper Linux install