I have a file mac-hosts containing MAC addresses and their associated host names:

    e4:5f:01:21:79:01 PF3
    e4:5f:01:21:79:03 PF3-BR0
    e4:5f:01:21:79:be PF2
    e4:5f:01:21:79:c0 PF2-BR0

I want to get a count of the number of lines with properly formatted MAC addresses and host names, and I'm using this expression:

    FILTERED=$(cat mac-hosts | grep -P -c '/^[a-f0-9]{2}([:-])([a-f0-9]{2}\1){4}[a-f0-9]{2} [a-z0-9]*([-][a-z0-9]*)?$/i')

In every version of this expression, I get FILTERED = 0 as a result.
I verified on https://regex101.com/ that every line of the mac-hosts file properly matches the filter expression without errors or warnings in every flavor offered except GoLang and Rust where the back reference has no meaning. I have also studied the man page for grep and cannot find a reason why my filter does not work.
Without the -P I get grep: Invalid back reference so I know Perl compatible regular expression syntax is being used.
I first found this failure occurring on a Raspberry Pi 4B running the latest version of their Linux flavor.

    pi@PF2:~ $ uname -a
    Linux PF2 6.1.21-v8+ #1642 SMP PREEMPT Mon Apr 3 17:24:16 BST 2023 aarch64 GNU/Linux
    pi@PF2:~ $ grep -V
    grep (GNU grep) 3.6
    Copyright (C) 2020 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.
    Written by Mike Haertel and others; see <https://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.

I have since observed the same behavior with git-bash running under Windows 10.
How can I debug this problem and get the expected result where FILTERED = 4 is the outcome?
UPDATE
Thanks for the responses, it was obvious when I saw the answer: I had been thinking of environments where delimiting slashes are required, not part of the match string, and i is the "ignore case" flag. For command line grep, no delimiters are used, and "ignore case" is set by the -i switch:

    FILTERED=$(grep -Pic '^[a-f0-9]{2}([:-])([a-f0-9]{2}\1){4}[a-f0-9]{2} [a-z0-9]*([-][a-z0-9]*)?$' mac-hosts)

UPDATE 2
I still had problems with the hostnames that did not have a second part (a hyphen followed by more alphanumerics). It turned out that there is whitespace at the end of those names, which (not surprisingly) I didn't see on the screen. I added another component to the match string to allow any trailing whitespace. The final test now works correctly:

    FILTERED=$(grep -Pic '^[a-f0-9]{2}([:-])([a-f0-9]{2}\1){4}[a-f0-9]{2} [a-z0-9]*([-][a-z0-9]*)?[[:space:]]*$' mac-hosts)

There was a suggested edit, which I rolled back, in which the author removed the test for the line ending. Without that test, the filter would accept invalid lines, for example a line with punctuation after the hostname, which is not allowed in this format.
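For anyone who hits the same invisible-whitespace problem, here is a minimal sketch of how I would check for it. The sample lines, the temporary file, and the line ending in `!` are made up for illustration, and it assumes a GNU grep built with `-P` support. `sed -n l` prints each line with a `$` marking where it really ends, so a trailing space shows up as ` $`.

```shell
#!/bin/sh
# Hypothetical sample data: the first and third lines end in a trailing space,
# and the last line ends in punctuation that the format does not allow.
sample=$(mktemp)
printf '%s\n' \
  'e4:5f:01:21:79:01 PF3 ' \
  'e4:5f:01:21:79:03 PF3-BR0' \
  'e4:5f:01:21:79:be PF2 ' \
  'e4:5f:01:21:79:c0 PF2-BR0' \
  'e4:5f:01:21:79:c2 PF2!' > "$sample"

# Make line endings visible: sed's "l" command appends "$" at each end of line.
sed -n 'l' "$sample"

# Count the lines that end in a whitespace character.
trailing=$(grep -c '[[:space:]]$' "$sample")

# Count valid lines: optional trailing whitespace is accepted, but the "$"
# anchor still rejects anything else after the hostname.
valid=$(grep -Pic '^[a-f0-9]{2}([:-])([a-f0-9]{2}\1){4}[a-f0-9]{2} [a-z0-9]*(-[a-z0-9]*)?[[:space:]]*$' "$sample")

echo "trailing=$trailing valid=$valid"
rm -f "$sample"
```

With this sample, two lines are reported as ending in whitespace and four lines count as valid; the line ending in `!` is rejected only because the end-of-line anchor is kept.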
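
    cat mac-hosts | grep -P -c '^([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2}) [0-9A-Za-z].*$'

grep takes the pattern without delimiters or trailing flags. Your original pattern is taken literally, so it requires the text to begin with /^ and end with $/i, which it obviously does not.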
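To see the effect of the delimiters in isolation, the following sketch runs the same expression with and without them. The temporary file and the variable names (`sample`, `re`, `with_delims`, `without_delims`) are my own, and it assumes a GNU grep built with `-P` support.

```shell
#!/bin/sh
# Hypothetical sample data mirroring the four lines in the question.
sample=$(mktemp)
printf '%s\n' \
  'e4:5f:01:21:79:01 PF3' \
  'e4:5f:01:21:79:03 PF3-BR0' \
  'e4:5f:01:21:79:be PF2' \
  'e4:5f:01:21:79:c0 PF2-BR0' > "$sample"

# Core of the expression, without anchors, delimiters, or flags.
re='[a-f0-9]{2}([:-])([a-f0-9]{2}\1){4}[a-f0-9]{2} [a-z0-9]*(-[a-z0-9]*)?'

# The slashes and the trailing "i" are passed to grep as literal pattern text,
# so nothing matches. grep -c still prints 0 but exits non-zero, hence "|| true".
with_delims=$(grep -P -c "/^${re}\$/i" "$sample" || true)

# Same expression without delimiters; the -i switch makes it case-insensitive.
without_delims=$(grep -P -i -c "^${re}\$" "$sample")

echo "with_delims=$with_delims without_delims=$without_delims"
rm -f "$sample"
```

The first count comes out as 0 and the second as 4, which matches the behavior described in the question and its first update.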