2

I'm looking for a regular expression for grep that extracts IPv4 and IPv6 addresses from an arbitrary file containing them. I'd like it to behave like this one for IPv4 addresses:

grep -E -o "(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)" 

I'm aware there are several similar questions with answers here, but most focus only on IPv4 addresses, and the best answer I've found does not work for me: the expression outputs no IP addresses at all when I use it with grep.

To clarify, since this question is apparently ambiguous: I'm looking for a single combined regex that outputs any valid IP address, IPv4 or IPv6. As a bonus, it should output multiple addresses found on a single line.

If for some reason this is not easy to do with grep, I'm open to alternatives, provided they are simple, work on a BSD system and do not require GNU tools.

10
  • This isn't grep, but may be helpful. codereview.stackexchange.com/questions/192726/… Commented Feb 8, 2020 at 23:43
  • For IPv4 addresses: How to check if any IP address is present in a file using shell scripting? Commented Feb 8, 2020 at 23:44
  • The whole point of my question is a quest for a combined expression. What surprises me most is that there's no simple tool out there to achieve this increasingly common task. Commented Feb 9, 2020 at 12:02
  • 1
    @herrbischoff, if you're looking for a combined expression, you should (have) mention(ed) that in your post. Also, you don't say why the solution you linked to "does not work for you". Also, there is some leeway in what is considered a valid IP address, both for IPv4 and for IPv6 (e.g. is 8.8.2056 a valid IPv4 address? Are leading zeroes allowed or should they be normalized away? Must or must not :: be used in IPv6?). If you want a validating expression, you need to specify what counts as valid. Commented Feb 9, 2020 at 12:22
  • 1
    This is a common task but not a simple one. Requiring that it be done with a regular expression makes it quite complex. Programs generally call inet_pton() or equivalent. Commented Feb 9, 2020 at 12:47

4 Answers

3

An alternative, non-grep, Perl-based approach using the Regexp::Common package (available as a FreeBSD port under the name p5-Regexp-Common):

perl -MRegexp::Common=net -nE 'say $& while /$RE{net}{IPv4}|$RE{net}{IPv6}/g' input.txt 

Example:

$ cat input.txt
some words
a line with 127.0.0.1 and 192.168.1.1 in it.
more words
some line with ::1 in it.
$ perl -MRegexp::Common=net -nE 'say $& while /$RE{net}{IPv4}|$RE{net}{IPv6}/g' input.txt
127.0.0.1
192.168.1.1
::1
9
  • The RE for matching an IPv6 address is 3129 characters long, btw. Commented Feb 9, 2020 at 11:48
  • "3129 characters..." -- which is kinda what I meant with that comment about an unwieldy regex, though this one seems to have a bunch of unnecessary groups, and could be shortened to just 2200 or so characters ;) Commented Feb 9, 2020 at 11:58
  • @ilkkachu It's not unwieldy when you don't have to include it in a script yourself thanks to someone packaging it up for easy use. Commented Feb 9, 2020 at 12:00
  • This only outputs the first matching address. I am looking to output all matching addresses, as stated in the question. Commented Feb 9, 2020 at 12:07
  • 1
    @herrbischoff See update for one that works with multiple addresses on one line. Commented Feb 9, 2020 at 12:11
2

Since your Operating System (FreeBSD) comes with a compiler and a lexer by default (just like any Unix system should), better use them to write a little program, rather than some ass-fugly regexes that nobody will ever be able to understand.

$ cat > ipv46.l <<'EOT'
%{
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
%}
W       [0-9A-Za-z_]+
I4      ([0-9]+[.]){3}[0-9]+
I6      ([0-9a-fA-F]|::)[0-9a-fA-F:]*{I4}?
%%
{I6}|{I4}       {
        struct in6_addr a6; struct in_addr a; char b[INET6_ADDRSTRLEN];
        if(inet_pton(AF_INET6, yytext, &a6))
                printf("%s\n", inet_ntop(AF_INET6, &a6, b, sizeof b));
        else if(inet_pton(AF_INET, yytext, &a))
                printf("%s\n", inet_ntop(AF_INET, &a, b, sizeof b));
        }
{W}|.|\n        ;
EOT
$ lex ipv46.l && cc lex.yy.c -o ipv46 -ll
$ ./ipv46 <file
$ ./ipv46
::0:0:1 1:::1 ::
::1
::
::FFFF:127.0.0.1:80
::ffff:127.0.0.1
...

This is rather strict; it will not pull the address 127.0.0.1 from foo127.0.0.1.12 or foo:127.0.0.1bar. But it will be able to pull it from tcpdump's address.port form or from the usual ipv4:port, and it will be able to handle "mixed" ipv4/ipv6 addresses.

4
  • Thanks for this solution. It works quite well, although it is very strict, as you say. I just tested it on a tcpdump excerpt and it couldn't make out any IP address in a line containing block in on vtnet0: 213.109.234.47.19888 > 213.109.163.124.23. Is there possibly a modification that is able to read that format? Commented Feb 22, 2020 at 11:45
  • I've updated to a version which should handle the host.port form (which isn't actually a valid IPv4 or IPv6 address). But this is less robust than simply tokenizing words of [a-zA-Z0-9_:.]+ and applying inet_pton to them. Commented Feb 23, 2020 at 11:20
  • Anyway, this was one of the xyest xy-questions ;-) You should've started by saying that this is tcpdump output, not an arbitrary file. It's unlikely that those addresses need any validation; something simple like awk '/ > /{print$10; print $12}' file | sed 's/\.[^.]*$//' could've done. Commented Feb 23, 2020 at 11:36
  • I’ve not led with that because the tcpdump file is just an extreme example. I was looking for a general solution, not super-specific to one use case. If it were just those logs, writing even a regex would be quite straight-forward. Commented Feb 23, 2020 at 11:39
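The tokenize-then-validate idea from the comments above can be sketched in a few lines. This is only an illustration, assuming Python 3 is installed (it is not part of the FreeBSD base system); the names `TOKEN`, `is_ip` and `extract_ips` are made up for this example, and `socket.inet_pton` is Python's wrapper around the C function of the same name.

```python
import re
import socket

# Candidate tokens: runs of the characters suggested in the comment above.
TOKEN = re.compile(r"[A-Za-z0-9_:.]+")

def is_ip(text):
    """Return True if inet_pton() accepts text as an IPv4 or IPv6 address."""
    for family in (socket.AF_INET, socket.AF_INET6):
        try:
            socket.inet_pton(family, text)
            return True
        except OSError:
            # Not a valid address for this family; try the next one.
            continue
    return False

def extract_ips(line):
    """Hypothetical helper: return every token on the line that parses as an address."""
    found = []
    for token in TOKEN.findall(line):
        # Drop leading/trailing full stops left over from sentence punctuation.
        candidate = token.strip(".")
        if is_ip(candidate):
            found.append(candidate)
    return found
```

Calling `extract_ips` on each line of a file would print nothing for tcpdump's host.port form such as 213.109.234.47.19888, because that token as a whole is not a valid address; handling it would need an extra step that strips the port first.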
0

This should extract the IPv4 and IPv6 addresses:

grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}|([0-9a-fA-F]{0,4}:){1,7}[0-9a-fA-F]{0,4}' 

However, it does not check that the IPv6 addresses are truly valid, as they could contain more than one ::.

1
  • This regex also accepts invalid IPv4 addresses, such as those where one or more of the octets are greater than 255. Also, although it is very unusual to write an address this way, try ping 2130706433 (the decimal form of 127.0.0.1). Commented Jul 18, 2023 at 14:16
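To make the difference concrete, here is a small sketch comparing the loose IPv4 part of the pattern above with the range-checked octet from the question. It uses Python's `re` module rather than grep purely for illustration; `LOOSE`, `OCTET`, `STRICT` and `accepts` are names invented for this example.

```python
import re

# Loose IPv4 pattern from the answer above: any 1-3 digits per octet.
LOOSE = re.compile(r"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}")

# Range-checked octet taken from the question: only 0-255.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
STRICT = re.compile(r"(?:" + OCTET + r"\.){3}" + OCTET)

def accepts(pattern, text):
    """Return True if the whole string matches the pattern."""
    return pattern.fullmatch(text) is not None
```

With these definitions, `accepts(LOOSE, "999.1.1.1")` is true while `accepts(STRICT, "999.1.1.1")` is false. Note that this compares whole strings; a search that is not anchored, which is how grep -o works, can still find 99.1.1.1 inside 999.1.1.1 even with the strict octet, so boundary handling is a separate concern.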
-2

This should do the error checking for IPv4 and is more compact:

grep -Eo '([0-255]\.){3}[0-255]|([0-9a-fA-F]{0,4}:){1,7}[0-9a-fA-F]{0,4}' 
1
  • 3
    As far as I know, [0-255] will not accept any number between 0 and 255, but will accept digits 0, 1, 2 and 5. Commented Jul 19, 2023 at 19:34
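The point in the comment above is easy to check. The sketch below uses Python's `re` module, where a bracket expression like this one is read the same way as in grep -E: `0-2` is a range and the two `5`s are literal characters. The names `BROKEN_OCTET`, `BROKEN_IPV4` and `matching_digits` are invented for this example.

```python
import re

# [0-255] is a character class: the range 0-2 plus the literal digit 5.
BROKEN_OCTET = re.compile(r"[0-255]")

# IPv4 part of the pattern from the answer above.
BROKEN_IPV4 = re.compile(r"([0-255]\.){3}[0-255]")

def matching_digits():
    """Return the single digits that the character class accepts."""
    return [d for d in "0123456789" if BROKEN_OCTET.fullmatch(d)]
```

Here `matching_digits()` returns `['0', '1', '2', '5']`, and `BROKEN_IPV4` finds nothing in an ordinary address such as 192.168.1.1 because each octet can only be one of those four digits.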
