6

I need to match an IPv6 address that may or may not have a mask. Unfortunately, I can't just use a library to parse the string.

The mask bit is easy enough, in this case:

(?:\/\d{1,3})?$/ 
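A side note on the mask fragment: \d{1,3} also accepts prefix lengths such as /999. Below is a minimal sketch of a stricter fragment, assuming the prefix must be a decimal number from 0 to 128 without leading zeros. The names $PREFIX_RE and prefix_ok are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical name: optional "/0" .. "/128", no leading zeros.
my $PREFIX_RE = qr{(?:/(?:12[0-8]|1[01][0-9]|[1-9]?[0-9]))?};

# True when $suffix is empty or a valid "/<prefix length>".
sub prefix_ok {
    my ($suffix) = @_;
    return $suffix =~ /\A$PREFIX_RE\z/ ? 1 : 0;
}

for my $suffix ('', '/0', '/64', '/128', '/129', '/999') {
    printf "%-6s %s\n", "'$suffix'", prefix_ok($suffix) ? 'accepted' : 'rejected';
}
```

The same alternation can be dropped into a longer address pattern in place of \d{1,3}.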

The hard part is the different formats of an IPv6 address. It needs to match ::beef, beef::, beef::beef, etc.

An update: I'm almost there.

/^(\:\:([a-f0-9]{1,4}\:){0,6}?[a-f0-9]{0,4}|[a-f0-9]{1,4}(\:[a-f0-9]{1,4}){0,6}?\:\:|[a-f0-9]{1,4}(\:[a-f0-9]{1,4}){1,6}?\:\:([a-f0-9]{1,4}\:){1,6}?[a-f0-9]{1,4})(\/\d{1,3})?$/i 

In this case, I am restricted to using Perl's regex engine.

9
  • 2
    The language I'm using is Perl. The changes needed to allow the use of other libraries would be more work. Define homework? I'm at home and I'm working. It's for one of the projects I'm working on; if you mean for educational purposes, then no. Commented Nov 26, 2009 at 13:14
  • 6
    "The changes needed to allow the use of other libraries would be more work." -- I doubt it. At least not in the long run. Generally speaking, Perl without using modules from CPAN is only half the language. Refusing to even use simple pure Perl modules and then asking others for solutions to previously solved problems seems... inefficient. Commented Nov 26, 2009 at 13:41
  • 3
    What changes? It's either use SomeModule, or you go to CPAN, download the module, and copy and paste it. Commented Nov 26, 2009 at 13:47
  • 5
    Indeed. It's just that some ways are a lot better than others. Commented Nov 26, 2009 at 15:11
  • 5
    And ideally those ways should not include asking others to redo work that has already been done. If you want to find another way of doing it, go nuts, but you're asking us to get involved now too. Commented Nov 27, 2009 at 0:41

9 Answers 9

13

This contains a patch to Regexp::Common demonstrating a complete, accurate, tested IPv6 regex. It's a straight translation of the IPv6 grammar. Regexp::IPv6 is also accurate.

More importantly, it contains a test suite. Running it with your regex shows you're still a ways off: it misses 10 of the 19 valid addresses and gives a false positive on 1 of the 12 invalid ones. IPv6 contains a lot of special shorthands, which makes it very easy to get subtly wrong.
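To give an idea of what such a check looks like, here is a minimal sketch of a scoring loop. The address lists are a tiny hand-picked sample made up for this example, not the test suite mentioned above, and score is a hypothetical helper name.

```perl
use strict;
use warnings;

# Regex from the question's update, copied unchanged.
my $question_re = qr/^(\:\:([a-f0-9]{1,4}\:){0,6}?[a-f0-9]{0,4}|[a-f0-9]{1,4}(\:[a-f0-9]{1,4}){0,6}?\:\:|[a-f0-9]{1,4}(\:[a-f0-9]{1,4}){1,6}?\:\:([a-f0-9]{1,4}\:){1,6}?[a-f0-9]{1,4})(\/\d{1,3})?$/i;

# Hypothetical mini-sample; NOT the test suite referred to above.
my @valid   = ('::beef', 'beef::', 'beef::beef', '1:2:3:4:5:6:7:8', '::1/64');
my @invalid = ('beef', ':::', '1:2:3:4:5:6:7::8:9:a:b:c:d');

# Returns the valid addresses the regex missed and the invalid ones it accepted.
sub score {
    my ($re, $valid, $invalid) = @_;
    my @missed          = grep { $_ !~ $re } @$valid;
    my @false_positives = grep { $_ =~ $re } @$invalid;
    return (\@missed, \@false_positives);
}

my ($missed, $false_positives) = score($question_re, \@valid, \@invalid);
print "missed: @$missed\n";
print "false positives: @$false_positives\n";
```

Even on this small sample the question's regex misses beef::beef and the full eight-group form, and it accepts an address with thirteen groups.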

The best place to read up on what goes into an IPv6 address is RFC 3986 section 3.2.2.

2 Comments

Anyone know if there's a Python version of this?
@jcollie Those regexes aren't using any funny Perl features. They should be a rote translation to Python.
10

What do you mean you can't just use a library? How about a module? Regexp::IPv6 will give you what you need.
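A minimal sketch of how that could look with an optional mask, assuming Regexp::IPv6 is installed from CPAN. $IPv6_re is the module's documented export; the function name is_ipv6_with_mask and the 0 to 128 limit on the prefix length are assumptions of this example.

```perl
use strict;
use warnings;

# Regexp::IPv6 is a CPAN module, so load it at run time and report a
# missing install instead of failing at compile time.
our $IPv6_re;
my $have_module = eval {
    require Regexp::IPv6;
    Regexp::IPv6->import('$IPv6_re');    # the module's documented export
    1;
};

# Returns 1 or 0, or undef when the module is not available.
# The 0..128 limit on the prefix length is an assumption of this sketch.
sub is_ipv6_with_mask {
    my ($text) = @_;
    return undef unless $have_module;
    return 0 unless $text =~ m{\A$IPv6_re(?:/(?<prefix>\d{1,3}))?\z};
    return (!defined($+{prefix}) || $+{prefix} <= 128) ? 1 : 0;
}

for my $candidate ('::beef', 'beef::', 'beef::beef/64', 'beef::beef/129') {
    my $result = is_ipv6_with_mask($candidate);
    printf "%-16s %s\n", $candidate,
        !defined $result ? 'Regexp::IPv6 not installed'
      : $result          ? 'valid'
      :                    'invalid';
}
```

If installing modules really is off the table, the module is pure Perl, so the pattern it builds could also be copied into your own code.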

5

I'm not an IPv6 expert, but please trust me when I tell you that matching (let alone validating) IPv6 addresses is not easy with a very simple regex such as the one you suggest. There are many shorthands and various conventions for combining the address with a port, just to name one example. One such shorthand is that you can write 0:0:0:0:0:0:0:1 as ::1, but there are more. If you read German, I would suggest looking at the slides of Steffen Ullrich's talk at the 11th German Perl Workshop.
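To illustrate just the :: shorthand, here is a minimal sketch that expands it back into eight groups. It assumes hex-only addresses (no embedded IPv4 tail, zone ID, port, or mask), and expand_ipv6 is a name made up for this example.

```perl
use strict;
use warnings;

# Expand a hex-only IPv6 address into its eight groups.
# Returns an empty list when the text does not fit that form.
sub expand_ipv6 {
    my ($text) = @_;
    my @halves = split /::/, $text, -1;
    return if @halves < 1 || @halves > 2;       # at most one "::"

    my ($head, $tail) = @halves;
    my @left  = length($head) ? split(/:/, $head, -1) : ();
    my @right = (defined $tail && length($tail)) ? split(/:/, $tail, -1) : ();

    for my $group (@left, @right) {
        return unless $group =~ /\A[0-9a-fA-F]{1,4}\z/;
    }

    if (defined $tail) {                        # compressed: "::" was present
        my $missing = 8 - (@left + @right);
        return if $missing < 1;                 # "::" must replace at least one group
        return (@left, ('0') x $missing, @right);
    }
    return @left == 8 ? @left : ();
}

for my $address ('::1', 'beef::', 'beef::beef', '0:0:0:0:0:0:0:1', '1::2::3') {
    my @groups = expand_ipv6($address);
    printf "%-16s %s\n", $address, @groups ? join(':', @groups) : 'not valid';
}
```

Splitting on :: first and counting the remaining groups avoids having to spell out every possible position of the shorthand in one pattern.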

You say you can't use a library, but if you're going to reinvent the whole complexity of the library, then you might as well just import it verbatim into your project.

2

This mostly works...

^([0-9a-fA-F]{0,4}|0)(\:([0-9a-fA-F]{0,4}|0)){7}$ 

Cons: cases with :: are not handled correctly.
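A quick way to see where it goes wrong is to run a few hand-picked strings through it. The sample strings and the helper name matches are made up for this sketch.

```perl
use strict;
use warnings;

# Pattern quoted from the answer above.
my $re = qr/^([0-9a-fA-F]{0,4}|0)(\:([0-9a-fA-F]{0,4}|0)){7}$/;

sub matches {
    my ($text) = @_;
    return $text =~ $re ? 1 : 0;
}

for my $candidate ('1:2:3:4:5:6:7:8', '1::3:4:5:6:7:8', '::1', 'beef::', ':::::::') {
    printf "%-18s %s\n", $candidate, matches($candidate) ? 'match' : 'no match';
}
```

Because the pattern requires exactly seven colons and allows empty groups, it rejects ::1 and beef:: but accepts a string made of nothing but seven colons.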

2

Try:

/^(((?=(?>.*?(::))(?!.+\3)))\3?|([\dA-F]{1,4}(\3|:(?!$)|$)|\2))(?4){5}((?4){2}|((2[0-4]|1\d|[1-9])?\d|25[0-5])(\.(?7)){3})\z/ai 

From: http://home.deds.nl/~aeron/regex/

1

Try this:

^([0-9a-fA-F]{4}|0)(\:([0-9a-fA-F]{4}|0)){7}$ 

From Regular Expression Library: IPv6 address

You should also read this: A Regular Expression for IPv6 Addresses

1 Comment

This fails to match 2001:db8:85a3:0:0:8a2e:370:7334, 2001:db8:85a3::8a2e:370:7334, 2001:0db8:0000:0000:0000::1428:57ab, ::ffff:c000:280, and many more.
1

If you need to check in Perl whether a string is an IPv6 address, you can try this:

if (/(([\da-f]{0,4}:{0,2}){1,8})/i) { print("$1") }; 
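One thing to be aware of: the pattern is unanchored and every part of it is optional, so it succeeds on any string and captures whatever it finds at the very start. A small sketch to show this (loose_capture is a made-up helper name):

```perl
use strict;
use warnings;

# Pattern quoted from the answer above.
my $loose = qr/(([\da-f]{0,4}:{0,2}){1,8})/i;

# Returns what the pattern captures, or undef if it does not match at all.
sub loose_capture {
    my ($text) = @_;
    return $text =~ $loose ? $1 : undef;
}

for my $text ('fe80::1', 'hello', 'listening on fe80::1') {
    my $captured = loose_capture($text);
    printf "%-22s captured '%s'\n", $text, defined $captured ? $captured : '(no match)';
}
```

It is fine for pulling a candidate out of a string that is already known to start with an address, but it cannot be used to decide whether a string is an address.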

1

Here is one that worked for all the IPv6 examples I've managed to find:

/^\s*((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?\s*$/ 

Make sure it is on a single line before using it. It was found here:

https://community.helpsystems.com/forums/intermapper/miscellaneous-topics/5acc4fcf-fa83-e511-80cf-0050568460e4

It was verified on all examples from the question page, the community page, and the Wikipedia article here:

https://en.wikipedia.org/wiki/IPv6

The tool used for the verification was this one:

https://regex101.com/

1 Comment

Worked for me on a log file. It even matched the following lines: command channel listening on ::1#xxx // DNS Socket created at [::], FD 3 // Accepting NAT intercepted HTTP Socket connections at local=[::]:xxxx remote=[::] FD 4 flags=33 // Accepting HTTP Socket connections at local=[::]:xxxx remote=[::] FD 5 flags=8 // listening on ::1 // listening on xxxx::xxxx:xxxx:xxxx:xxxx%eth0 // listening on xxxx::xxxx:xxxx:xxxx:xxxx%eth1 // session opened for user admin@::ffff:xxx.xxx.xxx.xxx
0

This is a comprehensive IPv6 regular expression that tests all the valid IPv6 text notations (expanded, compressed, expanded-mixed, compressed-mixed) with an optional prefix length. It also captures the various parts into capture groups. You can skip a capture group by putting ?: right after its opening paren. The pattern is written in free-spacing style with inline comments, so it needs the x flag, and because the character classes list only A-F it also needs the i flag to accept lowercase hex digits.

This is the regular expression I created and use in my IPvX IP calculator for both IPv4 and IPv6.

^                                                        # Anchor
(?:                                                      # BEGIN Address Alternatives
  (                                                      # BEGIN Compressed-mixed *** Group 1 ***
    (                                                    # BEGIN Hexadecimal Notation *** Group 2 ***
      (?:
          (?:[0-9A-F]{1,4}:){5}[0-9A-F]{1,4}             # No ::
        | (?:[0-9A-F]{1,4}:){4}:[0-9A-F]{1,4}            # 4::1
        | (?:[0-9A-F]{1,4}:){3}(?::[0-9A-F]{1,4}){1,2}   # 3::2
        | (?:[0-9A-F]{1,4}:){2}(?::[0-9A-F]{1,4}){1,3}   # 2::3
        | [0-9A-F]{1,4}:(?::[0-9A-F]{1,4}){1,4}          # 1::4
        | (?:[0-9A-F]{1,4}:){1,5}                        # :: End
        | :(?::[0-9A-F]{1,4}){1,5}                       # :: Start
        | :                                              # :: Only
      ):
    )                                                    # END Hexadecimal Notation
    (                                                    # BEGIN Dotted-decimal Notation *** Group 3 ***
      (25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.    # 0 to 255. *** Group 4 ***
      (25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.    # 0 to 255. *** Group 5 ***
      (25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.    # 0 to 255. *** Group 6 ***
      (25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])      # 0 to 255  *** Group 7 ***
    )                                                    # END Dotted-decimal Notation
  )                                                      # END Compressed-mixed
  |
  (                                                      # BEGIN Compressed *** Group 8 ***
    (?:                                                  # BEGIN Hexadecimal Notation
        (?:[0-9A-F]{1,4}:){7}[0-9A-F]{1,4}               # No ::
      | (?:[0-9A-F]{1,4}:){6}:[0-9A-F]{1,4}              # 6::1
      | (?:[0-9A-F]{1,4}:){5}(?::[0-9A-F]{1,4}){1,2}     # 5::2
      | (?:[0-9A-F]{1,4}:){4}(?::[0-9A-F]{1,4}){1,3}     # 4::3
      | (?:[0-9A-F]{1,4}:){3}(?::[0-9A-F]{1,4}){1,4}     # 3::4
      | (?:[0-9A-F]{1,4}:){2}(?::[0-9A-F]{1,4}){1,5}     # 2::5
      | [0-9A-F]{1,4}:(?::[0-9A-F]{1,4}){1,6}            # 1::6
      | (?:[0-9A-F]{1,4}:){1,7}:                         # :: End
      | :(?::[0-9A-F]{1,4}){1,7}                         # :: Start
      | ::                                               # :: Only
    )                                                    # END Hexadecimal Notation
  )                                                      # END Compressed
)                                                        # END Address Alternatives
(?:                                                      # BEGIN Optional Length
  /(12[0-8]|1[0-1][0-9]|[1-9]?[0-9])                     # /0 to /128 *** Group 9 ***
)?                                                       # END Optional Length
$                                                        # Anchor

Bonus IPv4 regular expression:

^                                                        # Anchor
(?:                                                      # BEGIN Dotted-decimal Notation
  (25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.        # 0 to 255. *** Group 1 ***
  (25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.        # 0 to 255. *** Group 2 ***
  (25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.        # 0 to 255. *** Group 3 ***
  (25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])          # 0 to 255  *** Group 4 ***
)                                                        # END Dotted-decimal Notation
(?:                                                      # BEGIN Optional Length
  /(3[0-2]|[1-2]?[0-9])                                  # /0 to /32 *** Group 5 ***
)?                                                       # END Optional Length
$                                                        # Anchor
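For reference, a minimal sketch of how a commented pattern like this can be used from Perl. It embeds the IPv4 expression with the x flag so the whitespace and comments are ignored, and reads the five capture groups. The names $IPV4_RE and parse_ipv4 are made up for this example, and the sketch uses \A and \z instead of ^ and $ so that a trailing newline is not accepted.

```perl
use strict;
use warnings;

# The IPv4 expression from above, compiled in free-spacing (x) mode.
my $IPV4_RE = qr{
    \A
    (?:
        (25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.   # octet 1 (group 1)
        (25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.   # octet 2 (group 2)
        (25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.   # octet 3 (group 3)
        (25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])     # octet 4 (group 4)
    )
    (?: / (3[0-2]|[1-2]?[0-9]) )?                         # optional /0 to /32 (group 5)
    \z
}x;

# Returns a hash ref with the octets and the prefix (undef if absent),
# or undef when the text does not match.
sub parse_ipv4 {
    my ($text) = @_;
    my @parts = $text =~ $IPV4_RE or return;
    return { octets => [ @parts[0 .. 3] ], prefix => $parts[4] };
}

for my $candidate ('192.168.0.1/24', '10.0.0.1', '256.1.1.1', '10.0.0.1/33') {
    my $parsed = parse_ipv4($candidate);
    if ($parsed) {
        my $prefix = defined $parsed->{prefix} ? $parsed->{prefix} : 'none';
        printf "%-16s octets %s, prefix %s\n", $candidate, join('.', @{ $parsed->{octets} }), $prefix;
    }
    else {
        printf "%-16s no match\n", $candidate;
    }
}
```

The IPv6 expression can be wrapped the same way; it additionally needs the i flag if lowercase hex digits should match.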
