11
\$\begingroup\$

Write a function or program to validate an e-mail address against RFC 5321 (some of the grammar rules are found in RFC 5322), with the relaxation that you can ignore comments and folding whitespace (CFWS) and generalised address literals. This gives the following grammar:

    Mailbox        = Local-part "@" ( Domain / address-literal )

    Local-part     = Dot-string / Quoted-string

    Dot-string     = Atom *("." Atom)

    Atom           = 1*atext

    atext          = ALPHA / DIGIT / ; Printable US-ASCII
                     "!" / "#" /     ;  characters not including
                     "$" / "%" /     ;  specials.  Used for atoms.
                     "&" / "'" /
                     "*" / "+" /
                     "-" / "/" /
                     "=" / "?" /
                     "^" / "_" /
                     "`" / "{" /
                     "|" / "}" /
                     "~"

    Quoted-string  = DQUOTE *QcontentSMTP DQUOTE

    QcontentSMTP   = qtextSMTP / quoted-pairSMTP

    qtextSMTP      = %d32-33 / %d35-91 / %d93-126

    quoted-pairSMTP = %d92 %d32-126

    Domain         = sub-domain *("." sub-domain)

    sub-domain     = Let-dig [Ldh-str]

    Let-dig        = ALPHA / DIGIT

    Ldh-str        = *( ALPHA / DIGIT / "-" ) Let-dig

    address-literal = "[" ( IPv4-address-literal / IPv6-address-literal ) "]"

    IPv4-address-literal = Snum 3("." Snum)

    IPv6-address-literal = "IPv6:" IPv6-addr

    Snum           = 1*3DIGIT
                     ; representing a decimal integer value in the range 0 through 255
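As an illustration only, the Local-part rules above can be transcribed almost mechanically into a Python regular expression. This is an ungolfed sketch, not a reference solution; the names `ATEXT`, `DOT_STRING`, `QUOTED_STRING` and `is_local_part` are made up for this example.

```python
import re

# atext: ALPHA / DIGIT plus the printable specials listed in the grammar.
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"

# Dot-string = Atom *("." Atom), with Atom = 1*atext
DOT_STRING = rf"{ATEXT}+(?:\.{ATEXT}+)*"

# qtextSMTP       = %d32-33 / %d35-91 / %d93-126
# quoted-pairSMTP = %d92 %d32-126
QUOTED_STRING = r'"(?:[\x20-\x21\x23-\x5b\x5d-\x7e]|\\[\x20-\x7e])*"'

LOCAL_PART = re.compile(rf"(?:{DOT_STRING}|{QUOTED_STRING})")


def is_local_part(text: str) -> bool:
    """Return True if text is a Dot-string or a Quoted-string."""
    # fullmatch avoids the trailing-newline leniency of "$".
    return LOCAL_PART.fullmatch(text) is not None
```

Each constant mirrors one grammar rule, so a mismatch against the grammar should be easy to spot by reading the two side by side.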

Note: I've skipped the definition of IPv6-addr because this particular RFC gets it wrong and disallows e.g. ::1. The correct spec is in RFC 2373.

Restrictions

You may not use any existing e-mail validation library calls. However, you may use existing network libraries to check IP addresses.
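For example, in Python the standard `ipaddress` module could be used for the address-literal part. The following is a rough sketch under that assumption; `is_address_literal` is a hypothetical helper name, and the module's parsing rules are not guaranteed to coincide exactly with the grammar (for instance, recent Python versions reject leading zeros in IPv4 octets, which `Snum` would allow).

```python
import ipaddress


def is_address_literal(text: str) -> bool:
    """Check "[" ( IPv4-address-literal / "IPv6:" IPv6-addr ) "]"."""
    if not (text.startswith("[") and text.endswith("]")):
        return False
    body = text[1:-1]
    # Newer versions of ipaddress accept scoped IPv6 addresses such as
    # "::1%eth0"; the grammar has no such form, so reject "%" outright.
    if "%" in body:
        return False
    try:
        # ABNF string literals are case-insensitive, so "ipv6:" is allowed.
        if body[:5].lower() == "ipv6:":
            ipaddress.IPv6Address(body[5:])
        else:
            ipaddress.IPv4Address(body)
    except ValueError:
        # AddressValueError is a subclass of ValueError.
        return False
    return True
```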

If you write a function/method/operator/equivalent it should take a string and return a boolean or truthy/falsy value, as appropriate for your language. If you write a program it should take a single line from stdin and indicate valid or invalid via the exit code.
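To make the program variant concrete, here is a minimal Python sketch of the stdin/exit-code plumbing. It assumes the usual Unix convention that exit status 0 means "valid" and 1 means "invalid" (the question does not fix the values), and `exit_status` is a hypothetical helper that takes the validator as a parameter.

```python
def exit_status(line: str, validate) -> int:
    """Map a validator's truthy/falsy result to a process exit status.

    Assumption: 0 signals a valid address and 1 an invalid one.
    """
    # Strip only the line terminator; other whitespace is part of the input.
    address = line.rstrip("\r\n")
    return 0 if validate(address) else 1
```

A program would then end with something like `sys.exit(exit_status(sys.stdin.readline(), validate))`, where `validate` is whatever function implements the grammar.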

Test cases

The following test cases are listed in blocks for compactness. The first block contains cases that should pass:

[email protected] [email protected] [email protected] [email protected] [email protected] [email protected] email@[123.123.123.123] "email"@domain.com [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] ""@domain.com "e"@domain.com "\@"@domain.com email@domain "Abc\@def"@example.com "Fred Bloggs"@example.com "Joe\\Blow"@example.com "Abc@def"@example.com customer/[email protected] [email protected] !def!xyz%[email protected] [email protected] _somename@[IPv6:::1] [email protected] [email protected] [email protected] 

The following test cases should not pass:

plainaddress #@%^%#$@#$@#.com @domain.com Joe Smith <[email protected]> email.domain.com email@[email protected] [email protected] [email protected] [email protected] [email protected] [email protected] (Joe Smith) [email protected] [email protected] email@[IPv6:127.0.0.1] email@[127.0.0] email@[.127.0.0.1] email@[127.0.0.1.] email@IPv6:::1] [email protected]] email@[256.123.123.123] 
\$\endgroup\$
  • since IPv6-addr has been left undefined, and there are test cases that have ipv6 addresses, is there a correct way to validate them? Commented Feb 23, 2013 at 4:28
  • Why should [email protected] and [email protected] fail? Commented Feb 23, 2013 at 8:04
  • @ardnew, I've added a link to the relevant RFC. I don't want to inline it because the question is already quite long. Commented Feb 23, 2013 at 9:16
  • @grc, good question. I've checked them, because no-one raised this during the several months that the question was in the sandbox, but I can't see why they should fail so I've moved them to the "Pass" side. Commented Feb 23, 2013 at 9:19
  • Are length limits required as well? 254 for entire email address/64 for local-part/63 for each domain label? Commented Mar 2, 2013 at 22:30
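
For reference, the limits mentioned in the last comment (254 characters overall, 64 for the local-part, 63 per domain label) can be checked separately from the grammar. A rough Python sketch, assuming those three numbers are the ones wanted; `within_length_limits` is a hypothetical helper name, and it only looks at lengths, not at syntax.

```python
def within_length_limits(address: str) -> bool:
    """Apply the 254 / 64 / 63 limits quoted in the comment above."""
    if len(address) > 254:
        return False
    # The domain cannot contain "@", so the last "@" is the separator even
    # when a quoted local-part contains one.
    local, at, domain = address.rpartition("@")
    if not at or len(local) > 64:
        # No "@" at all is treated as a failure here.
        return False
    if domain.startswith("["):
        # Address literals have no labels to measure.
        return True
    return all(len(label) <= 63 for label in domain.split("."))
```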

2 Answers

3
\$\begingroup\$

Python 3.3, 261

    import re,ipaddress
    try:v,p=re.match(r'^(?!\.)(((^|\.)[\w!#-\'*+\-/=?^-~]+)+|"([ !#-[\]-~]|\\[ -~])*")@(((?!-)[a-zA-Z\d-]+(?<!-)($|\.))+|\[(IPv6:)?(.*)\])(?<!\.)$',input()).groups()[7:];exec("if p:ipaddress.IPv%dAddress(p)"%(v and 6or 4))
    except:v=5
    print(v!=5)

Python 3.3 or later is needed because the ipaddress module, which is used to validate IPv4 and IPv6 addresses, was added in that version.

Less golfed version:

    import re, ipaddress

    dot_string = r'(?!\.)((^|\.)[\w!#-\'*+\-/=?^-~]+)+'
    # negative lookahead to check that string doesn't start with .
    # each atom must start with a . or the beginning of the string

    quoted_string = r'"([ !#-[\]-~]|\\[ -~])*"'
    # - is used for character ranges (also in dot_string)

    domain = r'((?!-)[a-zA-Z\d-]+(?<!-)($|\.))+(?<!\.)'
    # negative lookahead/lookbehind to check each subdomain doesn't start/end with -
    # each domain must end with a . or the end of the string
    # negative lookbehind to check that string doesn't end with .

    address_literal = r'\[(IPv6:)?(.*)\]'
    # captures the is_IPv6 and ip_address groups

    final_regex = r'^(%s|%s)@(%s|%s)$' % (dot_string, quoted_string, domain, address_literal)

    try:
        is_IPv6, ip_address = re.match(final_regex, input(), re.VERBOSE).groups()[7:]
        # if input doesn't match, calling .groups() will throw an exception
        if ip_address:
            exec("ipaddress.IPv%dAddress(ip_address)" % (6 if is_IPv6 else 4))
            # IPv4Address or IPv6Address will throw an exception if ip_address isn't valid
    except:
        is_IPv6 = 5

    print(is_IPv6 != 5)
    # is_IPv6 is used as a flag to tell whether an exception was thrown
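
The domain rule that the comments above describe with lookarounds can also be written directly from the grammar (`sub-domain = Let-dig [Ldh-str]`), without any lookahead or lookbehind. This is an alternative sketch for comparison, not the code used in the answer; `SUB_DOMAIN`, `DOMAIN` and `is_domain` are names invented for this example.

```python
import re

# sub-domain = Let-dig [Ldh-str]
# Ldh-str    = *( ALPHA / DIGIT / "-" ) Let-dig
# A label therefore starts and ends with a letter or digit, so hyphens can
# only appear in the middle.
SUB_DOMAIN = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"

# Domain = sub-domain *("." sub-domain)
DOMAIN = re.compile(rf"{SUB_DOMAIN}(?:\.{SUB_DOMAIN})*")


def is_domain(text: str) -> bool:
    """Return True if text matches the Domain rule (no length limits)."""
    return DOMAIN.fullmatch(text) is not None
```

Because every label must begin and end with `Let-dig`, leading dots, trailing dots and empty labels are rejected without a separate check.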
\$\endgroup\$
  • very nice. i can't immediately find any duplicate patterns (to replace with a shorter variable identifier). but it looks like ALPHA in augmented BNF and the char literals constructing a Quoted-string are all case-insensitive. can you shave a few chars by specifying case-insensitivity and ditching one of those char class ranges? btw, if you're feeling frisky, can you give a short description of how you developed this? Commented Feb 24, 2013 at 6:55
  • @ardnew: Thanks. I've added a less golfed version with a few comments trying to explain some of the trickier parts. I developed the regex in four individual pieces (dot-string, quoted-string, domain and address-literal), then merged them together and added the ip validation. Needless to say, golfing it got really messy. Commented Feb 24, 2013 at 13:06
  • No length limits? Commented Mar 2, 2013 at 22:37
2
\$\begingroup\$

PHP 5.4.9, 495

function _($e){return preg_match('/^(?!(?>"?(?>\\\[ -~]|[^"])"?){255,})(?!"?(?>\\\[ -~]|[^"]){65,}"?@)(?>([!#-\'*+\/-9=?^-~-]+)(?>\.(?1))*|"(?>[ !#-\[\]-~]|\\\[ -~])*")@(?!.*[^.]{64,})(?>([a-z0-9](?>[a-z0-9-]*[a-z0-9])?)(?>\.(?2)){0,126}|\[(?:(?>IPv6:(?>([a-f0-9]{1,4})(?>:(?3)){7}|(?!(?:.*[a-f0-9][:\]]){8,})((?3)(?>:(?3)){0,6})?::(?4)?))|(?>(?>IPv6:(?>(?3)(?>:(?3)){5}:|(?!(?:.*[a-f0-9]:){6,})(?5)?::(?>((?3)(?>:(?3)){0,4}):)?))?(25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)(?>\.(?6)){3}))\])$/iD', $e);} 

And just for further interest, here's one for RFC 5322 grammar which allows for nested CFWS and obsolete local-parts:

(764)

function _($e){return preg_match('/^(?!(?>(?1)"?(?>\\\[ -~]|[^"])"?(?1)){255,})(?!(?>(?1)"?(?>\\\[ -~]|[^"])"?(?1)){65,}@)((?>(?>(?>((?>(?>(?>\x0D\x0A)?[\t ])+|(?>[\t ]*\x0D\x0A)?[\t ]+)?)(\((?>(?2)(?>[\x01-\x08\x0B\x0C\x0E-\'*-\[\]-\x7F]|\\\[\x00-\x7F]|(?3)))*(?2)\)))+(?2))|(?2))?)([!#-\'*+\/-9=?^-~-]+|"(?>(?2)(?>[\x01-\x08\x0B\x0C\x0E-!#-\[\]-\x7F]|\\\[\x00-\x7F]))*(?2)")(?>(?1)\.(?1)(?4))*(?1)@(?!(?1)[a-z\d-]{64,})(?1)(?>([a-z\d](?>[a-z\d-]*[a-z\d])?)(?>(?1)\.(?!(?1)[a-z\d-]{64,})(?1)(?5)){0,126}|\[(?:(?>IPv6:(?>([a-f\d]{1,4})(?>:(?6)){7}|(?!(?:.*[a-f\d][:\]]){8,})((?6)(?>:(?6)){0,6})?::(?7)?))|(?>(?>IPv6:(?>(?6)(?>:(?6)){5}:|(?!(?:.*[a-f\d]:){6,})(?8)?::(?>((?6)(?>:(?6)){0,4}):)?))?(25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)(?>\.(?9)){3}))\])(?1)$/isD', $e);} 

And if length-limits are not a requirement:

RFC 5321 (414)

function _($e){return preg_match('/^(?>([!#-\'*+\/-9=?^-~-]+)(?>\.(?1))*|"(?>[ !#-\[\]-~]|\\\[ -~])*")@(?>([a-z0-9](?>[a-z0-9-]*[a-z0-9])?)(?>\.(?2)){0,126}|\[(?:(?>IPv6:(?>([a-f0-9]{1,4})(?>:(?3)){7}|(?!(?:.*[a-f0-9][:\]]){8,})((?3)(?>:(?3)){0,6})?::(?4)?))|(?>(?>IPv6:(?>(?3)(?>:(?3)){5}:|(?!(?:.*[a-f0-9]:){6,})(?5)?::(?>((?3)(?>:(?3)){0,4}):)?))?(25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)(?>\.(?6)){3}))\])$/iD', $e);} 

RFC 5322 (636)

function _($e){return preg_match('/^((?>(?>(?>((?>(?>(?>\x0D\x0A)?[\t ])+|(?>[\t ]*\x0D\x0A)?[\t ]+)?)(\((?>(?2)(?>[\x01-\x08\x0B\x0C\x0E-\'*-\[\]-\x7F]|\\\[\x00-\x7F]|(?3)))*(?2)\)))+(?2))|(?2))?)([!#-\'*+\/-9=?^-~-]+|"(?>(?2)(?>[\x01-\x08\x0B\x0C\x0E-!#-\[\]-\x7F]|\\\[\x00-\x7F]))*(?2)")(?>(?1)\.(?1)(?4))*(?1)@(?1)(?>([a-z\d](?>[a-z\d-]*[a-z\d])?)(?>(?1)\.(?1)(?5)){0,126}|\[(?:(?>IPv6:(?>([a-f\d]{1,4})(?>:(?6)){7}|(?!(?:.*[a-f\d][:\]]){8,})((?6)(?>:(?6)){0,6})?::(?7)?))|(?>(?>IPv6:(?>(?6)(?>:(?6)){5}:|(?!(?:.*[a-f\d]:){6,})(?8)?::(?>((?6)(?>:(?6)){0,4}):)?))?(25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)(?>\.(?9)){3}))\])(?1)$/isD', $e);} 
\$\endgroup\$