6

I realize that there are a ton of regex email validations, but I can't seem to find one that adheres to the RFC 2822 standard.

The ones I find keep letting junk like [email protected] through.

Forgive me if one of the existing questions already has an answer that adheres to RFC 2822 (but isn't annotated as such).

  • Are you trying to validate the email addresses? That standard appears to be more concerned with the messaging format itself. If so, what's wrong with [email protected] - I would have thought that would be valid enough... Commented May 24, 2011 at 20:30
  • @Reddog I thought he meant literally "[email protected]". Commented May 24, 2011 at 20:32
  • @Chris - aaah, then surely that particular case could be added to any regex? Commented May 24, 2011 at 20:38
  • @Reddog, I mean I want to validate the email addresses that RFC 2822 states are acceptable. Half the regex patterns you find through Google allow junk to filter through, such as my example [email protected] (and that's just ONE example of the "valid" emails that pass through these patterns). Ugh. Commented May 24, 2011 at 20:38
  • To make matters worse, I found that many pages reject valid addresses. I happen to have a domain with a German umlaut in it, and there are quite a few sites out there (and I am not referring to "mom and pop" pages) that reject addresses with domains that contain non-ASCII characters. Commented May 24, 2011 at 21:55

4 Answers

4

I did a post on this a short while ago. Yes, it is possible using .NET regex, since they have a non-regular feature called "balancing groups".

The Perl RFC822 one that is often posted doesn't fully match email addresses, since it requires preprocessing to remove comments. It's also for a very old RFC (from 1982!).

This regex is for RFC5322, which is current. It also handles all comments and folding whitespace correctly.
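To see why balancing groups matter here: comments in an address can nest, for example `(a (nested) comment)`, and a classic regular expression cannot count nesting depth. As a rough illustration only (plain Python, not the .NET regex itself), the same bookkeeping can be sketched with a depth counter. The function name `comments_balanced` is made up for this sketch, and it only tracks parentheses and backslash escapes; it does not handle parentheses inside quoted strings or the rest of the grammar.

```python
def comments_balanced(text: str) -> bool:
    """Return True if every '(' in text is closed by a matching ')'.

    A backslash escapes the next character (a quoted-pair), so an escaped
    parenthesis does not change the nesting depth.
    Assumption: text contains no quoted strings with bare parentheses.
    """
    depth = 0
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == "(":
            depth += 1  # roughly what pushing onto the 'paren' group does
        elif ch == ")":
            depth -= 1  # roughly what popping via '-paren' does
            if depth < 0:
                return False  # a ')' with no matching '('
        i += 1
    return depth == 0  # roughly the (?(paren)(?!)) check: nothing left open
```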

Here is the regex:

^(?'localPart'((((\((((?'paren'\()|(?'-paren'\))|([\u0021-\u 0027\u002a-\u005b\u005d-\u007e]|[\u0001-\u0008\u000b\u000c\u 000e-\u001f\u007f])|([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+)| \\([\u0021-\u007e]|[ \t]|[\r\n\0]|[\u0001-\u0008\u000b\u000c \u000e-\u001f\u007f]))*(?(paren)(?!)))\))|([ \t]+((\r\n)[ \t ]+)?|((\r\n)[ \t]+)+))*?(([a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+)|( "(([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+)?(([\u0021\u0023-\u 005b\u005d-\u007e]|[\u0001-\u0008\u000b\u000c\u000e-\u001f\u 007f])|\\([\u0021-\u007e]|[ \t]|[\r\n\0]|[\u0001-\u0008\u000 b\u000c\u000e-\u001f\u007f])))*([ \t]+((\r\n)[ \t]+)?|((\r\n )[ \t]+)+)?"))((\((((?'paren'\()|(?'-paren'\))|([\u0021-\u00 27\u002a-\u005b\u005d-\u007e]|[\u0001-\u0008\u000b\u000c\u00 0e-\u001f\u007f])|([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+)|\\ ([\u0021-\u007e]|[ \t]|[\r\n\0]|[\u0001-\u0008\u000b\u000c\u 000e-\u001f\u007f]))*(?(paren)(?!)))\))|([ \t]+((\r\n)[ \t]+ )?|((\r\n)[ \t]+)+))*?)(\.(((\((((?'paren'\()|(?'-paren'\))| ([\u0021-\u0027\u002a-\u005b\u005d-\u007e]|[\u0001-\u0008\u0 00b\u000c\u000e-\u001f\u007f])|([ \t]+((\r\n)[ \t]+)?|((\r\n )[ \t]+)+)|\\([\u0021-\u007e]|[ \t]|[\r\n\0]|[\u0001-\u0008\ u000b\u000c\u000e-\u001f\u007f]))*(?(paren)(?!)))\))|([ \t]+ ((\r\n)[ \t]+)?|((\r\n)[ \t]+)+))*?(([a-zA-Z0-9!#$%&'*+/=?^_ `{|}~-]+)|("(([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+)?(([\u00 21\u0023-\u005b\u005d-\u007e]|[\u0001-\u0008\u000b\u000c\u00 0e-\u001f\u007f])|\\([\u0021-\u007e]|[ \t]|[\r\n\0]|[\u0001- \u0008\u000b\u000c\u000e-\u001f\u007f])))*([ \t]+((\r\n)[ \t ]+)?|((\r\n)[ \t]+)+)?"))((\((((?'paren'\()|(?'-paren'\))|([ \u0021-\u0027\u002a-\u005b\u005d-\u007e]|[\u0001-\u0008\u000 b\u000c\u000e-\u001f\u007f])|([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+)|\\([\u0021-\u007e]|[ \t]|[\r\n\0]|[\u0001-\u0008\u0 00b\u000c\u000e-\u001f\u007f]))*(?(paren)(?!)))\))|([ \t]+(( \r\n)[ \t]+)?|((\r\n)[ \t]+)+))*?))*))@(?'domain'((((\((((?' 
paren'\()|(?'-paren'\))|([\u0021-\u0027\u002a-\u005b\u005d-\ u007e]|[\u0001-\u0008\u000b\u000c\u000e-\u001f\u007f])|([ \t ]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+)|\\([\u0021-\u007e]|[ \t]| [\r\n\0]|[\u0001-\u0008\u000b\u000c\u000e-\u001f\u007f]))*(? (paren)(?!)))\))|([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+))*?( ([a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+)|("(([ \t]+((\r\n)[ \t]+)?| ((\r\n)[ \t]+)+)?(([\u0021\u0023-\u005b\u005d-\u007e]|[\u000 1-\u0008\u000b\u000c\u000e-\u001f\u007f])|\\([\u0021-\u007e] |[ \t]|[\r\n\0]|[\u0001-\u0008\u000b\u000c\u000e-\u001f\u007 f])))*([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+)?"))((\((((?'pa ren'\()|(?'-paren'\))|([\u0021-\u0027\u002a-\u005b\u005d-\u0 07e]|[\u0001-\u0008\u000b\u000c\u000e-\u001f\u007f])|([ \t]+ ((\r\n)[ \t]+)?|((\r\n)[ \t]+)+)|\\([\u0021-\u007e]|[ \t]|[\ r\n\0]|[\u0001-\u0008\u000b\u000c\u000e-\u001f\u007f]))*(?(p aren)(?!)))\))|([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+))*?)(\ .(((\((((?'paren'\()|(?'-paren'\))|([\u0021-\u0027\u002a-\u0 05b\u005d-\u007e]|[\u0001-\u0008\u000b\u000c\u000e-\u001f\u0 07f])|([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+)|\\([\u0021-\u0 07e]|[ \t]|[\r\n\0]|[\u0001-\u0008\u000b\u000c\u000e-\u001f\ u007f]))*(?(paren)(?!)))\))|([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+))*?(([a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+)|("(([ \t]+((\r\ n)[ \t]+)?|((\r\n)[ \t]+)+)?(([\u0021\u0023-\u005b\u005d-\u0 07e]|[\u0001-\u0008\u000b\u000c\u000e-\u001f\u007f])|\\([\u0 021-\u007e]|[ \t]|[\r\n\0]|[\u0001-\u0008\u000b\u000c\u000e- \u001f\u007f])))*([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+)?")) ((\((((?'paren'\()|(?'-paren'\))|([\u0021-\u0027\u002a-\u005 b\u005d-\u007e]|[\u0001-\u0008\u000b\u000c\u000e-\u001f\u007 f])|([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+)|\\([\u0021-\u007 e]|[ \t]|[\r\n\0]|[\u0001-\u0008\u000b\u000c\u000e-\u001f\u0 07f]))*(?(paren)(?!)))\))|([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t ]+)+))*?))*)|(((\((((?'paren'\()|(?'-paren'\))|([\u0021-\u00 27\u002a-\u005b\u005d-\u007e]|[\u0001-\u0008\u000b\u000c\u00 0e-\u001f\u007f])|([ \t]+((\r\n)[ 
\t]+)?|((\r\n)[ \t]+)+)|\\ ([\u0021-\u007e]|[ \t]|[\r\n\0]|[\u0001-\u0008\u000b\u000c\u 000e-\u001f\u007f]))*(?(paren)(?!)))\))|([ \t]+((\r\n)[ \t]+ )?|((\r\n)[ \t]+)+))*?\[(([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t] +)+)?([!-Z^-~]|[\u0001-\u0008\u000b\u000c\u000e-\u001f\u007f ]))*([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+)?\]((\((((?'paren '\()|(?'-paren'\))|([\u0021-\u0027\u002a-\u005b\u005d-\u007e ]|[\u0001-\u0008\u000b\u000c\u000e-\u001f\u007f])|([ \t]+((\ r\n)[ \t]+)?|((\r\n)[ \t]+)+)|\\([\u0021-\u007e]|[ \t]|[\r\n \0]|[\u0001-\u0008\u000b\u000c\u000e-\u001f\u007f]))*(?(pare n)(?!)))\))|([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+))*?))\z 

Some caveats, however. RFC5322 is more liberal with domain names than the actual domain RFCs, and other restrictions apply from various RFCs, such as the SMTP RFC itself (which specifies a maximum length). So even though an address is correct according to RFC5322, it can still be invalid by other measures.
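As one example of such an extra restriction, here is a minimal sketch of a length check to run alongside the syntax check. The numbers are my reading of the SMTP RFC (RFC 5321, section 4.5.3.1): 64 octets for the local part and 255 for the domain, with 254 for the whole address derived from the 256-octet path limit minus the angle brackets. The function name `within_smtp_limits` is hypothetical, and splitting on the last `@` is a simplifying assumption.

```python
def within_smtp_limits(address: str) -> bool:
    """Check an address against SMTP length limits (assumed from RFC 5321)."""
    # Assumption: the last '@' separates the local part from the domain.
    local, sep, domain = address.rpartition("@")
    if not sep:
        return False  # no '@' at all
    # The limits are in octets, so measure the encoded form.
    return (
        len(local.encode("utf-8")) <= 64
        and len(domain.encode("utf-8")) <= 255
        and len(address.encode("utf-8")) <= 254
    )
```

This does not replace the syntax check; it only catches addresses that are too long for SMTP.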

The golden test is still to send an email to the address with a validation code.
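A minimal sketch of that round trip, using only the Python standard library. The function names and the `noreply@example.com` sender are placeholders for illustration, and the actual delivery step (for instance via `smtplib`) is left out because it depends on your mail setup.

```python
import hmac
import secrets
from email.message import EmailMessage


def build_verification_email(recipient: str, sender: str = "noreply@example.com"):
    """Create a random code and an email message that carries it.

    Returns (token, message). Store the token server-side and send the
    message with whatever mail transport you already use.
    """
    token = secrets.token_urlsafe(16)  # unguessable, URL-safe code
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Confirm your email address"
    msg.set_content(f"Your verification code is: {token}")
    return token, msg


def token_matches(expected: str, supplied: str) -> bool:
    """Compare the stored code with the user's input in constant time."""
    return hmac.compare_digest(expected.encode("utf-8"), supplied.encode("utf-8"))
```

If the user can give the code back, the address exists and can receive mail, which is something no regex can establish.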


2

This is for RFC822, not for the newer one. But it seems the address format has not been changed, so it should be what you're looking for.

(note the remark below the regexp--it still assumes that the address has been preprocessed)

4 Comments

  • Wow, I'm assuming that one will work, but good luck debugging it if it doesn't.
  • Actually, good luck debugging the one Cheeso posted. 32K!? Looks like someone's kid pounded on the keyboard for 20 minutes!
  • 2822 is also obsolete. It even says so on the page :)
  • @Porges: You're right, I didn't even notice that even 2822 is obsolete. So technically you'd want 5322 to be most up-to-date.

1

This runs in PCRE: http://code.iamcal.com/php/rfc822/full_regexp.txt

It's 32k, apparently.

Seriously - maybe consider backing off using a single regexp, or accepting ALL possible email forms.
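For the "accept everything plausible" route, a deliberately permissive check could look like the sketch below. The function name `looks_like_email` is made up, and the rule (an `@` with something on both sides) is an assumption about what "plausible" means, not anything taken from an RFC.

```python
def looks_like_email(address: str) -> bool:
    """Very loose sanity check: something, an '@', then something.

    This intentionally accepts far more than RFC 5322 allows; the goal is
    to catch obvious typos, not to prove the address is deliverable.
    """
    local, sep, domain = address.rpartition("@")
    return bool(sep) and bool(local) and bool(domain)
```

Anything that passes still needs a real check, such as sending a message to the address.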


0

I would look at this: http://www.regular-expressions.info/email.html, which explains a lot about using regular expressions to match email addresses, and includes a full RFC 2822 expression, which honestly I would almost never recommend using.
