I have been trying to use a regex in Java to extract the names and email addresses from the following String, which spans multiple lines:

From: Kane Smith <[email protected]> To: John Smith <[email protected]>, Janes Smith <[email protected]>, Tom Barter <[email protected]>, Other Weird @#$@<>#^Names <[email protected]>, Long Long Long Long Name <[email protected]> Date: Tue, 25 Oct 2011 15:45:59 +0000 

I tried this regex, but it doesn't work: To:\s?(([.*]+)\s*<([\w\d@\.]*)>,(\s|\n)*)+

My intention is to extract each of the names and email addresses and put each name and its email address together into groups. What I have, however, seems to work only when there is a single name and address. What should my regex be to do this?

2 Answers

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String s = "To: John Smith <[email protected]>, Janes Smith\n"
            + "<[email protected]>, Tom Barter <[email protected]>, Other \n"
            + "Weird @#$@<>#^Names <[email protected]>, \n"
            + "Long Long Long Long Name <[email protected]>";
    s = s.substring(3); // strip the leading "To:"
    System.out.println(s);

    // DOTALL lets '.' match line breaks, so a name may span two lines
    Pattern p = Pattern.compile("(.*?)<([^>]+)>\\s*,?", Pattern.DOTALL);
    Matcher m = p.matcher(s);
    while (m.find()) {
        // group(1) is the name, group(2) the address; drop embedded newlines
        String name = m.group(1).replaceAll("[\\n\\r]+", "");
        String email = m.group(2).replaceAll("[\\n\\r]+", "");
        System.out.println(name + " -> " + email);
    }
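For context, two things about the regex in the question explain why a find() loop is used here instead. Inside a character class, `.` and `*` are literal characters, so `[.*]+` matches only runs of dots and asterisks and never a name. Also, a capturing group placed inside a repeated group keeps only the text of its last iteration, so a single match cannot return every name and address. A small sketch of both behaviours (the class and method names are made up for illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
class RepeatedGroupDemo {

    // "[.*]+" is a character class: it accepts only '.' and '*' characters.
    static boolean classMatches(String input) {
        return Pattern.matches("[.*]+", input);
    }

    // The inner group (\w+) is repeated; after a successful match,
    // group(1) holds only what the final iteration captured.
    static String lastCapture(String input) {
        Matcher m = Pattern.compile("(?:(\\w+),?)+").matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(classMatches("John"));   // false: letters are not in the class
        System.out.println(classMatches(".*.*"));   // true: only dots and asterisks
        System.out.println(lastCapture("a,b,c"));   // c: earlier iterations are overwritten
    }
}
```

Matching one name/address pair per find() call avoids the second problem entirely.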

4 Comments

Thanks! But because this actually comes from an email header, I may not be able to do a substring followed by replaceAll. I updated the sample string in my question to show that there is other content above and below the To header. Is it possible to still extract the names and addresses when the To is not this clean?
Why don't you use another regex to get the "To" content and then use this code to extract the emails and names?
@Fred: What happens if the Subject or a name contains "To", or if there is another header such as Delivery-To:? Searching for "To" would then match in more than one place.
Also, if the name contains a < character, the regex would match incorrectly too.
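One way the two-step idea from these comments could be sketched while addressing the concerns raised: anchor `To:` to the start of a line so that `Delivery-To:` or a "To:" inside the Subject is ignored, and require the bracketed part to look like an address so that a stray `<>` in a name is skipped. This assumes every header starts at the beginning of a line and that the To header runs until the next line starting with `Name:`. The class name and the example.com addresses are invented for illustration, and this is not a full RFC 5322 parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, a sketch under the assumptions stated above.
class ToHeaderParser {

    // MULTILINE: ^ matches at each line start, so "Delivery-To:" is not picked up.
    // The lookahead stops at the next line that begins like a header, or at end of input.
    private static final Pattern TO_HEADER = Pattern.compile(
            "^To:(.*?)(?=^[A-Za-z-]+:|\\z)", Pattern.MULTILINE | Pattern.DOTALL);

    // The bracketed part must contain '@' and no whitespace or angle brackets,
    // so an empty "<>" inside a display name is not taken for an address.
    private static final Pattern NAME_AND_ADDRESS = Pattern.compile(
            "\\s*(.*?)\\s*<([^<>\\s]+@[^<>\\s]+)>\\s*,?", Pattern.DOTALL);

    // Returns address -> display name, in the order the recipients appear.
    static Map<String, String> parseRecipients(String headers) {
        Map<String, String> recipients = new LinkedHashMap<>();
        Matcher to = TO_HEADER.matcher(headers);
        if (!to.find()) {
            return recipients; // no To header found
        }
        Matcher pair = NAME_AND_ADDRESS.matcher(to.group(1));
        while (pair.find()) {
            // Collapse line breaks and repeated spaces inside the name.
            String name = pair.group(1).replaceAll("\\s+", " ").trim();
            recipients.put(pair.group(2), name);
        }
        return recipients;
    }

    public static void main(String[] args) {
        String headers = "From: Kane Smith <kane@example.com>\n"
                + "Delivery-To: Box <box@example.com>\n"
                + "To: John Smith <john@example.com>, Janes Smith\n"
                + " <janes@example.com>, Other Weird @#$@<>#^Names <weird@example.com>\n"
                + "Subject: Reply To: all\n"
                + "Date: Tue, 25 Oct 2011 15:45:59 +0000\n";
        parseRecipients(headers).forEach(
                (email, name) -> System.out.println(name + " -> " + email));
    }
}
```

A display name that itself contains something shaped like `<a@b>` would still be split incorrectly; handling that reliably needs a real address parser rather than a regex.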
You can split each line on "," and then use javax.mail.internet.InternetAddress. That will take care of extracting the name and address.

Btw, where are you getting the headers from, and why can't they be key-value pairs as they should be?

1 Comment

The content comes directly from POP. I don't intend to use the JavaMail library.
