I have a whole bunch of emails that I need to pull information from. I recently took on a site that stored all of its customers' contact information in emails. They want to start storing this in a database. I'm working in Java to pull this information out, and I'm stuck.

I have been able to load the emails themselves, but have been unable to extract the information. Here's an example email:

    > ----------------------------------------------------------------------
    > Name: Person's Name
    > Phone:=20
    > Email: [email protected]
    > Street:=20
    > City:=20
    > State:=20
    > Zip:=20
    > Country:=20
    > Arrival: 15 Nov 2010
    > Departure: 22 Nov 2010
    > Message: This is a message
    > ----------------------------------------------------------------------
    > Name: Second Person
    > Phone:=555-5554
    > Email: [email protected]
    > Street:=1234 Main St.
    > City:=20
    > State:=20
    > Zip:=23412
    > Country:=20
    > Arrival: 15 Nov 2010
    > Departure: 22 Nov 2010
    > Message: This is a message
    > ----------------------------------------------------------------------

I need to pull every field whose value is not =20. I need to get all this information into a table or CSV file so I can import it into a MySQL database.

Edit:

This is closer to what the file actually looks like:

    > ----------------------------------------------------------------------
    > Name: Erin
    > Phone: 401-
    > Email: eri
    > Street: 737
    > City: Paw
    > State:
    > Zip: 02
    > Country: USA
    > Arrival: 17 Jul 2011
    > Departure: 23 Jul 2011
    > Message: I .=20
    > ----------------------------------------------------------------------
    >=20
    > A representative will be in touch shortly.
    > Thank You,
    >
    >=20
    Begin forwarded message:
    > From:
    > Date: July 8, 2010 12:35:13 PM EDT
    > To:
    > Subject: Thank you for completing our contact form!
    >=20
    > Thank you for completing our contact form! We received the following = information from you:
    > ----------------------------------------------------------------------
    > Name: Ludd
    > Phone:=20
    > Email: aedu
    > Street: 25
    > City: Signal
    > State:
    > Zip:
    > Country: USA
    > Arrival: 25 Nov 2010
    > Departure: 30 Nov 2010
    > Message: Not sure if
    > ----------------------------------------------------------------------
    >=20
    > A representative will be in touch shortly.
    > Thank You,
    >
    >=20
    Begin forwarded message:
    > From:
    > Date: July 8, 2010 11:29:49 AM EDT
    > To:
    > Subject: Thank you for completing our contact form!
    >=20
    > Thank you for completing our contact form! We received the following = information from you:
    > ----------------------------------------------------------------------
    > Name: Stephanie
    > Phone: 41
    > Email: sgor
    > Street: 2-
    > City:
    > State: On
    > Zip: 1J6
    > Country:
    > Arrival: 18 Aug 2010
    > Departure: 21 Aug 2010
    > Message:=20
    > ----------------------------------------------------------------------
    >=20
    > A representative will be in touch shortly.
    > Thank You,
    >=20
    Begin forwarded message:
    > From:
    > Date: July 8, 2010 11:16:36 AM EDT
    > To:
    > Subject: Thank you for completing our contact form!
    >=20
    > Thank you for completing our contact form! We received the following = information from you:
    > ----------------------------------------------------------------------
    > Name: Stacey
    > Phone: 001
    > Email: staceymou
    > Street: 60
    > City: New York
    > State: NY
    > Zip: 0
    > Country: USA
    > Arrival: 10 Dec 2010
    > Departure: 14 Dec 2010
    > Message: Looking to reserve
    > ----------------------------------------------------------------------

2 Answers

2

Here is a method that extracts all such headers to a Map<String, String>. It uses Google's Guava library to simplify things a lot:

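    import java.io.File;
    import java.io.IOException;
    import java.util.Map;

    import com.google.common.base.Charsets;
    import com.google.common.base.Function;
    import com.google.common.base.Splitter;
    import com.google.common.collect.Iterables;
    import com.google.common.collect.Lists;
    import com.google.common.collect.Maps;
    import com.google.common.io.Files;

    public static Map<String, String> readValuesFromFile(final File f) throws IOException {
        final Splitter splitter = Splitter.on(':').trimResults().omitEmptyStrings();
        final Map<String, String> map = Maps.newHashMap();
        for (final String line : Lists.transform(
                Files.readLines(f, Charsets.UTF_8),
                new Function<String, String>() {
                    @Override
                    public String apply(final String input) {
                        // Strip the leading "> " quote marker, if present.
                        return input != null && input.startsWith("> ")
                                ? input.substring(2) : input;
                    }
                })) {
            // Stop reading at the first dashed line.
            if (line.startsWith("---")) {
                break;
            }
            final String[] items = Iterables.toArray(splitter.split(line), String.class);
            // Keep "Key: value" pairs whose value does not start with "=20".
            if (items.length == 2 && !items[1].startsWith("=20")) {
                map.put(items[0], items[1]);
            }
        }
        return map;
    }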

4 Comments

Hey man. Thank you so much for your help. I've been messing a little more with your method but can't make it work exactly as I need it to. It seems to only get the last entry: if I have 3 of those entries, it only gets the last one, not all of them. I'm not sure why that is happening. I had to take the line.startsWith("---") check out because it would not work. Any ideas? Thanks for the help.
@dham About line.startsWith(): sorry, I didn't see that the header was also dashes. Also, you didn't mention that you had multiple values. What do you mean: multiple rows with the same prefix, or multiple values in one row? Please add that to your question.
@dham Oh, I think I get you, you are reading more than one file per method call. Don't do that: one map represents one file. Keep a List of Maps.
Hey man. I updated the original post with an example. Thank you so much for your help!
0

Read the file until you get that ">-------" line. Then read every line (BufferedReader.readLine()), find the position of ":" in it, and take the parts of the line before and after it (use String.indexOf(), String.substring(), String.trim()). Now you have the name of a field and its value. Unless the value is "=20", put it in a database or a CSV record.

If you encounter a ">-------" line again, the record is over. You can easily detect it by the fact that there's no ':' in it either.
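The steps above could be sketched in plain Java roughly as follows. This is only a sketch under some assumptions: the class name `ContactFormParser`, the `FIELDS` list (taken from the sample emails in the question), and the `parseText` helper are illustrative names, not part of the original answer. It treats a value that is, or ends with, "=20" (the quoted-printable encoding of a space) as blank padding, and it ignores any key that is not a known form field so that forwarded headers such as "Date:" between records are skipped. Messages that wrap over several lines are not handled.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ContactFormParser {

    // Assumed column order, copied from the sample emails in the question.
    static final List<String> FIELDS = Arrays.asList("Name", "Phone", "Email", "Street",
            "City", "State", "Zip", "Country", "Arrival", "Departure", "Message");

    /** Reads all dash-delimited records, returning one map per record. */
    static List<Map<String, String>> parseRecords(Reader in) throws IOException {
        List<Map<String, String>> records = new ArrayList<>();
        Map<String, String> current = new LinkedHashMap<>();
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            // Drop the leading ">" quote marker(s) added by the mail client.
            String text = line.replaceFirst("^(>\\s?)+", "").trim();
            if (text.startsWith("-----")) {
                // A dashed line ends the current record and starts a new one.
                if (!current.isEmpty()) {
                    records.add(current);
                }
                current = new LinkedHashMap<>();
                continue;
            }
            int colon = text.indexOf(':');
            if (colon < 0) {
                continue; // not a "Key: value" line
            }
            String key = text.substring(0, colon).trim();
            String value = text.substring(colon + 1).trim();
            if (value.endsWith("=20")) {
                // Strip the encoded trailing space; a bare "=20" becomes empty.
                value = value.substring(0, value.length() - 3).trim();
            }
            // Skip blanks and anything that is not a known form field.
            if (!value.isEmpty() && FIELDS.contains(key)) {
                current.put(key, value);
            }
        }
        if (!current.isEmpty()) {
            records.add(current); // file ended without a closing dashed line
        }
        return records;
    }

    /** Convenience wrapper for in-memory text. */
    static List<Map<String, String>> parseText(String text) {
        try {
            return parseRecords(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for a StringReader
        }
    }

    /** Renders the records as CSV with a header row; every cell is quoted. */
    static String toCsv(List<Map<String, String>> records) {
        StringBuilder sb = new StringBuilder(String.join(",", FIELDS)).append('\n');
        for (Map<String, String> record : records) {
            List<String> cells = new ArrayList<>();
            for (String field : FIELDS) {
                String value = record.getOrDefault(field, "");
                // Double any embedded quotes, then wrap the cell in quotes.
                cells.add("\"" + value.replace("\"", "\"\"") + "\"");
            }
            sb.append(String.join(",", cells)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String sample = "> -----\n> Name: Person's Name\n> Phone:=20\n"
                + "> Arrival: 15 Nov 2010\n> -----\n";
        System.out.print(toCsv(parseText(sample)));
    }
}
```

For real files, `parseRecords` can be given a `java.io.FileReader` (or a reader with an explicit charset), and the resulting list accumulated across files before writing one CSV for import.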
