I have a file containing an email in "plain text MIME message format". I am not sure whether this is the EML format. The email contains attachments, and I want to extract them and recreate the original files. This is what the attachment part looks like:

    ... ... Receive, deliver details ... ...
    From: sac ascsac <[email protected]>
    Date: Thu, 20 Jan 2011 18:05:16 +0530
    Message-ID: <[email protected]>
    Subject: Test attachments
    To: [email protected]
    Content-Type: multipart/mixed; boundary=20cf3054ac85d97721049a465e12

    --20cf3054ac85d97721049a465e12
    Content-Type: multipart/alternative; boundary=20cf3054ac85d97717049a465e10

    --20cf3054ac85d97717049a465e10
    Content-Type: text/plain; charset=ISO-8859-1

    hello this is a test mail. It contains two attachments

    --20cf3054ac85d97717049a465e10
    Content-Type: text/html; charset=ISO-8859-1

    hello this is a test mail. It contains two attachments<br>

    --20cf3054ac85d97717049a465e10--
    --20cf3054ac85d97721049a465e12
    Content-Type: text/plain; charset=US-ASCII; name="simple_test.txt"
    Content-Disposition: attachment; filename="simple_test.txt"
    Content-Transfer-Encoding: base64
    X-Attachment-Id: f_gj5n2yx60

    aGVsbG8gd29ybGQKYWMgYXNj
    ... encoded things here ...
    ZyBmZyAKCjIKNDIzCnQ2Mwo=
    --20cf3054ac85d97721049a465e12
    Content-Type: application/x-httpd-php; name="oscomm_backup_code.php"
    Content-Disposition: attachment; filename="oscomm_backup_code.php"
    Content-Transfer-Encoding: base64
    X-Attachment-Id: f_gj5n5gxn1

    PD9waHAKCg ...
    ... encoded things here ...
    X2xpbmsoRklMRU5BTUVfQkFDS1VQKSk7Cgo/Pgo=
    --20cf3054ac85d97721049a465e12--

I can see that the part between X-Attachment-Id: f_gj5n2yx60 and ZyBmZyAKCjIKNDIzCnQ2Mwo= (both inclusive) is the content of the first attachment. I want to parse those attachments (file names and contents) and create those files.

I got this file after parsing a dbx format file using a DBX Parser class available in PHP classes.

I searched in many places and did not find much discussion of this here on SO, other than "Script to parse emails for attachments". Maybe I missed some terms while searching. That answer mentions:

you can use the boundaries to extract the base64 encoded information

But I am not sure which strings are the boundaries or how exactly to use them. There must already be some libraries or a well-defined method for doing this. I guess I will make many mistakes if I try to reinvent the wheel here.

1 Answer

There's a PHP Mailparse extension; have you tried it?

The manual way is to process the mail line by line until you hit the first Content-Type header. In your example, that is: Content-Type: multipart/mixed; boundary=20cf3054ac85d97721049a465e12

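Now you have the boundary. This string is used as the separator between the multiple parts (that's why it's called multipart). Every time a line consists of two dashes followed by this string, a new part begins. In your example: --20cf3054ac85d97721049a465e12. The same line with two more dashes appended (--20cf3054ac85d97721049a465e12--) marks the end of the last part. A part can itself be multipart with its own boundary, like the multipart/alternative part in your example.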
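The boundary handling described above can be sketched in a few lines. This is only an illustration in Python rather than PHP, and the helper names `find_boundary` and `split_parts` are made up for this sketch. It assumes the message has already been read into a list of lines and that the boundary value contains no spaces.

```python
import re


def find_boundary(content_type_value):
    """Return the boundary parameter of a Content-Type value, or None."""
    match = re.search(r'boundary="?([^";\s]+)"?', content_type_value, re.IGNORECASE)
    return match.group(1) if match else None


def split_parts(lines, boundary):
    """Split lines into parts delimited by '--boundary' lines.

    Collection stops at the closing '--boundary--' line. Lines before the
    first delimiter are ignored.
    """
    delimiter = "--" + boundary
    closing = delimiter + "--"
    parts = []
    current = None  # None until the first delimiter has been seen
    for line in lines:
        stripped = line.rstrip()
        if stripped == delimiter or stripped == closing:
            if current is not None:
                parts.append(current)
            if stripped == closing:
                current = None
                break
            current = []
        elif current is not None:
            current.append(line)
    if current is not None:  # message was cut off before the closing line
        parts.append(current)
    return parts


boundary = find_boundary("multipart/mixed; boundary=20cf3054ac85d97721049a465e12")
sample = [
    "--" + boundary,
    "Content-Type: text/plain",
    "",
    "first part",
    "--" + boundary,
    "Content-Type: text/plain",
    "",
    "second part",
    "--" + boundary + "--",
]
parts = split_parts(sample, boundary)
```

For a nested part such as the multipart/alternative one in the question, the inner delimiter lines stay inside the outer part untouched, so the same two functions can be applied again to that part's body with the inner boundary.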

Every part consists of headers, a blank line, and content. By looking at the Content-Type and Content-Disposition headers you can determine which parts are attachments, what their type is, and what their filename is. For a part marked Content-Transfer-Encoding: base64, read the whole content, strip the whitespace, base64_decode it, and you've got the binary contents of the file. Does this help?
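As a rough sketch of that last step (again in Python for illustration, with hypothetical helper names `parse_part` and `extract_attachment`), the following splits one part into headers and content and decodes a base64 attachment. It assumes the part is a list of lines with the boundary lines already removed, and it handles only the base64 transfer encoding.

```python
import base64
import re


def parse_part(part_lines):
    """Split a part into (headers dict with lower-case names, content lines)."""
    headers = {}
    last_name = None
    index = 0
    for index, line in enumerate(part_lines):
        if line.strip() == "":
            break  # the blank line ends the header block
        if line[0] in " \t" and last_name:
            # A line starting with whitespace continues the previous header.
            headers[last_name] += " " + line.strip()
        else:
            name, _, value = line.partition(":")
            last_name = name.strip().lower()
            headers[last_name] = value.strip()
    else:
        index = len(part_lines)  # no blank line found: headers only
    return headers, part_lines[index + 1:]


def extract_attachment(part_lines):
    """Return (filename, bytes) for a base64 attachment part, else None."""
    headers, content = parse_part(part_lines)
    disposition = headers.get("content-disposition", "")
    if not disposition.lower().startswith("attachment"):
        return None
    if headers.get("content-transfer-encoding", "").lower() != "base64":
        return None  # other encodings are out of scope for this sketch
    match = re.search(r'filename="?([^";]+)"?', disposition, re.IGNORECASE)
    filename = match.group(1) if match else "attachment.bin"
    # Join the lines without whitespace, then decode.
    data = base64.b64decode("".join(line.strip() for line in content))
    return filename, data


part = [
    'Content-Type: text/plain; charset=US-ASCII; name="simple_test.txt"',
    'Content-Disposition: attachment; filename="simple_test.txt"',
    "Content-Transfer-Encoding: base64",
    "X-Attachment-Id: f_gj5n2yx60",
    "",
    "aGVsbG8g",
    "d29ybGQK",
]
result = extract_attachment(part)
```

The returned bytes can then be written out in binary mode. Because the filename comes from the email itself, it is safer to reduce it to its base name before using it as a path.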

2 Comments

Definitely helpful. Thanks, I am trying out the Mailparse extension however.
The PHP MIME mail parser hosted on Google Code would be very helpful here. Check out code.google.com/p/php-mime-mail-parser
