My problem is that I've been a heavy user of Gmail's POP3 import feature and used it to pull mail from other Gmail accounts. Back then, however, I only imported new mail, not all the mail previously stored on the server. Now that I've started to use mutt as my mail client of choice, I've decided to import all my former mail accounts and thus archive all my old mail locally.

My initial hope was that I could easily weed out the duplicate mails with a tool like fdupes, but what I did not anticipate was that Gmail slightly alters the mail header when it retrieves a mail via POP3, as can be seen here:

    @@ -1,7 +1,16 @@
     Return-Path: <[email protected]>
     Delivered-To: unknown
     Received: from pop.gmail.com (74.125.43.109:995) by localhost with POP3-SSL;
    -  10 May 2011 13:35:06 -0000
    +  10 May 2011 14:29:41 -0000
    +Delivered-To: [email protected]
    +Received: by 10.204.52.199 with SMTP id j7cs172325bkg;
    +        Sun, 2 May 2010 15:33:19 -0700 (PDT)
    +Received: by 10.204.136.15 with SMTP id p15mr6011875bkt.172.1272839446530;
    +        Sun, 02 May 2010 15:30:46 -0700 (PDT)
    +Received-SPF: softfail (google.com: best guess record for domain of transitioning [email protected] does not designate 84.167.28.93 as permitted sender) client-ip=84.167.28.93;
    +Received: by 10.188.26.17 with POP3 id 17mf826641bwz.107;
    +        Sun, 02 May 2010 15:30:46 -0700 (PDT)
    +X-Gmail-Fetch-Info: [email protected] 1 smtp.gmail.com 995 xxxx
     Received: from aequitas ( [84.167.28.93])
             by mx.google.com with ESMTPS id e20sm18902485fga.1.2008.01.04.07.58.46
             (version=TLSv1/SSLv3 cipher=RC4-MD5);

The original looked like this: http://pastebin.com/U6YzNySP

Is there an easy way to get rid of those "duplicate files"?

1 Answer

Use the ESMTPS id assigned by mx.google.com to identify duplicates. It should be unmodified in both copies of a mail. In the example above it appears in this line:

    by mx.google.com with ESMTPS id e20sm18902485fga.1.2008.01.04.07.58.46

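A very simple implementation would put all mails in one directory, extract the id from each file, and symlink the file to a link named after that id, without passing -f to ln. If a link with that name already exists, the file is a duplicate. Like: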

    for FILE in *; do
        # do_extract_smtp_id_here is a placeholder for the actual extraction
        smtpid=$(do_extract_smtp_id_here)
        if test -f "${smtpid}"; then
            echo "DUPE: ${FILE}"
        else
            ln -s "${FILE}" "${smtpid}"
        fi
    done
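One possible way to fill in the extraction step is sketched below. It is only a sketch under a few assumptions: each mail is a single file with LF line endings, the text "by mx.google.com with ESMTPS id <id>" sits on one physical header line (as in the diff above), and the first such line is the one to use. The function names `extract_smtp_id` and `report_dupes` and the separate index directory are made up for this example; they are not part of the answer above.

```shell
#!/bin/sh

# Print the ESMTPS id assigned by mx.google.com, taken from the header of
# the mail file "$1". Stops reading at the first empty line (end of header).
# Prints nothing if no such header line is found.
extract_smtp_id() {
    sed -n \
        -e '/^$/q' \
        -e 's/.*by mx\.google\.com with ESMTPS id \([^ ;]*\).*/\1/p' \
        "$1" | head -n 1
}

# Walk over all regular files in mail directory "$1" and create one symlink
# per id in index directory "$2". A file whose id already has a link is
# reported as a duplicate; a file without an id is reported separately.
report_dupes() {
    mail_dir=$1
    index_dir=$2
    mkdir -p "$index_dir"
    for file in "$mail_dir"/*; do
        [ -f "$file" ] || continue          # skip directories and the like
        smtpid=$(extract_smtp_id "$file")
        if [ -z "$smtpid" ]; then
            echo "NOID: $file"
        elif [ -L "$index_dir/$smtpid" ]; then
            # -L is true even if the link target does not resolve
            echo "DUPE: $file"
        else
            ln -s "$file" "$index_dir/$smtpid"
        fi
    done
}
```

After sourcing these functions, a call such as `report_dupes /path/to/mails /path/to/index` (both paths are placeholders) prints one `DUPE:` line per file whose id was already seen. Passing the mail directory as an absolute path keeps the symlinks resolvable. Mails that carry several matching Received lines may need a different choice than "first match".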
  • For the most part this worked: it found many duplicates, but sadly not all of them (some mails apparently carry multiple daisy-chained SMTP ids because they passed through several redirections). I also figured that I could just go to Gmail and delete all the retrieved mails, as I had auto-labeled them anyway. So I did that, and re-downloading them via POP helped me clean out everything. Commented May 11, 2011 at 5:34
