My problem is that I've been a heavy user of Gmail's POP3 import feature and used it to pull mail from other Gmail accounts. Back then, however, I imported only new mail, not all the mail previously stored on the server. Now that I've started to use mutt as my mail client of choice, I've decided to import all my former mail accounts and thus archive all my old mail locally.
My initial hope was that I could easily weed out the duplicate mails with a tool like fdupes, but what I did not anticipate was that Gmail slightly alters the mail header when it retrieves a mail via POP3, as can be seen here:

    @@ -1,7 +1,16 @@
     Return-Path: <[email protected]>
     Delivered-To: unknown
     Received: from pop.gmail.com (74.125.43.109:995) by localhost with POP3-SSL;
    -  10 May 2011 13:35:06 -0000
    +  10 May 2011 14:29:41 -0000
    +Delivered-To: [email protected]
    +Received: by 10.204.52.199 with SMTP id j7cs172325bkg;
    +        Sun, 2 May 2010 15:33:19 -0700 (PDT)
    +Received: by 10.204.136.15 with SMTP id p15mr6011875bkt.172.1272839446530;
    +        Sun, 02 May 2010 15:30:46 -0700 (PDT)
    +Received-SPF: softfail (google.com: best guess record for domain of transitioning [email protected] does not designate 84.167.28.93 as permitted sender) client-ip=84.167.28.93;
    +Received: by 10.188.26.17 with POP3 id 17mf826641bwz.107;
    +        Sun, 02 May 2010 15:30:46 -0700 (PDT)
    +X-Gmail-Fetch-Info: [email protected] 1 smtp.gmail.com 995 xxxx
     Received: from aequitas ( [84.167.28.93])
             by mx.google.com with ESMTPS id e20sm18902485fga.1.2008.01.04.07.58.46
             (version=TLSv1/SSLv3 cipher=RC4-MD5);

The original looked like this: http://pastebin.com/U6YzNySP

Is there an easy way to get rid of those "duplicate files"?
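
To make the question concrete: a byte-for-byte tool such as fdupes fails here because the two copies differ only in transport headers, so a duplicate check would have to ignore exactly those. Below is a minimal Python sketch of that idea, not a finished tool. It assumes that only the `Received`, `Delivered-To`, `Received-SPF` and `X-Gmail-Fetch-Info` headers differ (as in the diff above) and that every mail is stored in its own file, as in a Maildir. The names `VOLATILE_HEADERS`, `fingerprint` and `group_duplicates` are made up for illustration.

```python
import hashlib
from collections import defaultdict
from email import message_from_bytes
from pathlib import Path

# Assumption: these are the only headers that differ between the two copies,
# based on the diff above. Extend the set if other headers turn out to vary.
VOLATILE_HEADERS = {"received", "delivered-to", "received-spf", "x-gmail-fetch-info"}


def fingerprint(raw: bytes) -> str:
    """Return a SHA-256 hex digest of a mail, ignoring the volatile headers."""
    normalized = raw.replace(b"\r\n", b"\n")
    # Headers and body are separated by the first empty line.
    header_block, _, body = normalized.partition(b"\n\n")
    headers = message_from_bytes(header_block + b"\n")

    digest = hashlib.sha256()
    for name, value in headers.items():
        if name.lower() in VOLATILE_HEADERS:
            continue
        # Collapse folded header lines so that line wrapping does not matter.
        flat = " ".join(str(value).split())
        digest.update(f"{name.lower()}: {flat}\n".encode("utf-8", "replace"))
    digest.update(b"\n")
    digest.update(body)
    return digest.hexdigest()


def group_duplicates(paths):
    """Group file paths by fingerprint and return only groups with duplicates."""
    groups = defaultdict(list)
    for path in paths:
        groups[fingerprint(Path(path).read_bytes())].append(path)
    return [files for files in groups.values() if len(files) > 1]
```

Calling `group_duplicates` with the paths of all mail files would return lists of files that are the same mail apart from the ignored headers. Which copy of each group to keep is left open. Dropping every `Received` header is a deliberately coarse choice, since the POP3 fetch timestamp sits in one of them.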