
How can I mask a list of e-mail addresses like:

John Doe <[email protected]>
John Doe <[email protected]>
Jane Doe <[email protected]>

... with sed (or awk) into something like:

John Doe <j******e@g***l.com>
John Doe <j*****e@h*****l.net>
Jane Doe <j***e@o*****k.org>

In other words: replace most of the e-mail address with asterisks but make it somehow recognizable by users who know the actual e-mail address.


4 Answers


Another Perl version:

perl -pe 's/(<.)(.*)(@.)(.*)(.\..*>)/$1."*" x length($2).$3."*" x length($4).$5/e' 

Example:

$ perl -pe 's/(<.)(.*)(@.)(.*)(.\..*>)/$1."*" x length($2).$3."*" x length($4).$5/e' foo
John Doe <j*******@g***l.com>
John Doe <j******@h*****l.net>
Jane Doe <j****@o*****k.org>

With sed, replacing each character with an asterisk, so that the masked part keeps its original length, is complicated. See this SO post for examples involving sed, perl and awk.
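For what it's worth, an equal number of asterisks can be had in plain sed with a loop that stars one inner character per pass. This is only a minimal sketch and is not taken from the linked post: the function name mask_equal_sed and the example.com address are made up for illustration, and it assumes every address is written as <local@domain.tld> with no literal * in it.

```shell
# mask_equal_sed: hypothetical helper; reads lines on stdin.
# Assumption: each address is wrapped in < > as in the question.
mask_equal_sed() {
  # Loop a: star one character of the local part per pass,
  #         keeping its first and last character.
  # Loop b: same for the domain, keeping its first character and
  #         the last character before the final ".tld".
  sed -e ':a' \
      -e 's/\(<[^@>]\**\)[^@>*]\([^@>]\{1,\}@\)/\1*\2/' \
      -e 'ta' \
      -e ':b' \
      -e 's/\(@[^.>]\**\)[^>*]\([^>]\{1,\}\.[^.>]*>\)/\1*\2/' \
      -e 'tb'
}

printf '%s\n' 'Alice Roe <alice@example.com>' | mask_equal_sed
# prints: Alice Roe <a***e@e*****e.com>
```

Each substitution replaces exactly one character, and the t command jumps back to the label for as long as a substitution succeeded, which is how the length is preserved without any arithmetic.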

  • I guess to truly answer the original poster you need a slight tweak: (.@.), plus I'm not sure the \..* is good enough in case there are subdomains. Commented Nov 17, 2015 at 1:07

Perl to the rescue:

perl -pe '
    sub asteriskify {
        my $s = shift;
        substr $s, 1, -1, "*" x (length($s) - 2);
        return $s
    }
    s/<(.*)@(.*)(?=\..*>)/ "<" . asteriskify($1) . "@" . asteriskify($2) /e;
' < input > output

The four-argument substr replaces the characters from the second one to the last but one with asterisks, the number of asterisks being the length of the string minus 2.

The substitution captures the username in $1 and the domain name without its final part in $2. The (?=...) lookahead just makes sure the match is followed by a dot, any further characters and >, without consuming them (see Look Around Assertions in perlre).


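Since you asked how to do it with awk, I thought I would prove that it is not too hard. Note that this needs GNU awk, because gensub and the three-argument form of match are gawk extensions. So here goes: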

echo "John Doe <[email protected]> John Doe <[email protected]> Jane Doe <[email protected]>" | \
awk \
'
{print repl($0)}
function repl(s, m) {
    if (match(s,"(<.)([^>]*)(.@.)([^>]*)(.\\.[a-z]*>)", m)) {
        return substr(s, 1, RSTART-1) m[1] \
            gensub(".","*","g",m[2]) m[3] \
            gensub(".","*","g",m[4]) m[5] \
            repl(substr(s,RSTART+RLENGTH))
    } else return s
}
'
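If gawk is not available, roughly the same result can be sketched with POSIX awk only, using match, substr and index instead of capture groups. The names mask_emails_awk, stars and mask, and the example addresses, are invented for this illustration; it assumes addresses of the form <local@domain.tld>.

```shell
# mask_emails_awk: hypothetical helper; reads lines on stdin.
mask_emails_awk() {
  awk '
  # stars(n): return a string of n asterisks
  function stars(n,   s) { while (n-- > 0) s = s "*"; return s }
  # mask(w): keep the first and last character of w, star the rest
  function mask(w,   n) {
    n = length(w)
    if (n <= 2) return w
    return substr(w, 1, 1) stars(n - 2) substr(w, n, 1)
  }
  {
    out = ""; rest = $0
    # Handle every <local@domain.tld> on the line, left to right
    while (match(rest, /<[^@>]+@[^>]*\.[^.>]+>/)) {
      addr = substr(rest, RSTART + 1, RLENGTH - 2)   # drop < and >
      at = index(addr, "@")
      dom = substr(addr, at + 1)
      # locate the last dot of the domain
      dot = length(dom)
      while (substr(dom, dot, 1) != ".") dot--
      out = out substr(rest, 1, RSTART - 1) "<" mask(substr(addr, 1, at - 1)) "@" mask(substr(dom, 1, dot - 1)) substr(dom, dot) ">"
      rest = substr(rest, RSTART + RLENGTH)
    }
    print out rest
  }'
}

printf '%s\n' 'Alice Roe <alice@example.com>' | mask_emails_awk
# prints: Alice Roe <a***e@e*****e.com>
```

The while loop plays the role of the recursive repl call above: it consumes the line one address at a time and appends the untouched remainder at the end.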

Using sed:

sed 's/.$//' foo.txt | sed 's#\<\(.\).*\(.@.\).*\(\..*\)#\1***\2***\3#' 

The first sed gets rid of the trailing >, and the second sed masks the mail address.

Output:

J****e@g****.com
J****e@h****.net
J****e@o****.org
  • You could shorten that to sed 's#\<\(.\).*\(.@.\).*\(\..*\)\>#\1***\2***\3#' foo.txt and save one sed-process. Commented Sep 18, 2019 at 6:18
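One thing to keep in mind: in GNU sed, \< and \> are word-boundary anchors, not literal angle brackets, which is why the match above starts at the first letter of the name. A single-process variant that anchors on the literal < and > and keeps the display name could look like the sketch below. The name mask_fixed and the sample address are hypothetical, and like the answer it uses a fixed run of three asterisks rather than preserving the length.

```shell
# mask_fixed: hypothetical helper; reads lines on stdin.
# Keeps first/last character of the local part, the first character
# of the domain and the final ".tld"; everything else becomes ***.
mask_fixed() {
  sed 's/<\([^@>]\)[^@>]*\([^@>]@[^.>]\)[^>]*\(\.[^.>]*>\)/<\1***\2***\3/g'
}

printf '%s\n' 'Alice Roe <alice@example.com>' | mask_fixed
# prints: Alice Roe <a***e@e***.com>
```

Because the asterisk count is constant, the masked form no longer hints at the length of the original address, which may or may not be what you want.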
