-1

I'm trying to mask some sensitive data in a log file.

I first need to select the lines that match a given pattern, and then, in those lines only, replace any text inside double quotes while leaving all other text alone.

In every line that matches the pattern, anything inside double quotes needs to be replaced so that any A-Z becomes X, any a-z becomes x, and any digit 0-9 becomes 0.

One line can contain multiple quoted strings. The quoted text can also contain special characters such as ',', '-', '.', '@', and those should be preserved as-is.

Example file contents (the filtering word in this case is 'KEYWORD'):

2020-04-18 15:01:12 [EVENT] :log-event-with-KEYWORD: {:entry1 {:entry2 {:value "Replace This"}}} -> {:entry1 {:entry2 {:value "Replace ALSO this."}}}
2020-04-18 15:01:13 [EVENT] :log-event-with-KEYWORD: {:entry1 {:entry2 {:value "REplace. THIS 12345"}}}
2020-04-18 15:01:15 [EVENT] :this_has--the-KEYWORD: {:entry1 {:entry2 {:value "[email protected]"}}} -> {:entry1 {:entry2 {:value "[email protected]"}}}
2020-04-18 15:01:18 [EVENT] :log-event-without-keyword: {:entry1 {:entry2 {:value "Do NOT replace this."}}} -> {:entry1 {:entry2 {:value "Do-NoT replace this either"}}}

That file as input would be processed into this output:

2020-04-18 15:01:12 [EVENT] :log-event-with-KEYWORD: {:entry1 {:entry2 {:value "Xxxxxxx Xxxx"}}} -> {:entry1 {:entry2 {:value "Xxxxxxx XXXX xxxx."}}}
2020-04-18 15:01:13 [EVENT] :log-event-with-KEYWORD: {:entry1 {:entry2 {:value "XXxxxxx. XXXX 00000"}}}
2020-04-18 15:01:15 [EVENT] :this_has--the-KEYWORD: {:entry1 {:entry2 {:value "[email protected]"}}} -> {:entry1 {:entry2 {:value "[email protected]"}}}
2020-04-18 15:01:18 [EVENT] :log-event-without-keyword: {:entry1 {:entry2 {:value "Do NOT replace this."}}} -> {:entry1 {:entry2 {:value "Do-NoT replace this either"}}}

The changed lines need to be updated in the file, or the whole file with these modifications should be written to standard output (including the lines that did not contain the keyword; line order and other details should be preserved).

Is it possible to accomplish this using bash scripting or command-line tools like grep and/or sed?

2
  • If you're trying to obfuscate passwords, you might not want to leave behind clues about which characters are upper/lower/digits -- change every char to "X". Commented Apr 17, 2020 at 20:18
  • No, not secrets like that. It is data that must not be exposed, but the casing and the character type should be kept so the logs can still be used for debugging at that level. Commented Apr 17, 2020 at 23:38

4 Answers 4

4
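awk '/KEYWORD/{
    # split the line at every double quote; even-numbered fields are the quoted text
    n=split($0,a,"\"")
    for(i=2;i<=n;i=i+2){
        gsub(/[A-Z]/,"X",a[i])
        gsub(/[a-z]/,"x",a[i])
        gsub(/[0-9]/,"0",a[i])
    }
    # print the fields again, putting the quotes back between them
    sep=""
    for (i=1;i<=n;i++){
        printf "%s%s",sep,a[i]
        sep="\""
    }
    printf "\n"
    next
}
1' file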

For example, on your updated input file

2020-04-18 15:01:12 [EVENT] :log-event-with-KEYWORD: {:entry1 {:entry2 {:value "Replace This"}}} -> {:entry1 {:entry2 {:value "Replace ALSO this."}}}
2020-04-18 15:01:13 [EVENT] :log-event-with-KEYWORD: {:entry1 {:entry2 {:value "REplace. THIS 12345"}}}
2020-04-18 15:01:15 [EVENT] :this_has--the-KEYWORD: {:entry1 {:entry2 {:value "[email protected]"}}} -> {:entry1 {:entry2 {:value "[email protected]"}}}
2020-04-18 15:01:18 [EVENT] :log-event-without-keyword: {:entry1 {:entry2 {:value "Do NOT replace this."}}} -> {:entry1 {:entry2 {:value "Do-NoT replace this either"}}}

This awk produces the desired output

2020-04-18 15:01:12 [EVENT] :log-event-with-KEYWORD: {:entry1 {:entry2 {:value "Xxxxxxx Xxxx"}}} -> {:entry1 {:entry2 {:value "Xxxxxxx XXXX xxxx."}}}
2020-04-18 15:01:13 [EVENT] :log-event-with-KEYWORD: {:entry1 {:entry2 {:value "XXxxxxx. XXXX 00000"}}}
2020-04-18 15:01:15 [EVENT] :this_has--the-KEYWORD: {:entry1 {:entry2 {:value "[email protected]"}}} -> {:entry1 {:entry2 {:value "[email protected]"}}}
2020-04-18 15:01:18 [EVENT] :log-event-without-keyword: {:entry1 {:entry2 {:value "Do NOT replace this."}}} -> {:entry1 {:entry2 {:value "Do-NoT replace this either"}}}
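
The awk command above depends on one idea: when a line is split at every double quote, the even-numbered pieces are the quoted strings. A minimal sketch of just that step, on a made-up sample line (the variable names `line` and `fields` are only for illustration), might look like this:

```shell
# Hypothetical one-line sample; the field layout is the point, not the data.
line='a "Bc 1" d "Ef-2" g'

# Split on the double quote: odd-numbered fields lie outside the quotes,
# even-numbered fields lie inside them.
fields=$(printf '%s\n' "$line" | awk '{
    n = split($0, a, "\"")
    for (i = 1; i <= n; i++)
        printf "%d:%s:%s\n", i, (i % 2 ? "outside" : "inside"), a[i]
}')
printf '%s\n' "$fields"
```

This only holds if the quotes on a line are balanced and never escaped (no `\"` inside a value); otherwise the inside/outside numbering shifts.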
4

Using sed:

sed -E '/KEYWORD/{
    :lower
    s/("[^"]*)[a-z]([^"]*")/\1_\2/
    t lower
    :upper
    s/("[^"]*)[A-Z]([^"]*")/\1-\2/
    t upper
    :digit
    s/("[^"]*)[0-9]([^"]*")/\1*\2/
    t digit
}
y/*_-/0xX/' infile

This runs the commands in the block /KEYWORD/{...} only on lines that contain the string KEYWORD.

The pattern ("[^"]*)[###]([^"]*") matches a ", then anything up to a lower-case [a-z], upper-case [A-Z], or digit [0-9] character, followed by anything up to the next quote.

Every part will loop over again and again until all those characters are converted: lower-case to _, upper-case to -, and digits to * (note: choose different characters if these may occur in your file). We do not replace directly with x, X, or 0 because the replacement would itself match [a-z], [A-Z], or [0-9] again, and the sed loop would never terminate.

After all that is done, y translates the characters *_- to 0xX. The y command sits outside the /KEYWORD/ block and acts on the whole line, so the placeholder characters must not occur anywhere in the file, not only inside the quotes.

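Add the -i option to the command above to write the changes back to the input file, e.g. sed -i -E .... With BSD/macOS sed, -i takes a backup-suffix argument, so it is written as sed -i '' -E ....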
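
The loop-and-placeholder idea can be tried in isolation. The following is only a sketch on a made-up line with a single quoted string; it handles lower-case letters only and assumes `_` does not occur anywhere in the input:

```shell
# Hypothetical sample with exactly one quoted string and no "_" in it.
line='id=7 "abc DEF" end'

# Each successful s/// replaces one lower-case letter inside the quotes with
# the placeholder "_"; "t lower" then jumps back to the label and tries again.
# The loop ends once no lower-case letter is left between the quotes.
masked=$(printf '%s\n' "$line" | sed -E '
    :lower
    s/("[^"]*)[a-z]([^"]*")/\1_\2/
    t lower
    y/_/x/
')
printf '%s\n' "$masked"   # id=7 "xxx DEF" end
```

Replacing with `x` directly inside the loop would never terminate, because each new `x` matches `[a-z]` again; the placeholder is what lets the loop run out of matches.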


Update: The command for the revised question:

sed -E '/KEYWORD/{
    :lower
    s/^(([^"]*("[^"]*"){0,1})*)("[^"]*)[a-z]([^"]*")/\1\4_\5/
    t lower
    :upper
    s/^(([^"]*("[^"]*"){0,1})*)("[^"]*)[A-Z]([^"]*")/\1\4+\5/
    t upper
    :digit
    s/^(([^"]*("[^"]*"){0,1})*)("[^"]*)[0-9]([^"]*")/\1\4*\5/
    t digit
}
y/*_+/0xX/' infile
5
  • "every part will loop over again and again until all those characters were converted" -> Does it loop again because we use the 't' command? Commented Apr 17, 2020 at 19:16
  • @StalinVigneshKumar Yes, after every successful s/// it jumps to the specified label. lower/upper/digit are the label names I chose here. Commented Apr 17, 2020 at 19:21
  • Thank you, I would've never figured this out. The only problem in this solution for me is that there can be some special characters in some of the fields, e.g. email (sorry for my inaccurate example), thus I need to figure out three characters that are not allowed as the input. Although, I must preserve any special characters in the input as-is to allow some level of debugging. Commented Apr 18, 2020 at 11:09
  • @αғsнιη I have updated the question. Commented Apr 18, 2020 at 12:25
  • @Sinipelto See the edit, and please read the answer a bit more closely: "note: chose different characters if these may occurred in your file;" Commented Apr 18, 2020 at 14:03
2

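Using perl:

$ perl -ne '
    if ( $_ =~ /KEYWORD/ ) {
        # greedy match: this captures only the last quoted string on the line
        ($first,$matched,$last) = ($1,$2,$3) if ( $_ =~ /^(.*)?\"(.*)\"(.*)$/ );
        # no brackets inside tr: tr/[a-z]/x/ would also turn [ and ] into x
        $matched =~ tr/a-z/x/; $matched =~ tr/A-Z/X/; $matched =~ tr/0-9/0/;
        print $first."\"".$matched."\"".$last."\n";
    } else { print }
' inputFile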

Edit: if the pattern occurs multiple times on a line, the following works:

$ perl -ne '
    if ( $_ =~ /KEYWORD/ ) {
        $line = $_; $val = 1;
        # "_" temporarily stands in for each processed quote,
        # so the input itself must not contain "_"
        while ($val) {
            ($first,$matched,$last) = ($1,$2,$3) if ( $line =~ m/(.*?)\"(.*?)\"(.*)$/ );
            $val = $line =~ s/\".*?\"/_/;
            $matched =~ tr/a-z/x/; $matched =~ tr/A-Z/X/; $matched =~ tr/0-9/0/;
            $matched = "_".$matched."_";
            $line = $first.$matched.$last;
        }
        $line =~ s/[_]*_/"/g;
        print "$line\n";
    } else { print }
' inputFile
0
<infile tr '\n' '#' | tr '"' '\n' | sed '2~2 {s/[A-Z]/X/g;s/[a-z]/x/g;s/[0-9]/0/g}' | sed '2~2 s/.*/"&"/' | tr -d '\n' | tr '#' '\n'

2020-04-18 15:01:12 [EVENT] :log-event-with-KEYWORD: {:entry1 {:entry2 {:value "Xxxxxxx Xxxx"}}} -> {:entry1 {:entry2 {:value "Xxxxxxx XXXX xxxx."}}}
2020-04-18 15:01:13 [EVENT] :log-event-with-KEYWORD: {:entry1 {:entry2 {:value "XXxxxxx. XXXX 00000"}}}
2020-04-18 15:01:15 [EVENT] :this_has--the-KEYWORD: {:entry1 {:entry2 {:value "[email protected]"}}} -> {:entry1 {:entry2 {:value "[email protected]"}}}
2020-04-18 15:01:18 [EVENT] :log-event-without-keyword: {:entry1 {:entry2 {:value "Xx XXX xxxxxxx xxxx."}}} -> {:entry1 {:entry2 {:value "Xx-XxX xxxxxxx xxxx xxxxxx"}}}

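The hash symbol # is used as a temporary linefeed marker; use any character that is not present in the input file. Note that this pipeline masks the quoted text on every line, not only on the KEYWORD lines, as the log-event-without-keyword entry in the output shows. The 2~2 address (every second line, starting at line 2) is a GNU sed extension.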
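
The marker trick can also be written without the GNU-only 2~2 address. The sketch below swaps the two sed calls for a single awk step that tests the record number; the two-line sample is made up, and, like the pipeline above, it masks every line and assumes # never occurs in the input:

```shell
# Hypothetical two-line sample without any "#" in it.
input='k "Ab1" z
m "Cd2" y'

# 1. join all lines with "#", 2. put every quote-delimited piece on its own
# line, 3. mask and re-quote the even-numbered pieces, 4. restore the newlines.
masked=$(printf '%s\n' "$input" | tr '\n' '#' | tr '"' '\n' |
    awk 'NR % 2 == 0 { gsub(/[A-Z]/, "X"); gsub(/[a-z]/, "x"); gsub(/[0-9]/, "0"); $0 = "\"" $0 "\"" }
         { printf "%s", $0 }' |
    tr '#' '\n')
printf '%s\n' "$masked"
```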
