1

I have a database controlfile on a Linux system that I want to filter out (for training purposes). However, I am unable to find a proper way to get rid of "block-like" characters:

▒▒▒▒ ▒▒▒▒ ▒▒▒▒ ▒▒▒▒▒ ▒▒▒{ ▒▒▒▒▒▒9 ▒▒▒▒ ▒▒▒▒▒ 

I've tried many ways, but they do not get rid of the block chars:

strings o1_mf_d3rrgv0l_.ctl|grep -vE '▒'
strings o1_mf_d3rrgv0l_.ctl|grep -vE '@|?|+|(|)|<|>'
strings o1_mf_d3rrgv0l_.ctl|grep -vE '@|\?|\+|(|)|<|>'
strings o1_mf_d3rrgv0l_.ctl|grep -vE '@|\?|\+|\(|\)|\<|\>'
strings o1_mf_d3rrgv0l_.ctl|grep -vE '@|\?|\+|\(|\)|\<|\>|!'
strings o1_mf_d3rrgv0l_.ctl|grep -vE '@|\?|\+|\(|\)|\<|\>|!|\^|\%|\`'
strings o1_mf_d3rrgv0l_.ctl|grep -vE '@|\?|\+|\(|\)|\<|\>|!|\^|\%|\`|\$'
strings o1_mf_d3rrgv0l_.ctl|grep -v '[^[:print:]]'
strings o1_mf_d3rrgv0l_.ctl|grep -v '[[:print:]]'
strings o1_mf_d3rrgv0l_.ctl|grep '[[:print:]]'
strings o1_mf_d3rrgv0l_.ctl|grep -v '[[:cntrl:]]'
strings o1_mf_d3rrgv0l_.ctl|grep -v '\x{09}'
strings o1_mf_d3rrgv0l_.ctl|grep -vP '[^\x00-\x7f]'
strings o1_mf_d3rrgv0l_.ctl|tr -dc '\007-\011\012-\015\040-\376'
strings -1 o1_mf_d3rrgv0l_.ctl|tr -dc '\007-\011\012-\015\040-\376'
strings o1_mf_d3rrgv0l_.ctl|tr -dc '[:print:]\n\r'
strings o1_mf_d3rrgv0l_.ctl|grep -vE '@|\?|\+|\(|\)|\<|\>|!|\^|\%|\`|\$'
strings o1_mf_d3rrgv0l_.ctl|grep -vE '@|\?|\+|\(|\)|\<|\>|!|\^|\%|\`|\;|\:|\=|\$'
strings o1_mf_d3rrgv0l_.ctl|grep -vE '@|\?|\+|\(|\)|\<|\>|!|\^|\%|\`|\;|\:|\=|\$|\"'
strings o1_mf_d3rrgv0l_.ctl|grep -vE '@|\?|\+|\(|\)|\<|\>|!|\^|\%|\`|\;|\:|\=|\$|\"|\&|\#'

  • grep -v is not the right tool, as it removes entire lines that contain the regular expression. You could try tr as shown at unix.stackexchange.com/questions/201751/…, followed by sed. Or start with cat -v, which represents non-printable characters like ^A, then also filter them out with sed. The problem with cat -v is that it doesn't distinguish between a ^ character and an unprintable character. I am sure there are other solutions. Commented Jan 26, 2021 at 9:14
  • Welcome, you want to remove the characters or remove the lines that contain them? Commented Jan 26, 2021 at 10:03
  • yes, I want to remove the lines that contain these weird brackets. Commented Jan 26, 2021 at 11:36
  • cat -v displays a following output: ^@▒^@^@▒▒^@^@^@^@^@^@^@^@^@^@<▒^@^@^@^@@^@^@^@^D~z{|}^@^@^▒^@^@^@^@^@^@^@^@^@^@ Commented Jan 26, 2021 at 11:40

2 Answers

1

It may be easier to remove anything except "known good" characters. e.g. to limit output to standard ASCII characters you could use

tr -dc ' -~\012\015'

That will only keep characters between SPACE and ~ (character 126) and the CR/LF characters. Everything else will be removed.
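As a minimal sketch of the "known good" approach, the snippet below runs the deletion on a small made-up byte sequence rather than on the real controlfile. The `sample` function and its contents are assumptions for illustration only. Note that tr takes a plain character list, not a regular expression, so the set is written as ' -~\012\015' with no brackets or ^.

```shell
# Hypothetical sample: "foo", two high bytes (0xE2 0x96), a BEL (0x07), "bar", newline.
sample() { printf 'foo\342\226\007bar\n'; }

# -d deletes, -c complements the set: every byte that is NOT
# printable ASCII (SPACE through ~), LF, or CR is removed.
# LC_ALL=C makes tr treat the input as single bytes.
cleaned=$(sample | LC_ALL=C tr -dc ' -~\012\015')

printf '%s\n' "$cleaned"
```

On the real file, the same filter would be appended to the strings pipeline, e.g. `strings o1_mf_d3rrgv0l_.ctl | LC_ALL=C tr -dc ' -~\012\015'`.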

Alternatively you might want to replace them with another character, e.g. a space

tr -c ' -~\012\015' ' '

which will keep any indentation levels
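A small sketch of the replacement variant, again on made-up input (the `sample` function is hypothetical). The set is written without brackets because tr expects a character list, not a regular expression.

```shell
# Hypothetical sample: four spaces of indentation, "foo",
# three high bytes (0xE2 0x96 0x92), "bar", newline.
sample() { printf '    foo\342\226\222bar\n'; }

# -c complements the set; each byte outside printable ASCII, LF,
# and CR is replaced by one space instead of being deleted.
padded=$(sample | LC_ALL=C tr -c ' -~\012\015' ' ')

printf '%s\n' "$padded"
```

Because tr works byte by byte here, a multi-byte character turns into several spaces (three in this sample), so the line length in bytes stays the same.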

Finally, you might be seeing this because of locale settings, e.g. if the OS is configured for UTF-8 but the terminal is not.

So setting LANG=C before running the command might change the output:

LANG=C strings o1_mf_d3rrgv0l_.ctl 

That'll change what the strings command considers to be a printable character.
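A follow-up comment on the question says the goal is to drop whole lines that contain the odd characters, not just the characters themselves. One way to combine the locale idea with the "known good" range is sketched below. It is an assumption-laden example, not something tested in the thread: the `sample` input is made up, and the behaviour of bracket ranges depends on running grep in the C locale.

```shell
# Hypothetical input: two clean lines and one line containing high bytes.
sample() { printf 'good line\nbad \342\226\222 line\nalso good\n'; }

# In the C locale the range ' -~' covers bytes 0x20 through 0x7E, so
# '[^ -~]' matches any byte outside printable ASCII.
# -v then drops every line that contains such a byte.
kept=$(sample | LC_ALL=C grep -v '[^ -~]')

printf '%s\n' "$kept"
```

Lines containing a tab are dropped as well, since tab is outside the SPACE-to-~ range. No GNU-only options such as -P are used, which may matter on systems where grep lacks them.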

  • can't vote yet, but this actually works! Many thanks! Commented Feb 1, 2021 at 16:46
  • Which version worked? The tr or the LANG=C? Or both? :-) Commented Feb 2, 2021 at 0:58
0

After an interesting journey, I hope this answers your question. With GNU grep:

Sample file.txt:

▒▒▒▒ ▒▒▒▒ ▒▒▒▒
foo bar
@▒^@^@▒▒^@^@^@^@^@^@^@^@^@^@<▒^@^@^@^@@^@^@^@^D~z{|}^@^@^▒^@^@^@^@^@^@^@^@^@^@

$ grep -v $(printf %b \\U2592) file.txt
foo bar
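This pattern only matches the literal character U+2592. A possible reason it might not match in a binary file (an assumption, not something confirmed in this thread) is that a terminal can draw the same block glyph for bytes it cannot decode, in which case the file never contains U+2592 at all. A quick way to check is to look at the raw bytes with od; the byte values below are made-up examples.

```shell
# The real U+2592 character is the three-byte UTF-8 sequence e2 96 92.
real=$(printf '\342\226\222' | od -An -tx1 | tr -d ' \n')

# A lone high byte (hypothetical 0xa9) that a terminal may also show as a block.
stray=$(printf '\251' | od -An -tx1 | tr -d ' \n')

printf 'real U+2592 bytes: %s\n' "$real"
printf 'stray byte:        %s\n' "$stray"
```

If the od output for the real file shows bytes other than e2 96 92 where the blocks appear, a byte-range filter is more likely to help than matching one specific character.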
  • unfortunately grep@AIX does not have the -P parameter available Commented Jan 26, 2021 at 11:54
  • @user452948 it was not necessary, it remained from previous tests. Remove it and try. Commented Jan 26, 2021 at 12:01
  • It has no effect: BEFORE: strings o1_mf_d3rrgv0l_.ctl |head -10 ~z{|} H▒DB_CHRIS 7aM▒▒;▒Q ▒'y?Yܣ ▒;▒^ ▒▒X?Yܩ= (▒(A'y ?S▒▒Fr▒ ▒▒▒▒ ▒▒▒▒ . . and AFTER: strings o1_mf_d3rrgv0l_.ctl |head -10 | grep -v $(printf %b \\U2592) ~z{|} H▒DB_CHRIS 7aM▒▒;▒Q ▒'y?Yܣ ▒;▒^ ▒▒X?Yܩ= (▒(A'y ?S▒▒Fr▒ ▒▒▒▒ ▒▒▒▒ Commented Jan 26, 2021 at 12:10
