We have a text file that we want to clean of "bad" characters. If we open it with vim (with ":set number"), two of its lines look like this:
    57000044 zo¥<9a>¥ge¥o¥graph¥i¥cal¥ly
    39999999 pariá¹<83>Å<9b>a

The "<9a>", "<83>" and "<9b>" parts are marked blue in vim, and these two lines look like this outside vim:
    $ sed '57000044,57000044!d' toclean.txt
    zo���ge�o�graph�i�cal�ly
    $ sed '57000044,57000044!d' toclean.txt | cat -vte -
    zoM-%M-^ZM-%geM-%oM-%graphM-%iM-%calM-%ly$
    $

and
    $ sed '39999999,39999999!d' toclean.txt
    pariṃśa
    $ sed '39999999,39999999!d' toclean.txt | cat -vte -
    pariM-aM-9M-^CM-EM-^[a$
    $

Question: How do we find out the hex codes of the characters shown as "<9a>", "<83>" and "<9b>"? Or of "¹" or "¥"...
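One way to look at the raw byte values is a hex dump with od. The sketch below is only an illustration on a small throwaway file: sample.txt is a hypothetical name, and its bytes are copied from the start of the cat -vte output above, where M-% stands for byte 0xA5 and M-^Z for byte 0x9A.

```shell
# Hypothetical sample file. The bytes come from the `cat -vte` output
# (M-% is 0xA5 = octal 245, M-^Z is 0x9A = octal 232), written as octal
# escapes so the snippet does not depend on the terminal encoding.
printf 'zo\245\232\245ge\n' > sample.txt

# Select the line the same way as in the question, then dump it as hex,
# one two-digit value per byte. tr squeezes od's padding into single spaces.
hex=$(sed '1,1!d' sample.txt | od -An -tx1 | tr -s ' \n' ' ')
echo "$hex"
```

The dump should show a5 9a a5 between 7a 6f ("zo") and 67 65 ("ge"). On the real file the same idea would be `sed '57000044,57000044!d' toclean.txt | od -An -tx1`. For line 39999999 the bytes are e1 b9 83 c5 9b, which is the valid UTF-8 encoding of "ṃ" and "ś", so "<83>" and "<9b>" there are probably not garbage but parts of multibyte characters that vim is displaying in a different encoding.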
We need the hex codes so that we can remove all of these characters from the file and make it cleaner. For example, this command removes the byte "\x09", the horizontal tab:
    sed -i 's/[\x09]//g' toclean.txt

We tried using "9A" and "A5" in hex, but it didn't help:
    $ sed '57000044,57000044!d' toclean.txt | sed 's/[\x9A]//g; s/[\xA5]//g'
    zo���ge�o�graph�i�cal�ly
    $
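A likely reason for this, assuming the shell runs in a UTF-8 locale, is that sed then matches characters rather than bytes, and a lone 0x9A or 0xA5 is not a valid UTF-8 character. A minimal sketch of a byte-wise removal in the C locale follows. It uses tr instead of sed, and sample.txt is again a hypothetical file built from the bytes shown by cat -vte, not the real toclean.txt.

```shell
# Hypothetical sample file with the same bytes as line 57000044
# (0xA5 = octal 245, 0x9A = octal 232), written as octal escapes.
printf 'zo\245\232\245ge\245o\245graph\245i\245cal\245ly\n' > sample.txt

# In the C locale every byte counts as one character, so tr can delete
# the two byte values directly, wherever they occur.
cleaned=$(LC_ALL=C tr -d '\232\245' < sample.txt)
echo "$cleaned"   # prints: zogeographically
```

With GNU sed, which supports the \xHH escapes used above, the equivalent would presumably be `LC_ALL=C sed 's/[\x9A\xA5]//g'`. One caveat: 0x9A and 0xA5 can also appear as continuation bytes inside valid UTF-8 characters elsewhere in the file, so deleting them everywhere may damage legitimate text such as the "pariṃśa" line.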