I would like to search for text with accents in files. I know that I can use grep for searching regular text:
grep -rnw './' -e 'KORONA'
...but it doesn't work for words with accented characters, like KORONAVÍRUS or obmedzená.
Any recommendation?
If the encoding of all the files is the same, you just need to write the search string in that encoding. That leaves two possible cases:
If the encoding of the command line (or wherever the command is executed), which is probably set by one of the LC_* locale variables, is the same as the encoding of all the files, just grep as usual:
grep -rn 'KORONAVÍRUS, obmedzená.'
Use the -w option only if you want to match whole words (use -x to match a whole line).
If the encoding of the files differs from the encoding of the command line, convert the search string to the files' encoding:
$ echo 'KORONAVÍRUS, obmedzená.' >orig
$ grep -ran "$(cat orig | iconv -t CP1252)"
Here, the -a option allows grep to search inside files whose encoding differs from the locale's and that may therefore be detected as binary.
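The second case can be tried end to end with a throwaway file. This is only a minimal sketch, assuming the command line (and the script itself) is UTF-8, the files are CP1252, and GNU grep and iconv are available. The function name grep_encoded and the sample file are hypothetical, made up for illustration.

```shell
# Hypothetical helper: search files stored in ENCODING for a pattern typed in UTF-8.
# Usage: grep_encoded ENCODING PATTERN PATH...
grep_encoded() {
    enc=$1; pat=$2; shift 2
    # Re-encode the pattern from UTF-8 (assumed) into the files' encoding.
    converted=$(printf '%s' "$pat" | iconv -f UTF-8 -t "$enc") || return 2
    # LC_ALL=C: compare raw bytes; -a: treat files as text; -F: fixed string, not a regex.
    LC_ALL=C grep -raFn -e "$converted" -- "$@"
}

# Demo: build a CP1252 sample file, then search it.
demo_dir=$(mktemp -d)
printf 'KORONAVÍRUS, obmedzená.\n' | iconv -f UTF-8 -t CP1252 > "$demo_dir/sample.txt"
grep_encoded CP1252 'obmedzená' "$demo_dir"
```

The matching lines are printed in the files' encoding, so on a UTF-8 terminal the accented letters look garbled unless the output is piped through iconv -f CP1252 -t UTF-8.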
If the files may use different encodings, there is no general solution: there is no reliable way to auto-detect a file's encoding.
It is not possible to search inside a list of files if the files don't have a uniform encoding.
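If the set of encodings that might occur is known in advance, one partial workaround (not a general solution) is to repeat the search once per candidate encoding. The following is a rough sketch under those assumptions: a UTF-8 command line, GNU grep and iconv, and a hand-picked candidate list. The function name grep_any_encoding and the sample files are hypothetical.

```shell
# Hypothetical helper: try the pattern in each candidate encoding in turn.
# Usage: grep_any_encoding "ENC1 ENC2 ..." PATTERN PATH...
grep_any_encoding() {
    encodings=$1; pat=$2; shift 2
    for enc in $encodings; do
        # Skip encodings that iconv does not know or that cannot represent the pattern.
        converted=$(printf '%s' "$pat" | iconv -f UTF-8 -t "$enc" 2>/dev/null) || continue
        # -l lists matching files; prefix each with the encoding that produced the match.
        LC_ALL=C grep -raFl -e "$converted" -- "$@" | sed "s|^|$enc: |"
    done
}

# Demo: one UTF-8 file and one CP1252 file in the same directory.
mixed_dir=$(mktemp -d)
printf 'KORONAVÍRUS, obmedzená.\n' > "$mixed_dir/utf8.txt"
printf 'KORONAVÍRUS, obmedzená.\n' | iconv -f UTF-8 -t CP1252 > "$mixed_dir/cp1252.txt"
grep_any_encoding "UTF-8 CP1252" 'obmedzená' "$mixed_dir"
```

A match only shows that the byte sequence occurs in the file, not which encoding the file really uses. Several single-byte encodings assign the same bytes to these letters, so the same file can be reported under more than one label.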
Related:
How to use grep/ack with files in arbitrary encoding?
grep -a "$(echo 'KORONAVÍRUS, obmedzená.' | iconv -t CP1252)" also works :)
KORONAV[[=I=]]RUS.will fail, as the "any character" has to be a character valid in the encoding being used to run the command.