1. Files that are already in UTF-8 should not be changed 1
When I recently had this issue, I solved it by first finding all files in need of conversion.
I did this by excluding the files that need no conversion: binary files, pure ASCII files (which by definition are already valid UTF-8), and files that already contain at least some valid non-ASCII UTF-8 characters.
In short, I recursively searched for the files that probably should be converted:
$ find . -type f -exec sh -c 'for n; do file -i "$n" | grep -Ev "binary|us-ascii|utf-8"; done' sh {} +
I had a subdirectory tree containing some 300–400 files. About half a dozen of them turned out to be wrongly encoded, and typically returned responses like:
./<some-path>/plain-text-file.txt: text/plain; charset=iso-8859-1
./<some-other-path>/text-file.txt: text/plain; charset=unknown-8bit
Note how the encoding was either iso-8859-1, or unknown-8bit.
This makes sense – any non-ASCII Windows-1252 character is either a valid ISO 8859-1 character, or one of the 27 printable Windows-1252 characters in the 128–159 (0x80–0x9F) range, for which ISO 8859-1 defines no printable characters at all.
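This can be checked directly. The following is only a sketch, with made-up sample file names: byte 0x93 is a Windows-1252 curly quote with no printable ISO 8859-1 counterpart, while byte 0xE9 is é, valid in both encodings.

```shell
# Hypothetical sample files, for illustration only.
# 0x93/0x94 are Windows-1252 curly quotes; ISO 8859-1 defines no
# printable character at those positions, so file typically
# falls back to charset=unknown-8bit:
printf 'say \x93hi\x94\n' > w1252-sample.txt
file -i w1252-sample.txt

# 0xE9 is é in ISO 8859-1 (and in Windows-1252), so this file
# is typically reported as charset=iso-8859-1:
printf 'caf\xe9\n' > latin1-sample.txt
file -i latin1-sample.txt
```

The exact wording of the output may vary between versions of the file command, but the two charset labels should match the two kinds of responses shown above.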
1. a. A caveat with the find . -exec solution 2
A problem with the find . -exec solution is that it can be very slow – a problem that grows with the size of the subdirectory tree under scrutiny.
In my experience, it might be faster – potentially much faster – to run a number of commands instead of the single command suggested above, as follows:
$ file -i * | grep -Ev "binary|us-ascii|utf-8"
$ file -i */* | grep -Ev "binary|us-ascii|utf-8"
$ file -i */*/* | grep -Ev "binary|us-ascii|utf-8"
$ file -i */*/*/* | grep -Ev "binary|us-ascii|utf-8"
$ …
Continue increasing the depth in these commands until the response is something like this:
*/*/*/*/*/*/*: cannot open `*/*/*/*/*/*/*' (No such file or directory)
Once you see the cannot open … (No such file or directory) message, it is clear that the entire subdirectory tree has been searched.
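The manual depth-increase can also be scripted. The loop below is only a sketch of the same idea, not part of the original recipe: it deepens the glob one level per iteration and stops once the pattern no longer matches anything.

```shell
#!/bin/sh
# Sketch: run file -i | grep -Ev at depth *, then */*, then */*/* …
# until the glob pattern matches nothing at all.
pattern='*'
while ls -d $pattern > /dev/null 2>&1; do     # does the glob still match?
  # grep may find nothing at some depths; ignore its exit status.
  file -i $pattern | grep -Ev "binary|us-ascii|utf-8" || true
  pattern="$pattern/*"                        # descend one level
done
```

Note that $pattern is deliberately left unquoted so that the shell expands the glob; directories that match are reported by file as inode/directory; charset=binary and are therefore filtered out, just as in the commands above.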
2. Convert the culprit files
Now that all suspicious files have been found, I prefer to use a text editor to help with the conversion, instead of using a command line tool like recode.
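If you do prefer the command line after all, a single file can be converted with iconv. This is only a sketch – the file name is taken from the example output above, and the source charset should be adjusted to whatever file reported (Windows-1252 is a superset of ISO 8859-1, so it is usually a safe choice for either response):

```shell
# Sketch: convert one file from Windows-1252 to UTF-8, and only
# overwrite the original if the conversion succeeded.
iconv -f WINDOWS-1252 -t UTF-8 plain-text-file.txt > plain-text-file.txt.new \
  && mv plain-text-file.txt.new plain-text-file.txt
```

Afterwards, file -i on the converted file should report utf-8 (or us-ascii, if the file happens to contain no non-ASCII characters after all).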
2. a. On Windows, consider using Notepad++
On Windows, I like to use Notepad++ for converting files.
Have a look at this excellent post if you need help on that.
2. b. On Linux or macOS, consider using Visual Studio Code
On Linux and macOS, try VS Code for converting files. I've given a few hints in this post.
References
1 Section 1 relies on using the file command, which unfortunately isn't completely reliable. As long as all your files are smaller than 64 kB, there shouldn't be any problem. For files (much) larger than 64 kB, there is a risk that non-ASCII files will falsely be identified as pure ASCII files. The fewer non-ASCII characters in such files, the bigger the risk that they will be wrongly identified. For more on this, see this post and its comments.
2 Subsection 1. a. is inspired by this answer.