Having spent quite a lot of time working in software companies that deal in medium-to-large data sets, I've seen that a choke point in their processes is often prepping data files for loading into databases: hunting down formatting errors, peculiar whitespace or line-ending characters, wrangling with Unicode encodings, that sort of thing.
Everyone I've ever known deals with this in a bespoke manner because the requirements are so varied and often unique to the companies involved.
I normally just fiddle about in a combination of hex editors, Excel, PowerShell, and SQL to get the job done. But this is such a perennial problem that I find it hard to believe there aren't already some standards in place to take the basic grunt-work out of the process.
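To give a concrete sense of the grunt-work, here's a minimal sketch of the sort of checks I end up re-implementing every time (in Python rather than PowerShell purely for brevity; `input.csv` is just a placeholder path):

```python
import sys

def scan(path: str) -> None:
    """Report the usual pre-load nuisances in a raw data file."""
    with open(path, "rb") as f:
        raw = f.read()

    # Byte-order mark that Excel likes to prepend to "UTF-8" exports
    if raw.startswith(b"\xef\xbb\xbf"):
        print("UTF-8 BOM present")

    # Bytes that won't decode as UTF-8 at all
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as e:
        print(f"non-UTF-8 byte at offset {e.start}: {raw[e.start]:#04x}")

    # Mixed line endings (CRLF vs bare LF vs bare CR)
    crlf = raw.count(b"\r\n")
    lf = raw.count(b"\n") - crlf
    cr = raw.count(b"\r") - crlf
    if sum(x > 0 for x in (crlf, lf, cr)) > 1:
        print(f"mixed line endings: CRLF={crlf}, LF={lf}, CR={cr}")

    # Peculiar whitespace: trailing blanks, non-breaking spaces
    for i, line in enumerate(raw.splitlines(), start=1):
        if line != line.rstrip():
            print(f"line {i}: trailing whitespace")
        if b"\xc2\xa0" in line:  # U+00A0 non-breaking space, UTF-8 encoded
            print(f"line {i}: non-breaking space")

if __name__ == "__main__":
    scan(sys.argv[1] if len(sys.argv) > 1 else "input.csv")
```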
Is there a standard technique tailored to this kind of data-file cleaning?