I have a file of genomic data that is approximately 5 million lines long and should contain only the characters A, T, C, and G. I know how large the file should be, but it is slightly larger than that, which means either something went wrong in an analysis or some lines contain something other than genomic data.
Is there a way to find every line that contains something other than A, T, C, or G? Given the nature of the file, no other letters, spaces, numbers, or symbols should be present. So far I have been searching for one symbol at a time, so I am hoping there is an easier way.
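
To make the goal concrete, here is a minimal Python sketch of the kind of check I have in mind. It assumes one sequence per line, uppercase letters only, and a plain-text file; the function name `find_invalid_lines` and the sample data are made up for illustration. It reports the line number and content of every line that has at least one character other than A, T, C, or G.

```python
import re

# Matches any single character that is not an uppercase A, T, C, or G.
INVALID = re.compile(r"[^ATCG]")


def find_invalid_lines(lines):
    """Yield (line_number, text) for each line containing a character other than A, T, C, G."""
    for number, line in enumerate(lines, start=1):
        # Strip only the newline, so a stray carriage return ("\r") is still reported.
        text = line.rstrip("\n")
        if INVALID.search(text):
            yield number, text


# Small in-memory sample (hypothetical data) standing in for the real file.
sample = ["ATCG\n", "ATNG\n", "GG CA\n", "\n", "atcg\n"]
for number, text in find_invalid_lines(sample):
    print(f"{number}: {text!r}")
# 2: 'ATNG'
# 3: 'GG CA'
# 5: 'atcg'
```

For the real file, the same function could be given an open file handle, for example `open(path, newline="")`, so the 5 million lines are read one at a time instead of being loaded into memory. Note that under these assumptions an empty line is not reported and lowercase bases are; either choice may need adjusting. A one-line shell equivalent would be `grep -n '[^ATCG]' file`.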