I have two lists. One is a list of MD5 and SHA1 hashes of the files on a computer I have (ListA). The other is a list of MD5 and SHA1 hashes I downloaded from NSRL (ListB). It's a compilation of MD5 and SHA1 hashes of files included in many different applications.
I am trying to find a quick way to compare these lists to each other.
For a sense of scale, the hashes from the system are in a 7.2 GB text file and the NSRL hash list is approximately 20 GB. I have a system with 32 GB of RAM to do the processing, so it should have enough memory to load both files if need be.
I've looked into Except, and I've also considered reading each line from ListA and comparing it against ListB, but there has to be a better way than this. Any ideas?
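To make the question concrete, here is a rough sketch of the kind of set-based lookup I have in mind, written in Python purely for illustration. It assumes each file has already been reduced to one hex hash per line, which is a simplification of the real files. The function names (`load_hashes`, `find_known`, `compare_files`) are made up for this example, and I have not checked whether a set built from the 7.2 GB list actually fits in 32 GB once the language's per-object overhead is counted.

```python
from typing import Iterable, Iterator, Set


def load_hashes(lines: Iterable[str]) -> Set[bytes]:
    """Build a set of raw digests from lines holding one hex hash each."""
    hashes = set()
    for line in lines:
        text = line.strip()
        if text:
            # Store raw bytes rather than hex text: half the size, and
            # upper/lower case hex compare equal after conversion.
            hashes.add(bytes.fromhex(text))
    return hashes


def find_known(lines: Iterable[str], wanted: Set[bytes]) -> Iterator[str]:
    """Yield every hash from the streamed lines that is also in `wanted`."""
    for line in lines:
        text = line.strip()
        # Set membership is a hash-table lookup, so each line costs
        # roughly constant time instead of a scan of the other list.
        if text and bytes.fromhex(text) in wanted:
            yield text.lower()


def compare_files(list_a_path: str, list_b_path: str) -> Set[str]:
    """Load the smaller list (ListA) into memory, then stream ListB past it."""
    with open(list_a_path) as list_a:
        system_hashes = load_hashes(list_a)
    with open(list_b_path) as list_b:
        return set(find_known(list_b, system_hashes))
```

The idea is that only the smaller list is held in memory while the 20 GB list is read once from disk, line by line, so the larger file never has to be loaded whole.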
Also, this is a comparison of the hashes from a machine against known hashes from a hash database. It's a pretty common practice in forensics (from what I understand), so I'm open to suggestions of existing applications that already do this.