CSV_1.csv has the structure:
    ABC DEF GHI JKL MNO PQR
CSV_2.csv has the structure:
    XYZ DEF ABC
CSV_2.csv is a lot smaller than CSV_1.csv, and a lot of the rows that exist in CSV_2.csv also appear in CSV_1.csv. I want to figure out whether there are rows that exist in CSV_2.csv but not in CSV_1.csv.
These files are not sorted.
The bigger CSV has close to 10 million rows; the smaller one has around 7 million rows.
How would I go about doing this? I tried Python, but taking each row from CSV_2.csv and comparing it against the 10 million rows in CSV_1.csv takes a lot of time.
Here is what I tried in Python:

    with open('old.csv', 'r') as t1, open('new.csv', 'r') as t2:
        fileone = t1.readlines()
        filetwo = t2.readlines()

    with open('update.csv', 'a') as outFile:
        for line in filetwo:
            if line not in fileone:
                outFile.write(line)

awk comes to mind. What would the exact code be for awk?
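For comparison, the same idea can be sketched in Python with a set lookup instead of a list scan, so each row from the smaller file is checked in roughly constant time. This is only a sketch under several assumptions that are not confirmed above: both files are comma-delimited with a header row, a "matching row" means the shared columns ABC and DEF are equal, and the keys of the larger file fit in memory. The function name `rows_missing_from` and its parameters are made up for illustration.

```python
import csv


def rows_missing_from(big_path, small_path, out_path, keys=("ABC", "DEF")):
    """Write rows of small_path whose key columns never occur in big_path.

    Assumes comma-delimited files with a header row containing `keys`.
    Returns the number of rows written.
    """
    # Build a set of key tuples from the larger file (one pass, hashed lookups).
    with open(big_path, newline="") as big:
        seen = {tuple(row[k] for k in keys) for row in csv.DictReader(big)}

    missing = 0
    with open(small_path, newline="") as small, \
         open(out_path, "w", newline="") as out:
        reader = csv.DictReader(small)
        writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Keep only rows whose key is absent from the larger file.
            if tuple(row[k] for k in keys) not in seen:
                writer.writerow(row)
                missing += 1
    return missing
```

Called as `rows_missing_from('CSV_1.csv', 'CSV_2.csv', 'update.csv')`, this reads each file once instead of rescanning the larger file for every row of the smaller one.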