I am splitting a CSV into two CSVs based on the value in one column of the original CSV. This code works, but it takes about an hour to run on a CSV with about 10,000 records. I tried enumerating the list, but I don't think that was the right approach to speeding it up.
I am very new to programming and would appreciate it if someone could explain where I should focus my next efforts to make this faster. I know that fewer lines is usually better, but I don't understand how to structure the loop when creating two separate CSVs. Is the loop even the problem here?
```python
import csv

myList = ['2','12','20','33'...]

# First pass: keep rows whose 'Column 10' value is in myList
with open(originalCSV, 'rb') as f:
    reader = csv.DictReader(f)
    rows = [row for row in reader if row['Column 10'] in myList]
    # For every row in rows, this reopens the output file and writes ALL of rows
    for row in rows:
        with open(inmylistCSV, 'wb') as w:
            fieldnames = ['Column 1', 'Column 2', 'Column 5', 'Column 10']
            csvwriter = csv.DictWriter(w, fieldnames=fieldnames)
            csvwriter.writeheader()
            csvwriter.writerows(rows)

# Second pass: re-read the original file, keep rows NOT in myList
with open(originalCSV, 'rb') as f:
    reader = csv.DictReader(f)
    rows = [row for row in reader if row['Column 10'] not in myList]
    for row in rows:
        with open(notinmylistCSV, 'wb') as w:
            fieldnames = ['Column 1', 'Column 2', 'Column 5', 'Column 10']
            csvwriter = csv.DictWriter(w, fieldnames=fieldnames)
            csvwriter.writeheader()
            csvwriter.writerows(rows)
```
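For comparison, here is a minimal single-pass sketch, assuming Python 3 (the code above uses Python 2's `'rb'`/`'wb'` modes) and that the values in `myList` are compared as strings. `split_csv` and its parameter names are hypothetical. It reads the original file once, opens each output file once, and writes each row exactly once, rather than reopening an output file and rewriting every collected row on each loop iteration. It also assumes the source has extra columns beyond the four listed, so it passes `extrasaction="ignore"` to `csv.DictWriter` to drop them.

```python
import csv


def split_csv(source_path, in_path, out_path, keep_values,
              key_column="Column 10",
              fieldnames=("Column 1", "Column 2", "Column 5", "Column 10")):
    """Hypothetical helper: split source_path into two CSVs in one pass.

    Rows whose key_column value is in keep_values go to in_path; all
    other rows go to out_path. Only the columns in fieldnames are written.
    """
    keep = set(keep_values)  # set membership is O(1) on average; a list is O(n)
    fields = list(fieldnames)
    with open(source_path, newline="") as src, \
         open(in_path, "w", newline="") as f_in, \
         open(out_path, "w", newline="") as f_out:
        reader = csv.DictReader(src)
        # extrasaction="ignore" drops source columns not listed in fields
        w_in = csv.DictWriter(f_in, fieldnames=fields, extrasaction="ignore")
        w_out = csv.DictWriter(f_out, fieldnames=fields, extrasaction="ignore")
        w_in.writeheader()
        w_out.writeheader()
        for row in reader:
            # Each row is read once and written to exactly one output file
            target = w_in if row[key_column] in keep else w_out
            target.writerow(row)
```

Usage would look like `split_csv(originalCSV, inmylistCSV, notinmylistCSV, myList)`. The total work grows linearly with the number of rows, whereas writing all collected rows once per row grows roughly with the square of the row count.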