slow python script when looping through a list

Question

I am splitting a csv into two csvs, based on a value in a column in the original csv. This code works, but takes about an hour to run on a csv with about 10000 records. I have tried enumerating the list, but I don't think that was the correct approach to speeding this up.

I am extremely slow at this and very new to programming, and I would appreciate it if someone could explain where to focus my next efforts to make this faster. I know the least number of lines is best, but I don't understand how to loop through the rows when creating two separate csvs. Is the loop even the issue here?

    myList = ['2','12','20','33'...]
    with open(originalCSV, 'rb') as f:
        reader = csv.DictReader(f)
        rows = [row for row in reader if row['Column 10'] in myList]
        for row in rows:
            with open(inmylistCSV, 'wb') as w:
                fieldnames = ['Column 1', 'Column 2', 'Column 5', 'Column 10']
                csvwriter = csv.DictWriter(w, fieldnames=fieldnames)
                csvwriter.writeheader()
                csvwriter.writerows(rows)
    with open(originalCSV, 'rb') as f:
        reader = csv.DictReader(f)
        rows = [row for row in reader if row['Column 10'] not in myList]
        for row in rows:
            with open(notinmylistCSV, 'wb') as w:
                fieldnames = ['Column 1', 'Column 2', 'Column 5', 'Column 10']
                csvwriter = csv.DictWriter(w, fieldnames=fieldnames)
                csvwriter.writeheader()
                csvwriter.writerows(rows)

AlanSTACK · Accepted Answer · 2023-02-04 18:07:23Z

The issue is that you read and filter the 10,000 records twice, once for each output file, so you do twice the work: 20,000 row checks instead of 10,000.

    # This is what you're doing
    for x in range(10000):
        if is_odd(x):
            print('I am odd')
    for x in range(10000):
        if is_even(x):
            print('I am even')

A simple fix is to combine your logic into a single loop:

    # This is what you should be doing
    for x in range(10000):
        if is_odd(x):
            print('I am odd')
        else:
            print('I am even')

So, in conclusion, there are 2 things you should do right now:

Combine the following lines logically, so that the file is read only once:

    rows = [row for row in reader if row['Column 10'] in myList]
    rows = [row for row in reader if row['Column 10'] not in myList]

Optimize the CSV-writing portion of the code, which is currently duplicated for each output file:

    with open(notinmylistCSV, 'wb') as w:  # and likewise for inmylistCSV
        fieldnames = ['Column 1', 'Column 2', 'Column 5', 'Column 10']
        csvwriter = csv.DictWriter(w, fieldnames=fieldnames)
        csvwriter.writeheader()
        csvwriter.writerows(rows)
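
A minimal sketch of both steps, written for Python 3 rather than the Python 2 style used in the question. If the `with open(...)` block really is nested under `for row in rows:`, as the question's code appears to be, each output file is reopened and fully rewritten once per row, so moving the write out of the loop is probably the larger saving. The helper names `partition_rows` and `write_rows`, the sample data, and the use of a set instead of a list are assumptions for illustration, not part of the original code.

```python
import csv
import io

FIELDNAMES = ['Column 1', 'Column 2', 'Column 5', 'Column 10']


def partition_rows(rows, wanted, key='Column 10'):
    """Split rows into (matching, non_matching) in a single pass."""
    matching, non_matching = [], []
    for row in rows:
        # One membership test per row decides which list it joins.
        (matching if row[key] in wanted else non_matching).append(row)
    return matching, non_matching


def write_rows(out_file, rows, fieldnames=FIELDNAMES):
    """Write the header and all rows once, outside any per-row loop."""
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# Tiny in-memory stand-in for the original CSV (made-up data).
sample = io.StringIO(
    "Column 1,Column 2,Column 5,Column 10\n"
    "a,b,c,2\n"
    "d,e,f,7\n"
    "g,h,i,33\n"
)
# Only the four values visible in the question; a set gives fast lookups.
wanted = {'2', '12', '20', '33'}
in_rows, out_rows = partition_rows(csv.DictReader(sample), wanted)

# StringIO buffers stand in for the two output files.
in_buf, out_buf = io.StringIO(), io.StringIO()
write_rows(in_buf, in_rows)
write_rows(out_buf, out_rows)
```

With real files, the buffers would be replaced by `open(path, 'w', newline='')` handles. Holding both lists in memory is fine for about 10,000 rows; for much larger inputs a streaming approach that writes each row as it is read would use less memory.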

Chris Curvey · Accepted Answer · 2016-03-09 17:02:17Z

Why not just read through the original CSV once and distribute the rows to the other CSVs as you go?

    import csv

    myList = ['2','12','20','33'...]
    fieldnames = ['Column 1', 'Column 2', 'Column 5', 'Column 10']

    # Open both output files once and keep them open while streaming the input.
    with open(inmylistCSV, 'wb') as in_list, open(notinmylistCSV, 'wb') as not_in_list:
        in_list_csvwriter = csv.DictWriter(in_list, fieldnames=fieldnames)
        in_list_csvwriter.writeheader()
        not_in_list_csvwriter = csv.DictWriter(not_in_list, fieldnames=fieldnames)
        not_in_list_csvwriter.writeheader()

        with open(originalCSV, 'rb') as f:
            reader = csv.DictReader(f)
            for row in reader:
                # Route each row to exactly one of the two writers.
                if row['Column 10'] in myList:
                    in_list_csvwriter.writerow(row)
                else:
                    not_in_list_csvwriter.writerow(row)
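
For anyone on Python 3, the same single-pass idea might look like the sketch below. The function name `split_csv`, its parameters, the sample file, and the extra `Column 3` are hypothetical. `extrasaction='ignore'` is added on the assumption that the source file may contain columns not listed in `fieldnames`, and the list is converted to a set so membership tests stay cheap if it grows.

```python
import csv
import os
import tempfile


def split_csv(src_path, in_path, out_path, wanted, fieldnames, key='Column 10'):
    """Stream src_path once, routing each row to one of two output CSVs.

    Returns (rows written to in_path, rows written to out_path).
    """
    wanted = set(wanted)  # constant-time membership tests
    counts = [0, 0]
    # Python 3 idiom: text mode with newline='' instead of 'rb'/'wb'.
    with open(src_path, newline='') as src, \
         open(in_path, 'w', newline='') as in_file, \
         open(out_path, 'w', newline='') as out_file:
        reader = csv.DictReader(src)
        # extrasaction='ignore' drops columns that are not in fieldnames.
        in_writer = csv.DictWriter(in_file, fieldnames=fieldnames,
                                   extrasaction='ignore')
        out_writer = csv.DictWriter(out_file, fieldnames=fieldnames,
                                    extrasaction='ignore')
        in_writer.writeheader()
        out_writer.writeheader()
        for row in reader:
            if row[key] in wanted:
                in_writer.writerow(row)
                counts[0] += 1
            else:
                out_writer.writerow(row)
                counts[1] += 1
    return tuple(counts)


# Usage with a small made-up file in a temporary directory.
fields = ['Column 1', 'Column 2', 'Column 5', 'Column 10']
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'original.csv')
    in_csv = os.path.join(tmp, 'in_list.csv')
    out_csv = os.path.join(tmp, 'not_in_list.csv')
    with open(src, 'w', newline='') as f:
        f.write("Column 1,Column 2,Column 3,Column 5,Column 10\n"
                "a,b,x,c,2\n"
                "d,e,y,f,7\n"
                "g,h,z,i,33\n")
    result = split_csv(src, in_csv, out_csv, ['2', '12', '20', '33'], fields)
    with open(in_csv, newline='') as f:
        in_rows = list(csv.DictReader(f))
```

Each input row is read once and written once, so the running time grows linearly with the number of records, and nothing is held in memory beyond the current row.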

Thank you, I tried this too and it also worked perfectly. Much appreciated.
