0

I am splitting a csv into two csvs, based on a value in a column in the original csv. This code works, but takes about an hour to run on a csv with about 10000 records. I have tried enumerating the list, but I don't think that was the correct approach to speeding this up.

I am extremely slow and very new to this programming and would appreciate if someone would be able to explain where to focus my next efforts to make this faster. I know the least number of lines is best, but I don't understand how to loop through when creating two separate csvs. Is the loop even the issue here?

myList = ['2','12','20','33'...] with open(originalCSV, 'rb') as f: reader = csv.DictReader(f) rows = [row for row in reader if row['Column 10'] in myList] for row in rows: with open(inmylistCSV, 'wb') as w: fieldnames = ['Column 1', 'Column 2', 'Column 5', 'Column 10'] csvwriter = csv.DictWriter(w, fieldnames=fieldnames) csvwriter.writeheader() csvwriter.writerows(rows) with open(originalCSV, 'rb') as f: reader = csv.DictReader(f) rows = [row for row in reader if row['Column 10'] not in myList] for row in rows: with open(notinmylistCSV, 'wb') as w: fieldnames = ['Column 1', 'Column 2', 'Column 5', 'Column 10'] csvwriter = csv.DictWriter(w, fieldnames=fieldnames) csvwriter.writeheader() csvwriter.writerows(rows) 

2 Answers 2

2

The issue is that you are repeating the loop for the 10,000 records twice, resulting in doing twice the amount of work, which is 20,000 records.

# This is what your doing for x in range(10000): if is_odd(x): print('I am odd') for x in range(10000): if is_even(x): print('I am even') 

A simple fix would be simply to combine your logic into this

# This is what you should be doing for x in range(10000): if is_odd(x): print('I am odd') else: print('I am even') 

So, in conclusion, you have 2 things you should do right now

  1. combine the following lines logically
rows = [row for row in reader if row['Column 10'] in myList] rows = [row for row in reader if row['Column 10'] not in myList] 
  1. optimize the csv writing portion of code
with open(notinmylistCSV | inmylistCSV, 'wb') as w: fieldnames = ['Column 1', 'Column 2', 'Column 5', 'Column 10'] csvwriter = csv.DictWriter(w, fieldnames=fieldnames) csvwriter.writeheader() csvwriter.writerows(rows) 
Sign up to request clarification or add additional context in comments.

1 Comment

Thank you for the explanation, this solved my problem!
0

why not just read through the original CSV and distribute the rows to the other CSVs?

myList = ['2','12','20','33'...] fieldnames = ['Column 1', 'Column 2', 'Column 5', 'Column 10'] in_list = open(inmylistCSV, 'wb') in_list_csvwriter = csv.DictWriter(in_list, fieldnames=fieldnames) in_list_csvwriter.writeheader() not_in_list = with open(notinmylistCSV, 'wb') not_in_list_csvwriter = csv.DictWriter(not_in_list, fieldnames=fieldnames) not_in_list_csvwriter.writeheader() with open(originalCSV, 'rb') as f: reader = csv.DictReader(f) for row in reader: if row['Column 10'] in myList: in_list_csvwriter.writerow(row) else: not_in_list_csvwriter.writerow(row) 

1 Comment

Thank you, I tired this too and it also worked perfectly. Much appreciated

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.