0

I was encouraged to step out of my comfort zone and use python with little to no experience and now I'm stuck. I'm trying to compare two CSV files (fileA.csv and fileB.csv), and append any missing user rows to fileA.csv from fileB.csv. The only fields I can compare with are user's first and last names (in this case, it's row[0] and row[2] from each file).

From my understanding, you cannot append information to a file that you currently have open so I'm open to suggestions without having to create a third file (if possible). Below has me on the right track, but there's a lot of data so I'll need a loop. Please help.

import csv reader1 = csv.reader(open('fileA', 'rb'), delimiter=',', quotechar='|') row1 = reader1.next() reader2 = csv.reader(open('fileB', 'rb'), delimiter=',', quotechar='|') row2 = reader2.next() ##For Loop... if (row1[0] == row2[0]) and (row1[2] == row2[2]): ## Compare next else: ## Append entire row to fileA.csv 

Example FileA.csv:

John,Thomas,Doe,some,other,stuff Jane, ,Smith,some,other,stuff 

Example FileB.csv:

John, ,Doe,other,personal,data Jane,Elizabeth,Smith,other,personal,data Robin,T,Williams,other,personal,data 

The only row that should append from FileB to FileA is Robin's complete Row so that FileA looks like:

DesiredResult_FileA:

John,Thomas,Doe,some,other,stuff Jane, ,Smith,some,other,stuff Robin,T,Williams,other,personal,data 

2 Answers 2

1

Store the information found in file A in memory first, in a set.

Then, reopen file A in append mode, and loop over file B. Any name from B not found in the set, can then be added to file A:

csv_dialect = dict(delimiter=',', quotechar='|') names = set() with open('fileA', 'rb') as file_a: reader1 = csv.reader(file_a, **csv_dialect) next(reader1) for row in reader1: names.add((row[0], row[2])) # `names` is now a set of all names (taken from columns 0 and 2) found in file A. with open('fileA', 'ab') as file_a, open('fileB', 'rb') as file_b: writer = csv.writer(file_a, **csv_dialect) reader2 = csv.reader(file_b, **csv_dialect) next(reader2) for row in reader2: if (row[0], row[2]) not in names: # This row was not present in file A, add it. writer.writerow(row) 

The combined with line requires Python 2.7 or newer. In earlier Python versions, simply nest the two statements:

with open('fileA', 'ab') as file_a: with open('fileB', 'rb') as file_b: # etc. 
Sign up to request clarification or add additional context in comments.

2 Comments

Is that syntax correct? file_a, open('fileB', 'rb') as file_b:
@justin: It is in Python 2.7. In earlier versions, just nest the two (so make them separate lines and indent one level extra).
0

You can try pandas, that might help you handle csv files easier, and seems its more readable:

import pandas as pd df1 = pd.read_csv('FileA.csv', header=None) df2 = pd.read_csv('FileB.csv', header=None) for i in df2.index: # Don't append if that row is existed in FileA if i in df1.index: if df1.ix[i][0] == df2.ix[i][0] and df1.ix[i][2] == df2.ix[i][2]: continue df1 = df1.append(df2.ix[i]) df1.to_csv('FileA.csv', index=None, header=None) 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.