
I am trying to write a script that excludes, from the CSV files under a specific directory, only those rows that are also present in another CSV file, and redirects the output to another CSV. It is something like an exception rule to apply.

For example, given the input below and considering the exception file below:

inDirectory/input.csv:

Id  Name  Location  Data  Services  Action
10  John  IN        1234  mail      active
12  Samy  GR        5678  phone     disable
28  Doug  UK        9123  phone     active

excDirectory/exception.csv:

12  Samy  GR        5678  phone     disable

I want the output to be redirected as below:

outDirectory/output.csv:

Id  Name  Location  Data  Services  Action
10  John  IN        1234  mail      active
28  Doug  UK        9123  phone     active

Below is all I have been able to write so far; it is incomplete, and I am looking for a solution that performs the same. Any idea? I am very new to Python scripting.

import os
import pandas as pd

inDir = os.listdir('csv_out_tmp')
excFile = pd.read_csv('exclude/exception.csv', sep=',', index_col=0)
for csv in inDir:
    inFile = pd.read_csv('csv_out_tmp/' + csv)
    diff = set(inFile) ^ set(excFile)
    df[diff].to_csv('csv_out/' + csv, index=False)

Another version of the code I am writing, as per @neotrinity:

import os

inDir = os.listdir('csv_out_tmp')
excFile = 'exclude/exception.csv'
for csv in inDir:
    inFile = open('csv_out_tmp/' + csv)
    excRow = set(open(excFile))
    with open('csv_out/' + csv, 'w') as f:
        for row in open(inFile):
            if row not in excRow:
                f.write(row)

With the above code, the error I am getting is:

for row in open(inFile): TypeError: coercing to Unicode: need string or buffer, file found 
  • And what should the exception be? Commented Sep 13, 2018 at 8:52
  • the exception would be if the same row or line is present in exception.csv. So the output CSV should contain only the rows or lines that are not present in exception.csv. Commented Sep 13, 2018 at 8:56
  • Possible duplicate of: stackoverflow.com/questions/36891977/… Commented Sep 13, 2018 at 9:08
  • I have just updated the script section as per the post @Max shared. Please let me know if that is the appropriate way of writing it or if I need to modify it. Commented Sep 13, 2018 at 9:27
  • Post shared: try set(InDir) ^ set(excFile) (for those who don't know what I shared, as I deleted it earlier [never used pandas before, so I didn't think it would be helpful]) Commented Sep 13, 2018 at 9:31

1 Answer


To be honest you do not need pandas for this.

in_file = 'in.csv'
out_file = 'out.csv'
exception_file = 'exp.csv'

# load only the exception rows into memory
exception_rows = set(open(exception_file))

with open(out_file, 'w') as f:
    # stream the input row by row, writing only rows not in the exception set
    for row in open(in_file):
        if row not in exception_rows:
            f.write(row)

This uses the least resources: only the exception file is loaded into memory. The input file is iterated row by row and written into the output file row by row, so very little memory is consumed.
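Applied to the directory layout from the question, a sketch of the same idea that also opens every file through a context manager (the csv_out_tmp, exclude/exception.csv and csv_out paths are taken from the question's snippets and may need adjusting):

import os

exc_path = 'exclude/exception.csv'
with open(exc_path) as exc:
    exception_rows = set(exc)          # only the exception rows are kept in memory

for name in os.listdir('csv_out_tmp'):
    src_path = os.path.join('csv_out_tmp', name)
    dst_path = os.path.join('csv_out', name)
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for row in src:                # stream one row at a time
            if row not in exception_rows:
                dst.write(row)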

After that you can use pandas for your data analytics.
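For reference, a minimal pandas sketch of the same row-level exclusion (assuming both CSV files share the same header row; the file names here are placeholders):

import pandas as pd

in_df = pd.read_csv('in.csv')
exc_df = pd.read_csv('exp.csv')

# left merge with an indicator column, then keep rows that appear only in in_df
merged = in_df.merge(exc_df, how='left', indicator=True)
filtered = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')
filtered.to_csv('out.csv', index=False)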


6 Comments

It's just a snippet, not production code ¯\_(ツ)_/¯. The outfile is "properly" closed via the context manager. The same logic applies.
It's not "just a snippet", it's an answer in a technical knowledge database that will be read by people of all levels of experience - including complete newbies - and as such should show good practices.
Well, that should have been the first comment, not a condescending message followed by a downvote. I would have gladly updated the snippet.
@neotrinity While doing so with the modification below, I am getting TypeError: coercing to Unicode: need string or buffer, set found. Here is the code: inDir = os.listdir('csv_out_tmp') excFile = 'exclude/exception.csv' for csv in inDir: inFile = set(open('csv_out_tmp/' + csv)) excRow = set(open(excFile)) with open('csv_out/' + csv, 'w') as f: for row in open(inFile): if row not in excRow: f.write(row)
The line inFile = set(open('csv_out_tmp/' + csv)) is incorrect. It should just be inFile = open('csv_out_tmp/' + csv), and the loop should then iterate over inFile directly (for row in inFile) rather than calling open() on it again. However, as the other commenter mentioned, it's better to open the files via a context manager.
