
I am trying to write a script that excludes, from the CSV files under a specific directory, only those rows that are also present in another CSV file, and redirects the output to another CSV. It is something like an exception rule to apply.

For example, given the input below and considering the exception file below:

inDirectory/input.csv:

Id  Name  Location  Data  Services  Action
10  John  IN        1234  mail      active
12  Samy  GR        5678  phone     disable
28  Doug  UK        9123  phone     active

excDirectory/exception.csv:

12  Samy  GR        5678  phone     disable

I want the output to be redirected as below:

outDirectory/output.csv:

Id  Name  Location  Data  Services  Action
10  John  IN        1234  mail      active
28  Doug  UK        9123  phone     active

Below is all I have been able to write so far; it is incomplete, and I am looking for a solution that performs the same. Any idea? I am very new to Python scripting.

import os
import pandas as pd

inDir = os.listdir('csv_out_tmp')
excFile = pd.read_csv('exclude/exception.csv', sep=',', index_col=0)
for csv in inDir:
    inFile = pd.read_csv('csv_out_tmp/' + csv)
    diff = set(inFile) ^ set(excFile)
    df[diff].to_csv('csv_out/' + csv, index=False)

Another version of the code I am writing, as per @neotrinity:

import os

inDir = os.listdir('csv_out_tmp')
excFile = 'exclude/exception.csv'
for csv in inDir:
    inFile = open('csv_out_tmp/' + csv)
    excRow = set(open(excFile))
    with open('csv_out/' + csv, 'w') as f:
        for row in open(inFile):
            if row not in excRow:
                f.write(row)

With the above code, the error I am getting is:

for row in open(inFile): TypeError: coercing to Unicode: need string or buffer, file found 
  • And what should the exception be? Commented Sep 13, 2018 at 8:52
  • the exception would be if the same row or line is present in exception.csv. So the output CSV should contain only the rows or lines that are not present in exception.csv. Commented Sep 13, 2018 at 8:56
  • Possible duplicate of: stackoverflow.com/questions/36891977/… Commented Sep 13, 2018 at 9:08
  • I have just updated the script section as per the post @Max shared. Please let me know if that is the appropriate way of writing it or if I need to modify it. Commented Sep 13, 2018 at 9:27
  • Post shared: try set(InDir) ^ set(excFile) (for those who don't know what I shared, as I deleted it earlier [never used pandas before, so I didn't think it would be helpful]) Commented Sep 13, 2018 at 9:31

1 Answer


To be honest you do not need pandas for this.

in_file = 'in.csv'
out_file = 'out.csv'
exception_file = 'exp.csv'

# load only the exception rows into memory
exception_rows = set(open(exception_file))

with open(out_file, 'w') as f:
    # stream the input row by row, writing only rows not in the exception set
    for row in open(in_file):
        if row not in exception_rows:
            f.write(row)

This uses the least resources: only the exception file is loaded into memory. The input file is iterated row by row and written into the output file row by row, so very little memory is consumed.
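Applied to the directory layout from the question, a sketch of the same idea that also opens every file through a context manager (the csv_out_tmp, exclude/exception.csv and csv_out paths are taken from the question's snippets and may need adjusting):

import os

exc_path = 'exclude/exception.csv'
with open(exc_path) as exc:
    exception_rows = set(exc)          # only the exception rows are kept in memory

for name in os.listdir('csv_out_tmp'):
    src_path = os.path.join('csv_out_tmp', name)
    dst_path = os.path.join('csv_out', name)
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for row in src:                # stream one row at a time
            if row not in exception_rows:
                dst.write(row)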

After that you can use pandas for your data analytics.
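For reference, a minimal pandas sketch of the same row-level exclusion (assuming both CSV files share the same header row; the file names here are placeholders):

import pandas as pd

in_df = pd.read_csv('in.csv')
exc_df = pd.read_csv('exp.csv')

# left merge with an indicator column, then keep rows that appear only in in_df
merged = in_df.merge(exc_df, how='left', indicator=True)
filtered = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')
filtered.to_csv('out.csv', index=False)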


6 Comments

It's just a snippet, not production code ¯\_(ツ)_/¯. The outfile is "properly" closed via the context manager. The same logic applies.
It's not "just a snippet", it's an answer in a technical knowledge database that will be read by people of all levels of experience - including complete newbies - and as such should show good practices.
Well, that should have been the first comment, not a condescending message followed by a downvote. I would have gladly updated the snippet.
@neotrinity While doing so with the modification below, I am getting TypeError: coercing to Unicode: need string or buffer, set found. Here is the code: inDir = os.listdir('csv_out_tmp') excFile = 'exclude/exception.csv' for csv in inDir: inFile = set(open('csv_out_tmp/' + csv)) excRow = set(open(excFile)) with open('csv_out/' + csv, 'w') as f: for row in open(inFile): if row not in excRow: f.write(row)
The line inFile = set(open('csv_out_tmp/' + csv)) is incorrect. It should just be inFile = open('csv_out_tmp/' + csv), and the loop should then iterate over inFile directly (for row in inFile) rather than calling open() on it again. However, as the other commenter mentioned, it's better to open the files via a context manager.
