I have a function that should clean up the CSV files in my list:

    fileListLocalDaily = glob.glob("/path/to/my/directory/*.csv")
    for y in fileListLocalDaily:
        data = y
        def prepare_file(y):
            data = y
            lines = pd.read_csv(data, sep=",", quoting=csv.QUOTE_NONE)
            new_lines = lines.replace('something', '', regex=True)
            f = StringIO(data)
            # extract date
            date = next(f)
            date = date.split('_')[1]
            date = os.path.splitext(date)[0]
            new_lines.insert(2,'date',date)
            new_lines.drop(new_lines.columns[0:2], axis=1, inplace=True)
            new_lines.drop(new_lines.columns[6], axis=1, inplace=True)
            new_lines=new_lines.sort_values(by=['Something'], ascending=False)
            new_lines.to_csv('/path/to/my/output/'+date+'.csv', index = False)
            complete = prepare_file(data)
    runFunction = prepare_file(y)

The function above seems to save only one file and then keeps overwriting it in an endless loop. Could someone help me understand how I can run this function on all the CSV files in my directory, one after another? Thanks.

  • Do all your files have the same date, which would cause the output file to be the same? Have you considered using the name of the input file to determine what to name the output file, since you know that all the input files have distinct names? Commented Sep 8, 2020 at 14:35
  • Looks like an indentation error. In its current structure, runFunction = prepare_file(y) should be inside the for loop. Also, move the function itself outside the loop. Commented Sep 8, 2020 at 14:35
  • Hi Green Cloak Guy, all my filenames are unique and have different dates at the end of the file name, something like _00-00-0000.csv Commented Sep 8, 2020 at 14:40

2 Answers

Based on the code you’ve provided, your loop isn’t actually doing anything. It defines the function over and over again but never calls it, because the last line is indented outside of the loop. Your function also calls itself on its last line, so the single call after the loop recurses without end and rewrites the same file each time. You should define the function once and then call it inside of the loop:

    import csv
    import glob
    import os
    from io import StringIO

    import pandas as pd


    def prepare_file(data):
        lines = pd.read_csv(data, sep=",", quoting=csv.QUOTE_NONE)
        new_lines = lines.replace('something', '', regex=True)
        f = StringIO(data)
        # extract date (the text between the first '_' in the path and the extension)
        date = next(f)
        date = date.split('_')[1]
        date = os.path.splitext(date)[0]
        new_lines.insert(2, 'date', date)
        new_lines.drop(new_lines.columns[0:2], axis=1, inplace=True)
        new_lines.drop(new_lines.columns[6], axis=1, inplace=True)
        new_lines = new_lines.sort_values(by=['Something'], ascending=False)
        new_lines.to_csv('/path/to/my/output/' + date + '.csv', index=False)


    # define the function once above, then call it once per file
    fileListLocalDaily = glob.glob("/path/to/my/directory/*.csv")
    for data in fileListLocalDaily:
        prepare_file(data)

prepare_file doesn’t return anything, so the assignments (complete = ... and runFunction = ...) only ever stored None; I removed them. I also renamed y to data in both the loop and the function.
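As a side note on the date extraction: calling split('_')[1] on the full path picks the wrong piece if any directory name contains an underscore. Below is a minimal sketch of one way to take the date from the file name only. The helper names extract_date and output_path_for and the example file name report_01-02-2020.csv are made up for illustration; the only thing taken from the question is that the names end in something like _00-00-0000.csv.

```python
import os


def extract_date(path):
    """Return the text after the last underscore in the file name, without the extension.

    If the file name contains no underscore, the whole stem is returned.
    """
    name = os.path.basename(path)        # drop the directory part
    stem = os.path.splitext(name)[0]     # drop the ".csv" extension
    return stem.rsplit("_", 1)[-1]       # keep what follows the last "_"


def output_path_for(path, out_dir="/path/to/my/output"):
    """Build the output file path from the date found in the input file name."""
    return os.path.join(out_dir, extract_date(path) + ".csv")


if __name__ == "__main__":
    # hypothetical input path; note the underscore in the directory name
    print(extract_date("/path/to/my_data/report_01-02-2020.csv"))
```

Inside prepare_file, the three date lines could then become date = extract_date(data), assuming the input names really do follow that pattern.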

I like to use os.walk to get all files recursively:

    import os

    top = '/path/to/my/directory'
    for root, dirs, files in os.walk(top):
        for name in files:
            # splitext returns a (root, ext) tuple, so compare the extension part
            if os.path.splitext(name)[1] == ".csv":
                # os.walk yields bare file names; join with root for the full path
                full_path = os.path.join(root, name)
                # do stuff with full_path here

1 Comment

This would be better placed as a comment: it does not answer the question, but critiques the iteration method. Additionally, OP uses glob, which already returns the full file path here, so there is no need to join the parts as walk requires. (I’m a walk-to-glob convert ... so I understand where you’re coming from.)
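For anyone weighing the two approaches, here is a rough sketch that collects the same CSV files both ways. The function names csv_files_with_walk and csv_files_with_glob and the file names in the demo are invented for this example. One caveat, as far as I know: glob skips files and directories whose names start with a dot by default, while os.walk does not, so the results can differ on trees that contain hidden entries.

```python
import glob
import os
import tempfile


def csv_files_with_walk(top):
    """Collect .csv files under top with os.walk, joining root and name by hand."""
    found = []
    for root, dirs, files in os.walk(top):
        for name in files:
            if os.path.splitext(name)[1] == ".csv":
                found.append(os.path.join(root, name))
    return sorted(found)


def csv_files_with_glob(top):
    """Collect .csv files under top with a recursive glob; paths already include top."""
    pattern = os.path.join(top, "**", "*.csv")
    return sorted(glob.glob(pattern, recursive=True))


if __name__ == "__main__":
    # build a small throwaway tree with made-up file names
    with tempfile.TemporaryDirectory() as top:
        os.makedirs(os.path.join(top, "sub"))
        for rel in ("a_01-01-2020.csv", os.path.join("sub", "b_02-01-2020.csv"), "notes.txt"):
            open(os.path.join(top, rel), "w").close()
        for path in csv_files_with_glob(top):
            print(path)
```

If the files all sit directly in one directory, as in the question, the non-recursive glob.glob("/path/to/my/directory/*.csv") is enough.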
