How to run this Python function for each file in my directory?

Question

I have a function that should clean up the CSV files in my list:

    fileListLocalDaily = (glob.glob("/path/to/my/directory/*.csv")
    for y in fileListLocalDaily:
        data = y
        def prepare_file(y):
            data = y
            lines = pd.read_csv(data, sep=",", quoting=csv.QUOTE_NONE)
            new_lines = lines.replace('something', '', regex=True)
            f = StringIO(data)
            # extract date
            date = next(f)
            date = date.split('_')[1]
            date = os.path.splitext(date)[0]
            new_lines.insert(2,'date',date)
            new_lines.drop(new_lines.columns[0:2], axis=1, inplace=True)
            new_lines.drop(new_lines.columns[6], axis=1, inplace=True)
            new_lines=new_lines.sort_values(by=['Something'], ascending=False)
            new_lines.to_csv('/path/to/my/output/'+date+'.csv', index = False)
            complete = prepare_file(data)
    runFunction = prepare_file(y)

It seems that the above function saved only one file and kept overwriting it in an endless loop. Could someone help me understand how to run this function on all CSV files in my directory, one after another? Thanks.

Do all your files have the same date, which would cause the output file to be the same? Have you considered using the name of the input file to determine what to name the output file, since you know that all the input files have distinct names?
– Green Cloak Guy, Commented Sep 8, 2020 at 14:35
Looks like an indentation error. In its current structure, runFunction = prepare_file(y) should sit inside the for loop. Also, move the function itself outside the loop.
– s3dev, Commented Sep 8, 2020 at 14:35
Hi Green Cloak Guy, all my filenames are unique and have different dates at the end of the file name, which look something like _00-00-0000.csv.
– Baobab1988, Commented Sep 8, 2020 at 14:40
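
Following Green Cloak Guy's suggestion, one way to check that every input maps to a distinct output is to derive the output name from the input name alone. This is a minimal sketch, assuming the filenames end in `_DD-MM-YYYY.csv` as described in the comment above. The helper names `date_from_filename` and `find_collisions` are hypothetical and not part of the original code.

```python
import os
from collections import defaultdict


def date_from_filename(path):
    """Return the text after the last underscore, without the extension."""
    stem = os.path.splitext(os.path.basename(path))[0]
    # Split on the last underscore only, so underscores earlier in the
    # file name (or in a directory name) do not affect the result.
    return stem.rsplit("_", 1)[-1]


def find_collisions(paths):
    """Group inputs by extracted date; keep only dates shared by several files."""
    by_date = defaultdict(list)
    for path in paths:
        by_date[date_from_filename(path)].append(path)
    return {date: files for date, files in by_date.items() if len(files) > 1}
```

If `find_collisions(glob.glob("/path/to/my/directory/*.csv"))` comes back empty, no two inputs would write to the same output file.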

dantiston · Accepted Answer · 2020-09-09 05:25:40Z

Based on the code you’ve provided, your loop isn’t actually doing anything useful. It redefines the function on every iteration but never calls it, because the last line, which does call it, is indented outside the loop. The function also calls itself at the end, so once it is called it recurses without end and rewrites the same output file each time. Define the function once, then call it inside the loop:

    import csv
    import glob
    import os
    from io import StringIO

    import pandas as pd

    def prepare_file(data):
        lines = pd.read_csv(data, sep=",", quoting=csv.QUOTE_NONE)
        new_lines = lines.replace('something', '', regex=True)
        # StringIO wraps the path string itself, so next(f) returns the path
        f = StringIO(data)
        # extract date
        date = next(f)
        date = date.split('_')[1]
        date = os.path.splitext(date)[0]
        new_lines.insert(2, 'date', date)
        new_lines.drop(new_lines.columns[0:2], axis=1, inplace=True)
        new_lines.drop(new_lines.columns[6], axis=1, inplace=True)
        new_lines = new_lines.sort_values(by=['Something'], ascending=False)
        new_lines.to_csv('/path/to/my/output/' + date + '.csv', index=False)

    fileListLocalDaily = glob.glob("/path/to/my/directory/*.csv")
    for data in fileListLocalDaily:
        prepare_file(data)

prepare_file doesn’t return anything, so the assignments were only storing None, and I removed them. I also renamed y to data in both the loop and the function, which makes the extra data = y lines unnecessary.
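
The pattern described above (define the function once, then call it once per file) can be sketched with the standard library only. This is an illustration under assumptions, not the cleaning logic from the answer: `copy_with_date` and `process_all` are hypothetical names, and the per-file step just copies rows while prepending the date taken from the file name.

```python
import csv
import glob
import os


def copy_with_date(src, out_dir):
    """Write src's rows to out_dir/<date>.csv with the date prepended to each row."""
    stem = os.path.splitext(os.path.basename(src))[0]
    date = stem.rsplit("_", 1)[-1]
    dst = os.path.join(out_dir, date + ".csv")
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        writer = csv.writer(fout)
        for row in csv.reader(fin):
            writer.writerow([date] + row)
    return dst


def process_all(in_dir, out_dir, func=copy_with_date):
    """Call func once per CSV file in in_dir and return the output paths."""
    os.makedirs(out_dir, exist_ok=True)
    outputs = []
    # sorted() only makes the processing order predictable
    for src in sorted(glob.glob(os.path.join(in_dir, "*.csv"))):
        outputs.append(func(src, out_dir))
    return outputs
```

Returning the output path from the per-file function gives the loop something meaningful to collect, which the original `runFunction = prepare_file(y)` assignment did not have.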

theParanoidAndroid · Accepted Answer · 2020-09-08 14:40:56Z

I like to use os.walk to get all files recursively:

    import os

    top = '/path/to/my/directory'
    for root, dirs, files in os.walk(top):
        for name in files:
            # splitext returns a (root, ext) tuple, so compare the extension
            if os.path.splitext(name)[1] == ".csv":
                # do stuff with name here
                # use os.path.join(root, name) for the full file path
                path = os.path.join(root, name)

This would be better placed as a comment, as it does not answer the question but is a critique of the iteration method. Additionally, OP uses glob, which returns the full file path, so there is no need to join the parts as walk requires. (I’m a walk-to-glob convertee ... so I understand where you’re coming from.)
– s3dev, Commented over a year ago
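
For comparison, both approaches can list CSV files recursively. This is a minimal sketch, assuming Python 3.5 or later for `recursive=True`. The function names `csvs_with_walk` and `csvs_with_glob` are hypothetical.

```python
import glob
import os


def csvs_with_walk(top):
    """Collect full paths of .csv files under top using os.walk."""
    found = []
    for root, dirs, files in os.walk(top):
        for name in files:
            # splitext returns (stem, extension); compare the extension
            if os.path.splitext(name)[1] == ".csv":
                found.append(os.path.join(root, name))
    return sorted(found)


def csvs_with_glob(top):
    """Collect the same paths with a recursive glob pattern."""
    # "**" matches top itself and any depth of subdirectories
    pattern = os.path.join(top, "**", "*.csv")
    return sorted(glob.glob(pattern, recursive=True))
```

The two can differ on dot-files and dot-directories, which glob patterns skip by default while os.walk visits them.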
