I'll try to explain what I want to achieve with my code:

- I open a CSV file.
- I take the first element of each row and search for that string in every file in every subdirectory under rootdir.
With the design shown below, it is extremely slow even with 2 directories and one file in each: it takes approximately 1 second per entry in the main file, and I have 400,000 records in that file...
```python
import csv
import os

rootdir = 'C:\Users\ST\Desktop\Sample'

f = open('C:\Users\ST\Desktop\inputIds.csv')
f.readline()
snipscsv_f = csv.reader(f, delimiter=' ')
for row in snipscsv_f:
    print 'processing another ID'
    for subdir, dir, files in os.walk(rootdir):
        print 'processing another folder'
        for file in files:
            print 'processing another file'
            if 'csv' in file:  # i want only csv files to be processed
                ft = open(os.path.join(subdir, file))
                for ftrow in ft:
                    if row[0] in ftrow:
                        print row[0]
                ft.close()
```
Break out of the loop once you find a match so you don't read the rest of the file. If the files you're searching are small enough, you may also get some speedup by reading them into memory instead of line by line.

Alternatively: `cut -d, -f1 inputIds.csv > Ids.txt`, then `grep -f Ids.txt -r *.csv`? (Edit: oh right, Windows. unxutils.sourceforge.net has Win32 builds of the GNU utils, including cut and grep, if you want.)
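A minimal sketch of what the first suggestion could look like, assuming the same paths as in the question. Note that it also inverts the loop order so the directory tree is walked and each file is read only once, rather than once per ID, which goes a little beyond the answer above:

```python
import csv
import os

rootdir = r'C:\Users\ST\Desktop\Sample'

# Load the first column of the input file once, up front.
with open(r'C:\Users\ST\Desktop\inputIds.csv') as f:
    f.readline()  # skip the header line
    ids = [row[0] for row in csv.reader(f, delimiter=' ') if row]

# Walk the tree once and read each csv file into memory a single time.
for subdir, dirs, files in os.walk(rootdir):
    for name in files:
        if not name.endswith('.csv'):  # only process csv files
            continue
        with open(os.path.join(subdir, name)) as ft:
            contents = ft.read()       # whole file in memory, one read
        for id_ in ids:
            if id_ in contents:        # single substring test per ID
                print(id_)
```

With 400,000 IDs the inner `id_ in contents` loop is still a lot of substring tests per file; if the target files have a regular structure, splitting them into fields and testing membership against a `set` of the IDs would probably be faster still, but even the version above avoids re-walking the directory tree and re-reading every file for each ID.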