I am splitting a text file using the number of lines as a variable. I wrote this function to save the split files in a temporary directory. Each file has 4 million lines except the last one.
    import os
    import tempfile
    from itertools import groupby, count

    temp_dir = tempfile.mkdtemp()

    def tempfile_split(filename, temp_dir, chunk=4000000):
        with open(filename, 'r') as datafile:
            # number each line and group consecutive lines into blocks of `chunk`
            groups = groupby(datafile, key=lambda k, line=count(): next(line) // chunk)
            for k, group in groups:
                output_name = os.path.normpath(os.path.join(temp_dir, "tempfile_%s.tmp" % k))
                for line in group:
                    with open(output_name, 'a') as outfile:
                        outfile.write(line)

The main problem is the speed of this function: splitting one file of 8 million lines into two files of 4 million lines each takes more than 30 minutes on my Windows machine with Python 2.7.
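For reference, here is a variant I am considering (a minimal, untested sketch; the name tempfile_split_once is mine, just for illustration). The only change is hoisting the open() call out of the inner loop so each output file is opened once per group instead of once per line, which I assume is where the time goes:

    import os
    from itertools import groupby, count

    def tempfile_split_once(filename, temp_dir, chunk=4000000):
        # same grouping trick as above: integer-divide a running line
        # counter by `chunk` so groupby yields blocks of `chunk` lines
        with open(filename, 'r') as datafile:
            groups = groupby(datafile, key=lambda k, line=count(): next(line) // chunk)
            for k, group in groups:
                output_name = os.path.join(temp_dir, "tempfile_%s.tmp" % k)
                # open each output file exactly once and write the whole block
                with open(output_name, 'w') as outfile:
                    outfile.writelines(group)

Everything else matches the original function; only the file handling differs.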