Can someone please explain the groupby operation and the lambda function used in this SO post?
```python
key=lambda k, line=count(): next(line) // chunk
```
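The key function can be tried on its own. Below is a minimal sketch with `chunk` shrunk to 4 so the pattern is visible. Because `line=count()` is a default argument, Python evaluates it once, when the lambda is created, and not on every call, so all calls share one counter. Integer division by `chunk` then maps the running line indices 0, 1, 2, ... to chunk numbers. The `lines` list is an illustrative stand-in for the file.

```python
from itertools import count

chunk = 4  # small value for illustration; the post uses 4000000

# line=count() is a default argument: it is evaluated once, when the lambda
# is created, so every call to key() advances the same counter.
key = lambda k, line=count(): next(line) // chunk
counter = key.__defaults__[0]  # the single count() object stored on the function

lines = ["line %d\n" % i for i in range(10)]  # stands in for the file
keys = [key(text) for text in lines]
print(keys)  # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
print(key.__defaults__[0] is counter)  # True: still the same counter
```

The first argument `k` (the line itself) is ignored. The key depends only on how many lines have been seen so far.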
```python
import os
import tempfile
from itertools import groupby, count

temp_dir = tempfile.mkdtemp()

def tempfile_split(filename, temp_dir, chunk=4000000):
    with open(filename, 'r') as datafile:
        # The itertools.groupby() function takes a sequence and a key function,
        # and returns an iterator that generates pairs.
        # Each pair contains the result of key_function(each item) and
        # another iterator containing all the items that shared that key result.
        groups = groupby(datafile, key=lambda k, line=count(): next(line) // chunk)
        for k, group in groups:
            # Print only the key: list(group) would exhaust the group iterator
            # and leave nothing for the loop below to write.
            print(k)
            output_name = os.path.normpath(os.path.join(temp_dir, "tempfile_%s.tmp" % k))
            # Open each output file once per group instead of once per line.
            with open(output_name, 'a') as outfile:
                for line in group:
                    outfile.write(line)
```

Edit: It took me a while to wrap my head around the lambda function used with groupby. I don't think I understood either of them very well.
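To see the (key, group) pairs that the comments describe without touching the filesystem, here is a minimal sketch that runs the same key over an in-memory list (an assumption standing in for `datafile`) and records each group's size. It also shows that a group iterator can be consumed only once, which is why calling `list(group)` before looping over the same group leaves nothing to write.

```python
from itertools import groupby, count

chunk = 4  # small value for illustration; the post uses 4000000
lines = ["line %d\n" % i for i in range(10)]  # stands in for datafile

# groupby() yields (key, group) pairs; each group is an iterator over the
# consecutive lines that produced the same key.
sizes = {}
for k, group in groupby(lines, key=lambda text, line=count(): next(line) // chunk):
    sizes[k] = len(list(group))
print(sizes)  # {0: 4, 1: 4, 2: 2}

# A group can be consumed only once: a second list(group) comes back empty.
passes = []
for k, group in groupby(lines, key=lambda text, line=count(): next(line) // chunk):
    first, second = list(group), list(group)
    passes.append((k, len(first), len(second)))
print(passes)  # [(0, 4, 0), (1, 4, 0), (2, 2, 0)]
```

Each `for` statement evaluates the `lambda` expression afresh, so each loop gets its own counter starting at 0.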
Martijn explained it really well; however, I have a follow-up question. Why is `line=count()` passed as an argument to the lambda function every time? I tried assigning the variable `line` to `count()` just once, outside the function:
```python
line = count()
groups = groupby(datafile, key=lambda k, line: next(line) // chunk)
```

and it resulted in `TypeError: <lambda>() missing 1 required positional argument: 'line'`.
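A sketch of why that fails, and of one working alternative. `groupby()` calls the key function with exactly one argument (the current item), so a lambda that declares a second parameter without a default cannot be called that way. Referring to a counter created in an enclosing function does work, because the lambda looks `line` up each time it runs. `chunk_sizes` and `lines` are hypothetical names used only for illustration.

```python
from itertools import groupby, count

chunk = 4  # small value for illustration
lines = ["line %d\n" % i for i in range(10)]  # stands in for datafile

# groupby() calls key(item) with one argument; 'line' has no default here,
# so the first call raises TypeError.
try:
    list(groupby(lines, key=lambda k, line: next(line) // chunk))
except TypeError as exc:
    error_message = str(exc)
print(error_message)

def chunk_sizes(lines, chunk):
    """Hypothetical helper: size of each chunk produced by groupby()."""
    line = count()  # one counter per call, shared by every key() call
    key = lambda k: next(line) // chunk  # closure: reads the enclosing 'line'
    return [(k, len(list(group))) for k, group in groupby(lines, key=key)]

print(chunk_sizes(lines, chunk))  # [(0, 4), (1, 4), (2, 2)]
```

Creating the counter inside the function matters: a single module-level counter would keep counting across files, so the chunk numbers for a second file would not start at 0.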
Also, calling `next` on `count()` directly within the lambda expression resulted in all the lines in the input file being bunched together, i.e., a single key was generated by the groupby function:
```python
groups = groupby(datafile, key=lambda k: next(count()) // chunk)
```

I'm learning Python on my own, so any help or pointers to reference materials or PyCon talks are much appreciated. Anything, really!
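The single key has a simple cause: `count()` builds a new counter every time it is called, so `next(count())` always returns 0, and `0 // chunk` is 0 for every line. A minimal sketch, with an in-memory list standing in for the file:

```python
from itertools import groupby, count

chunk = 4  # small value for illustration
lines = ["line %d\n" % i for i in range(10)]  # stands in for datafile

# count() builds a brand-new counter on every call, so next(count()) is always 0.
fresh = [next(count()) for _ in range(3)]
print(fresh)  # [0, 0, 0]

# Every line therefore gets key 0 // chunk == 0, and groupby() puts all of
# them into one group.
groups = [(k, len(list(g))) for k, g in groupby(lines, key=lambda text: next(count()) // chunk)]
print(groups)  # [(0, 10)]
```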