Here's the best O(n) greedy algorithm I got for now. The idea is to greedily append items from the list to a chunk until the sum for the current chunk exceeds the average expected sum for a chunk at that point. The average expected sum is updated constantly. This solution is not perfect, but as I said, it is O(n) and worked not bad with my tests. I am eager to hear feedback and suggestions for improvement.
I left my debug print statements in the code to provide some documentation. Feel free to comment them in to see what's going on in each step.
CODE
def split_list(lst, chunks): #print(lst) #print() chunks_yielded = 0 total_sum = sum(lst) avg_sum = total_sum/float(chunks) chunk = [] chunksum = 0 sum_of_seen = 0 for i, item in enumerate(lst): #print('start of loop! chunk: {}, index: {}, item: {}, chunksum: {}'.format(chunk, i, item, chunksum)) if chunks - chunks_yielded == 1: #print('must yield the rest of the list! chunks_yielded: {}'.format(chunks_yielded)) yield chunk + lst[i:] raise StopIteration to_yield = chunks - chunks_yielded chunks_left = len(lst) - i if to_yield > chunks_left: #print('must yield remaining list in single item chunks! to_yield: {}, chunks_left: {}'.format(to_yield, chunks_left)) if chunk: yield chunk yield from ([x] for x in lst[i:]) raise StopIteration sum_of_seen += item if chunksum < avg_sum: #print('appending {} to chunk {}'.format(item, chunk)) chunk.append(item) chunksum += item else: #print('yielding chunk {}'.format(chunk)) yield chunk # update average expected sum, because the last yielded chunk was probably not perfect: avg_sum = (total_sum - sum_of_seen)/(to_yield - 1) chunks_yielded += 1 chunksum = item chunk = [item]
TEST CODE
import random lst = [1, 6, 2, 3, 4, 1, 7, 6, 4] #lst = [random.choice(range(1,101)) for _ in range(100)] chunks = 3 print('list: {}, avg sum: {}, chunks: {}\n'.format(lst, sum(lst)/float(chunks), chunks)) for chunk in split_list(lst, chunks): print('chunk: {}, sum: {}'.format(chunk, sum(chunk)))
TESTS with your list:
list: [1, 6, 2, 3, 4, 1, 7, 6, 4], avg sum: 17.0, chunks: 2 chunk: [1, 6, 2, 3, 4, 1], sum: 17 chunk: [7, 6, 4], sum: 17 --- list: [1, 6, 2, 3, 4, 1, 7, 6, 4], avg sum: 11.33, chunks: 3 chunk: [1, 6, 2, 3], sum: 12 chunk: [4, 1, 7], sum: 12 chunk: [6, 4], sum: 10 --- list: [1, 6, 2, 3, 4, 1, 7, 6, 4], avg sum: 8.5, chunks: 4 chunk: [1, 6, 2], sum: 9 chunk: [3, 4, 1], sum: 8 chunk: [7], sum: 7 chunk: [6, 4], sum: 10 --- list: [1, 6, 2, 3, 4, 1, 7, 6, 4], avg sum: 6.8, chunks: 5 chunk: [1, 6], sum: 7 chunk: [2, 3, 4], sum: 9 chunk: [1, 7], sum: 8 chunk: [6], sum: 6 chunk: [4], sum: 4
TESTS with random lists of length 100 and elements from 1 to 100 (printing of the random list omitted):
avg sum: 2776.0, chunks: 2 chunk: [25, 8, 71, 39, 5, 69, 29, 64, 31, 2, 90, 73, 72, 58, 52, 19, 64, 34, 16, 8, 16, 89, 70, 67, 63, 36, 9, 87, 38, 33, 22, 73, 66, 93, 46, 48, 65, 55, 81, 92, 69, 94, 43, 68, 98, 70, 28, 99, 92, 69, 24, 74], sum: 2806 chunk: [55, 55, 64, 93, 97, 53, 85, 100, 66, 61, 5, 98, 43, 74, 99, 56, 96, 74, 63, 6, 89, 82, 8, 25, 36, 68, 89, 84, 10, 46, 95, 41, 54, 39, 21, 24, 8, 82, 72, 51, 31, 48, 33, 77, 17, 69, 50, 54], sum: 2746 --- avg sum: 1047.6, chunks: 5 chunk: [19, 76, 96, 78, 12, 33, 94, 10, 38, 87, 44, 76, 28, 18, 26, 29, 44, 98, 44, 32, 80], sum: 1062 chunk: [48, 70, 42, 85, 87, 55, 44, 11, 50, 48, 47, 50, 1, 17, 93, 78, 25, 10, 89, 57, 85], sum: 1092 chunk: [30, 83, 99, 62, 48, 66, 65, 98, 94, 54, 14, 97, 58, 53, 3, 98], sum: 1022 chunk: [80, 34, 63, 20, 27, 36, 98, 97, 7, 6, 9, 65, 91, 93, 2, 27, 83, 35, 65, 17, 26, 41], sum: 1022 chunk: [80, 80, 42, 32, 44, 42, 94, 31, 50, 23, 34, 84, 47, 10, 54, 59, 72, 80, 6, 76], sum: 1040 --- avg sum: 474.6, chunks: 10 chunk: [4, 41, 47, 41, 32, 51, 81, 5, 3, 37, 40, 26, 10, 70], sum: 488 chunk: [54, 8, 91, 42, 35, 80, 13, 84, 14, 23, 59], sum: 503 chunk: [39, 4, 38, 40, 88, 69, 10, 19, 28, 97, 81], sum: 513 chunk: [19, 55, 21, 63, 99, 93, 39, 47, 29], sum: 465 chunk: [65, 88, 12, 94, 7, 47, 14, 55, 28, 9, 98], sum: 517 chunk: [19, 1, 98, 84, 92, 99, 11, 53], sum: 457 chunk: [85, 79, 69, 78, 44, 6, 19, 53], sum: 433 chunk: [59, 20, 64, 55, 2, 65, 44, 90, 37, 26], sum: 462 chunk: [78, 66, 32, 76, 59, 47, 82], sum: 440 chunk: [34, 56, 66, 27, 1, 100, 16, 5, 97, 33, 33], sum: 468 --- avg sum: 182.48, chunks: 25 chunk: [55, 6, 16, 42, 85], sum: 204 chunk: [30, 68, 3, 94], sum: 195 chunk: [68, 96, 23], sum: 187 chunk: [69, 19, 12, 97], sum: 197 chunk: [59, 88, 49], sum: 196 chunk: [1, 16, 13, 12, 61, 77], sum: 180 chunk: [49, 75, 44, 43], sum: 211 chunk: [34, 86, 9, 55], sum: 184 chunk: [25, 82, 12, 93], sum: 212 chunk: [32, 74, 53, 31], sum: 190 chunk: [13, 15, 26, 31, 35, 3, 14, 71], sum: 208 chunk: [81, 92], sum: 173 chunk: [94, 21, 34, 71], sum: 220 chunk: [1, 55, 70, 3, 92], sum: 221 chunk: [38, 59, 56, 57], sum: 210 chunk: [7, 20, 10, 81, 100], sum: 218 chunk: [5, 71, 19, 8, 82], sum: 185 chunk: [95, 14, 72], sum: 181 chunk: [2, 8, 4, 47, 75, 17], sum: 153 chunk: [56, 69, 42], sum: 167 chunk: [75, 45], sum: 120 chunk: [68, 60], sum: 128 chunk: [29, 25, 62, 3, 50], sum: 169 chunk: [54, 63], sum: 117 chunk: [57, 37, 42], sum: 136
As you can see, as expected it gets worse the more chunks you want to generate. I hope I was able to help a bit.
edit: The yield from syntax requires Python 3.3 or newer, if you are using an older version just turn the statement into a normal for loop.
Split a list in n sublists such that the sum of values differ by a minimum? do you need the sublists or the indexes?