Split a list of numbers into n chunks such that the chunks have (close to) equal sums and keep the original order

Question

This is not the standard partitioning problem, as I need to maintain the order of elements in the list.

So for example if I have a list

[1, 6, 2, 3, 4, 1, 7, 6, 4]

and I want two chunks, then the split should give

[[1, 6, 2, 3, 4, 1], [7, 6, 4]]

for a sum of 17 on each side. For three chunks the result would be

[[1, 6, 2, 3], [4, 1, 7], [6, 4]]

for sums of 12, 12, and 10.

Edit for additional explanation

I currently divide the sum by the number of chunks and use that as a target, then iterate until I get close to that target. The problem is that certain data sets can mess the algorithm up, for example when trying to divide the following into 3:

[95, 15, 75, 25, 85, 5]

Sum is 300, target is 100. The first chunk would sum to 95, the second to 90, the third to 110, and 5 would be 'leftover'. Appending it where it's supposed to be would give 95, 90, 115, whereas a more 'reasonable' solution would be 110, 100, 90.

end edit
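The failure described in the edit can be reproduced with a short sketch. This is only a guess at the kind of single-pass loop being described: the function name `greedy_target_split` and the exact rule (close a chunk once adding the next item would not bring its sum closer to the fixed target) are assumptions, not the asker's actual code.

```python
def greedy_target_split(items, k):
    """Naive single-pass split against a fixed target of sum(items) / k.

    Assumed rule: close the current chunk when adding the next item would
    not bring the chunk's sum any closer to the target.
    """
    target = sum(items) / k
    chunks, current, current_sum = [], [], 0
    for item in items:
        # once k-1 chunks are closed, the last chunk takes everything left
        full = len(chunks) == k - 1
        overshoots = abs(current_sum + item - target) >= abs(current_sum - target)
        if current and not full and overshoots:
            chunks.append(current)
            current, current_sum = [], 0
        current.append(item)
        current_sum += item
    chunks.append(current)
    return chunks


result = greedy_target_split([95, 15, 75, 25, 85, 5], 3)
print(result, [sum(c) for c in result])
# [[95], [15, 75], [25, 85, 5]] [95, 90, 115]
```

Because the target stays fixed at 100, every early chunk that stops short pushes its shortfall onto the last chunk, which is the consistent underestimate mentioned above.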

Background:

I have a list containing text (song lyrics) of varying heights, and I want to divide the text into an arbitrary number of columns. Currently I calculate a target height based on the total height of all lines, but obviously this is a consistent underestimate, which in some cases results in a suboptimal solution (the last column is significantly taller).

Also, do you want this for two sublists or arbitrary sublists?
– erip, Commented Feb 19, 2016 at 23:40
Do you think the problem could be reworded as "Split a list in n sublists such that the sum of values differ by a minimum"? Do you need the sublists or the indexes?
– Pynchia, Commented Feb 19, 2016 at 23:41
I think this is a very interesting problem and I might have a greedy approach that runs in O(n) for any given number of chunks. I'll report back tomorrow.
– timgeb, Commented Feb 20, 2016 at 0:37

Shawn Sullivan · Accepted Answer · 2016-02-22 23:21:23Z

This approach defines partition boundaries that divide the array in roughly equal numbers of elements, and then repeatedly searches for better partitionings until it can't find any more. It differs from most of the other posted solutions in that it looks to find an optimal solution by trying multiple different partitionings. The other solutions attempt to create a good partition in a single pass through the array, but I can't think of a single pass algorithm that's guaranteed optimal.

The code here is an efficient implementation of this algorithm, but it can be hard to understand so a more readable version is included as an addendum at the end.

    def partition_list(a, k):
        if k <= 1: return [a]
        if k >= len(a): return [[x] for x in a]
        # initial boundaries: roughly equal numbers of elements per partition
        partition_between = [(i+1)*len(a)//k for i in range(k-1)]
        average_height = float(sum(a))/k
        best_score = None
        best_partitions = None
        count = 0

        while True:
            starts = [0]+partition_between
            ends = partition_between+[len(a)]
            partitions = [a[starts[i]:ends[i]] for i in range(k)]
            heights = list(map(sum, partitions))

            # find the partition whose height is furthest from the average
            abs_height_diffs = list(map(lambda x: abs(average_height - x), heights))
            worst_partition_index = abs_height_diffs.index(max(abs_height_diffs))
            worst_height_diff = average_height - heights[worst_partition_index]

            if best_score is None or abs(worst_height_diff) < best_score:
                best_score = abs(worst_height_diff)
                best_partitions = partitions
                no_improvements_count = 0
            else:
                no_improvements_count += 1

            if worst_height_diff == 0 or no_improvements_count > 5 or count > 100:
                return best_partitions
            count += 1

            # move one boundary of the worst partition by one element
            move = -1 if worst_height_diff < 0 else 1
            bound_to_move = 0 if worst_partition_index == 0\
                            else k-2 if worst_partition_index == k-1\
                            else worst_partition_index-1 if (worst_height_diff < 0) ^ (heights[worst_partition_index-1] > heights[worst_partition_index+1])\
                            else worst_partition_index
            direction = -1 if bound_to_move < worst_partition_index else 1
            partition_between[bound_to_move] += move * direction

    def print_best_partition(a, k):
        print('Partitioning {0} into {1} partitions'.format(a, k))
        p = partition_list(a, k)
        print('The best partitioning is {0}\n With heights {1}\n'.format(p, list(map(sum, p))))

    a = [1, 6, 2, 3, 4, 1, 7, 6, 4]
    print_best_partition(a, 1)
    print_best_partition(a, 2)
    print_best_partition(a, 3)
    print_best_partition(a, 4)

    b = [1, 10, 10, 1]
    print_best_partition(b, 2)

    import random
    c = [random.randint(0,20) for x in range(100)]
    print_best_partition(c, 10)

    d = [95, 15, 75, 25, 85, 5]
    print_best_partition(d, 3)

You may need to modify this depending on what you are doing with it. For example, to decide whether the best partitioning has been found, the algorithm stops when (a) there is no height difference among partitions, (b) it has not found anything better than the best partitioning seen so far for more than 5 iterations in a row, or (c) it has run for roughly 100 total iterations, as a catch-all stopping point. You may need to adjust those constants or use a different scheme. If your heights form a complex landscape of values, knowing when to stop runs into the classic problem of escaping local optima.
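One way to check whether the stopping criteria gave the best possible result on a small input is to compare against an exhaustive search. The sketch below is not part of the algorithm above; the name `best_partition_bruteforce` is made up, and it assumes the same score as the iterative version (the largest absolute difference between a partition's height and the average height). It tries every choice of cut positions, so it is only practical for short lists and small k.

```python
from itertools import combinations


def best_partition_bruteforce(a, k):
    """Try every way to cut `a` into k non-empty contiguous partitions and
    return the one whose worst partition is closest to the average height."""
    n = len(a)
    if k <= 1:
        return [list(a)]
    if k >= n:
        return [[x] for x in a]
    average_height = sum(a) / k
    best_score, best_partitions = None, None
    # choose k-1 cut positions among the n-1 gaps between elements
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0,) + cuts + (n,)
        partitions = [a[bounds[i]:bounds[i + 1]] for i in range(k)]
        score = max(abs(average_height - sum(p)) for p in partitions)
        if best_score is None or score < best_score:
            best_score, best_partitions = score, partitions
    return best_partitions


print(best_partition_bruteforce([95, 15, 75, 25, 85, 5], 3))
# [[95, 15], [75, 25], [85, 5]]
```

The number of candidates is C(n-1, k-1), so this is useful as a reference for test cases rather than as a replacement for the iterative search.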

Output

    Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 1 partitions
    The best partitioning is [[1, 6, 2, 3, 4, 1, 7, 6, 4]]
     With heights [34]

    Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 2 partitions
    The best partitioning is [[1, 6, 2, 3, 4, 1], [7, 6, 4]]
     With heights [17, 17]

    Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 3 partitions
    The best partitioning is [[1, 6, 2, 3], [4, 1, 7], [6, 4]]
     With heights [12, 12, 10]

    Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 4 partitions
    The best partitioning is [[1, 6], [2, 3, 4], [1, 7], [6, 4]]
     With heights [7, 9, 8, 10]

    Partitioning [1, 10, 10, 1] into 2 partitions
    The best partitioning is [[1, 10], [10, 1]]
     With heights [11, 11]

    Partitioning [7, 17, 17, 1, 8, 8, 12, 0, 10, 20, 17, 13, 12, 4, 1, 1, 7, 11, 7, 13, 9, 12, 3, 18, 9, 6, 7, 19, 20, 17, 7, 4, 3, 16, 20, 6, 7, 12, 16, 3, 6, 12, 9, 4, 3, 2, 18, 1, 16, 14, 17, 7, 0, 14, 13, 3, 5, 3, 1, 5, 5, 13, 16, 0, 16, 7, 3, 8, 1, 20, 16, 11, 15, 3, 10, 10, 2, 0, 12, 12, 0, 18, 20, 3, 10, 9, 13, 12, 15, 6, 14, 16, 6, 12, 9, 9, 16, 14, 19, 1] into 10 partitions
    The best partitioning is [[7, 17, 17, 1, 8, 8, 12, 0, 10, 20], [17, 13, 12, 4, 1, 1, 7, 11, 7, 13, 9], [12, 3, 18, 9, 6, 7, 19, 20], [17, 7, 4, 3, 16, 20, 6, 7, 12], [16, 3, 6, 12, 9, 4, 3, 2, 18, 1, 16], [14, 17, 7, 0, 14, 13, 3, 5, 3, 1, 5, 5], [13, 16, 0, 16, 7, 3, 8, 1, 20, 16], [11, 15, 3, 10, 10, 2, 0, 12, 12, 0, 18], [20, 3, 10, 9, 13, 12, 15, 6, 14], [16, 6, 12, 9, 9, 16, 14, 19, 1]]
     With heights [100, 95, 94, 92, 90, 87, 100, 93, 102, 102]

    Partitioning [95, 15, 75, 25, 85, 5] into 3 partitions
    The best partitioning is [[95, 15], [75, 25], [85, 5]]
     With heights [110, 100, 90]

Edit

Added the new test case, [95, 15, 75, 25, 85, 5], which this method handles correctly.

Addendum

This version of the algorithm is easier to read and understand, but is a bit longer due to taking less advantage of built-in Python features. It seems to execute in a comparable or even slightly faster amount of time, however.

    #partition list a into k partitions
    def partition_list(a, k):
        #check degenerate conditions
        if k <= 1: return [a]
        if k >= len(a): return [[x] for x in a]
        #create a list of indexes to partition between, using the index on the
        #left of the partition to indicate where to partition
        #to start, roughly partition the array into equal groups of len(a)/k (note
        #that the last group may be a different size)
        partition_between = []
        for i in range(k-1):
            partition_between.append((i+1)*len(a)//k)
        #the ideal size for all partitions is the total height of the list divided
        #by the number of partitions
        average_height = float(sum(a))/k
        best_score = None
        best_partitions = None
        count = 0
        no_improvements_count = 0
        #loop over possible partitionings
        while True:
            #partition the list
            partitions = []
            index = 0
            for div in partition_between:
                #create partitions based on partition_between
                partitions.append(a[index:div])
                index = div
            #append the last partition, which runs from the last partition divider
            #to the end of the list
            partitions.append(a[index:])
            #evaluate the partitioning
            worst_height_diff = 0
            worst_partition_index = -1
            for p in partitions:
                #compare the partition height to the ideal partition height
                height_diff = average_height - sum(p)
                #if it's the worst partition we've seen, update the variables that
                #track that
                if abs(height_diff) > abs(worst_height_diff):
                    worst_height_diff = height_diff
                    worst_partition_index = partitions.index(p)
            #if the worst partition from this run is still better than anything
            #we saw in previous iterations, update our best-ever variables
            if best_score is None or abs(worst_height_diff) < best_score:
                best_score = abs(worst_height_diff)
                best_partitions = partitions
                no_improvements_count = 0
            else:
                no_improvements_count += 1
            #decide if we're done: if all our partition heights are ideal, or if
            #we haven't seen improvement in >5 iterations, or we've tried 100
            #different partitionings
            #the criteria to exit are important for getting a good result with
            #complex data, and changing them is a good way to experiment with getting
            #improved results
            if worst_height_diff == 0 or no_improvements_count > 5 or count > 100:
                return best_partitions
            count += 1
            #adjust the partitioning of the worst partition to move it closer to the
            #ideal size. the overall goal is to take the worst partition and adjust
            #its size to try and make its height closer to the ideal. generally, if
            #the worst partition is too big, we want to shrink the worst partition
            #by moving one of its ends into the smaller of the two neighboring
            #partitions. if the worst partition is too small, we want to grow the
            #partition by expanding the partition towards the larger of the two
            #neighboring partitions
            if worst_partition_index == 0:
                #the worst partition is the first one
                if worst_height_diff < 0:
                    partition_between[0] -= 1 #partition too big, so make it smaller
                else:
                    partition_between[0] += 1 #partition too small, so make it bigger
            elif worst_partition_index == len(partitions)-1:
                #the worst partition is the last one
                if worst_height_diff < 0:
                    partition_between[-1] += 1 #partition too big, so make it smaller
                else:
                    partition_between[-1] -= 1 #partition too small, so make it bigger
            else:
                #the worst partition is in the middle somewhere
                left_bound = worst_partition_index - 1 #the divider before the partition
                right_bound = worst_partition_index #the divider after the partition
                if worst_height_diff < 0:
                    #partition too big, so make it smaller
                    if sum(partitions[worst_partition_index-1]) > sum(partitions[worst_partition_index+1]):
                        #the partition on the left is bigger than the one on the right, so make the one on the right bigger
                        partition_between[right_bound] -= 1
                    else:
                        #the partition on the left is smaller than the one on the right, so make the one on the left bigger
                        partition_between[left_bound] += 1
                else:
                    #partition too small, make it bigger
                    if sum(partitions[worst_partition_index-1]) > sum(partitions[worst_partition_index+1]):
                        #the partition on the left is bigger than the one on the right, so make the one on the left smaller
                        partition_between[left_bound] -= 1
                    else:
                        #the partition on the left is smaller than the one on the right, so make the one on the right smaller
                        partition_between[right_bound] += 1

    def print_best_partition(a, k):
        #simple function to partition a list and print info
        print(' Partitioning {0} into {1} partitions'.format(a, k))
        p = partition_list(a, k)
        print(' The best partitioning is {0}\n With heights {1}\n'.format(p, list(map(sum, p))))

    #tests
    a = [1, 6, 2, 3, 4, 1, 7, 6, 4]
    print_best_partition(a, 1)
    print_best_partition(a, 2)
    print_best_partition(a, 3)
    print_best_partition(a, 4)
    print_best_partition(a, 5)

    b = [1, 10, 10, 1]
    print_best_partition(b, 2)

    import random
    c = [random.randint(0,20) for x in range(100)]
    print_best_partition(c, 10)

    d = [95, 15, 75, 25, 85, 5]
    print_best_partition(d, 3)

Thank you @Shawn Sullivan, your comment on single pass possibly being impossible echoes my thoughts on looking at everyone's solutions. I've tried related single pass methods and it always seems to come up short. I'll have to digest your solution a bit first...
Cool, let me know if you have any questions on how it works. I also made a shorter version of the algorithm by turning some of the for loops into other expressions and worked through the truth tables for the conditional at the end to make the partition adjustment able to be expressed with one line. I posted that too in case you're interested, although it's a little harder to read the code.
@NgOon-Ee I've made a few more improvements to Edit 2 which improve the code, but it's still a bit tougher to follow than the original IMO. However, as long as it's clear how this approach works, I consider the Edit 2 code my current answer. I've left the original version mostly as-is in case it's easier to understand, but if this answer is considered best, I'd make the second implementation the primary answer.
While I think I'll probably use the other answer by timgeb, this is clearly the 'correct' answer due to the unpredictable nature of the problem. I also think the second implementation should be made the primary answer, with the first implementation as an addendum for easier understanding (even that took me quite a while to look through really).
@ShawnSullivan: Thanks a lot! Adaptation for Python 3: gist.github.com/laowantong/ee675108eee64640e5f94f00d8edbcb4

timgeb · Accepted Answer · 2016-02-20 02:51:47Z

Here's the best O(n) greedy algorithm I have for now. The idea is to greedily append items from the list to a chunk until the sum for the current chunk exceeds the average expected sum for a chunk at that point. The average expected sum is updated constantly. This solution is not perfect, but as I said, it is O(n) and worked reasonably well in my tests. I am eager to hear feedback and suggestions for improvement.

I left my debug print statements in the code to provide some documentation. Feel free to uncomment them to see what's going on in each step.

CODE

    def split_list(lst, chunks):
        #print(lst)
        #print()
        chunks_yielded = 0
        total_sum = sum(lst)
        avg_sum = total_sum/float(chunks)
        chunk = []
        chunksum = 0
        sum_of_seen = 0

        for i, item in enumerate(lst):
            #print('start of loop! chunk: {}, index: {}, item: {}, chunksum: {}'.format(chunk, i, item, chunksum))
            if chunks - chunks_yielded == 1:
                #print('must yield the rest of the list! chunks_yielded: {}'.format(chunks_yielded))
                yield chunk + lst[i:]
                # end the generator; raising StopIteration inside a generator
                # is turned into a RuntimeError from Python 3.7 on
                return

            to_yield = chunks - chunks_yielded
            chunks_left = len(lst) - i
            if to_yield > chunks_left:
                #print('must yield remaining list in single item chunks! to_yield: {}, chunks_left: {}'.format(to_yield, chunks_left))
                if chunk:
                    yield chunk
                yield from ([x] for x in lst[i:])
                return

            sum_of_seen += item
            if chunksum < avg_sum:
                #print('appending {} to chunk {}'.format(item, chunk))
                chunk.append(item)
                chunksum += item
            else:
                #print('yielding chunk {}'.format(chunk))
                yield chunk
                # update average expected sum, because the last yielded chunk was probably not perfect:
                avg_sum = (total_sum - sum_of_seen)/(to_yield - 1)
                chunks_yielded += 1
                chunksum = item
                chunk = [item]

TEST CODE

    import random

    lst = [1, 6, 2, 3, 4, 1, 7, 6, 4]
    #lst = [random.choice(range(1,101)) for _ in range(100)]
    chunks = 3
    print('list: {}, avg sum: {}, chunks: {}\n'.format(lst, sum(lst)/float(chunks), chunks))
    for chunk in split_list(lst, chunks):
        print('chunk: {}, sum: {}'.format(chunk, sum(chunk)))

TESTS with your list:

    list: [1, 6, 2, 3, 4, 1, 7, 6, 4], avg sum: 17.0, chunks: 2

    chunk: [1, 6, 2, 3, 4, 1], sum: 17
    chunk: [7, 6, 4], sum: 17

    ---

    list: [1, 6, 2, 3, 4, 1, 7, 6, 4], avg sum: 11.33, chunks: 3

    chunk: [1, 6, 2, 3], sum: 12
    chunk: [4, 1, 7], sum: 12
    chunk: [6, 4], sum: 10

    ---

    list: [1, 6, 2, 3, 4, 1, 7, 6, 4], avg sum: 8.5, chunks: 4

    chunk: [1, 6, 2], sum: 9
    chunk: [3, 4, 1], sum: 8
    chunk: [7], sum: 7
    chunk: [6, 4], sum: 10

    ---

    list: [1, 6, 2, 3, 4, 1, 7, 6, 4], avg sum: 6.8, chunks: 5

    chunk: [1, 6], sum: 7
    chunk: [2, 3, 4], sum: 9
    chunk: [1, 7], sum: 8
    chunk: [6], sum: 6
    chunk: [4], sum: 4

TESTS with random lists of length 100 and elements from 1 to 100 (printing of the random list omitted):

    avg sum: 2776.0, chunks: 2

    chunk: [25, 8, 71, 39, 5, 69, 29, 64, 31, 2, 90, 73, 72, 58, 52, 19, 64, 34, 16, 8, 16, 89, 70, 67, 63, 36, 9, 87, 38, 33, 22, 73, 66, 93, 46, 48, 65, 55, 81, 92, 69, 94, 43, 68, 98, 70, 28, 99, 92, 69, 24, 74], sum: 2806
    chunk: [55, 55, 64, 93, 97, 53, 85, 100, 66, 61, 5, 98, 43, 74, 99, 56, 96, 74, 63, 6, 89, 82, 8, 25, 36, 68, 89, 84, 10, 46, 95, 41, 54, 39, 21, 24, 8, 82, 72, 51, 31, 48, 33, 77, 17, 69, 50, 54], sum: 2746

    ---

    avg sum: 1047.6, chunks: 5

    chunk: [19, 76, 96, 78, 12, 33, 94, 10, 38, 87, 44, 76, 28, 18, 26, 29, 44, 98, 44, 32, 80], sum: 1062
    chunk: [48, 70, 42, 85, 87, 55, 44, 11, 50, 48, 47, 50, 1, 17, 93, 78, 25, 10, 89, 57, 85], sum: 1092
    chunk: [30, 83, 99, 62, 48, 66, 65, 98, 94, 54, 14, 97, 58, 53, 3, 98], sum: 1022
    chunk: [80, 34, 63, 20, 27, 36, 98, 97, 7, 6, 9, 65, 91, 93, 2, 27, 83, 35, 65, 17, 26, 41], sum: 1022
    chunk: [80, 80, 42, 32, 44, 42, 94, 31, 50, 23, 34, 84, 47, 10, 54, 59, 72, 80, 6, 76], sum: 1040

    ---

    avg sum: 474.6, chunks: 10

    chunk: [4, 41, 47, 41, 32, 51, 81, 5, 3, 37, 40, 26, 10, 70], sum: 488
    chunk: [54, 8, 91, 42, 35, 80, 13, 84, 14, 23, 59], sum: 503
    chunk: [39, 4, 38, 40, 88, 69, 10, 19, 28, 97, 81], sum: 513
    chunk: [19, 55, 21, 63, 99, 93, 39, 47, 29], sum: 465
    chunk: [65, 88, 12, 94, 7, 47, 14, 55, 28, 9, 98], sum: 517
    chunk: [19, 1, 98, 84, 92, 99, 11, 53], sum: 457
    chunk: [85, 79, 69, 78, 44, 6, 19, 53], sum: 433
    chunk: [59, 20, 64, 55, 2, 65, 44, 90, 37, 26], sum: 462
    chunk: [78, 66, 32, 76, 59, 47, 82], sum: 440
    chunk: [34, 56, 66, 27, 1, 100, 16, 5, 97, 33, 33], sum: 468

    ---

    avg sum: 182.48, chunks: 25

    chunk: [55, 6, 16, 42, 85], sum: 204
    chunk: [30, 68, 3, 94], sum: 195
    chunk: [68, 96, 23], sum: 187
    chunk: [69, 19, 12, 97], sum: 197
    chunk: [59, 88, 49], sum: 196
    chunk: [1, 16, 13, 12, 61, 77], sum: 180
    chunk: [49, 75, 44, 43], sum: 211
    chunk: [34, 86, 9, 55], sum: 184
    chunk: [25, 82, 12, 93], sum: 212
    chunk: [32, 74, 53, 31], sum: 190
    chunk: [13, 15, 26, 31, 35, 3, 14, 71], sum: 208
    chunk: [81, 92], sum: 173
    chunk: [94, 21, 34, 71], sum: 220
    chunk: [1, 55, 70, 3, 92], sum: 221
    chunk: [38, 59, 56, 57], sum: 210
    chunk: [7, 20, 10, 81, 100], sum: 218
    chunk: [5, 71, 19, 8, 82], sum: 185
    chunk: [95, 14, 72], sum: 181
    chunk: [2, 8, 4, 47, 75, 17], sum: 153
    chunk: [56, 69, 42], sum: 167
    chunk: [75, 45], sum: 120
    chunk: [68, 60], sum: 128
    chunk: [29, 25, 62, 3, 50], sum: 169
    chunk: [54, 63], sum: 117
    chunk: [57, 37, 42], sum: 136

As you can see, as expected it gets worse the more chunks you want to generate. I hope I was able to help a bit.

edit: The yield from syntax requires Python 3.3 or newer. If you are using an older version, just turn the statement into a normal for loop.
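For illustration, the loop form could look like the sketch below. The helper name `single_item_chunks` is made up; in `split_list` itself the two lines of the loop body would simply take the place of the `yield from` statement.

```python
def single_item_chunks(items):
    """Yield each remaining item as its own one-element chunk."""
    # equivalent to: yield from ([x] for x in items)
    for x in items:
        yield [x]


print(list(single_item_chunks([7, 6, 4])))
# [[7], [6], [4]]
```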

Thanks for this, but the edge cases I was talking about (consistent underestimation) still faces an issue with this method. Added an example data set which causes the problem, with this method it actually yields [95, 15], [75], and [25, 85, 5], which is not a bad guess but still not as good as [95, 15], [75, 25], and [85, 5]
@NgOon-Ee yeah, my solution is more tailored towards giving good guesses, not perfect ones. I'm not sure how much better it can get while staying greedy and within O(n). I'll have to think about this some more. One idea I'm having is to use my solution to get the chunks and then make another pass over the chunks to optimize them, switching out first/last elements. Maybe you can try to attack this if you need it fast. In theory, you should get very good guesses with a few extra passes over the chunks.
@NgOon-Ee please try this in the else clause: chunksum = chunksum - avg_sum + item instead of chunksum = item. Comment out/delete the line where avg_sum is updated. This seems to give better results for some cases, for example [95, 15], [75, 25] and [85, 5] for a three-split of [95, 15, 75, 25, 85, 5].
Thanks, this solution is probably the most user-friendly. Unfortunately the more I look at it the more I realize it just postpones the inevitable mistake, mostly because the problem itself is ill-defined for such single pass methods as Shawn Sullivan stated. I'll upvote this, but based on that technicality I think his answer is more correct.
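The change suggested in the comments above (carry the overshoot of a finished chunk into the next one, and stop updating the average) might look like the following sketch. The name `split_list_carry` is made up here, the debug prints are dropped, and the generator ends with a plain `return`.

```python
def split_list_carry(lst, chunks):
    """Greedy split where each chunk's overshoot is carried into the next
    chunk's running sum; the target average is never updated."""
    chunks_yielded = 0
    avg_sum = sum(lst) / float(chunks)
    chunk = []
    chunksum = 0

    for i, item in enumerate(lst):
        to_yield = chunks - chunks_yielded
        if to_yield == 1:
            # only one chunk left: it takes the rest of the list
            yield chunk + lst[i:]
            return
        if to_yield > len(lst) - i:
            # more chunks wanted than items left: emit single-item chunks
            if chunk:
                yield chunk
            yield from ([x] for x in lst[i:])
            return
        if chunksum < avg_sum:
            chunk.append(item)
            chunksum += item
        else:
            yield chunk
            chunks_yielded += 1
            # carry the overshoot (chunksum - avg_sum) into the new chunk
            chunksum = chunksum - avg_sum + item
            chunk = [item]


print(list(split_list_carry([95, 15, 75, 25, 85, 5], 3)))
# [[95, 15], [75, 25], [85, 5]]
```

As the last comment notes, this helps on that particular list but is still a single-pass heuristic, so other inputs can defeat it.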

Milind R · Accepted Answer · 2019-01-03 14:33:39Z

Simple and concise way using numpy. Assuming

    import numpy.random as nr
    import numpy as np

    a = (nr.random(10000000)*1000).astype(int)

Then, assuming you need to divide the list into p parts with approximately equal sums

    def equisum_partition(arr,p):
        ac = arr.cumsum()

        #target sum of each part (total sum of the array divided by p)
        partsum = ac[-1]//p

        #generates the cumulative sums of each part
        cumpartsums = np.array(range(1,p))*partsum

        #finds the indices where the cumulative sums are sandwiched
        inds = np.searchsorted(ac,cumpartsums)

        #split into approximately equal-sum arrays
        parts = np.split(arr,inds)

        return parts

Importantly, this is vectorised:

    In [3]: %timeit parts = equisum_partition(a,20)
    53.5 ms ± 962 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

You can check the quality of the splitting with the standard deviation of the part sums:

partsums = np.array([part.sum() for part in parts]).std()

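The splits are not great. I suspect they are close to the best possible for large arrays of evenly distributed values, given that the ordering is not changed, but the method cuts at fixed cumulative targets and does not guarantee an optimal split.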
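To see how the cumulative-sum idea behaves on the small lists from the question, here is a pure-Python sketch of the same steps, with `itertools.accumulate` standing in for `cumsum` and `bisect.bisect_left` for `np.searchsorted` (which searches from the left by default). The name `equisum_partition_py` is made up for this illustration.

```python
from bisect import bisect_left
from itertools import accumulate


def equisum_partition_py(lst, p):
    """Cut `lst` where its running total first reaches each multiple of
    the per-part target sum."""
    ac = list(accumulate(lst))                      # cumulative sums
    partsum = ac[-1] // p                           # target sum per part
    targets = [partsum * i for i in range(1, p)]    # cumulative targets
    inds = [bisect_left(ac, t) for t in targets]    # cut positions
    bounds = [0] + inds + [len(lst)]
    return [lst[bounds[i]:bounds[i + 1]] for i in range(p)]


parts = equisum_partition_py([1, 6, 2, 3, 4, 1, 7, 6, 4], 3)
print(parts, [sum(x) for x in parts])
# [[1, 6, 2], [3, 4, 1], [7, 6, 4]] [9, 8, 17]
```

On this list the part sums come out as 9, 8, 17 rather than the 12, 12, 10 from the question, so on short lists with large individual values the fixed cumulative targets can miss the better split. The approach fits best where speed on very large arrays matters more than the last bit of balance.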

mjkvaak · Accepted Answer · 2019-09-30 12:46:45Z

This is a slightly edited version of @Milind R's numpy approach (BTW a big thanks, sir). Namely, I realized that in a real-life scenario the partitions suggested by the script may end up being sub-optimal if the elements are not "uniformly" spread in the array in terms of their values. To counter this I "uniformified" the array by rearranging its elements as 'smallest', 'largest', 'second smallest', 'second largest', etc. The downside is that this makes the script considerably (~5x) slower.

    import numpy.random as nr
    import numpy as np

    a = (nr.random(10000000)*1000).astype(int)

The edited partitioning algorithm:

    def equisum_partition(arr,p, uniformify=True):
        #uniformify: rearrange to ['smallest', 'largest', 'second smallest', 'second largest', etc..]
        if uniformify:
            l = len(arr)
            odd = l%2!=0
            arr = np.sort(arr)

            #add a dummy element if odd length
            if odd:
                arr = np.append(np.min(arr)-1, arr)
                l = l+1

            idx = np.arange(l)
            idx = np.multiply(idx, np.subtract(1, np.multiply( np.mod(idx, 2), 2)) )
            arr = arr[idx]

            #remove the dummy element
            if odd:
                arr = arr[1:]

        #cumulative summation
        ac = arr.cumsum()

        #target sum of each part (total sum of the array divided by p)
        partsum = ac[-1]//p

        #generates the cumulative sums of each part
        cumpartsums = np.array(range(1,p))*partsum

        #finds the indices where the cumulative sums are sandwiched
        inds = np.searchsorted(ac,cumpartsums)

        #split into approximately equal-sum arrays
        parts = np.split(arr,inds)

        return parts

In the original answer's example this makes little difference, because the example array is random.

With uniformify:

    %%time
    parts = equisum_partition(a,20)
    partsums = np.array([part.sum() for part in parts])
    partsums.std()

    Wall time: 624 ms
    266.6111212984185

Without uniformify:

    %%time
    parts = equisum_partition(a,20, uniformify=False)
    partsums = np.array([part.sum() for part in parts])
    partsums.std()

    Wall time: 105 ms
    331.19071544957296

Garrett R · Accepted Answer · 2016-02-20 00:23:14Z

I think a good approach would be to sort the input list. Then add the smallest and largest to one list. The second smallest and second largest to the next list and so on, until all elements are added to the list.

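    def divide_list(A):
        A.sort()
        l = 0
        r = len(A) - 1
        l1,l2= [],[]
        i = 0
        # pair the smallest remaining element with the largest remaining one,
        # alternating between the two output lists
        while l < r:
            ends = [A[l], A[r]]
            if i %2 ==0:
                l1.extend(ends)
            else:
                l2.extend(ends)
            i +=1
            l +=1
            r -=1

        # odd length: the middle element goes to the list with the smaller sum
        if r == l:
            smaller = l1 if sum(l1) < sum(l2) else l2
            smaller.append(A[r])

        return l1, l2

    myList = [1, 6, 2, 3, 4, 1, 7, 6, 4]
    print(divide_list(myList))

    myList = [1,10,10,1]
    print(divide_list(myList))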

Output

    ([1, 7, 2, 6], [1, 6, 3, 4, 4])
    ([1, 10], [1, 10])

given the numbers represent words/song lyrics I think the original order of the elements matter

danidee · Accepted Answer · 2016-02-20 01:39:44Z

This is coming kind of late, but I came up with a function that does what you need. It takes a second parameter that tells it how many parts to split the list into.

    import math

    my_list = [1, 6, 2, 3, 4, 1, 7, 6, 4]

    def partition(my_list, split):
        solution = []

        total = sum(my_list)
        div = total / split
        div = math.ceil(div)

        criteria = [div] * (total // div)
        criteria.append(total - sum(criteria)) if sum(criteria) != total else criteria

        temp = []
        pivot = 0

        for crit in criteria:
            for count in range(len(my_list) + 1):
                if sum(my_list[pivot:count]) == crit:
                    solution.append(my_list[pivot:count])
                    pivot = count
                    break

        return solution

    print(partition(my_list, 2)) # Outputs [[1, 6, 2, 3, 4, 1], [7, 6, 4]]
    print(partition(my_list, 3)) # Outputs [[1, 6, 2, 3], [4, 1, 7], [6, 4]]

it would fail for 4 divisions, because you obviously stated in your question that you want to maintain the order

4 divisions = [9, 9, 9, 7]

and your sequence can't match that
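A quick way to see why the sequence can't match those criteria is to compare the running totals the criteria require with the running totals the list actually offers. The snippet below is just an illustrative check, not part of the function above; the variable names are made up.

```python
from itertools import accumulate

my_list = [1, 6, 2, 3, 4, 1, 7, 6, 4]
criteria = [9, 9, 9, 7]

# totals reachable by cutting the list after some element
prefix_sums = set(accumulate(my_list))
# totals at which the criteria need a cut: 9, 18, 27
needed = list(accumulate(criteria))[:-1]

missing = [s for s in needed if s not in prefix_sums]
print(missing)
# [18, 27]
```

The list's running totals are 1, 7, 9, 12, 16, 17, 24, 30, 34, so a first chunk of 9 exists, but no cut leaves the running total at 18 or 27 without reordering the elements.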

aghast · Accepted Answer · 2016-02-20 02:19:28Z

Here is some code that returns 2-tuples of slice indexes for each sublist.

    weights = [1, 6, 2, 3, 4, 1, 7, 6, 4]

    def balance_partitions(weights:list, n:int=2) -> tuple:
        if n < 1:
            raise ValueError("Parameter 'n' must be 2+")

        target = sum(weights) // n
        results = []
        cost = 0
        start = 0

        for i, w in enumerate(weights):
            delta = target - cost
            cost += w
            if cost >= target:
                if i == 0 or cost - target <= delta:
                    results.append( (start, i+1) )
                    start = i+1
                elif cost - target > delta:
                    # Better if we didn't include this one.
                    results.append( (start, i) )
                    start = i

                cost -= target

                if len(results) == n-1:
                    results.append( (start, len(weights)) )
                    break

        return tuple(results)

    def print_parts(w, n):
        result = balance_partitions(w, n)
        print("Suggested partition indices: ", result)
        for t in result:
            start,end = t
            sublist = w[start:end]
            print(" - ", sublist, "(sum: {})".format(sum(sublist)))

    print(weights, '=', sum(weights))

    for i in range(2, len(weights)+1):
        print_parts(weights, i)

Output is:

    [1, 6, 2, 3, 4, 1, 7, 6, 4] = 34
    Suggested partition indices: ((0, 6), (6, 9))
     - [1, 6, 2, 3, 4, 1] (sum: 17)
     - [7, 6, 4] (sum: 17)
    Suggested partition indices: ((0, 4), (4, 7), (7, 9))
     - [1, 6, 2, 3] (sum: 12)
     - [4, 1, 7] (sum: 12)
     - [6, 4] (sum: 10)
    Suggested partition indices: ((0, 3), (3, 5), (5, 7), (7, 9))
     - [1, 6, 2] (sum: 9)
     - [3, 4] (sum: 7)
     - [1, 7] (sum: 8)
     - [6, 4] (sum: 10)
    Suggested partition indices: ((0, 2), (2, 4), (4, 6), (6, 7), (7, 9))
     - [1, 6] (sum: 7)
     - [2, 3] (sum: 5)
     - [4, 1] (sum: 5)
     - [7] (sum: 7)
     - [6, 4] (sum: 10)
    Suggested partition indices: ((0, 2), (2, 3), (3, 5), (5, 6), (6, 7), (7, 9))
     - [1, 6] (sum: 7)
     - [2] (sum: 2)
     - [3, 4] (sum: 7)
     - [1] (sum: 1)
     - [7] (sum: 7)
     - [6, 4] (sum: 10)
    Suggested partition indices: ((0, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 9))
     - [1, 6] (sum: 7)
     - [2] (sum: 2)
     - [3] (sum: 3)
     - [4] (sum: 4)
     - [1] (sum: 1)
     - [7] (sum: 7)
     - [6, 4] (sum: 10)
    Suggested partition indices: ((0, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 9))
     - [1, 6] (sum: 7)
     - [2] (sum: 2)
     - [3] (sum: 3)
     - [4] (sum: 4)
     - [1] (sum: 1)
     - [7] (sum: 7)
     - [6] (sum: 6)
     - [4] (sum: 4)
    Suggested partition indices: ((0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 9))
     - [1] (sum: 1)
     - [6] (sum: 6)
     - [2] (sum: 2)
     - [3] (sum: 3)
     - [4] (sum: 4)
     - [1] (sum: 1)
     - [7] (sum: 7)
     - [6] (sum: 6)
     - [4] (sum: 4)

erip · Accepted Answer · 2016-02-20 00:52:45Z

Here's how I might attack this problem for the case of two desired sublists. It's probably not as efficient as it could be, but it's a first cut.

    def divide(l):
        total = sum(l)
        half = total / 2
        l1 = []
        l2 = []
        for e in l:
            if half - e >= 0 or half > abs(half - e):
                l1.append(e)
                half -= e
            else:
                l2.append(e)

        return (l1, l2)

You can see it in action here:

    (l1, l2) = divide([1, 6, 2, 3, 4, 1, 7, 6, 4])
    print(l1) # [1, 6, 2, 3, 4, 1]
    print(l2) #[7, 6, 4]

    (l1, l2) = divide([1,1,10,10])
    print(l1) # [1, 1, 10]
    print(l2) #[10]

I'll leave other cases to you as an exercise. :)

edited Feb 20, 2016 at 0:52

answered Feb 19, 2016 at 23:50

erip Over a year ago

Please explain the downvote. Can't learn anything if there's no feedback.

Garrett R Over a year ago

I didn't downvote you, but I'm trying to understand how this works. It looks like you greedily add to l1 until you get to more than half of the total. Then you add to l2. What if you had a list like [1,1,10,10]? Wouldn't this produce [1,1] [10,10]?

erip Over a year ago

Whoops, indeed! Need to check if next element will cause less of a difference of half. Will update soon

Ng Oon-Ee Over a year ago

Thanks, I'm using something very similar to this right now (almost identical except for naming of variables and I handle more than two sublists), but the problem is when the data tends to provide smaller than expected lists (I've added an example for that) then it tends to overshoot.
