3

I have a program which works with a for loop, but it's too slow, and I need to speed it up.

I have a reverse-sorted list of probabilities whose sum is 1. There are over 5 million items.

I want to take the highest probabilities, i.e. the first n items whose collective sum reaches 0.9999.

This was my code:

    for b in sorted_list:
        new_list.append(b)
        if sum(new_list) > 0.9999:
            break

Can anyone suggest a quicker method?

Thank you

Edit: I found that this question was asked before - stackexchange link

However, the suggestions all make use of loops, so I don't think they will be any quicker. Someone at the end suggested a list comprehension, so I am going to google that and see what it means! Thank you

3 Comments
  • This is a classic mistake: calling sum(new_list) on every step makes your algorithm very inefficient. Instead, keep a running total by adding the appended element to the previous sum. Commented Oct 9, 2019 at 16:20
  • can you try np.array(sorted_list)[np.cumsum(sorted_list) < 0.9999] (see the sketch after these comments) Commented Oct 9, 2019 at 16:23
  • I like the idea that I made a classic mistake. Going to replace my code with the suggestion below and check the outcome. Commented Oct 9, 2019 at 16:38
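
A minimal sketch of the vectorized approach from the comment above, assuming sorted_list already exists. Note that the strict < comparison in the comment would drop the element that pushes the sum past the threshold, so this sketch uses np.searchsorted to match the append-then-break loop in the question (floating-point cumulative sums can also differ slightly from repeated Python sums):

    import numpy as np

    probs = np.asarray(sorted_list)    # sorted_list from the question
    csum = np.cumsum(probs)            # all running totals in one vectorized pass
    # First index whose running total exceeds 0.9999; +1 keeps that element,
    # matching the append-then-break behaviour of the original loop.
    cut = np.searchsorted(csum, 0.9999, side='right') + 1
    new_list = probs[:cut].tolist()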

2 Answers

6

Keep a running sum instead of recomputing it every step for the whole list. I.e.

    running_sum = 0
    for b in sorted_list:
        new_list.append(b)
        running_sum += b
        if running_sum > 0.9999:
            break
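
For reference, here is a self-contained version of this loop that can be run as-is; the synthetic data below is illustrative and stands in for the 5-million-item list from the question:

    import random

    # Illustrative data: a reverse-sorted list of probabilities summing to 1.
    values = sorted((random.random() for _ in range(1_000_000)), reverse=True)
    total = sum(values)
    sorted_list = [v / total for v in values]

    new_list = []
    running_sum = 0.0
    for b in sorted_list:
        new_list.append(b)
        running_sum += b        # O(1) per iteration instead of re-summing the list
        if running_sum > 0.9999:
            break

    print(len(new_list), running_sum)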

4 Comments

Thanks very much for the quick replies. I am going to try this solution. I haven't yet had time to digest why you all think this is quicker, but I will certainly try it, and after I have done it, I'll probably have figured out why my process was inefficient. I will probably come back in an hour or so and accept this as the solution, but I want to try it first. Thanks to all for your comments.
The version provided in your question requires that all elements in new_list are accessed to perform addition to compute the sum on each iteration. This computes in O(n!) time over the course of all iterations. However, the incremental addition only needs to perform a single operation with each iteration of the loop, meaning it performs in O(n) time. This presents a significant speedup as the size of n gets large.
@JacobTurpin The complexity is quadratic, not exponential as you suggest. It's a simple nested-iteration issue: on every outer iteration, all the collected inner items are iterated again, which amounts to O(n^2) complexity (see the timing sketch after these comments).
Thanks Jacob. I understand now why the amendment is so much faster.
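
To put numbers on the quadratic-vs-linear point made in the comments above, here is a small timing sketch; the list size and values are arbitrary, chosen so that neither loop breaks early and the slow version still finishes quickly:

    import timeit

    def repeated_sum(xs):
        new_list = []
        for b in xs:
            new_list.append(b)
            if sum(new_list) > 0.9999:   # re-walks new_list every iteration: O(n^2) total
                break

    def running_sum(xs):
        new_list, total = [], 0.0
        for b in xs:
            new_list.append(b)
            total += b                   # constant work per iteration: O(n) total
            if total > 0.9999:
                break

    xs = [1e-9] * 10_000                 # sum stays below 0.9999, so both loops run in full
    print(timeit.timeit(lambda: repeated_sum(xs), number=1))
    print(timeit.timeit(lambda: running_sum(xs), number=1))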
3

sum(iterable) has to visit all elements to calculate the sum. That is unnecessary as you can reuse the sum from the previous iteration. The built-in tool to accumulate such a sum is, well, itertools.accumulate. Moreover, you don't have to append repeatedly. Instead, you can take a single slice at the end:

    from itertools import accumulate

    for i, s in enumerate(accumulate(sorted_list)):
        if s > 0.9999:
            break
    new_list = sorted_list[:i+1]
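
A quick usage sketch of this approach on toy data (the example list is illustrative, not from the question). Note that i deliberately leaks out of the for loop, which is well-defined in Python:

    from itertools import accumulate

    # Toy reverse-sorted probabilities that sum to 1 (illustrative only).
    sorted_list = [0.5, 0.3, 0.15, 0.04, 0.009, 0.00095, 0.00005]

    for i, s in enumerate(accumulate(sorted_list)):
        if s > 0.9999:
            break
    new_list = sorted_list[:i + 1]
    print(new_list, sum(new_list))   # the shortest prefix whose sum exceeds 0.9999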

