10
\$\begingroup\$

I've become interested prime factorization since solving Project Euler problem 3 (finding the largest prime factor of 600851475143). Learning here that initializing lists with many elements to later prune them is the computational equivalent of swimming with bricks, I challenged myself to code a function that returns all prime factors as efficiently as possible, without the need for initializing a big list beforehand.

The routine takes the input n and divides it by 2 as many times as possible before leaving a remainder, redefining n to be the result of the division each time. When 2 no longer cleanly divides n, it moves to 3, but it is after 3 that the crux of the inefficiency arises.

After 3, if n does not equal 1 and there are still factors to be found, the next factor tried is the next consecutive odd integer after 3. For example, take the number \$129373200 = (2^4) × (3^5) × (5^2) × (11^3)\$. After the algorithm divides all 2's, 3's and 5's away, the next test-number is 7, which I view as efficient because 7 is prime. However, as the routine iterates, 9 is tested before 11, and I see this as inefficient because 9 is composite.

If I edit the code such that it stores all tested numbers in a list, and checks if the next test-number is a multiple of a previously tested one or not before iterating, the runtime slows down considerably. Is there an efficient way to do this sort of check via elementary functions without storing tested numbers in a list (or set, dictionary, etc.)?

TLDR: I want to understand why factoring numbers with 8-digit-long prime factors takes the code 7 seconds versus numbers with 9-digit-long prime factors that take nearly 2.5 minutes to solve. I want to reduce this large jump in runtime.

def pf(n): startTime=datetime.now() factors = [] #Initialize a list to store prime factors. while n % 2 == 0: #While n/2 continues to yield no remainder: factors += [2] #Append 2 to the factor list. n /= 2 #Redefine n as n/2. if n == 1: #If n is 1, all of its prime factors have been found, print datetime.now() - startTime return factors #so return the factor list to the user. p = 3 #Initialize a count at 3, the next prime after 2 while p*p <= n: #While n is greater than or equal to p*p: if n%p == 0: #If p divides n: factors += [p] #Append p to the factor list. n /= p #Redefine n as n/p. else: #If p doesn't divide n: p += 2 #See if the next consecutive odd number up divides n. #Once all smaller factors are found, and n is smaller than p*p, factors += [n] #append n to the factor list, print datetime.now() - startTime, return factors #and then return it to the user. print pf(121) print pf(42768) print pf(19440) print pf(97200) print pf(129373200) #inefficiency example print pf(600851475143) print pf(31610054640417607788145206291543662493274686990) #consecutive primes print pf(4383898882371133212190175441147530134182228613257) #5-6 digit primes print pf(815145012617325671714771027149) #8-digit primes print pf(9657874875862260078751562987967607300225789) #9-digit primes 

Output:

0:00:00 [11, 11] 0:00:00 [2, 2, 2, 2, 3, 3, 3, 3, 3, 11] 0:00:00 [2, 2, 2, 2, 3, 3, 3, 3, 3, 5] 0:00:00 [2, 2, 2, 2, 3, 3, 3, 3, 3, 5, 5] 0:00:00 [2, 2, 2, 2, 3, 3, 3, 3, 3, 5, 5, 11, 11, 11] 0:00:00 [71, 839, 1471, 6857L] 0:00:00 [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113L] 0:00:00.023000 [37489, 59617, 63577, 63841, 74771, 75521, 82217, 97283, 102181, 104717L] 0:00:06.935000 [15485867, 32452843, 32452867, 49979687L] 0:02:24.362000 [122949829, 314606891, 393342743, 674506111, 941083987L] 
\$\endgroup\$
2
  • \$\begingroup\$ It's probably worth including a link to your previous attempt at this. \$\endgroup\$ Commented Sep 11, 2015 at 7:24
  • \$\begingroup\$ yaegermenjensen, What factors do you get for pf(0), pf(1)? \$\endgroup\$ Commented Oct 7 at 14:20

3 Answers 3

14
\$\begingroup\$

Code Comments

First, that is a thoroughly excessive amount of commenting. I appreciate your zeal in documentation, but you do not need to comment every line. Just a couple comments here and there is thoroughly sufficient.

Second, your function should do its job. If you want to time it, that is an external job. Use the timeit library:

def pf(n): # stuff that doesn't involve datetime return factors print timeit.timeit('pf(val)', setup='from __main__ import pf; val = 145721*(145721+2)', number=10) 

That will print how long it took your function to run 10 times. Great library, highly recommended.

Thirdly, factors += [p] should be spelled factors.append(p).

Algorithm Comments

The reason your code performs so poorly is because of the its algorithmic complexity. You are trying every odd number from 2 to sqrt(N), so your algorithm is O(sqrt(N)). But here N is the actual number... if we express N in terms of digits, it's O(sqrt(10^N)) = O(10^(N/2)). So when we go from 30-digit number to a 43-digit number, we would expect to see an increase in time on the order of a million in the worst case (if we were looking at twin primes). The fact that we see an increase of "only" 20x is an artifact of case selection more than anything else. I tried picking multiples of (arbitrary) twin primes, and the runtimes I got after 10 runs each were:

145721 * 145723 0.536s 1117811 * 1117813 2.716s 18363797 * 18363799 45.436s 

Roughly 10x each time. Yay predictions.

So what can we do to speed this up? When our times are this bad, it's either because we have a really stupid bug or we're using a bad algorithm. There isn't anything I see that will save this particular algorithm. At best, we can improve by changing the iteration to avoid multiples of 3 and 5:

jump = 4 while p*p <= n: ... else: p += jump jump ^= 6 #so it alternates between 2 and 4 

That gets us to:

145721 * 145723 0.263s 1117811 * 1117813 2.172s 18363797 * 18363799 34.740s 

Still exponential, but better constants at least. Not sure how much better we're going to do than that. So let's just google some prime factorization algorithms. First one I came across was Pollard's rho. Simply copying the algorithm gives produces:

def pollard(n): from fractions import gcd def get_factor(n): x_fixed = 2 x = 2 cycle_size = 2 factor = 1 while factor == 1: for count in xrange(cycle_size): if factor > 1: break x = (x * x + 1) % n factor = gcd(x - x_fixed, n) cycle_size *= 2 x_fixed = x return factor factors = [] while n > 1: next = get_factor(n) factors.append(next) n //= next return factors 

That gives me the correct response on all three twin primes, with this performance:

145721 * 145723 0.021s (12.5x improvement) 1117811 * 1117813 0.295s ( 7.4x) 18363797 * 18363799 1.131s (30.7x) 1500450269 * 1500450271 5.733s (est. ~1000x) 

Even jumping up to 10-digits, we still see good performance (we'd expect over an hour with the previous algorithm, so we're talking 1000x improvement), and we see better growth to.

Sometimes, you just need a better algorithm.

\$\endgroup\$
2
  • \$\begingroup\$ Thanks for introducing the timeit module. Sorry for the plethora of comments. I like to be very thorough; they are mostly for my own learning. I will omit them in future submissions. About the timeit module, is there an easy way to make a for loop s.t. 'for n in nlist, print average runtime of pf(n) over 10 iterations', where my nlist contains the composite inputs I wish to time? And about the Pollard algorithm, thanks for showing me an efficient Python example. I'll look at the theory in words later and try to implement it in my own way before I work through yours and get back to you. \$\endgroup\$ Commented Sep 11, 2015 at 4:19
  • \$\begingroup\$ A number being the product of twin primes is relatively easy to factor, with "trial division": n = p * (p + 2) , one factor: |sqrt(n)| = p, the other: n / p = p + 2. The largest such a 64 bits number: 4294965671 * 4294965673 takes less than a millisecond. Code (C#): here. \$\endgroup\$ Commented Sep 29, 2017 at 8:42
6
\$\begingroup\$

timing a function

Generally it is not a good idea to combine timing code with your function. There are better ways to determine how much time a function call takes:

  1. Use %timeit in IPython
  2. Write a wrapper function.

An example of a wrapper function:

def time_pf(n): startTime=datetime.now() result = pf(n) # call the original function delta = datetime.now() - startTime print delta, "pf for n =", n, "returned:", result return result time_pf(4383898882371133212190175441147530134182228613257) 

Then your original function pf can still be called when you don't want the timing information.

You an even generalize this wrapper idea and encapsulate it into a decorator. See this article or this other one or this SO question.

better factoring

The algorithm is slow because, well, in some cases you are testing a lot of numbers which are not divisors (or even prime.)

A nice explanation of how some advanced factoring techniques work is given in this "Ask Dr. Math" article. Some of the techniques include:

  • Express \$N\$ as \$(x-y)(x+y)\$
  • Express \$N\$ as sums of squares in two ways: \$N = a^2 + b^2 = c^2 + d^2\$, where \$0 < a < c < d < b\$
  • Pollard P-1 method
  • Elliptic Curve Method
  • Number Field Sieve
  • Fermat's Method
  • Continued Fraction Method
  • SQUFOF
  • … and others

Implementing one of those algorithms is a good exercise.

\$\endgroup\$
3
\$\begingroup\$

I would look up the pollard-rho algorithm. It doesn’t involve any higher mathematics, so you can implement it yourself. If a number n has a prime factor p, that prime factor can be found in about sqrt(p) steps. Since a composite n has a prime factor <= sqrt(n), pollard-rho finds a prime factor of a composite number n^(1/4) steps.

There are some problems. One is that multiple prime factors f can be found simultaneously and you are only told their product. Worst case you have n which is the product of 2, 3 or more primes that are all found after the same number of steps, and you are told that n is divisible by n. Not helpful.

The other problem is that n might be prime. Pollard-rho will tell you after about n^(1/2) steps that n has a factor n. This problem is fixed easily: For composite n you would expect a factor to be found in not much more than n^(1/4) steps. If you don’t find a factor after 5 x n^(1/4) steps, it is quite likely that n is prime, so you do a primality test and this problem is solved.

You can run pollard-rho with a parameter. The number of steps needed to find a factor indicates whether you found a prime factor or a composite factor, so you either prove it is prime or find more factors using a different parameter for pollard-tho. And of course if n has a factor f then you also examine n / f.

\$\endgroup\$

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.