
This is my take on this problem. I don't pre-determine the distances, since that isn't suitable for the application I'll use it for. I'll use it in school to determine a mean total distance and how to set up the poles of a power grid.


from random import sample
from random import random
from random import uniform
from random import shuffle
from math import sqrt
from time import time
from itertools import permutations
import matplotlib.pyplot as plt
import cProfile
import pstats
import io


class NodesLeastDistanceGA:
    """ Traveling salesman problem genetic algorithm """

    def __init__(self, parent, side, verbose=False):
        """ Constructor """
        self._parent = parent
        self._side = side
        self._mutate_rate = 0.07
        self._population_size = 60 if len(parent) > 10 else 10
        self._new_generation_size = self._population_size*2
        self._rounds = 200
        self._genlen = len(parent)
        self._verbose = verbose
        self._cached_distances = {}
        self._cached_fitness = {}

    def algorithm(self):
        """ Initialize population of lists and sets fittest.
        Yields a new population and mutate a small amount of the
        individuals to avoid getting stuck. Thorough elitism the
        most fitest individual is carried through the rounds.
        """
        population = self.generate_population()
        fitest = min(population, key=self.fitness)
        total_time = time()
        for r in range(self._rounds):
            new_pop = []
            while len(new_pop) < self._new_generation_size:
                father = self.select(population)
                mother = self.select(population)
                child = self.crossover(father, mother)
                if child not in new_pop:
                    new_pop += [child]
                    continue
            for i in range(len(new_pop)):
                if random() < self._mutate_rate:
                    new_pop[i] = self.mutate(new_pop[i])
            new_fittest = min(population, key=self.fitness)
            if self.fitness(fitest) > self.fitness(new_fittest):
                fitest = new_fittest
            if r % 50 == 0:
                print(r, self.fitness(min(population, key=self.fitness)))
            population = self.selection(new_pop)
            if fitest not in population:
                population += [fitest]
        self.result(population, fitest, total_time)

    def result(self, population, fitest, total_time):
        if self._verbose:
            for ind in sorted(population, key=self.fitness):
                print("Path: {}, Fitness: {:.3f}".format(ind, self.fitness(ind)))
        print("Cached-> Fitness:{}, Distances: {}".format(
            len(self._cached_fitness), len(self._cached_distances)))
        print("Execution Time: {:.3f}s".format(time() - total_time))
        print("Best path found: {}, fitness: {:.3f}".format(fitest, self.fitness(fitest)))
        if self._verbose:
            self.plot(fitest)

    def selection(self, new_pop):
        """ Determines which individuals that survives.
        Shuffle to destroy symmetry selection over rounds.
        """
        shuffle(new_pop)
        pop = []
        for _ in range(self._population_size):
            survivor = self.select(new_pop)
            new_pop.remove(survivor)
            pop += [survivor]
        return pop

    def select(self, pop):
        """ Selects a individual that might have a low fitness. """
        pop_total_fit = sum(1.0 / self.fitness(p) for p in pop)
        limit = uniform(0.0, pop_total_fit)
        c = 0
        for p in pop:
            c += 1 / self._cached_fitness[hash(tuple(p))]
            if c > limit:
                return p

    def fitness(self, child):
        """ Returns the fitness of a individual if it has been calculated.
        Else it calculates the distance between each node and sum it up,
        cache it, this is the fitness of current individual.
        In this case a low fitness is a good fitness.
        """
        h = hash(tuple(child))
        if h in self._cached_fitness.keys():
            return self._cached_fitness[h]
        distance = 0
        for i in range(len(child)-1):
            distance += self.point_distance(child[i], child[i+1])
        self._cached_fitness[h] = distance
        return distance

    @staticmethod
    def crossover(father, mother):
        """ Cross two individual thorough a gen by gen approach.
        For readability and optimization this function is kept ugly.
        """
        child = [None]*len(father)
        rate = 0.5
        for gen in father:
            parent, other_parent = (father, mother) if random() > rate \
                else (mother, father)
            key = None
            for key, value in enumerate(parent):
                if value == gen:
                    break
            if not child[key]:
                child[key] = gen
                continue
            for key, value in enumerate(other_parent):
                if value == gen:
                    break
            if not child[key]:
                child[key] = gen
                continue
            for key, value in enumerate(child):
                if not value:
                    child[key] = gen
                    break
        return child

    @staticmethod
    def mutate(child):
        """ Swaps place of two gens. """
        i1, i2 = sample(range(1, len(child)-1), 2)
        child[i1], child[i2] = child[i2], child[i1]
        return child

    def point_distance(self, p1, p2):
        """ Calculates the distance between two points and cache it. """
        nodes = hash((p1, p2))
        if nodes in self._cached_distances.keys():
            return self._cached_distances[nodes]
        d = sqrt((p2[0]-p1[0])**2 + (p2[1]-p1[1])**2)
        self._cached_distances[nodes] = d
        return d

    def generate_population(self):
        """ Creates the initial populations. """
        pop = [self._parent[:1]+sample(
            self._parent[1:-1], len(self._parent)-2)+self._parent[-1:]
            for _ in range(self._population_size)]
        for p in pop:
            h = hash(tuple(p))
            self._cached_fitness[h] = self.fitness(p)
        return pop

    def plot(self, path):
        plt.axis([-1, self._side+1, -1, self._side+1])
        for i in range(0, len(path)-1):
            plt.plot([path[i][0], path[i+1][0]],
                     [path[i][1], path[i+1][1]],
                     color="brown", marker="o")
        plt.show()

    def correct_ans(self, nodes):
        """ Very bad bruteforce approach. """
        if len(nodes) > 11:
            print("Not a good idea.")
            raise Exception
        start = time()
        best = nodes
        for path in permutations(nodes[1:-1]):
            path = nodes[:1]+list(path)+nodes[-1:]
            if self.fitness(best) > self.fitness(path):
                best = path
        print("Correct ans should be: {}: fitness: {:.3f}, solutions: {}".format(
            str(best), self.fitness(best), len(list(permutations(nodes)))))
        print("Bruteforce approch: {:.3f}".format(time()-start))
        self.plot(best)

    def profile(self):
        pr = cProfile.Profile()
        pr.enable()
        self.algorithm()
        pr.disable()
        s = io.StringIO()
        sortby = 'cumulative'
        ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
        ps.print_stats()
        stats = s.getvalue().split("\n")
        stats = "\n".join([x for x in stats if "GridGA.py" in x])
        print(stats)


def main():
    nodes = [(13, 2), (1, 12), (12, 5), (19, 6), (2, 10),
             (15, 15), (5, 11), (17, 9), (10, 18), (17, 5),
             #(13, 12),
             #(1, 17),
             (2, 6), (7, 16), (19, 2), (3, 7),
             #(10, 9),
             (5, 19), (1, 2), (9, 2)
             ]
    nodes += nodes[:1]
    ga = NodesLeastDistanceGA(nodes, 20)
    ga.profile()


if __name__ == '__main__':
    main()

What optimizations and refactorings should I do?


2 Answers


Your code to cache the distances between points unnecessarily calls hash on all its inputs. This is how Python dicts already operate under the hood: they call hash on the keys and store them accordingly. It is also why dictionary keys have to be hashable in the first place.

So you could replace all your self._cached_distances[hash((p1, p2))] = value etc with self._cached_distances[(p1, p2)] = value, because tuples are hashable (while lists are not, because they are mutable).
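As a minimal sketch (reusing your existing _cached_distances dict, everything else unchanged), point_distance could then look like this:

    def point_distance(self, p1, p2):
        """ Calculates the distance between two points and caches it,
        using the (p1, p2) tuple itself as the dictionary key. """
        key = (p1, p2)
        if key not in self._cached_distances:
            self._cached_distances[key] = sqrt((p2[0]-p1[0])**2 + (p2[1]-p1[1])**2)
        return self._cached_distances[key]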

However, it would be even better to use a memoization decorator like this:

import functools


def memoize(func):
    cache = func.cache = {}

    @functools.wraps(func)
    def wrapper(self, *args):
        if args not in cache:
            cache[args] = func(self, *args)
        return cache[args]
    return wrapper


class NodesLeastDistanceGA:
    ...

    @memoize
    def point_distance(self, p1, p2):
        """ Calculates the distance between two points."""
        return sqrt(sum((p2_i - p1_i)**2 for p1_i, p2_i in zip(p1, p2)))

    @memoize
    def fitness(self, child):
        """ Returns the fitness of a individual by calculating the distance
        between each node and summing it up.
        In this case a low fitness is a good fitness.
        """
        return sum(self.point_distance(gen1, gen2)
                   for gen1, gen2 in zip(child, child[1:]))

This creates a separate cache for every function you decorate with it. It is slightly modified to allow decorating a class method. The functools.wraps makes sure that the name of the function and its docstring are copied to the wrapper function.

I also changed the distance calculation slightly: this is the general form, which can be used to calculate the distance between n-dimensional points.
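A quick check of how the decorated methods behave, assuming the full class from the question with these two methods swapped in (the coordinates here are made up for illustration):

    ga = NodesLeastDistanceGA([(0, 0), (3, 4), (0, 0)], 20)
    ga.point_distance((0, 0), (3, 4))        # 5.0, computed and stored
    ga.point_distance((0, 0), (3, 4))        # 5.0, served from the cache
    ga.point_distance((0, 0, 0), (1, 2, 2))  # 3.0, the zip form also handles 3-D points
    print(NodesLeastDistanceGA.point_distance.__name__)    # 'point_distance', thanks to functools.wraps
    print(len(NodesLeastDistanceGA.point_distance.cache))  # 2 distinct argument tuples cached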

Theoretically, point_distance could now be a staticmethod, since it does not depend on the class any more (it could even be a free function outside the class). But that would mean having to define two decorators, one with and one without the self argument. Alternatively, you could make your class itself hashable; then args == (hash(self), p1, p2) and everything works without the explicit self in the decorator.
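For illustration, the variant without the explicit self could look like this (memoize_plain is just a name I made up, not something from functools):

def memoize_plain(func):
    """ Memoize a free function or staticmethod; the cache key is the
    positional-argument tuple, so all arguments must be hashable. """
    cache = func.cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return wrapper


class NodesLeastDistanceGA:
    ...

    @staticmethod
    @memoize_plain
    def point_distance(p1, p2):
        """ Calculates the distance between two n-dimensional points. """
        return sqrt(sum((p2_i - p1_i)**2 for p1_i, p2_i in zip(p1, p2)))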

For the fitness function I also used the Python idiom of iterating over pairs of each element of a list and its successor.
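In isolation the pairwise idiom looks like this (with a throwaway example list of my own):

    path = [(0, 0), (3, 4), (3, 0)]
    list(zip(path, path[1:]))  # [((0, 0), (3, 4)), ((3, 4), (3, 0))]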

Caching the fitness function in this way means you have to make your child a tuple in the first place. There are two places where you would have to change your logic for this, as far as I can see: mutate and crossover. The first is easy to fix:

    @staticmethod
    def mutate(child):
        """ Swaps place of two gens. """
        i1, i2 = sample(range(1, len(child)-1), 2)
        _child = list(child)
        _child[i1], _child[i2] = _child[i2], _child[i1]
        return tuple(_child)

The latter is also easy: just return tuple(child).
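That is, only the last line of crossover changes; the body stays exactly as in the question:

    @staticmethod
    def crossover(father, mother):
        # ... body as in the question ...
        return tuple(child)  # was: return child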

  • Actually, at a certain point I'll run out of RAM even for very small numbers of nodes; the possible distances increase with n!. So I think I'm better off keeping separate caches for this particular implementation. If I can figure out a better method, I'll certainly use your memoization method. It rocks. Commented Oct 11, 2016 at 1:32
  • @Simon While I agree that at some point you will run out of memory, I don't understand the other part of your comment. This keeps two caches, one for each decorated function, just like your code? By the way, if memory is a problem, have a look at functools.lru_cache, which will only store a specified number of elements (see the sketch after these comments). Commented Oct 11, 2016 at 6:48
  • Combinatorics: it's asking in how many ways we can arrange n different things, the answer being n!. For 0 nodes there is 1 solution (None), for 1 node 1 solution, for 2 nodes 1*2 = 2 solutions, for 3 nodes 1*2*3 = 6 solutions, for 4 nodes 24 solutions, for 5 nodes 120 solutions, ... and for 20 nodes 1*2*3*...*18*19*20 = 2432902008176640000 solutions. Not important. Very good suggestion. Commented Oct 11, 2016 at 8:11
  • Yes, I'm aware of that. Unfortunately I don't know an answer to that problem. Actually nobody does, as you seem to be aware. But if I did, I would be rich and not posting it as a review on Code Review :-) Commented Oct 11, 2016 at 8:15
  • Haha! Yes, it seems weird that you didn't, seeing your answers (y). Commented Oct 11, 2016 at 8:40
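As mentioned in the comments, functools.lru_cache is a bounded alternative to the hand-rolled cache. A sketch, with an arbitrarily chosen maxsize and the zip-based distance from the answer:

from functools import lru_cache
from math import sqrt


class NodesLeastDistanceGA:
    ...

    @staticmethod
    @lru_cache(maxsize=4096)  # keeps at most 4096 distances; least recently used entries are evicted
    def point_distance(p1, p2):
        """ Calculates the distance between two points. """
        return sqrt(sum((p2_i - p1_i)**2 for p1_i, p2_i in zip(p1, p2)))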

This is a very superficial review, but you have your genetic algorithm code mixed in with the problem you're applying it to. In a general sense, this should be avoided whenever possible. Having only loosely related code sit immediately beside each other is just asking for something bad to happen during a future change.

Whenever I start learning a new language, I usually create a GA implementation for practice, and in case I ever actually need it.

It's a fairly easy concept to abstract out:

  • The class contains information about the set of possible genes (if a closed set), the max population size, maybe percentage chances of gene crossovers and other chance events (provided you want this to be constant across generations).

  • I keep mine simple and only expose a handful of methods. The main method is just a function that automates the processing of an entire generation. It takes the population, runs each genetic sequence through a fitness function (that the caller provides), then chops and repopulates.

By separating the GA code from the use code, you can safely make changes to either without risking breaking some almost unrelated, but coupled code. You also then have the benefit of using your independent GA implementation in any other projects you may need it for without needing to copy and paste select bits from your TSP code.
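To make that concrete, here is a rough sketch of what such a problem-agnostic GA class could look like; the names, signature and selection scheme are my own illustration, not a prescribed design:

import random


class GeneticAlgorithm:
    """ Problem-agnostic GA core: it knows nothing about nodes, distances
    or power poles. The caller supplies the fitness function and the
    genome-specific crossover/mutate operators. """

    def __init__(self, fitness, crossover, mutate,
                 population_size=60, mutate_rate=0.07):
        self._fitness = fitness          # lower is better, as in the question
        self._crossover = crossover
        self._mutate = mutate
        self._population_size = population_size
        self._mutate_rate = mutate_rate

    def step(self, population):
        """ Process one generation: breed children, occasionally mutate,
        then keep the fittest individuals. """
        children = []
        while len(children) < 2 * self._population_size:
            mother, father = random.sample(population, 2)
            child = self._crossover(mother, father)
            if random.random() < self._mutate_rate:
                child = self._mutate(child)
            children.append(child)
        survivors = sorted(population + children, key=self._fitness)
        return survivors[:self._population_size]

The TSP-specific parts (tuple paths, the distance-based fitness, the swap mutation) then live in the calling code and can be changed, tested or reused independently of the GA machinery.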

