Your code looks generally correct, but ignores src and only searches in positive direction. In addition, it can be cleaned up and optimised significantly.
Some general comments first:
- Use full variable names in code that express meaning/purpose. There is no significant cost to using meaningful names, but they can make code much easier to digest.
- Be aware of the host language's features and standards. Avoid re-using the names of builtins (e.g.
min) and try to adhere to coding style standards. - Avoid
numpy unless actually using its inbuilt features. Using numpy.array for direct access is usually slower than list/set/... because values are converted to full Python objects on each access.
Do not make assumptions about the features of your data. In specific, avoid these:
MAX_DISTANCE = 99999 RANGE_ARR = [x for x in range(1, 1001)]
These fail for graphs with distance > 99999 or more than 1000 elements. Either compute them for your input, or use true upper bounds.
Since numbers have a well-defined "maximum", we can use this safely:
INFINITY = float('int')
Since the input graph is an nxn matrix, we can just query its size.
# inside `def dijkstra(graph, source):` indices = range(len(graph))
Let us start with vertex in Q with min dist[u]/dijkstra_get_min. Your algorithm is proper, but we can exploit that Python's builtin min already allows custom weights. The for vertex in Q: becomes the primary argument to min, the if dist[vertex[0], vertex[1]] <= min: becomes the weight key.
def dijkstra_get_min(vertices, distances): return min(vertices, key=lambda vertex: distance[vertex[0]][vertex[1]])
The Dijkstra algorithm consists of two parts – initialisation and search. Your code becomes clearer if we split these two parts – your line dist[0][0] = 0 is the transition from one to the other.
def dijkstra(graph, src=(0, 0)): # dist, prev, Q distances, prev_nodes, unvisited = dijkstra_initial(len(graph)) # set starting point distances[src[0]][src[1]] = 0 dijkstra_search(graph, distances, prev_nodes, unvisited) return distances, prev_nodes
The purpose of initialisation is that every point has the same value. This means we can directly create the matrices with their final value. Also, since the algorithm does not use the "previous node", we can initialise it to a cheap placeholder.
def dijkstra_initial(size): distances = [[INFINITY] * size for _ in range(size)] prev_nodes = [[None] * size for _ in range(size)] unvisited = {(x, y) for x in range(size) for y in range(size)} # dist, prev, Q return distances, prev_nodes, unvisited
Instead of tracking visited nodes as a list ([..., ...]) we use a set ({..., ...}). A set is unordered and supports O(1) membership tests, compared to list O(n) membership tests. This makes it better suited for bookkeeping of visited/unvisited nodes.
To search through the graph, we will be visiting the neighbours repeatedly. This is a key part that can be easily done wrong – unless the Graph implementation provides it, it can be worthwhile to implement explicitly.
def neighbours(node): x, y = node return [ (x + x_offset, y + y_offset) for x_offset in (-1, 0, 1) for y_offset in (-1, 0, 1) if not (x_offset == y_offset == 0) # reject node itself ]
The core of the algorithm stays logically the same: We adjust some names to be more speaking (e.g. u -> node, v -> neighbour). We use the prepared neighbours instead of the lengthy expression.
def dijkstra_search(graph, distances, prev_nodes, unvisited): while unvisited: node = dijkstra_get_min(unvisited, dist) unvisited.remove(node) for neighbour in neighbours(node): if neighbour not in unvisited: continue alt = distances[node[0]][node[1]] + graph[neighbour[0]][neighbour[1]] if alt < distances[neighbour[0]][neighbour[1]]: distances[neighbour[0]][neighbour[1]] = alt prev_nodes[neighbour[0]][neighbour[1]] = node
At this point, the code should be both faster and easier to maintain. The most glaring flaw we still have is the explicit handling of dimensions. Instead of manually accessing each dimension, it would be better if we could directly access points.
# currently distances[neighbour[0]][neighbour[1]] # desirable distances[neighbour]
This can be "fixed" by using dictionaries ({point: value, ...}) instead of nested lists ([[value, ...], ...]). An immediate downside is that this trades memory for simplicity.
However, it can be used to actually reduce memory usage – dictionaries can be naturally sparse, allowing us to simply not store undetermined fields. Since any visited node becomes irrelevant for distances, we can even clear distances of nodes that are already processed.
RANGE_ARR_0? \$\endgroup\$