2

Been trying to build a method that gets all conceivable unique paths through a DAG. Went with recursion because it seemed like the easiest to understand. Ended up with this:

public class Brutus { //the previous nodes visited public ArrayList<Node> resultHistory = new ArrayList<Node>(); //Directed Graph class, contains a HashMap [adjacencies] // that has keys for all nodes that contains all edges public AdjacencyList paths; //A list of all the pathways between nodes represented as a list of nodes public ArrayList<ArrayList<Node>> allPaths = new ArrayList<ArrayList<Node>>(); public Brutus(AdjacencyList paths) { this.paths = paths; } public ArrayList<ArrayList<Node>> findAll() { int counter = 1; for (Node key : paths.adjacencies.keySet()) { System.out.println("[" + counter + "]: " + key.toString()); StringTokenizer st = new StringTokenizer( paths.getAdjacentString(key), ","); while (st.hasMoreTokens()) { String child = st.nextToken(); if (paths.getNodeFromGraph(child) != null) { resultHistory = new ArrayList<Node>(); resultHistory.add(key); findPath(child, resultHistory); } } counter++; } return allPaths; } public void findPath(String child, ArrayList<Node> resultHistory) { if (resultHistory.contains(paths.getNodeFromGraph(child))) { return; } resultHistory.add(paths.getNodeFromGraph(child)); if(!(inList(resultHistory, allPaths))) { allPaths.add(resultHistory); } StringTokenizer st = new StringTokenizer( paths.getAdjacentString(paths.getNodeFromGraph(child)), ","); while (st.hasMoreTokens()) { child = st.nextToken(); if (paths.getNodeFromGraph(child) != null) { findPath(child, resultHistory); } } } public boolean inList(ArrayList<Node> resultHistory, ArrayList<ArrayList<Node>> allPaths) { for (int i = 0; i < allPaths.size();i++) { if (allPaths.get(i).equals(resultHistory)) { return true; } } return false; } 

Problem is, I don't think it works for all paths, since I can't find certain paths inside it. Although as the dataset is 900 nodes, I am unable to find a pattern! Other questions on Stack seem to be somewhat more specialized and as such I attempted to build my own algorithm!

Can anyone either suggest a superior way to perform this, or tell me what I've done wrong? If the algorithms correct, what would be the best way to withdraw all the paths between two nodes?

EDIT: I now realize that new paths don't get created from child nodes of the original, how would I make it so it does?

6
  • why? usually you dont need that. Commented Feb 15, 2013 at 0:42
  • The DAG emulates a metabolic pathway of a bacterium. So ALL paths could be of interest to the biologists, and the ability to find strange pathways they hadn't seen because they're 200 chemicals long is very important. Commented Feb 15, 2013 at 0:46
  • 1
    Why don't you just use a modified BFS, algorithm, where (i) you have a list of PATHS to be visited, instead of a vertex, and (ii) you don't check if a vertex is already visited during the search? Commented Feb 15, 2013 at 0:48
  • Would that be any more efficient? And if so, could you perhaps elaborate as an answer either in pseudo or java code? That way, if it works, I can upvote you and give you the answer mark! :) Edit: I ask about efficiency, because atm there is a 5-6 minute load time to find about 3800 pathways. Commented Feb 15, 2013 at 0:52
  • 1
    It is optimal, because its running time is equal to the size of the answer. I'll give you a pseudocode implementation. Commented Feb 15, 2013 at 13:55

3 Answers 3

3

Here there is an implementation based on the BFS algorithm. I will denote a path as a sequence of vertices l = (v, v', v'', ...) and I will perform the following two operations on it:

  • extend(l, v): puts vertex v at the end of list l;
  • v = back(l): gets the last vertex in list l.

FindPaths(G, v) { // The first path is, simply, the starting node. // It should be the first vertex in topological order. pending_paths = {(v)}; while (pending_paths is not empty) { l = pending_paths.remove_first(); // pop the first pending path output(l); // output it (or save it in a list to be returned, if you prefer) v = back(l); // Get the last vertex of the path foreach(edge (v, v') in G) { // For each edge outgoing from v'... // extend l with v' and put into the list of paths to be examined. pending_paths.push_back(extend(l, v')); } } } 
Sign up to request clarification or add additional context in comments.

2 Comments

Thanks, this put me on the right track, posted my own version below incase it helps anyone in the future! Cheers!
I'm glad this helped you with your job :)
3

Here's a simple recursive algorithm, expressed in pseudocode to avoid clouding the issue with lots of Java list manipulation:

AllPaths(currentNode): result = EmptyList() foreach child in children(node): subpaths = AllPaths(child) foreach subpath in subpaths: Append(result, currentNode + subpath) return result 

Calling AllPaths on the root node will give you what you need, and you can improve the running time for nontrivial DAGs by caching the result of AllPaths on each node, so you only need to compute it once rather than once per distinct path that includes it.

6 Comments

Tried to use the algorithm, but getting stackoverflow on the recursion, code here. Not sure if I've done it wrong, or my number of paths is too high.
The maximum recursion depth should be equal to the longest path in your graph. Do you know about what the longest path is? If it's not terribly long, then you've probably got a bug where you're recurring in a case when you shouldn't. If it's long enough, you may need to convert the recursive algorithm to an explicit stack-based algorithm instead (or rewrite in a language with better recursion support, such as Haskell, ML, or Scheme). If you need help holler.
No I don't know the length, but it could easily be incredibly long. It's why I'm building the program - to find things such as that! Could you take a quick look at the code I posted above to make sure my logic is correct? If it is I may have to switch to Haskell!
But it terms of number of paths, netBeans got to 500,000 and crashed using the algorithm I posted (modified to check sub pathways of paths and add them to the list) so it could well be a depth issue.
The only thing I notice is that on line 17 you probably want "continue" instead of "return" -- but that that whole check probably isn't necessary, since this algorithm will traverse each path exactly once. (Also, is inList necessary? Won't contains work?) I would guess you're hitting a real recursion depth problem.
|
2

So while @akappa's Pseudo was a good start, it took me awhile to understand how to make it work, if anyone else comes across this post here's how I did it:

public ArrayList<ArrayList<Node>> searchAll() { try { BufferedWriter out = new BufferedWriter(new FileWriter("output.txt")); //Gets Nodes from Hashmap and puts them into Queue for (Node key : paths.adjacencies.keySet()) { queue.addToQueue(new QueueItem(key.chemName, new ArrayList<Node>())); } while (queue.getSize() > 0) { QueueItem queueItem = queue.getFromQueue(); Node node = paths.getNodeFromGraph(queueItem.getNodeId()); if (node != null) { findRelationAll(node, queueItem, out); } } System.out.println("Cycle complete: Number of Edges: [" + resultHistoryAll.size() + "]"); out.close(); } catch (IOException e) { } return resultHistoryAll; } public void findRelationAll(Node node, QueueItem queueItem, BufferedWriter out) { if (!foundRelation) { StringTokenizer st = new StringTokenizer(paths.getAdjacentString(node), ","); while (st.hasMoreTokens()) { String child = st.nextToken(); ArrayList<Node> history = new ArrayList<Node>(); //Gets previous Nodes history.addAll(queueItem.getHistoryPath()); //Makes sure path is unique if (history.contains(node)) { System.out.println("[" + counter2 + "]: Cyclic"); counter2++; continue; } history.add(node); resultHistory = history; queue.addToQueue(new QueueItem(child, history)); if (!(inList(resultHistory, resultHistoryAll))) { resultHistoryAll.add(resultHistory); try { out.write("[" + counter + "]: " + resultHistory.toString()); out.newLine(); out.newLine(); } catch (IOException e) { } System.out.println("[" + counter + "]: " + resultHistory.toString()); counter++; } else { System.out.println("[" + counter3 + "]: InList"); counter3++; } } } } //This checks path isn't in List already public boolean inList(ArrayList<Node> resultHistory, ArrayList<ArrayList<Node>> allPaths) { for (int i = 0; i < allPaths.size(); i++) { if (allPaths.get(i).equals(resultHistory)) { return true; } } return false; } 

}

The above code does a few extra things that you might not want:

  • It writes pathways to a text file as a list of nodes + it's counter value.
  • Makes sure the path doesn't cross the same node twice
  • Makes sure no two pathways are the same in the final list (in normal circumstances this is unnecessary work)

The QueueItem object is just a way to store the previously visited nodes. It's part of nemanja's code, which is what my code was based off.

Hat tip to him, akappa (for the most efficient answer), and jacobm (for finding a solution like my original code, and explaining it's limitations).

Incase anyone's actually interested in the work; I'm currently processing over 5 million pathways, of which 60,000 are unique pathways between 900 chemicals. And that's just 1,2,3 or 4 chemical pathways... Biology is complicated.

EDIT and Warning: IF anyone is handling huge reams of data like me, windows 7 - or at least my machine - throws a shit fit and closes the program after an ArrayList > 63,000 objects, regardless of how you arrange the pointers. The solution I started with was after 60,000 objects, restarting the list while adding everything to CSV. This led to some duplicates between list iteration, and should ultimately be surpassed by my moving to linux tomorrow!

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.