2

In my application, I perform a topological sort of nodes where an integer represents a node's ID using DFS and white/gray/black coloring.

See https://yuminlee2.medium.com/topological-sort-cf9f8e43af6a and the section titled "Method 2" for the technique.

As you can see in the example cited above (and typically for all samples of depth-first search I can find) the topological sort occurs with integers and results in an ordered list of integers representing nodes in the graph.

This worked great until I needed to represent a non-node as part of the graph. In my case, I needed the concept of a "system" of nodes.

This is a small tangent to explain the concept of "system". Consider a motor vehicle comprised of "parts." Each part is a node in the graph. For example, a "bolt' is a part. a "rubber hose" is a part.

Now consider an "engine". The engine itself is not a "part". Rather it is a "system" of parts that are related so intimately that we can call them an "engine". The "engine" is dependent on multiple parts (camshaft, block, pistons, sparkplugs, etc). It is an abstract concept since it has no exact "part" itself but nevertheless it has a definable set of parts that it depends on.

Because it has a directed, acyclic relationship to those specific parts that belong in the engine "system", the abstract concept of engine can be sorted.

BUT we arrive at the problem, as we now need an integer to represent not a physical part but an abstract system.

And in reality, we have multiple abstract overlapping systems; for example, an "engine" belongs to a "drivetrain" system; a "master cylinder" belongs to a "hydraulic" system and a "electrical" system; An "alternator" belongs to both an "engine" system and an "electrical" system.

These "systems" are all comprised of valid acyclic relationships that overlap in may ways but are nevertheless acyclic (ignore cyclic conditions for now)

As explained in the above vehicle example, and using the logic of graphs, we can always determine a sort order for both the parts and the systems of parts even if the systems overlap.

Previously in my app design, integers represented nodes/objects (i.e. parts) and always mapped directly to the physical object (the allocated class instance); now the graph sort is a mix of objects and systems.

Speaking in terms of software engineering (using C# language), in the old design, we could allocate Int32.MaxValue numbers of parts/nodes to store in a List<Part> where their integer ID represents an index in the list and Part is an abstract class instance/allocation.

But now, in order to perform a DFS topological sort, we need to assign integers to objects and arbitrary systems to perform the sort. Intuitively, if the integer id used for the sort was previously also an index into an array of allocations, this does not correspond cleanly with the introduction of systems.

Assuming the allocated nodes might occupy the range of integers from zero to N, I can only see two approaches to assigning integer id's to the non-nodes:

  1. The app logic needs to reserve an integer range to limit the number of nodes, let's call it Nodes.MaxCount. By assigning an arbitrary upper bound, the range of Int32.MaxValue - Nodes.MaxCount becomes the available range of integers for non-nodes (i.e. systems). Then a system can get an id from the reserved range.
  2. If I could perform a topological sort on a two-dimensional ID like (int domain, int id) represented here as a tuple, it would enable a sort on Int32.MaxValue number of systems with Int32.MaxValue numbers of elements in each system. Then a node in the graph could be of type (Node,id) or (System,id) and the integers will be able to handle overlaps (i.e. we can have both a node with id '1' and also a System with id '1').

The second clearly has more expressiveness and available range, but I don't know how to refactor a typical white/black topological sort to use this two-dimensional id (or if it is even possible).

EDIT:

Let's consider a hash map using the motor vehicle example. Some pseudocode:

enum SystemType { Default, Engine, Drivetrain, Hydraulic, Electric } Dictionary<(SystemType,guid),List<???>> allparts = new(); // is a hash map in C# Bolt b = new Bolt(); b.ID = Guid.NewGuid(); allparts.Add((SystemType.Default,b.ID),b.GetDependentParts()); allparts.Add((SystemType.Engine,b.ID),b.GetDependentEngineSpecificParts()); allparts.Add((SystemType.Drivetrain,b.ID),b.GetDependentDriveTrainSpecificParts()); 

This is some minimal pseudocode that attempts to store system-specific dependencies in a hash map with key type of (SystemType,guid). This tuple is easy to hash into a hashcode integer. Therefore, the "drivetrain" system can perform a lookup in the hash map to get a set of relevant parts to be used in sorting. The problem is that this logic is awful on space complexity and appears to be storing a logarithmic number of hash map entries for the elements. O(N_elements*N_systems). Also, The List<???> represents an array of dependent "somethings" since we are trying to solve this without integers ??? must be a type of something-else other than integers.

I'm not opposed to the hash map route, but this code is not optimal.

10
  • 2
    I don't know what "white/black topological sort" means, but typical topological sorting does not care about ids. It only cares about graph relation, more specifically about "get me children" function. Commented Oct 26, 2024 at 19:48
  • 2
    @NWoodsman ok. But still, those algorithms don't care about ids. Only about graph relation, as I said earlier. Perhaps you modeled the graph itself incorrectly. One way is to simply make a hashmap of the form (node, list of nodes) which represents "parent -> children" relationship. Commented Oct 26, 2024 at 20:29
  • 2
    "But now, in order to perform a DFS topological sort, we need to assign integers to objects and arbitrary systems to perform the sort." — I challenge that you need to assign integers in order to perform a sort. For example, you could assign a string as ID instead, where that string has multiple components, as in N.M.X.Y as many elements as the depth of the item in the tree. Also as others are saying, a topo sort only requires maintaining ordering of parent to child, which is also not an integer. Commented Oct 26, 2024 at 20:36
  • 1
    @freakish I added a pseduo-example of a hash map implementation. See edit. Commented Oct 26, 2024 at 21:38
  • 2
    @NWoodsman "The problem is that this logic is awful on space complexity and appears to be storing a logarithmic number of hash map entries for the elements." What is your concern here? Are you dealing with a graph of size of hundreds of millions of nodes for that to matter? If not, then don't bother, it doesn't matter. I mean, you can pack it, but who cares? Premature optimization is the root of all evil. Commented Oct 26, 2024 at 21:57

1 Answer 1

2

Also, The List represents an array of dependent "somethings" since we are trying to solve this without integers ??? must be a type of something-else other than integers.

Let us give that "somethings" a class name, for example NodeSystem. A node system should know it's system type SystemType and it's parts, which are either other node system or terminal nodes. Hence, the dictionary only needs a guid as key, it would be pointless to use the SystemType which is already included in the value:

Dictionary<guid,NodeSystem> allparts = new(); 

The NodeSystem, however, can be modeled by a well-known design pattern from the famous GoF book: the composite pattern. This seems to be a good fit here.

The knowledge of the SystemType can be implemented either by adding a type attribute to NodeSystem, or by introducing different subclasses - I don't know what fits better to your case, that's something you have to decide for yourself.

For example, the resulting class structure might look like this:

enter image description here

If I got your post right, in your case the same Node can occur in different NodeSystems. This structure does not forbid it.

There are diffent implementations of the aggregate reference NodeSystem->Component in C# possible. The most straightforward is List<Component>.

The introduction of GUIDs by yourself seems to have already removed the necessity of sorted integer ID generation. Hence sorting the elements does not seem to be necessary, and I am not sure if there is any sorting problem still to be solved here (that is something you might clarify). Still, if you want to iterate over all components in an order which fits to the induced DAG of this structure, Kahn's algorithm (from the link you provided) will work well with Guids as keys for the Nodes.

If you think 128 bit GUIDs need to much memory, you could still use a 32 bit integer instead, generated from some counter during node or node system construction, just without any relationship between the ID order and the topological order of the nodes.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.