Given two (small-ish) sequences $I$ and $O$ of rational numbers with $\sum_{x \in I}x=\sum_{y \in O}y$, generate a directed graph $G = (V,E)$ with minimum $|V|$ that respects the following:
- $|I|$ vertices have degrees $(0,1)$. These are sources.
- $|O|$ vertices have degrees $(1,0)$. These are sinks.
- All other vertices must have one of the following degrees: $S_2=(1,2)$, $S_3=(1,3)$, $M_2=(2,1)$ or $M_3=(3,1)$.
- The out-flow $O$ is produced at the sinks when in-flow $I$ is applied to the sources and the flow at $S_2$ and $S_3$ vertices is split uniformly.
- (bonus) The maximum flow over any edge is $\leq$ some given capacity limit $c$, which is $\geq max\{I\cup O\}$.
For example, here is an optimal solution for $I=(5,)$, $O=(1,1,1,1,1)$, $c\geq 6$: 
The motivation for this for the curious is creating balancer networks in a factory game (1).
I already have designed and implemented a working algorithm that does this! However, despite several optimizations I could come up with on my own, due to combinatorial blow-up it's still a bit too inefficient to solve instances quite of interesting size in a timely manner. In the following I will outline how I've been doing it so far:
The top-level idea is to generate the graphs adhering to the degree criteria in ascending order of size until one is found that also satisfies the flow.
- Initialize total internal node count $n := \lceil \frac{|I-O|}{2}\rceil$
- Iterate over node count solutions $\vec{x}=(S_2,S_3,M_2,M_3)$ of the linear diophantine equation system comprised of $\vec{1} * \vec{x} = n$ and $(1,2,-1,-2) * \vec{x} = |O|-|I|$.
- For each $\vec{x}$, construct an initial graph with the given node counts and an universal vertex such that all vertices share all their edges (incoming and outgoing with the adequate number for their type) only with this universal vertex. Degree conditions hold for all original vertices.
- Generate the graphs for this specific $\vec{x}$ by iteratively choosing one incoming and one outgoing edge of the universal vertex, removing them, and instead connecting the adjacent vertices directly. If the universal vertex is disconnected, remove it from the graph and jump to 6.
- In each step, check that the adjacent vertices were distinct, that every source is connected to at least one sink, that every sink is connected to at least one source, and remove all isomorphic duplicates. Back to 4.
- For each generated graph, compute the flow for the given in-flow using the weighted adjacency matrix of the graph (each row normalized to sum to 1) and check if it matches the desired out-flow at the sinks. And optionally the bottleneck criterion at the $M$ nodes. Terminate if a valid solution is found.
- Deplete the graph generation for $\vec{x}$.
- Deplete all $\vec{x}$ for the current $n$.
- Increment $n := n+1$. Back to step 2.
Some remarks and approaches, in no particular order:
- Obviously a critical thing here is that during the construction process, by going BFS over the number of performed edge substitutions, all partially constructed graphs with the same number of edge substitutions $k$ and $k+1$ done have to be stored simultaneously in memory. But I see no good way of avoiding this since it's essential to recognize isomorphic duplicates. The rejected ones and the ones from the second-to-last layer are garbage collected.
- To efficiently recognize isomorphs, I'm first doing a Weisfeiler-Lehman graph hash and store them in a map from hash to list of all graphs with the same hash. When a new graph is to be added to such a list, there will be pairwise direct isomorphism checks only with each other graph in that list (i.e. with the same hash), not others. A pair is checked by first checking for matching degree sequence (question: is this unnecessary and equal degree sequence already implied by equal Weisfeiler-Lehman hash? wait, it's already unnecessary due to the construction from same node counts of each degree profile and same number of construction steps, isn't it?), then only if matching is a potentially expensive, definitive isomorphism test performed using the vf2 algorithm.
- What I see from logging the rejections separated by the check that triggered it is that in every step, roughly 3/4ths of the candidates are removed due to isomorphism.The ratio varies by size and node count profile but I don't have further insight here. Since that reduces basically the branching factor of this search by 3/4ths I'd say the isomorphism reduction is absolutely essential.
- Of course, that's still a lot of time spent generating graphs that are pruned due to isomorphism anyways. Is clean, isomorph-free generation possible for my specific purpose, a la McKay 1999? Would the effort be worth it?
- Note we are generating all possible graphs with the given node counts and degree types, when we only want the select few that could potentially end up with the correct flow in the end. I feel like this is the biggest block of improvement potential here. If it was somehow possible to reject candidates earlier on during the construction process on the basis of recognizing that, no matter how the not-yet-replaced connections to the universal vertex are re-wired, it cannot possibly result in the desired flow, that could result in huge savings. Need linear algebra buffs to chime in on this topic and the following points.
- It feels like what the graph is doing to get from the in-flow to the out-flow is one matrix multiplication, but I struggle to conceptualize how exactly this matrix relates to the weighted adjacency matrix I have and use to solve the system. Perhaps exploring this in more detail could give clues. We might use some variables in the adjacency matrix to denote still-reconnectable edges and perhaps that spans some kind of subvector space we could check for overlap with the demanded mapping somehow, but that's just very vague ideas and I don't know how to actually do this. Eigenvalue spectrum of the matrix maybe? I'm grasping at straws.
So, that's all I have for now. If you have any ideas for improvements, opportunity for pruning, or a different approach that would be more efficient, please chime in!