
I'm working on a problem that deals with product distribution across a set of machines in a factory. Here's what I'm given (the variables):

  • The total amount of products produced daily, e.g. 19200
  • The products produced and their distribution, e.g.:
    • Product A - 35% (6720 pieces)
    • Product B - 15% (2880 pieces)
    • Product C - 20% (3840 pieces)
    • Product D - 30% (5760 pieces)
  • The machine setup, which includes the number of machines, which products each machine can process, and its cycle time, e.g.:
    • Machine 1 - Can process products A and C - 36 seconds cycle time
    • Machine 2 - Can process products B and D - 36 seconds cycle time
    • Machine 3 - Can process products B and C - 36 seconds cycle time
    • Machine 4 - Can process products A and D - 36 seconds cycle time
    • Machine 5 - Can process products A and B - 18 seconds cycle time
    • Machine 6 - Can process products C and D - 18 seconds cycle time

Based on the data above, I have prepared an Excel sheet that calculates each machine's capacity (based on its cycle time), as well as the expected product flow through each machine, using a system of weights that specify the proportion of each product flowing through that machine. The end result is the "machine utilization": the percentage of the day that the machine spends processing products instead of sitting idle:

Product distribution with unadjusted weights

With that in place, I can keep adjusting the weights manually until the flow between all machines is balanced. This is how it looks after solving:

Product distribution after adjusting the weights

The goal of all of this is to:

  • Ensure there is at least one machine available to process each product.
  • Find out if a solution where the utilization of each machine does not exceed 100% is even possible to achieve in the first place.
  • Find out each machine's utilization, assuming all of the available flow is distributed equally across the available machines, to the extent allowed by the specified constraints.
  • Find out the highest machine utilization (trivial given the above).
  • Test possible scenarios, e.g. a machine becomes unavailable. Its flow has to be redistributed across the remaining machines and still meet all of the above goals.

The notes/caveats to account for:

  • The usual sizes I have to deal with are ~10 products and 10-20 machines, where each machine ends up processing 1-3 product types.
  • I'm pretty sure there's always an infinite number of solutions to this problem, but fortunately I don't care about the exact solution, as long as it meets the goals stated above.
  • The machine-product assignments sometimes end up forming separate groups, where e.g. products A and B can only be processed by one set of machines, and products C and D only by the remaining machines. This means the utilization cannot be balanced across all machines in the system, but it can be balanced between the machines within a group. Knowing which machines end up forming those groups is valuable too.

My current solution of adjusting the weights manually is really tedious and time-consuming, especially for bigger systems. As far as I can tell, though, this is just a set of linear equations that a computer should be able to solve in milliseconds.

I want to turn this problem into a Python script where I'd specify the input data and constraints, and it would figure out the weights for me. The problem is, I've never done any data analysis before, so I don't really know how to do it. I did some research and found Google OR-Tools, which might allow me to do this; however, I cannot for the life of me figure out how to turn this problem into a set of linear equations. Nothing I try seems to be working.

I figured asking here would be my best bet for getting some help. Can someone explain how to turn this into a set of linear equations? Is this even a set of linear equations in the first place, or something else? I'm a bit lost on how to approach this.

The Excel file from the pictures, in case it's needed: Products.xlsx

EDIT: An example of a machine setup with two non-overlapping flow groups, where products A and C form one group with 55% of the total flow, and products B and D form the remaining 45% (these are the actual solution weights; the corresponding compatibility matrix appears as "Non-overlapping, ex1" in the answers below).

Solution utilization for 1/10th of the input product (1920) is:
[14.667, 14.667, 14.667, 8.0, 8.0, 6.0]

Another example, with two 50-50 groups of products A-B and C-D (also showing the solution weights; this corresponds to "Non-overlapping, ex2" in the answers below).

Solution utilization for 1/10th of the input product (1920) is:
[14.0, 14.0, 12.0, 8.0, 8.0, 8.0]


3 Answers

I set the problem up below for solving using scipy.optimize.linprog. I get the required result for the overlapping case, but only an approximate solution for the non-overlapping case.

  • Test possible scenarios, e.g. a machine becomes unavailable. Its flow has to be redistributed across the remaining machines and still meet all of the above goals.

In the walk-through below I give all machines the same capacity. You could easily zero one out to mimic a scenario where a machine was unavailable. Similarly, you could set it to a fraction to mimic partial capacity, or make it a function of other factors.

For a more statistical exploration of scenarios, you could make machine capacity probabilistic and report the statistics from multiple trials.
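For example, here is a minimal sketch of those scenarios (standalone, but reusing the same T_workday and machine_capacities names that appear in the walk-through below):

import numpy as np

# Scenario sketches - illustrative only
T_workday = 24 * 60**2
machine_capacities = np.full(6, T_workday, dtype=float)

machine_capacities[2] = 0                # Machine 3 becomes unavailable
machine_capacities[4] = 0.5 * T_workday  # Machine 5 runs at half capacity

# Probabilistic availability: each machine is up with probability 0.9 in a given trial
rng = np.random.default_rng(0)
machine_capacities = T_workday * rng.binomial(1, 0.9, size=6)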


Solution walk-through

Supplied data

I use the overlapping case from your examples.

The supplied data comprises:

  • Products and machines
products = ['Pa', 'Pb', 'Pc', 'Pd']
machines = ['M1', 'M2', 'M3', 'M4', 'M5', 'M6']
num_products = len(products)
num_machines = len(machines)
  • The quantity of each product - one third of the full load in this case.
#                          Pa    Pb    Pc    Pd
product_totals = np.array([6720, 2880, 3840, 5760]) // 3
  • A matrix of machine-product compatibilities
machine_product_compatibility = np.array([
    #Pa Pb Pc Pd
    [1, 0, 1, 0],  # M1: Pa, Pc
    [0, 1, 0, 1],  # M2: Pb, Pd
    [0, 1, 1, 0],  # M3: Pb, Pc
    [1, 0, 0, 1],  # M4: Pa, Pd
    [1, 1, 0, 0],  # M5: Pa, Pb
    [0, 0, 1, 1],  # M6: Pc, Pd
])
  • The cycle time of each machine
#                       M1  M2  M3  M4  M5  M6
cycle_times = np.array([36, 36, 36, 36, 18, 18])
  • The capacity of each machine (how much time/seconds it is available for)
# Seconds available per machine
T_workday = 24 * 60**2
machine_capacities = np.full(num_machines, T_workday)
  • The utilisation (± tolerance) we want each machine to be at. It is uniform here, and defined with reference to a known 'full load' value.
# For enforcing uniform machine utilisation
full_utilisation_reference = np.array([6720, 2880, 3840, 5760]).sum()  # all machines were 100% here
uniform_ideal = product_totals.sum() / full_utilisation_reference
uniformity_tolerance = 0/100  # Accept 0% deviation either side of the ideal
uniformity_lb = max(0., uniform_ideal - uniformity_tolerance)
uniformity_ub = min(1., uniform_ideal + uniformity_tolerance)

Framing the problem

Each product will be allotted to one or more machines. Since each product can, in theory, be spread over six machines, we can say that the task is to estimate six weights per product. So we want to estimate num_machines x num_products = 24 weights in total.

# We want to estimate a weight per product, per machine
num_weights = num_machines * num_products

These product weights (percentages) say how much of a product is given to each machine.
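Purely as an illustration of the layout (not part of the solution), linprog works on a flat vector of decision variables, which the walk-through reshapes back into a (num_machines, num_products) matrix:

# Illustration: the flat vector of 24 decision variables handed to linprog
# maps onto a (num_machines, num_products) matrix of weights.
x = np.zeros(num_machines * num_products)
weights = x.reshape(num_machines, num_products)
# weights[m, p] = fraction of product p's volume assigned to machine m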

Defining constraints

The weights need to encode these constraints:

1. The weights are like percentages, so they must sum to 100% for each product

I set this up using an equality constraint (A_eq, b_eq).

# === Equality constraints ===
# For each product p, product_weights must sum to 100%
#   A_eq @ product_weights = b_eq
A_eq = np.zeros((num_products, num_weights))
b_eq = np.ones(num_products)

for p in range(num_products):
    # Compatibility matrix, limited to product p
    compatibilities_p = np.zeros_like(machine_product_compatibility)
    compatibilities_p[:, p] = machine_product_compatibility[:, p]

    # Sum of weights for product p (A_eq @ x) = 1 (b_eq)
    A_eq[p, :] = compatibilities_p.ravel()

The constraint works as follows. We use matrix notation to define a constraint per product, and put that into a single matrix. Each row is for a different product.

$$A_{eq} \cdot weights=b_{eq}$$

In this case, $A_{eq}$ is shaped (num_products, num_weights), and $b_{eq}$ is shaped (num_products,) since it must match the number of rows.

Concretely, for a given product $p$, the corresponding row from $A_{eq} \cdot weights$ will select the weights relevant to that product and sum them together: $\Sigma weights_{p}$. The matching element from $b_{eq}$ specifies that the result should equal 100% (hence "equality constraint"): $$\Sigma weights_{p}=100\%$$
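For example, with the compatibility matrix above, product A can only go to machines 1, 4 and 5, so its row of the equality constraint reduces to: $$weights_{M1,A} + weights_{M4,A} + weights_{M5,A} = 1$$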

2. The usage of each machine should be uniform

I define this using a pair of upper-bound constraints (A_ub, b_ub) per machine; the lower bound is encoded by negating the inequality.

# === Uniformity constraints ===
# Each machine should be utilised at a similar level:
#   uniform_ideal - tolerance < utilisation % < uniform_ideal + tolerance
A_uniform = np.zeros((num_machines * 2, num_weights))
b_uniform = np.zeros((num_machines * 2,))

for m in range(num_machines):
    # Compatibility matrix, limited to machine m
    compatibilities_m = np.zeros_like(machine_product_compatibility)
    compatibilities_m[m, :] = machine_product_compatibility[m, :]
    seconds_per_x = compatibilities_m * product_totals[None, :] * cycle_times[m]

    # Upper bound
    #   utilisation < (ideal + tolerance) * capacity
    A_uniform[2 * m, :] = seconds_per_x.ravel()
    b_uniform[2 * m] = uniformity_ub * machine_capacities[m]

    # Lower bound
    #   utilisation > (ideal - tol)*capacity  ->  -sum < -(ideal - tol)*capacity
    A_uniform[2 * m + 1, :] = -seconds_per_x.ravel()
    b_uniform[2 * m + 1] = -uniformity_lb * machine_capacities[m]

A_ub, b_ub = A_uniform, b_uniform
3. For incompatible product-machine pairs, the weight must be zero
4. Each weight must be bounded between $(0, 1)$

I set these up using boundary constraints:

# === Bounds constraints ===
# Product weights bounded between [0, 1] for compatible product-machine pairs, else 0
bounds = [
    (0., 1.) if machine_product_compatibility[m, p] else (0., 0.)
    for m in range(num_machines)
    for p in range(num_products)
]

Solving for product_weights

We want to estimate product_weights in a way that minimises the total machine time used. Each machine's utilisation is obtained by multiplying its row of unweighted_utilisation by the corresponding product_weights and summing over products.

# Coefficients
# Each row (machine) of unweighted_utilisation @ product_weights gives the
# utilisation distribution over products.
# Will find product_weights s.t. utilisation is minimised (i.e. time minimised -> maximal throughput)
unweighted_utilisation = (
    machine_product_compatibility
    * cycle_times[:, None]
    * product_totals[None, :]
)

# Estimate product_weights (returned in res.x)
res = linprog(unweighted_utilisation.ravel(), A_ub, b_ub, A_eq, b_eq, bounds)
print(res)

Results for this example

The percentage weights come back as:

#      Pa   Pb   Pc   Pd
array([[ 36,   0,   0,   0],   # M1
       [  0,   0,   0,  42],   # M2
       [  0,   0,  62,   0],   # M3
       [ 36,   0,   0,   0],   # M4
       [ 29, 100,   0,   0],   # M5
       [  0,   0,  38,  58]])  # M6

This suggests that you can reserve machines 1 to 4 for a single product each, and use machines 5 and 6 for two products each.

The total available capacity is machine_capacities.sum() (i.e. seconds in a day $\times$ number of machines), of which a third is utilised, since we have 1/3 of the full load in this example:

available time: 518,400 | utilised time: 172,800 

Fractional utilisation per machine shows each hitting the uniformity requirement:

[0.33 0.33 0.33 0.33 0.33 0.33] 

Non-overlapping case

I also tried the second non-overlapping case (at 1/10th of the total). It succeeds if I allow a uniformity tolerance of 4%, by changing uniformity_tolerance to 4/100. The solution I get has utilisation fractions of [0.14 0.14 0.12 0.06 0.11 0.06], which is close to yours though not identical. This means you'd need to increase the tolerance until it works, which might not be practical. I think a squared-error minimisation approach might work better, since it treats the constraints as soft conditions rather than hard failure points.
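As a rough sketch of that idea (not the code used for the results above), one could keep the same variables, equality constraints and bounds, but minimise the squared deviation of each machine's utilisation from the mean with scipy.optimize.minimize. The helper names here (utilisation, spread, res_soft) are just for illustration; the arrays are the ones defined in the walk-through:

from scipy.optimize import minimize

# Soft-uniformity sketch, reusing machine_product_compatibility, product_totals,
# cycle_times, machine_capacities, num_machines and num_products from above.
# (The <= 100% capacity limit is omitted here for brevity.)
seconds_per_x = (
    machine_product_compatibility * cycle_times[:, None] * product_totals[None, :]
)

def utilisation(x):
    w = x.reshape(num_machines, num_products)
    return (seconds_per_x * w).sum(axis=1) / machine_capacities

def spread(x):
    # Squared deviation from the mean utilisation; for non-overlapping cases this
    # should really be computed per machine group (see the next answer).
    u = utilisation(x)
    return ((u - u.mean()) ** 2).sum()

constraints = [  # each product's weights must sum to 1
    {'type': 'eq',
     'fun': lambda x, p=p: x.reshape(num_machines, num_products)[:, p].sum() - 1}
    for p in range(num_products)
]
bounds = [
    (0., 1.) if machine_product_compatibility[m, p] else (0., 0.)
    for m in range(num_machines) for p in range(num_products)
]

# Start from an equal split of each product over its compatible machines
x0 = (machine_product_compatibility / machine_product_compatibility.sum(axis=0)).ravel()
res_soft = minimize(spread, x0, method='SLSQP', bounds=bounds, constraints=constraints)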


Reproducible example

import numpy as np
from scipy.optimize import linprog

products = ['Pa', 'Pb', 'Pc', 'Pd']
machines = ['M1', 'M2', 'M3', 'M4', 'M5', 'M6']
num_products = len(products)
num_machines = len(machines)

#                          Pa    Pb    Pc    Pd
product_totals = np.array([6720, 2880, 3840, 5760]) // 3

machine_product_compatibility = np.array([
    #Pa Pb Pc Pd
    [1, 0, 1, 0],  # M1: Pa, Pc
    [0, 1, 0, 1],  # M2: Pb, Pd
    [0, 1, 1, 0],  # M3: Pb, Pc
    [1, 0, 0, 1],  # M4: Pa, Pd
    [1, 1, 0, 0],  # M5: Pa, Pb
    [0, 0, 1, 1],  # M6: Pc, Pd
])

#                       M1  M2  M3  M4  M5  M6
cycle_times = np.array([36, 36, 36, 36, 18, 18])

# Seconds available per machine
T_workday = 24 * 60**2
machine_capacities = np.full(num_machines, T_workday)

# For enforcing uniform machine utilisation
full_utilisation_reference = np.array([6720, 2880, 3840, 5760]).sum()  # all machines were 100% here
uniform_ideal = product_totals.sum() / full_utilisation_reference
uniformity_tolerance = 0/100  # Accept 0% deviation either side of the ideal
uniformity_lb = max(0., uniform_ideal - uniformity_tolerance)
uniformity_ub = min(1., uniform_ideal + uniformity_tolerance)

# We want to estimate a weight per product, per machine
num_weights = num_machines * num_products

# === Equality constraints ===
# For each product p, product_weights must sum to 100%
#   A_eq @ product_weights = b_eq
A_eq = np.zeros((num_products, num_weights))
b_eq = np.ones(num_products)

for p in range(num_products):
    # Compatibility matrix, limited to product p
    compatibilities_p = np.zeros_like(machine_product_compatibility)
    compatibilities_p[:, p] = machine_product_compatibility[:, p]

    # Sum of weights for product p (A_eq @ x) = 1 (b_eq)
    A_eq[p, :] = compatibilities_p.ravel()

# === Uniformity constraints ===
# Each machine should be utilised at a similar level:
#   uniform_ideal - tolerance < utilisation % < uniform_ideal + tolerance
A_uniform = np.zeros((num_machines * 2, num_weights))
b_uniform = np.zeros((num_machines * 2,))

for m in range(num_machines):
    # Compatibility matrix, limited to machine m
    compatibilities_m = np.zeros_like(machine_product_compatibility)
    compatibilities_m[m, :] = machine_product_compatibility[m, :]
    seconds_per_x = compatibilities_m * product_totals[None, :] * cycle_times[m]

    # Upper bound
    #   utilisation < (ideal + tolerance) * capacity
    A_uniform[2 * m, :] = seconds_per_x.ravel()
    b_uniform[2 * m] = uniformity_ub * machine_capacities[m]

    # Lower bound
    #   utilisation > (ideal - tol)*capacity  ->  -sum < -(ideal - tol)*capacity
    A_uniform[2 * m + 1, :] = -seconds_per_x.ravel()
    b_uniform[2 * m + 1] = -uniformity_lb * machine_capacities[m]

A_ub, b_ub = A_uniform, b_uniform

# === Bounds constraints ===
# Product weights bounded between [0, 1] for compatible product-machine pairs, else 0
bounds = [
    (0., 1.) if machine_product_compatibility[m, p] else (0., 0.)
    for m in range(num_machines)
    for p in range(num_products)
]

# Coefficients
# Each row (machine) of unweighted_utilisation @ product_weights gives the
# utilisation distribution over products.
# Will find product_weights s.t. utilisation is minimised (i.e. time minimised -> maximal throughput)
unweighted_utilisation = (
    machine_product_compatibility
    * cycle_times[:, None]
    * product_totals[None, :]
)

# Estimate product_weights (returned in res.x)
res = linprog(unweighted_utilisation.ravel(), A_ub, b_ub, A_eq, b_eq, bounds)
print(res)

# === View results ===
# Product weights for each machine
product_weights = res.x.reshape(num_machines, num_products)
print(
    'Percentage product weights for machines (rows) and products (columns):\n',
    (product_weights * 100).round().astype(int),
    sep=''
)

# Total utilisation
print(
    '\navailable time: {:,}'.format(machine_capacities.sum()),
    ' | utilised time: {:,}'.format(res.fun)
)

# Machine time per product
print(
    '\nUtilisation time per machine (rows) per product (columns):\n',
    product_weights * unweighted_utilisation,
    '\n\nFractional utilisation per machine:\n',
    (product_weights * unweighted_utilisation).sum(axis=1) / machine_capacities,
    sep=''
)

Comments:

  • So it is a set of linear equations after all. This is exactly what I was looking for! Thank you so much =) (Commented Sep 6 at 11:02)
  • You're welcome 👍 You put a great question together. (Commented Sep 6 at 11:05)
  • Hmm, I've been playing with it a little, and there seems to be a misunderstanding either on my side or on your/the code's side. If I halve the amount of products, making the amounts [3360, 1440, 1920, 2880], the result is four machines running at 100% and two machines just sitting there doing nothing. This goes against the 3rd goal: "all of the available flow is equally distributed between each available machine, to the extent allowed within the specified constraints". The end result should be all machines running at 50% of their capacity instead. (Commented Sep 6 at 17:17)
  • You're right that the code tries to maximise utilisation. This is done by the minus sign at res = linprog(-unweighted_utilisation.ravel(), ...). If you remove the minus sign it will become a minimisation problem. I can see what you're saying - processing products as fast as possible implies minimising machine utilisation. (Commented Sep 6 at 18:05)
  • Yeah, it feels like another constraint needs to be specified, or they need to be reconsidered. I ran the code and found utilisation is still maximal, even when minimising the objective. I will look into it. So you want maximal throughput, with at least 50% per machine? (Commented Sep 6 at 18:18)

This approach frames the constraints as 'soft constraints' whose MSE is minimised using gradient descent.

Concepts

Key constraints

Weights sum to 1 for compatible machines, and individual weights are bounded between (0, 1). These two constraints are strictly enforced using a softmax transform (see the answer by @DevilXD for the original suggestion and commentary).

A soft constraint is mach_overcapacity_loss which tries to enforce machine utilisation $\leq$ machine capacity.
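In the code below this is (roughly) a hinge-squared penalty on each machine's normalised utilisation $u_m$: $$L_{cap} = \frac{1}{M}\sum_{m=1}^{M}\max\left(0,\; u_m - 1\right)^2$$ where $M$ is the number of machines and $u_m$ is machine $m$'s utilisation as a fraction of its capacity.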

Machine grouping

Machine groups are determined using graph analysis; machines are linked if they share a product, which forms isolated groups of machines when there is no product overlap. The figure below visualises this for the first non-overlapping case.

(Figure: machine relationship graph for the first non-overlapping case, showing two isolated groups of machines)

Uniformity preference

There is also a soft constraint that encourages uniform machine utilisation for each group of machines, and acts more like a 'preference' - you can tune its impact via uniformity_strength. The algorithm discovers the right average for each group, so you don't need to specify a target value for it.
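Concretely, this matches the group_uniformity_loss term in the code below: for the set of machine groups $G$, with per-machine utilisation $u_m$ and group mean $\bar{u}_g$, $$L_{uniform} = \lambda \cdot \frac{1}{|G|} \sum_{g \in G} \frac{1}{|g|} \sum_{m \in g} \left(u_m - \bar{u}_g\right)^2$$ where $\lambda$ is uniformity_strength.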

Results

Overlapping case (full product volume)

The constraints and uniformity preference are easily respected, resulting in a low loss:

(Figure: loss convergence for the overlapping case)

First non-overlapping case (1/10th volume)

The loss is higher here since the uniformity preference is harder to meet, but performance is nonetheless good.

(Figure: loss convergence for the first non-overlapping case)

The discrepancy between product totals and products processed is less than 1.

Percent utilisation per machine:

[14.667, 14.667, 14.667, 7.72, 8.16, 5.98] 

Group uniformity looks good. There is a wider spread in the second group:

(Figure: machine utilisation and group averages for the first non-overlapping case)

Second non-overlapping case (1/10th volume)

Convergence quality/characteristics are similar to the preceding case.

Percent utilisation per machine: [14.034, 14.034, 11.932, 8.054, 8.099, 7.873]

There is some variation in terms of utilisation but it's relatively small:

(Figure: machine utilisation and group averages for the second non-overlapping case)

In all cases the algorithm perfectly satisfies the machine capacity constraint (no machines > 100% capacity), which is why that loss component is zero (and doesn't show up on the log-scaled plots).


Reproducible example

Imports and supplied data

import numpy as np
import torch
from torch import nn
from collections import defaultdict
from matplotlib import pyplot as plt
import networkx as nx

np.set_printoptions(suppress=True)

products = ['Pa', 'Pb', 'Pc', 'Pd']
machines = ['M1', 'M2', 'M3', 'M4', 'M5', 'M6']
num_products = len(products)
num_machines = len(machines)

#                              Pa    Pb    Pc    Pd
product_totals = torch.tensor([6720, 2880, 3840, 5760]).float() / 10

# NOTE: three example cases follow; the last uncommented assignment is the one used.

# Overlapping example
machine_product_compatibility = torch.tensor([
    #Pa Pb Pc Pd
    [1, 0, 1, 0],  # M1: Pa, Pc
    [0, 1, 0, 1],  # M2: Pb, Pd
    [0, 1, 1, 0],  # M3: Pb, Pc
    [1, 0, 0, 1],  # M4: Pa, Pd
    [1, 1, 0, 0],  # M5: Pa, Pb
    [0, 0, 1, 1],  # M6: Pc, Pd
]).float()

# Non-overlapping, ex1
machine_product_compatibility = torch.tensor([
    #Pa Pb Pc Pd
    [1, 0, 0, 0],  # M1
    [1, 0, 1, 0],  # M2
    [0, 0, 1, 0],  # M3
    [0, 0, 0, 1],  # M4
    [0, 1, 0, 1],  # M5
    [0, 1, 0, 0],  # M6
]).float()

# Non-overlapping, ex2
machine_product_compatibility = torch.tensor([
    #Pa Pb Pc Pd
    [1, 0, 0, 0],  # M1
    [1, 1, 0, 0],  # M2
    [0, 1, 0, 0],  # M3
    [0, 0, 0, 1],  # M4
    [0, 0, 1, 1],  # M5
    [0, 0, 1, 0],  # M6
]).float()

#                           M1  M2  M3  M4  M5  M6
cycle_times = torch.tensor([36, 36, 36, 36, 18, 18]).float()

# Seconds available per machine
T_workday = 24 * 60**2
machine_capacities = torch.full((num_machines,), T_workday).float()

seconds_products_matrix = cycle_times.outer(product_totals).float()

Machine groups

def get_machine_groups(machine_product_compatibility):
    num_machines = machine_product_compatibility.shape[0]

    # Build a graph where machines/nodes are connected (edges) if they share a product
    G = nx.Graph()
    G.add_nodes_from(range(num_machines))
    for i in range(num_machines):
        for j in range(i + 1, num_machines):
            if machine_product_compatibility[i].dot(machine_product_compatibility[j]):
                G.add_edge(i, j)

    # Extract connected components (groups of machines)
    machine_groups = [np.sort(list(group)) for group in nx.connected_components(G)]

    # Visualise graph
    pos = nx.spring_layout(G, seed=0)
    f, ax = plt.subplots(figsize=(5, 3), layout='tight')
    nx.draw_networkx_nodes(G, pos, node_color='lightblue', node_size=400, ax=ax)
    nx.draw_networkx_edges(G, pos, width=1.5, edge_color='gray', ax=ax)
    nx.draw_networkx_labels(G, pos, font_size=10, font_weight='bold', ax=ax)
    ax.set_title('Machine relationships via shared products', fontsize=12)
    ax.spines[:].set_visible(False)
    plt.show()

    return machine_groups

machine_groups = get_machine_groups(machine_product_compatibility)
print(f'Num. of machine groups is {len(machine_groups)}:', machine_groups)

Gradient descent

# === Equality constraint ===
# For each product p, product_weights must sum to 100%
# === Machine capacity constraints ===
# Each machine's utilisation can't exceed its capacity
# === Uniformity constraint/preference ===
# Each machine group should be utilised at a similar level

np.random.seed(0)
torch.manual_seed(0)

logits = nn.Parameter(torch.randn(num_machines, num_products))
optimiser = torch.optim.Adam([logits])

# How strongly to push for uniform utilisation across machines
uniformity_strength = 1

def logits_to_weights(logits, mach_prod_compatibility):
    return (
        logits
        .masked_fill(mach_prod_compatibility.eq(0), -torch.inf)
        .softmax(dim=0)
    )

losses = defaultdict(list)
for _ in range(num_iter := 10_000):
    # Each weight is bound (0, 1), and only available for compatible pairs
    weights = logits_to_weights(logits, machine_product_compatibility)

    # Current utilisation per machine
    utilisation_per_mach = (
        # Per machine, time spent on each product
        seconds_products_matrix.mul(weights)
        # Each machine's total time (add up all product times)
        .sum(dim=1)
        # Express as fraction of each machine's capacity
        .div(machine_capacities)
    )

    # Machine utilisation must not exceed the machine's capacity
    # (i.e. normalised utilisation < 1).
    # Being under-capacity is ok - exclude using relu
    mach_overcapacity_loss = utilisation_per_mach.sub(1.).relu().square().mean()

    # Encourage uniformity, per machine group
    group_uniformity_losses = torch.stack([
        # Deviation of machines from group average
        (utilisation_per_mach[mach_ixs] - utilisation_per_mach[mach_ixs].mean())
        # Scalar MSE
        .square().mean()
        # Repeat for each group of machines
        for mach_ixs in machine_groups
    ])
    group_uniformity_loss = group_uniformity_losses.mean().mul(uniformity_strength)

    # Total loss
    loss = mach_overcapacity_loss + group_uniformity_loss

    # Optimisation step
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()

    # === Record metrics ===
    losses['mach_overcapacity'].append(mach_overcapacity_loss.item())
    losses['group_uniformity'].append(group_uniformity_loss.item())
    losses['loss'].append(loss.item())

    # Stop at relatively low loss
    if loss < 1e-7:
        print('Stopping at low loss', loss.item())
        break

# === Plot ===
f, ax = plt.subplots(figsize=(5, 2.8))
for name, loss_ in losses.items():
    if name == 'loss':
        continue
    ax.plot(loss_, label=name)
ax.plot(losses['loss'], lw=3, ls=(0, (2, 2)), label='loss')
ax.legend(framealpha=0)
ax.spines[['top', 'right']].set_visible(False)
ax.semilogy(True)
ax.set_ylim(None, 1e-2)
ax.grid(axis='y', which='major', linestyle=':')
ax.set(title='Loss convergence', xlabel='iteration', ylabel='total loss')
plt.show()

Plot and report results

# === View results ===
# Product weights (compatible) for each machine
weights = logits_to_weights(logits, machine_product_compatibility).detach()
print(
    '\nPercentage product weights for compatible machines (rows) and products (columns):\n',
    weights.mul(100).numpy().round(2),
    sep=''
)

# Total utilisation
print(
    '\navailable time: {:,}'.format(machine_capacities.sum()),
    ' | utilised time: {:,}'.format(seconds_products_matrix.ravel() @ weights.ravel())
)

# Machine time per product
print(
    '\nUtilisation time per machine (rows) per product (columns):\n',
    weights * seconds_products_matrix,
    '\n\nPercent utilisation per machine:\n',
    weights.mul(seconds_products_matrix).sum(dim=1) / machine_capacities * 100,
    sep=''
)

f, ax = plt.subplots(figsize=(5, 3), layout='tight')
for mach_ixs in machine_groups:
    ax.plot(mach_ixs, utilisation_per_mach[mach_ixs].detach(), linestyle='none', marker='d')
    ax.plot(mach_ixs, [utilisation_per_mach[mach_ixs].detach().mean(),] * len(mach_ixs), color='black')
ax.set(
    xlabel='machine index', ylabel='utilisation',
    title='Machine utilisation and group averages'
)
ax.spines[['top', 'right']].set_visible(False)

Reported to stdout:

Percentage product weights for compatible machines (rows) and products (columns):
[[50.12  0.    0.    0.  ]
 [49.88  0.57  0.    0.  ]
 [ 0.   99.43  0.    0.  ]
 [ 0.    0.    0.   33.56]
 [ 0.    0.    1.58 66.44]
 [ 0.    0.   98.42  0.  ]]

available time: 518,400.0  | utilised time: 55,319.453125

Utilisation time per machine (rows) per product (columns):
tensor([[12125.4277,     0.0000,     0.0000,     0.0000],
        [12066.5742,    58.7491,     0.0000,     0.0000],
        [    0.0000, 10309.2510,     0.0000,     0.0000],
        [    0.0000,     0.0000,     0.0000,  6958.8979],
        [    0.0000,     0.0000,   109.3414,  6888.5508],
        [    0.0000,     0.0000,  6802.6587,     0.0000]])

Percent utilisation per machine:
tensor([14.0341, 14.0339, 11.9320,  8.0543,  8.0994,  7.8734])

Comments:

  • I got 14.66 for the first non-overlapping case (using the defaults in the code, which include uniformity_strength=0.1). I would avoid increasing uniformity_strength as it can begin to impinge on more important things like weights adding up to 100%. Do you need accuracy down to the individual item? Or to a percentage, like 0.01%? (Commented Sep 8 at 12:44)
  • I've always aimed for identical utilization across all machines in a group, where that last 0.001% of the utilization percentage was identical across all machines in the group (letting Excel round it), but I've had a few problems where it seemed impossible to achieve no matter how granular I made the weights, so that last 0.001% differed for a few machines. Not entirely sure what accuracy that makes it. About ±0.0005%? Pretty sure that's also close to a single item for an average machine. (Commented Sep 8 at 12:57)
  • I've been thinking about it a little. It feels like some kind of hybrid approach could make this work, where the first step would be to divide the problem into those non-overlapping groups (figure out which machines form them), then calculate the average flow for each of those groups (to get the reference point for calculating the utilization error), and then try to minimize those errors. Since the groups don't overlap, they can be solved separately, or the entire problem can be solved together - I'm unsure which approach would be best. This complicates the entire thing a lot, though. (Commented Sep 8 at 13:00)
  • These are two fantastic answers (and comments) that both deserve wide attention in the community!! (Commented Sep 14 at 3:25)
  • I came here to agree with @RobertLong - both answers are great and it is terrific to see someone put this much effort into helping the OP (Commented Sep 14 at 20:48)

This is a bit of a long one, as I wanted to share my learning process, the discovery steps, and the changes I made along the way as they happened. For a straight-up solution, please see the code at the end.

The code from the second answer was my starting point. Over the course of a few days, I went from making simple tuning changes and rerunning the code, to mostly grasping what each part of the code does (if not necessarily knowing 100% how it works). As I barely knew the two libraries, I had to do a lot of googling and documentation reading to understand the basics, with occasional ChatGPT usage here and there. ChatGPT was especially helpful in explaining certain usage mechanics via simple examples, but more on that later.

The first thing I did was to implement the idea from my last comment (at the time) under the 2nd answer here: for the non-overlapping cases, when a weight gets close to zero during solving, we can recompute the machine/product compatibility matrix, separating the saturated machine into its own group. This lets it arrive at its own, lower average than the rest of its group would normally arrive at. For this, I added a second loop on top of the first one, put the machine group calculation logic at the beginning of the outer loop, and then added an extra exit condition to the inner loop:

zero_weight_threshold = 1e-3

while True:
    # Build a graph where machines/nodes are connected (edges) if they share a product
    G = nx.Graph()
    G.add_nodes_from(range(num_machines))
    for n1 in range(num_machines):
        for n2 in range(n1 + 1, num_machines):
            if machine_product_compatibility[n1].mul(machine_product_compatibility[n2]).any():
                G.add_edge(n1, n2)

    # Extract connected components (groups of machines)
    machine_groups = [np.sort(list(group)) for group in nx.connected_components(G)]

    while True:
        # solve here using the calculated machine_groups

        if ((weights < zero_weight_threshold) & machine_product_compatibility.bool()).any():
            # Recompute the machine product compatibility using the updated weights
            machine_product_compatibility = (weights >= zero_weight_threshold).float()
            break

This was the first breakthrough, as it removed the imbalance caused by one of the machines "dragging down" the average for the whole group, and allowed the loss function to keep decreasing into the <1e-14 range. I had to raise the learning rate so that it wouldn't take forever to detect a weight getting close to zero, but that seemed to introduce trouble in later iterations. To help with this, I did two things. First, I noticed that the weights were really taking their time reaching the extremes of 0 and 1, and tracked it down to the logits_to_weights function. The sigmoid that was used to constrain the weights to the (0, 1) range was a logical choice, but something that converged quicker was clearly needed. Fortunately, the topic of so-called "activation functions" is familiar to me from learning about neural networks in the past, so I replaced it with an appropriately scaled tanh call. With the single-machine groups and the constraint that weights always sum to 1 in mind, I also added a small overscale to help single-machine weights converge quicker, which also seemed to reduce the iterations needed:

overscale: float = 1.01

def logits_to_weights(logits):
    return (logits.tanh() / 2 + 0.5) * overscale

I also experimented with a dynamic learning rate. Out of all the available options in Torch, this one seemed the most suitable:

optimiser = torch.optim.Adam([logits], lr=0.1)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimiser, factor=0.95, patience=70)

while True:
    # ...
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
    scheduler.step(loss.item())  # calling the scheduler with the current loss value here
    # ...

The values of factor and patience, as well as the learning rate itself, can be tuned to possibly achieve quicker convergence times/iteration counts. The ranges I experimented with were about 0.01-0.5 for the learning rate, 0.5-0.99 for the factor, and 10-100 for the patience. 0.95 and 70 for factor and patience seemed to be the sweet spot with the best average performance over the given examples. The learning rate can usually start at 0.3-0.4 for a fast start, but values that are too high result in loss oscillations. Values of 0.1-0.2 or lower seemed to be the most stable across the given examples and the tests I performed.

This solution seemed ready and was actually working perfectly on all the training examples, so I tried it on some bigger, custom problems: 11x9 and 17x6 machines/products respectively. The second problem I chose was actually one of the harder ones I had, with the highest average utilization reaching around 102% - I wanted to see what the 2nd loss term (over-utilization) does to the code. The solution I got was correct for the first problem, but incorrect for the second - the utilization percentages ended up slightly off the average. This boggled me for some time, and after many tests and much observation of the results, I reached the conclusion that the 2nd over-utilization loss term was actually not needed at all! It turns out that the 3rd loss term, which distributes the utilization evenly, is enough by itself, and the 2nd term was competing value-wise with the 3rd term, leading to the solution not converging as expected. Removing the over-utilization term entirely let me get exactly the solution I expected to see, and everything worked from there!


I was basically ready to post the solution at this point, but out of sheer research curiosity, I stumbled upon posts talking about "parametrization". Digging deeper into this topic revealed that it's apparently a great way to specify a "values that always sum up to 1" constraint, and since that's exactly what I was already doing in this code (for the weights), it piqued my interest. This is where ChatGPT proved to be the most useful, by showing me simple examples and explaining how to specify such a constraint.

This was the second breakthrough. It turns out that the logits_to_weights function can be rewritten as such:

from torch.nn.functional import softmax

def logits_to_weights(logits):
    weights = logits.masked_fill(machine_product_compatibility == 0, -np.inf)
    return softmax(weights, dim=0)

It turns out the -inf value makes the softmax function ignore the incompatible places in the compatibility matrix, letting the remaining compatible values sum up to exactly 1.0, every time, in every case. This removes the 1st weights-sum term from the loss function, leaving only the uniformity one and simplifying everything. The learning rate also had to be reduced to ~0.1, as the convergence was so quick that higher values often tended to overshoot. This simple change improved the convergence speed and reduced the iterations required roughly 10-fold, while simultaneously removing the possibility of the weights sum being not quite right and introducing inaccuracy into the results. Fantastic!
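As a quick, standalone illustration of that masking behaviour (not part of the solution code; the small compat matrix here is made up):

import torch

compat = torch.tensor([[1, 0],
                       [1, 1],
                       [0, 1]]).bool()   # 3 machines x 2 products
logits = torch.randn(3, 2)

weights = logits.masked_fill(~compat, -torch.inf).softmax(dim=0)
print(weights)             # zeros exactly where compat is False
print(weights.sum(dim=0))  # tensor([1., 1.]) - each product's weights sum to exactly 1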

Some other, less significant changes include:

  • machine_product_compatibility is now a bool matrix, as that better fits its definition and purpose. All affected places in the code have been adjusted.
  • Simplified the num_products and num_machines definitions, so that they're inferred from the compatibility matrix size.
  • The machine_capacities calculation now more closely matches the way it was calculated in my Excel sheet (86400 / cycle_time).
  • Added zero_weight_threshold, that controls when to treat a weight as "zeroed out" during solving (for compatibility matrix recalculation).
  • logits now starts as a copy of the compatibility matrix instead of being random, as this translates well to the distribution of the starting weights.
  • There's now a single, separate variable that tracks the amount of iterations during solving, even if the inner loop breaks for the outer loop to recalculate the compatibility matrix.
  • Added learning_rate and groups to the loss graph, to show when the groups are recalculated during solving, as well as show how the learning rate changes over time.
  • The masked_fill of -inf values was optimized to be called only when the compatibility matrix changes (that's why the respective call in logits_to_weights is now commented out).
  • The plots and prints at the very end have been adjusted, to remove information that's obsolete or no longer present.

The final solution code:

from collections import defaultdict

import torch
import numpy as np
from torch import nn
import networkx as nx
from matplotlib import pyplot as plt
from torch.nn.functional import softmax

torch.set_printoptions(linewidth=180)
np.set_printoptions(linewidth=180, suppress=True)

# Overlapping example
machine_product_compatibility = torch.tensor([
    # Pa Pb Pc Pd
    [1, 0, 1, 0],  # M1: Pa, Pc
    [0, 1, 0, 1],  # M2: Pb, Pd
    [0, 1, 1, 0],  # M3: Pb, Pc
    [1, 0, 0, 1],  # M4: Pa, Pd
    [1, 1, 0, 0],  # M5: Pa, Pb
    [0, 0, 1, 1],  # M6: Pc, Pd
]).bool()

# Non-overlapping, ex1
# machine_product_compatibility = torch.tensor([
#     # Pa Pb Pc Pd
#     [1, 0, 0, 0],  # M1
#     [1, 0, 1, 0],  # M2
#     [0, 0, 1, 0],  # M3
#     [0, 0, 0, 1],  # M4
#     [0, 1, 0, 1],  # M5
#     [0, 1, 0, 0],  # M6
# ]).bool()

# Non-overlapping, ex2
# machine_product_compatibility = torch.tensor([
#     # Pa Pb Pc Pd
#     [1, 0, 0, 0],  # M1
#     [1, 1, 0, 0],  # M2
#     [0, 1, 0, 0],  # M3
#     [0, 0, 0, 1],  # M4
#     [0, 0, 1, 1],  # M5
#     [0, 0, 1, 0],  # M6
# ]).bool()

#                              Pa    Pb    Pc    Pd
product_totals = torch.tensor([6720, 2880, 3840, 5760]).float() / 10

num_machines = machine_product_compatibility.size(dim=0)

#                           M1  M2  M3  M4  M5  M6
cycle_times = torch.tensor([36, 36, 36, 36, 18, 18]).float()

# Products per day per machine
T_workday = 24 * 60 * 60
machine_capacities = torch.full((num_machines,), T_workday).float().div(cycle_times)

def logits_to_weights(logits):
    # weights = logits.masked_fill(machine_product_compatibility == 0, float('-inf'))
    return softmax(logits, dim=0)

np.random.seed(0)
torch.manual_seed(0)

i = 0
total_epochs = 10_000         # stop after this many epochs
loss_threshold = 1e-19        # stop once loss is lower than this value
zero_weight_threshold = 1e-3  # decides when to zero-out the weight, due to group splitting
results = defaultdict(list)

logits = nn.Parameter(
    machine_product_compatibility.float()
    .masked_fill(machine_product_compatibility == 0, -np.inf)
)
optimiser = torch.optim.Adam([logits], lr=0.1)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimiser, factor=0.95, patience=70)

while True:
    # Build a graph where machines/nodes are connected (edges) if they share a product
    G = nx.Graph()
    G.add_nodes_from(range(num_machines))
    for n1 in range(num_machines):
        for n2 in range(n1 + 1, num_machines):
            if (machine_product_compatibility[n1] & machine_product_compatibility[n2]).any():
                G.add_edge(n1, n2)

    # Extract connected components (groups of machines)
    machine_groups = [np.sort(list(group)) for group in nx.connected_components(G)]
    print(f"Num. of machine groups is {len(machine_groups)}:", machine_groups)

    if i > 0:  # skip first iteration
        results["compatibility_changed"].append(i)

    while True:
        # Each weight is bound (0, 1), and only available for compatible pairs
        weights = logits_to_weights(logits)

        # Current utilisation per machine
        utilisation_per_mach = (
            product_totals.mul(weights)   # Amount of each product passing through the machine
            .sum(dim=1)                   # Total amount of all products passing through the machine
            .div(machine_capacities)      # Express as fraction of each machine's capacity
        )

        # Encourage uniformity, per machine group
        loss = torch.stack([
            # Deviation of machines from group average
            (utilisation_per_mach[mach_ixs] - utilisation_per_mach[mach_ixs].mean())
            .square().mean()                # Scalar MSE
            for mach_ixs in machine_groups  # Repeat for each group of machines
        ]).mean()

        # Optimisation step
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()
        scheduler.step(loss.item())

        # === Record metrics ===
        results["learning_rate"].append(optimiser.param_groups[0]['lr'])
        results["loss"].append(loss.item())
        i += 1

        # Stop on minimal loss or total epochs reached
        # Loop to recalculate the compatibility matrix using current weights
        if loss < loss_threshold:
            break
        elif i >= total_epochs:
            break
        elif ((weights < zero_weight_threshold) & machine_product_compatibility).any():
            # Recompute the machine product compatibility using the updated weights
            machine_product_compatibility = (weights >= zero_weight_threshold)
            with torch.no_grad():
                logits.masked_fill_(machine_product_compatibility == 0, -np.inf)
            break

    if loss < loss_threshold:
        print("Stopping at near-zero loss:", loss.item())
        break
    elif i >= total_epochs:
        print("Stopping at max epochs:", i)
        break

# === Plot ===
f, ax = plt.subplots(figsize=(5, 2.8))
for name, loss_ in results.items():
    if name == "compatibility_changed":
        continue
    ax.plot(loss_, label=name)
if results["compatibility_changed"]:
    for xgroup in results["compatibility_changed"]:
        vl = ax.axvline(xgroup, color="black", ymin=0.8)
    vl.set_label("compatibility_changed")
ax.legend(framealpha=0)
ax.spines[["top", "right"]].set_visible(False)
ax.semilogy(True)
ax.grid(axis='y', which="major", linestyle=":")
ax.set(
    title="Loss convergence", xlabel="iteration", ylabel="total loss"
)

# Visualise machine relationships graph
pos = nx.spring_layout(G, seed=1)
f, ax = plt.subplots(figsize=(5, 3), layout="tight")
nx.draw_networkx_nodes(G, pos, node_color="lightblue", node_size=400, ax=ax)
nx.draw_networkx_edges(G, pos, width=1.5, edge_color="gray", ax=ax)
nx.draw_networkx_labels(G, pos, font_size=10, font_weight="bold", ax=ax)
ax.set_title("Machine relationships via shared products", fontsize=12)
ax.spines[:].set_visible(False)

# Machine groups and their averages
f, ax = plt.subplots(figsize=(5, 3), layout="tight")
for mach_ixs in machine_groups:
    ax.plot(mach_ixs, utilisation_per_mach[mach_ixs].detach(), linestyle="none", marker='d')
    ax.plot(
        mach_ixs,
        [utilisation_per_mach[mach_ixs].detach().mean(),] * len(mach_ixs),
        color="black",
    )
ax.set(
    xlabel="machine index", ylabel="utilisation",
    title="Machine utilisation and group averages"
)
ax.spines[["top", "right"]].set_visible(False)

# Product weights (compatible) for each machine
print(
    "\nPercentage product weights for compatible machines (rows) and products (columns):\n",
    weights.detach().numpy().round(7),
    sep=''
)

# Machine time per product
print(
    "\n\nPercent utilisation per machine:\n",
    utilisation_per_mach.detach().numpy().round(5),
    sep=''
)

print("Solution found after", i, "iterations")
plt.show()

All credit for the original code goes to @MuhammedYunus, of course.


Comments:

  • Really interesting read, thanks for sharing. The softmax trick is elegant; it both strictly enforces weights summing to 1 and simplifies the code. I am intrigued that the overcapacity loss was hampering performance. Not sure why that is - it seems to rear its head in the more complex cases you tried (in the simpler cases it was exactly zero). Glad you have homed in on a solution that ticks all the boxes - nice work. (Commented Sep 19 at 14:02)
  • The overcapacity term would work if all machines had the same cycle time, but unfortunately that's not the case. If the cycle times differ, a change of e.g. +1% utilization on one machine corresponds to a decrease of only -0.5% on another machine. When the solution average ends up greater than 100%, then within the "zone" between 100% capacity and the solution-average capacity (e.g. 110%), the lower portion of it (closer to 100%) ends up having a lower loss value when the utilizations are non-uniform: i.imgur.com/fVDa1zv.png (Commented Sep 20 at 14:07)
  • On the other hand, passing the middle of the zone (going above 105%) yields the opposite effect, where the more uniform case ends up having the lower loss: i.imgur.com/FD9Wrm0.png Since the calculation has to pass through the first zone during solving, it gets stuck in that local minimum, and the solution ends up off-target according to each machine's cycle time - a lower cycle time ends up getting higher utilization, and all utilization values are off the expected average. I realized that removing the overcapacity term gets rid of that zone, and the uniformity term is already enough. (Commented Sep 20 at 14:11)
  • I wonder whether it's an artefact of how the overcapacity loss was implemented. The loss is suddenly introduced after the algorithm ends up in a bad place. It plays no role when the machines have valid capacities, and only kicks in when a machine's capacity exceeds 100%. Perhaps at that point it's too late to effectively correct for. A better approach might be to make it continuous, increasing as we approach the overcapacity threshold. That might steer the algorithm better overall, but could introduce issues of its own. You've done fine without it, so it may not be worth investigating. (Commented Sep 20 at 14:32)
  • It clearly cannot function together with uniformity, otherwise it keeps pulling towards 100% while uniformity keeps pulling towards e.g. 110%, and whichever machine ends up closer to one of the two values wins. Below a 100% average, uniformity will already pull down any machine going above 100% (meaning there's no need for overcapacity), and above a 100% average, uniformity is also all we need to balance things out (meaning no overcapacity is needed either), so while the idea itself is sound, uniformity alone is better to use in all cases. (Commented Sep 20 at 22:38)
