I demonstrate this approach in my answer:
Create a training set
- Randomly sample parameter values and initial values (or sample ranges relevant to your task)
- Solve for the trajectories $x(t)$ and $y(t)$ of each parameter set using an ODE solver
- Each [$x$ and $y$ trajectory, 4 parameters] pair becomes a sample in the training set.
Train a multi-regression model to predict the 4 parameters for each 2D sequence. Since the data is strongly sequential, a sequence model is a good candidate (e.g. an LSTM/GRU, or a 1D CNN).
- The input is a batch of 2D sequences (batch of $x(t)$ and $y(t)$), and the target is the corresponding batch of 4D parameter values.
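The answer below implements the LSTM option; as a point of comparison, a 1D CNN for the same multi-regression task could look like the following minimal sketch (layer widths and kernel sizes are illustrative, not tuned):

```python
import torch
from torch import nn

# A minimal 1D-CNN regressor: (batch, timesteps, 2) -> (batch, 4).
cnn_model = nn.Sequential(
    # Conv1d expects (batch, channels, timesteps), so transpose before calling
    nn.Conv1d(in_channels=2, out_channels=32, kernel_size=5, padding=2),
    nn.ReLU(),
    nn.Conv1d(32, 32, kernel_size=5, padding=2),
    nn.ReLU(),
    nn.AdaptiveAvgPool1d(1),  # global average pool over the time axis
    nn.Flatten(),             # (batch, 32)
    nn.Linear(32, 4),         # one output per parameter
    nn.Softplus(),            # keep the 4 predicted parameters positive
)

x = torch.randn(8, 200, 2)            # 8 sequences, 200 timesteps, (x, y) features
preds = cnn_model(x.transpose(1, 2))  # -> shape (8, 4)
```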
Using an LSTM, I get 10-20% prediction error across the four parameters. I trained it on a relatively wide range of parameter and initial values, and would expect performance to be better if you restrict the range. You could also tune the model further and/or train for longer.

Model size: 2304 trainable parameters
epoch   1/100 [TRN loss:  0.213 | mape [54 56 55 57]] [VAL loss:  0.210 | mape [57 55 55 54]]
...
epoch 100/100 [TRN loss:  0.023 | mape [12 17 17 11]] [VAL loss:  0.025 | mape [12 18 19 11]]
I sample parameter sets randomly, where each parameter is drawn from the range [0.1, 0.9), and the initial values are randomly selected from [1, 50). One such sample:

If you use different ranges across the parameters, it helps to use loss weighting in order to give the smaller parameters a chance to improve (you could alternatively normalise the targets to get them on the same scale).
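To illustrate both options, here is a small sketch with two hypothetical parameters on very different scales (the weighting scheme mirrors the `means.max() / means` balancing used in the training code below; the specific ranges are made up for illustration):

```python
import numpy as np
import torch
from sklearn.preprocessing import MinMaxScaler

# Hypothetical targets drawn from very different ranges
parameters_trn = np.column_stack([
    np.random.uniform(0.01, 0.05, size=1000),  # small-scale parameter
    np.random.uniform(1.0, 5.0, size=1000),    # large-scale parameter
])

# Option 1: weight each parameter's squared error so the smaller-scale
# target contributes comparably to the loss
means = parameters_trn.mean(axis=0)
target_weights = torch.tensor(means.max() / means).float()

preds = torch.rand(64, 2)
targets = torch.tensor(parameters_trn[:64]).float()
weighted_mse = preds.sub(targets).pow(2).mul(target_weights).mean()

# Option 2: normalise the targets to [0, 1] instead, train against the
# scaled targets, and invert at inference time
target_scaler = MinMaxScaler().fit(parameters_trn)
targets_scaled = target_scaler.transform(parameters_trn)
# ... later: target_scaler.inverse_transform(model_predictions)
```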
I first use SciPy's solve_ivp() to solve for the $x(t), y(t)$ trajectories. Then I train a one-layer LSTM to reduce the MSE between the predictions and the targets. I enforce positive values for each parameter using a Softplus() layer.
The training dataset comprised around 3800 sequences, where each sequence covered the time-span (0, 100) in 200 steps. Other training hyperparameters were: batch size of 64, 100 epochs, single-layer LSTM with a hidden size of 64, and the NAdam optimizer.
I tracked the MAPE for each parameter. It seems like $\beta$ and $\delta$ were harder to get right because their MAPEs were closer to 20%, whereas the $\alpha$ and $\gamma$ parameters were around 10%. The final validation loss was 0.025.
Reproducible example
Synthesise data and split:
import numpy as np
from matplotlib import pyplot as plt
from scipy.integrate import solve_ivp

np.random.seed(0)

#
# Data for testing
#
n_sequences = 5_000
n_timesteps = 200
t_span = (0, 100)

alpha_span = (0.1, 0.9)
beta_span = (0.1, 0.9)
delta_span = (0.1, 0.9)
gamma_span = (0.1, 0.9)
x0_span = (1, 50)
y0_span = (1, 50)

#Sample `n_sequences` parameter sets from the spans above
alphas, betas, deltas, gammas = [
    np.random.uniform(*span, size=n_sequences)
    for span in [alpha_span, beta_span, delta_span, gamma_span]
]
parameters = np.column_stack([alphas, betas, deltas, gammas])

initial_values = np.column_stack([
    np.random.randint(*x0_span, size=n_sequences),
    np.random.randint(*y0_span, size=n_sequences)
])

#
# Solve for the x/y trajectories of each set of parameters
#
def solve_lotka_volterra(parameters, xy0, t_span, n_timesteps):
    def lotka_volterra(t, xy, alpha, beta, delta, gamma):
        x, y = xy
        dx_dt = alpha*x - beta*x*y
        dy_dt = delta*x*y - gamma*y
        return [dx_dt, dy_dt]
    #/def lotka_volterra

    t_eval = np.linspace(*t_span, num=n_timesteps)
    t_span = t_eval.min(), t_eval.max()
    solver_results = solve_ivp(
        lotka_volterra, t_span, xy0, t_eval=t_eval, args=tuple(parameters)
    )
    if not solver_results['success']:
        print('[!] solve_ivp failed |', solver_results['message'])
    return solver_results['y'].T  #returns (n_timesteps, 2)

#For each parameter set, solve for the trajectory shaped (n_timesteps, 2)
#Stack to (n_sequences, n_timesteps, 2)
sequences = np.stack(
    [solve_lotka_volterra(params_i, xy0_i, t_span, n_timesteps)
     for params_i, xy0_i in zip(parameters, initial_values)],
    axis=0
)

#
# Train-validation split
#
from sklearn.model_selection import train_test_split
splits = train_test_split(sequences, parameters, train_size=3/4)
sequences_trn, sequences_val, parameters_trn, parameters_val = splits
Visualise a randomly-selected sample:
#
#Visualise a random sample from the train-val set
#
ix = np.random.randint(n_sequences)
x, y = sequences[ix].T
params = parameters[ix]
t = np.linspace(*t_span, n_timesteps)

plt.suptitle(r'$\alpha, \beta, \delta, \gamma=$' + str(params.round(3)))
plt.subplot(121)
plt.plot(t, x, t, y)
plt.xlabel('t')
plt.ylabel('x, y')
plt.subplot(122)
plt.scatter(x, y, marker='.', c=t, s=8, edgecolors='none')
plt.xlabel('x')
plt.ylabel('y')
plt.gcf().set_size_inches(9, 4)
plt.tight_layout()
To PyTorch tensors, and wrap in a batch loader:
import torch
from torch import nn
from torch.utils.data import TensorDataset, DataLoader

n_features = sequences.shape[-1]   #x, y
output_size = parameters.shape[1]  #alpha, beta, delta, gamma

#
# Scale dataset
#
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler().fit(sequences_trn.reshape(-1, n_features))
X_trn = scaler.transform(sequences_trn.reshape(-1, n_features)).reshape(sequences_trn.shape)
X_val = scaler.transform(sequences_val.reshape(-1, n_features)).reshape(sequences_val.shape)

#
# To tensor dataset
#
trn_dataset = TensorDataset(torch.tensor(X_trn).float(), torch.tensor(parameters_trn).float())
val_dataset = TensorDataset(torch.tensor(X_val).float(), torch.tensor(parameters_val).float())

#Batchify
batch_size = 64
trn_loader = DataLoader(trn_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=batch_size)
Define and train a model, reporting progress:
torch.manual_seed(0)
np.random.seed(0)

class LambdaLayer(nn.Module):
    def __init__(self, func):
        super().__init__()
        self.func = func
    def forward(self, x):
        return self.func(x)

model = nn.Sequential(
    #B L C
    nn.LSTM(input_size=n_features, hidden_size=64, num_layers=1,
            batch_first=True, proj_size=output_size),
    #output: B L D*out, h_n: D*layers B out, c_n: D*layers B cell
    LambdaLayer(lambda output_hncn: output_hncn[0][:, -1]),
    #Params must be positive
    nn.Softplus(),
)
print(
    'Model size:',
    sum(p.numel() for p in model.parameters() if p.requires_grad),
    'trainable parameters'
)

optimizer = torch.optim.NAdam(model.parameters())
n_epochs = 100

#Balance losses wrt the largest target
target_weights = torch.tensor(
    parameters_trn.mean(axis=0).max() / parameters_trn.mean(axis=0)
).float().ravel()
# target_weights = torch.tensor([1, 1, 1, 1]) #no balancing

@torch.no_grad()
def calc_metrics(model, loader, target_weights=None):
    model.eval()
    if target_weights is None:
        target_weights = torch.ones(output_size)

    cum_loss = 0
    maes = []
    mapes = []
    for X_mb, target_mb in loader:
        preds = model(X_mb)
        cum_loss += preds.sub(target_mb).pow(2).mul(target_weights).sum()

        #Per-parameter MAE and MAPE over the minibatch
        mae_pertarget = preds.sub(target_mb).abs().mean(dim=0)
        mape_pertarget = preds.sub(target_mb).abs().div(target_mb).mul(100).mean(dim=0)
        maes.append(mae_pertarget)
        mapes.append(mape_pertarget)
    loss = cum_loss / len(loader.dataset)
    maes = np.vstack(maes)
    mapes = np.vstack(mapes)
    return loss, maes.mean(axis=0), mapes.mean(axis=0)

#
# Training loop
#
for epoch in range(n_epochs):
    model.train()
    for X_mb, target_mb in trn_loader:
        preds = model(X_mb)
        loss = preds.sub(target_mb).pow(2).mul(target_weights).mean()

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    #/end of epoch

    # if (epoch % 5) != 0:
    #     continue

    model.eval()
    with torch.no_grad():
        trn_loss, _, trn_mapes = calc_metrics(model, trn_loader, target_weights)
        val_loss, _, val_mapes = calc_metrics(model, val_loader, target_weights)
    print(
        f'epoch {epoch + 1:2d}/{n_epochs}',
        f'[TRN loss: {trn_loss:6.3f} | mape {trn_mapes.round()}]',
        f'[VAL loss: {val_loss:6.3f} | mape {val_mapes.round()}]',
    )
Visualise targets and predicted values:
#
# Visualise predicted vs target parameters, for each parameter
#
#Use an unshuffled loader for the train set so predictions line up with parameters_trn
eval_trn_loader = DataLoader(trn_dataset, batch_size=batch_size)

model.eval()
with torch.no_grad():
    trn_preds = torch.concatenate([model(X_mb) for X_mb, _ in eval_trn_loader], dim=0).numpy()
    val_preds = torch.concatenate([model(X_mb) for X_mb, _ in val_loader], dim=0).numpy()
    trn_loss, trn_maes, trn_mapes = calc_metrics(model, eval_trn_loader, target_weights)
    val_loss, val_maes, val_mapes = calc_metrics(model, val_loader, target_weights)

viz_val = True
if viz_val:
    use_preds, use_mapes, use_targets = val_preds, val_mapes, parameters_val
else:
    use_preds, use_mapes, use_targets = trn_preds, trn_mapes, parameters_trn

f, axs = plt.subplots(ncols=output_size, figsize=(12, 3.5), layout='tight')
for ax, pred, target, name, mape in zip(
    axs,
    use_preds.T,
    use_targets.T,
    r'$\alpha$ $\beta$ $\delta$ $\gamma$'.split(),
    use_mapes
):
    ax.scatter(target, pred, marker='.', alpha=0.3, edgecolors='none')
    ax.set_title(f'{name}\nMAPE={mape:.0f}%')
    ax.set(xlabel='target', ylabel='prediction')
    ax.axline([0, 0], slope=1, linestyle='--', color='black')
axs[0].plot([np.nan], [np.nan], linestyle='--', color='black', label='line of identity')
f.legend(loc='lower left', shadow=True, bbox_to_anchor=(0.05, -0.1))
f.suptitle(
    f'Predicted vs target parameters for {"validation" if viz_val else "train"} set',
    weight='bold'
)
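Finally, once trained, the model can estimate parameters for a new observed trajectory. Below is a self-contained sketch of that inference path; the scaler and LSTM here are untrained stand-ins so the snippet runs on its own, and in practice you would reuse the fitted scaler and trained model from above:

```python
import numpy as np
import torch
from torch import nn
from sklearn.preprocessing import StandardScaler

# Untrained stand-ins (replace with the fitted `scaler` and trained model)
scaler = StandardScaler().fit(np.random.rand(1000, 2) * 40)
lstm = nn.LSTM(input_size=2, hidden_size=64, proj_size=4, batch_first=True)

def estimate_parameters(sequence):
    """Estimate (alpha, beta, delta, gamma) from one (n_timesteps, 2) trajectory."""
    X = torch.tensor(scaler.transform(sequence)).float().unsqueeze(0)  # (1, T, 2)
    with torch.no_grad():
        output, _ = lstm(X)  # (1, T, 4) with proj_size=4
        # last timestep's projection, constrained positive as in training
        return nn.functional.softplus(output[:, -1]).squeeze(0).numpy()

new_sequence = np.random.rand(200, 2) * 40  # placeholder for real x/y observations
params_hat = estimate_parameters(new_sequence)  # 4 positive parameter estimates
```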