The figures below show validation and training curves for metrics (top row; lower is better) and losses (bottom row). The last column shows the metrics/losses from the first two columns aggregated.
The validation curves in the metric plots are flat from the first training epoch onward, despite a sufficiently large dataset and some regularization. This is my first point of confusion.
Another confusion is the loss in the second plot, which increases for validation, while the validation metric for the same part of the prediction stays flat, as shown in the figure directly above.
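To check that this pattern is even possible, here is a quick sanity check with toy numbers (not from my model): cross-entropy can rise while a thresholded metric like accuracy stays constant, if the wrongly predicted samples become more confidently wrong.

```python
import numpy as np

def cross_entropy(p_correct):
    """Mean negative log-likelihood assigned to the true class."""
    return float(np.mean(-np.log(p_correct)))

def accuracy(p_correct, threshold=0.5):
    """Counts a prediction as correct if p(true class) > threshold."""
    return float(np.mean(p_correct > threshold))

# Epoch A: 70% of samples correct with moderate confidence.
p_a = np.array([0.7] * 7 + [0.4] * 3)
# Epoch B: the same 70% are still correct, but the wrong 30%
# are now confidently wrong (very low p on the true class).
p_b = np.array([0.75] * 7 + [0.05] * 3)

# Accuracy is identical for both epochs, yet the loss roughly doubles.
print(accuracy(p_a), cross_entropy(p_a))
print(accuracy(p_b), cross_entropy(p_b))
```

So a rising validation loss with a flat validation metric does not contradict itself; whether this explanation fits my curves is part of what I am asking.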
Qualitative results suggest that the model is underfitting, converging to a local minimum despite sufficient capacity (43M parameters in total).
Could you help me interpret these learning curves? It doesn't look like plain overfitting.
