The "ubiquity" of machine learning model superiority over regular linear models is a myth. Often they can provide more predictive power, but this is context-dependent, they come at the costs of an extreme amount of required data to beat typical regression, and there is an extreme time sink related to things like data collection and wrangling. Despite recent popularity, AI has it's own issues. A recent pre-print shows that these models are going to be more and more susceptible to degeneration without sophisticated watermarking of data. So to summarize, ML and AI are excellent tools, but they're not silver bullets.
We can see this in a very simple case with the following data, where we fit an ordinary OLS regression in R and a smoothing spline to the same data. As you may already know, splines offer a lot of flexibility in fitting (specifically for nonlinear patterns) and are relatively painless to implement in simple cases. In theory, the spline should do a better job of fitting the data.
    #### Sim and Fit Data ####
    set.seed(123)
    x <- rnorm(100)
    y <- (2*x) + rnorm(100)
    fit1 <- lm(y ~ x)
    fit2 <- smooth.spline(x,y)

    #### Plot Fits ####
    par(mfrow=c(1,2))
    plot(x,y,main="OLS Regression")
    abline(fit1,col="red")
    plot(x,y,main="Spline Regression")
    lines(fit2,col="blue")
However, we can see quite clearly from the plots that there is essentially no difference between the fits: the data are basically linear, so the extra flexibility of the spline buys us nothing here.
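To put a number on that visual impression, here is a quick in-sample check (just a sketch, assuming the objects `fit1`, `fit2`, `x`, and `y` from the code above are still in the workspace; the `rmse_*` names are purely illustrative):

    #### Compare In-Sample Fit (sketch) ####
    # Root-mean-squared error of each fit on the training data
    rmse_lm     <- sqrt(mean(residuals(fit1)^2))
    rmse_spline <- sqrt(mean((y - predict(fit2, x)$y)^2))
    round(c(OLS = rmse_lm, Spline = rmse_spline), 3)

On data this close to linear, the two numbers should land essentially on top of each other, which is exactly what the plots suggest.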

I specifically highlight this case because you mentioned the following:
What's a bit strange to me is why is it so difficult to beat linear regression, I would think adding a small nonlinearity to a baseline linear model should at least help a little.
Preceded by this:
They all perform about the same as linear regression if not worse.
Overfitting with a nonlinear regression can lead to disastrously poor fits, so your findings are not surprising. Consider the badly overfit polynomial regression below, which tries to chase as many individual points in the plot as possible.
    #### Extreme Polynomial ####
    fit3 <- lm(y ~ poly(x,20))
    par(mfrow=c(1,1))
    plot(x,y,main="Polynomial Regression")
    newdata <- data.frame(x = seq(min(x),max(x),length.out=200))
    pred <- predict(fit3,newdata = newdata)
    lines(newdata$x,pred,col="red")
We now have a model that makes no sense and has extremely poor predictive power:

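To make "poor predictive power" concrete, one option is to simulate a fresh draw from the same data-generating process and compare out-of-sample error (again only a sketch; `x_new`, `y_new`, and the `rmse` helper are names I'm introducing for illustration):

    #### Out-of-Sample Check (sketch) ####
    set.seed(456)
    x_new <- rnorm(100)              # fresh draw from the same process
    y_new <- (2*x_new) + rnorm(100)
    rmse  <- function(obs, pred) sqrt(mean((obs - pred)^2))
    c(OLS    = rmse(y_new, predict(fit1, newdata = data.frame(x = x_new))),
      Poly20 = rmse(y_new, predict(fit3, newdata = data.frame(x = x_new))))

Because the degree-20 fit wiggles between the training points and extrapolates wildly outside their range, its error on new data will typically be far worse than that of the plain OLS fit.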
Scaling this up to several features, as in your data, doesn't change the underlying principle, and it is just as true for machine learning as it is for vanilla regression. The question then becomes, as pointed out in the comments, why one should even bother. If we can clearly fit the data well with simpler methods, is it worth expending the extra time and effort? One thing linear regression will almost always have over ML is that its fit is far more interpretable and actionable. Sacrificing that for a fancy CNN isn't always ideal if you can get away with the "old" methods.
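As a final illustration of that interpretability point (again just a sketch using the objects defined above), the linear fit hands you an effect size, a standard error, and a confidence interval you can act on directly:

    #### Interpretable Output (sketch) ####
    summary(fit1)$coefficients   # slope estimate (should be near the true value of 2), SE, t value, p value
    confint(fit1)                # 95% confidence intervals for the intercept and slope

Getting an equally clean "a one-unit change in x is associated with roughly this change in y" statement out of a black-box model takes considerably more work.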