Questions tagged [overfitting]
Modeling error (especially sampling error) instead of replicable and informative relationships among variables improves model fit statistics but reduces parsimony and worsens explanatory and predictive validity.
1,002 questions
1 vote
0 answers
47 views
Potential CNN Overfitting Due to Limited Training Data
Neural network beginner here. I am currently implementing a CNN in PyTorch for recognizing handwritten Japanese letters, with 46 output classes. I found a dataset on Kaggle https://www.kaggle....
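A minimal PyTorch sketch of the usual first-line countermeasures (dropout, batch norm, weight decay); the input size and layer widths below are illustrative assumptions, not taken from the question:

```python
# Minimal sketch: regularizing a small CNN for 46-class recognition.
# The input shape (1x64x64) and layer widths are illustrative assumptions.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, n_classes=46):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(0.5),                     # dropout combats co-adaptation
            nn.Linear(64 * 16 * 16, n_classes),  # assumes 64x64 inputs
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SmallCNN()
# Weight decay is L2 regularization; both knobs are worth tuning.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
```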
0 votes
0 answers
51 views
Generalization Error PCA (with closed formula) versus Ridge
There is something I have an intuition about, but my numerical toy examples do not confirm it, and I really want to understand where my mistake is. I suppose that I have a random vector $X = (X_1, \cdots, ...
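A minimal numpy sketch of the comparison via the SVD $X = U\,\mathrm{diag}(s)\,V^\top$: principal components regression truncates singular directions, while ridge shrinks each direction by the factor $s_i^2/(s_i^2+\lambda)$. The dimensions and noise level below are arbitrary choices:

```python
# Toy comparison of PCR (hard truncation) vs ridge (soft shrinkage).
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)
y = X @ beta + rng.normal(scale=1.0, size=n)

# SVD of the design: X = U diag(s) Vt
U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 5          # PCR keeps the top-k singular directions...
lam = 10.0     # ...while ridge shrinks every direction by s^2/(s^2+lam)

beta_pcr = Vt[:k].T @ np.diag(1 / s[:k]) @ U[:, :k].T @ y
beta_ridge = Vt.T @ np.diag(s / (s**2 + lam)) @ U.T @ y

X_test = rng.normal(size=(5000, p))
for name, b in [("PCR", beta_pcr), ("ridge", beta_ridge)]:
    err = np.mean((X_test @ beta - X_test @ b) ** 2)
    print(f"{name}: excess risk of the fitted mean = {err:.3f}")
```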
3 votes
3 answers
294 views
How might softmax cause overfit in a neural model, even treated from a Bayesian perspective?
The title is perhaps purposely provocative, but still reflects my ignorance. I am trying to understand carefully why, despite a very nice Bayesian interpretation, softmax might overfit, since I've ...
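One mechanism often pointed to is that softmax cross-entropy never saturates: the training loss on an already correctly classified point can always be reduced further by inflating the logit margin. A minimal numpy illustration (not from the question):

```python
# Softmax cross-entropy rewards ever-larger logit margins, so a flexible
# model can keep increasing confidence on training points indefinitely.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for margin in [1.0, 5.0, 10.0]:
    z = np.array([margin, 0.0, 0.0])  # class 0 already wins
    p = softmax(z)
    loss = -np.log(p[0])
    print(f"margin={margin:4.1f}  p(correct)={p[0]:.4f}  loss={loss:.2e}")
# The loss never reaches zero, so gradient descent keeps inflating margins.
```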
1 vote
0 answers
28 views
Inference validity of an ordered logit model with only 50 observations
How accurate are the estimates of an ordered logit model with only 51 observations? Here is my Stata output from the model:
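For readers who want to reproduce this outside Stata, a minimal sketch using statsmodels' OrderedModel on simulated data of the same size; all variable names and coefficients are hypothetical placeholders:

```python
# A Python counterpart to the Stata ologit fit, for sanity-checking how
# wide the intervals get at n=51; the data-generating process is made up.
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(1)
n = 51
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
latent = 0.8 * df["x1"] - 0.5 * df["x2"] + rng.logistic(size=n)
df["y"] = pd.cut(latent, bins=[-np.inf, -1, 1, np.inf], labels=[0, 1, 2])

model = OrderedModel(df["y"].astype(int), df[["x1", "x2"]], distr="logit")
res = model.fit(method="bfgs", disp=False)
print(res.summary())   # with n=51, expect wide confidence intervals
```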
0 votes
1 answer
52 views
Why do overfitted models in finite mixture regression sometimes have the smallest BIC despite the true number of components being selected frequently?
I am learning about EM algorithms and finite mixture models, and I've run into a particularly unintuitive problem. I'm trying to fit a finite mixture regression model on simulated data, where the true ...
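For reference, the criterion being compared is
$$\mathrm{BIC} = k\ln n - 2\ln\hat{L},$$
where one common parameter count for a $K$-component mixture regression with $p$ slopes and an intercept per component is $k = K(p+2) + (K-1)$ (coefficients, one variance per component, and $K-1$ free mixing weights). It is also worth noting that the regularity conditions behind BIC fail for mixtures when a component collapses to the boundary of the parameter space, which is one documented reason overfitted mixtures can still win on BIC.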
1 vote
0 answers
60 views
Overfitting problem in classification CNN
So I have a school project which is to train a CNN with our own architecture to be able to classify marine mammals with a minimum accuracy of 0.82. I have been trying a lot of things and different ways ...
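For problems like this, data augmentation is often the cheapest remedy for a small CNN overfitting a limited image set; a short torchvision sketch with illustrative transform choices (not from the question):

```python
# Sketch: augment only the training pipeline; keep evaluation deterministic.
from torchvision import transforms

train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])
eval_tf = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
```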
2 votes
0 answers
80 views
Number of features selection using AUC
Can AUC be used for model selection, and how can an excessive number of features/parameters be penalized in this case? In the frequentist framework we have various model selection criteria, like AIC, BIC,...
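Absent an AIC-style penalty term for AUC, the usual workaround is to let held-out performance impose the penalty implicitly: cross-validated AUC stops improving once added features carry only noise. A scikit-learn sketch on a synthetic dataset whose informative features come first by construction:

```python
# Sketch: compare cross-validated AUC as features are added.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# shuffle=False keeps informative columns first, so slicing is meaningful.
X, y = make_classification(n_samples=300, n_features=30, n_informative=5,
                           n_redundant=2, shuffle=False, random_state=0)
for k in [2, 5, 10, 20, 30]:
    auc = cross_val_score(LogisticRegression(max_iter=1000),
                          X[:, :k], y, cv=5, scoring="roc_auc").mean()
    print(f"{k:2d} features: CV AUC = {auc:.3f}")
```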
1 vote
1 answer
78 views
Gridsearch results vs learning curve
I am using GridSearchCV to optimize some hyperparameters of an XGBoost model. However, although the log loss (the metric I am optimizing for) seems alright according to domain knowledge, the learning ...
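A sketch of the usual diagnostic sequence, assuming the xgboost scikit-learn wrapper: run the grid search, then feed the best estimator to learning_curve with the same metric so the two views are comparable. The grid values are illustrative:

```python
# Sketch: grid search first, then a learning curve of the winning model
# to see whether more data (not different hyperparameters) would help.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, learning_curve
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, random_state=0)
grid = GridSearchCV(
    XGBClassifier(eval_metric="logloss"),
    {"max_depth": [2, 4], "learning_rate": [0.05, 0.1]},
    scoring="neg_log_loss", cv=5,
)
grid.fit(X, y)
sizes, train_scores, val_scores = learning_curve(
    grid.best_estimator_, X, y, cv=5, scoring="neg_log_loss",
)
print(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1))
```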
1 vote
1 answer
118 views
How to reduce overfitting for a randomforest model even when cross validation is implemented?
I'm working on fitting a random forest model using the caret library in R with a repeated cross-validation design to select hyperparameters. I've also experimented with adjusting the number of trees (...
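A scikit-learn analog of the caret repeated-CV setup (the question itself uses R): rather than only adding trees, constrain per-tree complexity inside the repeated cross-validation loop. The parameter grid below is illustrative:

```python
# Sketch: more trees alone rarely cures random-forest overfitting; put
# leaf size and depth into the repeated-CV grid instead.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold

X, y = make_classification(n_samples=500, random_state=0)
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)
grid = GridSearchCV(
    RandomForestClassifier(n_estimators=500, random_state=0),
    {"min_samples_leaf": [1, 5, 20], "max_depth": [None, 5, 10]},
    cv=cv, scoring="accuracy",
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```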
1 vote
0 answers
54 views
Is there a one-to-one relationship between high bias and underfitting, and between high variance and overfitting?
Assume you have training data $(x_1,y_1), \ldots, (x_n,y_n)$ and a relationship $y_i=f(x_i)+\epsilon_i$, where $\epsilon$ is a random variable. Assume you approximate $f$ with $\hat{f}$ using the ...
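For reference, the decomposition behind this question: for squared error at a fixed $x$, with the expectation taken over training samples and noise, and $\operatorname{Var}(\epsilon) = \sigma^2$,
$$\mathbb{E}\big[(y-\hat{f}(x))^2\big] = \underbrace{\big(f(x)-\mathbb{E}[\hat{f}(x)]\big)^2}_{\text{bias}^2} + \underbrace{\operatorname{Var}\big(\hat{f}(x)\big)}_{\text{variance}} + \sigma^2,$$
so underfitting and overfitting are usually described by which term dominates, rather than by a strict one-to-one identity.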
2 votes
1 answer
202 views
How to identify problems with mgcv:gam(y ~ s(x) + s(x, fac, bs="sz"))? [closed]
This is sort of a follow-up to my last question, except purely based on curiosity. I found different versions of similar bs="sz" models in ...
1 vote
0 answers
57 views
The use of cross-validation and a hold-out set
I've been thinking about the use of cross-validation and hold-out sets, and I don't really see the use of a randomly selected hold-out test set. I have to say, though, that when the hold-out is not ...
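The textbook arrangement the question is probing, as a scikit-learn sketch: cross-validation on the training portion chooses the model, and a single untouched hold-out gives the final estimate. All sizes and grid values are illustrative:

```python
# Sketch: CV for selection, one hold-out for the final unbiased check.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=600, random_state=0)
X_tr, X_hold, y_tr, y_hold = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

grid = GridSearchCV(LogisticRegression(max_iter=1000),
                    {"C": [0.01, 0.1, 1, 10]}, cv=5)
grid.fit(X_tr, y_tr)
print("CV estimate:", grid.best_score_)
print("Hold-out estimate:", grid.score(X_hold, y_hold))
```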
4 votes
1 answer
111 views
Smooth AIC selection
Suppose I have a family of $N$ models for the same data, indexed by $n\in\{1,\dots,N\}$. And suppose that model $n\in\{1,\dots,N\}$ has log-likelihood given by: $$L(X_n \theta_n),$$ where $L:\mathbb{R}...
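One standard smooth alternative to hard AIC selection is Akaike weights, $w_n \propto \exp(-\Delta_n/2)$ with $\Delta_n = \mathrm{AIC}_n - \min_m \mathrm{AIC}_m$, which can be used to average predictions across the family. A minimal numpy sketch with made-up AIC values:

```python
# Akaike weights: a smooth alternative to picking the single lowest-AIC
# model; the AIC values below are invented for illustration.
import numpy as np

aic = np.array([102.3, 100.1, 104.7, 101.0])
delta = aic - aic.min()
w = np.exp(-0.5 * delta)
w /= w.sum()
print(w)   # weights for averaging predictions across the model family
```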
0 votes
0 answers
90 views
Reducing MLP overfitting for feature importance
I am training an MLP on a dataset where the number of features >> the number of samples. For certain reasons, an MLP with at least one hidden layer is the only architecture I am considering. ...
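A sketch of one common recipe under $p \gg n$, using scikit-learn for brevity: strong L2, early stopping, and permutation-based importances rather than reading weights directly. The sizes and settings are illustrative:

```python
# Sketch: regularize hard, stop early, and rank features by permutation.
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=80, n_features=500,
                           n_informative=5, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(32,), alpha=1.0,   # strong L2
                    early_stopping=True, max_iter=2000,
                    random_state=0).fit(X, y)
# Importances on training data here for brevity; held-out data is better.
imp = permutation_importance(clf, X, y, n_repeats=10, random_state=0)
print(imp.importances_mean.argsort()[::-1][:10])  # top-10 features
```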
1 vote
0 answers
50 views
Model Performance Varying Greatly
I have built an XGBoost model that performs rather weirdly across months... I trained the model on a heavily imbalanced dataset (1:40,000), which I undersampled to (1:500). The model performance (...
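Two commonly suggested alternatives to heavy undersampling, sketched with xgboost's scikit-learn wrapper on made-up data standing in for the monthly snapshots: reweight via scale_pos_weight, and validate on a chronologically later slice so month-to-month drift shows up:

```python
# Sketch: reweight instead of discarding negatives, and split by time.
import numpy as np
from sklearn.metrics import average_precision_score
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(20000, 10))
y = (rng.random(20000) < 0.002).astype(int)   # rare positive class

# 1) Reweight rather than undersample.
spw = (y == 0).sum() / max((y == 1).sum(), 1)
clf = XGBClassifier(scale_pos_weight=spw, eval_metric="aucpr")

# 2) Train on earlier rows, evaluate on later ones to expose drift.
cut = 15000
clf.fit(X[:cut], y[:cut])
print(average_precision_score(y[cut:], clf.predict_proba(X[cut:])[:, 1]))
```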