
I am learning machine learning by myself. I am applying logistic regression to the Weather Forecast dataset from Kaggle (Weather_data). The goal is to predict Rain from the given features; the dataset is moderately imbalanced, with 87.4% of samples in the no-rain class and 12.6% in the rain class. I plotted the training and test accuracy to look for the best value of the hyperparameter C.

First, I tried standard (unweighted) logistic regression, and then, out of curiosity, logistic regression with class_weight="balanced". Here are my plots for both cases.

[Plots: training and test accuracy vs C, for the standard logistic regression and the weighted logistic regression]

Regarding the plot for the weighted regression, where the training and test curves coincide, my questions are:

What can be inferred from this plot? How bad is it that the curves coincide? Do I have underfitting?

Any comments will be of great value. Thanks.


3 Answers


Actually, accuracy is not a good metric when you have class imbalance: a very naive classifier that always predicts the majority class would already get a seemingly good score of about 0.874 here. If the costs of false negatives and false positives are similar, it is better to use something like the F1 score.
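For instance, here is a minimal sketch on synthetic data with roughly your 87/13 imbalance (not the actual Kaggle file) showing how the two metrics can diverge:

    # Sketch on synthetic stand-in data, not the real Weather_data file
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score, f1_score
    from sklearn.model_selection import train_test_split

    # ~87.4% of samples in class 0 ("no rain"), ~12.6% in class 1 ("rain")
    X, y = make_classification(n_samples=2000, weights=[0.874], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_tr, y_tr)
    pred = clf.predict(X_te)

    # Accuracy can look fine while F1 on the minority class tells another story
    print("accuracy:", accuracy_score(y_te, pred))
    print("F1 (rain):", f1_score(y_te, pred))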

I don't see overfitting but rather underfitting (even though it is true that the gap increases slightly with higher $C$). The score seems to have reached a plateau at a level that looks no better than the naive classifier's. Hopefully F1 would be more informative, but in reality logistic regression doesn't seem to work very well on your data (maybe try tuning other hyperparameters to improve the score). You would probably get better results with RandomForest or XGBoost.
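A rough way to make that comparison is cross-validated F1; the sketch below uses the same kind of synthetic stand-in data (swap in your own X and y). Since xgboost's XGBClassifier follows the scikit-learn API, it could be dropped into the same loop:

    # Rough model comparison via cross-validated F1 on synthetic stand-in data
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=2000, weights=[0.874], random_state=0)

    models = {
        "logistic regression": LogisticRegression(class_weight="balanced", max_iter=1000),
        "random forest": RandomForestClassifier(class_weight="balanced", random_state=0),
    }
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5, scoring="f1")
        print(f"{name}: F1 = {scores.mean():.3f} (+/- {scores.std():.3f})")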

You could also plot the learning curve (the score as a function of the quantity of training data) to see whether your model is overfitting.
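scikit-learn's learning_curve produces exactly that plot; here is a sketch, with make_classification again standing in for your data:

    # Sketch of a learning curve (score vs training-set size)
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import learning_curve

    X, y = make_classification(n_samples=2000, weights=[0.874], random_state=0)

    sizes, train_scores, val_scores = learning_curve(
        LogisticRegression(class_weight="balanced", max_iter=1000),
        X, y, cv=5, scoring="f1",
        train_sizes=np.linspace(0.1, 1.0, 5),
    )

    # A persistent gap between the curves suggests overfitting; two low
    # curves that track each other suggest underfitting.
    plt.plot(sizes, train_scores.mean(axis=1), marker="o", label="train")
    plt.plot(sizes, val_scores.mean(axis=1), marker="o", label="validation")
    plt.xlabel("training-set size")
    plt.ylabel("F1")
    plt.legend()
    plt.show()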

---

$C$ is a regularisation parameter (in scikit-learn it is the inverse of the regularisation strength): smaller values limit the model's capacity, and larger values allow the model to follow and fit the data more freely.

The plot you included traces out $C$ over five orders of magnitude, starting from a relatively constrained model at $C=0.001$, up to a model that can deal with more complex patterns.

The learning curve of the lower plot exhibits three regimes over that span.

Underfitting: From $C=0.001$ to $C=1$, the model's train and validation accuracies are relatively low, increase with $C$, and follow each other closely. These are typical traits of an underfitting model: it is not scoring highly, it makes good use of any extra capacity you give it, and it does no better on the training data than on unseen validation data (as if it were hitting a ceiling in performance).

Balanced: At $C\approx2$, the validation accuracy reaches its highest value. Before this point, the model was still picking up useful patterns that translated well to unseen data. After this point, it starts memorising the training data, and its generalisation performance degrades.

Overfitting: As $C$ is increased further, the additional capacity is being used to over-adapt to the training data in a way that isn't useful for non-training data (hence the validation score drops). It scores well on the training data and could eventually reach 100%, but we care more about general performance (validation score) rather than merely the particular samples of the training set.
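If you want to reproduce such a sweep yourself, scikit-learn's validation_curve does exactly this; below is a sketch on synthetic stand-in data (the make_classification call only mimics your class imbalance, it is not your dataset):

    # Sketch of the C sweep with validation_curve
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import validation_curve

    X, y = make_classification(n_samples=2000, weights=[0.874], random_state=0)

    Cs = np.logspace(-3, 2, 11)  # 0.001 ... 100, the span discussed above
    train_scores, val_scores = validation_curve(
        LogisticRegression(class_weight="balanced", max_iter=1000),
        X, y, param_name="C", param_range=Cs, cv=5,
    )

    plt.semilogx(Cs, train_scores.mean(axis=1), label="train")
    plt.semilogx(Cs, val_scores.mean(axis=1), label="validation")
    plt.xlabel("C")
    plt.ylabel("accuracy")  # the default scorer for a classifier
    plt.legend()
    plt.show()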

Do I have underfitting?

The plot represents many models rather than just a single one: each value of $C$ results in a different model (a model with a different capacity).

Depending on which $C$ you choose (which point along that curve), you end up with a different model along the underfitting-overfitting continuum. If you use a small $C$ you'll have an underfitting model, whereas a value of $C$ that is too large will give you an overfitting model.


Relevant posts: bias and variance, how they relate to bagging and pasting.

---

Something you didn't mention is whether you are making predictions on the same records that you fit the regression on. I suspect that if you partition the data into separate training and test sets, a different picture will emerge for the two approaches.
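To illustrate, a sketch with synthetic data standing in for yours: fit on a stratified training partition and score only on the held-out part.

    # Fit on one stratified partition, evaluate on the other
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, weights=[0.874], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0
    )

    model = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_tr, y_tr)
    print(model.score(X_te, y_te))  # scored on records the model never saw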

