Background
I am building a CNN to categorize cytometric cell data into healthy and diseased groups. The architecture looks as follows: 3 convolutional layers, followed by average pooling, followed by 3 fully connected layers. I use a LeakyReLU activation function, as this yielded slightly better results than a regular ReLU. I am also using cross-entropy loss and an ASGD optimizer.
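For context, the loss and optimizer setup is essentially the standard PyTorch pattern sketched below; the learning rate is a placeholder rather than my tuned value, and CytometryCNN is the model class given at the bottom of this post:

import torch.nn as nn
import torch.optim as optim

# Minimal sketch of the objective/optimizer setup; lr is a placeholder value.
model = CytometryCNN(N=2, n_features=24)             # 2 classes: healthy vs. diseased
criterion = nn.CrossEntropyLoss()                    # cross-entropy loss
optimizer = optim.ASGD(model.parameters(), lr=1e-2)  # averaged SGD (ASGD)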
My dataset has 24 features and a total of 172,000 samples, which should be more than enough. I use a train-validation-test split: the validation split is used during training to track validation loss and accuracy, while the test set is only used after training is completed to avoid any kind of data leakage. Models like gradient boosting can, with hyperparameter tuning, reach accuracies of 95%+ on this dataset, and the literature says a CNN should perform best on this type of data.
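The split itself is nothing special; roughly the sketch below, where the 70/15/15 proportions, the stratification, and the names X and y (the full feature matrix and label vector) are illustrative rather than my exact settings:

from sklearn.model_selection import train_test_split

# Sketch of the train/validation/test split (proportions are illustrative).
# X has shape (172000, 24); y holds the healthy/diseased labels.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.15 / 0.85,
    stratify=y_trainval, random_state=42)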
Regularization used:
I introduced regularization techniques such as batch normalization and dropout layers to try to improve the performance of my model, since I am having a lot of trouble getting its accuracy above 90%. I added batch normalization after the activation function of each convolutional layer and dropout after the first two fully connected layers. There is no dropout after the last fully connected layer, since that is the output layer.
Result on performance:
With these regularization techniques, my performance on the test set is slightly better, but I only see an improvement from 89.6% to 89.9% accuracy, which is very marginal. The model does reach high accuracy faster, which is also good I guess, but that doesn't really help with getting a higher accuracy in this case. However, by using these techniques the train loss and accuracy get quite a bit lower than without regularization. I use 0.25 for the first dropout and 0.5 for the second. If I use a higher dropout like 0.9, the model performs much better on the validation set than on the train set (owing to the fact that dropout layers are disabled during evaluation). However, the performance on the test set is still about the same.
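To be explicit about why dropout is inactive at evaluation time: I switch between model.train() and model.eval(), which is what turns dropout off and makes batch normalization use its running statistics during evaluation. A rough sketch of that pattern (not my exact loop; train_loader and val_loader are assumed to exist):

import torch

# Dropout is active only in train mode; eval mode disables it and
# switches BatchNorm to its running statistics.
model.train()
for xb, yb in train_loader:
    optimizer.zero_grad()
    loss = criterion(model(xb), yb)
    loss.backward()
    optimizer.step()

model.eval()
with torch.no_grad():
    correct = sum((model(xb).argmax(dim=1) == yb).sum().item()
                  for xb, yb in val_loader)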
Why is this strange:
I find this strange since, as far as I understand, these techniques are meant to make the model generalize better, in which case I would expect the model to over-fit less.
Can anyone help me understand what this result means, how it relates to my model, and how I can possibly use it to actually improve the accuracy significantly? Other suggestions for improving accuracy are of course also welcome.
Below you can find the loss curves and the model architecture (implemented using PyTorch).
!! note: the "test" curves in these images are mislabeled and show the validation set, not the test set !!
Without regularization:
With regularization:
Model architecture
import torch.nn as nn


class CytometryCNN(nn.Module):
    def __init__(self, N, n_features):
        super(CytometryCNN, self).__init__()
        conv1_out = int(n_features)
        conv2_out = int(conv1_out)
        conv3_out = int(conv1_out)

        # Convolutional layers
        self.conv1 = nn.Conv1d(1, conv1_out, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv1d(conv1_out, conv2_out, kernel_size=3, stride=1, padding=1)
        self.conv3 = nn.Conv1d(conv2_out, conv3_out, kernel_size=3, stride=1, padding=1)

        # Batch normalization layers
        self.bn1 = nn.BatchNorm1d(conv1_out)
        self.bn2 = nn.BatchNorm1d(conv2_out)
        self.bn3 = nn.BatchNorm1d(conv3_out)

        self.relu = nn.LeakyReLU()
        self.pool = nn.AvgPool1d(kernel_size=2, stride=2)

        # Fully connected layers (pooling halves the feature dimension)
        fc1_in = conv3_out * int(n_features / 2)
        fc1_out = int(fc1_in / 3)
        self.fc1 = nn.Linear(fc1_in, fc1_out)
        fc2_out = int(fc1_out / 2)
        self.fc2 = nn.Linear(fc1_out, fc2_out)
        self.fc3 = nn.Linear(fc2_out, N)

        self.dropout1 = nn.Dropout(0.5)
        self.dropout2 = nn.Dropout(0.9)

    def forward(self, x):
        # Add a channel dimension: (batch_size, n_features) -> (batch_size, 1, n_features)
        x = x.unsqueeze(1)

        # First convolutional layer
        x = self.conv1(x)
        x = self.relu(x)
        x = self.bn1(x)

        # Second convolutional layer
        x = self.conv2(x)
        x = self.relu(x)
        x = self.bn2(x)

        # Third convolutional layer
        x = self.conv3(x)
        x = self.relu(x)
        x = self.bn3(x)

        x = self.pool(x)
        x = x.view(x.size(0), -1)

        # First fully connected layer
        x = self.fc1(x)
        x = self.dropout1(x)
        x = self.relu(x)

        # Second fully connected layer
        x = self.fc2(x)
        x = self.dropout2(x)
        x = self.relu(x)

        # Output layer
        x = self.fc3(x)
        return x
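For completeness, instantiating it for my data (24 features, 2 classes) looks like this; the batch size of 8 is just an example:

import torch

model = CytometryCNN(N=2, n_features=24)
dummy = torch.randn(8, 24)   # an example batch of 8 samples with 24 features
logits = model(dummy)        # shape (8, 2): raw scores for healthy vs. diseased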
