Background
I am building a CNN to categorize cytometric cell data into healthy and diseased groups. The architecture looks as follows: 3 convolutional layers, followed by average pooling, followed by 3 fully connected layers. I use a LeakyReLU activation function, as this yielded slightly better results than a regular ReLU. I am also using cross-entropy loss and an ASGD optimizer.
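For context, the loss and optimizer setup is essentially the standard PyTorch pattern sketched below; the learning rate is a placeholder rather than my tuned value, and CytometryCNN is the model class given at the bottom of this post:

import torch.nn as nn
import torch.optim as optim

# Minimal sketch of the objective/optimizer setup; lr is a placeholder value.
model = CytometryCNN(N=2, n_features=24)             # 2 classes: healthy vs. diseased
criterion = nn.CrossEntropyLoss()                    # cross-entropy loss
optimizer = optim.ASGD(model.parameters(), lr=1e-2)  # averaged SGD (ASGD)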
My dataset has 24 features and a total of 172,000 samples, which should be more than enough. I use a train-validation-test split: the validation split is used during training to track validation loss and accuracy, while the test set is only used after training is completed to avoid any kind of data leakage. Models like gradient boosting can, with hyperparameter tuning, reach accuracies of 95%+ on this dataset, and the literature says a CNN should perform best on this type of data.
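The split itself is nothing special; roughly the sketch below, where the 70/15/15 proportions, the stratification, and the names X and y (the full feature matrix and label vector) are illustrative rather than my exact settings:

from sklearn.model_selection import train_test_split

# Sketch of the train/validation/test split (proportions are illustrative).
# X has shape (172000, 24); y holds the healthy/diseased labels.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.15 / 0.85,
    stratify=y_trainval, random_state=42)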
Regularization used:
I introduced regularization techniques such as batch normalization and dropout layers to try to improve the performance of my model, since I am having a lot of trouble getting its accuracy above 90%. I added batch normalization after the activation function of each convolutional layer and dropout after the first two fully connected layers. There is no dropout after the last fully connected layer, since that is the output layer.
Result on performance:
With these regularization techniques, my performance on the test set is slightly better, but I only see an improvement from 89.6% to 89.9% accuracy, which is very marginal. The model does reach high accuracy faster, which is also good I guess, but that doesn't really help with getting a higher accuracy in this case. However, by using these techniques the train loss and accuracy get quite a bit lower than without regularization. I use 0.25 for the first dropout and 0.5 for the second. If I use a higher dropout like 0.9, the model performs much better on the validation set than on the train set (owing to the fact that dropout layers are disabled during evaluation). However, the performance on the test set is still about the same.
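To be explicit about why dropout is inactive at evaluation time: I switch between model.train() and model.eval(), which is what turns dropout off and makes batch normalization use its running statistics during evaluation. A rough sketch of that pattern (not my exact loop; train_loader and val_loader are assumed to exist):

import torch

# Dropout is active only in train mode; eval mode disables it and
# switches BatchNorm to its running statistics.
model.train()
for xb, yb in train_loader:
    optimizer.zero_grad()
    loss = criterion(model(xb), yb)
    loss.backward()
    optimizer.step()

model.eval()
with torch.no_grad():
    correct = sum((model(xb).argmax(dim=1) == yb).sum().item()
                  for xb, yb in val_loader)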
Why is this strange:
I find this strange since, as far as I understand, these techniques are meant to make the model generalize better, in which case I would expect the model to over-fit less.
Can anyone help me understand what this result means, how it relates to my model, and how I can possibly use it to actually improve the accuracy significantly? Other suggestions for improving accuracy are of course also welcome.
Below you can find the loss curves and the model architecture (implemented using PyTorch).
!! note: the "test" curves in these images are mislabeled and show the validation set, not the test set !!
Without regularization:
With regularization:
Model architecture
import torch.nn as nn


class CytometryCNN(nn.Module):
    def __init__(self, N, n_features):
        super(CytometryCNN, self).__init__()
        conv1_out = int(n_features)
        conv2_out = int(conv1_out)
        conv3_out = int(conv1_out)

        # Convolutional layers
        self.conv1 = nn.Conv1d(1, conv1_out, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv1d(conv1_out, conv2_out, kernel_size=3, stride=1, padding=1)
        self.conv3 = nn.Conv1d(conv2_out, conv3_out, kernel_size=3, stride=1, padding=1)

        # Batch normalization layers
        self.bn1 = nn.BatchNorm1d(conv1_out)
        self.bn2 = nn.BatchNorm1d(conv2_out)
        self.bn3 = nn.BatchNorm1d(conv3_out)

        self.relu = nn.LeakyReLU()
        self.pool = nn.AvgPool1d(kernel_size=2, stride=2)

        # Fully connected layers (pooling halves the feature dimension)
        fc1_in = conv3_out * int(n_features / 2)
        fc1_out = int(fc1_in / 3)
        self.fc1 = nn.Linear(fc1_in, fc1_out)
        fc2_out = int(fc1_out / 2)
        self.fc2 = nn.Linear(fc1_out, fc2_out)
        self.fc3 = nn.Linear(fc2_out, N)

        self.dropout1 = nn.Dropout(0.5)
        self.dropout2 = nn.Dropout(0.9)

    def forward(self, x):
        # Add a channel dimension: (batch_size, n_features) -> (batch_size, 1, n_features)
        x = x.unsqueeze(1)

        # First convolutional layer
        x = self.conv1(x)
        x = self.relu(x)
        x = self.bn1(x)

        # Second convolutional layer
        x = self.conv2(x)
        x = self.relu(x)
        x = self.bn2(x)

        # Third convolutional layer
        x = self.conv3(x)
        x = self.relu(x)
        x = self.bn3(x)

        x = self.pool(x)
        x = x.view(x.size(0), -1)

        # First fully connected layer
        x = self.fc1(x)
        x = self.dropout1(x)
        x = self.relu(x)

        # Second fully connected layer
        x = self.fc2(x)
        x = self.dropout2(x)
        x = self.relu(x)

        # Output layer
        x = self.fc3(x)
        return x
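For completeness, instantiating it for my data (24 features, 2 classes) looks like this; the batch size of 8 is just an example:

import torch

model = CytometryCNN(N=2, n_features=24)
dummy = torch.randn(8, 24)   # an example batch of 8 samples with 24 features
logits = model(dummy)        # shape (8, 2): raw scores for healthy vs. diseased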
