At its core, the basic workflow for training a NN/DNN model is more or less always the same:

  1. define the NN architecture (how many layers, which kind of layers, the connections among layers, the activation functions, etc.)

  2. read data from some source (the Internet, a database, a set of local files, etc.), have a look at a few samples (to make sure the import has gone well) and perform data cleaning if/when needed. This step is not as trivial as people usually assume it to be. The reason is that for DNNs, we usually deal with gigantic data sets, several orders of magnitude larger than what we're used to, when we fit more standard nonlinear parametric statistical models (NNs belong to this family, in theory).

  3. normalize or standardize the data in some way. Since NNs are nonlinear models, normalizing the data can affect not only the numerical stability, but also the training time and the NN outputs (a linear function such as normalization doesn't commute with a nonlinear hierarchical function).

  4. split the data into training/validation/test sets, or into multiple folds if using cross-validation.

  5. train the neural network, while monitoring the loss on the validation set. Here you can enjoy the soul-wrenching pleasures of non-convex optimization, where you don't know whether any solution exists, whether multiple solutions exist, which of the solutions is best in terms of generalization error, and how close you got to it. The comparison between the training and validation loss curves guides you, of course, but don't underestimate the die-hard attitude of NNs (and especially DNNs): they often show a (maybe slowly) decreasing training/validation loss even when you have crippling bugs in your code.

  6. Check the accuracy on the test set, and make some diagnostic plots/tables.

  7. Go back to point 1 because the results aren't good. Iterate ad nauseam.

Of course details will change based on the specific use case, but with this rough canvas in mind, we can think of what is more likely to go wrong.
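
To make this canvas concrete, here is a minimal sketch of steps 1-6 in Keras, run on synthetic data; the layer sizes, dataset and hyperparameters are placeholders for illustration, not recommendations.

```python
# Minimal sketch of the workflow above (steps 1-6). All numbers are placeholders.
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split

# 1. define the architecture
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# 2. "read" the data (synthetic here, in place of your real source) and eyeball a few samples
X = np.random.rand(1000, 20).astype("float32")
y = (X[:, 0] + X[:, 1] > 1.0).astype("int32")
print(X[:3], y[:3])

# 3. standardize (naively on the whole set here, mirroring the order of the list)
X = (X - X.mean(axis=0)) / X.std(axis=0)

# 4. split into training/validation/test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)

# 5. train while monitoring the validation loss
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train, validation_data=(X_val, y_val),
          epochs=20, batch_size=32, verbose=0)

# 6. check the accuracy on the test set
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(f"test accuracy: {test_acc:.3f}")
```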

  • try changing the split between training set and test set, and see if the problem appears again. There may be a bug in the code you wrote to prepare the training and test sets: maybe you used the test labels for the training labels (a re-split check is sketched after this list)

  • verify that the data preprocessing steps have been performed correctly. Did you standardize or normalize your data? Since neural networks are (in theory) nonlinear parametric models, changing the normalization may actually affect the results, not just the numerical stability.

  • you already checked that your problem is actually super-easy: both linear SVM and logistic regression could solve it, so that's an excellent reason to suspect that no overfitting to the test set is occurring. It's just an easy problem. Lucky you :-)

  • I would have a look at the decision regions for a few pairs or triplets of variables, maybe the most influential ones according to the linear SVM or logistic regression. Maybe you'll find out that your problem is (close to) linearly separable, so basically most classifiers will do a great job here.

  • if you are still worried, shuffle the class labels and retrain. Now the only way for your neural network to get high training set accuracy is to memorize the training set, which will show up as a much longer training time. At the same time, the test set accuracy will go down dramatically. If this doesn't happen, there's something seriously wrong somewhere in your code.

  • perform the opposite test: reinitialize the weights and train on just two or three data points. This time, training accuracy will immediately go to 100%, but test set accuracy will stay extremely low, no matter how long you train. If this doesn't happen, again you have a serious bug somewhere.
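
A minimal sketch of the first two checks above (re-splitting with different seeds and verifying the standardization). `X`, `y` and `build_model()` are assumed to exist already; `build_model()` is a hypothetical helper returning a freshly initialized Keras classifier compiled with an accuracy metric.

```python
# Re-split with several seeds and check that the gap between training and test
# accuracy stays stable. Assumes X, y and build_model() (hypothetical) are defined.
import numpy as np
from sklearn.model_selection import train_test_split

for seed in range(5):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y)

    # standardize with training-set statistics only (no test-set leakage)
    mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
    X_train, X_test = (X_train - mu) / sigma, (X_test - mu) / sigma
    assert np.allclose(X_train.mean(axis=0), 0.0, atol=1e-4)

    model = build_model()
    model.fit(X_train, y_train, epochs=10, batch_size=32, verbose=0)
    _, train_acc = model.evaluate(X_train, y_train, verbose=0)
    _, test_acc = model.evaluate(X_test, y_test, verbose=0)
    print(f"seed {seed}: train {train_acc:.3f}, test {test_acc:.3f}")
```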

Basic Architecture checks

This can be a source of issues. Usually I make these preliminary checks:

  • look for a simple architecture which works well on your problem (for example, MobileNetV2 in the case of image classification) and apply a suitable initialization (at this level, random will usually do). If this trains correctly on your data, at least you know that there are no glaring issues in the data set. If you can't find a simple, tested architecture which works in your case, think of a simple baseline: for example, a Naive Bayes classifier for classification (or even just always predicting the most common class), or an ARIMA model for time series forecasting.

  • Build unit tests. Neglecting to do this (together with the use of the bloody Jupyter Notebook) is usually the root cause of the issues in NN code I'm asked to review, especially when the model is supposed to be deployed in production. As the most upvoted answer has already covered unit tests, I'll just add that there exists a library which supports unit test development for NNs (only in Tensorflow, unfortunately). A couple of example tests are sketched below.
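
For instance, two pytest-style smoke tests one might write: one checks the model's output shape, the other checks that the training loss actually decreases when the model is fit on a handful of samples. The `build_model` factory and all the sizes are made up for the example.

```python
# Two pytest-style smoke tests for a small Keras classifier (illustrative names/sizes).
import numpy as np
import tensorflow as tf

def build_model(input_dim=32, n_classes=3):
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    return model

def test_output_shape():
    model = build_model()
    probs = model.predict(np.zeros((4, 32), dtype="float32"), verbose=0)
    assert probs.shape == (4, 3)
    assert np.allclose(probs.sum(axis=1), 1.0, atol=1e-4)  # softmax rows sum to 1

def test_loss_decreases_on_tiny_batch():
    np.random.seed(0)
    x = np.random.rand(16, 32).astype("float32")
    y = np.random.randint(0, 3, size=16)
    model = build_model()
    loss_before = model.evaluate(x, y, verbose=0)
    model.fit(x, y, epochs=30, verbose=0)   # 30 optimizer steps on one tiny batch
    loss_after = model.evaluate(x, y, verbose=0)
    assert loss_after < loss_before
```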

Training Set

Double-check your input data. See if you inverted the training set and test set labels, for example (happened to me once -___-), or if you imported the wrong file. Have a look at a few input samples, and the associated labels, and make sure they make sense. Check that the normalized data are really normalized (have a look at their range). Also, real-world datasets are dirty: for classification, there could be a high level of label noise (samples having the wrong class label), or for multivariate time series forecasting, some of the time series components may have a lot of missing data (I've seen numbers as high as 94% for some of the inputs).
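
A few one-liners along these lines, assuming the raw inputs and labels sit in a pandas DataFrame `df` with a "label" column (the file and column names are illustrative):

```python
import pandas as pd

df = pd.read_csv("my_dataset.csv")        # illustrative file name
print(df.head())                          # do a few samples and their labels make sense?
print(df.describe())                      # are the "normalized" columns really in the expected range?
print(df["label"].value_counts())         # class balance, obviously wrong labels
print(df.isna().mean().sort_values(ascending=False))  # fraction of missing values per column
```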

The order in which the training set is fed to the net during training may have an effect. Try a random shuffle of the training set (without breaking the association between inputs and outputs) and see if the training loss goes down.
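
For example, with NumPy arrays a shuffle that keeps inputs and labels paired is just a shared permutation of the indices:

```python
import numpy as np

perm = np.random.permutation(len(X_train))       # one permutation for both arrays
X_train, y_train = X_train[perm], y_train[perm]  # inputs and labels stay aligned
```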

Finally, the best way to check if you have training set issues is to use another training set. If you're doing image classification, instead of the images you collected, use a standard dataset such as CIFAR-10 or CIFAR-100 (or ImageNet, if you can afford to train on that). These data sets are well-tested: if your training loss goes down here but not on your original data set, you may have issues in the data set.
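
For example, CIFAR-10 ships with Keras and can be swapped in with a couple of lines:

```python
import tensorflow as tf

# Load a well-tested data set and scale pixels to [0, 1].
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
print(x_train.shape, y_train.shape)  # (50000, 32, 32, 3) (50000, 1)
```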

Do the Golden Tests

There are two tests which I call Golden Tests, which are very useful for finding issues in an NN which doesn't train; both are sketched in code after the list:

  • reduce the training set to 1 or 2 samples, and train on this. The NN should immediately overfit the training set, reaching an accuracy of 100% on the training set very quickly, while the accuracy on the validation/test set will stay extremely low. If this doesn't happen, there's a bug in your code.

  • the opposite test: you keep the full training set, but you shuffle the labels. The only way the NN can learn now is by memorising the training set, which means that the training loss will decrease very slowly, while the test loss will increase very quickly. In particular, you should reach the random chance loss on the test set. This means that if you have 1000 classes, you should reach an accuracy of 0.1%. If you don't see any difference between the training loss before and after shuffling labels, this means that your code is buggy (remember that we have already checked the labels of the training set in the step before).
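
A sketch of both tests in Keras; `build_model()` is again a hypothetical helper returning a freshly initialized classifier compiled with an accuracy metric, and `X_train`, `y_train`, `X_test`, `y_test` are assumed to exist.

```python
import numpy as np

# Golden Test 1: overfit one or two samples -> training accuracy should hit 100% quickly.
tiny_X, tiny_y = X_train[:2], y_train[:2]
model = build_model()
model.fit(tiny_X, tiny_y, epochs=200, verbose=0)
_, tiny_acc = model.evaluate(tiny_X, tiny_y, verbose=0)
print(f"accuracy on the 2-sample training set: {tiny_acc:.2f}")   # expect 1.00

# Golden Test 2: shuffle the labels -> the net can only memorize, so the training
# loss should fall much more slowly and test accuracy should drop to chance level.
shuffled_y = np.random.permutation(y_train)
model = build_model()
model.fit(X_train, shuffled_y, epochs=50, verbose=0)
_, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(f"test accuracy after label shuffling: {test_acc:.2f}")     # expect ~1/n_classes
```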

Check that your training metric makes sense

Accuracy (0-1 loss) is a crappy metric if you have strong class imbalance. Try something more meaningful, such as the cross-entropy loss: you don't just want to classify correctly, you also want to classify with high confidence.
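
A tiny illustration of the difference: two predictions that are both "correct" under accuracy can have very different cross-entropy losses.

```python
import numpy as np

y_true = 0                                  # true class index
confident = np.array([0.95, 0.03, 0.02])    # predicted class probabilities
hesitant  = np.array([0.40, 0.35, 0.25])

for name, p in [("confident", confident), ("hesitant", hesitant)]:
    correct = int(np.argmax(p) == y_true)
    ce = -np.log(p[y_true])
    print(f"{name}: accuracy {correct}, cross-entropy {ce:.3f}")
# Both predictions count as correct (accuracy 1), but the losses are ~0.05 vs ~0.92.
```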

Bring out the big guns

If nothing helped, it's now the time to start fiddling with hyperparameters. This is easily the worst part of NN training, but these are gigantic, non-identifiable models whose parameters are fit by solving a non-convex optimization problem, so these iterations often can't be avoided.

  • try different optimizers: SGD trains more slowly, but it leads to a lower generalization error, while Adam trains faster, but the test loss stalls at a higher value
  • try decreasing the batch size
  • increase the learning rate initially, and then decay it, or use a cyclic learning rate (a minimal schedule is sketched after this list)
  • add layers
  • add hidden units
  • remove regularization gradually (maybe switch off batch norm for a few layers). The training loss should now decrease, but the test loss may increase.
  • visualize the distribution of weights and biases for each layer. I never had to get here, but if you're using BatchNorm, you would expect approximately standard normal distributions. See if the norm of the weights is increasing abnormally with epochs.
  • if you're getting some error at training time, google that error. I wasted one morning while trying to fix a perfectly working architecture, only to find out that the version of Keras I had installed had buggy multi-GPU support and I had to update it. Sometimes I had to do the opposite (downgrade a package version).
  • update your CV and start looking for a different job :-)
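
As a concrete example of the learning-rate point above (warm up, then decay), here is a minimal schedule with a Keras callback; the numbers are placeholders, not recommendations.

```python
import tensorflow as tf

def schedule(epoch, lr):
    """Linear warm-up for a few epochs, then exponential decay."""
    warmup_epochs, base_lr = 5, 1e-3
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    return base_lr * 0.9 ** (epoch - warmup_epochs)

lr_callback = tf.keras.callbacks.LearningRateScheduler(schedule, verbose=1)
# model.fit(X_train, y_train, epochs=30, callbacks=[lr_callback], ...)
```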