The main problem here is that even before attempting to apply anomaly detection algorithms, you are not getting good enough predictions of gas consumption using neural networks.
If the main goal is to reach the stage where anomaly detection algorithms can be applied, and you have access to examples of linear regression being applied successfully to this problem, then linear regression could be the more productive approach. A general principle of applied machine learning is to try several different algorithms and make the final selection based on their results.
If you choose to tune your neural network's performance instead, learning curves that plot the effect of changes in different hyperparameters on the error rate can be used. Hyperparameters that can be modified include:
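As a rough illustration of trying several algorithms before committing to one, the sketch below fits three simple models and compares them on held-out data. Everything in it is an assumption made for the example: the data is synthetic (a single hypothetical temperature feature with a linear effect on consumption), and the candidates (a mean baseline, one-feature least squares, and k-nearest neighbours) merely stand in for whatever models you are actually comparing.

```python
import random

def fit_mean(xs, ys):
    """Baseline: always predict the average training target."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_linear(xs, ys):
    """Ordinary least squares for a single feature: y ~ a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x

def fit_knn(xs, ys, k=5):
    """k-nearest-neighbour regression: average the k closest training targets."""
    pairs = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Synthetic data, purely illustrative: consumption falls with temperature.
rng = random.Random(0)
temps = [rng.uniform(-10, 25) for _ in range(200)]
gas = [50.0 - 1.5 * t + rng.gauss(0, 3) for t in temps]

# Hold out the last 50 points for validation.
train_x, val_x = temps[:150], temps[150:]
train_y, val_y = gas[:150], gas[150:]

candidates = {
    "mean": fit_mean(train_x, train_y),
    "linear": fit_linear(train_x, train_y),
    "knn": fit_knn(train_x, train_y, k=5),
}
scores = {name: mse(model, val_x, val_y) for name, model in candidates.items()}
best = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: validation MSE = {score:.2f}")
print("selected:", best)
```

The point is the workflow rather than these particular models: every candidate is trained on the same training split and scored on the same held-out split, so the numbers are directly comparable.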
- number of features
- order of the polynomial
- regularization parameter
- number of layers in the network
The best settings can be selected by their performance on the cross-validation set.
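The selection step can be sketched as a small grid search. The example below is only an illustration under stated assumptions: it uses synthetic data, a regularised polynomial regression instead of a neural network (so that it runs with the standard library alone), and hypothetical grids for the polynomial order and the regularization parameter. The same loop applies to any of the hyperparameters listed above: train on the training set for each setting, then keep the setting with the lowest error on the cross-validation set.

```python
import math
import random

def poly_features(x, degree):
    """Return [1, x, x^2, ..., x^degree]."""
    return [x ** p for p in range(degree + 1)]

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - tail) / M[i][i]
    return w

def fit_ridge(xs, ys, degree, lam):
    """Regularised least squares; the bias term is not penalised."""
    X = [poly_features(x, degree) for x in xs]
    d = degree + 1
    A = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    for i in range(1, d):
        A[i][i] += lam
    b = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(d)]
    return solve(A, b)

def mse(w, xs, ys):
    """Mean squared error of the polynomial with weights w."""
    degree = len(w) - 1
    total = 0.0
    for x, y in zip(xs, ys):
        pred = sum(wi * f for wi, f in zip(w, poly_features(x, degree)))
        total += (pred - y) ** 2
    return total / len(xs)

# Synthetic data, purely illustrative.
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(120)]
ys = [math.sin(3 * x) + rng.gauss(0, 0.1) for x in xs]
train_x, cv_x = xs[:80], xs[80:]
train_y, cv_y = ys[:80], ys[80:]

degrees = [1, 2, 3, 5, 7]          # hypothetical grid: order of the polynomial
lambdas = [0.0, 0.01, 0.1, 1.0]    # hypothetical grid: regularization parameter

# Fit on the training set, score on the cross-validation set.
results = {}
for degree in degrees:
    for lam in lambdas:
        w = fit_ridge(train_x, train_y, degree, lam)
        results[(degree, lam)] = mse(w, cv_x, cv_y)

best_degree, best_lam = min(results, key=results.get)
print(f"best degree={best_degree}, lambda={best_lam}, "
      f"CV error={results[(best_degree, best_lam)]:.4f}")
```

Plotting the values in `results` against one hyperparameter at a time gives the kind of curve described above. If you need an unbiased estimate of final performance, keep a separate test set that is not used for this selection.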