
[Image: slide plotting model performance against amount of data, with deep learning performance continuing to rise as data grows while older algorithms plateau]

This is one of the slides from Andrew Ng's deep learning course. Actually, I took it from Jason Brownlee's website, which seems to second the idea presented in the picture.

However, my limited experience shows that after some point the line starts to head down. I use Keras with EarlyStopping to prevent overfitting. The additional data that I introduce is temperature from extra past hours. Even though the temperature is highly correlated with the predicted parameter (Pearson's R ~ 0.9), I still get a decrease in performance (increased MSE).
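The lagged features are built roughly like this (a minimal sketch, not my actual code; the DataFrame and the temperature / target column names are placeholders):

    import pandas as pd

    # hypothetical hourly data; column names are placeholders
    df = pd.read_csv("data.csv")

    # add temperature from extra past hours as new input columns
    for lag in range(1, 6):
        df["temp_lag_%d" % lag] = df["temperature"].shift(lag)
    df = df.dropna()  # shift() leaves NaNs in the first rows

    # Pearson's R of each lagged temperature column against the target
    print(df.filter(like="temp_lag").corrwith(df["target"]))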

What could cause that?

What's more: I use a two-layer NN and increase its number of neurons (input and hidden) by one for each extra parameter added.

My code:

    import numpy as np
    from sklearn.model_selection import KFold
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.callbacks import EarlyStopping

    # fix random seed for reproducibility
    seed = 7
    np.random.seed(seed)

    kf = KFold(n_splits=10, random_state=seed, shuffle=True)
    kf.get_n_splits(x_cv)
    print(kf)

    cvscores = []
    for train, test in kf.split(x_cv):
        # create model
        model = Sequential()
        model.add(Dense(55, activation="relu", kernel_initializer="normal", input_dim=55))
        # when activation=tanh, rescale inputs to [-1, 1]
        model.add(Dense(55, activation="relu", kernel_initializer="normal"))
        model.add(Dense(1, kernel_initializer="normal"))
        # compile model
        model.compile(loss='mean_squared_error', optimizer='adam')
        # early stopping, monitoring the training loss
        es = EarlyStopping(monitor='loss', min_delta=0.0, patience=3, verbose=0, mode='min')

        # fit the model and evaluate on the held-out fold
        model.fit(x_cv[train], y_cv[train], callbacks=[es], batch_size=100, epochs=1000, verbose=0)
        scores = model.evaluate(x_cv[test], y_cv[test], verbose=0)
        print('mean_squared_error', scores)
        cvscores.append(scores)
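Note that the EarlyStopping above monitors the training loss. A variant that stops on a held-out loss instead would look like this (a sketch; the 10% split is an arbitrary illustrative choice):

    # variant: stop on held-out loss rather than training loss,
    # so early stopping actually guards against overfitting
    es = EarlyStopping(monitor='val_loss', min_delta=0.0, patience=3, verbose=0, mode='min')
    model.fit(x_cv[train], y_cv[train], callbacks=[es],
              validation_split=0.1, batch_size=100, epochs=1000, verbose=0)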


1 Answer


The notion of "more data -> better performance" normally refers to the number of samples, not the size of each sample. That is, deep learning can extract more information from a larger number of observations than other methods can. In your case you are adding more information per sample (extra features) rather than more samples.

Things to check:

  • Scale of the temperature - improperly scaled inputs can completely destroy the stability of training (see the sketch after this list).
  • Outliers - if the model relies heavily on the temperature to predict the outcome, outliers in that relationship can produce wildly wrong predictions, and since MSE is sensitive to outliers you get worse performance.
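For the first point, a minimal scaling sketch (assuming the x_cv / train / test names from your code; StandardScaler is one reasonable choice):

    from sklearn.preprocessing import StandardScaler

    # fit the scaler on the training fold only, then apply it to both folds,
    # so the test fold does not leak into the scaling statistics
    scaler = StandardScaler()
    x_train_scaled = scaler.fit_transform(x_cv[train])
    x_test_scaled = scaler.transform(x_cv[test])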
  • I know it is a pretty late comment, but can you please explain your second point, i.e. that MSE is sensitive to outliers? How is that? Thank you. Commented Jan 29, 2020 at 18:16
  • Outliers generally produce larger error values than normal values, and MSE squares this error, increasing it even further. This shifts the aggregate MSE to represent outlier errors disproportionately. For example, if there are 3 points with an average error of 2 units (whatever those are) and 1 outlier with an error of 5 units, the MSE is $(3 \cdot 2^2 + 5^2)/4 = 9.25$, while without the outlier it would be just $4$. You can see that the outlier greatly shifted the MSE due to the quadratic effect. Commented Jan 30, 2020 at 20:35
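A quick numeric check of the example above (a NumPy sketch):

    import numpy as np

    errors = np.array([2.0, 2.0, 2.0, 5.0])  # three typical errors plus one outlier
    print(np.mean(errors[:3] ** 2))  # 4.0 without the outlier
    print(np.mean(errors ** 2))      # 9.25 with it: (3*2**2 + 5**2)/4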
