
I am trying to understand what the accuracy "acc" shown in the Keras progress bar at the end of an epoch represents:

13/13 [==============================] - 0s 76us/step - loss: 0.7100 - acc: 0.4615

At the end of an epoch it should be the accuracy of the model's predictions on all training samples. However, when the model is evaluated on those same training samples, the actual accuracy can be very different.

Below is an adapted example of the MLP for binary classification from the Keras website. A simple sequential neural net does binary classification of randomly generated numbers. The batch size equals the number of training examples (13), so every epoch contains only one step. Since the loss is set to binary_crossentropy, the binary_accuracy function defined in metrics.py is used for the accuracy calculation. The MyEval class defines a callback which is called at the end of each epoch. It calculates the accuracy on the training data in two ways: a) model.evaluate, and b) model.predict to get predictions, followed by almost the same code as Keras's binary_accuracy function uses. These two accuracies are consistent with each other, but most of the time they differ from the one in the progress bar. Why are they different? Is it possible to calculate the same accuracy as shown in the progress bar? Or have I made a mistake in my assumptions?

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras import callbacks

np.random.seed(1)  # fix random seed for reproducibility

# Generate dummy data
x_train = np.random.random((13, 20))
y_train = np.random.randint(2, size=(13, 1))

model = Sequential()
model.add(Dense(64, input_dim=20, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

class MyEval(callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        my_accuracy_1 = self.model.evaluate(x_train, y_train, verbose=0)[1]
        y_pred = self.model.predict(x_train)
        my_accuracy_2 = np.mean(np.equal(y_train, np.round(y_pred)))
        print("my accuracy 1: {}".format(my_accuracy_1))
        print("my accuracy 2: {}".format(my_accuracy_2))

my_eval = MyEval()

model.fit(x_train, y_train,
          epochs=5,
          batch_size=13,
          callbacks=[my_eval],
          shuffle=False)

The output of the above code:

Epoch 1/5
13/13 [==============================] - 0s 25ms/step - loss: 0.7303 - acc: 0.5385
my accuracy 1: 0.5384615659713745
my accuracy 2: 0.5384615384615384
Epoch 2/5
13/13 [==============================] - 0s 95us/step - loss: 0.7412 - acc: 0.4615
my accuracy 1: 0.9230769276618958
my accuracy 2: 0.9230769230769231
Epoch 3/5
13/13 [==============================] - 0s 77us/step - loss: 0.7324 - acc: 0.3846
my accuracy 1: 0.9230769276618958
my accuracy 2: 0.9230769230769231
Epoch 4/5
13/13 [==============================] - 0s 72us/step - loss: 0.6543 - acc: 0.5385
my accuracy 1: 0.9230769276618958
my accuracy 2: 0.9230769230769231
Epoch 5/5
13/13 [==============================] - 0s 76us/step - loss: 0.6459 - acc: 0.6923
my accuracy 1: 0.8461538553237915
my accuracy 2: 0.8461538461538461
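
For reference, binary_accuracy in Keras's metrics.py is, as far as I can tell, defined as follows (paraphrased; check your installed version for the exact source):

```python
import keras.backend as K

def binary_accuracy(y_true, y_pred):
    # Round the sigmoid outputs to 0/1 and compare with the labels
    return K.mean(K.equal(y_true, K.round(y_pred)), axis=-1)
```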

Using: Python 3.5.2, tensorflow-gpu==1.14.0, Keras==2.2.4, numpy==1.15.2

1 Answer


I think it has to do with the usage of Dropout. Dropout is only enabled during training, not during evaluation or prediction. Hence the discrepancy between the accuracies during training and during evaluation/prediction.
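
To make the Dropout effect concrete, here is a minimal sketch (my own illustration, not part of the original code) that runs the forward pass of the model from the question in both phases via Keras's learning-phase flag; in the training phase Dropout is active and the outputs become stochastic:

```python
import numpy as np
import keras.backend as K

# Build a function that runs the forward pass with an explicit learning
# phase: 0 = test/inference (Dropout disabled), 1 = training (Dropout enabled).
forward = K.function([model.input, K.learning_phase()], [model.output])

test_out  = forward([x_train, 0])[0]  # deterministic, Dropout off
train_out = forward([x_train, 1])[0]  # stochastic, Dropout on

# fit() computes its metrics in the training phase, while evaluate() and
# predict() run in the test phase, so they see different effective networks.
print(np.allclose(test_out, train_out))  # typically False
```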

Moreover, the training accuracy displayed in the bar is averaged over the epoch: it is the average of the batch accuracies calculated after each batch. Keep in mind that the model parameters are updated after each batch, so the accuracy shown in the bar at the end does not exactly match the accuracy of an evaluation run after the epoch is finished (the training accuracy is calculated with different model parameters per batch, while the evaluation accuracy is calculated with the same parameters for all batches).
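
As a toy illustration (hypothetical numbers, assuming equal batch sizes so the weighted running average reduces to a simple mean):

```python
# Per-batch accuracies during one epoch, each computed with the weights
# the model happened to have at that point in the epoch:
batch_accs = [0.40, 0.55, 0.70]

# What the progress bar reports at the end of the epoch:
bar_acc = sum(batch_accs) / len(batch_accs)   # 0.55

# evaluate() instead runs the *final* weights over the whole set, so it
# can report e.g. 0.70 for the very same epoch and data.
```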

This is your example, with more data (and therefore more than one batch per epoch), and without Dropout:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras import callbacks

np.random.seed(1)  # fix random seed for reproducibility

# Generate dummy data
x_train = np.random.random((200, 20))
y_train = np.random.randint(2, size=(200, 1))

model = Sequential()
model.add(Dense(64, input_dim=20, activation='relu'))
# model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
# model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

class MyEval(callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        my_accuracy_1 = self.model.evaluate(x_train, y_train, verbose=0)[1]
        y_pred = self.model.predict(x_train)
        my_accuracy_2 = np.mean(np.equal(y_train, np.round(y_pred)))
        print("my accuracy 1 after epoch {}: {}".format(epoch + 1, my_accuracy_1))
        print("my accuracy 2 after epoch {}: {}".format(epoch + 1, my_accuracy_2))

my_eval = MyEval()

model.fit(x_train, y_train,
          epochs=5,
          batch_size=13,
          callbacks=[my_eval],
          shuffle=False)

The output reads:

Train on 200 samples
Epoch 1/5
my accuracy 1 after epoch 1: 0.5450000166893005
my accuracy 2 after epoch 1: 0.545
200/200 [==============================] - 0s 2ms/sample - loss: 0.6978 - accuracy: 0.5350
Epoch 2/5
my accuracy 1 after epoch 2: 0.5600000023841858
my accuracy 2 after epoch 2: 0.56
200/200 [==============================] - 0s 383us/sample - loss: 0.6892 - accuracy: 0.5550
Epoch 3/5
my accuracy 1 after epoch 3: 0.5799999833106995
my accuracy 2 after epoch 3: 0.58
200/200 [==============================] - 0s 496us/sample - loss: 0.6844 - accuracy: 0.5800
Epoch 4/5
my accuracy 1 after epoch 4: 0.6000000238418579
my accuracy 2 after epoch 4: 0.6
200/200 [==============================] - 0s 364us/sample - loss: 0.6801 - accuracy: 0.6150
Epoch 5/5
my accuracy 1 after epoch 5: 0.6050000190734863
my accuracy 2 after epoch 5: 0.605
200/200 [==============================] - 0s 393us/sample - loss: 0.6756 - accuracy: 0.6200

The accuracy evaluated after each epoch now pretty much matches the averaged training accuracy shown at the end of the epoch.
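
If you want fit() itself to report an after-epoch accuracy, one option (a sketch reusing the variables from the example above) is to pass the training set as validation data; validation metrics are computed once per epoch with the final weights, just like model.evaluate():

```python
model.fit(x_train, y_train,
          epochs=5,
          batch_size=13,
          validation_data=(x_train, y_train),
          callbacks=[my_eval],
          shuffle=False)
```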


3 Comments

You are right that the Dropout was not supposed to be there, thank you. Now, after setting batch_size to 200 (or putting the number of samples back to 13), so that one epoch equals one batch, we get the same accuracy as in the progress bar, except that it shows up in the following line/epoch, which is weird and maybe has to do with printing.

```
Epoch 2/15
200/200 [==============================] - 0s 8us/step - loss: 0.6856 - acc: 0.5350
my accuracy 1 after epoch 2: 0.635
my accuracy 2 after epoch 2: 0.635
Epoch 3/15
200/200 [==============================] - 0s 5us/step - loss: 0.6789 - acc: 0.6350
```
Exactly, that's due to the printing. I added the epoch in the callback's printout to avoid the confusion.
Yes, the epoch number printing is helpful, but according to what I see in the output (and the small snippet in my comment above, once formatted), it should be epoch + 2, not epoch + 1 (remember to set the batch size equal to the number of samples). Thank you! I accept it.
