When training a neural network with Keras using the categorical_crossentropy loss, how exactly is the loss defined? I expect it to be the average over all samples of $$\textstyle\text{loss}(p^\text{true}, p^\text{predict}) = -\sum_i p_i^\text{true} \log p_i^\text{predict},$$ but I couldn't find a definitive answer in the docs or in the code. An authoritative reference would be appreciated.
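For concreteness, here is a minimal NumPy sketch of the definition I expect; the function name np_categorical_crossentropy and the batch layout are my own assumptions, not anything taken from Keras:

import numpy as np

def np_categorical_crossentropy(y_true, y_pred):
    # y_true, y_pred: shape (samples, classes); rows of y_pred sum to 1.
    # Hypothetical reference implementation, not Keras code:
    # per-sample cross-entropy, then the mean over the batch.
    per_sample = -np.sum(y_true * np.log(y_pred), axis=-1)
    return np.mean(per_sample)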
Looking at the code, I am not sure whether the computation is delegated to the TensorFlow/Theano backend.
(There is an analogous question concerning the accuracy; that code is clearer, but I do not see a call to mean() anywhere.)
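Similarly, here is a NumPy sketch of what I expect the categorical accuracy to be; again, np_categorical_accuracy is my own name for it, not a Keras function:

import numpy as np

def np_categorical_accuracy(y_true, y_pred):
    # Fraction of samples whose predicted class (argmax over the
    # class axis) matches the true class.
    matches = np.argmax(y_true, axis=-1) == np.argmax(y_pred, axis=-1)
    return np.mean(matches)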
PS. Running the code below suggests that loss and accuracy are computed as in loss_and_acc(...), but on the model state from before that epoch's weight update (Keras version 2.0.4; the TensorFlow and Theano backends give the same results).
#!/usr/bin/python3
import numpy as np
from numpy.random import randint, seed
from keras import __version__ as keras_version
from keras.models import Sequential
from keras.layers import Dense

N = 4   # Classes
S = 10  # Samples

# Single softmax layer: 1 input feature, N output classes.
nn = Sequential()
nn.add(Dense(input_dim=1, units=N, kernel_initializer='normal', activation='softmax'))
nn.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Random inputs and random one-hot targets.
seed(7)
X = np.random.random((S, 1))
Y = np.vstack([np.eye(1, N, k=randint(0, N)) for _ in range(S)])
#for (x, y) in zip(X, Y): print(x, y)

def loss_and_acc(NN, X, Y):
    # Per-sample cross-entropy and argmax accuracy, averaged over the samples.
    # Terms with b == 0 are skipped to avoid log(0).
    loss = []
    acc = []
    for (p, q) in zip(Y, NN.predict(X)):
        loss += [-sum(a * np.log(b) for (a, b) in zip(p, q) if (b != 0))]
        acc += [np.argmax(p) == np.argmax(q)]
    return (np.mean(loss), np.mean(acc))

print("Keras version: ", keras_version)

for _ in range(10):
    # Evaluate by hand, then train for one epoch and read Keras' history.
    print("Before: loss = {}, acc = {}".format(*loss_and_acc(nn, X, Y)))
    H = nn.fit(X, Y, epochs=1, verbose=0).history
    print("History: loss = {}, acc = {}".format(H['loss'][-1], H['acc'][-1]))

Output:
Using Theano backend.
Keras version:  2.0.4
Before: loss = 1.3843669414520263, acc = 0.2
History: loss = 1.3843669891357422, acc = 0.20000000298023224
Before: loss = 1.3834303855895995, acc = 0.2
History: loss = 1.3834303617477417, acc = 0.20000000298023224
Before: loss = 1.3824962615966796, acc = 0.3
History: loss = 1.3824962377548218, acc = 0.30000001192092896
Before: loss = 1.381564486026764, acc = 0.3
History: loss = 1.3815644979476929, acc = 0.30000001192092896
Before: loss = 1.380635154247284, acc = 0.3
History: loss = 1.380635142326355, acc = 0.30000001192092896
Before: loss = 1.3797082901000977, acc = 0.3
History: loss = 1.3797082901000977, acc = 0.30000001192092896
Before: loss = 1.378783941268921, acc = 0.2
History: loss = 1.378783941268921, acc = 0.20000000298023224
Before: loss = 1.3778621554374695, acc = 0.2
History: loss = 1.3778622150421143, acc = 0.20000000298023224
Before: loss = 1.3769428968429565, acc = 0.2
History: loss = 1.3769428730010986, acc = 0.20000000298023224
Before: loss = 1.3760262489318849, acc = 0.3
History: loss = 1.3760262727737427, acc = 0.30000001192092896