
When training a neural network with Keras using the categorical_crossentropy loss, how exactly is the loss defined? I expect it to be the average over all samples of $$\textstyle\text{loss}(p^\text{true}, p^\text{predict}) = -\sum_i p_i^\text{true} \log p_i^\text{predict}$$ but I couldn't find a definitive answer in either the docs or the code. An authoritative reference would be appreciated.
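To make the formula concrete, this is the computation I have in mind, as a plain NumPy sketch (the averaging over samples is my assumption, not something I found documented):

    import numpy as np

    def expected_categorical_crossentropy(Y_true, Y_pred):
        # per-sample loss: -sum_i p_i^true * log(p_i^predict)
        per_sample = [-np.sum(p * np.log(q)) for (p, q) in zip(Y_true, Y_pred)]
        # averaged over all samples (this averaging is the part I am unsure about)
        return np.mean(per_sample)

    Y_true = np.array([[0., 1., 0.], [1., 0., 0.]])        # one-hot targets
    Y_pred = np.array([[0.2, 0.7, 0.1], [0.6, 0.3, 0.1]])  # softmax outputs
    print(expected_categorical_crossentropy(Y_true, Y_pred))  # -(log(0.7) + log(0.6))/2 ≈ 0.434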

Looking at the code, I'm not sure whether the computation is delegated to TensorFlow/Theano.

(There is an analogous question concerning the accuracy: that code is clearer, but I don't see a call to mean() anywhere.)

PS. From the code below, it appears that loss and accuracy are computed as in loss_and_acc(...), but evaluated before the last training epoch (Keras version 2.0.4; same results for the TensorFlow and Theano backends).

    #!/usr/bin/python3
    import numpy as np
    from numpy.random import randint, seed
    from keras import __version__ as keras_version
    from keras.models import Sequential
    from keras.layers import Dense

    N = 4   # Classes
    S = 10  # Samples

    nn = Sequential()
    nn.add(Dense(input_dim=1, units=N, kernel_initializer='normal', activation='softmax'))
    nn.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

    seed(7)
    X = np.random.random((S, 1))
    Y = np.vstack([np.eye(1, N, k=randint(0, N)) for _ in range(S)])
    #for (x, y) in zip(X, Y): print(x, y)

    def loss_and_acc(NN, X, Y):
        loss = []
        acc = []
        for (p, q) in zip(Y, NN.predict(X)):
            # per-sample cross-entropy; skip zero predictions to avoid log(0)
            loss += [-sum(a * np.log(b) for (a, b) in zip(p, q) if (b != 0))]
            # per-sample accuracy: does the predicted class match the true class?
            acc += [np.argmax(p) == np.argmax(q)]
        return (np.mean(loss), np.mean(acc))

    print("Keras version: ", keras_version)
    for _ in range(10):
        print("Before: loss = {}, acc = {}".format(*loss_and_acc(nn, X, Y)))
        H = nn.fit(X, Y, epochs=1, verbose=0).history
        print("History: loss = {}, acc = {}".format(H['loss'][-1], H['acc'][-1]))

Output:

    Using Theano backend.
    Keras version:  2.0.4
    Before: loss = 1.3843669414520263, acc = 0.2
    History: loss = 1.3843669891357422, acc = 0.20000000298023224
    Before: loss = 1.3834303855895995, acc = 0.2
    History: loss = 1.3834303617477417, acc = 0.20000000298023224
    Before: loss = 1.3824962615966796, acc = 0.3
    History: loss = 1.3824962377548218, acc = 0.30000001192092896
    Before: loss = 1.381564486026764, acc = 0.3
    History: loss = 1.3815644979476929, acc = 0.30000001192092896
    Before: loss = 1.380635154247284, acc = 0.3
    History: loss = 1.380635142326355, acc = 0.30000001192092896
    Before: loss = 1.3797082901000977, acc = 0.3
    History: loss = 1.3797082901000977, acc = 0.30000001192092896
    Before: loss = 1.378783941268921, acc = 0.2
    History: loss = 1.378783941268921, acc = 0.20000000298023224
    Before: loss = 1.3778621554374695, acc = 0.2
    History: loss = 1.3778622150421143, acc = 0.20000000298023224
    Before: loss = 1.3769428968429565, acc = 0.2
    History: loss = 1.3769428730010986, acc = 0.20000000298023224
    Before: loss = 1.3760262489318849, acc = 0.3
    History: loss = 1.3760262727737427, acc = 0.30000001192092896

1 Answer


I am using Keras with the TensorFlow backend. I checked, and the categorical_crossentropy loss in Keras is defined exactly as you wrote it. This is the relevant part of the code (not the whole function definition):

    def categorical_crossentropy(target, output, from_logits=False, axis=-1):
        if not from_logits:
            # scale preds so that the class probas of each sample sum to 1
            output /= tf.reduce_sum(output, axis, True)
            # manual computation of crossentropy
            _epsilon = _to_tensor(epsilon(), output.dtype.base_dtype)
            output = tf.clip_by_value(output, _epsilon, 1. - _epsilon)
            return - tf.reduce_sum(target * tf.log(output), axis)

As you can see in the last line, it returns the negative of the sum of the products of the true values and the logs of the output values, one value per observation; no mean over the batch is taken here. You can find the complete function definition here, at line 3176.
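To see what those lines amount to numerically, here is a rough NumPy re-implementation of the same steps (normalize, clip, negative sum per observation); the epsilon value 1e-7 is only an illustrative assumption, Keras takes it from its own epsilon() setting:

    import numpy as np

    def categorical_crossentropy_np(target, output, epsilon=1e-7):
        # scale preds so that the class probas of each sample sum to 1
        output = output / np.sum(output, axis=-1, keepdims=True)
        # clip to avoid log(0), mirroring tf.clip_by_value above
        output = np.clip(output, epsilon, 1. - epsilon)
        # one loss value per observation; no mean over the batch is taken here
        return -np.sum(target * np.log(output), axis=-1)

    target = np.array([[0., 1.], [1., 0.]])
    output = np.array([[0.3, 0.7], [0.8, 0.2]])
    print(categorical_crossentropy_np(target, output))  # [0.35667494 0.22314355]

The single number that ends up in the training history is then the mean of these per-observation values, taken outside of this function.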

For the Theano backend it should be the same. You can check here, at line 1622.

