I am currently playing around with building a simple neural network to identify handwritten digits using the MNIST database. When I run the first test I get a random assortment of probabilities in my output arrays, which is expected. However, when I run it more than once, my output arrays drop to nearly all zeros in every index position. It seems that my calculation of the weighting adjustments being back-propagated is causing the issue, but I can't work out why.

def sigmoid(x):
    a = 1 / (1 + np.exp(-x))
    return a

def sigmoid_derivative(x):
    return x * (1 - x)

def train(n, inputs):
    input_layer = inputs
    weights1 = 2 * np.random.random((784, 16)) - 1
    weights2 = 2 * np.random.random((16, 10)) - 1
    for i in range(n):
        trained_hidden = sigmoid(np.dot(input_layer, weights1))
        trained_outputs = sigmoid(np.dot(trained_hidden, weights2))
        o_error = (outputs - trained_outputs)
        o_adjustments = o_error * sigmoid_derivative(trained_outputs)
        h_error = np.dot(o_adjustments, weights2.T)
        h_adjustments = h_error * sigmoid_derivative(trained_hidden)
        w1 = np.dot(input_layer.T, h_adjustments)
        w2 = np.dot(trained_hidden.T, o_adjustments)
        weights1 += w1
        weights2 += w2
    return trained_outputs

I am using NumPy arrays. The input is a (10000 x 784) array of greyscale values in the 0-1 range, and the output is a (10000 x 10) array with a 1 at the index position of the actual digit.

x_train, t_train, x_test, t_test = mnist.load()
inputs = x_test / 256
outputs = np.zeros((10000, 10), dtype=int)
for i in range(10000):
    x = t_test[int(i)]
    outputs[i][x] = 1
set = train(10, inputs)

I have used a number of resources to build this: the theory comes from the 3Blue1Brown neural network series, and the code closely follows the example provided here (https://enlight.nyc/projects/neural-network/).

Edit: As per @9000's suggestion, here is a printout of each step in one example. Looking at the results, it looks like w1 (the weighting adjustment calculation) is the issue, but after going over it again and again I cannot figure out why it is incorrect. Any help is appreciated.

Edit 2: I have included a second printout of the same example on the second training run.

First Run

trained_hidden  [0.87880514 0.4789476 0.38500953 0.00142838 0.01373613 0.37572408 0.53673194 0.11774215 0.99989426 0.0547656 0.20645864 0.85484692 0.99903171 0.88929566 0.00673453 0.03816501]
trained_output  [0.33244312 0.26289407 0.79917376 0.95143406 0.90780616 0.2100068 0.66253735 0.57961972 0.28231436 0.15963378]
o_error         [ 0.66755688 -0.26289407 -0.79917376 -0.95143406 -0.90780616 -0.2100068 -0.66253735 -0.57961972 -0.28231436 -0.15963378]
o-adjustment    [ 0.14814735 -0.05094382 -0.12826344 -0.04396319 -0.07597805 -0.03484096 -0.14813117 -0.14123055 -0.05720055 -0.02141501]
h_error         [-0.00359599 0.18884347 0.15954247 -0.14839811 0.2081496 -0.01152563 0.03262859 -0.46315722 -0.06974061 -0.46774417 -0.00690463 -0.44303219 -0.16267084 -0.02505235 -0.12866526 0.22212537]
h_adjustment    [-3.82997246e-04 4.71271721e-02 3.77760172e-02 -2.11665993e-04 2.81989626e-03 -2.70339996e-03 8.11312465e-03 -4.81122794e-02 -7.37327102e-06 -2.42134002e-02 -1.13120886e-03 -5.49730579e-02 -1.57359939e-04 -2.46637616e-03 -8.60664795e-04 8.15387570e-03]
w1              [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
w2              [-111.70644608 -164.50671691 -254.60942018 -205.06537232 -330.43317768 -94.6976 -346.78221607 -272.22044431 -249.54889015 -75.99543441]
weights1        [-0.09535479 -0.09824519 -0.11582134 -0.65075843 -0.65593035 0.77593957 -0.0406199 0.12669151 0.79979191 -0.52502487 -0.2433578 0.16617536 -0.25711996 0.92995152 -0.40922601 -0.63029133]
weights2        [-112.24597022 -164.86741004 -254.21715269 -205.27326963 -331.18579697 -95.07615178 -347.04311247 -271.82206581 -250.04075852 -76.69273265]

Second Run

trained_hidden  [0.00000000e+000 1.00000000e+000 1.00000000e+000 3.77659154e-181 1.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000 1.00000000e+000 0.00000000e+000 1.00000000e+000 0.00000000e+000 2.71000625e-055 0.00000000e+000 0.00000000e+000 1.00000000e+000]
trained_output  [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
o_error         [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
o-adjustment    [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
h_error         [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
h_adjustment    [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
w1              [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
w2              [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
weights1        [-0.09535479 -0.09824519 -0.11582134 -0.65075843 -0.65593035 0.77593957 -0.0406199 0.12669151 0.79979191 -0.52502487 -0.2433578 0.16617536 -0.25711996 0.92995152 -0.40922601 -0.63029133]
weights2        [-112.24597022 -164.86741004 -254.21715269 -205.27326963 -331.18579697 -95.07615178 -347.04311247 -271.82206581 -250.04075852 -76.69273265]
  • I would start with printing a histogram of the result of each step, to have an idea which operation ends up with some unexpected result. Commented Jun 24, 2019 at 16:21
  • You might encounter numerical problems due to the sigmoid function. That might not be your problem, but you can check a nice and stable implementation of the sigmoid here (one common formulation is also sketched below these comments). Commented Jun 24, 2019 at 16:52
  • @AlCorreia Thanks for the suggestion, I tried the stable implementation you linked to and I am getting the same results unfortunately Commented Jun 24, 2019 at 17:21
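In case the implementation linked in the comment above is not reachable, a common numerically stable way to write the sigmoid looks roughly like this (a sketch only; the function name is illustrative and this is not necessarily the implementation the comment refers to):

import numpy as np

def stable_sigmoid(x):
    # Avoid overflow in np.exp for large |x|: use 1 / (1 + exp(-x)) where
    # x >= 0 and exp(x) / (1 + exp(x)) where x < 0, so exp() only ever sees
    # non-positive arguments.
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    exp_x = np.exp(x[~pos])
    out[~pos] = exp_x / (1.0 + exp_x)
    return out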

1 Answer

First of all, are you sure your printouts of weights1 and weights2 are correct? They are the same between both runs despite very different outputs, which seems very suspicious to me.

I briefly checked your derivatives and, from the little I looked into, they seem correct. However, I see two mistakes. First, when updating the weights you actually want to subtract the derivative from them: the gradient always points uphill, and since you want to minimize the loss you have to move in the downhill direction. Second, you are using the full derivative as the update. In neural networks a learning rate (e.g. 0.001) is almost always used as a multiplier on the updating derivative; if you don't scale down your gradient before the update, it can overshoot really hard and e.g. set all your weights to very large values, which leads to very unstable optimization.

So my suggestion is to replace:

weights1 += w1
weights2 += w2

with:

learning_rate = 0.001
weights1 -= w1 * learning_rate
weights2 -= w2 * learning_rate

Also, a general rule of thumb when debugging neural networks is to use a minimal example that your network should be able to fit: pick a single sample from your dataset and look at the update in each iteration (e.g. with a debugger); this will tell you a lot. If you can't fit a single example, you can't fit 10000.
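For example, a minimal sketch of that single-sample check, assuming train() is changed to accept the targets as a parameter (e.g. train(n, inputs, targets)) instead of reading the global outputs, and that inputs and outputs are built as in the question:

# Hypothetical sanity check: overfit one sample before training on all 10000.
single_input = inputs[0:1]    # shape (1, 784): one image
single_target = outputs[0:1]  # shape (1, 10): its one-hot label

prediction = train(1000, single_input, single_target)
print("prediction:", prediction.round(3))
print("target:    ", single_target)
# After enough iterations the prediction should approach the one-hot target;
# if it does not, the update step is still wrong.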


1 Comment

"e.g. set all your weights to very large values, which leads to very unstable optimization" -- this is exactly what was happening. I made your suggested changes and everything seems to be working now! Thanks!
