I am experimenting with building a simple neural network that identifies handwritten digits using the MNIST database. On the first training run I get a random assortment of probabilities in my output arrays, which is expected. However, when training runs more than once, nearly every index position in my output arrays drops to zero. It seems that the weight adjustments I back-propagate are causing the problem, but I can't work out why.
```python
import numpy as np

def sigmoid(x):
    a = 1 / (1 + np.exp(-x))
    return a

# x is expected to already be a sigmoid output, so this computes s * (1 - s)
def sigmoid_derivative(x):
    return x * (1 - x)

def train(n, inputs):
    input_layer = inputs
    # Random initial weights in the range [-1, 1)
    weights1 = 2 * np.random.random((784, 16)) - 1
    weights2 = 2 * np.random.random((16, 10)) - 1
    for i in range(n):
        # Forward pass
        trained_hidden = sigmoid(np.dot(input_layer, weights1))
        trained_outputs = sigmoid(np.dot(trained_hidden, weights2))
        # Backward pass (`outputs` is the global one-hot label array)
        o_error = (outputs - trained_outputs)
        o_adjustments = o_error * sigmoid_derivative(trained_outputs)
        h_error = np.dot(o_adjustments, weights2.T)
        h_adjustments = h_error * sigmoid_derivative(trained_hidden)
        # Weight updates
        w1 = np.dot(input_layer.T, h_adjustments)
        w2 = np.dot(trained_hidden.T, o_adjustments)
        weights1 += w1
        weights2 += w2
    return trained_outputs
```

I am using NumPy arrays. The input is a (10000 x 784) array of greyscale values between 0 and 1, and the output is a (10000 x 10) array with a 1 at the index position of the actual digit.
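For reference, `np.dot(input_layer.T, h_adjustments)` in `train` combines every example in the batch at once: it equals the sum of the per-example outer products, so with 10000 rows the update is a total rather than an average. The sketch below checks that identity on small random arrays; the shapes (5 examples, 4 inputs, 3 hidden units) are arbitrary stand-ins, not MNIST data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small stand-ins: 5 examples, 4 input features, 3 hidden units
input_layer = rng.random((5, 4))
h_adjustments = rng.standard_normal((5, 3))

# Batched update, computed the same way as in train()
w1_batched = np.dot(input_layer.T, h_adjustments)

# The same quantity built one example at a time
w1_summed = np.zeros((4, 3))
for x_row, delta_row in zip(input_layer, h_adjustments):
    w1_summed += np.outer(x_row, delta_row)

print(np.allclose(w1_batched, w1_summed))  # True
```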
```python
# Load the MNIST train/test images and labels
x_train, t_train, x_test, t_test = mnist.load()
# Scale pixel values into the range [0, 1)
inputs = x_test/256
# One-hot encode the test labels
outputs = np.zeros((10000, 10), dtype=int)
for i in range(10000):
    x = t_test[int(i)]
    outputs[i][x] = 1

set = train(10, inputs)
```

I built this from several resources: the theory comes from 3Blue1Brown's neural network series, and the code closely follows the example at https://enlight.nyc/projects/neural-network/.
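The one-hot label loop can also be written without an explicit loop. A minimal sketch, assuming `t_test` is a 1-D integer array of digit labels 0-9 (a small hypothetical array stands in for it here): indexing the identity matrix with `np.eye(10, dtype=int)[labels]` picks out one one-hot row per label.

```python
import numpy as np

# Hypothetical stand-in for t_test: integer labels in 0..9
t_test = np.array([7, 2, 1, 0, 4])

# Loop version, as in the question
outputs_loop = np.zeros((len(t_test), 10), dtype=int)
for i in range(len(t_test)):
    outputs_loop[i][t_test[i]] = 1

# Vectorized version: row k of np.eye(10) is the one-hot vector for digit k
outputs_eye = np.eye(10, dtype=int)[t_test]

print(np.array_equal(outputs_loop, outputs_eye))  # True
```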
Edit: Following @9000's suggestion, here is a printout of each step for one example. Looking at the results, w1 (the weight-adjustment calculation) appears to be the issue, but after going over it again and again I cannot figure out why it is incorrect. Any help is appreciated.
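One way to test whether the w1 formula itself is wrong is a finite-difference gradient check on a tiny network. The sketch below assumes the loss is half the summed squared error, L = 0.5 * sum((outputs - trained_outputs)**2); under that assumption, because the code uses `o_error = outputs - trained_outputs`, `w1` should equal the *negative* gradient of L with respect to `weights1`. The network size and random data are arbitrary, and `loss` is a hypothetical helper written only for this check.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(s):
    # s is already a sigmoid output
    return s * (1 - s)

rng = np.random.default_rng(1)
X = rng.random((6, 5))                      # 6 examples, 5 inputs
Y = np.eye(3)[rng.integers(0, 3, size=6)]   # one-hot targets, 3 classes
W1 = rng.uniform(-1, 1, (5, 4))             # 4 hidden units
W2 = rng.uniform(-1, 1, (4, 3))

def loss(W1, W2):
    # Assumed loss: half the summed squared error over all examples
    out = sigmoid(sigmoid(X @ W1) @ W2)
    return 0.5 * np.sum((Y - out) ** 2)

# Analytic update, following the same steps as train()
hidden = sigmoid(X @ W1)
out = sigmoid(hidden @ W2)
o_adj = (Y - out) * sigmoid_derivative(out)
h_adj = np.dot(o_adj, W2.T) * sigmoid_derivative(hidden)
w1 = np.dot(X.T, h_adj)

# Central finite differences of the loss with respect to each entry of W1
eps = 1e-6
numeric = np.zeros_like(W1)
for idx in np.ndindex(W1.shape):
    W1_plus, W1_minus = W1.copy(), W1.copy()
    W1_plus[idx] += eps
    W1_minus[idx] -= eps
    numeric[idx] = (loss(W1_plus, W2) - loss(W1_minus, W2)) / (2 * eps)

# If w1 is the negative gradient, this difference should be tiny
print(np.max(np.abs(w1 + numeric)))
```

A large value here would point at the w1 formula; a tiny one suggests the formula is consistent with the assumed loss and the problem lies elsewhere.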
Edit 2: I have included a second printout of the same example from the second training run.
First Run
```text
trained_hidden [0.87880514 0.4789476 0.38500953 0.00142838 0.01373613 0.37572408 0.53673194 0.11774215 0.99989426 0.0547656 0.20645864 0.85484692 0.99903171 0.88929566 0.00673453 0.03816501]
trained_output [0.33244312 0.26289407 0.79917376 0.95143406 0.90780616 0.2100068 0.66253735 0.57961972 0.28231436 0.15963378]
o_error [ 0.66755688 -0.26289407 -0.79917376 -0.95143406 -0.90780616 -0.2100068 -0.66253735 -0.57961972 -0.28231436 -0.15963378]
o-adjustment [ 0.14814735 -0.05094382 -0.12826344 -0.04396319 -0.07597805 -0.03484096 -0.14813117 -0.14123055 -0.05720055 -0.02141501]
h_error [-0.00359599 0.18884347 0.15954247 -0.14839811 0.2081496 -0.01152563 0.03262859 -0.46315722 -0.06974061 -0.46774417 -0.00690463 -0.44303219 -0.16267084 -0.02505235 -0.12866526 0.22212537]
h_adjustment [-3.82997246e-04 4.71271721e-02 3.77760172e-02 -2.11665993e-04 2.81989626e-03 -2.70339996e-03 8.11312465e-03 -4.81122794e-02 -7.37327102e-06 -2.42134002e-02 -1.13120886e-03 -5.49730579e-02 -1.57359939e-04 -2.46637616e-03 -8.60664795e-04 8.15387570e-03]
w1 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
w2 [-111.70644608 -164.50671691 -254.60942018 -205.06537232 -330.43317768 -94.6976 -346.78221607 -272.22044431 -249.54889015 -75.99543441]
weights1 [-0.09535479 -0.09824519 -0.11582134 -0.65075843 -0.65593035 0.77593957 -0.0406199 0.12669151 0.79979191 -0.52502487 -0.2433578 0.16617536 -0.25711996 0.92995152 -0.40922601 -0.63029133]
weights2 [-112.24597022 -164.86741004 -254.21715269 -205.27326963 -331.18579697 -95.07615178 -347.04311247 -271.82206581 -250.04075852 -76.69273265]
```

Second Run
```text
trained_hidden [0.00000000e+000 1.00000000e+000 1.00000000e+000 3.77659154e-181 1.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000 1.00000000e+000 0.00000000e+000 1.00000000e+000 0.00000000e+000 2.71000625e-055 0.00000000e+000 0.00000000e+000 1.00000000e+000]
trained_output [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
o_error [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
o-adjustment [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
h_error [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
h_adjustment [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
w1 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
w2 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
weights1 [-0.09535479 -0.09824519 -0.11582134 -0.65075843 -0.65593035 0.77593957 -0.0406199 0.12669151 0.79979191 -0.52502487 -0.2433578 0.16617536 -0.25711996 0.92995152 -0.40922601 -0.63029133]
weights2 [-112.24597022 -164.86741004 -254.21715269 -205.27326963 -331.18579697 -95.07615178 -347.04311247 -271.82206581 -250.04075852 -76.69273265]
```
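The second-run printout shows activations of exactly 0 or 1. In float64, `sigmoid` returns exactly 1.0 once its input exceeds roughly 37 and exactly 0.0 once it falls below roughly -710, and at those values `sigmoid_derivative` is exactly 0, so no adjustment can flow back through that unit. A small sketch of the effect (the input values are arbitrary; `np.errstate` only silences NumPy's overflow warning from `np.exp`):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(s):
    # s is already a sigmoid output
    return s * (1 - s)

z = np.array([-800.0, -5.0, 0.0, 5.0, 40.0])

# np.exp(800) overflows to inf, which NumPy reports as a RuntimeWarning
with np.errstate(over="ignore"):
    s = sigmoid(z)

d = sigmoid_derivative(s)
print(s)  # first entry is exactly 0.0, last entry is exactly 1.0
print(d)  # first and last entries are exactly 0.0
```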