Why should we normalize data for deep learning in Keras?

Question

I was testing some network architectures in Keras for classifying the MNIST dataset. I have implemented one that is similar to the LeNet.

I have seen that in the examples that I have found on the internet, there is a step of data normalization. For example:

X_train /= 255

I have performed a test without this normalization and I have seen that the performance (accuracy) of the network has decreased (keeping the same number of epochs). Why has this happened?

If I increase the number of epochs, can the accuracy reach the same level as that of the model trained with normalization?

So, does normalization affect the accuracy, or only the training speed?

The complete source code of my training script is below:

from keras.models import Sequential
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.layers.core import Activation
from keras.layers.core import Flatten
from keras.layers.core import Dense
from keras.datasets import mnist
from keras.utils import np_utils
from keras.optimizers import SGD, RMSprop, Adam
import numpy as np
import matplotlib.pyplot as plt
from keras import backend as k


def build(input_shape, classes):
    model = Sequential()
    model.add(Conv2D(20, kernel_size=5, padding="same", activation='relu', input_shape=input_shape))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    model.add(Conv2D(50, kernel_size=5, padding="same", activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    model.add(Flatten())
    model.add(Dense(500))
    model.add(Activation("relu"))
    model.add(Dense(classes))
    model.add(Activation("softmax"))
    return model


NB_EPOCH = 4  # number of epochs
BATCH_SIZE = 128  # size of the batch
VERBOSE = 1  # set the training phase as verbose
OPTIMIZER = Adam()  # optimizer
VALIDATION_SPLIT = 0.2  # percentage of the training data used for evaluating the loss function
IMG_ROWS, IMG_COLS = 28, 28  # input image dimensions
NB_CLASSES = 10  # number of outputs = number of digits
INPUT_SHAPE = (1, IMG_ROWS, IMG_COLS)  # shape of the input

(X_train, y_train), (X_test, y_test) = mnist.load_data()
k.set_image_dim_ordering("th")

X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255

X_train = X_train[:, np.newaxis, :, :]
X_test = X_test[:, np.newaxis, :, :]
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

y_train = np_utils.to_categorical(y_train, NB_CLASSES)
y_test = np_utils.to_categorical(y_test, NB_CLASSES)

model = build(input_shape=INPUT_SHAPE, classes=NB_CLASSES)
model.compile(loss="categorical_crossentropy", optimizer=OPTIMIZER, metrics=["accuracy"])

history = model.fit(X_train, y_train, batch_size=BATCH_SIZE,
                    epochs=NB_EPOCH, verbose=VERBOSE, validation_split=VALIDATION_SPLIT)

model.save("model2")

score = model.evaluate(X_test, y_test, verbose=VERBOSE)
print('Test accuracy:', score[1])

What do you mean by performance here? Is it training speed or is it accuracy?
– Shridhar R Kulkarni, Commented Jan 16, 2018 at 16:52

Shridhar R Kulkarni · Accepted Answer · 2018-01-16 20:00:16Z

Normalization is a generic concept not limited only to deep learning or to Keras.

Why normalize?

Let me take a simple logistic regression example, which makes normalization easy to understand and explain. Assume we are trying to predict whether a customer should be given a loan or not. Among the many available independent variables, let's consider just Age and Income. Let the equation be of the form:

Y = weight_1 * (Age) + weight_2 * (Income) + some_constant

Just for the sake of explanation, let Age be in the range [0, 120] and Income in the range [10000, 100000]. The scales of Age and Income are very different. If you use them as is, the weights weight_1 and weight_2 may end up biased: weight_2 might give more importance to Income as a feature than weight_1 gives to Age. To bring them to a common scale, we can normalize them. For example, we can map all ages into the range [0, 1] and all incomes into the range [0, 1]. Now we can say that Age and Income are given equal importance as features.
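
As a minimal sketch of the rescaling described above, the snippet below applies min-max scaling to two small lists. The sample values and the helper name min_max_scale are made up for illustration, and the scaling uses the minimum and maximum of the sample itself rather than the theoretical ranges mentioned in the text.

```python
def min_max_scale(values):
    """Linearly rescale numbers so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sample data, chosen only for illustration.
ages = [20, 35, 50, 80]
incomes = [10000, 40000, 55000, 100000]

scaled_ages = min_max_scale(ages)
scaled_incomes = min_max_scale(incomes)

# Both features now lie in [0, 1], although the raw values differ by
# several orders of magnitude.
print(scaled_ages)
print(scaled_incomes)
```

After scaling, a one-unit change means the same thing for both features (moving across the whole observed range), which is what makes the weights comparable.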

Does Normalization always increase the accuracy?

Not necessarily. Normalization does not always increase accuracy. It may or may not; you never really know until you try it. It also depends on the stage of training at which you apply normalization, on whether you apply normalization after every activation, and so on.

Because normalization narrows the feature values down to a particular range, it's easier to perform computations over that smaller range of values. So the model usually trains a bit faster.

Regarding the number of epochs, accuracy usually increases with the number of epochs, provided that your model doesn't start overfitting.

A very good explanation for Normalization/Standardization and related terms is here.

pietz · Accepted Answer · 2018-01-17 08:15:55Z

In a nutshell, normalization reduces the complexity of the problem your network is trying to solve. This can potentially increase the accuracy of your model and speed up training. You bring the data onto the same scale and reduce variance. None of the weights in the network are wasted on doing the normalization for you, meaning that they can be used more efficiently to solve the actual task at hand.

O.Gask22 · Accepted Answer · 2022-01-30 17:49:25Z

As @Shridhar R Kulkarni says, normalization is a general concept and doesn’t only apply to Keras.

It’s often applied as part of data preparation for machine learning models to change numeric values in the dataset to fit a standard scale without distorting the differences in their ranges. As such, normalization enhances the cohesion of entity types within a model by reducing the probability of inconsistent data.

However, not every dataset and use case requires normalization; it’s primarily necessary when features have different ranges. You may use it when:

- You want to improve your model’s convergence efficiency and make optimization feasible
- You want to make training less sensitive to the scale of the features, so that you can better solve for the coefficients
- You want to improve analysis from multiple models

Normalization is not recommended when:

- Using decision tree models or ensembles based on them
- Your data is not normally distributed; you may have to use other data preprocessing techniques
- Your dataset already comprises scaled variables
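
The decision tree point can be illustrated with a toy example. A tree split depends only on the ordering of a feature's values, so rescaling the feature leaves the chosen partition unchanged. The sketch below uses a hypothetical one-split "stump" (best_split) written for this illustration, not any library's tree implementation, and made-up income data.

```python
def best_split(xs, labels):
    """Return the indices sent to the left side by the best single split.

    Samples are ordered by feature value; every cut between two distinct
    neighbouring values is tried, and the cut with the fewest errors wins
    (each side predicts its majority label; labels are 0 or 1).
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best_errors, best_left = None, None
    for k in range(1, len(order)):
        if xs[order[k - 1]] == xs[order[k]]:
            continue  # no threshold can separate equal values
        errors = 0
        for side in (order[:k], order[k:]):
            ones = sum(labels[i] for i in side)
            errors += min(ones, len(side) - ones)
        if best_errors is None or errors < best_errors:
            best_errors, best_left = errors, frozenset(order[:k])
    return best_left


# Hypothetical data: raw incomes and whether a loan was approved.
incomes = [10000, 25000, 40000, 60000, 80000, 100000]
approved = [0, 0, 0, 1, 1, 1]

# Min-max scaled copy of the same feature.
lo, hi = min(incomes), max(incomes)
scaled_incomes = [(v - lo) / (hi - lo) for v in incomes]

# The same samples end up on the left side before and after scaling.
print(best_split(incomes, approved) == best_split(scaled_incomes, approved))
```

Only the threshold value changes after scaling; the samples on each side of it, and therefore the predictions, stay the same.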

In some cases, normalization can improve performance. However, it is not always necessary.

The critical thing is to understand your dataset and scenario first, then you’ll know whether you need it or not. Sometimes, you can experiment to see if it gives you good performance or not.

Check out deepchecks and see how to deal with important data-related checks you come across in ML.

For example, to check for duplicated data in your set, you can use the following code:

from deepchecks.checks.integrity.data_duplicates import DataDuplicates
from deepchecks.base import Dataset, Suite
from datetime import datetime
import pandas as pd

arj · Accepted Answer · 2021-05-13 13:39:51Z

I think there are some issues with the convergence of the optimizer too. Here I show a simple linear regression with three examples. The first uses an array with small values, and it works as expected. The second uses an array with bigger values, and the loss explodes toward infinity, suggesting the need to normalize. The third, model 3, uses the same array as the second case but normalized, and we get convergence.

GitHub Colab-enabled IPython notebook

I've used MSE as the loss function; I don't know if other loss functions or optimizers suffer from the same issue.
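
The notebook itself is not reproduced here, but the three cases described above can be sketched in plain Python. This is an illustrative reconstruction under assumptions, not the notebook's code: the data, the learning rate, the step count, and the helper fit_linear are all invented for the example.

```python
import math


def fit_linear(xs, ys, lr=0.05, steps=2000):
    """Fit y = w*x + b by batch gradient descent on MSE; return the final loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errs = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of mean((w*x + b - y)^2) with respect to w and b.
        grad_w = 2.0 * sum(e * x for e, x in zip(errs, xs)) / n
        grad_b = 2.0 * sum(errs) / n
        w -= lr * grad_w
        b -= lr * grad_b
        if not (math.isfinite(w) and math.isfinite(b)):
            return math.inf  # parameters overflowed: training diverged
    return sum((w * x + b - y) * (w * x + b - y) for x, y in zip(xs, ys)) / n


# Case 1: small inputs, targets follow y = 2x + 1.
xs_small = [1.0, 2.0, 3.0, 4.0, 5.0]
ys_small = [2.0 * x + 1.0 for x in xs_small]

# Case 2: same relationship, inputs 100 times larger.
xs_big = [100.0 * x for x in xs_small]
ys_big = [2.0 * x + 1.0 for x in xs_big]

# Case 3: the large inputs rescaled to [0, 1] (targets left unchanged).
lo, hi = min(xs_big), max(xs_big)
xs_norm = [(x - lo) / (hi - lo) for x in xs_big]

loss_small = fit_linear(xs_small, ys_small)
loss_big = fit_linear(xs_big, ys_big)
loss_norm = fit_linear(xs_norm, ys_big)

print(loss_small, loss_big, loss_norm)
```

With these particular settings the second case diverges and returns infinity, while the first and third end with a loss close to zero. A much smaller learning rate would also keep the unscaled case stable, but at the cost of far slower progress.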

This feels almost like a link only answer. I can't get anything from your post without actually following the link. Could you explain a bit what happens in the notebook?
