Why you should use it
To understand what batch normalization is, we must first understand the problem it is trying to solve.
Usually, before training a neural network, we preprocess the input data. For example, we might standardize every feature to zero mean and unit variance, like a standard normal distribution. Why do we do this? There are many reasons, including preventing early saturation of nonlinear activation functions such as the sigmoid, and ensuring that all input features lie in the same range of values.
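As an illustration of this input preprocessing, the sketch below standardizes one feature column to zero mean and unit variance using only Python's standard library. The function name and the sample values are hypothetical. In practice the mean and standard deviation are computed on the training set and then reused for validation and test data.

```python
from statistics import mean, pstdev

def standardize(column):
    """Rescale one feature column to zero mean and unit (population) variance."""
    mu = mean(column)
    sigma = pstdev(column)  # population standard deviation
    if sigma == 0:
        # A constant feature carries no information; map it to all zeros.
        return [0.0] * len(column)
    return [(v - mu) / sigma for v in column]

# Hypothetical raw feature values on an arbitrary scale.
ages = [22.0, 35.0, 47.0, 58.0]
scaled = standardize(ages)
print([round(v, 3) for v in scaled])
```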
But the problem reappears in the intermediate layers: the distribution of each layer's inputs (the activations of the previous layer) changes continuously during training, because the parameters of the preceding layers keep being updated. This slows down training, since each layer must keep adapting to a new distribution at every training step. This problem is known as internal covariate shift.
So... what happens if we force the input of each layer to have approximately the same distribution at every training step?
What it is
Batch normalization is a method we can use to normalize the inputs of each layer in order to combat internal covariate shift.
During training, a batch normalization layer performs the following operations:
1. Compute the mean and variance of the layer's inputs over the current mini-batch.
2. Normalize the layer's inputs using these batch statistics.
3. Scale and shift the normalized values to obtain the layer's output.
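In the notation of Ioffe and Szegedy (2015), for a mini-batch $B = \{x_1, \dots, x_m\}$ of one input feature, the three steps are usually written as follows, where $\epsilon$ is a small constant added for numerical stability and $\gamma$, $\beta$ are the learned scale and shift:

$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} \left(x_i - \mu_B\right)^2$$

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}$$

$$y_i = \gamma\,\hat{x}_i + \beta$$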
Note that γ (the scale) and β (the shift) are learned during training, together with the network's original parameters.
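The training-time computation can be sketched in plain Python for one feature across a mini-batch. The function name `batch_norm_train` is hypothetical; a real layer repeats this independently for every feature (or channel) and operates on tensors rather than lists.

```python
import math

def batch_norm_train(batch, gamma, beta, eps=1e-5):
    """Training-mode batch norm for one feature across a mini-batch.

    batch: list of m floats; gamma, beta: learned scale and shift.
    Returns (outputs, batch_mean, batch_variance).
    """
    m = len(batch)
    # Step 1: mini-batch mean and (biased) variance.
    mu = sum(batch) / m
    var = sum((x - mu) ** 2 for x in batch) / m
    # Step 2: normalize with the batch statistics.
    x_hat = [(x - mu) / math.sqrt(var + eps) for x in batch]
    # Step 3: scale and shift with the learned parameters.
    y = [gamma * xh + beta for xh in x_hat]
    return y, mu, var

y, mu, var = batch_norm_train([1.0, 2.0, 3.0, 4.0], gamma=2.0, beta=0.5)
```

Whatever the input scale, the output has mean β and standard deviation close to γ, so the network can still learn to undo the normalization if that helps.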
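At test time (inference), the mean and variance are fixed. They are estimated from the means and variances computed for each training batch.
So, if each batch had m samples and there were j batches, the inference-time mean is the average of the batch means, and the variance is the average of the batch variances with an m/(m-1) correction, where $\mu_B^{(i)}$ and $(\sigma_B^{(i)})^2$ denote the mean and variance of the i-th batch:

$$\mathrm{E}[x] = \frac{1}{j}\sum_{i=1}^{j} \mu_B^{(i)}, \qquad \mathrm{Var}[x] = \frac{m}{m-1}\cdot\frac{1}{j}\sum_{i=1}^{j} \left(\sigma_B^{(i)}\right)^2$$

At inference, the layer then applies a fixed linear transformation, with ε a small constant for numerical stability:

$$y = \frac{\gamma}{\sqrt{\mathrm{Var}[x] + \epsilon}}\, x + \left(\beta - \frac{\gamma\,\mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}}\right)$$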
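A minimal sketch of the inference path, assuming the per-batch means and biased variances were stored during training (both function names are hypothetical). It averages the batch means, averages the batch variances with the m/(m-1) correction, and then applies batch normalization as a single fixed affine map `a * x + b`.

```python
import math

def population_stats(batch_means, batch_vars, m):
    """Estimate inference-time mean and variance from j training batches of size m."""
    j = len(batch_means)
    mean_x = sum(batch_means) / j
    # m / (m - 1) turns the average biased batch variance into an unbiased estimate.
    var_x = (m / (m - 1)) * sum(batch_vars) / j
    return mean_x, var_x

def batch_norm_infer(x, gamma, beta, mean_x, var_x, eps=1e-5):
    """Inference-time batch norm: the fixed affine map y = a * x + b."""
    a = gamma / math.sqrt(var_x + eps)
    b = beta - a * mean_x
    return a * x + b

mean_x, var_x = population_stats([2.0, 4.0], [1.5, 2.5], m=4)
y = batch_norm_infer(3.5, gamma=2.0, beta=0.5, mean_x=mean_x, var_x=var_x)
```

Because `a` and `b` no longer depend on the batch, the output for a given sample is deterministic at inference time, regardless of which other samples are processed with it.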
How to use it in TensorFlow
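Fortunately for us, the TensorFlow API already implements all of this math in the tf.layers.batch_normalization layer.
To add a batch normalization layer to your model, call tf.layers.batch_normalization on the output of the layer you want to normalize, passing its training argument to indicate whether the model is training or running inference.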
It is very important to fetch the update ops, as the TensorFlow documentation indicates, because at training time the layer's moving mean and moving variance must be updated. If you don't, batch normalization will not work correctly and the network will not behave as expected, because inference will rely on moving statistics that were never updated.
It is also useful to declare a placeholder that tells the network whether it is in training or inference mode (we discussed above how the two differ).
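To make the training/inference switch and the role of the update ops concrete, here is a plain-Python sketch of a single-feature batch normalization layer. It is not TensorFlow's implementation: the class name is hypothetical, it uses the biased batch variance, and only the defaults momentum=0.99 and epsilon=0.001 are borrowed from tf.layers.batch_normalization. In TensorFlow 1.x the corresponding updates of moving_mean and moving_variance are placed in the tf.GraphKeys.UPDATE_OPS collection, which the documentation recommends running together with the training op (for example via tf.control_dependencies or tf.group); here the `training` argument plays the role of the placeholder.

```python
import math

class BatchNorm1D:
    """Hypothetical single-feature batch norm layer with a training flag."""

    def __init__(self, momentum=0.99, eps=1e-3):
        self.gamma, self.beta = 1.0, 0.0               # learned scale and shift
        self.moving_mean, self.moving_var = 0.0, 1.0   # initial moving statistics
        self.momentum, self.eps = momentum, eps

    def __call__(self, batch, training):
        if training:
            # Training: normalize with the statistics of this mini-batch...
            m = len(batch)
            mu = sum(batch) / m
            var = sum((x - mu) ** 2 for x in batch) / m
            # ...and run the "update op": exponential moving averages.
            self.moving_mean = self.momentum * self.moving_mean + (1 - self.momentum) * mu
            self.moving_var = self.momentum * self.moving_var + (1 - self.momentum) * var
        else:
            # Inference: use only the accumulated moving statistics.
            mu, var = self.moving_mean, self.moving_var
        inv_std = 1.0 / math.sqrt(var + self.eps)
        return [self.gamma * (x - mu) * inv_std + self.beta for x in batch]

layer = BatchNorm1D(momentum=0.9)
for _ in range(200):
    layer([4.0, 6.0, 8.0], training=True)  # batch mean 6, batch variance 8/3
out = layer([6.0], training=False)
```

Calling a fresh layer with `training=False` shows what happens when the updates are skipped: it normalizes with the initial moving statistics (mean 0, variance 1) instead of statistics of the data.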
Note that this layer has many more parameters (you can check them in the documentation), but the training flag and the update ops are the basic pieces you need to get right.
If you are curious about batch normalization, I invite you to take a look at these papers and videos:
- Sergey Ioffe, Christian Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015
- Tim Cooijmans, Nicolas Ballas, César Laurent, Çağlar Gülçehre, Aaron Courville, Recurrent Batch Normalization, 2016
- Mahdi M. Kalayeh, Mubarak Shah, Training Faster by Separating Modes of Variation in Batch-normalized Models, 2018
- CS231n Lecture 6, Training Neural Networks I, Stanford University School of Engineering, 2017
- Daejin Jung, Wonkyung Jung, Byeongho Kim, Sunjung Lee, Wonjong Rhee, Jung Ho Ahn, Restructuring Batch Normalization to Accelerate CNN Training, 2018