hanugm
Why is it called "batch" gradient descent if it consumes the full dataset before calculating the gradient?

While training a neural network, we can follow three methods: batch gradient descent, mini-batch gradient descent and stochastic gradient descent.

For this question, assume that the dataset has $n$ training samples and that we divide it into $k$ batches with $\dfrac{n}{k}$ samples in each batch. So, it is easy to see that the word "batch" is generally used to refer to a portion of the dataset rather than the whole dataset.
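As a small assumed illustration (the sizes `n = 12, k = 3` are mine, not from the question), partitioning the sample indices into $k$ batches of $\dfrac{n}{k}$ samples each might look like:

```python
import numpy as np

n, k = 12, 3                          # assumed toy sizes for illustration
indices = np.arange(n)                # indices of the n training samples
batches = np.array_split(indices, k)  # k batches with n/k samples each
print([len(b) for b in batches])      # → [4, 4, 4]
```

Here each element of `batches` is one "batch" in the sense used below: a portion of the dataset, not the whole of it.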

In batch gradient descent, we pass all $n$ available training samples to the network and then calculate the gradient (only once). We can repeat this process several times.

In mini-batch gradient descent, we pass $\dfrac{n}{k}$ training samples to the network and calculate the gradient. That is, we calculate the gradient once per batch. We repeat this process for all $k$ batches of samples to complete an epoch, and we can repeat this process several times.

In stochastic gradient descent, we pass one training sample to the network and calculate the gradient. That is, we calculate the gradient once per iteration. We repeat this process $n$ times to complete an epoch, and we can repeat this process several times.

Batch gradient descent can be viewed as mini-batch gradient descent with $k = 1$, and stochastic gradient descent can be viewed as mini-batch gradient descent with $k = n$.
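The three variants above can be written as one procedure parameterised by the batch size. Below is a minimal, assumed sketch on least-squares linear regression (the function `minibatch_gd`, the learning rate, and the toy data are mine, not from the question): a batch size of $n$ gives batch gradient descent ($k = 1$ update per epoch), a batch size of $1$ gives stochastic gradient descent ($k = n$ updates per epoch).

```python
import numpy as np

def minibatch_gd(X, y, batch_size, lr=0.1, epochs=50, seed=0):
    """Mini-batch gradient descent for least-squares linear regression."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        idx = rng.permutation(n)               # shuffle once per epoch
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]  # one batch of <= batch_size samples
            # gradient of mean squared error over this batch only
            grad = 2.0 * X[b].T @ (X[b] @ w - y[b]) / len(b)
            w -= lr * grad                     # one update per batch
    return w

# Assumed noiseless toy data
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

w_batch = minibatch_gd(X, y, batch_size=100)  # batch GD: k = 1 update per epoch
w_mini  = minibatch_gd(X, y, batch_size=20)   # mini-batch GD: k = 5 updates per epoch
w_sgd   = minibatch_gd(X, y, batch_size=1)    # SGD: k = n updates per epoch
```

All three calls differ only in how many samples are consumed before each gradient update, which is exactly the distinction the terminology is drawing.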

Am I correct regarding the usage of these terms in the context provided above? If not, where did I go wrong?

If correct, I am confused about the usage of the word "batch" in "batch gradient descent". We do not need the concept of a batch in batch gradient descent, since we pass all the training samples before calculating the gradient; there is no need in batch gradient descent to partition the training dataset into batches at all. Then why do we use the word "batch" in batch gradient descent? Similarly, we use the word "mini-batch" in "mini-batch gradient descent", where we pass a batch of samples before calculating the gradient. So why is it called "mini-batch" gradient descent instead of "batch" gradient descent?
