
Let's say I have a mini-batch. I take one example from it and do the following:

  1. I do forward propagation.
  2. Using the output of the forward propagation, I calculate the gradients of the parameters.

Then I take the next example from the mini-batch and repeat steps 1 and 2, adding the resulting parameter gradients to the gradients I got earlier. After doing this for every element of the mini-batch, I divide the accumulated sum by the batch size to get the average gradient and use it to update the weights. Am I correct?
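The procedure described here can be sketched in a few lines of NumPy. This is a minimal sketch assuming a linear model with squared-error loss (the model and loss are my assumptions, not from the question); the point is only the per-example accumulate-then-average structure:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))          # mini-batch of 8 examples, 3 features
y = rng.normal(size=8)               # targets
w = np.zeros(3)                      # model parameters
lr = 0.1                             # learning rate

grad_sum = np.zeros_like(w)
for x_i, y_i in zip(X, y):
    pred = x_i @ w                   # step 1: forward propagation for one example
    grad = 2 * (pred - y_i) * x_i    # step 2: gradient of squared error w.r.t. w
    grad_sum += grad                 # accumulate with the gradients from earlier examples

w -= lr * grad_sum / len(X)          # update using the averaged gradient
```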

  • No, you don't compute example by example. All examples of the mini-batch are computed together, in parallel. Commented May 23, 2023 at 20:35

1 Answer


Your description looks conceptually correct to me. See Hugh's answer to "How does minibatch gradient descent update the weights for each example in a batch?" on Cross Validated for a detailed explanation.

However, as per @noe's comment, in practice mini-batches are not implemented by processing the examples one at a time. To speed up processing, most deep learning frameworks express the computation as matrix or tensor operations and process the entire mini-batch in one pass.
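A quick way to see that the two views agree is to compute the averaged gradient both ways, example by example and in one vectorized pass. This sketch assumes a linear model with squared-error loss (my choice for illustration, not from the post):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(16, 4))   # mini-batch of 16 examples, 4 features
y = rng.normal(size=16)
w = rng.normal(size=4)

# Per-example loop: accumulate each example's gradient, then average
loop_grad = sum(2 * (x @ w - t) * x for x, t in zip(X, y)) / len(X)

# One vectorized pass over the whole mini-batch, as frameworks do it
batch_grad = 2 * X.T @ (X @ w - y) / len(X)

assert np.allclose(loop_grad, batch_grad)
```

The equivalence holds because the sum of per-example gradients 2(x_i·w - y_i)x_i is exactly the matrix product 2 X^T (Xw - y); the vectorized form just lets the hardware do all examples at once.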

