Questions tagged [backpropagation]

Use for questions about Backpropagation, which is commonly used in training Neural Networks in conjunction with an optimization method such as gradient descent.

8 votes
2 answers
2k views

Specifically to solve the problem of text generation, not translation. There is literally not a single discussion, blog post, or tutorial that explains the math behind this. My best guess so far is: ...
Austin Capobianco
1 vote
1 answer
40 views

If I did loss = loss/10 before calculating the gradient, would that change the size of the update applied to the model parameters during backpropagation? Or is ...
GreedyGroot
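
For the loss-scaling question above, a minimal sketch may help. It assumes a hypothetical one-parameter model (`w * x`) with a squared-error loss and a plain gradient-descent step; none of these names come from the original question. Because differentiation is linear, dividing the loss by 10 divides the gradient by 10, so with plain gradient descent the step shrinks by the same factor (equivalent to dividing the learning rate by 10). Adaptive optimizers such as Adam normalize by gradient magnitude, so there the effect is largely cancelled.

```python
# Hypothetical one-parameter model: prediction = w * x, squared-error loss.
def loss(w, x, y):
    return (w * x - y) ** 2


def grad(f, w, eps=1e-6):
    # Central finite difference, used here in place of an autodiff library.
    return (f(w + eps) - f(w - eps)) / (2 * eps)


# Illustrative values only (assumptions, not from the question).
x, y, w, lr = 2.0, 3.0, 0.5, 0.1

g_full = grad(lambda w_: loss(w_, x, y), w)         # gradient of the loss
g_scaled = grad(lambda w_: loss(w_, x, y) / 10, w)  # gradient of loss / 10

step_full = lr * g_full      # plain gradient-descent step, original loss
step_scaled = lr * g_scaled  # same step computed from the scaled loss
ratio = g_scaled / g_full    # how much the gradient shrank
```

In this sketch `step_scaled` is one tenth of `step_full`, which is the same update that `lr / 10` would produce with the unscaled loss.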
2 votes
1 answer
105 views

I'm currently working on deriving the gradients of the loss with respect to the weights of a simple recurrent neural network, so that I can update the weights through backpropagation. It's a super simple network, ...
namor129
1 vote
0 answers
50 views

Given the following network: I'm asked to write the backpropagation process for the $b_3$ parameter, where the loss function is $L(y,z_3)=(z_3-y)^2$. I'm not supposed to calculate any of the weights ...
Aishgadol • 111
0 votes
1 answer
37 views

For a simple skip connection $y = x@w + x$, the gradient $\partial y/\partial x$ will be $w+1$: $$\frac {\partial y}{\partial x} = w +1$$ Is the +1 a bit too large, and can it overpower $...
mon • 829
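
The $w+1$ claim in the skip-connection question above can be checked numerically. The sketch below assumes the scalar case (`x` and `w` are plain floats with illustrative values, not taken from the question) and compares a finite-difference derivative with the analytic value.

```python
def skip(x, w):
    # Scalar skip connection: linear branch plus identity branch.
    return x * w + x


def dydx(x, w, eps=1e-6):
    # Central finite difference of skip() with respect to x.
    return (skip(x + eps, w) - skip(x - eps, w)) / (2 * eps)


# Illustrative values only (assumptions).
w = 0.3
x = 1.7

numeric = dydx(x, w)   # derivative estimated numerically
analytic = w + 1       # derivative claimed in the question
```

The two values agree up to floating-point error. The constant 1 comes from the identity branch and does not depend on `w`, which is why the gradient through the skip path stays nonzero even when `w` is close to zero.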
0 votes
1 answer
157 views

I'm trying to implement an RNN and an LSTM with a many-to-many architecture. I worked out for myself why BPTT is necessary in RNNs, and it makes sense. But what doesn't make sense to me is that most of the resources I went ...
Amith Adiraju
0 votes
0 answers
105 views

From what I've read, we don't use log loss or cross-entropy for regression problems. However, the entire logic behind binary cross-entropy (say) is to first squeeze y_hat between 0 and 1 (...
the_he_man
1 vote
0 answers
72 views

I have a stupid question about the derivative of the ReLU activation function. After finding the difference between the true output $t_k$ and the predicted output $a_k$, why is the value of the $d_{a3}$ \ $d_{z3}$...
Gunners