Questions tagged [backpropagation]
Use for questions about Backpropagation, which is commonly used in training Neural Networks in conjunction with an optimization method such as gradient descent.
301 questions
8 votes
2 answers
2k views
How does backpropagation in a transformer work?
Specifically to solve the problem of text generation, not translation. There is literally not a single discussion, blog post, or tutorial that explains the math behind this. My best guess so far is: ...
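For context, a minimal sketch (PyTorch assumed; `model` is a hypothetical decoder-only transformer returning logits of shape `[batch, seq, vocab]`) of how the text-generation loss is formed and backpropagated; autograd applies the chain rule through the attention and feed-forward layers, so no transformer-specific derivation is needed in practice:

```python
# Minimal sketch (PyTorch assumed): next-token prediction for a decoder-only
# transformer. `model` is a hypothetical module returning logits [batch, seq, vocab].
import torch
import torch.nn.functional as F

def training_step(model, tokens, optimizer):
    # Predict token t+1 from the tokens up to and including t.
    inputs, targets = tokens[:, :-1], tokens[:, 1:]
    logits = model(inputs)                                  # [batch, seq-1, vocab]
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()   # chain rule through every attention / feed-forward layer
    optimizer.step()
    return loss.item()
```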
1 vote
1 answer
40 views
Does scaling down the loss change the size of the updates during backpropagation?
If I did loss = loss/10 before calculating the gradient, would that change the amount of change applied to the model parameters during backpropagation? Or is ...
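As a quick illustration of what the question asks (a sketch, PyTorch assumed): with plain gradient descent, scaling the loss by 1/10 scales every gradient, and hence every update, by 1/10; adaptive optimizers such as Adam largely cancel a constant scale.

```python
# Sketch (PyTorch assumed): scaling the loss scales every gradient by the same factor.
import torch

w = torch.tensor([2.0], requires_grad=True)
x, y = torch.tensor([3.0]), torch.tensor([1.0])

((w * x - y) ** 2).sum().backward()
print(w.grad)                     # tensor([30.])

w.grad = None
(((w * x - y) ** 2).sum() / 10).backward()
print(w.grad)                     # tensor([3.]) -> a plain SGD step is 10x smaller
```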
2 votes
1 answer
105 views
Deriving the gradient of the hidden-to-hidden weights for backpropagation through time in a recurrent neural network
I'm currently working on deriving the gradients of a simple recurrent neural network's weights with respect to the loss, in order to update the weights through backpropagation. It's a super simple network, ...
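For reference, the standard BPTT result for a vanilla RNN with hidden state $h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t)$ (notation assumed here, not taken from the question) sums one contribution per time step, with products of Jacobians carrying the error backwards:

$$\frac{\partial L}{\partial W_{hh}} = \sum_{t=1}^{T}\sum_{k=1}^{t} \frac{\partial L_t}{\partial h_t}\left(\prod_{j=k+1}^{t}\frac{\partial h_j}{\partial h_{j-1}}\right)\frac{\partial h_k}{\partial W_{hh}}$$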
1 vote
0 answers
50 views
Backpropagation for a single parameter on a rather simple network
Given the following network: I'm asked to write the backpropagation process for the $b_3$ parameter, where the loss function is $L(y,z_3)=(z_3-y)^2$. I'm not supposed to calculate any of the weights ...
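In general terms (a sketch; the network's exact structure is not shown here), the chain rule gives

$$\frac{\partial L}{\partial b_3} = \frac{\partial L}{\partial z_3}\,\frac{\partial z_3}{\partial b_3} = 2(z_3 - y)\,\frac{\partial z_3}{\partial b_3},$$

and if $b_3$ enters $z_3$ additively as a bias, then $\partial z_3/\partial b_3 = 1$.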
0 votes
1 answer
37 views
Why no scale parameter for skip connection addition?
For a simple skip connection $y = x@w + x$, the gradient $\partial y/\partial x$ will be $w+1$: $$\frac{\partial y}{\partial x} = w + 1$$ Is the $+1$ a bit too large, and can it overpower $...
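A quick autograd check of the scalar case stated in the question (PyTorch assumed); the skip path contributes exactly $1$ regardless of $w$:

```python
# Sanity check (PyTorch assumed): for scalar x and w, d(x*w + x)/dx = w + 1.
import torch

x = torch.tensor(2.0, requires_grad=True)
w = torch.tensor(0.5)
(x * w + x).backward()
print(x.grad)   # tensor(1.5000), i.e. w + 1; the identity branch always adds 1
```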
0 votes
1 answer
157 views
Why not backpropagate through time in LSTM, similar to RNN?
I'm trying to implement RNN and LSTM, many-to-many architecture. I reasoned through why BPTT is necessary in RNNs, and it makes sense. But what doesn't make sense to me is that most of the resources I went ...
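For what it's worth, a minimal sketch (PyTorch assumed) showing that BPTT applies to an LSTM exactly as it does to a vanilla RNN: the loss over the unrolled sequence is backpropagated through every time step by autograd.

```python
# Sketch (PyTorch assumed): many-to-many LSTM; backward() unrolls through all steps (BPTT).
import torch
import torch.nn as nn
import torch.nn.functional as F

lstm = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)
head = nn.Linear(8, 1)

x = torch.randn(2, 5, 4)            # batch of 2 sequences, 5 time steps each
targets = torch.randn(2, 5, 1)

outputs, _ = lstm(x)                # one output per time step (many-to-many)
loss = F.mse_loss(head(outputs), targets)
loss.backward()                     # gradients flow back through all 5 time steps
```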
0 votes
0 answers
105 views
Doubts about a custom loss function for regression problems
From what I read, I know we don't use log loss or cross entropy for regression problems. However, the entire logic behind binary cross entropy (say) is to first squeeze y_hat between 0 and 1 (...
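As a small illustration of the distinction the question touches on (a sketch, PyTorch assumed): binary cross entropy treats predictions and targets as probabilities in $[0,1]$, while regression targets are typically unbounded, which is why MSE (or similar) is used instead.

```python
# Sketch (PyTorch assumed): BCE needs predictions/targets in [0, 1]; MSE does not.
import torch
import torch.nn.functional as F

y_hat = torch.sigmoid(torch.tensor([0.3, -1.2]))   # squeezed into (0, 1)
y_cls = torch.tensor([1.0, 0.0])                   # valid classification targets
print(F.binary_cross_entropy(y_hat, y_cls))

y_pred = torch.tensor([3.5, -11.0])                # unbounded regression prediction
y_reg = torch.tensor([3.7, -12.0])                 # unbounded regression target
print(F.mse_loss(y_pred, y_reg))
```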
1 vote
0 answers
72 views
ReLU derivative value
I have a stupid question about the derivative of the ReLU activation function. After finding the difference between the true output $t_k$ and the predicted output $a_k$, why is the value of $da_3/dz_3$ ...
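For reference, with $a_3 = \mathrm{ReLU}(z_3) = \max(0, z_3)$ (notation assumed from the question), the derivative is

$$\frac{da_3}{dz_3} = \begin{cases}1, & z_3 > 0\\ 0, & z_3 < 0,\end{cases}$$

with an arbitrary conventional value (usually $0$) chosen at $z_3 = 0$.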