Questions tagged [gradient]
For questions related to the gradient, a way of packing together all the partial derivative information of a function
50 questions
5 votes
1 answer
326 views
Is PyTorch's `grad_fn` for a non-differentiable function that function's inverse?
What is grad_fn for a non-differentiable function like slicing (grad_fn=<SliceBackward0>), ...
3 votes
2 answers
152 views
Do computational graphs predate the era of machine learning?
Do computational graphs predate the era of machine learning? If so, who first devised the idea of a computational graph?
0 votes
0 answers
82 views
Backpropagation math question
Hi, I have a very simple model and I'm trying to learn the math behind it. Basically, I have an m x n input matrix X. An m x n output matrix Y is formed from some convolution H. The figure of merit is ...
2 votes
2 answers
129 views
How to calculate the gradient for the output with respect to the input pixels
Hi, for my project I'm using a somewhat simple CNN consisting of several convolution layers and pooling layers. Essentially, the model is trained to perform a blur of sorts on an input image. For my ...
1 vote
1 answer
109 views
Might the use of rational numbers and rational arithmetic be beneficial for an ANN?
Rational numbers would help alleviate some gradient issues by not losing precision as the weights and the propagated values (signal) reach extremely low and high values. I'm not aware of any hardware ...
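As a quick, framework-free illustration of the precision argument in this question (a sketch using only Python's standard library; the variable names are mine): exact rationals keep values that IEEE-754 floats silently underflow to zero.

```python
from fractions import Fraction

# A float "signal" underflows to exactly 0.0 once it drops below the
# smallest subnormal double (about 5e-324).
tiny_float = 1e-200 * 1e-200
print(tiny_float)  # 0.0 -- the value is lost

# A rational keeps the exact value, at the cost of ever-growing
# numerators and denominators (and therefore time and memory).
tiny_exact = Fraction(1, 10**200) * Fraction(1, 10**200)
print(tiny_exact == 0)                    # False -- nothing was lost
print(tiny_exact.denominator == 10**400)  # True -- stored exactly
```

The exactness is also why this is rarely done in practice: every multiplication roughly doubles the digit count, so cost grows as values propagate, which matches the hardware concern the question raises.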
0 votes
0 answers
133 views
Why is the loss function for the critic decreasing and the gradient increasing over the episodes?
I have built an Actor-Critic model, where we have a Quantile Critic. The aim of the task is to determine the optimal portfolio choice of an agent based on quantiles. The model setup: We have ...
0 votes
0 answers
77 views
What am I doing wrong that results in a graph indicating better gradients in non-scaled dot-product attention compared to the scaled version?
I'm trying to visualize how the gradients change as we increase $d_{k}$ in scaled dot-product attention, and to compare it to the non-scaled version, but I'm failing to produce a reasonable graph ...
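One way to sanity-check an experiment like the one in this question (a standalone sketch using only the standard library; all names are mine): for queries and keys with i.i.d. zero-mean, unit-variance components, the dot product $q \cdot k$ has variance $d_k$, so its standard deviation grows like $\sqrt{d_k}$; dividing by $\sqrt{d_k}$ restores roughly unit scale and keeps the softmax out of its saturated, vanishing-gradient region.

```python
import random
import statistics

def dot_std(d_k, n_samples=5_000, seed=0):
    """Empirical standard deviation of q.k for queries/keys with
    i.i.d. standard-normal components."""
    rng = random.Random(seed)
    dots = []
    for _ in range(n_samples):
        q = [rng.gauss(0, 1) for _ in range(d_k)]
        k = [rng.gauss(0, 1) for _ in range(d_k)]
        dots.append(sum(a * b for a, b in zip(q, k)))
    return statistics.stdev(dots)

for d_k in (4, 64, 256):
    raw = dot_std(d_k)
    scaled = raw / d_k ** 0.5  # the 1/sqrt(d_k) rescaling
    # raw grows like sqrt(d_k); scaled stays near 1.0
    print(f"d_k={d_k:4d}  std(q.k)={raw:6.2f}  after scaling={scaled:4.2f}")
```

If a plot shows "better" gradients without the scaling, a first thing to check is whether the inputs actually have unit variance, since the argument above depends on that assumption.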
0 votes
2 answers
229 views
In policy gradient methods, why do we compute the gradient of the objective function through a one-trajectory estimate?
Taking as an example the Advantage Actor Critic, the objective function is: \begin{equation} \nabla_{\boldsymbol{\theta}} J(\boldsymbol{\theta})=\mathbb{E}_{\tau \sim \pi_{\boldsymbol{\theta}}}\left[\...
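For context (the question's own equation is cut off above, so this is the usual textbook form of the advantage actor-critic gradient, not necessarily the question's exact expression):
\begin{equation}
\nabla_{\boldsymbol{\theta}} J(\boldsymbol{\theta})=\mathbb{E}_{\tau \sim \pi_{\boldsymbol{\theta}}}\left[\sum_{t} \nabla_{\boldsymbol{\theta}} \log \pi_{\boldsymbol{\theta}}\left(a_{t} \mid s_{t}\right) A\left(s_{t}, a_{t}\right)\right].
\end{equation}
The one-trajectory estimate simply drops the expectation and evaluates the inner sum on a single sampled trajectory $\tau$: an unbiased but high-variance Monte Carlo estimator of the true gradient.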
1 vote
1 answer
103 views
In multilayer perceptron neural networks, are the names "delta", "gradient" and "error" all the same thing?
What is the difference between "delta", "gradient" and "error" — are these names for the same thing? I'm confused because someone once told me that both the names "delta" ...
0 votes
1 answer
122 views
Gradient: any resource for understanding everything about it?
I have read some resources about AI, and they all speak about the gradient. Is there any book focused on this, maybe with lots of images/diagrams? Cheers
1 vote
1 answer
182 views
Why do the gradients produced by the soft targets scale as 1/T^2 in knowledge distillation?
In the paper "Distilling the Knowledge in a Neural Network", it mentions "the magnitudes of the gradients produced by the soft targets scale as 1/(T^2)", but it has no ...
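A sketch of the cited paper's own high-temperature argument, with student logits $z_i$, teacher logits $v_i$, $N$ classes, and temperature $T$: the gradient of the distillation cross-entropy $C$ with respect to a student logit is
\begin{equation}
\frac{\partial C}{\partial z_i}=\frac{1}{T}\left(q_i-p_i\right)=\frac{1}{T}\left(\frac{e^{z_i / T}}{\sum_j e^{z_j / T}}-\frac{e^{v_i / T}}{\sum_j e^{v_j / T}}\right).
\end{equation}
When $T$ is large compared with the logits, $e^{x/T} \approx 1 + x/T$, and if the logits are additionally assumed zero-mean, this simplifies to
\begin{equation}
\frac{\partial C}{\partial z_i} \approx \frac{1}{N T^{2}}\left(z_i-v_i\right),
\end{equation}
which is the quoted $1/T^2$ scaling — and the reason the paper multiplies the soft-target loss by $T^2$ to keep its contribution comparable to the hard-target loss.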
1 vote
2 answers
103 views
How to apply backpropagation when one layer of the network is a call-only function (no gradient)?
I have worked with feed-forward neural networks and VAEs and understand the backpropagation algorithm. Now I am building a VAE network; one of its layers is a very complex vector-to-vector function $f(x)$ (a general '...
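The usual workarounds when a layer exposes no analytic gradient are surrogate (straight-through) gradients, score-function estimators, or numerical differentiation. As a minimal, framework-free sketch of the last option (function names are mine), a central-difference estimate of a black box's input gradient:

```python
def finite_diff_grad(f, x, eps=1e-5):
    """Central-difference estimate of the gradient of a scalar-valued
    black-box function f at the point x (a list of floats).
    Costs 2 * len(x) extra forward evaluations per call."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x);  x_plus[i] += eps
        x_minus = list(x); x_minus[i] -= eps
        grad.append((f(x_plus) - f(x_minus)) / (2 * eps))
    return grad

# Stand-in "black box": f(x) = sum(x_i^2), whose true gradient is 2x.
black_box = lambda v: sum(t * t for t in v)
print(finite_diff_grad(black_box, [1.0, -2.0, 0.5]))  # ~ [2.0, -4.0, 1.0]
```

For a vector-to-vector $f$ like the one in the question this becomes a Jacobian estimate, built one output component at a time, so the cost grows quickly with dimension; that is why surrogate gradients are often preferred inside a VAE.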
0 votes
1 answer
215 views
Is there a recommended resource that can provide a detailed overview of the gradient norm?
When it comes to the concept of "Gradient Norm," it can be challenging to find a widely recognized and clearly defined resource that offers a comprehensive explanation. While many search ...
3 votes
0 answers
105 views
Why does training converge when the norm of the gradient increases?
This is from the deep learning book by Ian Goodfellow, Yoshua Bengio and Aaron Courville. When training converges well, I thought the gradient should be near zero, at a local minimum. But the book says it often does ...
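One reason this is not a contradiction (my summary of the book's optimization chapter, hedged accordingly): the *expected* gradient can shrink while the norm of individual minibatch gradients does not, because
\begin{equation}
\mathbb{E}\left[\|g\|^{2}\right]=\|\mathbb{E}[g]\|^{2}+\operatorname{tr}(\operatorname{Cov}[g]),
\end{equation}
so even as $\mathbb{E}[g] \to 0$ the minibatch-noise term keeps $\mathbb{E}[\|g\|^2]$ bounded away from zero, and in practice it can even grow over training while the loss steadily decreases.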
1 vote
0 answers
206 views
How to estimate the gradient of an argmin loss
Suppose we have a neural network $f_\theta(x)$, where $x$ is the input and $\theta$ is the network's parameters. For each $\theta$, we can minimize $f_\theta(x)$ w.r.t. $x$ and obtain the minimum ...
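A standard sketch (assuming $x^{\star}(\theta)=\arg\min_{x} f_{\theta}(x)$ is an interior, non-degenerate minimizer and $f$ is smooth): for the minimum *value* $g(\theta)=f_{\theta}(x^{\star}(\theta))$, the envelope theorem gives
\begin{equation}
\nabla_{\theta}\, g(\theta)=\left.\frac{\partial f_{\theta}(x)}{\partial \theta}\right|_{x=x^{\star}(\theta)},
\end{equation}
because the chain-rule term through $x^{\star}$ vanishes: $\nabla_{x} f_{\theta}(x^{\star})=0$. If the loss instead depends on the *minimizer* itself, the implicit function theorem applied to $\nabla_{x} f_{\theta}(x)=0$ yields
\begin{equation}
\frac{\partial x^{\star}}{\partial \theta}=-\left.\left(\nabla_{x x}^{2} f_{\theta}\right)^{-1} \nabla_{x \theta}^{2} f_{\theta}\right|_{x=x^{\star}(\theta)}.
\end{equation}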