Questions tagged [gradient]

For questions related to the gradient, the vector that packs together all of the partial-derivative information of a function.
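A minimal numerical sketch of that idea (not part of the tag wiki; the function and names are illustrative): the gradient of f(x, y) = x² + 3y packs both partial derivatives into one vector, and can be approximated by central differences.

```python
# Sketch: the gradient of f(x, y) = x**2 + 3*y packs both partial
# derivatives into one vector, approximated here by central differences.

def f(x, y):
    return x ** 2 + 3 * y

def gradient(f, x, y, h=1e-6):
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return (dfdx, dfdy)

g = gradient(f, 2.0, 1.0)
print(g)  # analytically (2x, 3) = (4.0, 3.0)
```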

5 votes
1 answer
326 views

What is grad_fn for a non-differentiable function like slicing (grad_fn=<SliceBackward0>), ...
Geremia • 577
3 votes
2 answers
152 views

Do computational graphs predate the era of machine learning? If so, who first devised the idea of a computational graph?
Geremia • 577
0 votes
0 answers
82 views

Hi, I have a very simple model and I'm trying to learn the math behind it. Basically, I have an m × n input matrix X. An m × n output matrix Y is formed from some convolution H. The figure of merit is ...
James Li
2 votes
2 answers
129 views

Hi, for my project I'm using a somewhat simple CNN consisting of several convolution and pooling layers. Essentially, the model is trained to perform a blur of sorts on an input image. For my ...
James Li
1 vote
1 answer
109 views

Rational numbers would help alleviate some gradient issues by not losing precision as the weights and the propagated values (signal) reach extremely low and high values. I'm not aware of any hardware ...
Mark_Sagecy
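On the precision point in the question above: Python's `fractions.Fraction` already gives exact rational arithmetic in software, which makes the float-underflow contrast easy to demonstrate (a sketch, not a hardware proposal):

```python
# Floats underflow to zero at very small magnitudes; exact rationals do not.
from fractions import Fraction

tiny = 1e-200
print(tiny * tiny)  # 0.0 -- float64 underflows (smallest subnormal ~5e-324)

tiny_q = Fraction(1, 10 ** 200)
print(tiny_q * tiny_q == Fraction(1, 10 ** 400))  # True -- exact, no underflow
```

The trade-off, of course, is that exact rationals grow unboundedly in size and are far slower than hardware floats, which is presumably why the question asks about hardware support.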
0 votes
0 answers
133 views

I have built an Actor-Critic model with a quantile critic. The aim of the task is to determine the optimal portfolio choice of an agent based on quantiles. The model setup: we have ...
dragonforce
0 votes
0 answers
77 views

I'm trying to visualize how the gradients change as we increase $d_{k}$ in scaled dot-product attention, and to compare it to the non-scaled version, but I'm failing to produce a reasonable graph ...
Daviiid • 595
0 votes
2 answers
229 views

Taking as an example the Advantage Actor Critic, the objective function is: \begin{equation} \nabla_{\boldsymbol{\theta}} J(\boldsymbol{\theta})=\mathbb{E}_{\tau \sim \pi_{\boldsymbol{\theta}}}\left[\...
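For reference, the truncated formula above appears to be the standard advantage actor-critic policy gradient, which in full reads:

```latex
\begin{equation}
\nabla_{\boldsymbol{\theta}} J(\boldsymbol{\theta})
  = \mathbb{E}_{\tau \sim \pi_{\boldsymbol{\theta}}}
    \left[ \sum_{t=0}^{T}
      \nabla_{\boldsymbol{\theta}} \log \pi_{\boldsymbol{\theta}}(a_t \mid s_t)\,
      A^{\pi_{\boldsymbol{\theta}}}(s_t, a_t) \right]
\end{equation}
```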
1 vote
1 answer
103 views

What is the difference between "delta", "gradient", and "error"? Are these names for the same thing? I'm confused because someone once told me that both the names "delta" ...
will The J
0 votes
1 answer
122 views

I have read some resources about AI, and they all speak about the gradient. Is there a book focused on this, maybe with lots of images/diagrams? Cheers
zerunio
1 vote
1 answer
182 views

In the paper "Distilling the Knowledge in a Neural Network", it mentions that "the magnitudes of the gradients produced by the soft targets scale as 1/(T^2)", but it has no ...
Xiong • 11
1 vote
2 answers
103 views

I have worked with feed-forward neural networks and VAEs and understand the backpropagation algorithm. Now I am building a VAE network; one layer of it is a very complex vector-to-vector function $f(x)$ (a general '...
whitegreen
0 votes
1 answer
215 views

When it comes to the concept of "Gradient Norm," it can be challenging to find a widely recognized and clearly defined resource that offers a comprehensive explanation. While many search ...
StudentV
3 votes
0 answers
105 views

This is from the deep learning book by Ian Goodfellow, Yoshua Bengio, and Aaron Courville. When training converges well, I thought the gradient should be at a local minimum. But the book says it often does ...
tesio • 205
1 vote
0 answers
206 views

Suppose we have a neural network $f_\theta(x)$, where $x$ is the input and $\theta$ is the network's parameters. For each $\theta$, we can minimize $f_\theta(x)$ w.r.t. $x$ and obtain the minimum ...
Mingzhou Liu
