I understand why the gradient direction gives the greatest directional derivative, since the directional derivative is the dot product of the direction with the gradient. But I usually see the gradient called the direction of steepest ascent. Why does it specifically indicate ascent, as opposed to, for instance, steepest descent?
- If the directional derivative in the direction $u$ is positive, then $u$ is a direction of ascent. Moving a short distance in the direction $u$ increases the value of your function. (littleO, Apr 1, 2023 at 1:23)
- The negative of the gradient gives the direction of steepest descent. (Ted Shifrin, Apr 1, 2023 at 1:38)
- It's really the direction of fastest increase, which is described as "steepest ascent" because we tend to plot functions with the function value on a "vertical" axis, so when you follow the gradient it looks like you're going uphill; also because people tend to use "higher" as a synonym for "greater". (David K, Apr 1, 2023 at 19:52)
1 Answer
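Locally, a sufficiently smooth function $f:\mathbb{R}^n\to\mathbb{R}$ (twice continuously differentiable, say; for a merely differentiable function the remainder is $o(h)$ and the argument is unchanged) has the first-order Taylor expansion $$ f(x+hu) = f(x) +h(\nabla f(x)\cdot u) + O(h^2), $$ where $u$ is a unit vector in $\mathbb{R}^n$. For sufficiently small $h > 0$, if one wishes to maximize $f(x+hu)$ over unit vectors $u$, then one must maximize $\nabla f(x)\cdot u$. By the Cauchy-Schwarz inequality, $\nabla f(x)\cdot u \le \lVert \nabla f(x)\rVert$, and when $\nabla f(x)\neq 0$ this maximum is achieved exactly at $u = \nabla f(x) / \lVert \nabla f(x)\rVert$.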
If $\nabla f(x)\neq 0$, this direction indicates ascent as opposed to descent because $$ \nabla f(x) \cdot \frac{\nabla f(x)} {\lVert \nabla f(x)\rVert} = \lVert\nabla f(x)\rVert > 0, $$ whereas $$ \nabla f(x) \cdot \left(-\frac{\nabla f(x)} {\lVert \nabla f(x)\rVert}\right) = -\lVert\nabla f(x)\rVert < 0. $$ In other words, a small step along the gradient increases $f$ at the fastest possible rate, and a small step along the negative gradient decreases it at the fastest possible rate.
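A quick numerical sanity check of the argument above may help. The sketch below is only an illustration under stated assumptions: the function $f(x,y) = x^2 + 3y$, the base point $(1, 2)$, the step size $h = 10^{-4}$, and all helper names are hypothetical choices, not part of the original question. It estimates $(f(x+hu)-f(x))/h$ for 360 unit directions and compares the best one with the gradient direction, then checks the signs along $\pm\nabla f/\lVert\nabla f\rVert$.

```python
import math

def f(x, y):
    # Hypothetical example function, chosen only for illustration.
    return x * x + 3.0 * y

def grad_f(x, y):
    # Analytic gradient of the example function above.
    return (2.0 * x, 3.0)

def directional_change(x, y, u, h=1e-4):
    # Finite-difference estimate of (f(x + h*u) - f(x)) / h for a unit vector u.
    return (f(x + h * u[0], y + h * u[1]) - f(x, y)) / h

# Assumed base point for the experiment.
x0, y0 = 1.0, 2.0
g = grad_f(x0, y0)
norm_g = math.hypot(g[0], g[1])

# Unit vectors along the gradient and along its negative.
u_up = (g[0] / norm_g, g[1] / norm_g)
u_down = (-u_up[0], -u_up[1])

# Scan unit directions in one-degree steps and record the estimated rate of change.
angles = [2.0 * math.pi * k / 360 for k in range(360)]
rates = [directional_change(x0, y0, (math.cos(t), math.sin(t))) for t in angles]
best_angle = angles[rates.index(max(rates))]
grad_angle = math.atan2(g[1], g[0])

rate_up = directional_change(x0, y0, u_up)
rate_down = directional_change(x0, y0, u_down)

print("angle of best scanned direction:", best_angle)
print("angle of the gradient:          ", grad_angle)
print("rate along +gradient:", rate_up, " norm of gradient:", norm_g)
print("rate along -gradient:", rate_down)
```

With these choices, the best scanned direction should land within one grid step of the gradient's angle, the rate along the gradient should be close to $\lVert\nabla f\rVert$, and the rate along the negative gradient should be close to $-\lVert\nabla f\rVert$, up to the $O(h)$ finite-difference error.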