Explaining Directional derivatives

Question

I'm trying to understand the concept of the directional derivative, from the perspective of my multivariable calculus textbook. I've typed out a summary of the explanation, with the questions I couldn't answer in boldface. Any intuitive answers, geometrical answers, physical answers are welcome. Formal, rigorous answers are also welcome. Partial explanations (answering only one of the questions etc) are also very welcome!

Consider the problem of calculating the rate of change of $\phi$ in some particular direction. For an infinitesimal vector displacement $d \mathbf{r},$ forming its scalar product with $\nabla \phi$ we obtain $$ \begin{aligned} \nabla \phi \cdot d \mathbf{r} &=\left(\mathbf{i} \frac{\partial \phi}{\partial x}+\mathbf{j} \frac{\partial \phi}{\partial y}+\mathbf{k} \frac{\partial \phi}{\partial z}\right) \cdot(\mathbf{i} d x+\mathbf{j} d y+\mathbf{k} d z) \\ &=\frac{\partial \phi}{\partial x} d x+\frac{\partial \phi}{\partial y} d y+\frac{\partial \phi}{\partial z} d z \\ &=d \phi \end{aligned} $$ which is the infinitesimal change in $\phi$ in going from position $\mathbf{r}$ to $\mathbf{r}+d \mathbf{r}.$ In particular, if $\mathbf{r}$ depends on some parameter $u$ such that $\mathbf{r}(u)$ defines a space curve, then the total derivative of $\phi$ with respect to $u$ along the curve is simply $$ \frac{d \phi}{d u}=\nabla \phi \cdot \frac{d \mathbf{r}}{d u}. $$

**Question 1:** How did we get this? Should I just divide both sides of $\nabla \phi \cdot d \mathbf{r} = d\phi$ by $du$? I don't even know if that's a valid operation.

In the particular case where the parameter $u$ is the arc length $s$ along the curve, the total derivative of $\phi$ with respect to $s$ along the curve is given by $$ \frac{d \phi}{d s}=\nabla \phi \cdot \hat{\mathbf{t}} $$ where $\hat{\mathbf{t}}$ is the unit tangent to the curve at the given point.

**Question 2:** Then why isn't $\frac{d \phi}{d s} = 0$? Surely $\nabla \phi$ is perpendicular/tangent to the surface of $\phi$, so it will be perpendicular to $\hat{\mathbf{t}}$!

In general, the rate of change of $\phi$ with respect to the distance $s$ in a particular direction $\mathbf{a}$ is given by $$ \frac{d \phi}{d s}=\nabla \phi \cdot \hat{\mathbf{a}} $$ and is called the directional derivative.

**Question 3 (most burning question):** I have no idea how to obtain or understand the above result, or why it holds. Also, am I to think $\nabla \phi \cdot \hat{\mathbf{a}} = \nabla \phi \cdot \hat{\mathbf{t}}?$

Since $\hat{\mathbf{a}}$ is a unit vector we have $$ \frac{d \phi}{d s}=|\nabla \phi| \cos \theta $$ where $\theta$ is the angle between $\hat{\mathbf{a}}$ and $\nabla \phi$. Clearly $\nabla \phi$ lies in the direction of the fastest increase in $\phi$ and $|\nabla \phi|$ is the largest possible value of $d \phi / d s$.

**Question 4:** I get that the largest possible value of $d \phi / d s$ is when $\theta = 0$, which is the direction of $\nabla \phi$, but why does largest $\frac{d \phi}{d s}$ imply direction of fastest increase of $\phi$?

Vercassivelaunos · Accepted Answer · 2020-09-03 11:56:18Z

I think the best way to understand the formulae for the directional derivative is to understand the total derivative, which is the "best" generalization of the derivative in single variable calculus. A function $f:\mathbb R^n\to \mathbb R^m$ is called totally differentiable at $x_0$ if there is a linear map $L:\mathbb R^n\to \mathbb R^m$ such that $f(x)\approx f(x_0)+L(x-x_0)$ for $x$ near $x_0$. The specific definition of $\approx$ isn't too important right now.

This linear map $L$ is called the (total) differential of $f$ at $x_0$. Most of the important concepts in multivariable calculus boil down to the total differential. The Jacobian of a function is the matrix representation of the total differential. For a scalar-valued function ($m=1$), that matrix is a single row: the transpose of the gradient. And in single variable calculus, the matrix representation has just one entry, which is the 1d derivative. Now for unambiguous notation, we write the total differential of $f$ at $x_0$ as $\mathrm Df(x_0)$. We will need this notation to generalize the chain rule: if $f$ and $g$ are differentiable functions that can be composed, then $f\circ g$ is also differentiable and it holds that

$$\mathrm D(f\circ g)(x)=\mathrm Df(g(x))\mathrm Dg(x).$$

Replace $\mathrm Df=f'$ and $\mathrm Dg=g'$ to obtain the 1d chain rule. Now all of your formulae are applications of this generalized chain rule. The directional derivative of $\varphi$ along the path $\mathbf r$ is the derivative of $\varphi\circ\mathbf r$, that is

$$\mathrm D(\varphi\circ \mathbf r)=\mathrm D\varphi(\mathbf r)\,\mathrm D\mathbf r.$$

With $\mathrm D\varphi=\nabla\varphi$ (written as a row vector) and $\mathrm D\mathbf r=\partial_u\mathbf r$ you obtain all your formulae. Just choose an appropriate parametrization of the path $\mathbf r$.
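To make the matrix product concrete, here is a sketch of the same chain rule written out in components. It assumes a scalar field on $\mathbb R^3$ and a path $\mathbf r(u)=(x(u),y(u),z(u))$; the component names are only illustrative notation.

$$\frac{d}{du}\,\varphi(\mathbf r(u))=\begin{pmatrix}\dfrac{\partial\varphi}{\partial x} & \dfrac{\partial\varphi}{\partial y} & \dfrac{\partial\varphi}{\partial z}\end{pmatrix}\begin{pmatrix}\dfrac{dx}{du}\\[4pt] \dfrac{dy}{du}\\[4pt] \dfrac{dz}{du}\end{pmatrix}=\frac{\partial\varphi}{\partial x}\frac{dx}{du}+\frac{\partial\varphi}{\partial y}\frac{dy}{du}+\frac{\partial\varphi}{\partial z}\frac{dz}{du}=\nabla\varphi\cdot\frac{d\mathbf r}{du}.$$

If the parameter is the arc length $s$, then $\left|\dfrac{d\mathbf r}{ds}\right|=1$, so $\dfrac{d\mathbf r}{ds}$ is the unit tangent $\hat{\mathbf t}$ and the right-hand side becomes $\nabla\varphi\cdot\hat{\mathbf t}$.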

Now to your questions.

Question 1: You get this by the chain rule as mentioned above.

Question 2: $\mathbf t$ is tangent to the path, but how the path lies relative to the equipotential surfaces of $\varphi$ is mentioned nowhere. It could be tangent, in which case the directional derivative would in fact be $0$. But it doesn't have to be.
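A small worked example may help here. It is only an illustration: the field $\varphi(x,y)=x^2+y^2$ and the two paths are my own choices (in two dimensions for brevity), not taken from the textbook.

$$\begin{aligned}
\text{Path along a level curve: }\ &\mathbf r(s)=(\cos s,\ \sin s), & \nabla\varphi&=(2\cos s,\ 2\sin s), & \hat{\mathbf t}&=(-\sin s,\ \cos s), & \frac{d\varphi}{ds}&=\nabla\varphi\cdot\hat{\mathbf t}=0,\\
\text{Path across level curves: }\ &\mathbf r(s)=(s,\ 0),\ s>0, & \nabla\varphi&=(2s,\ 0), & \hat{\mathbf t}&=(1,\ 0), & \frac{d\varphi}{ds}&=\nabla\varphi\cdot\hat{\mathbf t}=2s\neq 0.
\end{aligned}$$

The gradient is perpendicular to the level curves of $\varphi$ in both cases. Only the first path happens to run along a level curve, so only there is $\hat{\mathbf t}$ perpendicular to $\nabla\varphi$.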

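Question 3: You choose the straight-line path $\mathbf r(u)=\mathbf x_0+u\hat{\mathbf a}$, and then $\partial_u\mathbf r=\hat{\mathbf a}$. Because $\hat{\mathbf a}$ is a unit vector, the parameter $u$ is exactly the distance $s$ travelled along the line. The rest is the chain rule. And yes, for this specific path, the unit tangent vector $\hat{\mathbf t}$ is exactly $\hat{\mathbf a}$.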
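If it helps to see this numerically, the following is a minimal sketch that compares a finite-difference derivative along the straight-line path with the gradient formula. The field `phi`, its hand-computed gradient `grad_phi`, the point `x0`, and the direction `a` are all hypothetical choices made for illustration.

```python
import math

def phi(x, y, z):
    # Hypothetical example field, chosen only for illustration.
    return x * x * y + math.sin(z)

def grad_phi(x, y, z):
    # Gradient of the example field above, computed by hand.
    return (2 * x * y, x * x, math.cos(z))

def directional_derivative_fd(f, point, direction, h=1e-6):
    # Central difference of u -> f(point + u * direction) at u = 0.
    plus = f(*(p + h * c for p, c in zip(point, direction)))
    minus = f(*(p - h * c for p, c in zip(point, direction)))
    return (plus - minus) / (2 * h)

x0 = (1.0, 2.0, 0.5)
a = (1.0, 2.0, 2.0)

# Normalise a so that the path parameter is the distance travelled.
norm = math.sqrt(sum(c * c for c in a))
a_hat = tuple(c / norm for c in a)

numeric = directional_derivative_fd(phi, x0, a_hat)
analytic = sum(g * c for g, c in zip(grad_phi(*x0), a_hat))

print(numeric, analytic)
```

The two printed values should agree up to the finite-difference error. That is the statement $\frac{d\varphi}{ds}=\nabla\varphi\cdot\hat{\mathbf a}$ checked at a single point.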

Question 4: That's what the directional derivative is: a measure of the rate of change of $\varphi$ per unit distance in a certain direction. According to the formula $|\nabla\varphi|\cos\theta$, it is largest in the direction of the gradient vector, because then $\cos\theta=1$. So the gradient vector points in the direction of fastest increase.
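As a rough numerical illustration, one can sample many unit directions and see which one gives the largest directional derivative. The sketch below assumes a hypothetical two-dimensional field $\varphi(x,y)=x^2+3y$, whose gradient at $(1,1)$ is $(2,3)$. The grid of angles is an arbitrary choice.

```python
import math

# Gradient of the hypothetical field phi(x, y) = x**2 + 3*y at the point (1, 1).
grad = (2.0, 3.0)

# Sample unit directions (cos t, sin t) on a uniform grid of angles.
n = 3600
angles = [2 * math.pi * k / n for k in range(n)]

# Directional derivative grad . a_hat for each sampled direction.
rates = [grad[0] * math.cos(t) + grad[1] * math.sin(t) for t in angles]

best_rate = max(rates)
best_angle = angles[rates.index(best_rate)]

grad_norm = math.hypot(*grad)             # |grad phi|
grad_angle = math.atan2(grad[1], grad[0])  # angle of the gradient vector

print(best_angle, grad_angle)
print(best_rate, grad_norm)
```

Up to the grid resolution, the best sampled direction lines up with the gradient and the best rate approaches $|\nabla\varphi|$ without exceeding it.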

I know comments aren't supposed to be used for saying things like this, but this is an amazing answer! Thanks!
– Albert, Commented Sep 3, 2020 at 22:42
