Higher dimensional total derivative

Question

I'm teaching myself some higher dimensional calculus, and I am currently stuck on the definition of the total derivative. The book I'm using is Principles of Mathematical Analysis by Walter Rudin, and he defines the derivative as

Suppose $E$ is an open set in $\mathbb{R}^n$, $f$ maps $E$ into $\mathbb{R}^m$, and $x \in E$. If there exists a linear transformation $A$ of $\mathbb{R}^n$ into $\mathbb{R}^m$ such that $$ \lim_{h \to 0} \frac{|f(x + h) - f(x) - Ah|}{|h|} = 0, $$

then we say that $f$ is differentiable at $x$ and we write

$$ f'(x) = A. $$

My confusion stems from the $Ah$ term, for two reasons: other texts write this term as $L(v)$ for some linear function $L$, and Rudin also writes that "since $A \in L(\mathbb{R}^n, \mathbb{R}^m)$, $Ah \in \mathbb{R}^m$". Both indicate that $A$ takes $h$ as an argument. This doesn't make sense to me: if $f$ were also linear, the limit would imply that the linear function $A$ is $f$ itself:

If we write $A(h)$ to emphasise that $A$ is a function of $h$, then by the linearity of $f$ the numerator of the limit becomes $$ f(x + h) - f(x) - A(h) = f(h) - A(h) $$ so that the limit is zero precisely when $A = f$. However, the derivative of a linear function shouldn't be the same linear function! What is the correct way to understand this definition? And how might I go about finding the derivative of an arbitrary function? For example, what is the derivative of the function $$ g : \mathbb{R}^2 \to \mathbb{R}, g(x, y) = xy $$ according to this definition?

In this interpretation of the derivative, the derivative of the linear function $x \mapsto A x$ is that same function, but note that this statement implicitly uses the canonical identification of a vector space $V$ with the tangent space $T_x V$ (for any fixed element $x \in V$). Informally, the derivative of $f$ at $x$ is the linear map that best approximates $f$ at that point, so if $f$ is already linear, it is its own best linear approximation.
– Travis Willse, Commented Oct 22, 2017 at 14:29
Regarding your comment "however, the derivative of a linear function shouldn't be the same linear function": based on this definition, what is the derivative of the map $\mathbb R\to\mathbb R$, with $x\mapsto 5x$?
– Aweygan, Commented Oct 22, 2017 at 14:29
To directly answer your question about $g$: its derivative at $(x,y)$ is the linear map $A_{x,y}: \mathbb{R}^{2} \to \mathbb{R}$ given by the $1\times 2$ matrix $\begin{pmatrix}y & x\end{pmatrix}$.
– preferred_anon, Commented Aug 22, 2018 at 13:39

Cronus · Accepted Answer · 2018-08-22 13:44:50Z

It seems to me there are two things which confuse you:

$A$ is considered occasionally as a matrix and occasionally as a linear function. As a function, it assigns to a vector $h$ in $\mathbb{R}^n$ the vector $A\cdot h$ in $\Bbb{R}^m$ (where in the term "$A\cdot h$" we think of $A$ as a matrix and of the multiplication as matrix multiplication).
The total derivative is not (exactly) the same as the ordinary derivative. As you said, it is usually considered as a linear function rather than a matrix. The total derivative of a linear function $f$ is $f$ itself. However, as you've noticed, we can also think of it as a matrix. More on that below.

If $f$ is a function from $\mathbb{R}^n$ to $\Bbb{R}^m$, its total derivative, which I prefer to denote by $Df$, gives you - at any given point in $\Bbb{R}^n$ - a linear function from $\Bbb{R}^n$ to $\Bbb{R}^m$. So if $f:\Bbb{R}\to\Bbb{R}$ is an ordinary real function, $Df(x)$ is a linear function from $\Bbb{R}$ to $\Bbb{R}$ for any $x\in \Bbb{R}$. But this linear function is very much related to the value of the ordinary derivative of $f$ as in ordinary calculus. If we denote by $f'$ the usual derivative of $f$, one can see that $Df(x)$ is in fact the linear function $t\mapsto f'(x)t$.
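As a quick check of the last claim on an example that is not part of the original answer, take $f(x)=x^2$, so that $f'(x)=2x$ and the candidate for $Df(x)$ is $t\mapsto 2xt$. Plugging this into the limit from the definition gives

$$ \frac{|f(x+t)-f(x)-2xt|}{|t|} = \frac{|x^2+2xt+t^2-x^2-2xt|}{|t|} = \frac{t^2}{|t|} = |t| \to 0 \quad (t\to 0), $$

so the linear function $t\mapsto 2xt$ does satisfy the definition at every $x\in\Bbb{R}$.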

In fact, both of these points amount to the same thing: identifying a linear function from $\mathbb{R}^n$ to $\Bbb{R}^m$ with an $m\times n$ real matrix. For any linear function $f$ from $\mathbb{R}^n$ to $\Bbb{R}^m$ there is precisely one $m\times n$ real matrix $A$ such that $$f(v)=A\cdot v$$ for every $v\in\mathbb{R}^n$, and one very often denotes the function $f$ by $A$ as well and identifies these two objects. This explains the first point which confused you, but also the second: if $f:\Bbb{R}\to\Bbb{R}$ is a real function, then the matrix associated to its total derivative at a point $x$ is a $1\times 1$ real matrix whose single entry is precisely $f'(x)$ (the ordinary derivative of $f$ at $x$).

I hope this clears it up a bit... I didn't prove everything I wrote. The things I didn't prove can be good exercises, though.

I am not sure if Rudin is the best book for self study. I think Analysis on Manifolds by Munkres is a lot better: he explains everything thoroughly, and probably explains in more detail everything I wrote above. If you don't have a teacher (and maybe even if you do), it's best to study from a book like that.

Paul Frost · Accepted Answer · 2018-08-22 13:48:34Z

Quotation from Rudin's book "9.4 Definitions":

Note that one often writes $Ax$ instead of $A(x)$ if $A$ is linear.

This means your interpretation that $A$ is taking $h$ as an argument is correct.

Given a function $f : E \to \mathbb{R}^m$, where $E \subset \mathbb{R}^n$ is open, the derivative of $f$ at $x \in E$ (if it exists) is a linear transformation $f'(x) \in L(\mathbb{R}^n,\mathbb{R}^m)$. It may be understood as the best linear approximation of $f$ at $x$. What does this mean? A linear approximation of $f$ at $x$ is a function having the form $l_A(x') = f(x) + A(x' - x)$ with a linear $A$. Obviously we have $f(x) = l_A(x)$. To be a linear approximation of $f$ we require that the absolute error $(f(x') - l_A(x')) \to 0$ as $x' \to x$. It is easy to see that $f$ has a linear approximation at $x$ if and only if $f$ is continuous at $x$. In this case any $A$ will do. However, linear approximations in this sense are in general poor. We can do better if we require that the relative error

$$\frac{|f(x') - l_A(x')|}{|x'-x|} \to 0$$

as $x' \to x$. If this is satisfied, we get a really good linear approximation of $f$ by $l_A$, in fact the best possible one. This means that if $B \ne A$, then $l_B$ does not have the above relative error property.
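The difference between a map with the relative error property and one without it can be seen numerically. The following is only an illustrative sketch in plain Python, using the function $g(x,y)=xy$ from the question at the point $(2,3)$. The helper name `relative_error`, the sample point, the comparison map `B` and the step sizes are all choices made here for illustration.

```python
import math

def g(x, y):
    # Example function from the question: g(x, y) = x * y
    return x * y

def relative_error(f, point, A, h):
    """Return |f(p + h) - f(p) - A h| / |h| for f: R^2 -> R.

    A is a linear map R^2 -> R stored as a pair (a1, a2), so A h = a1*h1 + a2*h2.
    """
    x, y = point
    h1, h2 = h
    linear_part = A[0] * h1 + A[1] * h2
    return abs(f(x + h1, y + h2) - f(x, y) - linear_part) / math.hypot(h1, h2)

point = (2.0, 3.0)
A = (3.0, 2.0)  # candidate derivative (y, x) evaluated at (2, 3)
B = (3.0, 2.5)  # a different linear map, chosen arbitrarily for comparison

for k in range(1, 6):
    t = 10.0 ** (-k)
    h = (t, t)  # approach the point along the diagonal direction
    print(t, relative_error(g, point, A, h), relative_error(g, point, B, h))
```

The second printed column shrinks in proportion to the step size, while the third settles near a nonzero constant, which is what one expects if only $A$ has the relative error property.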

The derivative of a linear transformation $A : \mathbb{R}^n \to \mathbb{R}^m$ is in fact $A'(x) = A$ at any $x \in \mathbb{R}^n$. This is not at all surprising if you think about the best linear approximation of $A$.
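This can also be checked directly against the definition. For a linear $A$ and any $h \ne 0$,

$$ \frac{|A(x+h) - A(x) - Ah|}{|h|} = \frac{|Ax + Ah - Ax - Ah|}{|h|} = 0, $$

so the limit is trivially $0$ and the linear transformation required by the definition is $A$ itself, at every point $x$.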

To compute the derivative of a function $f$ (without guessing it), you need additional methods. In Rudin's book you will find them in "9.16 Partial derivatives".

However, for your function $g$ you can easily verify that $g'(x,y)$ as given in Daniel Littlewood's comment is the derivative of $g$ at $(x,y)$.
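One way to carry out that verification, sketched here with the increment written as $h=(h_1,h_2)$ (notation introduced only for this computation), is to take the candidate linear map $h \mapsto y h_1 + x h_2$ and expand:

$$ g(x+h_1,\,y+h_2) - g(x,y) - (y h_1 + x h_2) = (x+h_1)(y+h_2) - xy - y h_1 - x h_2 = h_1 h_2. $$

Since $|h_1 h_2| \le \tfrac12 (h_1^2 + h_2^2) = \tfrac12 |h|^2$, it follows that

$$ \frac{|g(x+h_1,\,y+h_2) - g(x,y) - (y h_1 + x h_2)|}{|h|} \le \frac{|h|}{2} \to 0 \quad (h \to 0), $$

which is exactly the limit in Rudin's definition.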

user403337 · Accepted Answer · 2018-08-22 19:14:46Z

At each $x\in\mathbb R^n$ we get $df_x: \mathbb R^n\to \mathbb R^m$ given by the Jacobian matrix of first partial derivatives: $df_x=(a_{ij})$, where $a_{ij}=\frac{\partial f_i}{\partial x_j}$.

In your example ($g:\mathbb R^2\to\mathbb R$, $g(x,y)=xy$), we have the $1\times 2$ matrix $dg_{(x,y)}=\begin{pmatrix}g_x & g_y\end{pmatrix}=\begin{pmatrix}y & x\end{pmatrix}$.
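With the index convention $a_{ij}=\frac{\partial f_i}{\partial x_j}$, the Jacobian of $g$ has one row and two columns. A rough numerical cross-check is to approximate the partial derivatives by central differences. The sketch below is plain Python. The function name `jacobian`, the step size `eps` and the sample point are assumptions made for illustration, not part of the answer.

```python
def jacobian(f, x, eps=1e-6):
    """Approximate the m x n Jacobian of f: R^n -> R^m at x by central differences.

    f takes a list of n floats and returns a list of m floats.
    """
    n = len(x)
    columns = []
    for j in range(n):
        forward = list(x)
        backward = list(x)
        forward[j] += eps
        backward[j] -= eps
        f_plus, f_minus = f(forward), f(backward)
        # columns[j][i] approximates the partial derivative of f_i with respect to x_j
        columns.append([(p - q) / (2 * eps) for p, q in zip(f_plus, f_minus)])
    m = len(columns[0])
    # Transpose so that rows are indexed by i (output component) and columns by j
    return [[columns[j][i] for j in range(n)] for i in range(m)]

def g(v):
    x, y = v
    return [x * y]  # g: R^2 -> R, returned as a length-1 list

J = jacobian(g, [2.0, 3.0])
print(J)  # one row, two columns; entries should be close to y = 3 and x = 2
```

A finite-difference check like this only suggests what the matrix of partial derivatives is. It does not by itself prove differentiability in the sense of the definition.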

It is well known (and easy to prove) that the derivative of a linear transformation is itself. (The derivative is the best linear approximation of a function, so in the case of a linear transformation, the approximation is exact.)

Finally, for the total derivative, this is neatly done from the point of view of differential geometry. For $Tf$ (where $T$ is in fact a functor) we have a commutative diagram: $$ \require{AMScd}\begin{CD}TM @>Tf>>TN\\ @VVV@VVV\\ M@>f>>N \end{CD} $$ where $M$ and $N$ are manifolds. As a reference, I would recommend Spivak's Comprehensive Introduction to Differential Geometry.

user26872 · Accepted Answer · 2018-08-24 13:45:46Z

Consider one-dimensional calculus. A linear function is of the form $f(x) = a x$. We can think of $a$ as the linear operator defined by $a(x) = a x$. Thus, $f(x) = a(x)$ or $f = a$. Note that $f'(x) = a = f$, that is, the derivative of $f$ evaluated at $x$ is the (bare) linear operator $a=f$, but that $f'(x) \ne f(x)$.
