I'm teaching myself some higher-dimensional calculus, and I am currently stuck on the definition of the total derivative. The book I'm using is Principles of Mathematical Analysis by Walter Rudin, and he defines the derivative as follows:
Suppose $E$ is an open set in $\mathbb{R}^n$, $f$ maps $E$ into $\mathbb{R}^m$, and $x \in E$. If there exists a linear transformation $A$ of $\mathbb{R}^n$ into $\mathbb{R}^m$ such that $$ \lim_{h \to 0} \frac{|f(x + h) - f(x) - Ah|}{|h|} = 0, $$
then we say that $f$ is differentiable at $x$ and we write
$$ f'(x) = A. $$
My confusion stems from the $Ah$ term, for two reasons: other texts write this term as $L(v)$ for some linear function $L$, and Rudin also writes that "since $A \in L(\mathbb{R}^n, \mathbb{R}^m)$, $Ah \in \mathbb{R}^m$". Both indicate that $A$ takes $h$ as an argument. This doesn't make sense to me: if $f$ is itself linear, then the limit implies that the linear function $A$ is $f$ itself:
If we write $A(h)$ to emphasise that $A$ is a function of $h$, then by the linearity of $f$ the expression inside the norm in the numerator becomes $$ f(x + h) - f(x) - A(h) = f(h) - A(h), $$ so that the limit is zero precisely when $A = f$. However, the derivative of a linear function shouldn't be the same linear function!
What is the correct way to understand this definition? And how might I go about finding the derivative of an arbitrary function? For example, what is the derivative of the function $$ g : \mathbb{R}^2 \to \mathbb{R}, \quad g(x, y) = xy $$ according to this definition?
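To make the last question concrete, here is a sketch of how a candidate could be checked against the definition. The candidate is an assumption on my part (a guess from the partial derivatives, not something taken from Rudin): at a point $(a, b)$, take $A$ to be the $1 \times 2$ matrix $(b \;\; a)$, so that $Ah = b h_1 + a h_2$ for $h = (h_1, h_2)$. Then
$$ g(a + h_1, b + h_2) - g(a, b) - Ah = (a + h_1)(b + h_2) - ab - b h_1 - a h_2 = h_1 h_2, $$
and since $|h_1 h_2| \le \tfrac{1}{2}(h_1^2 + h_2^2) = \tfrac{1}{2}|h|^2$,
$$ \frac{|h_1 h_2|}{|h|} \le \frac{|h|}{2} \to 0 \quad \text{as } h \to 0. $$
If this is right, the candidate satisfies the limit at every point $(a, b)$, and the matrix $A$ changes with the point $(a, b)$ while acting linearly on $h$.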