Consider $f:\mathbb{R}^d\to\mathbb{R}$ and $g:\mathbb{R}\to\mathbb{R}^d$.
It is known that $$ \tag{*} (f\circ g)'(x) = \sum_{i=1}^d \partial_i f(g(x)) \cdot g'(x)^i $$
I would like to prove this from the chain rule for the total derivatives:
$$ \tag{**} D_x(f\circ g) = D_{g(x)}f \circ D_xg $$
I'm not sure how to proceed rigorously here. Intuitively, I know the two total derivatives in the chain rule $(**)$ can be represented as matrices, and their composition will correspond to the multiplication in the desired partial derivative chain rule $(*)$. But I'm not sure how to get there. How does the function composition get converted into a summation and multiplication?
A related issue: the expression $(*)$ is a real number when evaluated at $x$, whereas the expression $(**)$ is a linear map $\mathbb{R}\to\mathbb{R}$. It's not too difficult to understand that the space of linear maps $\mathbb{R}\to\mathbb{R}$ (the dual space of $\mathbb{R}$) is isomorphic to $\mathbb{R}$ itself. But still, it's a bit of a technical stumbling block in getting my desired result.
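To make the identification concrete, here is a sketch of what I have in mind, assuming the usual convention that a linear map $L:\mathbb{R}\to\mathbb{R}$ corresponds to the number $L(1)$, since $L(h) = L(h\cdot 1) = h\,L(1)$:
$$ (f\circ g)'(x) = D_x(f\circ g)(1) = D_{g(x)}f\big(D_xg(1)\big) = D_{g(x)}f\big(g'(x)\big) $$
If that is right, what remains is to expand $D_{g(x)}f$ applied to the vector $g'(x)\in\mathbb{R}^d$ into the sum in $(*)$.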
A note: the derivative language may even be overkill for my question. If instead we had linear functions $S:\mathbb{R}^d \to \mathbb{R}$ and $T:\mathbb{R}\to\mathbb{R}^d$, we could represent the composition $S\circ T$ by
$$ \sum_i S_iT^i $$
where $S_i$ and $T^i$ are somehow the matrix components of these linear transformations. But the same questions remain: how is this correspondence made rigorous? How do we pass from function composition to multiplication and summation, and from a function $\mathbb{R}\to\mathbb{R}$ to a number in $\mathbb{R}$?
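For concreteness, here is my attempt at the linear case. The definitions $S_i := S(e_i)$ and $T(1) = \sum_i T^i e_i$, with $e_1,\dots,e_d$ the standard basis of $\mathbb{R}^d$, are my own guess at what the components should be:
$$ (S\circ T)(t) = S\big(t\,T(1)\big) = t\,S\Big(\sum_{i=1}^d T^i e_i\Big) = t\sum_{i=1}^d T^i S(e_i) = t\sum_{i=1}^d S_iT^i $$
So $S\circ T$ would be multiplication by the number $\sum_i S_iT^i$, but I'm not sure this is the cleanest or most rigorous way to phrase it.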
Note: I suspect I'm looking for an answer that involves inserting projection matrices somewhere, or resolutions of the identity. But I can't figure out exactly what I need...