I have a question about the explanation of the universal approximation theorem given on Wikipedia: https://en.wikipedia.org/wiki/Universal_approximation_theorem#cite_note-:0-29
After stating that a network with one hidden layer can approximate $f$ by some $g$ with \begin{align} \sup_{x\in K} \|f(x) - g(x)\| < \epsilon, \\ \text{where} \quad g(x) = C \cdot [\sigma \circ (A \cdot x + b)], \end{align}
it goes on to say that
an $f$ can also be approximated by a network of greater depth by using the same construction for the first layer and approximating the identity function with later layers.
I don't understand how to interpret this. After the first layer, the network already outputs $g$, which is an approximation of $f$, right? Why do the later layers approximate the identity function? Wouldn't the network just keep outputting the same $g$, without the approximation getting any better?
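To make the quoted construction concrete, here is a minimal numerical sketch of one possible reading of it. Everything in it is an assumption for illustration, not something taken from the Wikipedia article: the activation is $\tanh$, the weights $A$, $b$, $C$ are random rather than fitted to any $f$, and the names `shallow`, `deep`, and `h` are hypothetical. It uses the fact that $\tanh(h z)/h \approx z$ for small $h$, so an extra hidden layer can pass its input through almost unchanged.

```python
import numpy as np

def shallow(x, A, b, C):
    """One-hidden-layer network g(x) = C . tanh(A x + b)."""
    return C @ np.tanh(A @ x + b)

def deep(x, A, b, C, h=1e-3, extra_layers=2):
    """Same first layer as `shallow`, followed by extra tanh layers
    that each approximate the identity map.

    Assumption: sigma = tanh, so tanh(h * z) / h ~ z for small h
    (Taylor expansion tanh(t) = t - t**3 / 3 + ...). In an actual
    network the factors h and 1/h would be folded into the weights
    of the neighbouring affine maps.
    """
    z = np.tanh(A @ x + b)           # first hidden layer, as in g
    for _ in range(extra_layers):
        z = np.tanh(h * z) / h       # approximate identity layer
    return C @ z                     # same output map C as in g

# Hypothetical random weights, only to compare the two networks.
rng = np.random.default_rng(0)
A = rng.normal(size=(8, 3))
b = rng.normal(size=8)
C = rng.normal(size=(1, 8))
x = rng.normal(size=3)

gap = float(np.max(np.abs(deep(x, A, b, C) - shallow(x, A, b, C))))
print("max |deep(x) - shallow(x)| =", gap)
```

If this reading is right, the deeper network reproduces $g$ up to a small extra error controlled by $h$. The later layers would then not be there to improve on $g$, only to show that a deeper network can reach the same $\epsilon$ bound.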