Why do we need the identity function when approximating a function with a neural network with multiple layers?

Question

I have a question about the explanation of the universal approximation theorem given on Wikipedia: https://en.wikipedia.org/wiki/Universal_approximation_theorem#cite_note-:0-29

After constructing a one-hidden-layer network $g$ that approximates $f$ in the sense that \begin{align} \sup_{x\in K} \|f(x) - g(x)\| &< \epsilon, \\ \text{where} \quad g(x) &= C \cdot [\sigma \circ (A \cdot x + b)], \end{align}
it says that

an $f$ can also be approximated by a network of greater depth by using the same construction for the first layer and approximating the identity function with later layers.

I cannot understand how to interpret this. After the first layer, the network outputs $g$, which is already an approximation of $f$, right? Why does the identity function follow here? Doesn't the network just keep outputting the same $g$ without improving its quality?

Stack Exchange Broke The Law · Accepted Answer · 2023-04-24 22:02:20Z

This says that if you can approximate a function with one layer, you can also approximate it with multiple layers because you can make the extra layers do nothing.

The universal approximation theorem tells you there is at least one way to approximate any continuous function on a compact set $K$, which is the way that puts enough nodes (possibly exponentially many) into one layer. It doesn't tell you whether there's a more efficient way using more layers.
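To make the "extra layers do nothing" idea concrete, here is a minimal sketch in plain Python. It is not the construction from the Wikipedia proof, just an illustration under some assumptions: the activation is tanh, the input is a scalar, and the weights `A`, `b`, `C`, the step `eps`, and the function names are all made up for the example. Each extra layer uses the fact that tanh(eps * h) / eps is close to h when eps is small, so it approximately copies its input forward.

```python
import math

# Hypothetical weights of a small one-hidden-layer network (illustrative values only).
A = [1.0, -2.0, 0.5]   # input-to-hidden weights
b = [0.0, 0.5, -1.0]   # hidden biases
C = [0.7, -0.3, 1.2]   # hidden-to-output weights


def hidden_layer(x, A, b):
    """First layer: sigma(A * x + b) with sigma = tanh, for a scalar input x."""
    return [math.tanh(a * x + bi) for a, bi in zip(A, b)]


def shallow(x, A, b, C):
    """One-hidden-layer network g(x) = C . tanh(A * x + b)."""
    return sum(c * h for c, h in zip(C, hidden_layer(x, A, b)))


def approx_identity_layer(h, eps):
    """Extra tanh layer that approximately copies its input.

    tanh(eps * v) / eps = v - (eps**2) * v**3 / 3 + ..., so the error shrinks
    as eps -> 0. In a real network the 1 / eps factor would be folded into
    the weights of the following layer.
    """
    return [math.tanh(eps * v) / eps for v in h]


def deep(x, A, b, C, extra_layers=3, eps=1e-3):
    """Same first layer, then `extra_layers` near-identity layers, then C."""
    h = hidden_layer(x, A, b)
    for _ in range(extra_layers):
        h = approx_identity_layer(h, eps)
    return sum(c * v for c, v in zip(C, h))


for x in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    gap = abs(deep(x, A, b, C) - shallow(x, A, b, C))
    print(f"x = {x:5.1f}   |deep - shallow| = {gap:.2e}")
```

The deep network is not a better approximation of $f$ than $g$ is; it just reproduces $g$ up to a small error that you control through `eps`. That is all the quoted sentence claims: depth does not hurt. With ReLU the extra layers can even be exact, since $x = \mathrm{ReLU}(x) - \mathrm{ReLU}(-x)$.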
