1
$\begingroup$

I have a question about the explanation of the universal approximation theorem given on Wikipedia: https://en.wikipedia.org/wiki/Universal_approximation_theorem#cite_note-:0-29

It first states the result for one hidden layer, where $f$ is approximated by $g$ with
\begin{align} \sup_{x\in K} \|f(x) - g(x)\| < \epsilon, \\ \text{where} \quad g(x) = C \cdot [\sigma \circ (A \cdot x + b)], \end{align}
and then says that

$f$ can also be approximated by a network of greater depth by using the same construction for the first layer and approximating the identity function with later layers.

I don't understand how to interpret this. The first layer already outputs $g$, which is an approximation of $f$, right? Why does the identity function come after it? Don't the later layers just keep outputting the same $g$ without improving its quality?

$\endgroup$

1 Answer

4
$\begingroup$

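This says that if you can approximate a function with one layer, you can also approximate it with multiple layers, because you can make the extra layers do nothing (that is, act approximately as the identity). You are right that the output stays essentially the same $g$: the extra layers are not meant to improve the approximation, only to show that adding depth does not break it.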
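As a rough illustration, here is a minimal NumPy sketch of that construction. The weights `A`, `b`, `C` are random placeholders (not fitted to any particular $f$), and the helper names `shallow`, `deep_relu`, and `deep_tanh` are made up for this example. With ReLU the extra layers can be the identity exactly, because the first hidden layer's output is already non-negative. With tanh they only approximate it, here by assuming a small scaling factor `eps` and using $\tanh(\epsilon h)/\epsilon \approx h$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder weights for g(x) = C @ sigma(A @ x + b); random, not fitted to any f.
A = rng.normal(size=(8, 3))
b = rng.normal(size=8)
C = rng.normal(size=(2, 8))


def relu(z):
    return np.maximum(z, 0.0)


def shallow(x, sigma):
    """One hidden layer: g(x) = C @ sigma(A @ x + b)."""
    return C @ sigma(A @ x + b)


def deep_relu(x, extra_layers=3):
    """Same first layer, then ReLU layers with identity weights and zero bias."""
    h = relu(A @ x + b)
    identity = np.eye(h.shape[0])
    for _ in range(extra_layers):
        # h is already non-negative, so relu(identity @ h) returns h unchanged.
        h = relu(identity @ h)
    return C @ h


def deep_tanh(x, extra_layers=3, eps=1e-3):
    """Same first layer, then tanh layers that only approximate the identity."""
    h = np.tanh(A @ x + b)
    for _ in range(extra_layers):
        # tanh(eps * h) / eps = h + O(eps**2); in an actual network the 1 / eps
        # factor would be folded into the next layer's weight matrix.
        h = np.tanh(eps * h) / eps
    return C @ h


x = np.array([0.3, -1.2, 0.5])
print(np.max(np.abs(deep_relu(x) - shallow(x, relu))))     # exact identity layers
print(np.max(np.abs(deep_tanh(x) - shallow(x, np.tanh))))  # shrinks as eps -> 0
```

In both cases the deep network reproduces the shallow network's output (exactly for ReLU, up to a small error for tanh), so whatever bound $\epsilon$ held for $g$ carries over, up to that small error, to the deeper network.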

The universal approximation theorem only tells you that there is at least one way to approximate any continuous function on a compact set: put enough nodes, possibly exponentially many, into one hidden layer. It doesn't tell you whether there's a more efficient way using more layers.

$\endgroup$