
In gradient descent for neural networks, we optimize over a loss surface defined by our loss function L(W) where W represents the network weights. However, since there are infinitely many possible weight configurations, we can never compute or store the complete geometric surface of this loss function.

This raises a question: What exactly are we optimizing over if we only ever compute point-wise evaluations of the loss? How can we meaningfully talk about descending a surface that we never fully construct?

I understand that at each step we can:

  1. Compute the loss at our current weights
  2. Compute the gradient at that point
  3. Take a step in the direction of steepest descent

But I'm struggling to understand the geometric/mathematical meaning of optimizing over an implicit surface that we never fully realize. What is the theoretical foundation for this?
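For concreteness, the three steps above can be sketched on a toy quadratic loss (everything here is illustrative, not any particular framework's API):

```python
import numpy as np

# Hypothetical toy loss L(W) = ||W - W_star||^2 with known minimizer W_star.
# We never build the surface; we only evaluate L and its gradient at points.
W_star = np.array([3.0, -1.0])

def loss(W):
    return np.sum((W - W_star) ** 2)

def grad(W):
    return 2 * (W - W_star)

W = np.zeros(2)           # current weights
lr = 0.1                  # step size
for _ in range(100):
    L = loss(W)           # step 1: loss at the current weights
    g = grad(W)           # step 2: gradient at that point
    W = W - lr * g        # step 3: step in the steepest-descent direction

print(loss(W))            # near 0: we descended without storing the surface
```

Note that the loop only ever touches finitely many points of the surface, yet it converges to the minimizer.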


1 Answer
We never see or store the whole loss surface, which may be an arbitrarily complex landscape; gradient descent relies only on the local geometry of the loss function. At each iterate $W$, the gradient $\nabla L(W)$ defines a first-order (linear) approximation of $L$ near $W$, and the update $W \leftarrow W - \eta \nabla L(W)$ moves downhill on that local model. By iterating these local linear approximations, gradient descent "descends" the surface without ever constructing it globally and explicitly. This idea is rooted in basic calculus and optimization theory: we optimize a function by following its local first-order derivatives rather than by examining the entire function at once.
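A quick numerical check of this local, first-order picture (the function below is a hypothetical toy, not a real network loss): near the current point, $L(W + \Delta W) \approx L(W) + \nabla L(W) \cdot \Delta W$, so a small steepest-descent step behaves almost exactly as the linear model predicts.

```python
import numpy as np

# Illustrative nonconvex "loss" and its gradient (assumed for the demo).
def loss(W):
    return np.sin(W[0]) + W[1] ** 2

def grad(W):
    return np.array([np.cos(W[0]), 2 * W[1]])

W = np.array([1.0, 2.0])
g = grad(W)
lr = 1e-3
step = -lr * g                    # small steepest-descent step

actual = loss(W + step)           # true loss after the step
predicted = loss(W) + g @ step    # first-order (linear) prediction
print(abs(actual - predicted))    # small: the linear model is accurate locally
print(actual < loss(W))           # the step really does decrease the loss
```

The approximation error is second order in the step size, which is why small steps on a smooth loss reliably decrease it even though we know nothing about the surface elsewhere.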

Of course, descending this way can get stuck in shallow suboptimal local minima or at saddle points, which is the case for many deep learning models with complex non-convex loss functions $L(W)$. In practice we use stochastic gradient descent (whose gradient noise helps escape such points) and many other optimization techniques to mitigate this issue. In theory, gradient descent is guaranteed to reach the global minimum only for convex functions.
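A minimal illustration of that basin dependence, using a hypothetical one-dimensional nonconvex loss (not a real network): the same update rule ends at different minima depending only on where it starts.

```python
# Illustrative nonconvex loss with one shallow and one deep minimum.
def loss(x):
    return x**4 - 3 * x**2 + x

def grad(x):
    return 4 * x**3 - 6 * x + 1

def descend(x, lr=0.01, steps=2000):
    for _ in range(steps):
        x -= lr * grad(x)         # plain (full-batch) gradient descent
    return x

x_right = descend(2.0)            # settles in the shallow local minimum
x_left = descend(-2.0)            # settles in the deeper (global) minimum
print(loss(x_right) > loss(x_left))  # True: different basins, different quality
```

Both runs follow the local gradient faithfully; neither run "knows" the other basin exists, which is exactly why stochasticity and other tricks are used on non-convex losses.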

