
Let $f:\mathbb R^d \rightarrow \mathbb R$ be a convex function and $A \subset \mathbb R^d$ a convex set. We are interested in finding the minimum of $f$ over $A$. We have the gradient of $f$ and we know that the global unique minimum of $f$ is outside of $A$.

Suppose we decide to do gradient descent to find this constrained minimum. We start at a point of $A$ and move in the direction opposite to the gradient at each step (while decreasing the learning rate). At some point this will take us outside of $A$, since the global minimum is outside of $A$. What modification to gradient descent must we apply to guarantee that we stay in $A$ while converging to the minimum of $f$ over $A$?

What I tried: every time a step takes us outside of $A$, we project that point back onto $A$ (in the Euclidean norm) and continue the algorithm from the projected point. Unfortunately, this seems not to converge to the minimum of $f$ over $A$, but instead to the projection of the global minimum onto $A$, which is not necessarily the minimum over $A$.
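(For illustration only, here is what that step looks like in a minimal sketch when $A$ happens to be a box, so that the Euclidean projection is coordinate-wise clipping; the set and the names below are hypothetical, since the question does not specify $A$.)

```python
import numpy as np

# Hypothetical illustration: if A were the box [lo, hi]^d, the Euclidean
# projection onto A would simply be coordinate-wise clipping.
def proj_box(y, lo=-1.0, hi=1.0):
    return np.clip(y, lo, hi)

def step_then_project(x, grad_f, eta):
    y = x - eta * grad_f(x)   # ordinary gradient step (may leave A)
    return proj_box(y)        # pull the point back into A
```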


1 Answer


You can apply projected gradient descent for constrained optimization. It is exactly what you are doing, but you need to change two things:

  • At the end, output the average of the iterates rather than the last point.

  • Choose your learning rate wisely.


Let $\varepsilon > 0$, $T = \left\lceil 4\displaystyle\frac{\mathbf{diam}(\mathcal A)^2\left\|\nabla f\right\|_{\infty}^2}{\varepsilon^2}\right\rceil$ and $\eta = \displaystyle\frac{\mathbf{diam}(\mathcal A)}{\left\|\nabla f\right\|_{\infty}\sqrt{T}}$, where $\left\|\nabla f\right\|_{\infty} := \sup_{x \in \mathcal A}\left\|\nabla f(x)\right\|$. Starting from a point $x^{(0)} \in \mathcal A$, for $t = 1,\ldots, T$ iterate

\begin{align} y^{(t)} &\gets x^{(t-1)} - \eta \nabla f\left(x^{(t-1)}\right)\\ x^{(t)} &\gets \texttt{Proj}_{\mathcal A} \left(y^{(t)}\right) \end{align}

At the end output $$z = \frac1T \sum_{t=0}^{T-1} x^{(t)} \in \mathcal A$$ which lies in $\mathcal A$ because $\mathcal A$ is convex.
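For concreteness, here is a minimal Python sketch of this procedure. The helper names `grad_f` and `proj_A`, and the toy quadratic at the end, are purely illustrative assumptions; in practice you supply your own gradient, the projection onto your set $\mathcal A$, and upper bounds on $\mathbf{diam}(\mathcal A)$ and $\left\|\nabla f\right\|_{\infty}$.

```python
import numpy as np

def projected_gradient_descent(grad_f, proj_A, x0, diam_A, grad_bound, eps):
    """Projected gradient descent with averaged iterates (sketch).

    grad_f     : callable returning the gradient of f at a point
    proj_A     : callable returning the Euclidean projection onto the convex set A
    x0         : starting point, assumed to lie in A
    diam_A     : (an upper bound on) the diameter of A
    grad_bound : (an upper bound on) sup_{x in A} ||grad f(x)||
    eps        : target accuracy
    """
    T = int(np.ceil(4 * (diam_A * grad_bound / eps) ** 2))  # number of iterations
    eta = diam_A / (grad_bound * np.sqrt(T))                 # constant step size
    x = np.asarray(x0, dtype=float)
    avg = np.zeros_like(x)
    for _ in range(T):
        avg += x / T                   # average of x^(0), ..., x^(T-1)
        y = x - eta * grad_f(x)        # gradient step (may leave A)
        x = proj_A(y)                  # project back onto A
    return avg                         # stays in A since A is convex


# Toy illustration: minimize f(x) = ||x - c||^2 over the unit ball,
# with the unconstrained minimum c placed outside the ball.
if __name__ == "__main__":
    c = np.array([2.0, 0.0])
    grad_f = lambda x: 2 * (x - c)
    proj_ball = lambda y: y / max(1.0, np.linalg.norm(y))  # projection onto unit ball
    x0 = np.zeros(2)
    # diam of the unit ball is 2; ||grad f(x)|| <= 2(||x|| + ||c||) <= 6 on the ball
    z = projected_gradient_descent(grad_f, proj_ball, x0,
                                   diam_A=2.0, grad_bound=6.0, eps=0.1)
    print(z)  # should be close to the constrained minimizer [1, 0]
```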

Let $z^*$ be a minimizer of $f$ on $\mathcal A$. Since the Euclidean projection onto the convex set $\mathcal A$ is nonexpansive and $z^* = \texttt{Proj}_{\mathcal A}\left(z^*\right)$,

\begin{align} \left\|x^{(t)} - z^*\right\|^2 & \le \left\|y^{(t)} - z^*\right\|^2\\ &= \left\|x^{(t-1)} - z^* - \eta\nabla f\left(x^{(t-1)}\right)\right\|^2\\ &= \left\|x^{(t-1)} - z^*\right\|^2 + \eta^2 \left\|\nabla f\left(x^{(t-1)}\right)\right\|^2 - 2\eta\left\langle \nabla f\left(x^{(t-1)}\right),\, x^{(t-1)} - z^*\right\rangle \end{align}

So:

\begin{align} \left\langle \nabla f\left(x^{(t-1)}\right),\, x^{(t-1)} - z^*\right\rangle &\le \frac1{2\eta}\left(\left\|x^{(t-1)} - z^*\right\|^2 - \left\|x^{(t)} - z^*\right\|^2\right) + \frac\eta 2\left\|\nabla f\right\|^2_{\infty} \end{align}

Using the convexity of $f$ you have:

$$f\left(x^{(t-1)}\right) - f\left(z^*\right) \le \left\langle \nabla f\left(x^{(t-1)}\right),\, x^{(t-1)} - z^*\right\rangle$$

Summing this over $t = 1,\ldots,T$, the squared-distance terms telescope:

$$\sum_{t=1}^T \left(f\left(x^{(t-1)}\right) - f\left(z^*\right)\right) \le \frac1{2\eta}\left(\left\|x^{(0)} - z^*\right\|^2 - \left\|x^{(T)} - z^*\right\|^2\right) + \frac{\eta T}2\left\|\nabla f\right\|^2_{\infty} \le \frac1{2\eta}\left\|x^{(0)} - z^*\right\|^2 + \frac{\eta T}2\left\|\nabla f\right\|^2_{\infty}$$

Dividing by $T$ and using the convexity of $f$ again (Jensen's inequality),

\begin{align} f(z) - f\left(z^*\right) &= f\left(\frac1T\sum_{t=0}^{T-1}x^{(t)}\right) -f\left(z^*\right)\\ &\le \frac1T\sum_{t=0}^{T-1} \left(f\left(x^{(t)}\right) - f\left(z^*\right)\right)\\ &= \frac1T\sum_{t=1}^{T} \left(f\left(x^{(t-1)}\right) - f\left(z^*\right)\right)\\ &\le \frac{1}{2\eta T}\left\|x^{(0)} - z^*\right\|^2 +\frac\eta2\left\|\nabla f\right\|_{\infty}^2 \end{align}

Using the chosen values of $\eta$ and $T$, we obtain:

$$f(z) - f\left(z^*\right) \le \varepsilon$$
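Indeed, the choice of $\eta$ makes the two terms of the bound equal, and the choice of $T$ makes their sum at most $\varepsilon$:

$$\frac{\left\|x^{(0)} - z^*\right\|^2}{2\eta T} + \frac\eta2\left\|\nabla f\right\|_{\infty}^2 \le \frac{\mathbf{diam}(\mathcal A)^2}{2\eta T} + \frac\eta2\left\|\nabla f\right\|_{\infty}^2 = \frac{\mathbf{diam}(\mathcal A)\left\|\nabla f\right\|_{\infty}}{\sqrt T} \le \frac\varepsilon2 \le \varepsilon$$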

  • Is finding the max of the gradient supposed to be easier than finding the min of the function? Where do the $\eta$'s come from in your last step? Also, I believe there is a missing gradient in the last expression (sup norm of the gradient, not of $f$). Commented May 12, 2023 at 12:22
  • If you know the expression of $f$, you don't exactly need to compute the max of the gradient; you only need an upper bound. The $\eta$'s in my last step come from the inequality that I established on the dot product of the gradient with the difference between $x^{(t)}$ and $z^*$. Commented May 12, 2023 at 13:03
  • I know the global minimum of $f$ and I know that $f$ is negative, so it is upper bounded by $0$. Can I use either of these bounds? I do not know any bounds on the gradient. Commented May 12, 2023 at 13:42
  • I am not sure that will be useful. However, you can update your question with your specific function and we can see whether there is a way to use the information that you have. Commented May 12, 2023 at 15:24
  • The function is a horrible thing with many embedded integrals; trust me, you don't want to see it. I am trying to understand why we choose the average of the projections instead of the best projection, any insight? Commented May 14, 2023 at 15:56
