
Let $f:\mathbb R^d \rightarrow \mathbb R$ be a convex function and $A \subset \mathbb R^d$ a convex set. We are interested in finding the minimum of $f$ over $A$. We have the gradient of $f$ and we know that the global unique minimum of $f$ is outside of $A$.

Suppose we decide to do gradient descent to find this constrained minimum. We start at a point of $A$ and move in the direction opposite to the gradient at each step (while decreasing the learning rate). At some point this will take us outside of $A$, since the global minimum is outside of $A$. What modification to gradient descent must we apply to guarantee that we stay in $A$ while converging to the minimum of $f$ over $A$?

What I tried: every time a step takes us outside of $A$, we project that point back onto $A$ (in the Euclidean norm) and continue the algorithm from the projected point. Unfortunately, this seems not to converge to the minimum of $f$ over $A$, but instead to the projection of the global minimum onto $A$, which is not necessarily the minimum over $A$.
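(For illustration only, here is what that step looks like in a minimal sketch when $A$ happens to be a box, so that the Euclidean projection is coordinate-wise clipping; the set and the names below are hypothetical, since the question does not specify $A$.)

```python
import numpy as np

# Hypothetical illustration: if A were the box [lo, hi]^d, the Euclidean
# projection onto A would simply be coordinate-wise clipping.
def proj_box(y, lo=-1.0, hi=1.0):
    return np.clip(y, lo, hi)

def step_then_project(x, grad_f, eta):
    y = x - eta * grad_f(x)   # ordinary gradient step (may leave A)
    return proj_box(y)        # pull the point back into A
```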


1 Answer


You can apply projected gradient descent for constrained optimization. It is exactly what you are doing, but you need to change two things:

  • At the end, output the average of the iterates rather than the last point.

  • Choose your learning rate wisely.


Let $\varepsilon > 0$, $T = \left\lceil 4\displaystyle\frac{\mathbf{diam}(\mathcal A)^2\left\|\nabla f\right\|_{\infty}^2}{\varepsilon^2}\right\rceil$ and $\eta = \displaystyle\frac{\mathbf{diam}(\mathcal A)}{\left\|\nabla f\right\|_{\infty}\sqrt{T}}$, where $\left\|\nabla f\right\|_{\infty} := \sup_{x \in \mathcal A}\left\|\nabla f(x)\right\|$. Starting from a point $x^{(0)} \in \mathcal A$, for $t = 1,\ldots, T$ iterate

\begin{align} y^{(t)} &\gets x^{(t-1)} - \eta \nabla f\left(x^{(t-1)}\right)\\ x^{(t)} &\gets \texttt{Proj}_{\mathcal A} \left(y^{(t)}\right) \end{align}

At the end output $$z = \frac1T \sum_{t=0}^{T-1} x^{(t)} \in \mathcal A$$ which lies in $\mathcal A$ because $\mathcal A$ is convex.
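For concreteness, here is a minimal Python sketch of this procedure. The helper names `grad_f` and `proj_A`, and the toy quadratic at the end, are purely illustrative assumptions; in practice you supply your own gradient, the projection onto your set $\mathcal A$, and upper bounds on $\mathbf{diam}(\mathcal A)$ and $\left\|\nabla f\right\|_{\infty}$.

```python
import numpy as np

def projected_gradient_descent(grad_f, proj_A, x0, diam_A, grad_bound, eps):
    """Projected gradient descent with averaged iterates (sketch).

    grad_f     : callable returning the gradient of f at a point
    proj_A     : callable returning the Euclidean projection onto the convex set A
    x0         : starting point, assumed to lie in A
    diam_A     : (an upper bound on) the diameter of A
    grad_bound : (an upper bound on) sup_{x in A} ||grad f(x)||
    eps        : target accuracy
    """
    T = int(np.ceil(4 * (diam_A * grad_bound / eps) ** 2))  # number of iterations
    eta = diam_A / (grad_bound * np.sqrt(T))                 # constant step size
    x = np.asarray(x0, dtype=float)
    avg = np.zeros_like(x)
    for _ in range(T):
        avg += x / T                   # average of x^(0), ..., x^(T-1)
        y = x - eta * grad_f(x)        # gradient step (may leave A)
        x = proj_A(y)                  # project back onto A
    return avg                         # stays in A since A is convex


# Toy illustration: minimize f(x) = ||x - c||^2 over the unit ball,
# with the unconstrained minimum c placed outside the ball.
if __name__ == "__main__":
    c = np.array([2.0, 0.0])
    grad_f = lambda x: 2 * (x - c)
    proj_ball = lambda y: y / max(1.0, np.linalg.norm(y))  # projection onto unit ball
    x0 = np.zeros(2)
    # diam of the unit ball is 2; ||grad f(x)|| <= 2(||x|| + ||c||) <= 6 on the ball
    z = projected_gradient_descent(grad_f, proj_ball, x0,
                                   diam_A=2.0, grad_bound=6.0, eps=0.1)
    print(z)  # should be close to the constrained minimizer [1, 0]
```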

Let $z^*$ be a minimizer of $f$ on $\mathcal A$. Since the Euclidean projection onto the convex set $\mathcal A$ is nonexpansive and $z^* = \texttt{Proj}_{\mathcal A}\left(z^*\right)$,

\begin{align} \left\|x^{(t)} - z^*\right\|^2 & \le \left\|y^{(t)} - z^*\right\|^2\\ &= \left\|x^{(t-1)} - z^* - \eta\nabla f\left(x^{(t-1)}\right)\right\|^2\\ &= \left\|x^{(t-1)} - z^*\right\|^2 + \eta^2 \left\|\nabla f\left(x^{(t-1)}\right)\right\|^2 - 2\eta\left\langle \nabla f\left(x^{(t-1)}\right),\, x^{(t-1)} - z^*\right\rangle \end{align}

So:

\begin{align} \left\langle \nabla f\left(x^{(t-1)}\right),\, x^{(t-1)} - z^*\right\rangle &\le \frac1{2\eta}\left(\left\|x^{(t-1)} - z^*\right\|^2 - \left\|x^{(t)} - z^*\right\|^2\right) + \frac\eta 2\left\|\nabla f\right\|^2_{\infty} \end{align}

Using the convexity of $f$ you have:

$$f\left(x^{(t-1)}\right) - f\left(z^*\right) \le \left\langle \nabla f\left(x^{(t-1)}\right),\, x^{(t-1)} - z^*\right\rangle$$

Summing this over $t = 1,\ldots,T$, the squared-distance terms telescope:

$$\sum_{t=1}^T \left(f\left(x^{(t-1)}\right) - f\left(z^*\right)\right) \le \frac1{2\eta}\left(\left\|x^{(0)} - z^*\right\|^2 - \left\|x^{(T)} - z^*\right\|^2\right) + \frac{\eta T}2\left\|\nabla f\right\|^2_{\infty} \le \frac1{2\eta}\left\|x^{(0)} - z^*\right\|^2 + \frac{\eta T}2\left\|\nabla f\right\|^2_{\infty}$$

Dividing by $T$ and using the convexity of $f$ again (Jensen's inequality),

\begin{align} f(z) - f\left(z^*\right) &= f\left(\frac1T\sum_{t=0}^{T-1}x^{(t)}\right) -f\left(z^*\right)\\ &\le \frac1T\sum_{t=0}^{T-1} \left(f\left(x^{(t)}\right) - f\left(z^*\right)\right)\\ &= \frac1T\sum_{t=1}^{T} \left(f\left(x^{(t-1)}\right) - f\left(z^*\right)\right)\\ &\le \frac{1}{2\eta T}\left\|x^{(0)} - z^*\right\|^2 +\frac\eta2\left\|\nabla f\right\|_{\infty}^2 \end{align}

Using the chosen values of $\eta$ and $T$, we obtain:

$$f(z) - f\left(z^*\right) \le \varepsilon$$
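Indeed, the choice of $\eta$ makes the two terms of the bound equal, and the choice of $T$ makes their sum at most $\varepsilon$:

$$\frac{\left\|x^{(0)} - z^*\right\|^2}{2\eta T} + \frac\eta2\left\|\nabla f\right\|_{\infty}^2 \le \frac{\mathbf{diam}(\mathcal A)^2}{2\eta T} + \frac\eta2\left\|\nabla f\right\|_{\infty}^2 = \frac{\mathbf{diam}(\mathcal A)\left\|\nabla f\right\|_{\infty}}{\sqrt T} \le \frac\varepsilon2 \le \varepsilon$$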

  • Is finding the max of the gradient supposed to be easier than finding the min of the function? Where do the $\eta$'s come from in your last step? Also, I believe there is a missing gradient in the last expression (sup norm of the gradient, not of $f$). Commented May 12, 2023 at 12:22
  • If you know the expression of $f$, you don't exactly need to compute the max of the gradient; you only need an upper bound. The $\eta$'s in my last step come from the inequality that I established on the dot product of the gradient with the difference between $x^{(t)}$ and $z^*$. Commented May 12, 2023 at 13:03
  • I know the global minimum of $f$ and I know that $f$ is negative, so it is upper bounded by $0$. Can I use either of these bounds? I do not know any bounds on the gradient. Commented May 12, 2023 at 13:42
  • I am not sure that will be useful. However, you can update your question with your specific function and we can see whether there is a way to use the information that you have. Commented May 12, 2023 at 15:24
  • The function is a horrible thing with many embedded integrals; trust me, you don't want to see it. I am trying to understand why we choose the average of the projections instead of the best projection, any insight? Commented May 14, 2023 at 15:56
