Suppose we wish to minimize a function $f(\vec{x})$ via the gradient descent algorithm \begin{equation} \vec{x}_{n+1} = \vec{x}_n - \eta \vec{\nabla}f(\vec{x}_n) \end{equation} starting from some initial guess $\vec{x}_0$. This is a well-studied problem.
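For concreteness, here is a minimal sketch of that iteration in Python/NumPy (the names `grad_f` and `eta`, and the fixed iteration count, are my own illustrative choices, not any particular library's API):

```python
import numpy as np

def gradient_descent(grad_f, x0, eta=0.1, n_steps=1000):
    """Fixed-step gradient descent: x_{n+1} = x_n - eta * grad_f(x_n)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x - eta * grad_f(x)
    return x
```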
Now suppose that near its minimum the "landscape" represented by the function $f(\vec{x})$ has directions along which the gradient is very large as well as directions along which it is very small; that is, $f(\vec{x})$ is nearly flat along certain directions near its minimum. As a toy example, I propose the problem of minimizing the function \begin{equation} f(x,y) = \frac{1}{2}\left(x^2 + \epsilon y^2\right) \tag{1} \end{equation} for some very small value of $\epsilon$, such as $10^{-20}$. Although this function has a unique minimum at $(x,y) = (0,0)$, it is nearly flat in the $y$ direction.

Most gradient descent implementations use a stopping criterion that terminates the iteration once $\vec{x}$ or $f$ changes by less than some set threshold in a single step. Under such a criterion, minimizing (1) from an arbitrary initial guess sends $x$ rapidly toward $0$ while $y$ barely changes from step to step. The algorithm would therefore most likely trigger its stopping condition long before $y$ approaches $0$, leaving an essentially random final value for $y$.
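To make the failure concrete, here is a rough sketch of what happens on (1) with a step-size-based stopping criterion (the values of `eta`, `tol`, and the starting point are arbitrary assumptions on my part):

```python
import numpy as np

eps, eta, tol = 1e-20, 0.1, 1e-8
xy = np.array([1.0, 1.0])                   # arbitrary initial guess (x0, y0)

for n in range(1_000_000):
    grad = np.array([xy[0], eps * xy[1]])   # gradient of (x^2 + eps*y^2)/2
    step = eta * grad
    xy = xy - step
    if np.linalg.norm(step) < tol:          # "x changed by less than tol"
        break

# Exits after ~150 iterations with x ~ 1e-7 but y still ~ 1,
# nowhere near the true minimizer (0, 0).
print(n, xy)
```

With these numbers the $x$ component shrinks by a factor of $0.9$ per step, so the step size drops below `tol` after a couple hundred iterations, while $y$ has moved by only about $10^{-19}$.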
Are there variants of the gradient descent algorithm designed to handle the case where there is large variation in the magnitude of the gradient in different directions near a minimum? Or am I overthinking this, and the solution is to simply choose a very, very small value as a threshold in the stopping criterion?