What determines what $\alpha$ should be? Let's Taylor-expand the function around our trial inputs, and write
$$ f(\mathbf{r}) \approx f(\mathbf{r}_0) + (\mathbf{r}-\mathbf{r}_0)^\dagger\nabla f(\mathbf{r}_0)+\frac{1}{2}(\mathbf{r}-\mathbf{r}_0)^\dagger\mathrm{B}.(\mathbf{r}-\mathbf{r}_0),\tag{2} $$
where $\mathrm{B}$ is the Hessian matrix, which is the matrix of second derivatives,
$$ \mathrm{B}_{ij}=\left.\frac{\partial^2 f}{\partial r_i \partial r_j}\right\vert_{\mathbf{r}=\mathbf{r}_0}.\tag{3} $$
At the minimum, $\mathbf{r}=\mathbf{r}_\mathrm{opt}$, we know that the gradient should be zero, so let's differentiate the Taylor expansion to get the gradient expression:
$$ \nabla f(\mathbf{r}) \approx \nabla f(\mathbf{r}_0)+\mathrm{B}.(\mathbf{r}-\mathbf{r}_0).\tag{4} $$
Now we set this to zero at $\mathbf{r}=\mathbf{r}_\mathrm{opt}$ and rearrange:
$$ \begin{eqnarray} & \nabla f(\mathbf{r}_0)+\mathrm{B}.(\mathbf{r}_\mathrm{opt}-\mathbf{r}_0) &= 0\tag{5}\\ \Rightarrow & \mathrm{B}.(\mathbf{r}_\mathrm{opt}-\mathbf{r}_0) &= -\nabla f(\mathbf{r}_0)\tag{6}\\ \Rightarrow & (\mathbf{r}_\mathrm{opt}-\mathbf{r}_0) &= -\mathrm{B}^{-1}\nabla f(\mathbf{r}_0)\tag{7}\\ \Rightarrow & \mathbf{r}_\mathrm{opt} &= \mathbf{r}_0 -\mathrm{B}^{-1}\nabla f(\mathbf{r}_0)\tag{8} \end{eqnarray} $$
Comparing this with equation \eqref{eq:step}, we see that the basic form is the same, except that the scalar $\alpha$ has been replaced by the matrix $\mathrm{B}^{-1}$.
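To make equation (8) concrete, here is a minimal numerical sketch in Python/NumPy. Everything in it is an illustrative assumption: the quadratic test function, the matrix `A` standing in for the Hessian $\mathrm{B}$, the vector `b`, and the starting point `r0` are arbitrary choices, not taken from any real calculation.

```python
import numpy as np

# Illustrative quadratic test function f(r) = 0.5 r^T A r - b^T r.
# Its gradient is A r - b, its Hessian is the constant matrix A,
# and its exact minimum sits at A^{-1} b.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])        # plays the role of B (assumed values)
b = np.array([1.0, -1.0])         # assumed values

def grad_f(r):
    return A @ r - b              # gradient of the quadratic

r0 = np.array([5.0, 5.0])         # arbitrary starting guess

# Equation (8): r_opt = r0 - B^{-1} grad f(r0).
# Solving the linear system is preferable to forming B^{-1} explicitly.
r_opt = r0 - np.linalg.solve(A, grad_f(r0))

print(r_opt)                      # the Newton-like step...
print(np.linalg.solve(A, b))      # ...lands exactly on A^{-1} b
```

Because this test function is exactly quadratic, the single step of equation (8) lands on the true minimum; for a general $f$ the expansion (2) is only approximate, so the step takes us to the minimum of that local quadratic model instead.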
If $\mathrm{B}^{-1}$ is a diagonal matrix, and all the eigenvalues are the same, then it has the form
$$ \mathrm{B}^{-1}=\left(\begin{array}{cccc} \lambda & 0 & 0 & \ldots\\ 0 & \lambda & 0 & \ldots\\ 0 & 0 & \lambda & \\ \vdots & \vdots & & \ddots \end{array}\right) =\lambda\mathrm{I},\tag{9} $$
where $\mathrm{I}$ is the identity matrix. In this case, setting $\alpha=\lambda$ gives the ideal step length and, in fact, jumps straight to the minimum of the function in a single step (to the extent that the quadratic approximation of equation (2) holds).
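A quick numerical check of this special case, again with made-up numbers: the Hessian below is $(1/\lambda)\mathrm{I}$, so $\mathrm{B}^{-1}=\lambda\mathrm{I}$, and a plain fixed-step update with $\alpha=\lambda$ reaches the minimum in one go.

```python
import numpy as np

lam = 0.5                         # illustrative eigenvalue of B^{-1}
A = (1.0 / lam) * np.eye(2)       # Hessian B = (1/lam) I
b = np.array([2.0, -3.0])         # assumed values; minimum is at lam * b

def grad_f(r):
    return A @ r - b              # gradient of f(r) = 0.5 r^T A r - b^T r

r0 = np.array([10.0, 10.0])       # arbitrary starting guess

# One steepest-descent step with alpha = lam:
r1 = r0 - lam * grad_f(r0)

print(r1)                         # already equal to...
print(lam * b)                    # ...the exact minimum A^{-1} b = lam * b
```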
If the eigenvalues of $\mathrm{B}^{-1}$ are different, then there is no single value of $\alpha$ which can mimic $\mathrm{B}^{-1}$ perfectly. The best we can do is to choose $\alpha$ to match $\mathrm{B}^{-1}$ as closely as possible. Since we used $\mathrm{B}^{-1}$ because of the relation
$$ \mathrm{B}^{-1}\mathrm{B}=\mathrm{I},\tag{10} $$
this leads naturally to the conclusion that we want
$$ \alpha\mathrm{B} \approx \mathrm{I}.\tag{11} $$
The identity matrix $\mathrm{I}$ has all its eigenvalues equal to 1, so the best compromise is the $\alpha$ for which the average eigenvalue of $\alpha\mathrm{B}$ is one; since the eigenvalues of $\alpha\mathrm{B}$ are just $\alpha$ times the eigenvalues of $\mathrm{B}$, this means choosing $\alpha$ to be the reciprocal of the average eigenvalue of $\mathrm{B}$.
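The sketch below illustrates that prescription, with an assumed Hessian whose eigenvalues are 1 and 4. Choosing $\alpha$ as the reciprocal of the mean eigenvalue makes the average eigenvalue of $\alpha\mathrm{B}$ equal to one, and the fixed-$\alpha$ iteration then converges steadily, though no longer in a single step.

```python
import numpy as np

# Illustrative quadratic whose Hessian has unequal eigenvalues (1 and 4).
A = np.diag([1.0, 4.0])           # plays the role of B (assumed values)
b = np.array([1.0, 1.0])          # assumed values; minimum at A^{-1} b

def grad_f(r):
    return A @ r - b

# Choose alpha so the average eigenvalue of alpha*B equals one,
# i.e. alpha = 1 / mean(eigenvalues of B) = 1 / 2.5 = 0.4 here.
alpha = 1.0 / np.mean(np.linalg.eigvalsh(A))

r = np.array([10.0, -10.0])       # arbitrary starting guess
for _ in range(50):
    r = r - alpha * grad_f(r)     # fixed-step steepest descent

print(r)                          # close to...
print(np.linalg.solve(A, b))      # ...the true minimum A^{-1} b
```

No single $\alpha$ is exact here because it cannot scale the step differently along the stiff (eigenvalue 4) and soft (eigenvalue 1) directions at the same time; that mismatch is exactly what the full matrix $\mathrm{B}^{-1}$ corrects.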
"Electrons and positrons in metal vacancies", M. Manninen, R. Nieminen, P. Hautojärvi, and J. Arponen, Phys. Rev. B 12, 4012 (1975); https://doi.org/10.1103/PhysRevB.12.4012
"Efficient iteration scheme for self-consistent pseudopotential calculations", G. P. Kerker, Phys. Rev. B 23, 3082 (1981); https://doi.org/10.1103/PhysRevB.23.3082
"A class of methods for solving nonlinear simultaneous equations", C. G. Broyden, Math. Comp. 19, 577-593 (1965)
"Convergence acceleration of iterative sequences. the case of scf iteration", P. Pulay, Chem. Phys. Lett. 73, 393 (1980); https://doi.org/10.1016/0009-2614(80)80396-4
"Improved SCF convergence acceleration", P. Pulay, J. Comput. Chem. 3, 556(1982); https://doi.org/10.1002/jcc.540030413
"Iterative Procedures for Nonlinear Integral Equations", D. G. Anderson, J. ACM 12 4 (1965); https://doi.org/10.1145/321296.321305
"Electrons and positrons in metal vacancies", M. Manninen, R. Nieminen, P. Hautojärvi, and J. Arponen, Phys. Rev. B 12, 4012 (1975).
"Efficient iteration scheme for self-consistent pseudopotential calculations", G. P. Kerker, Phys. Rev. B 23, 3082 (1981).
"A class of methods for solving nonlinear simultaneous equations", C. G. Broyden, Math. Comp. 19, 577-593 (1965).
"Convergence acceleration of iterative sequences. the case of scf iteration", P. Pulay, Chem. Phys. Lett. 73, 393 (1980).
"Improved SCF convergence acceleration", P. Pulay, J. Comput. Chem. 3, 556(1982).
"Iterative Procedures for Nonlinear Integral Equations", D. G. Anderson, J. ACM 12 4 (1965).