In chapter 1 of Nielsen's *Neural Networks and Deep Learning* it says:
> To make gradient descent work correctly, we need to choose the learning rate $\eta$ to be small enough that Equation (9) is a good approximation. If we don't, we might end up with $\Delta C > 0$, which obviously would not be good! At the same time, we don't want $\eta$ to be too small, since that will make the changes $\Delta v$ tiny, and thus the gradient descent algorithm will work very slowly. In practical implementations, $\eta$ is often varied so that Equation (9) remains a good approximation, but the algorithm isn't too slow. We'll see later how this works.
But just a few paragraphs before, we established that $\Delta C \approx -\eta \nabla C \cdot \nabla C = -\eta \|\nabla C\|^2$, which is obviously always negative (for positive $\eta$). So how can $\Delta C$ be positive if we don't choose a small enough learning rate? What is meant there?
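To make the question concrete, here is a minimal sketch (using my own toy quadratic cost, not anything from the book) that compares the first-order prediction $-\eta\|\nabla C\|^2$ with the actual change in $C$ for a small and a large $\eta$:

```python
import numpy as np

# Toy cost C(v) = v1^2 + 2*v2^2 with gradient (2*v1, 4*v2) -- my own example,
# chosen only because both C and its gradient are easy to write down exactly.
def C(v):
    return v[0]**2 + 2 * v[1]**2

def grad_C(v):
    return np.array([2 * v[0], 4 * v[1]])

v = np.array([1.0, 1.0])

for eta in [0.1, 1.0]:                   # a small and a large learning rate
    g = grad_C(v)
    delta_v = -eta * g                   # the update Delta v = -eta * grad C
    predicted = -eta * np.dot(g, g)      # the approximation: -eta * ||grad C||^2, always <= 0
    actual = C(v + delta_v) - C(v)       # the true change Delta C
    print(f"eta = {eta}: predicted Delta C = {predicted:.2f}, actual Delta C = {actual:.2f}")
```

With $\eta = 0.1$ both numbers are negative, but with $\eta = 1$ the actual $\Delta C$ comes out positive ($+16$) even though the approximation predicts $-20$; that is exactly the $\Delta C > 0$ case I don't understand.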