Recall the relationship between the gradient $\nabla f(p)$ and the directional derivative $df(p) \textbf{u}$ in the direction of a unit vector $\textbf{u}$: $$df(p) \textbf{u} = \nabla f(p) \cdot \textbf{u}$$ Since $$\nabla f(p) \cdot \textbf{u} = ||\nabla f(p)|| \cos \theta$$ where $\theta$ is the angle between $\nabla f(p)$ and $\textbf{u}$, the unit vector which maximizes the directional derivative (assuming $\nabla f(p) \neq 0$) is the one whose angle with $\nabla f(p)$ has cosine $1$, and this happens at $\theta = 0$, i.e. when $\textbf{u}$ points in the direction of the gradient.
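As a quick numerical sanity check of this claim, here is a minimal sketch. The sample function `g`, the point `p`, and the helper `directional_derivative` are my own choices for illustration, and finite differences only approximate the derivative, so treat the output as suggestive rather than as a proof.

```python
import math

def g(x, y):
    # A smooth sample function (an assumption chosen for illustration).
    return x**2 + 3*x*y

def directional_derivative(f, p, u, h=1e-6):
    # Central-difference estimate of d/dt f(p + t*u) at t = 0.
    (x, y), (a, b) = p, u
    return (f(x + h*a, y + h*b) - f(x - h*a, y - h*b)) / (2*h)

p = (1.0, 2.0)

# Estimate the gradient from the two partial derivatives.
grad = (directional_derivative(g, p, (1, 0)),
        directional_derivative(g, p, (0, 1)))

# Scan unit vectors u = (cos a, sin a) and keep the angle with the largest derivative.
angles = [2*math.pi*k/3600 for k in range(3600)]
best_angle = max(
    angles,
    key=lambda a: directional_derivative(g, p, (math.cos(a), math.sin(a))),
)
best_value = directional_derivative(g, p, (math.cos(best_angle), math.sin(best_angle)))

grad_angle = math.atan2(grad[1], grad[0])  # direction of the gradient
grad_norm = math.hypot(*grad)              # length of the gradient

print(best_angle, grad_angle)  # the two angles should agree up to the grid spacing
print(best_value, grad_norm)   # the maximal derivative should be close to ||grad||
```

For a smooth function like this one, the best direction found by the scan should line up with the gradient, and the maximal directional derivative should match $||\nabla g(p)||$.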
The place where this geometric intuition collides with the phenomenon you are worried about is at the very beginning: why are the partial derivatives, i.e. the directional derivatives in just two directions, enough to determine the directional derivatives in all directions? The answer is: in general, they aren't. An unmentioned hypothesis in what I wrote above is that the partial derivatives are continuous at $p$, and without that continuity the behavior you describe is actually possible. Here's an example.
Let $f(x,y) = \frac{x^2y}{x^2 + y^2}$ for $(x,y) \neq (0,0)$, and set $f(0,0) = 0$. First, let us calculate $f_x(0,0)$ by taking the derivative of the function $f(t,0)$ at $t=0$: $$f_x(0,0) = \left.\frac{d}{dt}\right|_{t=0} \frac{0}{t^2} = 0$$ Similarly, $f_y(0,0) = 0$, so $\nabla f(0,0) = (0,0)$. Now let us calculate the directional derivative of $f$ in the direction of the vector $\textbf{v} = (1,1)$: $$df(0,0) \textbf{v} = \left.\frac{d}{dt}\right|_{t=0} f(t,t) = \left.\frac{d}{dt}\right|_{t=0} \frac{t^3}{2t^2} = \frac{1}{2}$$ The formula $\nabla f(0,0) \cdot \textbf{v}$ would instead predict $0$. So indeed, this function exhibits the sort of behavior you describe, though you can calculate that the partial derivatives of this function are not continuous at $(0,0)$.
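The computation above can also be checked numerically. This is only a sketch: the helper name `derivative_along` is hypothetical, the value `f(0, 0) = 0` is assumed at the origin, and a finite difference merely estimates the derivative.

```python
def f(x, y):
    # The example function, with the value at the origin taken to be 0 (assumed).
    if x == 0 and y == 0:
        return 0.0
    return x**2 * y / (x**2 + y**2)

def derivative_along(func, v, h=1e-6):
    # Central-difference estimate of d/dt func(t*v) at t = 0.
    a, b = v
    return (func(h*a, h*b) - func(-h*a, -h*b)) / (2*h)

f_x = derivative_along(f, (1, 0))  # partial derivative in x at the origin
f_y = derivative_along(f, (0, 1))  # partial derivative in y at the origin
d_v = derivative_along(f, (1, 1))  # derivative along v = (1, 1)

# What the formula grad(f) . v would predict if it applied here.
predicted = f_x * 1 + f_y * 1

print(f_x, f_y)        # both partials vanish
print(d_v, predicted)  # the actual derivative along v differs from the prediction
```

The two partials come out as zero while the derivative along $(1,1)$ is about $\frac{1}{2}$, so the gradient formula fails at the origin for this function.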
In fact, even worse behavior is possible: there is a function whose directional derivative at the origin exists and is zero in every direction, but which has nonzero derivative along the parabola $(t,t^2)$. But all of this pathology is eliminated if the partial derivatives are continuous. It is worthwhile to study the proof of this fact.
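One function with this behavior (not necessarily the one I had in mind above, so take it as an illustrative assumption) is $k(x,y) = \frac{x^3 y}{x^4 + y^2}$ with $k(0,0) = 0$. Along a line, $k(ta,tb) = \frac{t^2 a^3 b}{t^2 a^4 + b^2}$, which has derivative $0$ at $t = 0$, while $k(t,t^2) = \frac{t}{2}$. A rough numerical sketch, with hypothetical helper names and one-sided difference quotients standing in for the actual limits:

```python
import math

def k(x, y):
    # Candidate function, with the value at the origin taken to be 0 (assumed).
    if x == 0 and y == 0:
        return 0.0
    return x**3 * y / (x**4 + y**2)

def quotient_along_line(func, v, h=1e-6):
    # One-sided difference quotient (func(h*v) - func(0, 0)) / h.
    a, b = v
    return (func(h*a, h*b) - func(0.0, 0.0)) / h

def quotient_along_parabola(func, h=1e-6):
    # One-sided difference quotient along the curve t -> (t, t^2).
    return (func(h, h**2) - func(0.0, 0.0)) / h

# Sample 24 straight-line directions, 15 degrees apart.
directions = [(math.cos(math.radians(d)), math.sin(math.radians(d)))
              for d in range(0, 360, 15)]
largest_line_quotient = max(abs(quotient_along_line(k, v)) for v in directions)
parabola_quotient = quotient_along_parabola(k)

print(largest_line_quotient)  # tiny: consistent with zero derivative along every line
print(parabola_quotient)      # close to 1/2 along the parabola
```

The quotients along straight lines shrink with `h`, whereas the quotient along the parabola stays near $\frac{1}{2}$, which is the pathology described above.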