If I choose to place this sphere several million units away from the player [...] same performance impact as the same sphere shrunken into a ball that can fit in the player's hand?
That makes no difference; it is the screen-space size that matters, if anything. The in-game size is inconsequential (well, not entirely: floating-point math with both very near and very far things in the same scene can be a major problem, but that's irrelevant to the question).
Note that although a few thousand triangles are "nothing" in general, you will still want to use LOD to reduce the number of triangles when an object is small on screen (which "size of the player's hand" suggests, unless the hand fills the whole screen, which it usually doesn't). Vertex processing is massively parallel (primitive assembly and rasterization have to be serialized, but vertices are independent, so nothing keeps them from being processed in parallel) and is no challenge for a modern GPU, but that's not the issue.
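A minimal sketch of what such screen-space-driven LOD selection could look like (the helper names, the LOD thresholds and the 60° field of view are made up for illustration, not taken from any particular engine):

```cpp
// Minimal sketch of screen-space-driven LOD selection. The helper names,
// the LOD thresholds and the 60-degree field of view are made up for
// illustration; they are not taken from any particular engine.
#include <cmath>
#include <cstdio>

// Approximate the object's projected size as a fraction of the viewport height.
float projectedScreenFraction(float boundingRadius, float distance, float fovY) {
    float angularSize = 2.0f * std::atan(boundingRadius / distance);
    return angularSize / fovY;  // ~1.0 means it spans the whole screen height
}

int selectLod(float screenFraction) {
    if (screenFraction > 0.50f) return 0;  // full-detail mesh
    if (screenFraction > 0.10f) return 1;  // reduced mesh
    if (screenFraction > 0.02f) return 2;  // very coarse mesh
    return 3;                              // impostor / billboard
}

int main() {
    float fovY = 1.047f;  // ~60 degrees, in radians
    // The same 1-unit sphere at different distances: only the projected size changes.
    for (float distance : {2.0f, 50.0f, 5000.0f}) {
        float frac = projectedScreenFraction(1.0f, distance, fovY);
        std::printf("distance %7.1f -> %.4f of screen height -> LOD %d\n",
                    distance, frac, selectLod(frac));
    }
    return 0;
}
```

With these made-up thresholds, the same 1-unit sphere gets the full mesh at distance 2 but is reduced to an impostor at distance 5000.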
Drawing small objects with too much tessellation will cause fragment shader invocations to explode and will put a lot of pressure on the ROPs (the render output units that handle blending, depth/stencil, and framebuffer writes). And ROP throughput is exactly the one thing in GPUs that doesn't grow faster than Moore's Law, which means that if there is a bottleneck in your drawing, it is almost certainly not ALU -- it is almost certainly ROP.
Is it a real problem insofar as it will cripple your game? I cannot tell; nobody can tell but you (you'd have to try). It is, however, a very real problem from an architectural point of view.
There exist different approaches to hardware rendering: one (immediate-mode rendering) rasterizes all triangles into fragments and processes these massively in parallel, while the other common one uses a tile-based approach, where information is first collected and spatially organized in an attempt to minimize ROP work and memory bandwidth (operations can happen in a tiny, tile-sized piece of on-chip memory which is then copied to global memory only once). The tile-based approach may introduce additional constraints, but the most stringent one follows in the next paragraph.
Fragments are rendered in small (usually 2x2) groups, which is necessary because GPUs support screen-space gradients, and those are computed by differencing neighboring fragments within such a group. For every fragment that a triangle touches, the whole corresponding 2x2 group has to be processed, even if only one of the 4 samples (or none of them!) ends up being written. The smaller your triangles get, the higher the likelihood of shading and then discarding useless fragments becomes. As triangles approach (or drop below) the size of a fragment, it gets worse and worse.
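To make that concrete, here is a toy C++ model of 2x2 quad shading (an intentionally simplified sketch: it samples only pixel centers and charges 4 invocations per touched 2x2 block, so it actually understates the waste compared with what real hardware and the illustration below show):

```cpp
// Toy model of 2x2 quad shading: a quad of 4 fragment shader invocations is
// launched for every 2x2 pixel block in which the triangle covers at least one
// pixel center; the uncovered lanes are wasted "helper" invocations.
#include <cstdio>
#include <set>
#include <utility>

struct Vec2 { float x, y; };

// Edge function: its sign tells on which side of the edge a->b the point p lies.
static float edge(Vec2 a, Vec2 b, Vec2 p) {
    return (p.x - a.x) * (b.y - a.y) - (p.y - a.y) * (b.x - a.x);
}

static bool covers(Vec2 v0, Vec2 v1, Vec2 v2, Vec2 p) {
    float e0 = edge(v0, v1, p), e1 = edge(v1, v2, p), e2 = edge(v2, v0, p);
    return (e0 >= 0 && e1 >= 0 && e2 >= 0) || (e0 <= 0 && e1 <= 0 && e2 <= 0);
}

int main() {
    // One small triangle in pixel coordinates, on an assumed 16x16 render target.
    Vec2 v0{3.1f, 3.2f}, v1{5.8f, 3.9f}, v2{4.3f, 6.6f};

    int coveredPixels = 0;
    std::set<std::pair<int, int>> touchedQuads;  // 2x2 blocks that launch a quad

    for (int y = 0; y < 16; ++y) {
        for (int x = 0; x < 16; ++x) {
            Vec2 center{x + 0.5f, y + 0.5f};     // sample at the pixel center
            if (covers(v0, v1, v2, center)) {
                ++coveredPixels;
                touchedQuads.insert({x / 2, y / 2});
            }
        }
    }

    int invocations = 4 * static_cast<int>(touchedQuads.size());
    std::printf("covered pixels: %d, shader invocations: %d, efficiency: %.0f%%\n",
                coveredPixels, invocations,
                invocations ? 100.0 * coveredPixels / invocations : 0.0);
    return 0;
}
```

For a tiny triangle like the one hard-coded above, the reported efficiency ends up well below 100%: most of the launched invocations are helper lanes whose results are thrown away.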
Again, that doesn't necessarily mean that your game's performance will suck. It just depends on how fragment- and ROP-bound it already is. But you pay a rapidly increasing amount of resources for something you can't see (on the contrary, you may well see aliasing, so it might actually look worse).
So... if one can help it, this is something one tries to avoid.
This example illustrates pretty much the most disastrous case of too-small triangles:

Of the 6 triangles shown, only one actually contributes to a pixel. Every other one touches (but, in the end, does not affect) at least two fragments, and these lie in different 2x2 fragment blocks.
The red triangle touches all four 2x2 blocks although it doesn't contribute to any single pixel. It causes 16 fragment shader invocations for exactly zero effect. In total, the 6 triangles run 244 fragment shader invocations and 244 ROP operations for one pixel of output.
Compare that to a textured billboard, which will, ideally, run one fragment shader invocation and one ROP operation.
But worse, you cannot even consider this high image quality, despite its huge cost. If the vertices of the one triangle that does affect a pixel move even slightly, if only by rounding error, you will have the pixel flickering on and off between frames ("temporal aliasing").
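For reference, a camera-facing billboard like the one mentioned above can be built roughly along these lines (a sketch only: `Vec3` and `billboardCorners` are made-up helpers, and texturing, lighting and the actual draw call are left out):

```cpp
// Sketch of a camera-facing billboard quad that could stand in for a distant
// sphere. Vec3 and billboardCorners are made-up helpers; texturing, lighting
// and the actual draw call are left out.
#include <array>
#include <cmath>
#include <cstdio>

struct Vec3 { float x, y, z; };

static Vec3 add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 mul(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
static Vec3 normalize(Vec3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return mul(v, 1.0f / len);
}

// Four corners of a quad that always faces the camera, in counter-clockwise
// order as seen from the camera; drawn as two triangles with a sphere texture.
std::array<Vec3, 4> billboardCorners(Vec3 center, float radius,
                                     Vec3 cameraPos, Vec3 cameraUp) {
    Vec3 toCamera = normalize(sub(cameraPos, center));
    Vec3 right = normalize(cross(cameraUp, toCamera));
    Vec3 up = cross(toCamera, right);
    Vec3 r = mul(right, radius);
    Vec3 u = mul(up, radius);
    return {{ sub(sub(center, r), u),
              sub(add(center, r), u),
              add(add(center, r), u),
              add(sub(center, r), u) }};
}

int main() {
    // Hypothetical scene: a unit sphere far away along -z, with y as "up".
    auto corners = billboardCorners({0.0f, 0.0f, -5000.0f}, 1.0f,
                                    {0.0f, 0.0f, 0.0f}, {0.0f, 1.0f, 0.0f});
    for (const Vec3& c : corners)
        std::printf("(% .1f, % .1f, % .1f)\n", c.x, c.y, c.z);
    return 0;
}
```

The quad would then be drawn as two triangles with a prerendered sphere texture mapped onto it, which keeps the fragment and ROP cost roughly proportional to the pixels you can actually see.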