In an ideal world, we wouldn't want them. We'd just use the same high-quality textures at any distance and not care about wasted resources.
Not really. Even if we forget about wasted resources and use fast enough machines, simple sampling of the textures, e.g. with bilinear filtering, results in what is known as aliasing. In particular, it looks like noise in the areas where the texture is effectively downsampled. See, e.g., the faraway part of the pavement in the following scene (click to view at 100% scale to avoid blurring):

This noise will also flicker badly on the slightest motion of the camera.
In a truly ideal world, we would fight texture aliasing (and all other aliasing problems, e.g. the jagged edges of polygons) with supersampling applied to the full scene. With that technique, the scene above would look like this:

Now, our world is not really ideal. Moreover, it's not even slightly ideal :) In particular, supersampling is very demanding on the GPU: it basically means rendering the same scene N times, where N is the number of samples per pixel (usually N=16 is good enough for polygon edges, but not so good for textures).
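To make the cost concrete, here is a minimal sketch of supersampling. The `scene` function is a hypothetical stand-in for a full renderer (just a procedural checkerboard); each output pixel averages a 4×4 grid of subpixel samples, i.e. N=16 evaluations per pixel:

```python
def scene(u, v):
    # Hypothetical "renderer": a procedural 8x8 checkerboard,
    # returning 1.0 on light squares and 0.0 on dark ones.
    return float((int(u * 8) + int(v * 8)) % 2)

def render(width, height, grid=4):
    # Supersampling: each pixel is the average of grid*grid samples
    # taken at the centers of a uniform subpixel grid, so the scene
    # is effectively evaluated grid*grid (here 16) times per pixel.
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            total = 0.0
            for sy in range(grid):
                for sx in range(grid):
                    u = (x + (sx + 0.5) / grid) / width
                    v = (y + (sy + 0.5) / grid) / height
                    total += scene(u, v)
            row.append(total / (grid * grid))
        row and None  # no-op; keeps the loop body explicit
        image.append(row)
    return image
```

The N-fold cost is exactly the inner double loop: the scene is evaluated 16 times for every pixel, which is why full-scene supersampling is rarely affordable.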
Mip mapping is a technique that aims to approximate the ideal result of supersampling by downsampling the texture in steps, halving its size at each step, and then sampling from the closest mip level (or taking a weighted average of the two closest levels, via linear interpolation). When the mip map is generated, each downsampling step is done in a high-quality way that avoids aliasing. This removes the noise and generally works satisfactorily for surfaces whose two dimensions are squeezed on screen by roughly the same factor.
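A minimal sketch of mip map generation, assuming a square power-of-two grayscale texture stored as a list of rows. Each level halves both dimensions by averaging 2×2 blocks of the previous level (a simple box filter; production tools use higher-quality filters, as the text notes):

```python
def downsample(image):
    # Halve both dimensions by averaging each 2x2 block (box filter).
    h, w = len(image), len(image[0])
    return [
        [
            (image[2 * y][2 * x] + image[2 * y][2 * x + 1] +
             image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(w // 2)
        ]
        for y in range(h // 2)
    ]

def build_mip_chain(image):
    # Level 0 is the original; keep halving until we reach 1x1.
    levels = [image]
    while len(levels[-1]) > 1 and len(levels[-1][0]) > 1:
        levels.append(downsample(levels[-1]))
    return levels
```

Note that the whole chain of levels adds only about 1/3 to the storage of the original texture (1/4 + 1/16 + 1/64 + …).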
But in computer games and some other kinds of graphics software, surfaces viewed at an oblique angle, such as roads, appear quite often. On such a surface the texture is squeezed heavily in one screen dimension, while in the other dimension the squeezing factor is much smaller. As a result, the above scene gets approximated poorly: the noise does get removed, but the blurriness is excessive:

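The over-blurring can be seen in a small sketch of mip level selection. The inputs are hypothetical per-pixel squeeze factors (how many texels one screen pixel covers along each texture axis); since isotropic mip mapping must pick a single level, it has to use the larger factor to avoid aliasing along the more squeezed axis, over-blurring the other axis:

```python
import math

def mip_level(texels_per_pixel_u, texels_per_pixel_v):
    # One level for both axes: choose it from the LARGER squeeze
    # factor, otherwise the heavily squeezed axis would alias.
    return math.log2(max(texels_per_pixel_u, texels_per_pixel_v, 1.0))
```

For a road squeezed 16× along one axis but only 2× along the other, `mip_level(2.0, 16.0)` returns level 4, so the mildly squeezed axis gets downsampled 16× where 2× (level 1) would have sufficed — which is exactly the excessive blur in the image above.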
Anisotropic filtering is a modification of mip mapping in which the mip level sizes are applied separately to the width and to the height of the image, yielding all combinations of reduced widths and heights in the resulting mip map. This takes considerably more space than a normal mip map (up to 4× the size of the original texture), but the result is a drastic increase in image quality. The texture rendering becomes comparable to the supersampled result:

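The "up to 4×" storage figure can be checked with a quick sketch. The scheme described above (every combination of halved widths and heights, often called a ripmap) stores the following set of levels, and summing their sizes shows the total approaching 4× the original texture:

```python
def ripmap_sizes(width, height):
    # All combinations (width / 2^i, height / 2^j), down to 1x1.
    sizes = []
    w = width
    while w >= 1:
        h = height
        while h >= 1:
            sizes.append((w, h))
            h //= 2
        w //= 2
    return sizes

def storage_ratio(width, height):
    # Total texels across all levels, relative to the original texture.
    total = sum(w * h for w, h in ripmap_sizes(width, height))
    return total / (width * height)
```

For a 256×256 texture the ratio comes out just under 4 (the geometric series (1 + 1/2 + 1/4 + …)² → 4), compared with the ~1.33 of an ordinary mip chain.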