The fourth component is a trick to keep track of perspective projection. When you do a perspective projection, you want to divide by z: x' = x/z, y' = y/z. Division by z isn't a linear operation, so no 3x3 matrix operating on a vector of x, y, z can implement it. The trick that has become standard for doing this is to append a fourth coordinate, w, and declare that x, y, z will always be divided by w after all transformations are applied and before rasterization.
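Here's a minimal sketch of that convention in plain Python (no graphics API involved). The helper names `to_homogeneous`, `mat_vec`, and `divide_by_w` are made up for illustration, and matrices are just 4x4 lists of rows acting on column vectors. The point is only the order of operations: append w, apply the matrix, then divide x, y, z by w at the very end.

```python
# Sketch of the homogeneous pipeline: append w, transform, then divide by w.
# Helper names are hypothetical; matrices are 4x4 lists of rows (column vectors).

def to_homogeneous(p, w=1.0):
    """Append a fourth coordinate w to a 3D point (x, y, z)."""
    x, y, z = p
    return (x, y, z, w)

def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))

def divide_by_w(v):
    """Final step, after all transformations: divide x, y, z by w."""
    x, y, z, w = v
    return (x / w, y / w, z / w)

IDENTITY = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]

p = (2.0, 4.0, 8.0)
print(divide_by_w(mat_vec(IDENTITY, to_homogeneous(p))))  # (2.0, 4.0, 8.0)
```

With the identity matrix w stays 1, so the divide is a no-op; any matrix that changes w changes what the divide does.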
Perspective projection is then accomplished by having a matrix that moves z into w, so that you end up dividing by z. But it also gives you the flexibility to leave w = 1.0 if you don't want to do a divide; for instance if you just want a parallel projection, or a rotation or whatever.
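To make "a matrix that moves z into w" concrete, here's a bare-bones sketch. Its bottom row is (0, 0, 1, 0), so the output w equals the input z and the divide produces x/z, y/z. This is deliberately simplified: it is not the full projection matrix real APIs use (those also remap z into a depth range for the depth buffer, which this one doesn't do). For comparison, the parallel projection here just drops z and keeps the bottom row (0, 0, 0, 1), so w stays 1 and the divide does nothing. `project`, `PERSPECTIVE`, and `PARALLEL` are illustrative names.

```python
# Minimal perspective vs. parallel projection using homogeneous coordinates.
# Simplified matrices for illustration only (no near/far depth remapping).

def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))

def project(m, p):
    """Append w = 1, apply the matrix, then divide x, y, z by the resulting w."""
    x, y, z = p
    hx, hy, hz, hw = mat_vec(m, (x, y, z, 1.0))
    return (hx / hw, hy / hw, hz / hw)

# Bottom row (0, 0, 1, 0) copies z into w, so the divide becomes a divide by z.
PERSPECTIVE = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]

# Parallel projection onto the z = 0 plane; bottom row (0, 0, 0, 1) leaves w = 1.
PARALLEL = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

p = (2.0, 4.0, 8.0)
print(project(PERSPECTIVE, p))  # (0.25, 0.5, 1.0): x/z, y/z
print(project(PARALLEL, p))     # (2.0, 4.0, 0.0): no divide happened
```

Same pipeline, same divide step; only the matrix decides whether the divide actually does anything.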
The ability to encode positions as w = 1, directions as w = 0 and use the fourth row/column of a matrix for translation is a nice side benefit, but it's not the primary reason for appending w. One could use affine transformations (a 3x3 matrix plus a 3-component translation vector) to accomplish translation without any w in sight. (One would have to keep track of what's a position and what's a direction, and apply different transformation functions to each; that's a bit inconvenient, but not really a big deal.)
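Here's a quick sketch comparing the two approaches. The affine version needs two separate functions, one for positions (which get translated) and one for directions (which don't). The 4x4 version packs the translation into the fourth column and lets w = 1 or w = 0 make that distinction automatically. This assumes the column-vector convention; with row vectors the translation would sit in the fourth row instead. All function names here are hypothetical.

```python
# Affine (3x3 + translation) vs. 4x4 homogeneous transform, column-vector convention.

def mat3_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-component vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(3)) for r in range(3))

def transform_point(m, t, p):
    """Affine transform of a position: apply the 3x3 matrix, then translate."""
    return tuple(a + b for a, b in zip(mat3_vec(m, p), t))

def transform_direction(m, d):
    """Directions ignore translation: only the 3x3 part applies."""
    return mat3_vec(m, d)

def to_mat4(m, t):
    """Pack a 3x3 matrix and a translation into one 4x4 matrix (translation in column 4)."""
    return [list(m[r]) + [t[r]] for r in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def mat4_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))

# 90-degree rotation about the z axis, followed by a translation of (10, 0, 0).
ROT_Z_90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
T = (10.0, 0.0, 0.0)
M4 = to_mat4(ROT_Z_90, T)

v = (1.0, 2.0, 3.0)
print(transform_point(ROT_Z_90, T, v))   # (8.0, 1.0, 3.0)
print(mat4_vec(M4, v + (1.0,))[:3])      # (8.0, 1.0, 3.0): w = 1, translated
print(transform_direction(ROT_Z_90, v))  # (-2.0, 1.0, 3.0)
print(mat4_vec(M4, v + (0.0,))[:3])      # (-2.0, 1.0, 3.0): w = 0, translation ignored
```

Both approaches give identical results. The 4x4 form just spares you from choosing between two functions at each call site.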
(BTW, mathematically, vectors augmented with w are known as homogeneous coordinates, and they live in a place called projective space. However, you don't need to understand the higher math to do 3D graphics.)