What hasn't been said is that your concept of delay is much like that of an echo. You say something, then wait, then the echo repeats your speech. If you play music through an echo, it makes each note seem to happen twice. In these cases, the echo and the original are completely separated.
Things are different when the delayed signal arrives while the original is still going on. The original and the delayed copy mix together and produce some very interesting effects. You get a single result that is different from either part. When this is happening, you are less interested in how much time separates the two and more interested in which part of the original signal is combining with which part of its replica.
If we know, for example, that we can cancel a sound by adding in a replica that is "upside down", what we are really saying is that, as the wave goes up and down, we want to take the copy where it is at its lowest point and combine it with the original right where it is at its highest. Where do these points occur? We can find out once we know the frequency. The frequency tells us how much time passes between the top and the bottom of the wave. That means that if we try this at a different frequency, we have to redo the calculation to get the new time delay. It's more convenient to say, "Find the high and low points. They are half a cycle apart." That statement works for any frequency, and the actual time delay can be calculated as needed.
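To put numbers on this, here is a minimal Python sketch. It assumes a pure sine tone, and the names half_cycle_delay and mixed_sample are made up for the illustration. It computes the half-cycle delay for a given frequency, then adds the original to its delayed copy.

```python
import math

def half_cycle_delay(frequency_hz):
    """Seconds between a peak and the following trough of a sine wave."""
    period_s = 1.0 / frequency_hz   # time for one full cycle
    return period_s / 2.0           # half a cycle

def mixed_sample(frequency_hz, t, delay_s):
    """Value at time t of a sine wave plus a copy of itself delayed by delay_s."""
    original = math.sin(2 * math.pi * frequency_hz * t)
    delayed = math.sin(2 * math.pi * frequency_hz * (t - delay_s))
    return original + delayed

# The rule "half a cycle" stays the same; the time delay changes with frequency.
for f in (100.0, 440.0, 1000.0):
    print(f, "Hz needs a delay of", half_cycle_delay(f) * 1000, "ms")
```

With the delay matched to the frequency, mixed_sample comes out as essentially zero at any time t. Reuse the 100 Hz delay on a 200 Hz tone and that same delay is now a full cycle, so the copy reinforces the original instead of cancelling it.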
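That "half a cycle" is a statement about phase. How much of the wave has passed gets described as a fraction of a cycle, and when that fraction is expressed as an angle, with a full cycle counted as 360 degrees, it is what they end up calling the phase angle. Half a cycle is 180 degrees.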
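Going the other way, from a time delay to a phase angle, is a one-line calculation. This is a rough sketch that assumes a single steady frequency. The function name phase_angle_degrees is just for illustration.

```python
def phase_angle_degrees(frequency_hz, delay_s):
    """Express a time delay as a phase angle at the given frequency."""
    cycles = frequency_hz * delay_s   # fraction of a cycle (can exceed 1)
    return cycles * 360.0             # one full cycle is 360 degrees

# The same 5 ms delay is a different phase angle at each frequency.
for f in (50.0, 100.0):
    print(f, "Hz:", phase_angle_degrees(f, 0.005), "degrees")
```

The same 5 ms works out to about 180 degrees at 100 Hz but only about 90 degrees at 50 Hz, which is why it is handier to talk about phase than about time. A result above 360 just means the delay is longer than one full cycle.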