I'm trying to wrap my head around the concept of a real jitter buffer. The only knowledge I have of it comes from this article:
http://toncar.cz/Tutorials/VoIP/VoIP_Basics_Jitter.html
The article states:
In the jitter estimator formula, the value D(i-1, i) is the difference of relative transit times for the two packets. The difference is computed as
D(i,j) = (Rj - Ri) - (Sj - Si) = (Rj - Sj) - (Ri - Si)
Si is the timestamp carried in packet i and Ri is the time of arrival of packet i at the receiver.
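To check that I'm at least reading the formula correctly, here is a small sketch of how I think the estimator would be computed (Python; the send/receive times are made up, and the 1/16 smoothing factor comes from RFC 3550 rather than from the quoted text):

    # Sketch of the interarrival jitter estimator, as I understand it.
    # S[i] = timestamp carried in packet i, R[i] = arrival time at the receiver,
    # both converted to the same unit (e.g. milliseconds). Values are made up.

    def interarrival_jitter(send_times, recv_times):
        """Return the running jitter estimate after processing all packets."""
        jitter = 0.0
        for i in range(1, len(send_times)):
            # D(i-1, i) = (R_i - R_(i-1)) - (S_i - S_(i-1))
            #           = (R_i - S_i) - (R_(i-1) - S_(i-1))
            d = (recv_times[i] - recv_times[i - 1]) - (send_times[i] - send_times[i - 1])
            # Exponential smoothing with gain 1/16, as in RFC 3550.
            jitter += (abs(d) - jitter) / 16.0
        return jitter

    # Made-up example: packets sent every 20 ms, arriving with varying network delay.
    S = [0, 20, 40, 60, 80]
    R = [105, 127, 143, 168, 185]   # arrival times on the receiver's own clock
    print(interarrival_jitter(S, R))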
I've been trying to figure out how it's possible to measure the time it takes a packet to get from one system to the other, even over TCP. If I'm not mistaken, the clocks on the two devices will be out of sync, so the timestamps won't be comparable even if I send them in the packet headers. And even if I tried to synchronize the clocks before I start pushing out the audio data, wouldn't the synchronization message itself arrive several milliseconds late, making an exact sync impossible?
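Here's a made-up example of what I mean: if the sender's clock is, say, 5 seconds ahead of mine, then the naive one-way delay R - S that I can compute is dominated by that unknown offset, not by the actual network delay.

    # Made-up numbers to illustrate my worry: the sender's clock is 5000 ms ahead of mine.
    clock_offset_ms = 5000
    true_one_way_delay_ms = 40

    send_time_on_sender_clock = 1000 + clock_offset_ms       # what arrives in the header
    arrival_time_on_my_clock = 1000 + true_one_way_delay_ms  # what I measure locally

    measured_delay = arrival_time_on_my_clock - send_time_on_sender_clock
    print(measured_delay)  # -4960 ms: the unknown offset swamps the real 40 ms delay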
So my question is: how can I actually measure how long packets take to reach their destination, so that I can calculate the jitter?