How does PSOLA pitch-shift work in real-time?

Question

How does the PSOLA pitch-shifting algorithm manage to remove or add audio frames in real-time without creating gaps?

Wouldn't extending the "windows" (as in FFT windowing) beyond the length of the audio buffer put the buffering out of sync? Similarly, compressing the windows would create the need to "read new audio in advance" in order to not leave silence.

Or what's the idea?

I guess there's some clever math (ratios?) in order to know how to fill and remove windows so that they still match up reasonably in the overlap-add process.

You can solve these issues by introducing a processing latency that covers all the sample terrain you need for decision making and finding blocks to repeat.
– Jazzmaniac, Commented Dec 1, 2015 at 15:21

hotpaw2 · Accepted Answer · 2015-12-01 17:36:16Z

In PSOLA, pitch estimation is used to space out or overlap segments by integer multiples of the estimated pitch period. Previous frames (or pitch-period-length sub-segments of them) can be duplicated as needed so that no gaps are left between frames. Segments can then be concatenated or cross-faded. This lengthens or shortens the composite result and thus changes its duration. The intermediate result can then be resampled to change the pitch (and possibly undo some or all of the duration change) and to provide the number of samples needed at the output sample rate.
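As a rough illustration of the two steps described above (repeat or drop whole pitch periods, then resample), here is a minimal Python sketch. It is not a full PSOLA implementation: it assumes a constant, already-known pitch period in samples, uses plain concatenation instead of cross-fading, and resamples by linear interpolation. The function names `stretch_by_periods`, `resample_linear` and `pitch_shift` are made up for this example.

```python
import numpy as np

def stretch_by_periods(x, period, ratio):
    """Lengthen (ratio > 1) or shorten (ratio < 1) x by repeating or
    dropping whole pitch periods. `period` is in samples and is assumed
    to be constant and already estimated."""
    n_in = len(x) // period               # whole periods available
    n_out = int(round(n_in * ratio))      # whole periods to emit
    # Map each output period to an input period. Some input indices
    # repeat (stretching) or are skipped (compressing).
    src = np.minimum((np.arange(n_out) / ratio).astype(int), n_in - 1)
    segments = [x[i * period:(i + 1) * period] for i in src]
    return np.concatenate(segments)

def resample_linear(x, n_out):
    """Resample x to n_out samples by linear interpolation."""
    old = np.linspace(0.0, 1.0, num=len(x), endpoint=False)
    new = np.linspace(0.0, 1.0, num=n_out, endpoint=False)
    return np.interp(new, old, x)

def pitch_shift(x, period, ratio):
    """Change the duration by `ratio`, then resample back to the original
    length, which scales the pitch by roughly `ratio`."""
    stretched = stretch_by_periods(x, period, ratio)
    return resample_linear(stretched, len(x))

# Example: a sine with a 100-sample period, shifted up by a factor of 1.5.
n = np.arange(1000)
x = np.sin(2 * np.pi * n / 100)
stretched = stretch_by_periods(x, 100, 1.5)
y = pitch_shift(x, 100, 1.5)
```

Because whole periods are repeated, the stretched sine stays continuous at the joins. With real speech the period drifts, which is why segments are normally cross-faded rather than butted together.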

There will be some jitter, as frames are usually shifted only in steps of at least one whole pitch period, not by fractions of a period. Buffering may be required to cover this time jitter; the amount depends on the longest period (lowest pitch) expected to be handled.
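To put a number on that buffering, here is a small back-of-the-envelope sketch. It assumes the buffer only has to cover a whole number of the longest expected pitch periods; the helper name `jitter_buffer_samples` and the `periods` parameter are hypothetical, and a real implementation may need more headroom (for example for the pitch estimator's own analysis window).

```python
import math

def jitter_buffer_samples(sample_rate, f_min, periods=1):
    """Samples needed to hold `periods` pitch periods at the lowest
    expected pitch f_min (Hz). Assumption: this is the only latency."""
    return math.ceil(periods * sample_rate / f_min)

# Example: 44.1 kHz audio, lowest expected pitch 60 Hz.
n = jitter_buffer_samples(44100, 60)
latency_ms = 1000.0 * n / 44100
```

Here `n` comes out at 735 samples, i.e. a bit under 17 ms for one period at 60 Hz.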

The "clever math" might be in the choice and implementation of a pitch estimation algorithm that is accurate enough but still has low latency. If the pitch cannot be estimated (due to consonants, polyphony, etc.), apparent quality may suffer.
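For a feel of what the pitch estimation step involves, here is a minimal sketch using plain autocorrelation peak picking. This is just one simple, commonly used approach chosen for illustration, not necessarily what any particular PSOLA implementation uses; the function name `estimate_period` and the lag bounds are assumptions for this example.

```python
import numpy as np

def estimate_period(frame, min_lag, max_lag):
    """Return the lag (in samples) with the highest autocorrelation in
    [min_lag, max_lag]. No voicing decision and no octave-error handling."""
    frame = frame - np.mean(frame)                 # remove DC offset
    full = np.correlate(frame, frame, mode="full")
    ac = full[len(frame) - 1:]                     # keep lags 0 .. N-1
    return min_lag + int(np.argmax(ac[min_lag:max_lag + 1]))

# Example: a clean sine with a 100-sample period.
n = np.arange(1000)
frame = np.sin(2 * np.pi * n / 100)
period = estimate_period(frame, min_lag=20, max_lag=400)
```

Note the latency trade-off: the frame has to span at least a couple of periods of the lowest pitch for the peak to be reliable, so a lower pitch floor means a longer analysis window.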

i am wondering if there is a little semantic confusion regarding "PSOLA", which i have usually equated with "Lent's Algorithm" (also credited to Hamon), which shifts pitch without shifting formants, and whether this is being conflated with "TDHS" (time-domain harmonic scaling), which is what the Harmonizers (of various brands) did and which does shift the formants along with pitch.
– robert bristow-johnson, Commented Dec 1, 2015 at 17:14
I've seen PSOLA used as just a descriptive synonym for TDHS.
– hotpaw2, Commented Dec 1, 2015 at 17:38
i'm having a little trouble getting references (they're nearly 3 decades old and not IEEE). Hamon Lent and TDHS (not so good references)
– robert bristow-johnson, Commented Dec 1, 2015 at 17:43
The difference might make a good separate question. Then a good enough answer can be used as a reference to update the Wikipedia page (which currently says that they are the same).
– hotpaw2, Commented Dec 1, 2015 at 20:44
