PSOLA can mean different things to different people.
The "Formant Preserving Pitch Shifting" that is done in the time domain requires a really good pitch detector, one that tracks the pitch as it changes. Knowing the pitch at every arbitrary time means you know the period length at that time, since the period is just the reciprocal of the fundamental frequency.
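The post doesn't say which pitch detector to use. As a minimal sketch, assuming a single voiced frame and a pitch between the hypothetical limits `f_min` and `f_max`, the period length can be estimated as the lag of the largest autocorrelation value. A real tracker would run something like this frame by frame and smooth the results; `estimate_period` is an illustrative name, not an existing API.

```python
import math


def estimate_period(frame, sample_rate, f_min=60.0, f_max=500.0):
    """Estimate the period (in samples) of one voiced frame.

    Picks the lag in [sample_rate/f_max, sample_rate/f_min] with the largest
    autocorrelation. The sum is not normalized by overlap length, so shorter
    lags are slightly favored, which helps avoid choosing twice the period.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]          # remove DC so it doesn't bias r
    min_lag = max(1, int(sample_rate / f_max))
    max_lag = min(int(sample_rate / f_min), n - 1)
    best_lag, best_r = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag
```

For example, a 200 Hz sine sampled at 8000 Hz should come back with a period of 40 samples.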
So this Formant Preserving Pitch Shifting in the time domain identifies the onsets of every period of the input waveform, when it is voiced. Each period is isolated with a window and becomes a "grain" or (my poorly-considered name for it) a "wavelet". Then you launch these grains or wavelets out at the rate of the new fundamental frequency. But, to preserve the formants, you neither stretch (if down-shifting) nor scrunch (if up-shifting) the grains or wavelets; you just overlap and add them as they are.
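Here is a minimal sketch of that procedure, assuming the period onsets (pitch marks) have already been found and the whole signal is voiced. The two-period Hann window centered on each mark is a common choice but an assumption here, as are the names `psola_shift` and `hann`. Each output grain is taken from the nearest input mark, so grains get repeated when shifting up and skipped when shifting down, which keeps the duration unchanged.

```python
import math


def hann(length):
    """Periodic Hann window; copies spaced length/2 apart sum to 1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / length) for i in range(length)]


def psola_shift(x, marks, ratio):
    """Time-domain pitch shift that overlap-adds grains without resampling them.

    x:      list of input samples (assumed voiced throughout)
    marks:  sorted sample indices of the period onsets (pitch marks) in x
    ratio:  output f0 / input f0 (> 1 shifts up, < 1 shifts down)
    """
    if len(marks) < 2 or ratio <= 0:
        raise ValueError("need at least two pitch marks and a positive ratio")
    y = [0.0] * len(x)
    t_out = float(marks[0])
    while t_out < marks[-1]:
        # Take the grain from the input pitch mark nearest the output time.
        k = min(range(len(marks)), key=lambda j: abs(marks[j] - t_out))
        # Local input period length around that mark.
        if k + 1 < len(marks):
            period = marks[k + 1] - marks[k]
        else:
            period = marks[k] - marks[k - 1]
        win = hann(2 * period)            # grain spans two periods
        src0 = marks[k] - period          # first input sample of the grain
        dst0 = int(round(t_out)) - period
        # Overlap-add the windowed grain as-is: no stretching or scrunching,
        # so the spectral envelope (formants) is preserved.
        for i, w in enumerate(win):
            src, dst = src0 + i, dst0 + i
            if 0 <= src < len(x) and 0 <= dst < len(y):
                y[dst] += w * x[src]
        # Launch the next grain one *output* period later.
        t_out += period / ratio
    return y
```

With `ratio = 1.0` and marks exactly one period apart, the windows sum to 1 and the input comes back unchanged between the first and second-to-last marks. Because grains overlap more densely when shifting up, the output level also changes with `ratio`; a practical version would likely normalize for that.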
The rate of firing out those grains determines the output pitch. The degree of stretching or scrunching the grains along the time axis determines the formant shift.
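To make the two independent knobs concrete, here is a small sketch with two hypothetical helpers: `output_hop` gives the spacing between grain launches (the output pitch), and `resample_grain` stretches or scrunches a single grain along the time axis (the formant shift). Linear interpolation is an assumption chosen for brevity; a real implementation would probably use a better interpolator.

```python
import math


def output_hop(period_in, pitch_ratio):
    """Spacing between successive grain launches; this alone sets the output pitch."""
    return period_in / pitch_ratio


def resample_grain(grain, formant_ratio):
    """Scrunch (formant_ratio > 1) or stretch (formant_ratio < 1) one grain.

    Scrunching a grain by a factor scales its spectrum, and so its formants,
    up by that factor. formant_ratio == 1 returns an unchanged copy, which is
    the formant-preserving case. Uses linear interpolation; samples read past
    the end of the grain are taken as 0.0.
    """
    n_out = max(1, int(round(len(grain) / formant_ratio)))
    out = []
    for i in range(n_out):
        pos = i * formant_ratio           # read position in the input grain
        j = int(math.floor(pos))
        frac = pos - j
        a = grain[j] if j < len(grain) else 0.0
        b = grain[j + 1] if j + 1 < len(grain) else 0.0
        out.append((1.0 - frac) * a + frac * b)
    return out
```

Setting `formant_ratio` equal to `pitch_ratio` moves the formants along with the pitch (the "chipmunk" sound of plain resampling) while the grain launch times still keep the overall duration about the same.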