5 events
when what by license comment
Oct 17, 2020 at 20:30 vote accept Joe Black
Jul 15, 2020 at 3:56 comment added Tim Mak In the paper, "operate on audio waveform" does not mean "take audio waveform as input". It simply means that they model the audio waveform directly. Your post is off topic though. Try StackOverflow next time perhaps.
Jun 15, 2020 at 21:58 comment added Joe Black Where's the quote in the paper "creates a raw audio waveform from the text it is given"? I couldn't find it in the paper, though I understand WaveNet is supposed to generate audio. That's why it's unclear to me, which is the reason stated in the title and why I asked this question.
Jun 15, 2020 at 21:55 comment added Joe Black I understand it's supposed to generate audio, but could you reconcile what I quoted? How else is one to interpret "operating directly on the raw audio waveform"? What's the input to WaveNet when it's used with Tacotron-2 for text-to-speech, especially the input to input_convolution described in the OP?
Jun 15, 2020 at 20:36 history answered sjp CC BY-SA 4.0