DeepMind: In our new paper [https://arxiv.org/abs/2006.03575] we propose EATS: End-to-End Adversarial Text-to-Speech, which allows for speech synthesis directly from text or phonemes without the need for multi-stage training pipelines or additional supervision.
Audio: https://bit.ly/2Ya9rRK https://t.co/h8Ye4FfC0M
8 replies, 740 likes
Sander Dieleman: Our latest work on GANs for text-to-speech, from characters/phonemes to waveforms with a single model.
Learning varying alignment without teacher forcing is tricky, but we found dynamic time warping (DTW) to be very effective.
2 replies, 163 likes
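For readers unfamiliar with DTW, here is a minimal sketch of the textbook algorithm in Python. It is not the authors' implementation: the tweet only names the technique, and the paper's own formulation may differ. The function name `dtw_cost` and the absolute-difference local cost are assumptions chosen for illustration.

```python
import math


def dtw_cost(a, b):
    """Textbook DTW cost between two 1-D numeric sequences.

    Illustrative only: the local cost (absolute difference) and the
    three allowed steps are assumptions, not taken from the paper.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: advance in a only, in b only, or in both.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


# The same contour at two different speeds aligns at zero cost.
print(dtw_cost([1, 2, 3], [1, 2, 2, 3]))
```

Because the warping path may repeat elements of either sequence, two sequences with the same shape but different lengths can still match, which is the property that makes DTW useful when the alignment between input and output is not known in advance.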
Sander Dieleman: We've updated the EATS paper on arXiv: https://arxiv.org/abs/2006.03575
'End-to-end' has many possible interpretations – Table 5 in the appendix (p. 21) describes some of the many ways in which the TTS pipeline has been factorised into stages in the literature, for easier comparison. https://t.co/ku0K7EJ5QA
2 replies, 82 likes
AiNews.page: Tweet of the day #TextToSpeech #DeepMind #NLP
0 replies, 2 likes
Vishal Chandra 👻: I hereby reserve the title of my deep learning research paper as follows:
Fast Adversarial Reinforcement-learning for Text-to-Speech
0 replies, 1 like
Found on Jun 08 2020 at https://arxiv.org/pdf/2006.03575.pdf