Indexed by:
Abstract:
With the rapid development of deep learning, although speech conversion had made great progress, there are still rare researches in deep learning to model on singing voice conversion, which is mainly based on statistical methods at present and can only achieve one-to-one conversion with parallel training datasets. So far, its application is limited. This paper proposes a generative adversarial learning model, MSVC-GAN, for many-to-many singing voice conversion using non-parallel datasets. First, the generator of our model is concatenated by the singer label, which denotes domain constraint. Furthermore, the model integrates self-attention mechanism to capture long-term dependence on the spectral features. Finally, switchable normalization is employed to stabilize network training. Both the objective and subjective evaluation results show that our model achieves the highest similarity and naturalness not only on the parallel speech dataset but also on the non-parallel singing dataset. © 2019 IEEE.
Keyword:
Reprint 's Address:
Email:
Source :
2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019
Year: 2019
Page: 125-132
Language: English
Cited Count:
SCOPUS Cited Count: 3
ESI Highly Cited Papers on the List: 0 Unfold All
WanFang Cited Count:
Chinese Cited Count:
30 Days PV: 1
Affiliated Colleges: