
voice-cloning

Version updates

2022.05.19 We now compute the GE2E loss in the encoder on CUDA instead of the originally configured CPU, which speeds up encoder training.
2022.07.15 We added an animated loss plot for the synthesizer and vocoder.
2022.07.19 We added response time and Griffin-Lim vocoder results to demo_toolbox.
2022.07.29 We added model validation for the encoder, synthesizer, and vocoder.
2022.08.02 We added VoxCeleb train and dev data for the encoder. We added noise reduction for the output wav from the vocoder.
noisereduce reference: https://github.com/timsainb/noisereduce
2022.08.06 We split long input text into short sentences with spaCy before passing it to the synthesizer. Make sure to install the English model en_core_web_sm, for example with: python -m spacy download en_core_web_sm
2022.09.02 We set prop_decrease=0.6 for male voices and 0.9 for female voices in the noisereduce function (output filtering; male and female voices use different filter parameters).
2022.09.26 We added speed adjustment (voice speed change) for output audio using Praat. Install parselmouth with pip: pip install praat-parselmouth
2022.10.10 We added a voice filter function (voice beautification) for input audio. The weight ratio of the input audio embed to the standard audio embed is 7:3.
2022.10.25 We set small values (<0.06) in the embed to zero.
2022.10.26 The split frequency for input audio is 170 Hz. The split frequency for output noise reduction is 165 Hz.
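
The 2022.10.10 and 2022.10.25 entries can be illustrated with a minimal sketch. It is not the project's actual code: the function names, the L2 renormalization after mixing, the order of the two steps, and the use of the absolute value in the threshold test are all assumptions made for illustration. Plain Python lists stand in for the embed vectors here; the project itself most likely works on arrays.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def blend_embeds(input_embed, standard_embed, input_weight=0.7):
    """Mix the input embed with a standard embed at input_weight : (1 - input_weight).

    The default 0.7 gives the 7:3 ratio from the changelog.
    Renormalizing the mixture is an assumption, not taken from the source.
    """
    if len(input_embed) != len(standard_embed):
        raise ValueError("embeds must have the same length")
    mixed = [
        input_weight * a + (1.0 - input_weight) * b
        for a, b in zip(input_embed, standard_embed)
    ]
    return l2_normalize(mixed)


def zero_small_values(embed, threshold=0.06):
    """Set entries whose magnitude is below the threshold to zero.

    Using abs() is an assumption; for non-negative embeds it is
    equivalent to testing the raw value.
    """
    return [0.0 if abs(x) < threshold else x for x in embed]


if __name__ == "__main__":
    # Toy 4-dimensional vectors; real speaker embeds are much longer.
    input_embed = l2_normalize([0.9, 0.1, 0.0, 0.4])
    standard_embed = l2_normalize([0.2, 0.8, 0.05, 0.1])
    filtered = zero_small_values(blend_embeds(input_embed, standard_embed))
    print(filtered)
```

Keeping the two steps as separate functions makes it easy to change the mixing weight or the threshold independently when tuning the filter.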

Description
An improved version of Real-time-voice-cloning