Voice-Cloning

Installation

pip install -r requirements.txt

Training Commands

Encoder Preprocessing:

python encoder_preprocess.py <datasets_root>

Encoder Training:

python encoder_train.py my_run <datasets_root>/SV2TTS/encoder

Synthesizer Preprocessing:

python synthesizer_preprocess_audio.py <datasets_root>
python synthesizer_preprocess_embeds.py <datasets_root>/SV2TTS/synthesizer

Synthesizer Training:

python synthesizer_train.py my_run <datasets_root>/SV2TTS/synthesizer

Vocoder Preprocessing:

python vocoder_preprocess.py <datasets_root>

Vocoder Training:

python vocoder_train.py my_run <datasets_root>
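
The preprocessing and training steps above run in a fixed order. The following is a minimal sketch, not part of the repository, that assembles the same commands for a given dataset root so they can be reviewed or launched one at a time. `build_training_commands` is a hypothetical helper name, and the sketch assumes the scripts are run from the repository root with exactly the arguments shown above.

```python
import shlex
import sys
from pathlib import Path


def build_training_commands(datasets_root, run_id="my_run"):
    """Return the README's preprocessing/training commands, in order, as argv lists.

    Hypothetical helper: it only assembles the commands and does not run them.
    """
    root = Path(datasets_root)
    encoder_dir = root / "SV2TTS" / "encoder"
    synthesizer_dir = root / "SV2TTS" / "synthesizer"
    py = sys.executable  # use the interpreter that runs this script
    return [
        [py, "encoder_preprocess.py", str(root)],
        [py, "encoder_train.py", run_id, str(encoder_dir)],
        [py, "synthesizer_preprocess_audio.py", str(root)],
        [py, "synthesizer_preprocess_embeds.py", str(synthesizer_dir)],
        [py, "synthesizer_train.py", run_id, str(synthesizer_dir)],
        [py, "vocoder_preprocess.py", str(root)],
        [py, "vocoder_train.py", run_id, str(root)],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them: each training step can
    # run for a long time, so launching them by hand is often preferable.
    for cmd in build_training_commands("<datasets_root>"):
        print(shlex.join(cmd))
```

To execute a step from Python rather than the shell, each argv list can be passed to `subprocess.run(cmd, check=True)`.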

Inference Commands

Terminal:

python demo_cli.py

GUI:

python demo_toolbox.py

Version updates

2022.05.19 We now compute the encoder's GE2E loss on CUDA rather than on the CPU as originally configured, which speeds up encoder training.
2022.07.15 We added an animated loss plot for the synthesizer and vocoder.
2022.07.19 We added response time and Griffin-Lim vocoder results to demo_toolbox.
2022.07.29 We added model validation for the encoder, synthesizer, and vocoder.
2022.08.02 We added the VoxCeleb train and dev data for the encoder. We added noise reduction for the output wav from the vocoder.
noisereduce reference: https://github.com/timsainb/noisereduce
2022.08.06 We split long input text into short sentences using spaCy before passing it to the synthesizer. Make sure to install the English model en_core_web_sm, for example with: python -m spacy download en_core_web_sm
2022.09.02 We set prop_decrease=0.6 for male voices and 0.9 for female voices in the noisereduce function (output filtering with different filter parameters for male and female voices).
2022.09.26 We added speed adjustment for output audio using Praat. Install parselmouth with pip: pip install praat-parselmouth
2022.10.10 We added a voice filter (voice beautification) for input audio. The input audio embed and the standard audio embed are weighted 7:3.
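
The 7:3 weighting can be illustrated with a short sketch. This is not the repository's code: `blend_embeds` is a hypothetical name, the embeds are plain lists of floats, and the final L2 re-normalization is an assumption that this entry does not state.

```python
import math


def blend_embeds(input_embed, standard_embed, input_weight=0.7):
    """Mix two speaker embeds with a 7:3 weighting (input : standard).

    Assumption: the result is re-normalized to unit L2 length. The README
    only gives the weight ratio, so drop that step if it is not wanted.
    """
    if len(input_embed) != len(standard_embed):
        raise ValueError("embeds must have the same length")
    standard_weight = 1.0 - input_weight
    mixed = [
        input_weight * a + standard_weight * b
        for a, b in zip(input_embed, standard_embed)
    ]
    norm = math.sqrt(sum(v * v for v in mixed))
    # Avoid dividing by zero for an all-zero vector.
    return [v / norm for v in mixed] if norm > 0 else mixed
```

With `input_weight=0.7` the input speaker dominates the mix, while the standard embed contributes the remaining 30%.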
2022.10.25 We set small values (<0.06) in the embedding vector to zero.
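
A minimal sketch of this thresholding, assuming the embed is a plain list of non-negative floats; `zero_small_values` is a hypothetical name, not a function from the repository.

```python
def zero_small_values(embed, threshold=0.06):
    """Return a copy of `embed` with every entry below `threshold` set to 0.0.

    Assumption: entries are non-negative, so the comparison is made on the
    raw value, as the "<0.06" in the README suggests.
    """
    return [0.0 if v < threshold else v for v in embed]
```

For a NumPy array the same operation can be written as `np.where(embed < 0.06, 0.0, embed)`.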
2022.10.26 The split frequency for input audio is 170 Hz. The split frequency for output noise reduction is 165 Hz.
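
One way to read the split frequency together with the 2022.09.02 entry is as a pitch threshold that selects the noisereduce setting. The sketch below rests on that assumption: it treats a voice whose estimated pitch is below the split frequency as male. `choose_prop_decrease` is a hypothetical name, and the pitch estimate is taken as given.

```python
def choose_prop_decrease(pitch_hz, split_hz=165.0, male_value=0.6, female_value=0.9):
    """Pick the noisereduce prop_decrease value from an estimated pitch.

    Assumption: pitch below `split_hz` means a male voice (prop_decrease=0.6),
    otherwise a female voice (prop_decrease=0.9).
    """
    return male_value if pitch_hz < split_hz else female_value


# The chosen value would then be passed to noisereduce, for example:
#   import noisereduce
#   cleaned = noisereduce.reduce_noise(y=wav, sr=sample_rate,
#                                      prop_decrease=choose_prop_decrease(pitch_hz))
# where wav, sample_rate, and pitch_hz come from the vocoder output.
```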
2022.12.01 We merged the single sentences into the input.

Description
An improved version of Real-time-voice-cloning.