new commits

liuhaozhe6788
2023-07-17 19:44:29 +08:00
parent 7c2afa8cbb
commit a9697e1f09
3 changed files with 24 additions and 97 deletions

CHANGELOG.md (new file, 18 lines)

@@ -0,0 +1,18 @@
## What's new
**2022.05.19** We now compute the GE2E loss in the encoder on CUDA rather than on the originally configured CPU, which speeds up encoder training.<br>
**2022.07.15** We added an animated loss plot for the synthesizer and vocoder.<br>
**2022.07.19** We added response time and Griffin-Lim vocoder results to demo_toolbox.<br>
**2022.07.29** We added model validation for the encoder, synthesizer and vocoder.<br>
**2022.08.02** We added VoxCeleb train and dev data for the encoder. We added the [noisereduce](https://github.com/timsainb/noisereduce) denoiser for the output wav from the vocoder.<br>
**2022.08.06** We split long text into short sentences using spaCy before passing it to the synthesizer. Make sure to install the spaCy model en_core_web_sm with
`python -m spacy download en_core_web_sm`<br>
**2022.09.02** We set prop_decrease=0.6 for male voices and 0.9 for female voices in the noisereduce function (output filtering with different parameters for male and female voices).<br>
**2022.09.26** We added speed adjustment for output audio using Praat. Install parselmouth with pip: `pip install praat-parselmouth`<br>
**2022.10.10** We added a voice filter (voice beautification) for input audio. The input audio embed and the standard audio embed are weighted 7:3.<br>
**2022.10.25** We set small values (<0.06) in the embed to zero.<br>
**2022.10.26** The split frequency for input audio is 170 Hz. The split frequency for output noise reduction is 165 Hz.<br>
**2022.12.01** We merged the single sentences into the input.<br>
**2022.12.31** We added speaker embedding dimension reduction visualization results.<br>
**2023.01.01** We did more text preprocessing and text cleaning for the TTS text input.<br>
**2023.02.27** We preprocessed ASCII characters and abbreviations.<br>
**2023.06.09** We added VCTK train and dev data for the synthesizer. We also combined a [deep learning denoiser](https://github.com/facebookresearch/denoiser) with the [noisereduce](https://github.com/timsainb/noisereduce) denoiser to improve the output wav quality.<br>
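
The 2022.10.10 and 2022.10.25 entries describe two small operations on the speaker embed: a 7:3 weighted blend and zeroing of small values. The sketch below shows one way these could look. It is an illustration only: the function names `blend_embeds` and `zero_small_values` are hypothetical, plain Python lists stand in for the real embedding arrays, and applying the threshold after the blend (without renormalizing) is an assumption, not something the changelog states.

```python
def blend_embeds(input_embed, standard_embed, input_weight=0.7):
    """Weighted average of two same-length embeds (7:3 by default).

    Hypothetical helper: the project's real blending code is not shown here.
    """
    if len(input_embed) != len(standard_embed):
        raise ValueError("embeds must have the same length")
    standard_weight = 1.0 - input_weight
    return [
        input_weight * a + standard_weight * b
        for a, b in zip(input_embed, standard_embed)
    ]


def zero_small_values(embed, threshold=0.06):
    """Set entries strictly below the threshold to zero, keep the rest."""
    return [0.0 if value < threshold else value for value in embed]


if __name__ == "__main__":
    # Toy 4-dimensional embeds, made up for illustration.
    input_embed = [0.80, 0.02, 0.10, 0.00]
    standard_embed = [0.20, 0.10, 0.10, 0.10]
    blended = blend_embeds(input_embed, standard_embed)
    print(zero_small_values(blended))
```

Keeping the weight and the threshold as parameters makes it easy to try other ratios without touching the call sites.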


@@ -1,7 +1,7 @@
# Real-Time Voice Cloning v2
### What is this?
It is an improved version of [Real-Time-Voice-Cloning](https://github.com/CorentinJ/Real-Time-Voice-Cloning).
It is an improved version of [Real-Time-Voice-Cloning](https://github.com/CorentinJ/Real-Time-Voice-Cloning). Our emotion voice cloning implementation is [here](https://github.com/liuhaozhe6788/voice-cloning-collab/tree/add_emotion)!
## Installation
1. Install [ffmpeg](https://ffmpeg.org/download.html#get-packages). This is necessary for reading audio files.
@@ -103,6 +103,11 @@ Training breakpoints are saved periodically, so you can run the training command
python demo_cli.py
```
Enter the number of audio files first, then the audio file paths, then the text message. The attention alignments and mel spectrogram are stored in syn_results/. The generated audio is stored in out_audios/.
GUI demo:
```
python demo_toolbox.py
```
## Dimension reduction visualization
**Download dataset**
@@ -122,21 +127,3 @@ The results are saved in dim_reduction_results/.
## Pretrained models
You can download the pretrained models from [this Google Drive folder](https://drive.google.com/drive/folders/19fhjjAbWq60zv1Bl6Y51snGbG1r5kaN2) and extract them to saved_models/20230609.
## What's new
**2022.05.19** We now compute the GE2E loss in the encoder on CUDA rather than on the originally configured CPU, which speeds up encoder training.<br>
**2022.07.15** We added an animated loss plot for the synthesizer and vocoder.<br>
**2022.07.19** We added response time and Griffin-Lim vocoder results to demo_toolbox.<br>
**2022.07.29** We added model validation for the encoder, synthesizer and vocoder.<br>
**2022.08.02** We added VoxCeleb train and dev data for the encoder. We added the [noisereduce](https://github.com/timsainb/noisereduce) denoiser for the output wav from the vocoder.<br>
**2022.08.06** We split long text into short sentences using spaCy before passing it to the synthesizer. Make sure to install the spaCy model en_core_web_sm with
`python -m spacy download en_core_web_sm`<br>
**2022.09.02** We set prop_decrease=0.6 for male voices and 0.9 for female voices in the noisereduce function (output filtering with different parameters for male and female voices).<br>
**2022.09.26** We added speed adjustment for output audio using Praat. Install parselmouth with pip: `pip install praat-parselmouth`<br>
**2022.10.10** We added a voice filter (voice beautification) for input audio. The input audio embed and the standard audio embed are weighted 7:3.<br>
**2022.10.25** We set small values (<0.06) in the embed to zero.<br>
**2022.10.26** The split frequency for input audio is 170 Hz. The split frequency for output noise reduction is 165 Hz.<br>
**2022.12.01** We merged the single sentences into the input.<br>
**2022.12.31** We added speaker embedding dimension reduction visualization results.<br>
**2023.01.01** We did more text preprocessing and text cleaning for the TTS text input.<br>
**2023.02.27** We preprocessed ASCII characters and abbreviations.<br>
**2023.06.09** We added VCTK train and dev data for the synthesizer. We also combined a [deep learning denoiser](https://github.com/facebookresearch/denoiser) with the [noisereduce](https://github.com/timsainb/noisereduce) denoiser to improve the output wav quality.<br>
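
The 2022.09.02 and 2022.10.26 entries together suggest that the output denoiser strength depends on the voice type. The sketch below shows one possible selection rule. Treating the 165 Hz "split frequency" as a pitch threshold between male and female voices is our assumption, as are the name `choose_prop_decrease`, the `mean_f0_hz` input, and the handling of the exact boundary; the changelog does not say how the value is applied.

```python
MALE_PROP_DECREASE = 0.6    # from the 2022.09.02 entry
FEMALE_PROP_DECREASE = 0.9  # from the 2022.09.02 entry
OUTPUT_SPLIT_HZ = 165.0     # from the 2022.10.26 entry


def choose_prop_decrease(mean_f0_hz, split_hz=OUTPUT_SPLIT_HZ):
    """Pick a prop_decrease value from an estimated mean pitch in Hz.

    Assumption: pitch below the split frequency is treated as a male
    voice, pitch at or above it as a female voice.
    """
    if mean_f0_hz <= 0:
        raise ValueError("mean_f0_hz must be positive")
    if mean_f0_hz < split_hz:
        return MALE_PROP_DECREASE
    return FEMALE_PROP_DECREASE


if __name__ == "__main__":
    for f0 in (110.0, 165.0, 220.0):
        print(f0, choose_prop_decrease(f0))
```

The returned value would then be passed as the `prop_decrease` argument of `noisereduce.reduce_noise`.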


@@ -1,78 +0,0 @@
import argparse

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation


def main(module_name):
    if module_name == "syn":
        # Redraw callback, called by FuncAnimation once per interval
        def my_function(i):
            # reload the loss histories from disk
            train_loss_arr = np.load("synthesizer_loss/synthesizer_train_loss.npy")
            dev_loss_arr = np.load("synthesizer_loss/synthesizer_dev_loss.npy")
            # clear axis
            ax.cla()
            # plot both loss curves
            ax.plot(train_loss_arr)
            ax.plot(dev_loss_arr)
            ax.legend(["Train Loss", "Dev Loss"])
            # mark and label the latest point of each curve
            ax.scatter(len(train_loss_arr) - 1, train_loss_arr[-1])
            ax.text(len(train_loss_arr) - 1, train_loss_arr[-1], f"({len(train_loss_arr) - 1}, {train_loss_arr[-1]:.6})")
            ax.scatter(len(dev_loss_arr) - 1, dev_loss_arr[-1])
            ax.text(len(dev_loss_arr) - 1, dev_loss_arr[-1], f"({len(dev_loss_arr) - 1}, {dev_loss_arr[-1]:.6})")
            # ax.set_ylim([0, 1])
            plt.xlabel("*100Steps")
            plt.ylabel("Loss")
            plt.title("Synthesizer Loss")

        # define and adjust figure
        fig, ax = plt.subplots()
        ax.set_facecolor('#DEDEDE')
        plt.xlabel("total steps")
        # animate: redraw every 1000 ms
        ani = FuncAnimation(fig, my_function, interval=1000)
        plt.show()
    elif module_name == "voc":
        # Redraw callback, called by FuncAnimation once per interval
        def my_function(i):
            # reload the loss histories from disk
            train_loss_arr = np.load("vocoder_loss/vocoder_train_loss.npy")
            dev_loss_arr = np.load("vocoder_loss/vocoder_dev_loss.npy")
            # clear axis
            ax.cla()
            # plot both loss curves
            ax.plot(train_loss_arr)
            ax.plot(dev_loss_arr)
            ax.legend(["Train Loss", "Dev Loss"])
            # mark and label the latest point of each curve, offsetting the labels
            ax.scatter(len(train_loss_arr) - 1, train_loss_arr[-1])
            ax.text(len(train_loss_arr), train_loss_arr[-1] + 0.1, f"({len(train_loss_arr) - 1}, {train_loss_arr[-1]:.6})")
            ax.scatter(len(dev_loss_arr) - 1, dev_loss_arr[-1])
            ax.text(len(dev_loss_arr), dev_loss_arr[-1] - 0.1, f"({len(dev_loss_arr) - 1}, {dev_loss_arr[-1]:.6})")
            ax.set_ylim([0, 5])
            plt.xlabel("*100Steps")
            plt.ylabel("Loss")
            plt.title("Vocoder Loss")

        # define and adjust figure
        fig, ax = plt.subplots()
        ax.set_facecolor('#DEDEDE')
        plt.xlabel("total steps")
        # animate: redraw every 1000 ms
        ani = FuncAnimation(fig, my_function, interval=1000)
        plt.show()


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )
    parser.add_argument("model", type=str, help=
        "The model to show plot, model name is syn or voc")
    args = parser.parse_args()
    arg_dict = vars(args)
    try:
        main(arg_dict["model"])
    except Exception as e:
        # Note: nothing is restarted here; the script exits after logging.
        print("Caught exception: %s" % repr(e))
        print("Restarting\n")
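
The two branches of the removed script differ only in their file paths, title, y-limits and label offsets, so the per-model settings could be kept in a small lookup table instead. The sketch below shows that idea using only the standard library. `PlotSpec`, `PLOT_SPECS` and `latest_point_label` are hypothetical names introduced for illustration, and the label offsets are left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class PlotSpec:
    """Per-model plot settings (hypothetical container)."""
    train_path: str
    dev_path: str
    title: str
    ylim: Optional[Tuple[float, float]] = None  # None keeps autoscaling


PLOT_SPECS = {
    "syn": PlotSpec(
        train_path="synthesizer_loss/synthesizer_train_loss.npy",
        dev_path="synthesizer_loss/synthesizer_dev_loss.npy",
        title="Synthesizer Loss",
    ),
    "voc": PlotSpec(
        train_path="vocoder_loss/vocoder_train_loss.npy",
        dev_path="vocoder_loss/vocoder_dev_loss.npy",
        title="Vocoder Loss",
        ylim=(0, 5),
    ),
}


def latest_point_label(losses: Sequence[float]) -> str:
    """Format the last point as "(index, value)", as the script's labels do."""
    last_index = len(losses) - 1
    return f"({last_index}, {losses[-1]:.6})"


if __name__ == "__main__":
    spec = PLOT_SPECS["voc"]
    print(spec.title, spec.ylim)
    print(latest_point_label([1.0, 0.5, 0.25]))
```

A single redraw callback could then look up `PLOT_SPECS[module_name]` and use its fields, and an unknown model name would raise a `KeyError` instead of silently doing nothing.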