mirror of
https://github.com/liuhaozhe6788/voice-cloning-collab.git
synced 2026-05-18 05:04:51 +02:00
new commits
18
CHANGELOG.md
Normal file
@@ -0,0 +1,18 @@
## What's new
**2022.05.19:** We now compute the GE2E loss in the encoder on CUDA instead of the originally configured CPU, which speeds up encoder training.<br>
**2022.07.15:** We added an animated loss plot for the synthesizer and vocoder.<br>
**2022.07.19:** We added response time and Griffin-Lim vocoder results to demo_toolbox.<br>
**2022.07.29:** We added model validation for the encoder, synthesizer and vocoder.<br>
**2022.08.02:** We added VoxCeleb train and dev data for the encoder. We added the [noisereduce](https://github.com/timsainb/noisereduce) denoiser for the output wav from the vocoder.<br>
**2022.08.06:** We split long text into short sentences using spaCy before passing it to the synthesizer. Make sure to install the spaCy model en_core_web_sm with
`python -m spacy download en_core_web_sm`<br>
**2022.09.02:** We set prop_decrease=0.6 for male voices and 0.9 for female voices in the noisereduce function (output filtering, with different filter parameters for male and female voices).<br>
**2022.09.26:** We added speed adjustment for output audio using Praat. Install parselmouth with pip: `pip install praat-parselmouth`<br>
**2022.10.10:** We added a voice filter ("voice beautification") for input audio. The input audio embed and the standard audio embed are weighted 7:3.<br>
**2022.10.25:** We set small values (<0.06) in the embed to zero.<br>
**2022.10.26:** The split frequency for input audio is 170 Hz. The split frequency for output noise reduction is 165 Hz.<br>
**2022.12.01:** Merged the single sentences into the input.<br>
**2022.12.31:** Added speaker embedding dimension reduction visualization results.<br>
**2023.01.01:** Added more text preprocessing and text cleaning for the TTS text input.<br>
**2023.02.27:** Added preprocessing of ASCII characters and abbreviations.<br>
**2023.06.09:** We added VCTK train and dev data for the synthesizer. We also combined a [deep learning denoiser](https://github.com/facebookresearch/denoiser) with the [noisereduce](https://github.com/timsainb/noisereduce) denoiser to improve output wav quality.<br>
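The 2022.10.10 and 2022.10.25 entries can be illustrated with a minimal NumPy sketch. It assumes that the 7:3 weight ratio means a weighted sum with weights 0.7 and 0.3, and that values below 0.06 are zeroed element-wise. The function names `blend_embeds` and `zero_small_values` are hypothetical and not taken from the repository. The changelog does not say whether the result is re-normalized afterwards, so that step is left out here.

```python
import numpy as np


def blend_embeds(input_embed, standard_embed, input_weight=0.7):
    """Weighted sum of two speaker embeddings (7:3 by default).

    Assumption: the "weight ratio 7:3" is a plain convex combination.
    """
    input_embed = np.asarray(input_embed, dtype=np.float32)
    standard_embed = np.asarray(standard_embed, dtype=np.float32)
    return input_weight * input_embed + (1.0 - input_weight) * standard_embed


def zero_small_values(embed, threshold=0.06):
    """Return a copy of the embedding with entries below threshold set to 0."""
    embed = np.asarray(embed, dtype=np.float32)
    return np.where(embed < threshold, 0.0, embed).astype(np.float32)


# Toy 4-dimensional embeddings, purely illustrative values.
input_embed = np.array([0.10, 0.02, 0.50, 0.00], dtype=np.float32)
standard_embed = np.array([0.20, 0.10, 0.10, 0.30], dtype=np.float32)

mixed = blend_embeds(input_embed, standard_embed)
cleaned = zero_small_values(mixed)
```

In this toy case the second component of the blend falls below the threshold and is zeroed, while the other three are kept unchanged.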
25
README.md
@@ -1,7 +1,7 @@
# Real-Time Voice Cloning v2

### What is this?
It is an improved version of [Real-Time-Voice-Cloning](https://github.com/CorentinJ/Real-Time-Voice-Cloning).
It is an improved version of [Real-Time-Voice-Cloning](https://github.com/CorentinJ/Real-Time-Voice-Cloning). Our emotion voice cloning implementation is [here](https://github.com/liuhaozhe6788/voice-cloning-collab/tree/add_emotion)!

## Installation
1. Install [ffmpeg](https://ffmpeg.org/download.html#get-packages). This is necessary for reading audio files.
@@ -103,6 +103,11 @@ Training breakpoints are saved periodically, so you can run the training command
python demo_cli.py
```
First enter the number of audio files, then the audio file paths, then the text message. The attention alignments and mel spectrogram are stored in syn_results/. The generated audio is stored in out_audios/.

GUI demo:
```
python demo_toolbox.py
```
## Dimension reduction visualization
**Download dataset:**
@@ -122,21 +127,3 @@ The results are saved in dim_reduction_results/.
## Pretrained models
You can download the pretrained models from [this folder](https://drive.google.com/drive/folders/19fhjjAbWq60zv1Bl6Y51snGbG1r5kaN2) and extract them to saved_models/20230609.
## What's new
**2022.05.19:** We now compute the GE2E loss in the encoder on CUDA instead of the originally configured CPU, which speeds up encoder training.<br>
**2022.07.15:** We added an animated loss plot for the synthesizer and vocoder.<br>
**2022.07.19:** We added response time and Griffin-Lim vocoder results to demo_toolbox.<br>
**2022.07.29:** We added model validation for the encoder, synthesizer and vocoder.<br>
**2022.08.02:** We added VoxCeleb train and dev data for the encoder. We added the [noisereduce](https://github.com/timsainb/noisereduce) denoiser for the output wav from the vocoder.<br>
**2022.08.06:** We split long text into short sentences using spaCy before passing it to the synthesizer. Make sure to install the spaCy model en_core_web_sm with
`python -m spacy download en_core_web_sm`<br>
**2022.09.02:** We set prop_decrease=0.6 for male voices and 0.9 for female voices in the noisereduce function (output filtering, with different filter parameters for male and female voices).<br>
**2022.09.26:** We added speed adjustment for output audio using Praat. Install parselmouth with pip: `pip install praat-parselmouth`<br>
**2022.10.10:** We added a voice filter ("voice beautification") for input audio. The input audio embed and the standard audio embed are weighted 7:3.<br>
**2022.10.25:** We set small values (<0.06) in the embed to zero.<br>
**2022.10.26:** The split frequency for input audio is 170 Hz. The split frequency for output noise reduction is 165 Hz.<br>
**2022.12.01:** Merged the single sentences into the input.<br>
**2022.12.31:** Added speaker embedding dimension reduction visualization results.<br>
**2023.01.01:** Added more text preprocessing and text cleaning for the TTS text input.<br>
**2023.02.27:** Added preprocessing of ASCII characters and abbreviations.<br>
**2023.06.09:** We added VCTK train and dev data for the synthesizer. We also combined a [deep learning denoiser](https://github.com/facebookresearch/denoiser) with the [noisereduce](https://github.com/timsainb/noisereduce) denoiser to improve output wav quality.<br>
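One possible reading of the 2022.09.02 and 2022.10.26 entries is that an estimated pitch decides which `prop_decrease` value is handed to noisereduce. The sketch below assumes that voices whose mean fundamental frequency lies below the 165 Hz output split are treated as male. The helper `choose_prop_decrease` and the use of a mean F0 estimate are hypothetical and may differ from what the repository actually does.

```python
# Values quoted in the changelog entries.
MALE_PROP_DECREASE = 0.6
FEMALE_PROP_DECREASE = 0.9
OUTPUT_SPLIT_HZ = 165.0


def choose_prop_decrease(mean_f0_hz, split_hz=OUTPUT_SPLIT_HZ):
    """Pick a prop_decrease value from an estimated mean pitch in Hz.

    Assumption: pitch below the split frequency means a male voice.
    """
    if mean_f0_hz <= 0:
        raise ValueError("mean_f0_hz must be a positive frequency in Hz")
    if mean_f0_hz < split_hz:
        return MALE_PROP_DECREASE
    return FEMALE_PROP_DECREASE


# Illustrative pitch estimates, not measured data.
low_voice_setting = choose_prop_decrease(120.0)
high_voice_setting = choose_prop_decrease(210.0)
```

The returned value would then be passed as the `prop_decrease` argument of the noisereduce call on the vocoder output.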
@@ -1,78 +0,0 @@
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation
import argparse

def main(module_name):

    if module_name == "syn":
        # function to update the data; called once per animation frame
        def my_function(i):
            # reload the loss histories written by the training process
            train_loss_arr = np.load("synthesizer_loss/synthesizer_train_loss.npy")
            dev_loss_arr = np.load("synthesizer_loss/synthesizer_dev_loss.npy")

            # clear axis
            ax.cla()
            # plot the train and dev loss curves
            ax.plot(train_loss_arr)
            ax.plot(dev_loss_arr)
            ax.legend(["Train Loss", "Dev Loss"])
            # mark and label the latest point of each curve
            ax.scatter(len(train_loss_arr) - 1, train_loss_arr[-1])
            ax.text(len(train_loss_arr) - 1, train_loss_arr[-1], f"({len(train_loss_arr) - 1}, {train_loss_arr[-1]:.6})")
            ax.scatter(len(dev_loss_arr) - 1, dev_loss_arr[-1])
            ax.text(len(dev_loss_arr) - 1, dev_loss_arr[-1], f"({len(dev_loss_arr) - 1}, {dev_loss_arr[-1]:.6})")
            # ax.set_ylim([0, 1])
            plt.xlabel("*100Steps")
            plt.ylabel("Loss")
            plt.title("Synthesizer Loss")
        # define and adjust figure
        fig, ax = plt.subplots()
        ax.set_facecolor('#DEDEDE')
        plt.xlabel("total steps")
        # animate: redraw every 1000 ms
        ani = FuncAnimation(fig, my_function, interval=1000)
        plt.show()

    elif module_name == "voc":
        # function to update the data; called once per animation frame
        def my_function(i):
            # reload the loss histories written by the training process
            train_loss_arr = np.load("vocoder_loss/vocoder_train_loss.npy")
            dev_loss_arr = np.load("vocoder_loss/vocoder_dev_loss.npy")
            # clear axis
            ax.cla()
            # plot the train and dev loss curves
            ax.plot(train_loss_arr)
            ax.plot(dev_loss_arr)
            ax.legend(["Train Loss", "Dev Loss"])
            # mark and label the latest point of each curve (labels offset to avoid overlap)
            ax.scatter(len(train_loss_arr) - 1, train_loss_arr[-1])
            ax.text(len(train_loss_arr), train_loss_arr[-1] + 0.1, f"({len(train_loss_arr) - 1}, {train_loss_arr[-1]:.6})")
            ax.scatter(len(dev_loss_arr) - 1, dev_loss_arr[-1])
            ax.text(len(dev_loss_arr), dev_loss_arr[-1] - 0.1, f"({len(dev_loss_arr) - 1}, {dev_loss_arr[-1]:.6})")
            ax.set_ylim([0, 5])
            plt.xlabel("*100Steps")
            plt.ylabel("Loss")
            plt.title("Vocoder Loss")
        # define and adjust figure
        fig, ax = plt.subplots()
        ax.set_facecolor('#DEDEDE')
        plt.xlabel("total steps")
        # animate: redraw every 1000 ms
        ani = FuncAnimation(fig, my_function, interval=1000)
        plt.show()


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )
    parser.add_argument("model", type=str,
                        help="The model to show plot, model name is syn or voc")
    args = parser.parse_args()
    arg_dict = vars(args)
    try:
        main(arg_dict["model"])
    except Exception as e:
        print("Caught exception: %s" % repr(e))
        print("Restarting\n")
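The script above reloads the `.npy` files once per second, so the training process is presumably rewriting them while the plot is open. How the training code writes them is not shown in this commit. A minimal sketch, assuming each new loss value is appended to the stored array and the file is swapped in with an atomic rename so that a polling reader never sees a half-written file; `append_loss` is a hypothetical helper, not part of the repository:

```python
import os
import tempfile

import numpy as np


def append_loss(path, value):
    """Append one loss value to a .npy history file via an atomic replace."""
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)

    # Load the existing history, or start a new one.
    if os.path.exists(path):
        history = np.load(path)
    else:
        history = np.empty(0, dtype=np.float64)
    history = np.append(history, float(value))

    # Write to a temporary file in the same directory, then swap it into place.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".npy")
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            np.save(tmp_file, history)
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong before the swap.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return history
```

A training loop could call, for example, `append_loss("synthesizer_loss/synthesizer_train_loss.npy", loss)` every 100 steps, which would match the `*100Steps` axis label used by the plot.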