new commits

2026-05-18 05:04:51 +02:00 · 2023-07-17 19:44:29 +08:00
parent 7c2afa8cbb
commit a9697e1f09
3 changed files with 24 additions and 97 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -0,0 +1,18 @@
+## What's new
+**2022.05.19：** We now compute the GE2E loss in the encoder on CUDA rather than on the originally configured CPU, which speeds up encoder training.<br>
+**2022.07.15：** We added a loss animation plot for the synthesizer and vocoder.<br>
+**2022.07.19：** We added response time and Griffin-Lim vocoder results to demo_toolbox.<br>
+**2022.07.29：** We added model validation for the encoder, synthesizer and vocoder.<br>
+**2022.08.02：** We added VoxCeleb train and dev data for the encoder. We added the [noisereduce](https://github.com/timsainb/noisereduce) denoiser for the output wav from the vocoder.<br>
+**2022.08.06：** We split long text into short sentences using spaCy before passing it to the synthesizer. Make sure to install the spaCy model en_core_web_sm with
+`python -m spacy download en_core_web_sm`<br>
+**2022.09.02：** We set prop_decrease=0.6 for male voices and 0.9 for female voices in the noisereduce function (output filtering: male and female voices use different filter parameters).<br>
+**2022.09.26：** We added speed adjustment for output audio using Praat. Install parselmouth with pip: `pip install praat-parselmouth`<br>
+**2022.10.10：** We added a voice filter (voice beautification) for input audio. The input audio embed and the standard audio embed are weighted 7:3.<br>
+**2022.10.25：** We set small values (<0.06) in the embed to zero.<br>
+**2022.10.26：** The split frequency for input audio is 170 Hz. The split frequency for output noise reduction is 165 Hz.<br>
+**2022.12.01：** We merged the single sentences into the input.<br>
+**2022.12.31：** We added speaker embedding dimension reduction visualization results.<br>
+**2023.01.01：** We did more text preprocessing and text cleaning for the TTS text input.<br>
+**2023.02.27：** We preprocessed ASCII characters and abbreviations.<br>
+**2023.06.09：** We added VCTK train and dev data for the synthesizer. We also combined a [deep learning denoiser](https://github.com/facebookresearch/denoiser) with the [noisereduce](https://github.com/timsainb/noisereduce) denoiser to improve output wav quality.<br>
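
Note on the 2022.10.10 and 2022.10.25 entries: the 7:3 embed weighting and the <0.06 zeroing can be sketched with NumPy as below. This is only an illustration under stated assumptions, not the repository's code. The function names `blend_embeds` and `zero_small_values` are hypothetical. The sketch also assumes that the embeds are non-negative 1-D arrays, that "small" means the raw value is below the threshold, and that nothing is renormalized afterwards.

```python
import numpy as np


def blend_embeds(input_embed, standard_embed, input_weight=0.7):
    """Weighted average of two speaker embeds (hypothetical helper).

    input_weight=0.7 gives the 7:3 ratio named in the changelog.
    """
    return input_weight * input_embed + (1.0 - input_weight) * standard_embed


def zero_small_values(embed, threshold=0.06):
    """Return a copy of the embed with entries below threshold set to zero.

    Assumption: the comparison is on the raw value, not the absolute value.
    """
    out = embed.copy()
    out[out < threshold] = 0.0
    return out


if __name__ == "__main__":
    # Toy 4-dimensional vectors standing in for real speaker embeds.
    input_embed = np.array([0.5, 0.05, 0.0, 0.2])
    standard_embed = np.array([0.1, 0.0, 0.3, 0.2])
    mixed = blend_embeds(input_embed, standard_embed)
    print(zero_small_values(mixed))
```

Whether the project applies the threshold before or after blending, and whether it renormalizes the result, is not stated in the changelog. The order used here is a guess.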
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@
 # Real-Time Voice Cloning v2

 ### What is this?
-It is an improved version of [Real-Time-Voice-Cloning](https://github.com/CorentinJ/Real-Time-Voice-Cloning).
+It is an improved version of [Real-Time-Voice-Cloning](https://github.com/CorentinJ/Real-Time-Voice-Cloning). Our emotion voice cloning implementation is [here](https://github.com/liuhaozhe6788/voice-cloning-collab/tree/add_emotion)!

 ## Installation
 1. Install [ffmpeg](https://ffmpeg.org/download.html#get-packages). This is necessary for reading audio files.
@@ -103,6 +103,11 @@ Training breakpoints are saved periodically, so you can run the training command
 python demo_cli.py
 ```
 First input the number of audios, then input the audio file paths, then input the text message. The attention alignments and mel spectrogram are stored in syn_results/. The generated audio is stored in out_audios/.
+
+GUI demo:
+```
+python demo_toolbox.py
+```
 ## Dimension reduction visualization
 **Download dataset：** 

@@ -122,21 +127,3 @@ The results are saved in dim_reduction_results/.

 ## Pretrained models
 You can download the pretrained model from [this](https://drive.google.com/drive/folders/19fhjjAbWq60zv1Bl6Y51snGbG1r5kaN2) and extract as saved_models/20230609
-## What's new
-**2022.05.19：** We calculated GE2E loss in encoder with CUDA rather than originally-configured CPU. It speeds up the encoder training speed.<br>
-**2022.07.15：** We added Loss animation plot for synthesizer and vocoder.<br>
-**2022.07.19：** We added response time and Griffin-Lim vocoder results for demo_toolbox.<br>
-**2022.07.29：** We added model validation for encoder, synthesizer and vocoder.<br>
-**2022.08.02：** We added voxceleb train and dev data for encoder. We added [noisereduce](https://github.com/timsainb/noisereduce) denoiser for the output wav from vocoder.<br>
-**2022.08.06：** We split the long text into short sentences using spacy for input of synthesizer. Make sure to install spaCy model en_core_web_sm by 
-`python -m spacy download en_core_web_sm`<br>
-**2022.09.02：** We set prop_decrease=0.6 for male and 0.9 for female in noisereduce function.(输出滤波，男女声使用不同的滤波参数)<br>
-**2022.09.26：** We added speed adjustment(声音变速) for output audios using praat, install parselmouth using pip: `pip install praat-parselmouth`<br>
-**2022.10.10：** We added voice filter functioning(声音美颜) for input audios, the weight ratio of the input audio embed and the standard audio embed is 7: 3. <br>
-**2022.10.25：** We set small values(<0.06) to zeros in embed.(对嵌入向量较小值置零)<br>
-**2022.10.26：** The split frequency for input audio is 170Hz. The split frequency for output noise reduce is 165Hz.<br>
-**2022.12.01：** merge the single sentences to input.<br>
-**2022.12.31：** added speaker embeddings dimension reduction visualzation results.<br>
-**2023.01.01：** did more text preprocessing and text cleaning for TTS text input.<br>
-**2023.02.27：** preprocessed ascii chars and abbreviations.<br>
-**2023.06.09：** We added VCTK train and dev data for synthesizer. We also combine a [deep learning denoiser](https://github.com/facebookresearch/denoiser) with the [noisereduce](https://github.com/timsainb/noisereduce) denoiser for optimized output wav quality.<br>
--- a/update_plot.py
+++ b/update_plot.py
@@ -1,78 +0,0 @@
-import matplotlib.pyplot as plt
-import numpy as np
-from matplotlib.animation import FuncAnimation
-import argparse
-
-def main(module_name):
-
-    if module_name == "syn":
-        # function to update the data
-        def my_function(i):
-            # get data
-            train_loss_arr = np.load("synthesizer_loss/synthesizer_train_loss.npy")
-            dev_loss_arr = np.load("synthesizer_loss/synthesizer_dev_loss.npy")
-
-            # clear axis
-            ax.cla()
-            # plot cpu
-            ax.plot(train_loss_arr)
-            ax.plot(dev_loss_arr)
-            ax.legend(["Train Loss", "Dev Loss"])
-            ax.scatter(len(train_loss_arr) - 1, train_loss_arr[-1])
-            ax.text(len(train_loss_arr)-1, train_loss_arr[-1], f"({len(train_loss_arr) - 1}, {train_loss_arr[-1]:.6})")
-            ax.scatter(len(dev_loss_arr) - 1, dev_loss_arr[-1])
-            ax.text(len(dev_loss_arr)-1, dev_loss_arr[-1], f"({len(dev_loss_arr) - 1}, {dev_loss_arr[-1]:.6})")
-            # ax.set_ylim([0, 1])
-            plt.xlabel("*100Steps")
-            plt.ylabel("Loss")
-            plt.title("Synthesizer Loss")
-        # define and adjust figure
-        fig, ax = plt.subplots()
-        ax.set_facecolor('#DEDEDE')
-        plt.xlabel("total steps")
-        # animate
-        ani = FuncAnimation(fig, my_function, interval=1000)
-        plt.show()
-
-    elif module_name == "voc":
-        # function to update the data
-        def my_function(i):
-            # get data
-            train_loss_arr = np.load("vocoder_loss/vocoder_train_loss.npy")
-            dev_loss_arr = np.load("vocoder_loss/vocoder_dev_loss.npy")
-            # clear axis
-            ax.cla()
-            # plot cpu
-            ax.plot(train_loss_arr)
-            ax.plot(dev_loss_arr)
-            ax.legend(["Train Loss", "Dev Loss"])
-            ax.scatter(len(train_loss_arr) - 1, train_loss_arr[-1])
-            ax.text(len(train_loss_arr), train_loss_arr[-1]+0.1, f"({len(train_loss_arr) - 1}, {train_loss_arr[-1]:.6})")
-            ax.scatter(len(dev_loss_arr) - 1, dev_loss_arr[-1])
-            ax.text(len(dev_loss_arr), dev_loss_arr[-1]-0.1, f"({len(dev_loss_arr) - 1}, {dev_loss_arr[-1]:.6})")
-            ax.set_ylim([0, 5])
-            plt.xlabel("*100Steps")
-            plt.ylabel("Loss")
-            plt.title("Vocoder Loss")
-        # define and adjust figure
-        fig, ax = plt.subplots()
-        ax.set_facecolor('#DEDEDE')
-        plt.xlabel("total steps")
-        # animate
-        ani = FuncAnimation(fig, my_function, interval=1000)
-        plt.show()
-
-
-if __name__ == "__main__":
-    parser = argparse.ArgumentParser(
-        formatter_class=argparse.ArgumentDefaultsHelpFormatter
-    )
-    parser.add_argument("model", type=str, help= \
-    "The model to show plot, model name is syn or voc")
-    args = parser.parse_args()
-    arg_dict = vars(args)
-    try:
-        main(arg_dict["model"])
-    except Exception as e:
-        print("Caught exception: %s" % repr(e))
-        print("Restarting\n")