Mangio-RVC-Fork (Retrieval-based-Voice-Conversion) 💻

A fork of an easy-to-use SVC framework based on VITS with top1 retrieval 💯.

💓 Please support the original RVC repository. Without it, this fork obviously wouldn't have been possible. Mangio-RVC-Fork aims to enhance the features of the original RVC repo in my own way. Please note that this fork is NOT STABLE and was created for experimentation. Do not use this fork thinking it is a "better" version of the original repo; think of it more like another "version" of the original repo. Also note that this fork does not have a Google Colab. If you want to use Google Colab, go to the original repository. For now, this fork is intended to be used on Paperspace and local machines.

Add me on Discord: Funky Town#2048

I am able to communicate with you here and there.


Special thanks to Discord user @kalomaze#2983 for creating a temporary Colab notebook for this fork for the time being. Eventually, an official, more stable notebook will be included with this fork. Please use Paperspace instead if you can, as it is much more stable.


The original RVC Demo Video here!

Realtime Voice Conversion Software using RVC : w-okada/voice-changer

The pretrained model is trained on nearly 50 hours of the high-quality, open-source VCTK dataset.

High-quality licensed song datasets will be added to the training set one after another, so you can use them without worrying about copyright infringement.

Summary 📘

Features that this fork (Mangio-RVC-Fork) has that the original repo doesn't ☑️

  • f0 Inference algorithm overhaul: 🌟
    • Added pyworld dio f0 method.
    • Added torchcrepe crepe f0 method. (Greatly increases pitch accuracy and stability.)
    • Added torchcrepe crepe-tiny model. (Faster at inference, but likely lower quality than the full crepe model.)
    • Modifiable crepe_hop_length for the crepe algorithm via the web GUI (see the sketch after this list).
  • f0 Crepe Pitch Extraction for training. 🌟 (EXPERIMENTAL) Works on Paperspace machines but not on local Mac/Windows machines. There is a potential memory leak; watch out.
  • Paperspace integration 🌟
    • Paperspace argument on infer-web.py (--paperspace) that shares a gradio link
    • Make file for paperspace users
  • Tensorboard access via Makefile (make tensorboard)
  • Total epoch slider for training now goes up to 10,000, not just 1,000.
  • Added CLI functionality
    • Added train-index-cli.py to train the feature index without the GUI
    • Added extract-small-model.py to extract the small model without the GUI
    • Added infer-cli.py to run inference without the GUI
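
Below is an illustrative sketch (not the fork's internal code) of the two added f0 methods, using the torchcrepe and pyworld APIs. The pitch bounds, batch size, and file name here are assumptions, not the fork's defaults:

# Illustrative f0 extraction with the crepe and dio methods.
import numpy as np
import torch
import torchcrepe
import pyworld

# Load a mono audio file (torchcrepe returns a tensor plus sample rate).
audio, sr = torchcrepe.load.audio("sample.wav")

# crepe: the hop length controls the analysis step in samples.
pitch = torchcrepe.predict(
    audio, sr,
    hop_length=128,          # the crepe_hop_length value exposed in the GUI
    fmin=50.0, fmax=1100.0,  # assumed pitch bounds
    model="full",            # "tiny" selects the faster crepe-tiny variant
    batch_size=512,
    device="cuda:0" if torch.cuda.is_available() else "cpu",
)

# dio: pyworld's fast f0 estimator, refined with stonemask.
x = audio.squeeze().numpy().astype(np.float64)
f0, t = pyworld.dio(x, sr)
f0 = pyworld.stonemask(x, f0, t, sr)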

This repository has the following features too:

  • Reduce tone leakage by replacing the source feature with the training-set feature using top1 retrieval;
  • Easy and fast training, even on relatively weak graphics cards;
  • Training with a small amount of data also gives relatively good results (at least 10 minutes of low-noise speech is recommended);
  • Model fusion to change timbres (using the ckpt processing tab -> ckpt merge);
  • Easy-to-use web UI;
  • Use the UVR5 model to quickly separate vocals and instruments.

Features planned to be added during the fork's development ▶️

  • An inference batcher script (see the sketch below this list). Improvement proposal:
    • According to various GitHub users, inferencing 30-second samples at a time both increases quality and prevents harvest memory errors.
  • Improved GUI (more convenience).
  • Automatic removal of old generations to save space.
  • Potentially a pyin f0 method or a hybrid f0 crepe method.
  • More optimized training on Paperspace machines.
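
A rough sketch of what such a batcher could look like (hypothetical; this helper, its name, and the 30-second default are not part of the fork yet):

# Hypothetical inference batcher: split a long source file into ~30-second
# chunks so each one can be inferred separately.
import soundfile as sf

def split_into_chunks(src_path, out_dir, chunk_seconds=30):
    audio, sr = sf.read(src_path)
    samples_per_chunk = chunk_seconds * sr
    paths = []
    for i in range(0, len(audio), samples_per_chunk):
        out_path = f"{out_dir}/chunk_{i // samples_per_chunk:03d}.wav"
        sf.write(out_path, audio[i:i + samples_per_chunk], sr)
        paths.append(out_path)
    return paths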

About this fork's crepe training:

Crepe training is still incredibly unstable, and there have been reports of a memory leak. This will be fixed in the future; for now, it works quite well on Paperspace machines. Please note that crepe training produces slightly different results from a harvest-trained model: crepe sounds clearer in some parts, but more robotic in others. I would say both are equally good to train with, but I still think crepe on INFERENCE is not only quicker but also more pitch-stable (especially with vocal layers). Right now, a quite stable combination is to train with harvest and infer with crepe. If you are training with crepe (f0 feature extraction), please make sure your datasets are as dry as possible to reduce artifacts and unwanted harmonics, as crepe's pitch estimation seems to latch onto reverb more.

If you get CUDA issues with crepe training (or with pm, harvest, etc.)

This is due to the number of processes (n_p) being too high. Cut the thread count down by lowering the value of the "Number of CPU Threads to use" slider in the feature extraction GUI.
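
The same presumably applies when extracting from the CLI: lower the thread-count argument (arg 2 in the f0 extraction section below), for example:

python extract_f0_print.py logs/mi-test 2 crepe 128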

Installing the Dependencies 🖥️

Using pip (Python 3.9.8 is stable with this fork)

Paperspace Users:

cd Mangio-RVC-Fork
make install # Do this every time you start your Paperspace machine

Windows/MacOS

Notice: faiss 1.7.2 will raise Segmentation Fault: 11 under macOS. If you install it manually with pip, use pip install faiss-cpu==1.7.0 instead. Swig can be installed via brew under macOS:

brew install swig

Install requirements:

pip install -r requirements.txt

Preparation of other Pre-models ⬇️

Paperspace Users:

cd Mangio-RVC-Fork
make base # Do this only once after cloning this fork (no need to run it again unless the pre-models change on Hugging Face)

Local Users

RVC requires other pre-models for inference and training. You need to download them from our Hugging Face space.

Here's a list of Pre-models and other files that RVC needs:

hubert_base.pt

./pretrained 

./uvr5_weights

#If you are using Windows, you may also need this file; skip it if FFmpeg is already installed
ffmpeg.exe
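
If you prefer scripting the download, here is a minimal sketch using huggingface_hub; the repo id below is an assumption based on where the original RVC project hosts its files, so verify it before use:

# Hypothetical download helper; repo_id is assumed, verify before use.
from huggingface_hub import hf_hub_download

hubert_path = hf_hub_download(
    repo_id="lj1995/VoiceConversionWebUI",
    filename="hubert_base.pt",
)
print(hubert_path)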

Running the Web GUI to Infer & Train 💪

For Paperspace users:

cd Mangio-RVC-Fork
make run

Then click the Gradio link it provides.
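
Local users should be able to launch the same GUI directly (the --paperspace flag mentioned earlier only adds the shared Gradio link):

python infer-web.py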

Inference & Training with CLI 💪 🔠

Pre-processing the Dataset without the GUI

# arg 1 = Dataset Path
# arg 2 = Sample Rate
# arg 3 = Number of Threads
# arg 4 = Export Directory (logs/*YOUR DATASET FOLDER NAME*)
# arg 5 = No parallel: True or False
python trainset_preprocess_pipeline_print.py /INSERTDATASETNAMEHERE 40000 8 logs/mi-test True 

f0 Feature Extraction without the GUI

# arg 1 = Path of model logs (logs/*YOUR MODEL NAME*)
# arg 2 = Number of threads to use
# arg 3 = f0 method: pm, harvest, dio, crepe
# arg 4 = Crepe Hop Length (Value is used if using crepe method)
python extract_f0_print.py logs/mi-test 4 harvest 128

# Then extract the model features (this step uses hubert_base.pt):
# arg 1 = device
# arg 2 = n_part
# arg 3 = i_part
# arg 4 = GPU device number ("0")
# arg 5 = Export directory (logs/*MODEL NAME*)
python extract_feature_print.py cpu 1 0 0 logs/mi-test

Training without the GUI

# Arguments
# -e = Name of model
# -sr = Sample Rate
# -f0 = Model has pitch guidance? 1 for yes. 0 for no.
# -bs = Batch size
# -g = GPU card slot
# -te = Total Epochs
# -se = Save epoch interval
# -pg = Pretrained Generator Model Path
# -pd = Pretrained Discriminator Model Path
# -l = Save only latest model? 1 for yes 0 for no
# -c = Cache data in gpu? 1 for yes 0 for no
python train_nsf_sim_cache_sid_load_pretrain.py -e mi-test -sr 40k -f0 1 -bs 8 -g 0 -te 10000 -se 50 -pg pretrained/f0G40k.pth -pd pretrained/f0D40k.pth -l 0 -c 0

Training the Feature Index without the GUI

# + Mangio-RVC-Fork Feature. Train the index with the CLI
# arg1 = Model Name (name of the model folder in logs) 
python train-index-cli.py mi-test

Extract Model from checkpoint without the GUI

# + Mangio-RVC-Fork Feature. Extract Small Model from checkpoint from the CLI.
# The small model refers to the model that can be used for inference
# Arguments:
# arg1 = Path of the model checkpoint (the G file, e.g. logs/G_99750.pth)
# arg2 = Model Save Name
# arg3 = Sample Rate: "32k" "40k" or "48k"
# arg4 = Has Pitch guidance (f0)? Either 1 for yes or 0 for no
# arg5 = Model Information. (OPTIONAL). 
python extract-small-model-cli.py logs/G_99750.pth MyModel 40k 1 "This is a cool model."

Inference without the GUI (Voice Conversion)

# + Mangio-RVC-Fork Feature. Infer audio with just the CLI
# Arguments
# arg1 = model name in weights folder. (mi-test.pth)
# arg2 = source file path (.wav)
# arg3 = output file name to be placed in ./audio-outputs (myoutput.wav).
# arg4 = feature index file path. (E:\added_IVF3042_Flat_nprobe_1.index)
# arg5 = speaker ID (0)
# arg6 = transposition. (12 = 12 semitones up; negative values transpose down)
# arg7 = f0 method. (harvest, pm, crepe, dio, crepe-tiny)
# arg8 = crepe hop length. Use 128. (applies to crepe f0 method only)
# arg9 = feature index ratio (0.78)
python infer-cli.py mi-test.pth E:\my-source-file.wav conversion_output.wav E:\added_IVF3042_Flat_nprobe_1.index 0 -2 pm 128 0.78
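
As a usage note, the example above transposes down 2 semitones with pm. A crepe run transposing up an octave would presumably look like:

python infer-cli.py mi-test.pth E:\my-source-file.wav conversion_output.wav E:\added_IVF3042_Flat_nprobe_1.index 0 12 crepe 128 0.78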

Running the Tensorboard 📉

cd Mangio-RVC-Fork
make tensorboard

Then click the TensorBoard link it provides and refresh the data.
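
If you are not using the Makefile, the standard TensorBoard invocation should work too (assuming event files are written under ./logs):

tensorboard --logdir logs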

Other

If you are using Windows, you can download and extract RVC-beta.7z to use RVC directly, and run go-web.bat to start the web UI.

There is also a tutorial on RVC in Chinese; check it out if needed.

Credits

Thanks to all contributors for their efforts
