mirror of
https://github.com/Mangio621/Mangio-RVC-Fork.git
synced 2026-02-24 03:49:51 +01:00
RMVPE training & GUI QOL changes
RMVPE training, removal of one-click training, formant shifting, and a stop-training button all added.
This commit is contained in:

96	README.md
@@ -1,3 +1,99 @@

# Features:

- Experimental formant shift using StftPitchShift (tried using Praat with praatio, but to no avail)
- Added a `Stop Training` button, so there's no need to restart RVC every time you want to stop training a model!
- Auto-detect the index path for selected models, plus auto-detected paths in general. No more default values like `E:\codes\py39\vits_vc_gpu_train\logs\mi-test-1key\total_fea.npy`; the root dir and subfolders are resolved using

```python
os.path.abspath(os.getcwd())
```

- Audio files dropdown, auto-detecting files in the `/audios/` folder
- A more stable version of Gradio (3.34.0) with support for themes
- Removed the `One Click Training` button in the `Training` tab, since it was glitchy and confused a lot of users
- Changed the default training settings to be more optimal for newer users
- Auto-open the TensorBoard localhost URL when `tensor-launch.py` is executed
- RMVPE implemented in both inferencing and training (the one in the `Training` tab doesn't work properly yet, though, and requires some additional work)
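A hypothetical sketch of how such path auto-detection can work: walk the root directory returned by `os.path.abspath(os.getcwd())` and collect `.index` files. The helper name and the folder layout here are illustrative assumptions, not the fork's actual code:

```python
import os

def find_index_files(root=None):
    """Walk root (default: the app's working dir) and collect .index files."""
    root = root or os.path.abspath(os.getcwd())
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".index"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

The dropdown of `/audios/` files can be populated the same way, filtering by audio extensions instead of `.index`.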
## Installation:

1. Either extract the archive directly or use `git clone`.

2. Run `installstft.bat`. It'll automatically:
   - Upgrade/downgrade Gradio if its version isn't 3.34.0;
   - Download `rmvpe.pt` if it hasn't already been installed;
   - Install `StftPitchShift` if it hasn't already been installed.

3. Done! You're good to go with this tweaked RVC-WebUI :)
## Change Gradio Theme:

- [OPTIONAL] Change Gradio's theme:

1. Open `infer-web.py` in any code/text editor (e.g. Notepad++, Notepad, VS Code, etc.)

2a. Press Ctrl+F and search for `with gr.Blocks(`; select the one that isn't commented out

2b. Alternatively, go to line `1842`, where you'll see `with gr.Blocks(theme='HaleyCH/HaleyCH_Theme') as app:`

3. Go to the [Gradio Theme Gallery](https://huggingface.co/spaces/gradio/theme-gallery):

3.1 Select any theme you like (e.g. [this one](https://huggingface.co/spaces/freddyaboulton/dracula_revamped))

3.2 Look at the top of the page

![image](https://github.com/Tiger14n/RVC-GUI/assets/36931363/f6f7c414-26f3-44dd-a489-3b0f65f34e01)

3.3 Copy the theme variable (in this case, `theme='freddyaboulton/dracula_revamped'`)

4. Replace `theme='HaleyCH/HaleyCH_Theme'` in `infer-web.py` with the value of any theme from the [Gradio Theme Gallery](https://huggingface.co/spaces/gradio/theme-gallery)
### Current Todo-list:

- [x] Fix the `Unload voice to save GPU memory` button traceback
- [ ] Add accordions so people on Firefox get a much more compact GUI rather than [this](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/assets/79400603/67e0cc08-82a2-4dc3-86cf-e23d1dcad9f8)
- [ ] Fix the odd way the Median Filtering value entered in the slider is utilized
- [ ] Replace the regular refresh buttons with the tiny ones from [AUTOMATIC1111's Stable Diffusion WebUI](https://github.com/AUTOMATIC1111/stable-diffusion-webui)

  ![Screenshot 2023-07-18 002620](https://github.com/Mangio621/Mangio-RVC-Fork/assets/87145923/3f5b22bb-01cd-4156-b15e-d53aa27f0f22)

- [ ] Add a way to change Gradio's theme from the WebUI itself, like in [AUTOMATIC1111's Stable Diffusion WebUI](https://github.com/AUTOMATIC1111/stable-diffusion-webui)

  ![image](https://github.com/Mangio621/Mangio-RVC-Fork/assets/87145923/59c43d13-0d4f-44ba-b4cd-233d6d93770f)

- [ ] Implement Praat in the GUI for f0 curve file manipulation and easier usage
# Screenshots:

- ## Inference Tab:

  ![Screenshot 2023-07-18 135817](https://github.com/Mangio621/Mangio-RVC-Fork/assets/87145923/c06e3f9b-9f06-4b05-a3a5-5a7d6a2dd4d6)

- ## UVR Tab:

  ![Screenshot 2023-07-18 140116](https://github.com/Mangio621/Mangio-RVC-Fork/assets/87145923/c5e902bd-faab-4b96-be33-6e6a156b93dd)

- ## Training Tab:

  ![Screenshot 2023-07-18 140147](https://github.com/Mangio621/Mangio-RVC-Fork/assets/87145923/1e3d53a4-1db5-4aa8-8b38-0a0ae9d79e33)

- ## Ckpt-Processing Tab:

  ![Screenshot 2023-07-18 140159](https://github.com/Mangio621/Mangio-RVC-Fork/assets/87145923/ef0fa9f1-be9a-4cd7-bb5d-b4c225180d22)

The rest of the tabs are left untouched code-wise.
# Formant Shift:

![image](https://github.com/Mangio621/Mangio-RVC-Fork/assets/87145923/2c41b326-55b5-4c3c-90b1-8cb4eda29c5c)

- ### Click the `Apply` button every time you change the values for inferencing.
- ### As the name implies, you can only use `wav` files so far; it's also very slow, so be patient.
- ### If you added a new `preset.txt` to the `\formantshiftcfg\` folder, click the button with the refresh emoji.
- ### If the preset you selected somehow got edited, pressing the refresh emoji button will update the values by re-reading them from the file.
<div align="center">

<h1>Mangio-RVC-Fork with v2 Support! 💻 </h1>
A fork of an easy-to-use SVC framework based on VITS with top1 retrieval 💯. In addition, this fork provides a CLI interface and gives you more f0 methods to use, as well as a personalized 'hybrid' f0 estimation method using nanmedian. <br><br>
@@ -17,6 +17,12 @@ from multiprocessing import Process
exp_dir = sys.argv[1]
f = open("%s/extract_f0_feature.log" % exp_dir, "a+")

DoFormant = False

with open('formanting.txt', 'r') as fvf:
    content = fvf.readlines()
    Quefrency, Timbre = content[1].split('\n')[0], content[2].split('\n')[0]


def printt(strr):
    print(strr)
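The `formanting.txt` read above follows a three-line layout: an on/off flag (`True`/`False`), then quefrency, then timbre. A minimal sketch of that read as a standalone helper (the function name is an assumption; the fork reads the file inline at module level):

```python
def read_formanting(path):
    """Return (do_formant, quefrency, timbre) from a three-line config file."""
    with open(path, 'r') as fvf:
        content = fvf.readlines()
    do_formant = 'True' in content[0].split('\n')[0]
    quefrency = content[1].split('\n')[0]  # kept as strings, as in the fork
    timbre = content[2].split('\n')[0]
    return do_formant, quefrency, timbre
```

Both the preprocessing and feature-extraction scripts perform this read at import time, so edits to `formanting.txt` take effect on the next run.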
@@ -199,7 +205,7 @@ class FeatureInput(object):
        return f0_median_hybrid

    def compute_f0(self, path, f0_method, crepe_hop_length):
-        x = load_audio(path, self.fs)
+        x = load_audio(path, self.fs, DoFormant, Quefrency, Timbre)
        p_len = x.shape[0] // self.hop
        if f0_method == "pm":
            time_step = 160 / 16000 * 1000
@@ -227,6 +233,14 @@ class FeatureInput(object):
                frame_period=1000 * self.hop / self.fs,
            )
            f0 = pyworld.stonemask(x.astype(np.double), f0, t, self.fs)
        elif f0_method == "rmvpe":
            if hasattr(self, "model_rmvpe") == False:
                from rmvpe import RMVPE

                print("loading rmvpe model")
                self.model_rmvpe = RMVPE(
                    "rmvpe.pt", is_half=False, device="cuda:0"
                )
            f0 = self.model_rmvpe.infer_from_audio(x, thred=0.03)
        elif f0_method == "dio":
            f0, t = pyworld.dio(
                x.astype(np.double),
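The `hasattr` guard in the `rmvpe` branch above is a lazy-initialization pattern: the heavy model is built on the first call and cached on the instance for every call after. A stripped-down sketch of the same idea (the class and its stand-in model are hypothetical, for illustration only):

```python
class FeatureExtractor:
    def __init__(self):
        self.load_count = 0  # counts how many times the heavy model is built

    def _build_model(self):
        self.load_count += 1
        return object()  # stands in for an expensive model load

    def infer(self, x):
        # Build the model only on first use, then reuse the cached instance.
        if not hasattr(self, "model"):
            self.model = self._build_model()
        return x  # a real extractor would run self.model on x here
```

This keeps startup fast when RMVPE isn't selected, at the cost of a one-time delay on the first RMVPE inference.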
3	formanting.txt	(new file)

@@ -0,0 +1,3 @@
False
8.0
1.2
2	formantshiftcfg/f2m.txt	(new file)

@@ -0,0 +1,2 @@
8.0
-1.2
2	formantshiftcfg/m2f.txt	(new file)

@@ -0,0 +1,2 @@
8.0
1.2
2	formantshiftcfg/random.txt	(new file)

@@ -0,0 +1,2 @@
16.0
9.8
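Each `formantshiftcfg` preset is just two lines: quefrency on the first, timbre on the second. A sketch of loading one as floats (the helper name is an assumption, not the fork's code):

```python
def read_formant_preset(path):
    """Return (quefrency, timbre) floats from a two-line preset file."""
    with open(path, "r") as fh:
        lines = fh.read().splitlines()
    return float(lines[0]), float(lines[1])
```

A negative timbre value, as in `f2m.txt`, shifts the formants in the opposite direction from `m2f.txt`.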
6210	infer-web.py

File diff suppressed because it is too large.
48	my_utils.py
@@ -1,8 +1,13 @@
import ffmpeg
import numpy as np
+#import praatio
+#import praatio.praat_scripts
import os
+#from os.path import join
+
+#praatEXE = join('.',os.path.abspath(os.getcwd()) + r"\Praat.exe")

-def load_audio(file, sr):
+def load_audio(file, sr, DoFormant, Quefrency, Timbre):
    try:
        # https://github.com/openai/whisper/blob/main/whisper/audio.py#L26
        # This launches a subprocess to decode audio while down-mixing and resampling as necessary.
@@ -10,11 +15,44 @@ def load_audio(file, sr):
        file = (
            file.strip(" ").strip('"').strip("\n").strip('"').strip(" ")
        )  # keep beginners from pasting paths with stray spaces, quotes, or newlines
-        out, _ = (
-            ffmpeg.input(file, threads=0)
-            .output("-", format="f32le", acodec="pcm_f32le", ac=1, ar=sr)
-            .run(cmd=["ffmpeg", "-nostdin"], capture_stdout=True, capture_stderr=True)
-        )
+        file_formanted = (
+            file.strip(" ").strip('"').strip("\n").strip('"').strip(" ")
+        )
+        with open('formanting.txt', 'r') as fvf:
+            content = fvf.readlines()
+            if 'True' in content[0].split('\n')[0]:
+                #print("true")
+                DoFormant = True
+                Quefrency, Timbre = content[1].split('\n')[0], content[2].split('\n')[0]
+            else:
+                #print("not true")
+                DoFormant = False
+
+        if DoFormant:
+            #os.system(f"stftpitchshift -i {file} -q {Quefrency} -t {Timbre} -o {file_formanted}")
+            #print('stftpitchshift -i "%s" -p 1.0 --rms -w 128 -v 8 -q %s -t %s -o "%s"' % (file, Quefrency, Timbre, file_formanted))
+            print("formanting...")
+
+            os.system('stftpitchshift -i "%s" -q %s -t %s -o "%sFORMANTED"' % (file, Quefrency, Timbre, file_formanted))
+            print("formanted!")
+            #filepraat = (os.path.abspath(os.getcwd()) + '\\' + file).replace('/','\\')
+            #file_formantedpraat = ('"' + os.path.abspath(os.getcwd()) + '/' + 'formanted'.join(file_formanted) + '"').replace('/','\\')
+
+            out, _ = (
+                ffmpeg.input('%sFORMANTED%s' % (file_formanted, '.wav'), threads=0)
+                .output("-", format="f32le", acodec="pcm_f32le", ac=1, ar=sr)
+                .run(cmd=["ffmpeg", "-nostdin"], capture_stdout=True, capture_stderr=True)
+            )
+
+            os.remove('%sFORMANTED%s' % (file_formanted, '.wav'))
+        else:
+            out, _ = (
+                ffmpeg.input(file, threads=0)
+                .output("-", format="f32le", acodec="pcm_f32le", ac=1, ar=sr)
+                .run(cmd=["ffmpeg", "-nostdin"], capture_stdout=True, capture_stderr=True)
+            )
    except Exception as e:
        raise RuntimeError(f"Failed to load audio: {e}")
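The formant shift itself shells out to the `stftpitchshift` CLI, writing a temporary `...FORMANTED.wav` that ffmpeg then decodes before it is deleted. A sketch of just the command construction, mirroring the `os.system()` call above without running it (the helper name is an assumption):

```python
def build_stftpitchshift_cmd(infile, quefrency, timbre):
    """Build the stftpitchshift shell command used for formant shifting."""
    outfile = '%sFORMANTED' % infile  # the CLI appends .wav for wav input
    return 'stftpitchshift -i "%s" -q %s -t %s -o "%s"' % (
        infile, quefrency, timbre, outfile)
```

Separating command construction from execution like this also makes the shell-out path easy to test without `stftpitchshift` installed.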
||||
@@ -45,3 +45,4 @@ httpx==0.23.0
|
||||
#onnxruntime-gpu
|
||||
torchcrepe==0.0.20
|
||||
fastapi==0.88
|
||||
stftpitchshift==1.5.1
|
||||
|
||||
@@ -568,7 +568,28 @@ def train_and_evaluate(
                    ),
                )
            )

            with open("stop.txt", "r+") as tostop:
                content = tostop.read()
                if 'stop' in content:
                    logger.info("Stop Button was pressed. The program is closed.")
                    if hasattr(net_g, "module"):
                        ckpt = net_g.module.state_dict()
                    else:
                        ckpt = net_g.state_dict()
                    logger.info(
                        "saving final ckpt:%s"
                        % (
                            savee(
                                ckpt, hps.sample_rate, hps.if_f0, hps.name, epoch, hps.version, hps
                            )
                        )
                    )
                    tostop.truncate(0)
                    tostop.writelines("not")
                    os._exit(2333333)

    if rank == 0:
        logger.info("====> Epoch: {} {}".format(epoch, epoch_recorder.record()))
    if epoch >= hps.total_epoch and rank == 0:
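The `Stop Training` button works through a sentinel file: each epoch the trainer reads `stop.txt`, and if it contains `stop` it saves a final checkpoint, resets the file to `not`, and exits. A minimal sketch of the same mechanism without the checkpointing (the loop function is hypothetical):

```python
def run_epochs(stop_path, max_epochs):
    """Run up to max_epochs, stopping early if the sentinel file says 'stop'."""
    completed = 0
    for _epoch in range(max_epochs):
        completed += 1  # one epoch of work would happen here
        with open(stop_path, "r+") as tostop:
            if 'stop' in tostop.read():
                tostop.seek(0)
                tostop.truncate(0)
                tostop.write("not")  # reset the flag for the next run
                break
    return completed
```

The reset is important: without rewriting `not`, a stale `stop` flag would kill the very next training run at its first epoch.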
@@ -17,9 +17,16 @@ import multiprocessing
from my_utils import load_audio
import tqdm

DoFormant = False
Quefrency = 0.0
Timbre = 0.0

mutex = multiprocessing.Lock()
f = open("%s/preprocess.log" % exp_dir, "a+")

with open('formanting.txt', 'r') as fvf:
    content = fvf.readlines()
    Quefrency, Timbre = content[1].split('\n')[0], content[2].split('\n')[0]


def println(strr):
    mutex.acquire()
@@ -77,7 +84,7 @@ class PreProcess:

    def pipeline(self, path, idx0):
        try:
-            audio = load_audio(path, self.sr)
+            audio = load_audio(path, self.sr, DoFormant, Quefrency, Timbre)
            # zero phased digital filter cause pre-ringing noise...
            # audio = signal.filtfilt(self.bh, self.ah, audio)
            audio = signal.lfilter(self.bh, self.ah, audio)