RMVPE training & GUI QOL changes

RMVPE training, One Click Training removal, formant shifting, and a Stop Training button have all been added.
kalomaze
2023-07-22 18:15:38 -05:00
committed by GitHub
13 changed files with 3932 additions and 2480 deletions


@@ -1,3 +1,99 @@
# Features:
- Experimental formant shifting using StftPitchShift (tried using Praat with praatio, but to no avail)
- Added `Stop Training` button when training, no need to restart RVC every time you want to stop the training of a model!
- Auto-detect the index path for the selected model, plus general path auto-detection: no more hardcoded defaults like `E:\codes\py39\vits_vc_gpu_train\logs\mi-test-1key\total_fea.npy`. The root dir and its subfolders are resolved with
```python
os.path.abspath(os.getcwd())
```
- Audio file dropdown populated by auto-detecting files in the `/audios/` folder
- More stable Gradio version (3.34.0) with theme support
- Removed the `One Click Training` button from the `Training` tab, since it was glitchy and confused a lot of users.
- Changed default training settings to be more optimal for newer users.
- Auto-open TensorBoard localhost URL when `tensor-launch.py` is executed
- RMVPE implemented in both inferencing and training (the one in the `Training` tab doesn't work properly yet and needs some additional work)
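The path auto-detection bullets above boil down to resolving the root dir at runtime and scanning subfolders. A rough sketch (the `logs/<model>` layout and the helper name are assumptions for illustration, not the exact code):

```python
import glob
import os

def autodetect_paths(model_name):
    # Root dir is resolved at runtime instead of a hardcoded default path.
    root = os.path.abspath(os.getcwd())
    # Assumed layout: trained index files live under logs/<model_name>/.
    index_candidates = glob.glob(os.path.join(root, "logs", model_name, "*.index"))
    # The audio dropdown is filled from whatever sits in audios/.
    audio_dir = os.path.join(root, "audios")
    audios = sorted(os.listdir(audio_dir)) if os.path.isdir(audio_dir) else []
    return index_candidates, audios
```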
## Installation:
1. Either extract the archive directly or use `git clone`
2. Run `installstft.bat`. It'll automatically:
- Upgrade/Downgrade Gradio if its version isn't 3.34.0;
- Download `rmvpe.pt` if it hasn't already been installed;
- Install `StftPitchShift` if it hasn't been already installed;
3. Done! You're good to go with this tweaked RVC-WebUI :)
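The version checks in step 2 follow a common pin-check pattern; here is a Python sketch of the idea (the helper name is made up, and this is not the actual batch script):

```python
from importlib.metadata import PackageNotFoundError, version

def pin_needed(pkg, pinned):
    """True when pkg is missing or not at the pinned version."""
    try:
        return version(pkg) != pinned
    except PackageNotFoundError:
        return True

# installstft.bat would then run pip for whichever package needs it, e.g.:
#   pip install gradio==3.34.0         (if pin_needed("gradio", "3.34.0"))
#   pip install stftpitchshift==1.5.1  (if pin_needed("stftpitchshift", "1.5.1"))
# rmvpe.pt is a model checkpoint, not a package, so that step is just a
# file-existence check followed by a download when absent.
```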
## Change Gradio Theme:
- [OPTIONAL] Change Gradio's theme:
1. Open `infer-web.py` in any code/text editing software (e.g. `notepad++`, `notepad`, `vscode`, etc)
2a. Press Ctrl+F and search for `with gr.Blocks(`, then select the occurrence that isn't commented out
2b. Alternatively, go to line `1842`, where you'll see `with gr.Blocks(theme='HaleyCH/HaleyCH_Theme') as app:`
3. Go to [Gradio Theme Gallery](https://huggingface.co/spaces/gradio/theme-gallery):
3.1 Select any theme you like (e.g. [this one](https://huggingface.co/spaces/freddyaboulton/dracula_revamped))
3.2 Look at the top of the page
![image](https://github.com/alexlnkp/Mangio-RVC-Tweaks/assets/79400603/59e3e6a9-bdda-4ede-8161-00ee957c1715)
3.3 Copy the theme variable (in this case, it's `theme='freddyaboulton/dracula_revamped'`)
4. Replace `theme='HaleyCH/HaleyCH_Theme'` in `infer-web.py` with the theme value you copied from the [Gradio Theme Gallery](https://huggingface.co/spaces/gradio/theme-gallery)
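Step 4 is a plain string swap inside `infer-web.py`. As a hypothetical one-off patcher (the function name is made up, the theme is the gallery example from above):

```python
def set_theme(source, new_theme):
    # The stock line in infer-web.py reads:
    #   with gr.Blocks(theme='HaleyCH/HaleyCH_Theme') as app:
    # so swapping the theme is a literal string replacement.
    return source.replace("theme='HaleyCH/HaleyCH_Theme'",
                          "theme='%s'" % new_theme)
```

You would read `infer-web.py`, run it through `set_theme`, and write the result back.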
### Current Todo-list:
- [x] Fix the `Unload voice to save GPU memory` button traceback
- [ ] Add accordions so Firefox users get a much more compact GUI instead of [this](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/assets/79400603/67e0cc08-82a2-4dc3-86cf-e23d1dcad9f8).
- [ ] Fix the odd way the Median Filtering slider value is used
- [ ] Replace the regular refresh buttons with the tiny ones from [AUTOMATIC1111's Stable Diffusion](https://github.com/AUTOMATIC1111/stable-diffusion-webui)
![image](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/assets/79400603/fe37051e-4c95-4d30-9254-87d44436bb9e)
- [ ] Add a way to change Gradio's theme from the WebUI itself, like in [AUTOMATIC1111's Stable Diffusion](https://github.com/AUTOMATIC1111/stable-diffusion-webui)
![image](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/assets/79400603/7b93e167-564a-41d3-9a76-ded20063cdc3)
- [ ] Integrate Praat into the GUI for easier f0 curve file manipulation
# Screenshots:
- ## Inference Tab:
![image](https://github.com/Mangio621/Mangio-RVC-Fork/assets/79400603/107aa15a-4e8d-4f77-a327-45f35a235fcf)
- ## UVR Tab:
![image](https://github.com/Mangio621/Mangio-RVC-Fork/assets/79400603/7e57242a-4950-40c8-bf2a-8f77e992af26)
- ## Training Tab:
![image](https://github.com/Mangio621/Mangio-RVC-Fork/assets/79400603/a19ce156-5532-4761-aa06-8a537f80c368)
- ## Ckpt-Processing Tab:
![image](https://github.com/Mangio621/Mangio-RVC-Fork/assets/79400603/0cdc285e-a184-48f3-92a7-65f6120caf2f)
The rest of the tabs are left untouched code-wise.
# Formant Shift:
![image](https://github.com/Mangio621/Mangio-RVC-Fork/assets/79400603/300ebce2-36c7-4761-b1dd-b31403ad2cd1)
- ### Click the `Apply` button every time you change the values before inferencing.
- ### Only `wav` files are supported so far; it is also very slow, so be patient.
- ### If you added a new `preset.txt` to the `\formantshiftcfg\` folder, click the button with the refresh emoji
- ### If the selected preset file got edited, pressing the refresh button reloads its values from the file
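A `\formantshiftcfg\` preset is just two lines, quefrency then timbre (the `f2m.txt` and `m2f.txt` files further down hold exactly that). Reading one can be sketched as (helper name made up):

```python
def load_formant_preset(path):
    # Line 1 = quefrency, line 2 = timbre (e.g. f2m.txt holds 8.0 and -1.2).
    with open(path) as f:
        lines = f.read().splitlines()
    return float(lines[0]), float(lines[1])
```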
<div align="center">
<h1>Mangio-RVC-Fork with v2 Support! 💻 </h1>
A fork of an easy-to-use SVC framework based on VITS with top1 retrieval 💯. This fork additionally provides a CLI interface, more f0 methods to use, and a personalized 'hybrid' f0 estimation method using nanmedian. <br><br>


@@ -17,6 +17,12 @@ from multiprocessing import Process
exp_dir = sys.argv[1]
f = open("%s/extract_f0_feature.log" % exp_dir, "a+")
DoFormant = False
with open('formanting.txt', 'r') as fvf:
content = fvf.readlines()
Quefrency, Timbre = content[1].split('\n')[0], content[2].split('\n')[0]
def printt(strr):
print(strr)
@@ -199,7 +205,7 @@ class FeatureInput(object):
return f0_median_hybrid
def compute_f0(self, path, f0_method, crepe_hop_length):
x = load_audio(path, self.fs)
x = load_audio(path, self.fs, DoFormant, Quefrency, Timbre)
p_len = x.shape[0] // self.hop
if f0_method == "pm":
time_step = 160 / 16000 * 1000
@@ -227,6 +233,14 @@ class FeatureInput(object):
frame_period=1000 * self.hop / self.fs,
)
f0 = pyworld.stonemask(x.astype(np.double), f0, t, self.fs)
elif f0_method == "rmvpe":
if hasattr(self, "model_rmvpe") == False:
from rmvpe import RMVPE
print("loading rmvpe model")
self.model_rmvpe = RMVPE(
"rmvpe.pt", is_half=False, device="cuda:0"
)
f0 = self.model_rmvpe.infer_from_audio(x, thred=0.03)
elif f0_method == "dio":
f0, t = pyworld.dio(
x.astype(np.double),

formanting.txt Normal file

@@ -0,0 +1,3 @@
False
8.0
1.2

formantshiftcfg/f2m.txt Normal file

@@ -0,0 +1,2 @@
8.0
-1.2

formantshiftcfg/m2f.txt Normal file

@@ -0,0 +1,2 @@
8.0
1.2


@@ -0,0 +1,2 @@
16.0
9.8

File diff suppressed because it is too large


@@ -1,8 +1,13 @@
import ffmpeg
import numpy as np
#import praatio
#import praatio.praat_scripts
import os
#from os.path import join
#praatEXE = join('.',os.path.abspath(os.getcwd()) + r"\Praat.exe")
def load_audio(file, sr):
def load_audio(file, sr, DoFormant, Quefrency, Timbre):
try:
# https://github.com/openai/whisper/blob/main/whisper/audio.py#L26
# This launches a subprocess to decode audio while down-mixing and resampling as necessary.
@@ -10,11 +15,44 @@ def load_audio(file, sr):
file = (
file.strip(" ").strip('"').strip("\n").strip('"').strip(" ")
) # 防止小白拷路径头尾带了空格和"和回车
out, _ = (
ffmpeg.input(file, threads=0)
.output("-", format="f32le", acodec="pcm_f32le", ac=1, ar=sr)
.run(cmd=["ffmpeg", "-nostdin"], capture_stdout=True, capture_stderr=True)
file_formanted = (
file.strip(" ").strip('"').strip("\n").strip('"').strip(" ")
)
with open('formanting.txt', 'r') as fvf:
content = fvf.readlines()
if 'True' in content[0].split('\n')[0]:
#print("true")
DoFormant = True
Quefrency, Timbre = content[1].split('\n')[0], content[2].split('\n')[0]
else:
#print("not true")
DoFormant = False
if DoFormant:
#os.system(f"stftpitchshift -i {file} -q {Quefrency} -t {Timbre} -o {file_formanted}")
#print('stftpitchshift -i "%s" -p 1.0 --rms -w 128 -v 8 -q %s -t %s -o "%s"' % (file, Quefrency, Timbre, file_formanted))
print("formanting...")
os.system('stftpitchshift -i "%s" -q %s -t %s -o "%sFORMANTED"' % (file, Quefrency, Timbre, file_formanted))
print("formanted!")
#filepraat = (os.path.abspath(os.getcwd()) + '\\' + file).replace('/','\\')
#file_formantedpraat = ('"' + os.path.abspath(os.getcwd()) + '/' + 'formanted'.join(file_formanted) + '"').replace('/','\\')
out, _ = (
ffmpeg.input('%sFORMANTED%s' % (file_formanted, '.wav'), threads=0)
.output("-", format="f32le", acodec="pcm_f32le", ac=1, ar=sr)
.run(cmd=["ffmpeg", "-nostdin"], capture_stdout=True, capture_stderr=True)
)
os.remove('%sFORMANTED%s' % (file_formanted, '.wav'))
else:
out, _ = (
ffmpeg.input(file, threads=0)
.output("-", format="f32le", acodec="pcm_f32le", ac=1, ar=sr)
.run(cmd=["ffmpeg", "-nostdin"], capture_stdout=True, capture_stderr=True)
)
except Exception as e:
raise RuntimeError(f"Failed to load audio: {e}")
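The diff above keys everything off a three-line `formanting.txt` flag file (enable flag, quefrency, timbre — see its contents earlier in this page). Its parsing logic, pulled out as a standalone helper for clarity (the function name is made up):

```python
def read_formanting(path="formanting.txt"):
    # Line 1: "True"/"False" toggle, line 2: quefrency, line 3: timbre.
    with open(path) as f:
        lines = f.read().splitlines()
    do_formant = lines[0].strip() == "True"
    return do_formant, float(lines[1]), float(lines[2])
```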


@@ -45,3 +45,4 @@ httpx==0.23.0
#onnxruntime-gpu
torchcrepe==0.0.20
fastapi==0.88
stftpitchshift==1.5.1

stop.txt Normal file


@@ -568,7 +568,28 @@ def train_and_evaluate(
),
)
)
with open("stop.txt", "r+") as tostop:
content = tostop.read()
if 'stop' in content:
logger.info("Stop Button was pressed. The program is closed.")
if hasattr(net_g, "module"):
ckpt = net_g.module.state_dict()
else:
ckpt = net_g.state_dict()
logger.info(
"saving final ckpt:%s"
% (
savee(
ckpt, hps.sample_rate, hps.if_f0, hps.name, epoch, hps.version, hps
)
)
)
tostop.truncate(0)
tostop.writelines("not")
os._exit(2333333)
if rank == 0:
logger.info("====> Epoch: {} {}".format(epoch, epoch_recorder.record()))
if epoch >= hps.total_epoch and rank == 0:
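The Stop Training check in the diff above follows a simple flag-file polling pattern: the button writes `stop` into `stop.txt`, and the training loop reads it each pass and resets it once seen. A self-contained sketch of that pattern (the helper name is made up):

```python
def stop_requested(path="stop.txt"):
    # The Stop Training button writes "stop" into the file; the training
    # loop polls it and resets the flag so the next run starts normally.
    try:
        with open(path, "r+") as f:
            if "stop" in f.read():
                f.seek(0)
                f.truncate()
                f.write("not")
                return True
    except FileNotFoundError:
        pass
    return False
```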


@@ -17,9 +17,16 @@ import multiprocessing
from my_utils import load_audio
import tqdm
DoFormant = False
Quefrency = 0.0
Timbre = 0.0
mutex = multiprocessing.Lock()
f = open("%s/preprocess.log" % exp_dir, "a+")
with open('formanting.txt', 'r') as fvf:
content = fvf.readlines()
Quefrency, Timbre = content[1].split('\n')[0], content[2].split('\n')[0]
def println(strr):
mutex.acquire()
@@ -77,7 +84,7 @@ class PreProcess:
def pipeline(self, path, idx0):
try:
audio = load_audio(path, self.sr)
audio = load_audio(path, self.sr, DoFormant, Quefrency, Timbre)
# zero phased digital filter cause pre-ringing noise...
# audio = signal.filtfilt(self.bh, self.ah, audio)
audio = signal.lfilter(self.bh, self.ah, audio)