upload TTS finetuned 330M model

2026-04-03 09:46:45 +02:00 · 2024-04-05 16:42:59 -07:00
parent bae982eb15
commit 142772c3df
2 changed files with 21 additions and 18 deletions
--- a/README.md
+++ b/README.md
@@ -7,21 +7,7 @@ VoiceCraft is a token infilling neural codec language model, that achieves state

 To clone or edit an unseen voice, VoiceCraft needs only a few seconds of reference.

-## News
-:star: 03/28/2024: Model weights are up on HuggingFace🤗 [here](https://huggingface.co/pyp1/VoiceCraft/tree/main)!
-
-## TODO
- [x] Codebase upload
- [x] Environment setup
- [x] Inference demo for speech editing and TTS
- [x] Training guidance
- [x] RealEdit dataset and training manifest
- [x] Model weights (both 330M and 830M, the former seems to be just as good)
- [x] Write colab notebooks for better hands-on experience
- [ ] HuggingFace Spaces demo
- [ ] Better guidance on training/finetuning
-
-## How to run TTS inference
+## How to run inference
 There are three ways:

 1. with Google Colab. See [quickstart colab](#quickstart-colab)
@@ -32,6 +18,23 @@ When you are inside the docker image or you have installed all dependencies, Che

 If you want to do model development such as training/finetuning, I recommend following [environment setup](#environment-setup) and [training](#training).

+## News
+:star: 03/28/2024: Model weights for giga330M and giga830M are up on HuggingFace🤗 [here](https://huggingface.co/pyp1/VoiceCraft/tree/main)!
+
+:star: 04/05/2024: I finetuned giga330M with the TTS objective on gigaspeech and 1/5 of librilight; the resulting model outperforms giga830M on TTS. Weights are [here](https://huggingface.co/pyp1/VoiceCraft/tree/main). Make sure the combined prompt + generation length is <= 16 seconds (due to our limited compute, we had to drop utterances longer than 16s from the training data).
+
+## TODO
+- [x] Codebase upload
+- [x] Environment setup
+- [x] Inference demo for speech editing and TTS
+- [x] Training guidance
+- [x] RealEdit dataset and training manifest
+- [x] Model weights (giga330M.pth, giga830M.pth, and gigaHalfLibri330M_TTSEnhanced_max16s.pth)
+- [x] Write colab notebooks for better hands-on experience
+- [ ] HuggingFace Spaces demo
+- [ ] Better guidance on training/finetuning
+
+
 ## QuickStart Colab

 :star: To try out speech editing or TTS Inference with VoiceCraft, the simplest way is using Google Colab.