a little massage

2026-04-03 09:46:45 +02:00 · 2024-04-11 07:17:28 -07:00
parent b818145ad9
commit ad6c2cd836
4 changed files with 26 additions and 22 deletions
--- a/README.md
+++ b/README.md
@@ -1,6 +1,5 @@
 # VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
-[Demo](https://jasonppy.github.io/VoiceCraft_web) [Paper](https://jasonppy.github.io/assets/pdfs/VoiceCraft.pdf)
-
+[![Paper](https://img.shields.io/badge/arXiv-2301.12503-brightgreen.svg?style=flat-square)](https://jasonppy.github.io/assets/pdfs/VoiceCraft.pdf)  [![githubio](https://img.shields.io/badge/GitHub.io-Audio_Samples-blue?logo=Github&style=flat-square)](https://jasonppy.github.io/VoiceCraft_web/)  [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/pyp1/VoiceCraft_gradio)  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1IOjpglQyMTO2C3Y94LD9FY0Ocn-RJRg6?usp=sharing)

 ### TL;DR
 VoiceCraft is a token infilling neural codec language model, that achieves state-of-the-art performance on both **speech editing** and **zero-shot text-to-speech (TTS)** on in-the-wild data including audiobooks, internet videos, and podcasts.
@@ -8,20 +7,22 @@ VoiceCraft is a token infilling neural codec language model, that achieves state
 To clone or edit an unseen voice, VoiceCraft needs only a few seconds of reference.

 ## How to run inference
-There are three ways:
+There are three ways (besides running Gradio in Colab):

-1. with Google Colab. see [quickstart colab](#quickstart-colab)
+1. More flexible inference beyond Gradio UI in Google Colab. see [quickstart colab](#quickstart-colab)
 2. with docker. see [quickstart docker](#quickstart-docker)
-3. without docker. see [environment setup](#environment-setup)
+3. without docker. see [environment setup](#environment-setup). You can also run gradio locally if you choose this option

 When you are inside the docker image or you have installed all dependencies, check out [`inference_tts.ipynb`](./inference_tts.ipynb).

 If you want to do model development such as training/finetuning, I recommend following [environment setup](#environment-setup) and [training](#training).

 ## News
-:star: 03/28/2024: Model weights for giga330M and giga830M are up on HuggingFace🤗 [here](https://huggingface.co/pyp1/VoiceCraft/tree/main)!
+:star: 04/11/2024: VoiceCraft Gradio is now available on HuggingFace Spaces [here](https://huggingface.co/spaces/pyp1/VoiceCraft_gradio)! Major thanks to [@zuev-stepan](https://github.com/zuev-stepan), [@Sewlell](https://github.com/Sewlell), [@pgosar](https://github.com/pgosar), and [@Ph0rk0z](https://github.com/Ph0rk0z).

-:star: 04/05/2024: I finetuned giga330M with the TTS objective on gigaspeech and 1/5 of librilight, the model outperforms giga830M on TTS. Weights are [here](https://huggingface.co/pyp1/VoiceCraft/tree/main). Make sure maximal prompt + generation length <= 16 seconds (due to our limited compute, we had to drop utterances longer than 16s in training data)
+:star: 04/05/2024: I finetuned giga330M with the TTS objective on gigaspeech and 1/5 of librilight. Weights are [here](https://huggingface.co/pyp1/VoiceCraft/tree/main). Make sure maximal prompt + generation length <= 16 seconds (due to our limited compute, we had to drop utterances longer than 16s in training data). Even stronger models forthcoming, stay tuned!
+
+:star: 03/28/2024: Model weights for giga330M and giga830M are up on HuggingFace🤗 [here](https://huggingface.co/pyp1/VoiceCraft/tree/main)!

 ## TODO
 - [x] Codebase upload
@@ -30,9 +31,12 @@ If you want to do model development such as training/finetuning, I recommend fol
 - [x] Training guidance
 - [x] RealEdit dataset and training manifest
 - [x] Model weights (giga330M.pth, giga830M.pth, and gigaHalfLibri330M_TTSEnhanced_max16s.pth)
-- [x] Write colab notebooks for better hands-on experience
-- [ ] HuggingFace Spaces demo
-- [ ] Better guidance on training/finetuning
+- [x] Better guidance on training/finetuning
+- [x] Colab notebooks
+- [x] HuggingFace Spaces demo
+- [ ] Command line
+- [ ] Improve efficiency
+


 ## QuickStart Colab
@@ -109,7 +113,7 @@ Checkout [`inference_speech_editing.ipynb`](./inference_speech_editing.ipynb) an
 ## Gradio
 ### Run in colab

-[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zuev-stepan/VoiceCraft-gradio/blob/feature/colab-notebook/voicecraft-gradio-colab.ipynb)
+[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1IOjpglQyMTO2C3Y94LD9FY0Ocn-RJRg6?usp=sharing)

 ### Run locally
 After environment setup install additional dependencies: