# AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

[arXiv paper](https://arxiv.org/abs/2304.12995)
[GitHub](https://github.com/AIGC-Audio/AudioGPT)
[Hugging Face Space](https://huggingface.co/spaces/AIGC-Audio/AudioGPT)

We provide our implementation and pretrained models as open source in this repository.
## Get Started
Please refer to [run.md](run.md).
## Capabilities
Here we list the current capabilities of AudioGPT. More supported models and tasks are coming soon. For prompt examples, refer to [assets](assets/README.md).

Currently, not every model has a linked repository.

### Speech
| Task | Supported Foundation Models | Status |
|:--------------------------:|:-------------------------------:|:------:|
| Text-to-Speech | [FastSpeech](https://github.com/ming024/FastSpeech2), [SyntaSpeech](https://github.com/yerfor/SyntaSpeech), [VITS](https://github.com/jaywalnut310/vits) | Yes (WIP) |
| Style Transfer | [GenerSpeech](https://github.com/Rongjiehuang/GenerSpeech) | Yes |
| Speech Recognition | [whisper](https://github.com/openai/whisper), [Conformer](https://github.com/sooftware/conformer) | Yes |
| Speech Enhancement | [ConvTasNet]() | Yes (WIP) |
| Speech Separation | [TF-GridNet](https://arxiv.org/pdf/2211.12433.pdf) | Yes (WIP) |
| Speech Translation | [Multi-decoder](https://arxiv.org/pdf/2109.12804.pdf) | WIP |
| Mono-to-Binaural | [NeuralWarp](https://github.com/fdarmon/NeuralWarp) | Yes |
### Singing
| Task | Supported Foundation Models | Status |
|:-------------------------:|:-------------------------------:|:------:|
| Text-to-Sing | [DiffSinger](https://github.com/MoonInTheRiver/DiffSinger), [VISinger](https://github.com/jerryuhoo/VISinger) | Yes (WIP) |
### Audio
| Task | Supported Foundation Models | Status |
|:----------------------:|:---------------------------:|:------:|
| Text-to-Audio | [Make-An-Audio]() | Yes |
| Audio Inpainting | [Make-An-Audio]() | Yes |
| Image-to-Audio | [Make-An-Audio]() | Yes |
| Sound Detection | [Audio-transformer](https://github.com/RetroCirce/HTS-Audio-Transformer) | Yes |
| Target Sound Detection | [TSDNet](https://github.com/gy65896/TSDNet) | Yes |
| Sound Extraction | [LASSNet](https://github.com/liuxubo717/LASS) | Yes |

### Talking Head
| Task | Supported Foundation Models | Status |
|:-------------------------:|:-------------------------------:|:----------:|
| Talking Head Synthesis | [GeneFace](https://github.com/yerfor/GeneFace) | Yes (WIP) |

## Acknowledgement
We thank the authors of the following projects for making their work open source:
[ESPnet](https://github.com/espnet/espnet);
[NATSpeech](https://github.com/NATSpeech/NATSpeech);
[Visual ChatGPT](https://github.com/microsoft/visual-chatgpt);
[Hugging Face](https://github.com/huggingface);
[LangChain](https://github.com/hwchase17/langchain);
[Stable Diffusion](https://github.com/CompVis/stable-diffusion).