# AudioGPT
**AudioGPT** connects ChatGPT with a series of audio foundation models so that users can **send** and **receive** speech, singing, audio, and talking-head video while chatting.
<a href="https://huggingface.co/spaces/AIGC-Audio/AudioGPT">
<img src="https://img.shields.io/badge/%F0%9F%A4%97-Open%20in%20Spaces-blue" alt="Open in Spaces">
</a>
## Capabilities
The tables below list AudioGPT's current capabilities. Support for more models and tasks is coming soon. For prompt examples, refer to [asset](assets/README.md).
### Speech
| Task | Supported Foundation Models | Status |
|:--------------------------:|:-------------------------------:|:------:|
| Text-to-Speech | [FastSpeech](), [SyntaSpeech](), [VITS]() | Yes (WIP) |
| Style Transfer | [GenerSpeech]() | Yes |
| Speech Recognition | [whisper](), [Conformer]() | Yes |
| Speech Enhancement | [ConvTasNet]() | Yes (WIP) |
| Speech Separation | [TF-GridNet]() | Yes (WIP) |
| Speech Translation | [Multi-decoder]() | WIP |
| Mono-to-Binaural | [NeuralWarp]() | Yes |
### Sing
| Task | Supported Foundation Models | Status |
|:-------------------------:|:-------------------------------:|:------:|
| Text-to-Sing | [DiffSinger](), [VISinger]() | Yes (WIP) |
### Audio
| Task | Supported Foundation Models | Status |
|:----------------------:|:---------------------------:|:------:|
| Text-to-Audio | [Make-An-Audio]() | Yes |
| Audio Inpainting | [Make-An-Audio]() | Yes |
| Image-to-Audio | [Make-An-Audio]() | Yes |
| Sound Detection | [Audio-transformer]() | Yes |
| Target Sound Detection | [TSDNet]() | Yes |
| Sound Extraction | [LASSNet]() | Yes |
### Talking Head
| Task | Supported Foundation Models | Status |
|:-------------------------:|:-------------------------------:|:----------:|
| Talking Head Synthesis | [GeneFace]() | Yes (WIP) |
## Todo
- [x] clean up text-to-sing/speech code
- [ ] merge talking head synthesis into main
- [x] change audio/video log output
- [x] support Hugging Face Spaces
## Acknowledgement
We appreciate the following open-source projects:
[Visual ChatGPT](https://github.com/microsoft/visual-chatgpt),
[Hugging Face](https://github.com/huggingface),
[LangChain](https://github.com/hwchase17/langchain),
[Stable Diffusion](https://github.com/CompVis/stable-diffusion)