# AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
[![arXiv](https://img.shields.io/badge/arXiv-Paper-<COLOR>.svg)](https://arxiv.org/abs/2304.12995)
[![GitHub Stars](https://img.shields.io/github/stars/AIGC-Audio/AudioGPT?style=social)](https://github.com/AIGC-Audio/AudioGPT)
![visitors](https://visitor-badge.glitch.me/badge?page_id=AIGC-Audio.AudioGPT)
[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-blue)](https://huggingface.co/spaces/AIGC-Audio/AudioGPT)

We provide our implementation and pretrained models as open source in this repository.
## Get Started
Please refer to [run.md](run.md).
## Capabilities
The tables below list AudioGPT's current capabilities. More supported models and tasks are coming soon. For prompt examples, refer to [assets](assets/README.md).

Not every model currently has a public repository.
### Speech
| Task | Supported Foundation Models | Status |
|:--------------------------:|:-------------------------------:|:------:|
| Text-to-Speech | [FastSpeech](https://github.com/ming024/FastSpeech2), [SyntaSpeech](https://github.com/yerfor/SyntaSpeech), [VITS](https://github.com/jaywalnut310/vits) | Yes (WIP) |
| Style Transfer | [GenerSpeech](https://github.com/Rongjiehuang/GenerSpeech) | Yes |
| Speech Recognition | [whisper](https://github.com/openai/whisper), [Conformer](https://github.com/sooftware/conformer) | Yes |
| Speech Enhancement | [ConvTasNet]() | Yes (WIP) |
| Speech Separation | [TF-GridNet](https://arxiv.org/pdf/2211.12433.pdf) | Yes (WIP) |
| Speech Translation | [Multi-decoder](https://arxiv.org/pdf/2109.12804.pdf) | WIP |
| Mono-to-Binaural | [NeuralWarp](https://github.com/fdarmon/NeuralWarp) | Yes |
### Sing
| Task | Supported Foundation Models | Status |
|:-------------------------:|:-------------------------------:|:------:|
| Text-to-Sing | [DiffSinger](https://github.com/MoonInTheRiver/DiffSinger), [VISinger](https://github.com/jerryuhoo/VISinger) | Yes (WIP) |
### Audio
| Task | Supported Foundation Models | Status |
|:----------------------:|:---------------------------:|:------:|
| Text-to-Audio | [Make-An-Audio]() | Yes |
| Audio Inpainting | [Make-An-Audio]() | Yes |
| Image-to-Audio | [Make-An-Audio]() | Yes |
| Sound Detection | [Audio-transformer](https://github.com/RetroCirce/HTS-Audio-Transformer) | Yes |
| Target Sound Detection | [TSDNet](https://github.com/gy65896/TSDNet) | Yes |
| Sound Extraction | [LASSNet](https://github.com/liuxubo717/LASS) | Yes |
### Talking Head
| Task | Supported Foundation Models | Status |
|:-------------------------:|:-------------------------------:|:----------:|
| Talking Head Synthesis | [GeneFace](https://github.com/yerfor/GeneFace) | Yes (WIP) |
## Acknowledgement
We thank the following open-source projects:
[ESPNet](https://github.com/espnet/espnet) &#8194;
[NATSpeech](https://github.com/NATSpeech/NATSpeech) &#8194;
[Visual ChatGPT](https://github.com/microsoft/visual-chatgpt) &#8194;
[Hugging Face](https://github.com/huggingface) &#8194;
[LangChain](https://github.com/hwchase17/langchain) &#8194;
[Stable Diffusion](https://github.com/CompVis/stable-diffusion) &#8194;