Mirror of https://github.com/AIGC-Audio/AudioGPT.git
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
Capabilities
Here we list the capabilities of AudioGPT at this time. More supported models and tasks are coming soon. For prompt examples, refer to asset.
Speech
| Task | Supported Foundation Models | Status |
|---|---|---|
| Text-to-Speech | FastSpeech, SyntaSpeech, VITS | Yes (WIP) |
| Style Transfer | GenerSpeech | Yes |
| Speech Recognition | whisper, Conformer | Yes |
| Speech Enhancement | ConvTasNet | Yes (WIP) |
| Speech Separation | TF-GridNet | Yes (WIP) |
| Speech Translation | Multi-decoder | WIP |
| Mono-to-Binaural | NeuralWarp | Yes |
Sing
| Task | Supported Foundation Models | Status |
|---|---|---|
| Text-to-Sing | DiffSinger, VISinger | Yes (WIP) |
Audio
| Task | Supported Foundation Models | Status |
|---|---|---|
| Text-to-Audio | Make-An-Audio | Yes |
| Audio Inpainting | Make-An-Audio | Yes |
| Image-to-Audio | Make-An-Audio | Yes |
| Sound Detection | Audio-transformer | Yes |
| Target Sound Detection | TSDNet | Yes |
| Sound Extraction | LASSNet | Yes |
Talking Head
| Task | Supported Foundation Models | Status |
|---|---|---|
| Talking Head Synthesis | GeneFace | Yes (WIP) |
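
The capability tables above can also be read as a small lookup structure, which is handy when checking programmatically whether a task is ready to use. The following is only an illustrative sketch: the `Capability` class, the `CAPABILITIES` list, and the helper functions are hypothetical names that are not part of the AudioGPT codebase, and the rows are simply copied from the tables as listed here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Capability:
    """One row of the capability tables (hypothetical helper, not AudioGPT API)."""
    modality: str
    task: str
    models: Tuple[str, ...]
    status: str


# Rows transcribed from the tables above; status strings are kept verbatim.
CAPABILITIES: List[Capability] = [
    Capability("Speech", "Text-to-Speech", ("FastSpeech", "SyntaSpeech", "VITS"), "Yes (WIP)"),
    Capability("Speech", "Style Transfer", ("GenerSpeech",), "Yes"),
    Capability("Speech", "Speech Recognition", ("whisper", "Conformer"), "Yes"),
    Capability("Speech", "Speech Enhancement", ("ConvTasNet",), "Yes (WIP)"),
    Capability("Speech", "Speech Separation", ("TF-GridNet",), "Yes (WIP)"),
    Capability("Speech", "Speech Translation", ("Multi-decoder",), "WIP"),
    Capability("Speech", "Mono-to-Binaural", ("NeuralWarp",), "Yes"),
    Capability("Sing", "Text-to-Sing", ("DiffSinger", "VISinger"), "Yes (WIP)"),
    Capability("Audio", "Text-to-Audio", ("Make-An-Audio",), "Yes"),
    Capability("Audio", "Audio Inpainting", ("Make-An-Audio",), "Yes"),
    Capability("Audio", "Image-to-Audio", ("Make-An-Audio",), "Yes"),
    Capability("Audio", "Sound Detection", ("Audio-transformer",), "Yes"),
    Capability("Audio", "Target Sound Detection", ("TSDNet",), "Yes"),
    Capability("Audio", "Sound Extraction", ("LASSNet",), "Yes"),
    Capability("Talking Head", "Talking Head Synthesis", ("GeneFace",), "Yes (WIP)"),
]


def tasks_with_status(status: str) -> List[str]:
    """Return task names whose status matches exactly, in table order."""
    return [c.task for c in CAPABILITIES if c.status == status]


def models_for(task: str) -> List[str]:
    """Return the foundation models listed for a task (case-insensitive match)."""
    for c in CAPABILITIES:
        if c.task.lower() == task.lower():
            return list(c.models)
    raise KeyError(task)


if __name__ == "__main__":
    print("Fully supported:", tasks_with_status("Yes"))
    print("Text-to-Sing models:", models_for("Text-to-Sing"))
```

Keeping the status strings verbatim ("Yes", "Yes (WIP)", "WIP") avoids guessing what each label implies beyond what the tables state.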
Acknowledgement
We appreciate the open-source work of the following projects:
ESPNet, NATSpeech, Visual ChatGPT, Hugging Face, LangChain, Stable Diffusion