mirror of
https://github.com/AIGC-Audio/AudioGPT.git
synced 2025-12-16 11:57:58 +01:00
b7ef7f0cd32b6c193e9f832eafb937e1edd0d08f
AudioGPT
AudioGPT connects ChatGPT with a series of audio foundation models so that users can send and receive speech, singing, audio, and talking-head video while chatting.
Capabilities
The tables below list AudioGPT's current capabilities. Support for more models and tasks is coming soon. For prompt examples, refer to asset.
Speech
| Task | Supported Foundation Models | Status |
|---|---|---|
| Text-to-Speech | FastSpeech, SyntaSpeech, VITS | Yes (WIP) |
| Style Transfer | GenerSpeech | Yes |
| Speech Recognition | Whisper, Conformer | Yes |
| Speech Enhancement | ConvTasNet | Yes (WIP) |
| Speech Separation | TF-GridNet | Yes (WIP) |
| Speech Translation | Multi-decoder | WIP |
| Mono-to-Binaural | NeuralWarp | Yes |
Sing
| Task | Supported Foundation Models | Status |
|---|---|---|
| Text-to-Sing | DiffSinger, VISinger | Yes (WIP) |
Audio
| Task | Supported Foundation Models | Status |
|---|---|---|
| Text-to-Audio | Make-An-Audio | Yes |
| Audio Inpainting | Make-An-Audio | Yes |
| Image-to-Audio | Make-An-Audio | Yes |
| Sound Detection | Audio-transformer | Yes |
| Target Sound Detection | TSDNet | Yes |
| Sound Extraction | LASSNet | Yes |
Talking Head
| Task | Supported Foundation Models | Status |
|---|---|---|
| Talking Head Synthesis | GeneFace | Yes (WIP) |
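
The capability tables above can also be written down as data, for example to check which tasks are marked fully supported and which are still in progress. The following is only a sketch: `CAPABILITIES` and `tasks_with_status` are hypothetical names chosen for illustration and are not part of AudioGPT's code; the entries are transcribed from the tables.

```python
# Illustrative only: the capability tables transcribed as plain data.
# CAPABILITIES and tasks_with_status are hypothetical names, not AudioGPT APIs.
CAPABILITIES = {
    "Speech": {
        "Text-to-Speech": (["FastSpeech", "SyntaSpeech", "VITS"], "Yes (WIP)"),
        "Style Transfer": (["GenerSpeech"], "Yes"),
        "Speech Recognition": (["Whisper", "Conformer"], "Yes"),
        "Speech Enhancement": (["ConvTasNet"], "Yes (WIP)"),
        "Speech Separation": (["TF-GridNet"], "Yes (WIP)"),
        "Speech Translation": (["Multi-decoder"], "WIP"),
        "Mono-to-Binaural": (["NeuralWarp"], "Yes"),
    },
    "Sing": {
        "Text-to-Sing": (["DiffSinger", "VISinger"], "Yes (WIP)"),
    },
    "Audio": {
        "Text-to-Audio": (["Make-An-Audio"], "Yes"),
        "Audio Inpainting": (["Make-An-Audio"], "Yes"),
        "Image-to-Audio": (["Make-An-Audio"], "Yes"),
        "Sound Detection": (["Audio-transformer"], "Yes"),
        "Target Sound Detection": (["TSDNet"], "Yes"),
        "Sound Extraction": (["LASSNet"], "Yes"),
    },
    "Talking Head": {
        "Talking Head Synthesis": (["GeneFace"], "Yes (WIP)"),
    },
}


def tasks_with_status(status):
    """Return the task names whose Status column equals `status` exactly."""
    return [
        task
        for tasks in CAPABILITIES.values()
        for task, (_models, task_status) in tasks.items()
        if task_status == status
    ]


if __name__ == "__main__":
    for status in ("Yes", "Yes (WIP)", "WIP"):
        print(status, "->", ", ".join(tasks_with_status(status)))
```

Keeping the status strings exactly as they appear in the tables means the lookup stays a direct transcription rather than an interpretation of what "Yes (WIP)" implies.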
Todo
- Clean up the text-to-sing/speech code
- Merge talking head synthesis into main
- Change the audio/video log output
- Support Hugging Face Spaces
Acknowledgement
We are grateful to the following open-source projects:
Description
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
https://huggingface.co/spaces/AIGC-Audio/AudioGPT