Mirror of https://github.com/AIGC-Audio/AudioGPT.git
synced 2025-12-16 03:47:55 +01:00
9e2a24b1174c1dd100bb3c7bd2b17942a6a68dd3
AudioGPT
AudioGPT connects ChatGPT with a series of audio foundation models, enabling users to send and receive speech, singing voice, audio, and talking-head video while chatting.
Capabilities
Up-to-date link: https://93868c7fa583f4b5.gradio.app
Here we list the current capabilities of AudioGPT. More supported models and tasks are coming soon. For prompt examples, refer to asset.
Speech
| Task | Supported Foundation Models | Status |
|---|---|---|
| Text-to-Speech | FastSpeech, SyntaSpeech, VITS | Yes (WIP) |
| Style Transfer | GenerSpeech | Yes |
| Speech Recognition | Whisper, Conformer | Yes |
| Speech Enhancement | ConvTasNet | WIP |
| Speech Separation | TF-GridNet | WIP |
| Speech Translation | Multi-decoder | WIP |
| Mono-to-Binaural Speech | NeuralWarp | Yes |
Sing
| Task | Supported Foundation Models | Status |
|---|---|---|
| Text-to-Sing | DiffSinger, VISinger | Yes (WIP) |
Audio
| Task | Supported Foundation Models | Status |
|---|---|---|
| Text-to-Audio | Make-An-Audio | Yes |
| Audio Inpainting | Make-An-Audio | WIP |
| Image-to-Audio | Make-An-Audio | Yes |
| Sound Detection | Audio-transformer | Yes |
Talking Head
| Task | Supported Foundation Models | Status |
|---|---|---|
| Talking Head Synthesis | GeneFace | WIP |
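The capability tables above can be read as a routing table: given a task the chat front end has identified, pick the foundation models that handle it. The sketch below is a minimal, hypothetical illustration of that idea, not AudioGPT's actual implementation. The names `Capability`, `CAPABILITIES`, `route`, and `available_tasks` are invented for this example, and treating "Yes (WIP)" as usable while plain "WIP" is unavailable is an assumption.

```python
from dataclasses import dataclass


# Hypothetical registry mirroring the capability tables; not AudioGPT's code.
@dataclass(frozen=True)
class Capability:
    category: str
    task: str
    models: tuple[str, ...]
    status: str

    @property
    def available(self) -> bool:
        # Assumption: "Yes" and "Yes (WIP)" are usable; plain "WIP" is not.
        return self.status.startswith("Yes")


CAPABILITIES = [
    Capability("Speech", "Text-to-Speech", ("FastSpeech", "SyntaSpeech", "VITS"), "Yes (WIP)"),
    Capability("Speech", "Style Transfer", ("GenerSpeech",), "Yes"),
    Capability("Speech", "Speech Recognition", ("Whisper", "Conformer"), "Yes"),
    Capability("Speech", "Speech Enhancement", ("ConvTasNet",), "WIP"),
    Capability("Speech", "Speech Separation", ("TF-GridNet",), "WIP"),
    Capability("Speech", "Speech Translation", ("Multi-decoder",), "WIP"),
    Capability("Speech", "Mono-to-Binaural Speech", ("NeuralWarp",), "Yes"),
    Capability("Sing", "Text-to-Sing", ("DiffSinger", "VISinger"), "Yes (WIP)"),
    Capability("Audio", "Text-to-Audio", ("Make-An-Audio",), "Yes"),
    Capability("Audio", "Audio Inpainting", ("Make-An-Audio",), "WIP"),
    Capability("Audio", "Image-to-Audio", ("Make-An-Audio",), "Yes"),
    Capability("Audio", "Sound Detection", ("Audio-transformer",), "Yes"),
    Capability("Talking Head", "Talking Head Synthesis", ("GeneFace",), "WIP"),
]


def route(task: str) -> tuple[str, ...]:
    """Return candidate models for a task (case-insensitive match)."""
    for cap in CAPABILITIES:
        if cap.task.lower() == task.lower():
            if not cap.available:
                raise NotImplementedError(f"{cap.task} is still WIP")
            return cap.models
    raise KeyError(f"unknown task: {task}")


def available_tasks(category: str) -> list[str]:
    """List the usable tasks in one category, in table order."""
    return [c.task for c in CAPABILITIES if c.category == category and c.available]


if __name__ == "__main__":
    print(route("speech recognition"))  # ('Whisper', 'Conformer')
    print(available_tasks("Audio"))     # ['Text-to-Audio', 'Image-to-Audio', 'Sound Detection']
```

Keeping the status next to each task means a WIP entry can be listed without being dispatched; in this sketch `route("Speech Enhancement")` raises `NotImplementedError` rather than silently falling back to another model.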
Internal Version Updates
- 3.27 Support Style Transfer / Talking Head Synthesis
- 3.23 Support Text-to-Sing
- 3.21 Support Image-to-Sing
- 3.19 Support Speech Recognition
- 3.17 Support Text-to-Audio
Todo
- Clean up the text-to-sing/speech code
- Import ESPnet models for speech tasks
- Merge talking head synthesis into main
- Change the audio/video log output
- Support Hugging Face Spaces
Acknowledgement
We thank the authors of the following open-source projects:
Description
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
https://huggingface.co/spaces/AIGC-Audio/AudioGPT