Mirror of https://github.com/AIGC-Audio/AudioGPT.git
AudioGPT
AudioGPT connects ChatGPT with a series of audio foundation models so that users can send and receive speech, singing voice, audio, and talking-head video during a chat.
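The controller pattern described above (a chat model that decides which audio foundation model should handle a request) can be sketched as a task registry plus a selection step. This is a minimal, hypothetical sketch, not AudioGPT's actual code: `TOOLS`, `choose_task`, and `handle` are illustrative names, the handlers are stubs that only echo their input, and the keyword rule in `choose_task` stands in for the decision that ChatGPT would make.

```python
from typing import Callable, Dict

# Hypothetical handler type: takes a text argument, returns a result string.
Handler = Callable[[str], str]

# Hypothetical registry mapping task names to handlers. A real system would
# wrap actual foundation models here; these stubs only echo the request.
TOOLS: Dict[str, Handler] = {
    "text-to-speech": lambda text: f"tts_output.wav <- {text!r}",
    "speech-recognition": lambda path: f"transcript of {path}",
    "text-to-audio": lambda text: f"audio_output.wav <- {text!r}",
}


def choose_task(user_message: str) -> str:
    """Stand-in for the LLM step that selects a task.

    Assumption: in the real system the chat model makes this decision;
    a trivial keyword rule is used here so the sketch runs offline.
    """
    msg = user_message.lower()
    if "transcribe" in msg:
        return "speech-recognition"
    if "say" in msg or "speak" in msg:
        return "text-to-speech"
    return "text-to-audio"


def handle(user_message: str, argument: str) -> str:
    """Route the request to the selected (stub) model and return its output."""
    task = choose_task(user_message)
    return TOOLS[task](argument)


if __name__ == "__main__":
    print(handle("Please say this out loud", "hello"))
```

Keeping the models behind a name-to-handler registry means adding a new task only requires registering one more entry and teaching the selection step about it.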
Capabilities
Up-to-date link: https://cdb7b543afd1c8e8.gradio.app
Here we list AudioGPT's current capabilities. More supported models and tasks are coming soon. For prompt examples, refer to asset.
Speech
| Task | Supported Foundation Models | Status |
|---|---|---|
| Text-to-Speech | FastSpeech, SyntaSpeech, VITS | Yes (WIP) |
| Style Transfer | GenerSpeech | Yes |
| Speech Recognition | Whisper, Conformer | Yes |
| Speech Enhancement | ConvTasNet | WIP |
| Speech Separation | TF-GridNet | WIP |
| Speech Translation | Multi-decoder | WIP |
| Mono-to-Binaural | NeuralWarp | Yes |
Sing
| Task | Supported Foundation Models | Status |
|---|---|---|
| Text-to-Sing | DiffSinger, VISinger | Yes (WIP) |
Audio
| Task | Supported Foundation Models | Status |
|---|---|---|
| Text-to-Audio | Make-An-Audio | Yes |
| Audio Inpainting | Make-An-Audio | Yes |
| Image-to-Audio | Make-An-Audio | Yes |
| Sound Detection | Audio-transformer | Yes |
| Target Sound Detection | TSDNet | Yes |
| Sound Extraction | LASSNet | Yes |
Talking Head
| Task | Supported Foundation Models | Status |
|---|---|---|
| Talking Head Synthesis | GeneFace | Yes (WIP) |
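The capability tables above can also be encoded as a small data structure, which makes it easy to ask which tasks are usable or which tasks a given model covers. This is an illustrative sketch only: `Capability`, `usable_tasks`, and `tasks_for_model` are hypothetical names, and treating any status that starts with "Yes" (including "Yes (WIP)") as usable is an assumption about how to read the Status column.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Capability:
    domain: str
    task: str
    models: Tuple[str, ...]
    status: str  # "Yes", "Yes (WIP)" or "WIP", copied from the tables


CAPABILITIES: List[Capability] = [
    Capability("Speech", "Text-to-Speech", ("FastSpeech", "SyntaSpeech", "VITS"), "Yes (WIP)"),
    Capability("Speech", "Style Transfer", ("GenerSpeech",), "Yes"),
    Capability("Speech", "Speech Recognition", ("Whisper", "Conformer"), "Yes"),
    Capability("Speech", "Speech Enhancement", ("ConvTasNet",), "WIP"),
    Capability("Speech", "Speech Separation", ("TF-GridNet",), "WIP"),
    Capability("Speech", "Speech Translation", ("Multi-decoder",), "WIP"),
    Capability("Speech", "Mono-to-Binaural", ("NeuralWarp",), "Yes"),
    Capability("Sing", "Text-to-Sing", ("DiffSinger", "VISinger"), "Yes (WIP)"),
    Capability("Audio", "Text-to-Audio", ("Make-An-Audio",), "Yes"),
    Capability("Audio", "Audio Inpainting", ("Make-An-Audio",), "Yes"),
    Capability("Audio", "Image-to-Audio", ("Make-An-Audio",), "Yes"),
    Capability("Audio", "Sound Detection", ("Audio-transformer",), "Yes"),
    Capability("Audio", "Target Sound Detection", ("TSDNet",), "Yes"),
    Capability("Audio", "Sound Extraction", ("LASSNet",), "Yes"),
    Capability("Talking Head", "Talking Head Synthesis", ("GeneFace",), "Yes (WIP)"),
]


def usable_tasks(caps: List[Capability]) -> List[str]:
    """Tasks whose status starts with "Yes" (assumption: "Yes (WIP)" counts as usable)."""
    return [c.task for c in caps if c.status.startswith("Yes")]


def tasks_for_model(caps: List[Capability], model: str) -> List[str]:
    """All tasks served by the named foundation model, in table order."""
    return [c.task for c in caps if model in c.models]


if __name__ == "__main__":
    print(len(usable_tasks(CAPABILITIES)), "usable tasks")
    print(tasks_for_model(CAPABILITIES, "Make-An-Audio"))
```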
Internal Version Updates
- 4.6: Support Sound Extraction/Detection
- 4.3: Support Hugging Face demo space
- 4.1: Support Audio Inpainting; clean up code
- 3.27: Support Style Transfer/Talking Head Synthesis
- 3.23: Support Text-to-Sing
- 3.21: Support Image-to-Audio
- 3.19: Support Speech Recognition
- 3.17: Support Text-to-Audio
Todo
- Clean up the text-to-sing/speech code
- Merge talking head synthesis into main
- Change audio/video log output
- Support Hugging Face space
Acknowledgement
We thank the authors of the following open-source projects:
Description
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
https://huggingface.co/spaces/AIGC-Audio/AudioGPT