
AudioGPT

AudioGPT connects ChatGPT with a series of audio foundation models so that users can send and receive speech, singing, and general audio while chatting.

Capabilities

Here we list the current capabilities of AudioGPT. More supported models and tasks are coming soon. For prompt examples, refer to asset.

Task                      Supported Foundation Models   Status
----------Speech---------
Text-to-Speech            FastSpeech, SyntaSpeech       Yes
Style Transfer            GenerSpeech                   WIP
Speech Recognition        Whisper                       Yes
----------Sing---------
Text-to-Sing              DiffSinger                    Yes
----------Audio---------
Text-to-Audio             Make-An-Audio                 Yes
Audio Inpainting          Make-An-Audio                 WIP
Image-to-Audio            Make-An-Audio                 Yes
----------Face---------
Talking Head Synthesis    GeneFace                      WIP
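
The capability list above can also be read as a small lookup structure. The following is a minimal sketch only, not AudioGPT's actual code: the `Capability` class, the `CAPABILITIES` list, and the helpers `supported_tasks` and `models_for` are hypothetical names introduced here for illustration, and the entries simply mirror the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    """One row of the capability table (hypothetical structure)."""
    group: str
    task: str
    models: tuple
    supported: bool  # True for "Yes", False for "WIP"


# Entries copied from the capability table; nothing here is loaded from AudioGPT.
CAPABILITIES = [
    Capability("Speech", "Text-to-Speech", ("FastSpeech", "SyntaSpeech"), True),
    Capability("Speech", "Style Transfer", ("GenerSpeech",), False),
    Capability("Speech", "Speech Recognition", ("Whisper",), True),
    Capability("Sing", "Text-to-Sing", ("DiffSinger",), True),
    Capability("Audio", "Text-to-Audio", ("Make-An-Audio",), True),
    Capability("Audio", "Audio Inpainting", ("Make-An-Audio",), False),
    Capability("Audio", "Image-to-Audio", ("Make-An-Audio",), True),
    Capability("Face", "Talking Head Synthesis", ("GeneFace",), False),
]


def supported_tasks(capabilities=CAPABILITIES):
    """Return the names of tasks whose status is "Yes", in table order."""
    return [c.task for c in capabilities if c.supported]


def models_for(task, capabilities=CAPABILITIES):
    """Return the models listed for a task (case-insensitive match).

    Raises NotImplementedError for WIP tasks and KeyError for unknown ones.
    """
    for c in capabilities:
        if c.task.lower() == task.lower():
            if not c.supported:
                raise NotImplementedError(f"{c.task} is still WIP")
            return c.models
    raise KeyError(task)


if __name__ == "__main__":
    print(supported_tasks())
    print(models_for("Text-to-Speech"))
```

Keeping WIP tasks in the list, rather than omitting them, lets a caller distinguish "not ready yet" from "unknown task".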

Internal Version Updates

3.27 Support Style Transfer/Talking Head Synthesis
3.23 Support Text-to-Sing
3.21 Support Image-to-Audio
3.19 Support Speech Recognition
3.17 Support Text-to-Audio

Todo

  • clean up the text-to-sing/speech code
  • import ESPnet models for speech tasks
  • merge talking head synthesis into main
  • change audio/video log output
  • support Hugging Face Spaces

Acknowledgement

We thank the following open-source projects:

Visual ChatGPT, Hugging Face, LangChain, Stable Diffusion

Description
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head https://huggingface.co/spaces/AIGC-Audio/AudioGPT