AudioGPT connects ChatGPT with a series of audio foundation models so that speech, singing, and other audio can be sent and received during a chat.
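
The sentence above describes a controller pattern. The chat model decides which audio model to call, a dispatcher runs it, and the result is returned to the conversation. The following is a minimal sketch of that dispatch step only. It assumes each model can be wrapped as a function that takes a string and returns a string. The tool names (`speech_recognition`, `text_to_audio`), the stub functions, and `dispatch` are hypothetical placeholders, not AudioGPT's actual code. A real tool would load and run one of the foundation models listed below.

```python
from typing import Callable, Dict


# Hypothetical stand-ins for audio foundation models. A real system would
# run a model such as Whisper or Make-An-Audio inside these functions.
def speech_recognition(audio_path: str) -> str:
    return f"<transcript of {audio_path}>"


def text_to_audio(prompt: str) -> str:
    return f"<audio generated for '{prompt}'>"


# Tools the chat model is allowed to call, keyed by tool name (hypothetical).
TOOLS: Dict[str, Callable[[str], str]] = {
    "speech_recognition": speech_recognition,
    "text_to_audio": text_to_audio,
}


def dispatch(tool_name: str, argument: str) -> str:
    """Run the tool the chat model selected and return its output,
    so the output can be fed back into the conversation."""
    tool = TOOLS.get(tool_name)
    if tool is None:
        return f"Unknown tool: {tool_name}"
    return tool(argument)


# In a full pipeline the chat model's reply would choose the tool;
# here the choice is hard-coded for illustration.
print(dispatch("text_to_audio", "a dog barking in the rain"))
```

Keeping tools in a plain dictionary means adding a new task is a single entry, which fits a project whose supported models are still growing.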

Capabilities

The table below lists the tasks AudioGPT currently supports. More models and tasks are coming soon.

Task	Foundation Model	Status
---------- Speech ----------	/	/
Text-to-Speech	FastSpeech, SyntaSpeech	WIP
Neural Vocoding	BigVGAN, FastDiff	WIP
Style Transfer	GenerSpeech	WIP
Speech Recognition	Whisper	Yes
---------- Singing ----------	/	/
Text-to-Sing	DiffSinger	Yes
---------- Audio ----------	/	/
Text-to-Audio	Make-An-Audio	Yes
Audio Inpainting	Make-An-Audio	WIP
Image-to-Audio	Make-An-Audio	Yes

Status: Yes = supported, WIP = work in progress.
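
A controller that routes requests to these models needs to know which tasks are ready and which are still in progress. The sketch below encodes the table as data so that such checks are simple lookups. The `Capability` class and the helpers `ready_tasks` and `models_for` are hypothetical illustrations, not part of AudioGPT; the entries themselves are copied from the table above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Capability:
    domain: str                # "Speech", "Singing", or "Audio"
    task: str                  # task name as listed in the table
    models: Tuple[str, ...]    # foundation models behind the task
    ready: bool                # True for "Yes", False for "WIP"


# One entry per row of the capability table (hypothetical structure).
CAPABILITIES: List[Capability] = [
    Capability("Speech", "Text-to-Speech", ("FastSpeech", "SyntaSpeech"), False),
    Capability("Speech", "Neural Vocoding", ("BigVGAN", "FastDiff"), False),
    Capability("Speech", "Style Transfer", ("GenerSpeech",), False),
    Capability("Speech", "Speech Recognition", ("Whisper",), True),
    Capability("Singing", "Text-to-Sing", ("DiffSinger",), True),
    Capability("Audio", "Text-to-Audio", ("Make-An-Audio",), True),
    Capability("Audio", "Audio Inpainting", ("Make-An-Audio",), False),
    Capability("Audio", "Image-to-Audio", ("Make-An-Audio",), True),
]


def ready_tasks() -> List[str]:
    """Return the names of tasks whose status is 'Yes'."""
    return [c.task for c in CAPABILITIES if c.ready]


def models_for(task: str) -> Tuple[str, ...]:
    """Return the foundation models listed for a task; raise KeyError if unknown."""
    for c in CAPABILITIES:
        if c.task == task:
            return c.models
    raise KeyError(task)


print(ready_tasks())
print(models_for("Text-to-Speech"))
```

A controller could consult `ready_tasks()` before dispatching and reply that a WIP task is not yet available instead of calling an unfinished model.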

Internal Version Updates

3.23: Added support for Text-to-Sing
3.21: Added support for Image-to-Audio
3.19: Added support for Speech Recognition
3.17: Added support for Text-to-Audio

Acknowledgement

We thank the authors of the following open-source projects:

Visual ChatGPT, Hugging Face, LangChain, Stable Diffusion
