AudioGPT

AudioGPT connects ChatGPT with a series of audio foundation models so that users can send and receive speech, singing, audio, and talking-head video during a chat.
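
The idea of routing a chat request to one of several task-specific models can be pictured with a small sketch. This is only an illustration under stated assumptions: `TASK_REGISTRY`, `make_stub`, and `dispatch` are hypothetical names, not part of AudioGPT's code, and the handlers are stubs that report which model would run rather than loading any real model. The task and model names are taken from the capability tables in this README.

```python
from typing import Callable, Dict

# Hypothetical handler type: takes the user's input, returns a text result.
Handler = Callable[[str], str]


def make_stub(model_name: str) -> Handler:
    """Return a placeholder handler that only reports which model would run."""
    def handler(user_input: str) -> str:
        return f"[{model_name}] would process: {user_input}"
    return handler


# Hypothetical registry: task name -> handler (illustrative subset of tasks).
TASK_REGISTRY: Dict[str, Handler] = {
    "text-to-speech": make_stub("FastSpeech"),
    "speech-recognition": make_stub("whisper"),
    "text-to-audio": make_stub("Make-An-Audio"),
    "sound-extraction": make_stub("LASSNet"),
}


def dispatch(task: str, user_input: str) -> str:
    """Route a request to the handler registered for the given task."""
    try:
        handler = TASK_REGISTRY[task]
    except KeyError:
        raise ValueError(f"unsupported task: {task!r}") from None
    return handler(user_input)
```

In this sketch the task name is passed in explicitly, for example `dispatch("text-to-audio", "rain on a tin roof")`. How AudioGPT itself chooses a model for a given chat message is not described in this README.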

Capabilities

The tables below list AudioGPT's current capabilities. Support for more models and tasks is coming soon. For prompt examples, refer to asset.

Speech

| Task | Supported Foundation Models | Status |
| --- | --- | --- |
| Text-to-Speech | FastSpeech, SyntaSpeech, VITS | Yes (WIP) |
| Style Transfer | GenerSpeech | Yes |
| Speech Recognition | whisper, Conformer | Yes |
| Speech Enhancement | ConvTasNet | Yes (WIP) |
| Speech Separation | TF-GridNet | Yes (WIP) |
| Speech Translation | Multi-decoder | WIP |
| Mono-to-Binaural | NeuralWarp | Yes |

Sing

| Task | Supported Foundation Models | Status |
| --- | --- | --- |
| Text-to-Sing | DiffSinger, VISinger | Yes (WIP) |

Audio

| Task | Supported Foundation Models | Status |
| --- | --- | --- |
| Text-to-Audio | Make-An-Audio | Yes |
| Audio Inpainting | Make-An-Audio | Yes |
| Image-to-Audio | Make-An-Audio | Yes |
| Sound Detection | Audio-transformer | Yes |
| Target Sound Detection | TSDNet | Yes |
| Sound Extraction | LASSNet | Yes |

Talking Head

| Task | Supported Foundation Models | Status |
| --- | --- | --- |
| Talking Head Synthesis | GeneFace | Yes (WIP) |

Todo

  • clean up the text-to-sing/speech code
  • merge talking head synthesis into main
  • change audio/video log output
  • support Hugging Face Spaces

Acknowledgement

We thank the following open-source projects:

  • Visual ChatGPT
  • Hugging Face
  • LangChain
  • Stable Diffusion

Description
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head https://huggingface.co/spaces/AIGC-Audio/AudioGPT