# AudioGPT

**AudioGPT** connects ChatGPT with a series of Audio Foundation Models to enable **sending** and **receiving** speech, singing, audio, and talking-head video during a chat.

<a href="https://huggingface.co/spaces/AIGC-Audio/AudioGPT">
    <img src="https://img.shields.io/badge/%F0%9F%A4%97-Open%20in%20Spaces-blue" alt="Open in Spaces">
</a>

## Capabilities

The tables below list AudioGPT's current capabilities. Support for more models and tasks is coming soon. For prompt examples, see [assets](assets/README.md).

### Speech
|            Task            |   Supported Foundation Models   | Status |
|:--------------------------:|:-------------------------------:|:------:|
|       Text-to-Speech       | [FastSpeech](), [SyntaSpeech](), [VITS]() |  Yes (WIP)   |
|       Style Transfer       |         [GenerSpeech]()         |  Yes   |
|     Speech Recognition     |           [Whisper](), [Conformer]()           |  Yes   |
|     Speech Enhancement     |          [ConvTasNet]()         |  Yes (WIP)   |
|     Speech Separation      |          [TF-GridNet]()         |  Yes (WIP)   |
|     Speech Translation     |          [Multi-decoder]()      |  WIP   |
|      Mono-to-Binaural      |          [NeuralWarp]()         |  Yes   |

### Sing

|           Task            |   Supported Foundation Models   | Status |
|:-------------------------:|:-------------------------------:|:------:|
|       Text-to-Sing        |         [DiffSinger](), [VISinger]()          |  Yes (WIP)   |

### Audio
|          Task          | Supported Foundation Models | Status |
|:----------------------:|:---------------------------:|:------:|
|     Text-to-Audio      |      [Make-An-Audio]()      |  Yes   |
|    Audio Inpainting    |      [Make-An-Audio]()      |  Yes   |
|     Image-to-Audio     |      [Make-An-Audio]()      |  Yes   |
|    Sound Detection     |    [Audio-transformer]()    | Yes    |
| Target Sound Detection |    [TSDNet]()    |  Yes   |
|    Sound Extraction    |    [LASSNet]()    |  Yes   |


### Talking Head

|           Task            |   Supported Foundation Models   |   Status   |
|:-------------------------:|:-------------------------------:|:----------:|
|  Talking Head Synthesis   |          [GeneFace]()           | Yes (WIP)  |


## Todo
- [x] Clean up the text-to-sing/speech code
- [ ] Merge talking head synthesis into main
- [x] Change audio/video log output
- [x] Support Hugging Face Spaces

## Acknowledgement
We thank the following open-source projects:

[Visual ChatGPT](https://github.com/microsoft/visual-chatgpt) &#8194;
[Hugging Face](https://github.com/huggingface) &#8194;
[LangChain](https://github.com/hwchase17/langchain) &#8194;
[Stable Diffusion](https://github.com/CompVis/stable-diffusion) &#8194;