update huggingface

2025-12-16 11:57:58 +01:00 · 2023-04-06 23:54:27 +08:00
parent 112d87b6f9
commit 2da3ccdd67
1 changed files with 23 additions and 20 deletions
--- a/README.md
+++ b/README.md
@@ -2,6 +2,9 @@

 **AudioGPT** connects ChatGPT and a series of Audio Foundation Models to enable **sending** and **receiving** speech, singing, audio, and talking heads during a chat.

+<a href="https://huggingface.co/spaces/AIGC-Audio/AudioGPT">
+    <img src="https://img.shields.io/badge/%F0%9F%A4%97-Open%20in%20Spaces-blue" alt="Open in Spaces">
+</a>

 ## Capabilities

@@ -10,15 +13,15 @@ Up-to-date link: https://93868c7fa583f4b5.gradio.app
 Here we list the current capabilities of AudioGPT. More supported models and tasks are coming soon. For prompt examples, refer to [asset](assets/README.md).

 ### Speech
-|           Task            |   Supported Foundation Models   | Status |
-|:-------------------------:|:-------------------------------:|:------:|
-|      Text-to-Speech       | [FastSpeech](), [SyntaSpeech](), [VITS]() |  Yes (WIP)   |
-|      Style Transfer       |         [GenerSpeech]()         |  Yes   |
-|    Speech Recognition     |           [whisper](), [Conformer]()           |  Yes   |
-|    Speech Enhancement     |          [ConvTasNet]()         |  WIP   |
-|    Speech Separation      |          [TF-GridNet]()         |  WIP   |
-|    Speech Translation     |          [Multi-decoder]()      |  WIP   |
-|  Mono-to-Binaural Speech  |          [NeuralWarp]()         |  Yes   |
+|            Task            |   Supported Foundation Models   | Status |
+|:--------------------------:|:-------------------------------:|:------:|
+|       Text-to-Speech       | [FastSpeech](), [SyntaSpeech](), [VITS]() |  Yes (WIP)   |
+|       Style Transfer       |         [GenerSpeech]()         |  Yes   |
+|     Speech Recognition     |           [whisper](), [Conformer]()           |  Yes   |
+|     Speech Enhancement     |          [ConvTasNet]()         |  WIP   |
+|     Speech Separation      |          [TF-GridNet]()         |  WIP   |
+|     Speech Translation     |          [Multi-decoder]()      |  WIP   |
+|      Mono-to-Binaural      |          [NeuralWarp]()         |  Yes   |

 ### Sing

@@ -27,14 +30,14 @@ Here we list the capability of AudioGPT at this time. More supported models and
 |       Text-to-Sing        |         [DiffSinger](), [VISinger]()          |  Yes (WIP)   |

 ### Audio
-|       Task       | Supported Foundation Models |  Status   |
-|:----------------:|:---------------------------:|:---------:|
-|  Text-to-Audio   |      [Make-An-Audio]()      |    Yes    |
-| Audio Inpainting |      [Make-An-Audio]()      |    Yes    |
-|  Image-to-Audio  |      [Make-An-Audio]()      |    Yes    |
-| Sound Detection  |    [Audio-transformer]()    | Yes (WIP) |
-| Target sound detection  |    [TSDNet]()    | Yes (WIP) |
-| Sound Extraction  |    [LASSNet]()    | Yes (WIP) |
+|          Task          | Supported Foundation Models | Status |
+|:----------------------:|:---------------------------:|:------:|
+|     Text-to-Audio      |      [Make-An-Audio]()      |  Yes   |
+|    Audio Inpainting    |      [Make-An-Audio]()      |  Yes   |
+|     Image-to-Audio     |      [Make-An-Audio]()      |  Yes   |
+|    Sound Detection     |    [Audio-transformer]()    |  Yes   |
+| Target Sound Detection |         [TSDNet]()          |  Yes   |
+|    Sound Extraction    |         [LASSNet]()         |  Yes   |


 ### Talking Head
@@ -44,7 +47,8 @@ Here we list the capability of AudioGPT at this time. More supported models and
 |  Talking Head Synthesis   |          [GeneFace]()           | Yes (WIP)  |

 ## Internal Version Updates
-4.3 Support Talking Head Synthesis\
+4.6 Support Sound Extraction/Detection\
+4.3 Support huggingface demo space\
 4.1 Support Audio inpainting and clean codes\
 3.27 Support Style Transfer/Talking head Synthesis\
 3.23 Support Text-to-Sing\
@@ -54,10 +58,9 @@ Here we list the capability of AudioGPT at this time. More supported models and

 ## Todo
 - [x] clean text to sing/speech code
-- [ ] import Espnet models for speech tasks
 - [ ] merge talking head synthesis into main
 - [x] change audio/video log output
-- [ ] support huggingface space
+- [x] support huggingface space

 ## Acknowledgement
 We appreciate the following open-source projects: