Merge branch 'main' of github.com:Rongjiehuang/AudioGPT

Rongjiehuang
2023-04-09 17:02:47 +08:00
7 changed files with 30 additions and 1 deletions


@@ -71,3 +71,32 @@ Output:<br />
Input Example : Generate an image of a horse<br />
Output:<br />
![](t2i.png)<br />
## Sound Detection
First upload your audio(.wav)<br />
Audio Example :<br />
<audio src="mix.wav" controls></audio><br />
Input Example : What events does this audio include?<br />
Output:<br />
![](detection.png)<br />
## Mono Audio to Binaural Audio
First upload your audio(.wav)<br />
<audio src="mix.wav" controls></audio><br />
Input Example: Convert the mono speech to binaural audio.<br />
Output:<br />
![](m2b.png)<br />
## Target Sound Detection
First upload your audio(.wav)<br />
<audio src="mix.wav" controls></audio><br />
Input Example: Please help me detect the target sound in the audio based on the description: "I want to detect the Applause event"<br />
Output:<br />
![](tsd.png)<br />
## Sound Extraction
First upload your audio(.wav)<br />
<audio src="mix.wav" controls></audio><br />
Input Example: Please help me extract the sound events from the audio based on the description: "a person shouts nearby and then an emergency vehicle siren sounds"<br />
Output:<br />
![](sound_extraction.png)<br />
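
The Sound Extraction tool receives its input as a single comma-separated string holding the mixture audio path and the text description. A minimal sketch of how such a string might be split is given here. `split_tool_input` is a hypothetical helper written for illustration, not a function from this repository, and splitting on the first comma only is an assumption made so that commas inside the description are preserved.

```python
from typing import Tuple


def split_tool_input(inputs: str) -> Tuple[str, str]:
    """Split 'audio_path, text' into its two parts (hypothetical helper).

    Only the first comma is treated as the separator, so a description
    that itself contains commas is kept intact.
    """
    audio_path, sep, text = inputs.partition(",")
    if not sep:
        raise ValueError("expected input of the form 'audio_path, text'")
    return audio_path.strip(), text.strip()


# Example call with an illustrative path and description.
audio_path, text = split_tool_input(
    "audio/mix.wav, a person shouts nearby, and then sirens sound"
)
print(audio_path)
print(text)
```

Splitting with `str.partition` rather than `str.split(",")` avoids breaking a description such as "a person shouts nearby, and then sirens sound" into more than two pieces.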

BIN
assets/detection.png Normal file

Binary file not shown. Size: 119 KiB

BIN
assets/m2b.png Normal file

Binary file not shown. Size: 135 KiB

BIN
assets/mix1.wav Normal file

Binary file not shown.

BIN
assets/sound_extraction.png Normal file

Binary file not shown. Size: 172 KiB

BIN
assets/tsd.png Normal file

Binary file not shown. Size: 137 KiB


@@ -843,7 +843,7 @@ class ConversationBot:
Tool(name="Extract sound event from mixture audio based on language description", func=self.extraction.inference,
description="useful for when you want to extract a target sound from a mixture audio; you can describe the target sound by text. Receives audio_path and text as input. "
"The input to this tool should be a comma-separated string of two values, representing the mixture audio path and the input text."),
-Tool(name="Detect the sound event from the audio based on your descriptions", func=self.TSD.inference,
+Tool(name="Detect the target sound event from the audio based on your descriptions", func=self.TSD.inference,
description="useful for when you want to know when the target sound event happens in the audio. You can use language descriptions to instruct the model. Receives text description and audio_path as input. "
"The input to this tool should be a string, representing the answer. ")]
self.agent = initialize_agent(