0.32.0 (#208)
* separate other tts apis and improve chunking
* move old tts config to voice agent config and implement config widget ux elements for table editing
* elevenlabs updated to use their client and expose model selection
* linting
* separate character class into character.pt and start on voice routing
* linting
* tts hot swapping and chunking improvements
* linting
* add support for piper-tts
* update gitignore
* linting
* support google tts; fix issue where quick_toggle agent config didn't work on standard config items
* linting
* only show agent quick toggles if the agent is enabled
* change elevenlabs to use a locally maintained voice list
* tts generate before / after events
* voice library refactor
* linting
* update openai model and voices
* tweak configs
* voice library ux
* linting
* add support for kokoro tts
* fix add / remove voice
* voice library tags
* linting
* linting
* tts api status
* api infos and add more kokoro voices
* allow voice testing before saving a new voice
* tweaks to voice library ux and some api info text
* linting
* voice mixer
* polish
* voice files go into /tts instead of templates/voice
* change default narrator voice
* xtts confirmation note
* character voice select
* koboldai format template
* polish
* skip empty chunks
* change default voice
* replace em-dash with normal dash
* adjust limit
* replace linebreaks
* chunk cleanup for whitespace
* info updated
* remove invalid endif tag
* sort voices by ready api
* Character hashable type
* clarify set_simulated_environment use to avoid unwanted character deactivation
* allow manual generation of tts and fix assorted issues with tts
* tts websocket handler router renamed
* voice mixer: when there are only 2 voices, auto adjust the other weight as needed
* separate persist character functions into their own mixin
* auto assign voices
* fix character load and auto assign voice during character load
* smart speaker separation
* tts speaker separation config
* generate tts for intro text
* fix prompting issues with anthropic, google and openrouter clients
* decensor flag off again
* only do ai assisted voice markup on narrator messages
* openrouter provider configuration
* linting
* improved sound controls
* add support for chatterbox
* fix info
* chatterbox dependencies
* remove piper and xtts2
* linting
* voice params
* linting
* tts model overrides and move tts info to tab
* reorg toolbar
* allow overriding of test text
* more tts fixes, apply intensity, chatterbox voices
* confirm voice delete
* linting
* groq updates
* reorg decorators
* tts fixes
* cancelable audio queue
* voice library uploads
* scene voice library
* Config refactor (#13)
* config refactor progress
* config nuke continues
* fix system prompts
* linting
* client fun
* client config refactor
* fix kcpp auto embedding selection
* linting
* fix proxy config
* remove cruft
* fix remaining client bugs from config refactor; always use get_config(), don't keep an instance reference
* support for reasoning models
* more reasoning tweaks
* only allow one frontend to connect at a time
* fix tests
* relock
* relock
* more client adjustments
* pattern prefill
* some tts agent fixes
* fix ai assist cond
* tts nodes
* fix config retrieval
* assign voice node and fixes
* sim suite char gen assign voice
* fix voice assign template to consider used voices
* get rid of auto break repetition, which wasn't working right for a while anyhow
* linting
* generate tts node as string node
* linting
* voice change on character event
* tweak chatterbox max length
* koboldai default template
* linting
* fix saving of existing voice
* relock
* adjust params of eva default voice
* f5tts support
* f5tts samples
* f5tts support
* f5tts tweaks
* chunk size per tts api and reorg default f5tts voices
* chatterbox default voice reorg to match f5-tts default voices
* voice library ux polish pass
* cleanup
* f5-tts tweaks
* missing samples
* get rid of old save cmd
* add chatterbox and f5tts
* housekeeping
* fix some issues with world entry editing
* remove cruft
* replace exclamation marks
* fix save immutable check
* fix replace_exclamation_marks
* better error handling in websocket plugins and fix issue with saves
* agent config save on dialog close
* ctrl click to disable / enable agents
* fix quick config
* allow modifying response size of focal requests
* sim suite set goal always sets story intent; encourage calling of set goal during simulation start
* allow setting of model
* voice param tweaks
* tts tweaks
* fix character card load
* fix note_on_value
* add mixed speaker_separation mode
* indicate which message the audio is for and provide a way to stop audio from the message
* fix issue with some tts generation failing
* linting
* fix speaker separation modes
* bad idea
* linting
* refactor speaker separation prompt
* add kimi think pattern
* fix issue with unwanted cover image replacement
* no scene analysis for visual prompt generation (for now)
* linting
* tts for context investigation messages
* prompt tweaks
* tweak intro
* fix intro text tts not auto playing sometimes
* consider narrator voice when assigning a voice to a character
* allow director log messages to go only into the director console
* linting
* startup performance fixes
* init time
* linting
* only show audio control for messages that can have it
* always create story intent and don't override existing saves during character card load
* fix history check in dynamic story line node; add HasHistory node
* linting
* fix intro message not having speaker separation
* voice library character manager
* sequential and cancelable auto assign all
* linting
* fix generation cancel handling
* tooltips
* fix auto assign voice from scene voices
* polish
* kokoro does not like lazy import
* update info text
* complete scene export / import
* linting
* wording
* remove cruft
* fix story intent generation during character card import
* fix generation cancelled emit status infinite loop
* prompt tweak
* reasoning quick toggle, reasoning token slider, tooltips
* improved reasoning pattern handling
* fix indirect coercion response parsing
* fix streaming issue
* response length instructions
* more robust streaming
* adjust default
* adjust formatting
* linting
* remove debug output
* director console log function calls
* install cuda script updated
* linting
* add another step
* adjust default
* update dialogue examples
* fix voice selection issues
* what's happening here
* third time's the charm?
* Vite migration (#207)
* add vite config
* replace babel, webpack, vue-cli deps with vite, switch to esm modules, separate eslint config
* change process.env to import.meta.env
* update index.html for vite and move it to root
* update docs for vite
* remove vue cli config
* update example env with vite
* bump frontend deps after rebase to 32.0

---------

Co-authored-by: pax-co <Pax_801@proton.me>

* properly reference data type
* what's new
* better indication of dialogue examples supporting multiple lines; improve dialogue example display
* fix potential issue with cached scene analysis being reused when it shouldn't
* fix character creation issues with player character toggle
* fix issue where editing a message would sometimes lose parts of the message
* fix slider ux thumb labels (vuetify update)
* relock
* narrative conversation format
* remove planning step
* linting
* tweaks
* don't overthink
* update dialogue examples and intro
* don't dictate response length instructions when data structures are expected
* prompt tweaks
* prompt tweaks
* linting
* fix edit message not handling `:` well
* prompt tweaks
* fix tests
* fix manual revision when character message was generated in new narrative mode
* fix issue with message editing
* Docker packages release (#204)
* add CI workflow for Docker image build and MkDocs deployment
* rename CI workflow from 'ci' to 'package'
* refactor CI workflow: consolidate container build and documentation deployment into a single file
* fix: correct indentation for permissions in CI workflow
* fix: correct indentation for steps in deploy-docs job in CI workflow
* build both cpu and cuda image
* docs
* docs
* expose writing style during state reinforcement
* prompt tweaks
* test container build
* test container image
* update docker compose
* docs
* test-container-build
* test container build
* test container build
* update docker build workflows
* fix guidance prompt prefix not being dropped
* mount tts dir
* add gpt-5
* remove debug output
* docs
* openai auto toggle reasoning based on model selection
* linting

---------

Co-authored-by: pax-co <123330830+pax-co@users.noreply.github.com>
Co-authored-by: pax-co <Pax_801@proton.me>
Co-authored-by: Luis Alexandre Deschamps Brandão <brandao_luis@yahoo.com>
@@ -27,10 +27,10 @@ uv run src\talemate\server\run.py runserver --host 0.0.0.0 --port 1234

### Letting the frontend know about the new host and port

-Copy `talemate_frontend/example.env.development.local` to `talemate_frontend/.env.production.local` and edit the `VUE_APP_TALEMATE_BACKEND_WEBSOCKET_URL`.
+Copy `talemate_frontend/example.env.development.local` to `talemate_frontend/.env.production.local` and edit the `VITE_TALEMATE_BACKEND_WEBSOCKET_URL`.

```env
-VUE_APP_TALEMATE_BACKEND_WEBSOCKET_URL=ws://localhost:1234
+VITE_TALEMATE_BACKEND_WEBSOCKET_URL=ws://localhost:1234
```

Next rebuild the frontend.

@@ -1,22 +1,15 @@

!!! example "Experimental"
    Talemate through Docker has not received a lot of testing from me, so please let me know if you encounter any issues.

    You can do so by creating an issue on the [:material-github: GitHub repository](https://github.com/vegu-ai/talemate)

## Quick install instructions

1. `git clone https://github.com/vegu-ai/talemate.git`
1. `cd talemate`
1. copy the config file
    1. linux: `cp config.example.yaml config.yaml`
-    1. windows: `copy config.example.yaml config.yaml` (or just copy the file and rename it via the file explorer)
+    1. windows: `copy config.example.yaml config.yaml`
-1. `docker compose up`
+1. If your host has a CUDA compatible Nvidia GPU
+    1. Windows (via PowerShell): `$env:CUDA_AVAILABLE="true"; docker compose up`
+    1. Linux: `CUDA_AVAILABLE=true docker compose up`
+1. If your host does **NOT** have a CUDA compatible Nvidia GPU
+    1. Windows: `docker compose up`
+    1. Linux: `docker compose up`
1. Navigate your browser to http://localhost:8080

!!! info "Pre-built Images"
    The default setup uses pre-built images from GitHub Container Registry that include CUDA support by default. To manually build the container instead, use `docker compose -f docker-compose.manual.yml up --build`.

!!! note
    When connecting to local APIs running on the host machine (e.g. text-generation-webui), you need to use `host.docker.internal` as the hostname.

BIN docs/img/0.32.0/add-chatterbox-voice.png (new file, 35 KiB)
BIN docs/img/0.32.0/add-elevenlabs-voice.png (new file, 29 KiB)
BIN docs/img/0.32.0/add-f5tts-voice.png (new file, 43 KiB)
BIN docs/img/0.32.0/character-voice-assignment.png (new file, 65 KiB)
BIN docs/img/0.32.0/chatterbox-api-settings.png (new file, 54 KiB)
BIN docs/img/0.32.0/chatterbox-parameters.png (new file, 18 KiB)
BIN docs/img/0.32.0/client-reasoning-2.png (new file, 18 KiB)
BIN docs/img/0.32.0/client-reasoning.png (new file, 75 KiB)
BIN docs/img/0.32.0/elevenlabs-api-settings.png (new file, 61 KiB)
BIN docs/img/0.32.0/elevenlabs-copy-voice-id.png (new file, 9.6 KiB)
BIN docs/img/0.32.0/f5tts-api-settings.png (new file, 72 KiB)
BIN docs/img/0.32.0/f5tts-parameters.png (new file, 12 KiB)
BIN docs/img/0.32.0/google-tts-api-settings.png (new file, 63 KiB)
BIN docs/img/0.32.0/kokoro-mixer.png (new file, 33 KiB)
BIN docs/img/0.32.0/openai-tts-api-settings.png (new file, 61 KiB)
BIN docs/img/0.32.0/voice-agent-settings.png (new file, 107 KiB)
BIN docs/img/0.32.0/voice-agent-status-characters.png (new file, 3.0 KiB)
BIN docs/img/0.32.0/voice-library-access.png (new file, 9.3 KiB)
BIN docs/img/0.32.0/voice-library-api-status.png (new file, 6.6 KiB)
BIN docs/img/0.32.0/voice-library-interface.png (new file, 142 KiB)
docs/user-guide/agents/voice/chatterbox.md (new file, 58 lines)
@@ -0,0 +1,58 @@

# Chatterbox

Local zero-shot voice cloning from .wav files.



##### Device

Auto-detects the best available option.

##### Model

Default Chatterbox model, optimized for speed.

##### Chunk size

Split text into chunks of this size. Smaller values increase responsiveness at the cost of lost context between chunks (appropriate inflection, etc.). 0 = no chunking.

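To make the trade-off concrete, here is a rough sketch of sentence-aware chunking of this kind. It is illustrative only; the function and its splitting rules are assumptions, not Talemate's actual implementation.

```python
# Illustrative sketch: split text into ~chunk_size pieces on sentence
# boundaries so inflection context survives within each chunk.
import re

def chunk_text(text: str, chunk_size: int) -> list[str]:
    if chunk_size <= 0:  # 0 = no chunking
        return [text]
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > chunk_size:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return [c for c in chunks if c.strip()]  # skip empty chunks

print(chunk_text("One. Two! Three?", chunk_size=10))  # ['One. Two!', 'Three?']
```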
## Adding Chatterbox Voices

### Voice Requirements

Chatterbox voices require:

- Reference audio file (.wav format, 5-15 seconds optimal)
- Clear speech with minimal background noise
- Single speaker throughout the sample

### Creating a Voice

1. Open the Voice Library
2. Click **:material-plus: New**
3. Select "Chatterbox" as the provider
4. Configure the voice:



**Label:** Descriptive name (e.g., "Marcus - Deep Male")

**Voice ID / Upload File:** Upload a .wav file containing the voice sample. The uploaded reference audio will also be the voice ID.

**Speed:** Adjust playback speed (0.5 to 2.0, default 1.0)

**Tags:** Add descriptive tags for organization

**Extra voice parameters**

Some optional parameters can also be set here on a per-voice level.



##### Exaggeration Level

Exaggeration (neutral = 0.5; extreme values can be unstable). Higher exaggeration tends to speed up speech; reducing CFG helps compensate with slower, more deliberate pacing.

##### CFG / Pace

If the reference speaker has a fast speaking style, lowering CFG to around 0.3 can improve pacing.

@@ -1,7 +1,41 @@

# ElevenLabs

-If you have not configured the ElevenLabs TTS API, the voice agent will show that the API key is missing.
+Professional voice synthesis with voice cloning capabilities using the ElevenLabs API.

-
+

-See the [ElevenLabs API setup](/talemate/user-guide/apis/elevenlabs/) for instructions on how to set up the API key.
+## API Setup

+ElevenLabs requires an API key. See the [ElevenLabs API setup](/talemate/user-guide/apis/elevenlabs/) for instructions on obtaining and setting an API key.

+## Configuration

+**Model:** Select from available ElevenLabs models

+!!! warning "Voice Limits"
+    Your ElevenLabs subscription allows you to maintain a set number of voices (10 for the cheapest plan). Any voice that you generate audio for is automatically added to your voices at [https://elevenlabs.io/app/voice-lab](https://elevenlabs.io/app/voice-lab). This also happens when you use the "Test" button. It is recommended to test voices via their voice library instead.

+## Adding ElevenLabs Voices

+### Getting Voice IDs

+1. Go to [https://elevenlabs.io/app/voice-lab](https://elevenlabs.io/app/voice-lab) to view your voices
+2. Find or create the voice you want to use
+3. Click "More Actions" -> "Copy Voice ID" for the desired voice

+

+### Creating a Voice in Talemate

+

+1. Open the Voice Library
+2. Click "Add Voice"
+3. Select "ElevenLabs" as the provider
+4. Configure the voice:

+**Label:** Descriptive name for the voice

+**Provider ID:** Paste the ElevenLabs voice ID you copied

+**Tags:** Add descriptive tags for organization

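For reference, a minimal sketch of what a raw text-to-speech request against a copied voice ID looks like, assuming the public ElevenLabs `v1` REST endpoint; Talemate itself goes through the official client library, and the model ID and output handling below are only examples. Keep the voice-limit warning above in mind: generating audio for a voice adds it to your account.

```python
# Illustrative ElevenLabs TTS request (public v1 REST endpoint assumed).
# API key and voice ID are placeholders.
import requests

API_KEY = "your-elevenlabs-api-key"
VOICE_ID = "your-copied-voice-id"  # the "Provider ID" from Voice Lab

response = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": API_KEY, "Content-Type": "application/json"},
    json={"text": "Hello from Talemate.", "model_id": "eleven_multilingual_v2"},
)
response.raise_for_status()

with open("sample.mp3", "wb") as f:
    f.write(response.content)  # the endpoint returns mp3 audio by default
```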
docs/user-guide/agents/voice/f5tts.md (new file, 78 lines)
@@ -0,0 +1,78 @@

# F5-TTS

Local zero-shot voice cloning from .wav files.



##### Device

Auto-detects the best available option (GPU preferred).

##### Model

- F5TTS_v1_Base (default, most recent model)
- F5TTS_Base
- E2TTS_Base

##### NFE Step

Number of steps used to generate the voice. Higher values result in more detailed audio.

##### Chunk size

Split text into chunks of this size. Smaller values increase responsiveness at the cost of lost context between chunks (appropriate inflection, etc.). 0 = no chunking.

##### Replace exclamation marks

If checked, exclamation marks will be replaced with periods. This is recommended for `F5TTS_v1_Base` since it seems to over-exaggerate exclamation marks.

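The replacement itself is a one-line substitution. A sketch of what it could look like; collapsing runs such as `!!` into a single period is an assumption here, not a documented rule:

```python
# Illustrative exclamation-mark replacement; the actual rule may differ.
import re

def replace_exclamation_marks(text: str) -> str:
    """Replace single or repeated exclamation marks with a period."""
    return re.sub(r"!+", ".", text)

print(replace_exclamation_marks("Stop!! Right there!"))  # Stop. Right there.
```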
## Adding F5-TTS Voices

### Voice Requirements

F5-TTS voices require:

- Reference audio file (.wav format, 10-30 seconds)
- Clear speech with minimal background noise
- Single speaker throughout the sample
- Reference text (optional but recommended)

### Creating a Voice

1. Open the Voice Library
2. Click "Add Voice"
3. Select "F5-TTS" as the provider
4. Configure the voice:



**Label:** Descriptive name (e.g., "Emma - Calm Female")

**Voice ID / Upload File:** Upload a .wav file containing the **reference audio** voice sample. The uploaded reference audio will also be the voice ID.

- Use 6-10 second samples (longer doesn't improve quality)
- Ensure clear speech with minimal background noise
- Record at a natural speaking pace

**Reference Text:** Enter the exact text spoken in the reference audio for improved quality

- Enter exactly what is spoken in the reference audio
- Include proper punctuation and capitalization
- Improves voice cloning accuracy significantly

**Speed:** Adjust playback speed (0.5 to 2.0, default 1.0)

**Tags:** Add descriptive tags (gender, age, style) for organization

**Extra voice parameters**

Some optional parameters can also be set here on a per-voice level.



##### Speed

Allows you to adjust the speed of the voice.

##### CFG Strength

A higher CFG strength generally leads to more faithful reproduction of the input text, while a lower CFG strength can result in more varied or creative speech output, potentially at the cost of text-to-speech accuracy.

docs/user-guide/agents/voice/google.md (new file, 15 lines)
@@ -0,0 +1,15 @@

# Google Gemini-TTS

Google Gemini-TTS provides access to Google's text-to-speech service.

## API Setup

Google Gemini-TTS requires a Google Cloud API key.

See the [Google Cloud API setup](/talemate/user-guide/apis/google/) for instructions on obtaining an API key.

## Configuration



**Model:** Select from available Google TTS models

@@ -1,6 +1,26 @@

# Overview

-Talemate supports Text-to-Speech (TTS) functionality, allowing users to convert text into spoken audio. This document outlines the steps required to configure TTS for Talemate using different providers, including ElevenLabs and a local TTS API.
+In 0.32.0 Talemate's TTS (Text-to-Speech) agent has been completely refactored to provide advanced voice capabilities, including per-character voice assignment, speaker separation, and support for multiple local and remote APIs. The voice system now includes a comprehensive voice library for managing and organizing voices across all supported providers.

+## Key Features

+- **Per-character voice assignment** - Each character can have their own unique voice
+- **Speaker separation** - Automatic detection and separation of dialogue from narration
+- **Voice library management** - Centralized management of all voices across providers
+- **Multiple API support** - Support for both local and remote TTS providers
+- **Director integration** - Automatic voice assignment for new characters

+## Supported APIs

+### Local APIs

+- **Kokoro** - Fastest generation, with predefined voice models and mixing
+- **F5-TTS** - Fast voice cloning with occasional mispronunciations
+- **Chatterbox** - High-quality voice cloning (slower generation)

+### Remote APIs

+- **ElevenLabs** - Professional voice synthesis with voice cloning
+- **Google Gemini-TTS** - Google's text-to-speech service
+- **OpenAI** - OpenAI's TTS-1 and TTS-1-HD models

## Enable the Voice agent

@@ -12,28 +32,30 @@ If your voice agent is disabled - indicated by the grey dot next to the agent -



+!!! note "Ctrl click to toggle agent"
+    You can use Ctrl click to toggle the agent on and off.

-!!! abstract "Next: Connect to a TTS api"
-    Next you need to decide which service / api to use for audio generation and configure the voice agent accordingly.
+## Voice Library Management

-- [OpenAI](openai.md)
-- [ElevenLabs](elevenlabs.md)
-- [Local TTS](local_tts.md)
+Voices are managed through the Voice Library, accessible from the main application bar. The Voice Library allows you to:

-You can also find more information about the various settings [here](settings.md).
+- Add and organize voices from all supported providers
+- Assign voices to specific characters
+- Create mixed voices (Kokoro)
+- Manage both global and scene-specific voice libraries

-## Select a voice
+See the [Voice Library Guide](voice-library.md) for detailed instructions.

-
+## Character Voice Assignment

-Click on the agent to open the agent settings.
+

-Then click on the `Narrator Voice` dropdown and select a voice.
+Characters can have individual voices assigned through the Voice Library. When a character has a voice assigned:

-
+1. Their dialogue will use their specific voice
+2. The narrator voice is used for exposition in their messages (with speaker separation enabled)
+3. If their assigned voice's API is not available, it falls back to the narrator voice (sketched below)

-The selection is saved automatically, click anywhere outside the agent window to close it.
+The Voice agent status will show all assigned character voices and their current status.

-The Voice agent should now show that the voice is selected and be ready to use.

-
+

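The fallback rule in point 3 above amounts to a small routing decision. A minimal sketch with hypothetical names, not Talemate's internals:

```python
# Illustrative voice routing with narrator fallback.
def resolve_voice(character_voice: dict | None, narrator_voice: dict,
                  ready_apis: set[str]) -> dict:
    """Use the character's voice when its API is ready, else fall back."""
    if character_voice and character_voice["api"] in ready_apis:
        return character_voice
    return narrator_voice

narrator = {"label": "Eva", "api": "kokoro"}
marcus = {"label": "Marcus - Deep Male", "api": "chatterbox"}

# chatterbox not ready -> Marcus falls back to the narrator voice
print(resolve_voice(marcus, narrator, ready_apis={"kokoro"}))  # Eva
```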
docs/user-guide/agents/voice/kokoro.md (new file, 55 lines)
@@ -0,0 +1,55 @@

# Kokoro

Kokoro provides predefined voice models and voice mixing capabilities for creating custom voices.

## Using Predefined Voices

Kokoro comes with built-in voice models that are ready to use immediately.

Available predefined voices include various male and female voices with different characteristics.

## Creating Mixed Voices

Kokoro allows you to mix voices together to create a new voice.

### Voice Mixing Interface

To create a mixed voice:

1. Open the Voice Library
2. Click ":material-plus: New"
3. Select "Kokoro" as the provider
4. Choose the ":material-tune: Mixer" option
5. Configure the mixed voice:



**Label:** Descriptive name for the mixed voice

**Base Voices:** Select 2-4 existing Kokoro voices to combine

**Weights:** Set the influence of each voice (0.1 to 1.0)

**Tags:** Descriptive tags for organization

### Weight Configuration

Each selected voice can have its weight adjusted:

- Higher weights make that voice more prominent in the mix
- Lower weights make that voice more subtle
- Total weights need to sum to 1.0 (see the sketch below)
- Experiment with different combinations to achieve the desired results

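Conceptually, a mix of this kind is a weighted average of voice embeddings. A sketch under that assumption; the names and vector size are illustrative, and `balance_pair` mimics how the mixer auto-adjusts the second weight when only two voices are selected:

```python
# Illustrative weighted voice mixing; assumes Kokoro-style voices are
# embedding vectors. Not Talemate's actual implementation.
import numpy as np

def mix_voices(voices: dict[str, np.ndarray], weights: dict[str, float]) -> np.ndarray:
    """Blend voice embeddings by weight; weights must sum to 1.0."""
    total = sum(weights.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"weights sum to {total}, expected 1.0")
    return sum(weights[name] * voices[name] for name in weights)

# With only two voices, raising one weight lowers the other automatically:
def balance_pair(primary_weight: float) -> tuple[float, float]:
    return primary_weight, 1.0 - primary_weight

voices = {"af_heart": np.random.rand(256), "am_adam": np.random.rand(256)}
mixed = mix_voices(voices, dict(zip(voices, balance_pair(0.7))))
```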
### Saving Mixed Voices

Once configured, click "Add Voice". Mixed voices are saved to your voice library and can be:

- Assigned to characters
- Used as narrator voices

just like any other voice.

Saving a mixed voice may take a moment to complete.

@@ -1,53 +0,0 @@

# Local TTS

!!! warning
    This has not been tested in a while and may not work as expected. It will likely be replaced with something different in the future. If this approach is currently broken, it's likely to remain so until it is replaced.

For running a local TTS API, Talemate requires specific dependencies to be installed.

### Windows Installation

Run `install-local-tts.bat` to install the necessary requirements.

### Linux Installation

Execute the following command:

```bash
pip install TTS
```

### Model and Device Configuration

1. Choose a TTS model from the [Coqui TTS model list](https://github.com/coqui-ai/TTS).
2. Decide whether to use `cuda` or `cpu` for the device setting.
3. The first time you run TTS through the local API, it will download the specified model. Please note that this may take some time, and the download progress will be visible in the Talemate backend output.

Example configuration snippet:

```yaml
tts:
  device: cuda # or 'cpu'
  model: tts_models/multilingual/multi-dataset/xtts_v2
```

### Voice Samples Configuration

Configure voice samples by setting the `value` field to the path of a .wav file voice sample. Official samples can be downloaded from [Coqui XTTS-v2 samples](https://huggingface.co/coqui/XTTS-v2/tree/main/samples).

Example configuration snippet:

```yaml
tts:
  voices:
    - label: English Male
      value: path/to/english_male.wav
    - label: English Female
      value: path/to/english_female.wav
```

## Saving the Configuration

After configuring the `config.yaml` file, save your changes. Talemate will use the updated settings the next time it starts.

For more detailed information on configuring Talemate, refer to the `config.py` file in the Talemate source code and the `config.example.yaml` file for a barebones configuration example.

@@ -8,16 +8,12 @@ See the [OpenAI API setup](/apis/openai.md) for instructions on how to set up th

## Settings

-
+

##### Model

Which model to use for generation.

+- GPT-4o Mini TTS
- TTS-1
- TTS-1 HD

-!!! quote "OpenAI API documentation on quality"
-    For real-time applications, the standard tts-1 model provides the lowest latency but at a lower quality than the tts-1-hd model. Due to the way the audio is generated, tts-1 is likely to generate content that has more static in certain situations than tts-1-hd. In some cases, the audio may not have noticeable differences depending on your listening device and the individual person.

-Generally I have found that HD is fast enough for Talemate, so this is the default.

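For comparison, this is roughly what a call to one of these models looks like through the official `openai` Python SDK (v1+); the voice and output path are arbitrary examples, and Talemate handles this internally:

```python
# Minimal TTS call against the OpenAI speech endpoint (official SDK, v1+).
# Requires OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

with client.audio.speech.with_streaming_response.create(
    model="tts-1-hd",  # or "tts-1", "gpt-4o-mini-tts"
    voice="alloy",
    input="Welcome back to the story.",
) as response:
    response.stream_to_file("speech.mp3")
```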
@@ -1,36 +1,65 @@

# Settings

-
+

-##### API
+##### Enabled APIs

-The TTS API to use for voice generation.
+Select which TTS APIs to enable. You can enable multiple APIs simultaneously:

-- OpenAI
-- ElevenLabs
-- Local TTS
+- **Kokoro** - Fastest generation, with predefined voice models and mixing
+- **F5-TTS** - Fast voice cloning with occasional mispronunciations
+- **Chatterbox** - High-quality voice cloning (slower generation)
+- **ElevenLabs** - Professional voice synthesis with voice cloning
+- **Google Gemini-TTS** - Google's text-to-speech service
+- **OpenAI** - OpenAI's TTS-1 and TTS-1-HD models

+!!! note "Multi-API Support"
+    You can enable multiple APIs and assign different voices from different providers to different characters. The system will automatically route voice generation to the appropriate API based on the voice assignment.

##### Narrator Voice

-The voice to use for narration. Each API will come with its own set of voices.
+The default voice used for narration and as a fallback for characters without assigned voices.

-
+The dropdown shows all available voices from all enabled APIs, with the format: "Voice Name (Provider)"

-!!! note "Local TTS"
-    For local TTS, you will have to provide voice samples yourself. See [Local TTS Instructions](local_tts.md) for more information.
+!!! info "Voice Management"
+    Voices are managed through the Voice Library, accessible from the main application bar. Adding, removing, or modifying voices should be done through the Voice Library interface.

-##### Generate for player
+##### Speaker Separation

-Whether to generate voice for the player. If enabled, whenever the player speaks, the voice agent will generate audio for them.
+Controls how dialogue is separated from exposition in messages:

-##### Generate for NPCs
+- **No separation** - Character messages use the character voice entirely, narrator messages use the narrator voice
+- **Simple** - Basic separation of dialogue from exposition using punctuation analysis, with exposition being read by the narrator voice (see the sketch below)
+- **Mixed** - Enables AI-assisted separation for narrator messages and simple separation for character messages
+- **AI assisted** - AI-assisted separation for both narrator and character messages

-Whether to generate voice for NPCs. If enabled, whenever a non-player character speaks, the voice agent will generate audio for them.
+!!! warning "AI Assisted Performance"
+    AI-assisted speaker separation sends additional prompts to your LLM, which may impact response time and API costs.

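As a rough illustration of what the Simple mode's punctuation analysis means in practice, here is a sketch that splits quoted dialogue from exposition. This is not Talemate's actual implementation; the AI-assisted modes use prompts instead:

```python
# Illustrative "Simple" speaker separation: quoted spans go to the
# character voice, everything else to the narrator voice.
import re

def separate_speakers(message: str) -> list[tuple[str, str]]:
    parts = []
    for token in re.split(r'("[^"]+")', message):
        token = token.strip()
        if not token:
            continue
        speaker = "character" if token.startswith('"') else "narrator"
        parts.append((speaker, token))
    return parts

msg = 'She smiled. "I knew you would come back," she said, turning away.'
print(separate_speakers(msg))
# [('narrator', 'She smiled.'),
#  ('character', '"I knew you would come back,"'),
#  ('narrator', 'she said, turning away.')]
```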
-##### Generate for narration
+##### Auto-generate for player

-Whether to generate voice for narration. If enabled, whenever the narrator speaks, the voice agent will generate audio for them.
+Generate voice automatically for player messages.

-##### Split generation
+##### Auto-generate for AI characters

-If enabled, the voice agent will generate audio in chunks, allowing for faster generation. This does however cause it to lose context between chunks, and inflection may not be as good.
+Generate voice automatically for NPC/AI character messages.

+##### Auto-generate for narration

+Generate voice automatically for narrator messages.

+##### Auto-generate for context investigation

+Generate voice automatically for context investigation messages.

+## Advanced Settings

+Advanced settings are configured per-API and can be found in the respective API configuration sections:

+- **Chunk size** - Maximum text length per generation request
+- **Model selection** - Choose specific models for each API
+- **Voice parameters** - Provider-specific voice settings

+!!! tip "Performance Optimization"
+    Each API has different optimal chunk sizes and parameters. The system automatically handles chunking and queuing for optimal performance across all enabled APIs.

docs/user-guide/agents/voice/voice-library.md (new file, 156 lines)
@@ -0,0 +1,156 @@

# Voice Library

The Voice Library is the central hub for managing all voices across all TTS providers in Talemate. It provides a unified interface for organizing, creating, and assigning voices to characters.

## Accessing the Voice Library

The Voice Library can be accessed from the main application bar at the top of the Talemate interface.



Click the voice icon to open the Voice Library dialog.

!!! note "Voice agent needs to be enabled"
    The Voice agent needs to be enabled for the voice library to be available.

## Voice Library Interface



The Voice Library interface consists of:

### Scope Tabs

- **Global** - Voices available across all scenes
- **Scene** - Voices specific to the current scene (only visible when a scene is loaded)
- **Characters** - Character voice assignments for the current scene (only visible when a scene is loaded)

### API Status

The toolbar shows the status of all TTS APIs:

- **Green** - API is enabled and ready
- **Orange** - API is enabled but not configured
- **Red** - API has configuration issues
- **Gray** - API is disabled



## Managing Voices

### Global Voice Library

The global voice library contains voices that are available across all scenes. These include:

- Default voices provided by each TTS provider
- Custom voices you've added

#### Adding New Voices

To add a new voice:

1. Click the "+ New" button
2. Select the TTS provider
3. Configure the voice parameters:
    - **Label** - Display name for the voice
    - **Provider ID** - Provider-specific identifier
    - **Tags** - Free-form descriptive tags you define (gender, age, style, etc.)
    - **Parameters** - Provider-specific settings

Check the provider-specific documentation for more information on how to configure the voice.

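To make the fields concrete, a hypothetical shape for a voice library entry; the field names mirror the UI labels above, not Talemate's internals:

```python
# Hypothetical voice library entry for illustration.
from dataclasses import dataclass, field

@dataclass
class Voice:
    label: str               # display name
    provider: str            # e.g. "kokoro", "f5tts", "elevenlabs"
    provider_id: str         # provider-specific identifier or sample path
    tags: set[str] = field(default_factory=set)
    parameters: dict = field(default_factory=dict)  # provider-specific settings

marcus = Voice(
    label="Marcus - Deep Male",
    provider="chatterbox",
    provider_id="voices/marcus.wav",
    tags={"male", "deep", "villain"},
    parameters={"speed": 1.0, "exaggeration": 0.5},
)
```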
#### Voice Types by Provider

**F5-TTS & Chatterbox:**

- Upload .wav reference files for voice cloning
- Specify reference text for better quality
- Adjust speed and other parameters

**Kokoro:**

- Select from predefined voice models
- Create mixed voices by combining multiple models
- Adjust voice mixing weights

**ElevenLabs:**

- Select from available ElevenLabs voices
- Configure voice settings and stability
- Use custom cloned voices from your ElevenLabs account

**OpenAI:**

- Choose from available OpenAI voice models
- Configure the model (GPT-4o Mini TTS, TTS-1, TTS-1-HD)

**Google Gemini-TTS:**

- Select from Google's voice models
- Configure language and gender settings

### Scene Voice Library

Scene-specific voices are only available within the current scene. This is useful for:

- Scene-specific characters
- Temporary voice experiments
- Custom voices for specific scenarios

Scene voices are saved with the scene and will be available when the scene is loaded.

## Character Voice Assignment

### Automatic Assignment

The Director agent can automatically assign voices to new characters based on:

- Character tags and attributes
- Voice tags matching character personality
- Available voices in the voice library

This feature can be enabled in the Director agent settings.

### Manual Assignment



To manually assign a voice to a character:

1. Go to the "Characters" tab in the Voice Library
2. Find the character in the list
3. Click the voice dropdown for that character
4. Select a voice from the available options
5. The assignment is saved automatically

### Character Voice Status

The character list shows:

- **Character name**
- **Currently assigned voice** (if any)
- **Voice status** - whether the voice's API is available
- **Quick assignment controls**

## Voice Tags and Organization

### Tagging System

Voices can be tagged with any descriptive attributes you choose. Tags are completely free-form and user-defined. Common examples include:

- **Gender**: male, female, neutral
- **Age**: young, mature, elderly
- **Style**: calm, energetic, dramatic, mysterious
- **Quality**: deep, high, raspy, smooth
- **Character types**: narrator, villain, hero, comic relief
- **Custom tags**: You can create any tags that help you organize your voices

### Filtering and Search

Use the search bar to filter voices by:

- Voice label/name
- Provider
- Tags
- Character assignments

This makes it easy to find the right voice for specific characters or situations.

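A sketch of what such a filter boils down to; the data shape here is made up for the example:

```python
# Illustrative voice search: match the query against label, provider and tags.
def filter_voices(voices: list[tuple[str, str, set[str]]], query: str):
    q = query.lower()
    return [
        v for v in voices
        if q in v[0].lower() or q in v[1].lower()
        or any(q in tag.lower() for tag in v[2])
    ]

voices = [
    ("Emma - Calm Female", "f5tts", {"female", "calm"}),
    ("Marcus - Deep Male", "chatterbox", {"male", "deep"}),
]
print(filter_voices(voices, "calm"))  # [('Emma - Calm Female', 'f5tts', ...)]
```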
docs/user-guide/clients/reasoning.md (new file, 82 lines)
@@ -0,0 +1,82 @@

# Reasoning Model Support

Talemate supports reasoning models that can perform step-by-step thinking before generating their final response. This feature allows models to work through complex problems internally before providing an answer.

## Enabling Reasoning Support

To enable reasoning support for a client:

1. Open the **Clients** dialog from the main toolbar
2. Select the client you want to configure
3. Navigate to the **Reasoning** tab in the client configuration



4. Check the **Enable Reasoning** checkbox

## Configuring Reasoning Tokens

Once reasoning is enabled, you can configure the **Reasoning Tokens** setting using the slider:



### Recommended Token Amounts

**For local reasoning models:** Use a high token allocation (recommended: 4096 tokens) to give the model sufficient space for complex reasoning.

**For remote APIs:** Start with lower amounts (512-1024 tokens) and adjust based on your needs and token costs.

### Token Allocation Behavior

The behavior of the reasoning tokens setting depends on your API provider:

**For APIs that support direct reasoning token specification:**

- The specified tokens will be allocated specifically for reasoning
- The model will use these tokens for internal thinking before generating the response

**For APIs that do NOT support reasoning token specification:**

- The tokens are added as an extra allowance on top of the response token limit for ALL requests
- This may lead to more verbose responses than usual, since Talemate normally uses response token limits to control verbosity

!!! warning "Increased Verbosity"
    For providers without direct reasoning token support, enabling reasoning may result in more verbose responses since the extra tokens are added to all requests.

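The two behaviors can be summarized in a few lines; this is a sketch with hypothetical names, not Talemate's actual client code:

```python
# Illustrative request construction for the two provider behaviors above.
def build_request(prompt: str, response_tokens: int, reasoning_tokens: int,
                  supports_reasoning_budget: bool) -> dict:
    if supports_reasoning_budget:
        # budget passed separately; the response limit is untouched
        return {"prompt": prompt, "max_tokens": response_tokens,
                "reasoning_budget": reasoning_tokens}
    # otherwise the budget is folded into the response limit for ALL
    # requests, which is why responses can become more verbose
    return {"prompt": prompt, "max_tokens": response_tokens + reasoning_tokens}
```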
## Response Pattern Configuration

When reasoning is enabled, you may need to configure a **Pattern to strip from the response** to remove the thinking process from the final output.

### Default Patterns

Talemate provides quick-access buttons for common reasoning patterns:

- **Default** - Uses the built-in pattern: `.*?</think>`
- **`.*?◁/think▷`** - For models using arrow-style thinking delimiters
- **`.*?</think>`** - For models using XML-style think tags

### Custom Patterns

You can also specify a custom regular expression pattern that matches your model's reasoning format. This pattern will be used to strip the thinking tokens from the response before displaying it to the user.

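Stripping with such a pattern is a single regular-expression substitution. A minimal sketch, assuming the thinking block precedes the answer:

```python
# Strip a reasoning block using a pattern like the built-in default.
import re

PATTERN = r".*?</think>"

def strip_reasoning(response: str, pattern: str = PATTERN) -> str:
    return re.sub(pattern, "", response, count=1, flags=re.DOTALL).strip()

raw = "<think>The user wants a greeting...</think>Hello there!"
print(strip_reasoning(raw))  # Hello there!
```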
## Model Compatibility

Not all models support reasoning. This feature works best with:

- Models specifically trained for chain-of-thought reasoning
- Models that support structured thinking patterns
- APIs that provide reasoning token specification

## Important Notes

- **Coercion Disabled**: When reasoning is enabled, LLM coercion (pre-filling responses) is automatically disabled, since reasoning models need to generate their complete thought process
- **Response Time**: Reasoning models may take longer to respond as they work through their thinking process

## Troubleshooting

### Pattern Not Working

If the reasoning pattern isn't properly stripping the thinking process:

1. Check your model's actual reasoning output format
2. Adjust the regular expression pattern to match your model's specific format
3. Test with the default pattern first to see if it works

@@ -35,4 +35,19 @@ A unique name for the client that makes sense to you.

Which model to use. Currently defaults to `gpt-4o`.

!!! note "Talemate lags behind OpenAI"
    When OpenAI adds a new model, it currently requires a Talemate update to add it to the list of available models. We are working on making this more dynamic.

+##### Reasoning models (o1, o3, gpt-5)

+!!! important "Enable reasoning and allocate tokens"
+    The `o1`, `o3`, and `gpt-5` families are reasoning models. They always perform internal thinking before producing the final answer. To use them effectively in Talemate:

+    - Enable the **Reasoning** option in the client configuration.
+    - Set **Reasoning Tokens** to a sufficiently high value to make room for the model's thinking process.

+    A good starting range is 512–1024 tokens. Increase it if your tasks are complex. Without enabling reasoning and allocating tokens, these models may return minimal or empty visible content because the token budget is consumed by internal reasoning.

+    See the detailed guide: [Reasoning Model Support](/talemate/user-guide/clients/reasoning/).

+!!! tip "Getting empty responses?"
+    If these models return empty or very short answers, it usually means the reasoning budget was exhausted. Increase **Reasoning Tokens** and try again.
