# Reasoning Model Support

Talemate supports reasoning models, which work through a problem step by step before generating their final response.

## Enabling Reasoning Support

To enable reasoning support for a client:

1. Open the **Clients** dialog from the main toolbar
2. Select the client you want to configure
3. Navigate to the **Reasoning** tab in the client configuration

    ![Client reasoning configuration](/talemate/img/0.32.0/client-reasoning-2.png)

4. Check the **Enable Reasoning** checkbox

## Configuring Reasoning Tokens

Once reasoning is enabled, you can configure the **Reasoning Tokens** setting using the slider:

![Reasoning tokens configuration](/talemate/img/0.32.0/client-reasoning.png)

### Recommended Token Amounts

**For local reasoning models:** Use a high token allocation (recommended: 4096 tokens) to give the model sufficient space for complex reasoning.

**For remote APIs:** Start with lower amounts (512-1024 tokens) and adjust based on your needs and token costs.

### Token Allocation Behavior

The behavior of the reasoning tokens setting depends on your API provider:

**For APIs that support direct reasoning token specification:**

- The specified tokens will be allocated specifically for reasoning
- The model will use these tokens for internal thinking before generating the response

**For APIs that do NOT support reasoning token specification:**

- The tokens are added as extra allowance to the response token limit for ALL requests
- This may lead to more verbose responses than usual, since Talemate normally relies on the response token limit to control verbosity

!!! warning "Increased Verbosity"
    For providers without direct reasoning token support, enabling reasoning may result in more verbose responses since the extra tokens are added to all requests.
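The fallback behavior described above comes down to simple addition. The Python sketch below models both cases; `RequestLimits` and `apply_reasoning_tokens` are hypothetical names used for illustration, not Talemate's internal API, and how a provider actually combines a dedicated reasoning budget with its output limit varies from provider to provider.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestLimits:
    """Hypothetical container for the token limits sent with one request."""
    max_tokens: int                  # limit on generated tokens
    reasoning_budget: Optional[int]  # dedicated thinking budget, if the API accepts one


def apply_reasoning_tokens(
    response_limit: int, reasoning_tokens: int, supports_reasoning_budget: bool
) -> RequestLimits:
    """Sketch of how the Reasoning Tokens setting could be applied to a request."""
    if supports_reasoning_budget:
        # The API accepts a separate reasoning budget, so pass it through on its own.
        return RequestLimits(max_tokens=response_limit, reasoning_budget=reasoning_tokens)
    # No separate budget: add the reasoning tokens to the response limit.
    # This happens for every request, which is why responses can get longer.
    return RequestLimits(
        max_tokens=response_limit + reasoning_tokens, reasoning_budget=None
    )


print(apply_reasoning_tokens(300, 1024, supports_reasoning_budget=False))
# RequestLimits(max_tokens=1324, reasoning_budget=None)
```

With a 300-token response limit and 1024 reasoning tokens, a provider without budget support receives a 1324-token limit on every request; that extra headroom is what the warning above refers to.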

## Response Pattern Configuration

When reasoning is enabled, you may need to configure a **Pattern to strip from the response** to remove the thinking process from the final output.

### Default Patterns

Talemate provides quick-access buttons for common reasoning patterns:

- **Default** - Uses the built-in pattern: `.*?</think>`
- **`.*?◁/think▷`** - For models that use triangle-bracket delimiters (`◁think▷ ... ◁/think▷`), such as Kimi
- **`.*?</think>`** - For models that use XML-style `<think> ... </think>` tags
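To see what a pattern like `.*?</think>` does, here is a minimal Python sketch. It assumes the pattern is applied with `re.DOTALL` (so `.` also matches newlines) and that only the first match is removed; Talemate's actual matching flags are an assumption here, and `strip_reasoning` is a hypothetical helper.

```python
import re

# The built-in default pattern listed above.
DEFAULT_PATTERN = r".*?</think>"


def strip_reasoning(response: str, pattern: str = DEFAULT_PATTERN) -> str:
    """Remove the thinking portion of a response (illustrative sketch)."""
    # re.DOTALL lets "." match newlines, since thinking usually spans several lines.
    # The lazy ".*?" stops at the FIRST closing tag, and count=1 removes only that match.
    return re.sub(pattern, "", response, count=1, flags=re.DOTALL).strip()


raw = "<think>\nThe guard looks nervous.\nKeep it short.\n</think>\nThe guard steps aside."
print(strip_reasoning(raw))  # The guard steps aside.
print(strip_reasoning("◁think▷Plan the scene.◁/think▷Night falls.", r".*?◁/think▷"))  # Night falls.
```

The patterns only anchor on the closing delimiter: everything from the start of the response up to and including it is discarded, so the opening tag does not need to appear in the pattern.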

### Custom Patterns

You can also specify a custom regular expression pattern that matches your model's reasoning format. This pattern will be used to strip the thinking tokens from the response before displaying it to the user.
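When writing a custom pattern, keep in mind that characters such as `[`, `]`, `(`, `)` and `|` have special meaning in regular expressions and must be escaped. The sketch below assumes a hypothetical model that wraps its reasoning in `[THINK] ... [/THINK]` and uses Python's `re.escape` to build the pattern from the closing delimiter; `closing_delimiter_pattern` is a hypothetical helper, and `re.DOTALL` matching is again an assumption.

```python
import re


def closing_delimiter_pattern(closing_delimiter: str) -> str:
    """Build a 'strip everything up to the closing delimiter' pattern."""
    # re.escape backslash-escapes regex metacharacters such as "[" and "]".
    return ".*?" + re.escape(closing_delimiter)


pattern = closing_delimiter_pattern("[/THINK]")
print(pattern)  # .*?\[/THINK\]

raw = "[THINK]Plan the reply.[/THINK]Hello there."
print(re.sub(pattern, "", raw, count=1, flags=re.DOTALL))  # Hello there.

# Unescaped, "[/THINK]" is a character class matching ONE of "/", "T", "H", "I", "N", "K",
# so the match ends at the first "T" and the wrong text is stripped.
print(re.sub(r".*?[/THINK]", "", raw, count=1, flags=re.DOTALL))
# HINK]Plan the reply.[/THINK]Hello there.
```

The resulting string (`.*?\[/THINK\]` in this example) is what you would enter as the custom pattern.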

## Model Compatibility

Not all models support reasoning. This feature works best with:

- Models specifically trained for chain-of-thought reasoning
- Models that support structured thinking patterns
- APIs that provide reasoning token specification

## Important Notes

- **Coercion Disabled**: When reasoning is enabled, LLM coercion (pre-filling the start of the response) is automatically disabled, since reasoning models need to generate their complete thought process
- **Response Time**: Reasoning models may take longer to respond as they work through their thinking process

## Troubleshooting

### Pattern Not Working
If the reasoning pattern isn't properly stripping the thinking process:

1. Check your model's actual reasoning output format
2. Adjust the regular expression pattern to match your model's specific format
3. Test with the default pattern first to see if it works
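For the steps above, it can help to test candidate patterns against a raw response copied from your model before entering them in Talemate. The sketch below is a hypothetical helper (`diagnose_pattern`), again assuming `re.DOTALL` matching; it distinguishes an invalid regex, a pattern that finds nothing (so the thinking would stay in the output), and a working match.

```python
import re


def diagnose_pattern(sample: str, pattern: str) -> str:
    """Report how a strip pattern behaves on a raw model response."""
    try:
        compiled = re.compile(pattern, re.DOTALL)
    except re.error as exc:
        # Typical causes: unbalanced parentheses or brackets.
        return f"invalid regex: {exc}"
    match = compiled.search(sample)
    if match is None:
        # Nothing matched, so the thinking text would be shown as-is.
        return "no match: reasoning would not be stripped"
    remaining = (sample[: match.start()] + sample[match.end():]).strip()
    return f"ok, remaining text: {remaining!r}"


sample = "<think>Who is speaking here?</think>\nHi."
for candidate in [r".*?</think>", r".*?◁/think▷", r".*?(</think>"]:
    print(candidate, "->", diagnose_pattern(sample, candidate))
```

A "no match" on a response that clearly contains thinking usually means the closing delimiter in the pattern differs from the one the model emits, or that the response was cut off before the model closed its thinking block.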