Compare commits

..

2 Commits

Author SHA1 Message Date
veguAI
c179fcd3eb 0.34.0 (#239)
Visual Agent Refactor + Visual Library
Character Card Import Refactor
Bug fixes and other improvements
2025-12-06 11:19:48 +02:00
veguAI
89d16ae513 0.33.0 (#229)
* linting

* Add cleanup function for recent scenes in config to remove non-existent paths

* remove legacy world state manager buttons

* move world state scene tools into sub component

* linting

* move module properties to navigation drawer

* Update icons in NodeEditorLibrary and NodeEditorModuleProperties for improved UI clarity

* prompt tweaks

* director chat prompt simplifications

* more prompt fixes

* Enhance type hints for duration conversion functions in time.py

* narrate time action now has access to response length instructions

* Add IsoDateDuration node for ISO 8601 interval string construction

* Update advance_time method to include return type annotation and return message

* Add AdvanceTime node to world state for time advancement with duration and narration instructions

* linting

* Add agent state exclusions to changelog with a TODO for module migration

* Add message emission for actor, narrator, and scene analysis guidance in respective components. Enhance AgentMessages and SceneTools for better message handling and visual feedback.

* Remove agent messages from state when opening agent message view in SceneTools component.

* linting

* openrouter fetch models on key set

* Add input history functionality to message input in TalemateApp component. Implement keyboard shortcuts for navigating history (Ctrl+Up/Down) and limit history to the last 10 messages. Update message sending logic to store messages in history.

* Update message input hint in TalemateApp component to include keyboard shortcuts for navigating input history (Ctrl+Up/Down).

* node updates

* unified data extraction function

* prompt tweaks

* Add gamestate context support in BuildPrompt and corresponding template. Introduced new property for gamestate context and updated rendering logic to include gamestate information in prompts.

* Refactor Prompt class by removing LoopedPrompt and cleaning up related methods. Update data response parsing to streamline functionality and improve clarity. Adjust imports accordingly.

* Add 'data_multiple' property to GenerateResponse class to allow multiple data structures in responses. Update output socket type for 'data_obj' to support both dict and list formats.

* Add DictUpdate node

* Add UnpackGameState node to retrieve and unpack game state variables

* gamestate nodes

* linting

* Enhance scene view toggle functionality to support shift-click behavior for closing all drawers when hiding the scene view.

* immutable scenes should reset context db on load

* linting

* node updates

* prompt tweaks

* Add context type output and filtering for creative context ID meta entries in PathToContextID and ContextIDMetaEntries nodes

* Add string replacement functionality and Jinja2 formatting support in nodes. Introduced 'old' and 'new' properties for substring replacement in the Replace node, and added a new Jinja2Format node for template rendering using jinja2.

* Add additional outputs for context validation in ValidateContextIDItem node, including context type, context value, and name.

* prompt tweaks

* node adjustments

* linting

* Add data_expected attribute to Focal and Prompt classes for enhanced response handling

* node updates

* node updates

* node updates

* prompt tweaks

* director summary returns appropriately when no action is taken

* Enhance action handling in DirectorChatMixin by skipping actions when a question is present in the parsed response, ensuring better response accuracy.

* Enhance ConfirmActionPrompt component by adding anchorTop prop for dynamic alignment and adjusting icon size and color for improved UI consistency.

* anchor clear chat confirm to top

* responsive layout fixes in template editors

* linting

* relock

* Add scene progression guidance to chat-common-tasks template

* Refactor push_history method to be asynchronous across multiple agents and scenes, ensuring consistent handling of message history updates.

* Update chat instructions to clarify user intent considerations and enhance decisiveness in responses. Added guidance on distinguishing between scene progression and background changes, and refined analysis requirements for user interactions.

* Enhance DirectorConsoleChatsToolbar by adding a usage cheat sheet tooltip for user guidance and refining the Clear Chat button's UI for better accessibility.

* store character data at unified point

* fix button

* fix world editor auto sync

* Shared context 2 (#19)

Shared context

* Refactor NodeEditorLibrary to improve search functionality and debounce input handling. Updated v-text-field model and added a watcher for search input to enhance performance.

* Refactor NodeEditor and TalemateApp components to enhance UI interactions. Removed the exit creative mode button from NodeEditor and updated tooltips for clarity. Adjusted app bar navigation icons for better accessibility and added functionality to switch between node editor and creative mode.

* comment

* Character.update deserialize voice value correctly

* Enhance SharedContext.update_to_scene method to properly add or update character data in the scene based on existence checks. This improves the synchronization of character states between shared context and scene.

* shared context static history support
fix context memory db imports to always import

* Update WorldStateManagerSceneSharedContext.vue to clarify sharing of character, world entries, and history across connected scenes.

* linting

* Enhance chat modes by adding 'nospoilers' option to DirectorChat and related payloads. Update chat instructions to reflect new mode behavior and improve UI to support mode-specific icons and colors in the DirectorConsoleChatsToolbar.

* Comment out 'repetition_penalty_range' in TabbyAPIClient to prevent unexpected "<unk><unk> .." responses. Further investigation needed.

* linting

* Add active_characters and intro_instructions to Inheritance model; implement intro generation in load_scene_from_data. Update WorldStateManagerSceneSharedContext.vue to enhance new scene creation dialog with character selection and premise instructions.

* rename inheritance to scene initialization

* linting

* Update WorldStateManagerSceneSharedContext.vue to conditionally display alert based on scene saving status and new scene creation state.

* Refine messages for shared context checkboxes in WorldStateManagerCharacter and WorldStateManagerWorldEntries components for clarity.

* Add scene title generation to load process and update contextual generation template. Introduced a new method in AssistantMixin for generating scene titles, ensuring titles are concise and free of special characters. Updated load_scene_from_data to assign generated titles to scenes.

* linting

* Refactor GameState component to integrate Codemirror for JSON editing, replacing the previous treeview structure. Implement validation for JSON input and enhance error handling. Remove unused methods and streamline state management.

* Add lastLoadedJSON property to GameState component for change detection. Update validation logic to prevent unnecessary updates when game state has not changed.

* Remove status emission for gameplay switch in CmdSetEnvironmentToScene class.

* allow individual sharing of attributes and details

* linting

* Remove redundant question handling logic in DirectorChatMixin to streamline action selection process.

* Update EXTERNAL_DESCRIPTION in TabbyAPI client to include notes on EXL3 model sensitivity to inference parameters. Adjust handling of 'repetition_penalty_range' in parameter list for clarity.

* director chat support remove message and regenerate message

* Refactor ConfirmActionInline component to improve button rendering logic. Introduced 'size' prop for button customization and added 'comfortable' density option. Simplified icon handling with computed property for better clarity.

* linting

* node updates

* Add appBusy prop to DirectorConsoleChats and DirectorConsoleChatsToolbar components to manage button states during busy operations.

* Refactor DirectorChatMixin to utilize standalone utility functions for parsing response sections and extracting action blocks. This improves code clarity and maintainability. Added tests for new utility functions in test_utils_prompt.py to ensure correct functionality.

* Update clear chat button logic to consider appBusy state in DirectorConsoleChatsToolbar component, enhancing user experience during busy operations.

* linting

* Remove plan.md

* Add chat template identifier support and error handling in ModelPrompt class

- Implemented logic to check for 'chat_template.jinja2' in Hugging Face repository.
- Added new template identifiers: GraniteIdentifier and GLMIdentifier.
- Enhanced error handling to avoid logging 404 errors for missing templates.
- Introduced Granite.jinja2 template file for prompt structure.

* node fixes

* remove debug msg

* Enhance error handling in DynamicInstruction class by enforcing header requirement and ensuring content defaults to an empty string if not provided.

* reset scene message visibility on scene load

* prompt tweaks

* Enhance data extraction in Focal class by adding a fallback mechanism. Implemented additional error handling to attempt data extraction from a fenced block if the initial extraction fails, improving robustness in handling responses.

* linting

* node fixes

* Add relative_to_root function for path resolution and update node export logic

- Introduced a new function `relative_to_root` in path.py to resolve paths relative to the TALEMATE_ROOT.
- Updated the `export_node_definitions` function in registry.py to use `relative_to_root` for module path resolution.
- Added a check to skip non-selectable node definitions in litegraphUtils.js during registration.

* show icons

* Improve error handling in export_node_definitions by adding a try-except block for module path resolution. Log a warning if the relative path conversion fails.

* typo

* Refactor base_attributes type in Character model to a more generic dict type for improved flexibility

* relock

* ensure character gets added to character_data

* prompt tweaks

* linting

* properly activate characters

* activation now needs to happen explicitly; deactivated is the default

* missing arg

* avoid changed size error

* Refactor character removal logic in shared context to prevent deletion; characters are now only marked as non-shared.

* Add update_from_scene method calls in SharedContextMixin for scene synchronization

* Add ensure_changelogs_for_all_scenes function to manage changelog files for all scenes; integrate it into the server run process.

* Enhance backup restore functionality by adding base and latest snapshot options; improve UI with clearer labels and alerts for restore actions.

* Update _apply_delta function to enhance delta application handling by adding parameters for error logging and force application of changes on non-existent paths.

* Skip processing of changelog files in _list_files_and_directories function to prevent unnecessary inclusion in file listings.

* Update IntroRecentScenes.vue to use optional chaining for selectedScene properties and enhance backup timestamp display with revision info.

* linting

* Refactor source entry attribute access in collect_source_entries function to use getattr for optional attributes, improving robustness.

* Implement logic to always show scene view in scene mode within TalemateApp.vue, enhancing user experience during scene interactions.

* prompt tweaks

* prompt tweaks

* Update TalemateApp.vue to set the active tab to 'main' when switching to the node editor, improving navigation consistency.

* Add active frontend websocket handler management in websocket_endpoint

* agent websocket handler node support

* Refactor init_nodes method in DirectorAgent to call superclass method and rename chat initialization method in DirectorChatMixin for clarity.

* Add characters output to ContextHistory node to track active participants in the scene

* Add Agent Websocket Handler option to Node Editor Library with corresponding icons and labels

* Add check for node selectability in NodeEditorNodeSearch component to filter search results accordingly.

* Add SummarizeWebsocketHandler to handle summarize actions and integrate it into SummarizeAgent

* nodes

* Add data property to QueueResponse class for websocket communication and update run method to include action and data in output values.

* Update manual context handling in WorldStateManager to include shared property from existing context

* Enhance GetWorldEntry node to include 'shared' property in output values from world entry context

* Update scene loading to allow setting scene ID from data and include ID in scene serialization

* Update icon for AgentWebsocketHandler in NodeEditorLibrary component to mdi-web-box

* Refactor WorldStateManager components to enhance history management and sharing capabilities. Added summarized history titles, improved UI for sharing static history, and integrated scene summarization functionality. Removed deprecated methods related to shared context settings.

* linting

* Change log level from warning to debug for migrate_narrator_source_to_meta error handling in NarratorMessage class.

* Update GLM-no-reasoning template to include <think></think> tag before coercion message for improved prompt structure.

* allow prompt templates to specify reasoning pattern

* Add Seed.jinja2 template for LLM prompts with reasoning patterns and user interaction handling

* Enhance NarratorAgent to support dynamic response length configuration. Updated max generation length from 192 to 256 tokens and introduced a new method to calculate response length. Modified narration methods to accept and utilize response length parameter. Added response length property in GenerateNarrationBase class and updated templates to include response length handling.

* Update response length calculation in RevisionMixin to include token count for improved text processing.

* Refactor response identifier in RevisionMixin to dynamically use calculated response length for improved prompt handling.

* linting

* allow contextual generation of static history entries

* Add is_static property to HistoryEntry for static history entry identification

* Add "static history" option to ContextualGenerate node for enhanced contextual generation capabilities.

* Add CreateStaticArchiveEntry and RemoveStaticArchiveEntry nodes for managing static history entries. Implement input/output properties and error handling for entry creation and deletion.

* nodes updated

* linting

* Add assets field to SceneInitialization model and update load_scene_from_data function to handle scene assets. Update WorldStateManagerSceneSharedContext.vue to include assets in scene initialization parameters.

* Refactor CoverImage component to enhance drag-and-drop functionality and improve styling for empty portrait state.

* Add intent_state to SceneInitialization model and update load_scene_from_data function to handle intent state. Introduce story_intent property in Scene class and reset method in SceneIntent class. Update WorldStateManagerSceneSharedContext.vue to include intent state in scene initialization parameters.

* Refactor WorldStateManagerSceneSharedContext.vue to improve cancel functionality by introducing a dedicated cancelCreate method and removing the direct dialog toggle from the Cancel button. This enhances code clarity and maintainability.

* Update SharedContext to use await for set_shared method, ensuring proper asynchronous handling when modifying character sharing status.

* Add MAX_CONTENT_WIDTH constant and update components to use it for consistent max width styling

* fix issue with data structure parsing

* linting

* fix tests

* nodes

* fix update_introduction

* Add building blocks template for story configuration and scene management

* Refactor toggleNavigation method to accept an 'open' parameter for direct control over drawer visibility in TalemateApp.vue

* Update usageCheatSheet text in DirectorConsoleChatsToolbar.vue for clarity and add pre-wrap styling to tooltip

* Add cover image and writing style sections to story and character templates; update chat common tasks with new scene restrictions and user guide reference.

* linting

* relock

* Add EmitWorldEditorSync node to handle world editor synchronization; update WorldStateManager to refresh active tab on sync action.

* Update Anthropic client with new models and adjust default settings; introduce limited parameter models for specific configurations.

* director action module updates

* direct context update fn

* director action updates

* Update usageCheatSheet in DirectorConsoleChatsToolbar.vue to include recommendation for 100B+ models.

* Remove debug diagnostics from DirectorConsoleChats.vue to clean up console output.

* Update card styles in IntroRecentScenes.vue for improved visual consistency; change card color to grey-darken-3 and adjust text classes for titles and subtitles.

* Update EmitWorldEditorSync node to include websocket passthrough in sync action for improved event handling.

* Increase maximum changelog file size limit from 500KB to 1MB to accommodate larger change logs.

* linting

* director action module updates

* 0.33 added

* Add Nexus agent persona to talemate template and initialize phrases array

* Add support for project-specific grouping in NodeEditorLibrary for templates/modules, enhancing organization of node groups.

* docs

* Enhance NodeEditorLibrary by adding primary color to tree component for improved visibility and user experience.

* docs

* Enhance NewSceneSetupModal to include subtitles for writing styles and director personas, improving context and usability.

* Update agent persona description in WorldStateManagerTemplates to specify current support for director only, enhancing clarity for users.

* Refine agent persona description in WorldStateManagerTemplates to clarify assignment per agent in Scene Settings, maintaining focus on current director-only support.

* fix crash when attempting to delete some clients

* Add TODO comments in finalize_llama3 and finalize_YI methods to indicate removable cruft

* Add lock_template feature to Client configuration and update related components for template management

* linting

* persist client template lock through model changes

* There is no longer a point to enforcing creative mode when there are no characters

* fix direct_narrator character argument

* Update CharacterContextItem to allow 'value' to accept dict type in addition to existing types

* docs

* Update lock_template field in Client model to allow None type in addition to bool

* Remove unused template_file field from Defaults model in Client configuration

* Refactor lock_template field in Client model and ClientModal component to ensure consistent boolean handling

* Add field validator for lock_template in Client model to ensure boolean value is returned

* fix issue where valid data processed in extract_data_with_ai_fallback was not returned

* Update default_player_character assignment in ConfigPlugin to use GamePlayerCharacter schema for improved data validation

* linting

* add haiku 4.5 model and make it the default

* opus 4.5 isn't a thing

* fix issue where fork / restore would restore duplicate messages

* improve autocomplete handling when prefill isn't available

* prompt tweaks

* linting

* gracefully handle removed attributes

* Refactor scene reference handling in delete_changelog_files to prevent incorrect deletions. Added a test to verify proper scene reference construction and ensure changelog files are deleted correctly.

* forked scenes reset memory id and are not immutable

* emit_status export rev

* Update RequestInput.vue to handle extra_params more robustly, ensuring defaults are set correctly for input.

* only allow forking on saved messages

* linting

* tweak defaults

* summarizer fires off of push_history.after

* docs

* world entry titles containing ':' will now load correctly

* linting

* docs

* removing a base attribute or detail also clears it from the shared list

* fix issue where cancelling some generations would cause errors

* increase font size

* formatting fixes

* unhandled errors at the loop level should not crash the entire scene

* separate message processing from main loop

* linting

* remove debug cruft

* enhance error logging in background processing to include traceback information

* linting

* nothing to determine if no model is sent

* fix some errors during kcpp client deletion

* improve configuration issue alert visibility

* restore input focus after autocomplete

* linting
2025-10-25 14:06:55 +03:00
348 changed files with 34597 additions and 5779 deletions

View File

@@ -19,6 +19,25 @@ jobs:
    steps:
      - uses: actions/checkout@v4
      - name: Remove unnecessary files to release disk space
        run: |
          sudo rm -rf \
            "$AGENT_TOOLSDIRECTORY" \
            /opt/ghc \
            /opt/google/chrome \
            /opt/microsoft/msedge \
            /opt/microsoft/powershell \
            /opt/pipx \
            /usr/lib/mono \
            /usr/local/julia* \
            /usr/local/lib/android \
            /usr/local/lib/node_modules \
            /usr/local/share/chromium \
            /usr/local/share/powershell \
            /usr/share/dotnet \
            /usr/share/swift
      - name: Log in to GHCR
        uses: docker/login-action@v3
        with:

View File

@@ -14,6 +14,25 @@ jobs:
    steps:
      - uses: actions/checkout@v4
      - name: Remove unnecessary files to release disk space
        run: |
          sudo rm -rf \
            "$AGENT_TOOLSDIRECTORY" \
            /opt/ghc \
            /opt/google/chrome \
            /opt/microsoft/msedge \
            /opt/microsoft/powershell \
            /opt/pipx \
            /usr/lib/mono \
            /usr/local/julia* \
            /usr/local/lib/android \
            /usr/local/lib/node_modules \
            /usr/local/share/chromium \
            /usr/local/share/powershell \
            /usr/share/dotnet \
            /usr/share/swift
      - name: Log in to GHCR
        uses: docker/login-action@v3
        with:

View File

@@ -45,6 +45,9 @@ WORKDIR /app
RUN apt-get update && apt-get install -y \
    bash \
    wget \
    tar \
    xz-utils \
    && rm -rf /var/lib/apt/lists/*
# Install uv in the final stage
@@ -53,6 +56,21 @@ RUN pip install uv
# Copy virtual environment from backend-build stage
COPY --from=backend-build /app/.venv /app/.venv
# Download and install FFmpeg 8.0 with shared libraries into .venv (matching Windows installer approach)
# Using BtbN FFmpeg builds which provide shared libraries - verified to work
# Note: We tried using jrottenberg/ffmpeg:8.0-ubuntu image but copying libraries from it didn't work properly,
# so we use the direct download approach which is more reliable and matches the Windows installer
RUN cd /tmp && \
    wget -q https://github.com/BtbN/FFmpeg-Builds/releases/download/latest/ffmpeg-master-latest-linux64-gpl-shared.tar.xz -O ffmpeg.tar.xz && \
    tar -xf ffmpeg.tar.xz && \
    cp -a ffmpeg-master-latest-linux64-gpl-shared/bin/* /app/.venv/bin/ && \
    cp -a ffmpeg-master-latest-linux64-gpl-shared/lib/* /app/.venv/lib/ && \
    rm -rf ffmpeg-master-latest-linux64-gpl-shared ffmpeg.tar.xz && \
    LD_LIBRARY_PATH=/app/.venv/lib /app/.venv/bin/ffmpeg -version | head -n 1
# Set LD_LIBRARY_PATH so torchcodec can find ffmpeg libraries at runtime
ENV LD_LIBRARY_PATH=/app/.venv/lib:${LD_LIBRARY_PATH}
# Copy Python source code
COPY --from=backend-build /app/src /app/src

View File

@@ -1,134 +0,0 @@
"""
An attempt to write a client against the runpod serverless vllm worker.
This is close to functional, but since runpod serverless gpu availability is currently terrible, i have
been unable to properly test it.
Putting it here for now since i think it makes a decent example of how to write a client against a new service.
"""
import pydantic
import structlog
import runpod
import asyncio
import aiohttp
from talemate.client.base import ClientBase, ExtraField
from talemate.client.registry import register
from talemate.emit import emit
from talemate.config import Client as BaseClientConfig
log = structlog.get_logger("talemate.client.runpod_vllm")
class Defaults(pydantic.BaseModel):
max_token_length: int = 4096
model: str = ""
runpod_id: str = ""
class ClientConfig(BaseClientConfig):
runpod_id: str = ""
@register()
class RunPodVLLMClient(ClientBase):
client_type = "runpod_vllm"
conversation_retries = 5
config_cls = ClientConfig
class Meta(ClientBase.Meta):
title: str = "Runpod VLLM"
name_prefix: str = "Runpod VLLM"
enable_api_auth: bool = True
manual_model: bool = True
defaults: Defaults = Defaults()
extra_fields: dict[str, ExtraField] = {
"runpod_id": ExtraField(
name="runpod_id",
type="text",
label="Runpod ID",
required=True,
description="The Runpod ID to connect to.",
)
}
def __init__(self, model=None, runpod_id=None, **kwargs):
self.model_name = model
self.runpod_id = runpod_id
super().__init__(**kwargs)
@property
def experimental(self):
return False
def set_client(self, **kwargs):
log.debug("set_client", kwargs=kwargs, runpod_id=self.runpod_id)
self.runpod_id = kwargs.get("runpod_id", self.runpod_id)
def tune_prompt_parameters(self, parameters: dict, kind: str):
super().tune_prompt_parameters(parameters, kind)
keys = list(parameters.keys())
valid_keys = ["temperature", "top_p", "max_tokens"]
for key in keys:
if key not in valid_keys:
del parameters[key]
async def get_model_name(self):
return self.model_name
async def generate(self, prompt: str, parameters: dict, kind: str):
"""
Generates text from the given prompt and parameters.
"""
prompt = prompt.strip()
self.log.debug("generate", prompt=prompt[:128] + " ...", parameters=parameters)
try:
async with aiohttp.ClientSession() as session:
endpoint = runpod.AsyncioEndpoint(self.runpod_id, session)
run_request = await endpoint.run(
{
"input": {
"prompt": prompt,
}
# "parameters": parameters
}
)
while (await run_request.status()) not in [
"COMPLETED",
"FAILED",
"CANCELLED",
]:
status = await run_request.status()
log.debug("generate", status=status)
await asyncio.sleep(0.1)
status = await run_request.status()
log.debug("generate", status=status)
response = await run_request.output()
log.debug("generate", response=response)
return response["choices"][0]["tokens"][0]
except Exception as e:
self.log.error("generate error", e=e)
emit(
"status", message="Error during generation (check logs)", status="error"
)
return ""
def reconfigure(self, **kwargs):
if kwargs.get("model"):
self.model_name = kwargs["model"]
if "runpod_id" in kwargs:
self.api_auth = kwargs["runpod_id"]
self.set_client(**kwargs)

89 binary image files added (content not shown).

View File

@@ -1,42 +0,0 @@
# AUTOMATIC1111
!!! info
This requires you to setup a local instance of the AUTOMATIC1111 API. Follow the instructions from [their GitHub](https://github.com/AUTOMATIC1111/stable-diffusion-webui) to get it running.
Once you have it running, you will want to adjust the `webui-user.bat` in the AUTOMATIC1111 directory to include the following command arguments:
```bat
set COMMANDLINE_ARGS=--api --listen --port 7861
```
Then run the `webui-user.bat` to start the API.
Once your AUTOMATIC1111 API is running (check with your browser), you can set the Visualizer config to use the `AUTOMATIC1111` backend.
## Settings
![Visual agent automatic1111 settings](/talemate/img/0.27.0/automatic1111-settings.png)
##### API URL
The URL of the API, if following this example, should be `http://localhost:7861`.
##### Steps
The number of steps to use for image generation. More steps will result in higher quality images but will take longer to generate.
##### Sampling Method
Which sampling method to use for image generation.
##### Schedule Type
Which scheduler to use for image generation.
##### CFG Scale
CFG scale for image generation.
##### Model type
Differentiates between `SD1.5` and `SDXL` models. This dictates the generation resolution and noticeably affects quality, so make sure it matches the model type you are using.

View File

@@ -0,0 +1,80 @@
# AUTOMATIC1111
!!! warning "Deprecated Backend"
**AUTOMATIC1111 (A1111) is essentially dead at this point** - development has largely stopped and the project is no longer actively maintained. Support for AUTOMATIC1111 has only been carried forward in Talemate because it was easy to maintain compatibility.
**We strongly recommend using [SD.Next](sdnext.md) instead**, which is an actively maintained fork of AUTOMATIC1111 with improved performance, better features, and ongoing development. SD.Next maintains API compatibility with AUTOMATIC1111, so migration is straightforward.
The AUTOMATIC1111 backend provides basic text-to-image generation capabilities using the AUTOMATIC1111 Stable Diffusion WebUI API. This backend only supports text-to-image generation - it does not support image editing or image analysis.
![This screenshot displays the dark-themed "General" settings interface for an application named "Visualizer," featuring a sidebar menu on the left and configuration options on the right. Key settings include dropdowns where the Client is set to "Google" and the text-to-image backend is set to "AUTOMATIC1111," alongside an image generation timeout slider positioned at 301. Additional controls show a checked box for "Automatic Setup," an unchecked box for "Automatic Generation," and a "Fallback Prompt Type" set to "Keywords."](/talemate/img/0.34.0/visual-agent-a1111-1.png)
## Prerequisites
Before configuring the AUTOMATIC1111 backend, you need to have AUTOMATIC1111 installed and running:
1. Install and start AUTOMATIC1111 Stable Diffusion WebUI on your system
2. Ensure the API is enabled and accessible
3. Note the API URL (default is `http://localhost:7860`)
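If you want to verify that the API is actually enabled and reachable (steps 2 and 3 above) before pointing Talemate at it, a quick request against the instance is enough. The snippet below is purely illustrative (it assumes the default local URL) and uses the stock `/sdapi/v1/sd-models` endpoint; Talemate does not require it.
```python
# Illustrative reachability check for the AUTOMATIC1111 API (not part of Talemate).
import requests

API_URL = "http://localhost:7860"  # adjust if you changed the port

resp = requests.get(f"{API_URL}/sdapi/v1/sd-models", timeout=10)
resp.raise_for_status()
models = resp.json()
print(f"API reachable, {len(models)} model(s) available:")
for model in models:
    print(" -", model["model_name"])
```
If this check fails, make sure AUTOMATIC1111 was started with the `--api` flag.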
!!! note "Migration to SD.Next"
If you're setting up a new installation, please use [SD.Next](sdnext.md) instead. If you have an existing AUTOMATIC1111 installation, consider migrating to SD.Next for better performance and ongoing support.
## Configuration
In the Visualizer agent settings, select AUTOMATIC1111 as your backend for text-to-image generation.
### Text-to-Image Configuration
For text-to-image generation, configure the following settings:
- **API URL**: The URL where your AUTOMATIC1111 instance is running (e.g., `http://localhost:7860`)
- **Steps**: Number of sampling steps (default: 40, range: 5-150)
- **Sampling Method**: The sampling algorithm to use (e.g., "DPM++ 2M", "Euler a")
- **Schedule Type**: The sampling schedule to use (e.g., "Automatic", "Karras", "Uniform")
- **CFG Scale**: Classifier-free guidance scale (default: 7.0, range: 1-30)
- **Prompt Type**: Choose between "Keywords" or "Descriptive" prompt formatting
- **Resolutions**: Configure the pixel dimensions for Square, Portrait, and Landscape formats
![A screenshot of the Visualizer interface displaying configuration settings for Automatic1111 text-to-image generation. The panel features adjustable parameters such as the API URL, sampler steps, CFG scale, and sampling method. Additionally, it includes sections for selecting prompting types and setting default resolutions for square, portrait, and landscape image orientations.](/talemate/img/0.34.0/visual-agent-a1111-2.png)
!!! note "No Authentication"
AUTOMATIC1111 backend does not support authentication. If your AUTOMATIC1111 instance requires authentication, you'll need to either disable it or use SD.Next instead, which supports authentication.
!!! note "Model Selection"
AUTOMATIC1111 does not support model selection through the API. The backend will use whatever model is currently loaded in your AUTOMATIC1111 instance. You need to change models manually in the AUTOMATIC1111 WebUI interface.
## Usage
Once configured, the AUTOMATIC1111 backend will appear in the Visualizer agent status with a green indicator showing text-to-image capability is available.
![A dark mode interface element titled "Visualizer" accompanied by a green status dot. Below the title are two badges: a gray button labeled "Google" with a computer icon and a green button labeled "AUTOMATIC1111" with an image icon.](/talemate/img/0.34.0/visual-agent-a1111-3.png)
## Limitations
The AUTOMATIC1111 backend has several limitations compared to SD.Next:
- **No image editing**: Only supports text-to-image generation
- **No authentication**: Cannot connect to instances that require authentication
- **No model selection**: Uses whatever model is loaded in AUTOMATIC1111
- **No active development**: The AUTOMATIC1111 project is no longer actively maintained
## Sampler Settings
AUTOMATIC1111 provides control over the generation process:
- **Steps**: More steps generally produce higher quality images but take longer. Typical values range from 20-50 steps, with 40 being a good default.
- **Sampling Method**: Different samplers produce different results. Popular options include:
- **DPM++ 2M**: Fast and high quality (default)
- **Euler a**: Fast, good for quick iterations
- **DPM++ SDE**: Variant with different characteristics
- **Schedule Type**: Controls the noise schedule used during sampling. "Automatic" is typically the best choice.
- **CFG Scale**: Controls how closely the model follows your prompt. Lower values (1-7) allow more creative freedom, higher values (7-15) stick closer to the prompt.
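For orientation, these settings correspond to fields in the standard AUTOMATIC1111 `/sdapi/v1/txt2img` request, which is the API this backend talks to. The sketch below is illustrative only, with an assumed default local URL and example values; Talemate builds and sends the actual payload for you.
```python
# Illustrative raw txt2img request showing where the sampler settings end up (not Talemate code).
import base64
import requests

API_URL = "http://localhost:7860"  # assumption: local instance on the default port

payload = {
    "prompt": "a quiet harbor town at dusk, lanterns reflected on the water",
    "negative_prompt": "blurry, low quality",
    "steps": 40,                 # Steps
    "sampler_name": "DPM++ 2M",  # Sampling Method
    "scheduler": "Automatic",    # Schedule Type (recent A1111 builds; older ones fold it into the sampler name)
    "cfg_scale": 7.0,            # CFG Scale
    "width": 1024,
    "height": 1024,
}

resp = requests.post(f"{API_URL}/sdapi/v1/txt2img", json=payload, timeout=300)
resp.raise_for_status()
with open("output.png", "wb") as fh:
    fh.write(base64.b64decode(resp.json()["images"][0]))
```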
## Prompt Formatting
AUTOMATIC1111 uses **Keywords** prompt formatting by default. This means prompts are formatted as keyword lists optimized for Stable Diffusion models. You can switch to **Descriptive** formatting if you prefer natural language descriptions, though Keywords typically work better with SD models.
## Automatic Setup with KoboldCpp
If you're using KoboldCpp with AUTOMATIC1111 support, Talemate can automatically detect and configure the AUTOMATIC1111 backend when "Automatic Setup" is enabled in the Visualizer settings. This will automatically set the API URL to match your KoboldCpp instance URL.

View File

@@ -0,0 +1,166 @@
# ComfyUI
## Prepare ComfyUI
This document assumes you have installed ComfyUI (either the portable or the desktop version).
Copy the .bat file you use to start ComfyUI and add the `--port` parameter.
```
--port 8188
```
You can put any port you want, but this example will use 8188.
!!! note "If you are using a remote ComfyUI instance"
If you are using a remote ComfyUI instance, you may want to add the `--listen` parameter as well.
```
--listen 0.0.0.0
```
You will then also need to obtain the IP address of the computer running ComfyUI and use it in the Talemate configuration (instead of localhost).
Confirm ComfyUI is running by visiting `http://localhost:8188` or `http://<ip-address>:8188` in your browser before proceeding to Talemate.
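If you prefer to check from the command line, the ComfyUI API answers on the same port. The snippet below is only an illustrative check against the `/system_stats` endpoint, assuming the port chosen above; it is not required by Talemate.
```python
# Illustrative check that the ComfyUI API answers before configuring Talemate.
import requests

COMFYUI_URL = "http://localhost:8188"  # or http://<ip-address>:8188 for a remote instance

resp = requests.get(f"{COMFYUI_URL}/system_stats", timeout=10)
resp.raise_for_status()
print("ComfyUI API reachable:", resp.json().get("system", {}))
```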
## Talemate configuration
In the Visualizer agent settings, select ComfyUI as your backend for text-to-image generation, image editing, or both. You'll need to configure each backend separately if you want to use ComfyUI for different operations.
![The image displays the General settings tab of the Visualizer interface, featuring a sidebar with active indicators for ComfyUI and Google modules. Dropdown menus in the main panel show ComfyUI selected for text-to-image and image editing backends, with Google selected for image analysis. The interface also includes an image generation timeout slider set to 301 and an enabled Automatic Setup checkbox.](/talemate/img/0.34.0/visual-agent-comfyui-1.png)
### Text-to-Image Configuration
For text-to-image generation, configure the following settings:
- **API URL**: The URL where your ComfyUI instance is running (e.g., `http://localhost:8188`)
- **Workflow**: Select the workflow file to use for generation. Talemate includes several pre-configured workflows including `qwen_image.json` and `z_image_turbo.json`
- **Model**: Select the model to use from your ComfyUI models directory. If your workflow doesn't include a "Talemate Load Model" or "Talemate Load Checkpoint" node, this will be set to "- Workflow default -" and the model specified in the workflow file will be used.
- **Prompt Type**: Choose between "Keywords" or "Descriptive" prompt formatting
!!! tip "Choosing Prompt Type"
As a general rule: **SDXL models** typically work best with **Keywords** formatting, while most other models (including Qwen Image, Flux, etc.) work better with **Descriptive** formatting. If you're unsure, start with Descriptive and switch to Keywords if you're using an SDXL-based workflow.
- **Resolutions**: Configure the pixel dimensions for Square, Portrait, and Landscape formats
![A screenshot of the "Visualizer" application's dark-mode settings panel specifically for ComfyUI text-to-image generation. The interface features configuration fields for the API URL, a workflow dropdown set to "z_image_turbo.json," model selection, and a "Descriptive" prompting type. The lower section includes adjustable numeric inputs for defining pixel dimensions for Square, Portrait, and Landscape image resolutions.](/talemate/img/0.34.0/visual-agent-comfyui-2.png)
![This screenshot displays the dark-themed settings interface of an application named "Visualizer," specifically configured for ComfyUI text-to-image generation. The main panel features input fields for the API URL, workflow selection (set to default-sdxl), and model choice (juggernautXL), along with a prompting type setting. Below these options is a "Resolutions" section allowing users to define specific pixel dimensions for Square, Portrait, and Landscape image outputs.](/talemate/img/0.34.0/visual-agent-comfyui-3.png)
### Image Editing Configuration
For image editing, configure similar settings but select an image editing workflow such as `qwen_image_edit.json`. The number of reference images supported depends on your model - for example, Qwen Image Edit can handle up to 3 reference images that can be used to guide the editing process.
!!! note "Prompt Type for Image Editing"
Image editing workflows typically use **Descriptive** prompt formatting by default, as most image editing models (like Qwen Image Edit) work better with descriptive instructions rather than keyword-based prompts.
![A screenshot of the "Visualizer" application settings interface, specifically showing the configuration panel for "ComfyUI Image Editing." The main view displays input fields for the API URL, a selected workflow file named "qwen_image_edit.json," descriptive prompting settings, and resolution presets for square, portrait, and landscape aspect ratios.](/talemate/img/0.34.0/visual-agent-comfyui-4.png)
![This screenshot shows a browser tab group labeled "Visualizer" marked with a green status dot on a dark background. The group contains four tabs: a Google link, two green-tinted ComfyUI tabs with image and pencil icons, and a gray tab titled "References 3".](/talemate/img/0.34.0/visual-agent-comfyui-5.png)
## Custom workflow creation
Talemate comes with pre-configured workflows for Qwen Image models (`qwen_image.json` for text-to-image and `qwen_image_edit.json` for image editing). However, since there are many variables in ComfyUI setups (different model formats like GGUF vs safetensors, custom LoRAs, different hardware configurations, etc.), you may want to customize these workflows to match your specific setup.
### Starting from a Template
Open ComfyUI in your browser and navigate to the templates menu. ComfyUI includes workflow templates that you can use as a starting point:
- **Qwen Image**: For text-to-image generation
- **Qwen Image Edit**: For image editing workflows
These templates provide a good foundation for creating custom workflows.
![A dark-themed dropdown menu from a software interface is shown, featuring a header labeled "image_qwen_image." The menu lists standard options such as New, File, Edit, View, and Theme, followed by specific actions like Browse Templates, Settings, Manage Extensions, and Help.](/talemate/img/0.34.0/comfyui.workflow.setup.browse-templates.png)
![A product card for the "Qwen-Image Text to Image" AI model, displaying a sample generation of a rainy, neon-lit street scene with vibrant pink and blue signage. The image demonstrates the model's capabilities by clearly rendering complex multilingual text, such as Chinese characters and English words like "HAPPY HAIR," on the storefronts. Below the visual, a brief description highlights the tool's exceptional text rendering and editing features.](/talemate/img/0.34.0/comfyui.workflow.setup.qwen-template.png)
Load the Qwen Image template to see the base workflow structure.
![A screenshot of a ComfyUI workflow designed for the Qwen-Image diffusion model, featuring grouped nodes for model loading, image sizing, and text prompting. The interface includes detailed instructional notes regarding VRAM usage on an RTX 4090D, model storage locations, and optimal KSampler settings. A positive prompt node is visible containing a detailed description of a neon-lit Hong Kong street scene.](/talemate/img/0.34.0/comfyui.workflow.setup.qwen-start.png)
### Naming Nodes for Talemate
For Talemate to properly interact with your workflow, you need to rename specific nodes with exact titles. These titles allow Talemate to inject prompts, set resolutions, and handle reference images automatically.
**Required Node Titles:**
1. **Talemate Positive Prompt**: The node that encodes the positive prompt (typically a `CLIPTextEncode` or `TextEncodeQwenImageEditPlus` node). This is required - workflows without this node will fail validation.
2. **Talemate Negative Prompt**: The node that encodes the negative prompt (same node types as above)
3. **Talemate Resolution**: The node that sets the image dimensions (typically an `EmptySD3LatentImage` or similar latent image node)
**Optional Node Titles:**
- **Talemate Load Model** or **Talemate Load Checkpoint**: If you want to allow model selection from Talemate's settings, rename your model loader node (typically `CheckpointLoaderSimple`, `UNETLoader`, or `UnetLoaderGGUF`) to one of these titles. If this node is not present, Talemate will use the model specified in the workflow file itself, and the model dropdown will show "- Workflow default -" as the only option.
To rename a node, right-click on it and select "Rename" or double-click the node title, then enter the exact title name.
![A screenshot of a node-based interface labeled "Step 3 - Prompt," featuring a green "Talemate Positive Prompt" node containing a detailed text description of a vibrant, neon-lit Hong Kong street scene. The text specifies a 1980s cinematic atmosphere and lists numerous specific shop signs in both Chinese and English. Below it, a dark red "Talemate Negative Prompt" node is visible but currently contains no text.](/talemate/img/0.34.0/comfyui.workflow.setup.talemate-prompts.png)
![This image displays a dark green interface node labeled "Talemate Positive Prompt," typical of a node-based editor like ComfyUI. It features a yellow input connection point for "clip" on the left, an orange output point for "CONDITIONING" on the right, and a large, dark text entry field in the center containing the placeholder word "text".](/talemate/img/0.34.0/comfyui.workflow.setup.talemate-empty-prompt.png)
![A screenshot of a dark gray interface node labeled "Talemate Resolution" with the identifier #58. It features configurable fields for width and height, both set to 1328, and a batch size of 1. The node has a single output connection point labeled "LATENT".](/talemate/img/0.34.0/comfyui.workflow.setup.talemate-resulotion.png)
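To make it concrete why the exact titles matter: in an "Export (API)" workflow, each node is keyed by its id and (in current ComfyUI exports) carries its title under `_meta`, so the titled nodes can be located and filled in programmatically. The sketch below is a simplified illustration of that idea, not Talemate's actual code; the filename and prompt are placeholders.
```python
# Simplified illustration of locating Talemate-titled nodes in an API-format workflow.
import json

def find_node(workflow: dict, title: str) -> dict | None:
    """Return the first node whose _meta title matches, or None."""
    for node in workflow.values():
        if node.get("_meta", {}).get("title") == title:
            return node
    return None

with open("qwen_image.json") as fh:  # an "Export (API)" workflow file
    workflow = json.load(fh)

positive = find_node(workflow, "Talemate Positive Prompt")
resolution = find_node(workflow, "Talemate Resolution")

if positive is None:
    raise ValueError("workflow is missing the required 'Talemate Positive Prompt' node")

positive["inputs"]["text"] = "a rain-soaked neon street at night"
if resolution is not None:
    resolution["inputs"]["width"] = 1328
    resolution["inputs"]["height"] = 1328
```
The same lookup-by-title approach applies to the optional model loader and, for image editing workflows, the "Talemate Reference" nodes described below.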
### Activating the Lightning LoRA (Optional)
The Qwen Image template includes a Lightning LoRA node that is deactivated by default. You can optionally activate it to speed up generation with fewer steps. Note that this is a trade-off: the Lightning LoRA reduces generation time but may degrade image quality compared to using more steps without the LoRA.
To activate the Lightning LoRA:
1. Find the `LoraLoaderModelOnly` node in your workflow (it should already be present in the Qwen template)
2. Connect it between your model loader and sampler if it's not already connected
3. Load the appropriate Lightning LoRA file (e.g., `Qwen-Image-Lightning-8steps-V1.0.safetensors` for 8-step generation)
4. Adjust your sampler settings:
- **Steps**: Reduce to 8 steps (or 4 steps for the 4-step variant)
- **CFG Scale**: Set to 1.0 (lower than typical values)
![This screenshot features a "LoraLoaderModelOnly" node within a ComfyUI workflow, customized with the label "Lightx2v 8steps LoRA". It shows the selection of a "Qwen-Image-Lightning-8steps" LoRA file with a model strength parameter set to 1.00. Purple connection cables are visible attached to the input and output model terminals.](/talemate/img/0.34.0/comfyui.workflow.setup.lighting-lora.png)
![The image shows a close-up of a dark user interface panel containing two adjustable setting fields. The top field is labeled "steps" and displays a value of 8, flanked by left and right directional arrows. Below that, a second field labeled "cfg" shows a value of 1.0, also with adjustment arrows on either side.](/talemate/img/0.34.0/comfyui.workflow.setup.lighting-lora-sampler-changes.png)
### Image Editing Workflows: Reference Nodes
For image editing workflows (like `qwen_image_edit.json`), you need to add reference image nodes. Note that ComfyUI includes a Qwen Image Edit template similar to the Qwen Image template, which you can use as a starting point.
!!! warning "Reference Nodes Required"
Image editing workflows **must** define at least one reference node. If your workflow doesn't include any nodes titled "Talemate Reference 1" (or higher), the backend status will show an error and image editing will not work.
These are `LoadImage` nodes that Talemate will use to inject reference images for editing.
The number of reference nodes you can add depends on your model's capabilities. For example, Qwen Image Edit supports up to 3 reference images. Add `LoadImage` nodes and rename them with these exact titles:
- **Talemate Reference 1**
- **Talemate Reference 2**
- **Talemate Reference 3** (if your model supports it)
These nodes should be connected to your prompt encoding nodes (for Qwen Image Edit, use `TextEncodeQwenImageEditPlus` nodes that accept image inputs).
![Three identical interface nodes labeled "Talemate Reference 1," "2," and "3" are arranged horizontally within a dark-themed node-based editor. Each node features output ports for "IMAGE" and "MASK," along with a file selection field showing "image_qwen_image_edit" and a "choose file to upload" button. Blue and red connection wires link these nodes to other off-screen elements in the workflow.](/talemate/img/0.34.0/comfyui.workflow.setup.talemate-references.png)
### Saving and Exporting the Workflow
Once your workflow is configured, you need to save it and export it in the API format for Talemate to use it.
1. **Save the workflow**: Use File → Save As to save your workflow as a `.json` file in your ComfyUI workflows directory
2. **Export for API**: Use File → Export (API) to create the API-compatible version
!!! warning "Export vs Export (API)"
It's critical to use **"Export (API)"** and not just "Export". The regular export format is not compatible with Talemate's API integration. The API export format includes the necessary metadata and structure that Talemate expects.
![A screenshot of a dark-themed software interface menu with the "File" option selected, revealing a nested sub-menu. The sub-menu lists file management commands, with the "Save As" option highlighted among choices like Open, Save, and Export.](/talemate/img/0.34.0/comfyui.workflow.setup.qwen-save.png)
![This image displays a dark-themed user interface menu, likely from ComfyUI, with the "File" category expanded. A submenu lists options including Open, Save, and Save As, while the "Export (API)" option is currently highlighted at the bottom. This visual illustrates how to locate the API export function within the software's file management system.](/talemate/img/0.34.0/comfyui.workflow.setup.qwen-export.png)
After exporting, place the workflow JSON file in Talemate's `templates/comfyui-workflows` directory. Once placed there, it will automatically appear in the workflow dropdown in Talemate's ComfyUI settings.
!!! note "Workflow File Location"
Workflow files must be placed in Talemate's `templates/comfyui-workflows` directory, not ComfyUI's workflows directory. Talemate loads workflows from its own templates directory to ensure compatibility and proper integration.
!!! tip "Workflow Not Appearing?"
If your workflow file doesn't appear in the agent's settings dropdown after placing it in the correct directory, try reloading the Talemate browser window. The workflow list is refreshed when the page loads.
!!! info "Hot-Reloading Workflows"
Changes to workflow files are automatically detected and reloaded by the agent. After modifying a workflow file, your changes will be applied to the next image generation without needing to restart Talemate or reload the browser window.

View File

@@ -0,0 +1,101 @@
# Google
The Google backend provides image generation, editing, and analysis capabilities using Google's Gemini image models. It supports text-to-image generation, image editing with reference images, and AI-powered image analysis.
![A screenshot of the "Visualizer" application settings interface with the "General" tab selected. It shows configuration dropdowns for Client and various Backends (text to image, image editing, image analysis) all set to "Google," alongside an image generation timeout slider positioned at 301. Additional settings include a checked "Automatic Setup" box, an unchecked "Automatic Generation" box, and a "Fallback Prompt Type" menu set to "Keywords."](/talemate/img/0.34.0/visual-agent-google-4.png)
## Prerequisites
Before configuring the Google backend, you need to obtain a Google API key:
1. Go to [Google AI Studio](https://aistudio.google.com/app/apikey)
2. Sign in with your Google account
3. Create a new API key or use an existing one
4. Copy the API key
Then configure it in Talemate:
1. Open Talemate Settings → Application → Google
2. Paste your Google API key in the "Google API Key" field
3. Save your changes
!!! note "API Key vs Vertex AI Credentials"
The Visualizer agent uses the Google API key (not Vertex AI service account credentials). Make sure you're using the API key from Google AI Studio, not the service account JSON file used for Vertex AI.
## Configuration
In the Visualizer agent settings, select Google as your backend for text-to-image generation, image editing, image analysis, or any combination of these. Each operation can be configured separately.
### Text-to-Image Configuration
For text-to-image generation, configure the following settings:
- **Google API Key**: Your Google API key (configured globally in Talemate Settings)
- **Model**: Select the image generation model to use:
- **gemini-2.5-flash-image**: Faster generation, good quality
- **gemini-3-pro-image-preview**: Higher quality, slower generation
![A dark-themed settings interface for a "Visualizer" application, specifically showing the "Google Text to Image" configuration panel. The main view displays a masked input field for a configured Google API Key and a dropdown menu selecting the "gemini-3-pro-image-preview" model.](/talemate/img/0.34.0/visual-agent-google-5.png)
The Google backend automatically handles aspect ratios based on the format you select (see the sketch after this list):
- **Landscape**: 16:9 aspect ratio
- **Portrait**: 9:16 aspect ratio
- **Square**: 1:1 aspect ratio
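For illustration, the format-to-aspect-ratio mapping above could be expressed as a simple lookup. This is a sketch only; the function name is made up and does not reflect Talemate's internal code or how the value is ultimately passed to the API.
```python
# Illustrative only - mirrors the mapping described above, not Talemate's code.
ASPECT_RATIOS = {
    "landscape": "16:9",
    "portrait": "9:16",
    "square": "1:1",
}

def aspect_ratio_for(format_name: str) -> str:
    """Return the aspect ratio string for a given image format."""
    return ASPECT_RATIOS[format_name.lower()]

print(aspect_ratio_for("Portrait"))  # -> "9:16"
```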
### Image Editing Configuration
For image editing, configure similar settings but with an additional option:
- **Google API Key**: Your Google API key
- **Model**: Select the image generation model (same options as text-to-image)
- **Max References**: Configure the maximum number of reference images (1-3). This determines how many reference images you can provide when editing an image.
![A dark-themed configuration interface for the "Visualizer" application displaying settings for the "Google Image Editing" tab. The panel features a configured Google API key section and a dropdown menu selecting the "gemini-3-pro-image-preview" model. A slider control at the bottom sets the "Max References" value to 3.](/talemate/img/0.34.0/visual-agent-google-6.png)
!!! note "Reference Images"
Google's image editing models can use up to 3 reference images to guide the editing process. The "Max References" setting controls how many reference images Talemate will send to the API. You can adjust this based on your needs, but keep in mind that more references may provide better context for complex edits.
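To make the reference handling concrete, here is a rough sketch of what an editing request against Google's `generateContent` REST endpoint might look like, with the reference list capped at the configured maximum. The payload shape follows Google's published REST format, but the file paths, prompt, and helper function are placeholders, and this is not Talemate's actual request-building code.
```python
import base64
import requests

API_KEY = "your-google-api-key"      # configured in Talemate Settings
MODEL = "gemini-2.5-flash-image"     # or gemini-3-pro-image-preview
MAX_REFERENCES = 3                   # mirrors the "Max References" setting

def encode_image(path: str) -> dict:
    """Wrap a local image as an inline_data part for the request body."""
    with open(path, "rb") as f:
        data = base64.b64encode(f.read()).decode("ascii")
    return {"inline_data": {"mime_type": "image/png", "data": data}}

# Hypothetical reference images - only the first MAX_REFERENCES are sent.
references = ["character.png", "background.png", "pose.png", "extra.png"]
parts = [{"text": "Change the character's coat to a deep red."}]
parts += [encode_image(p) for p in references[:MAX_REFERENCES]]

response = requests.post(
    f"https://generativelanguage.googleapis.com/v1beta/models/{MODEL}:generateContent",
    params={"key": API_KEY},
    json={"contents": [{"parts": parts}]},
    timeout=300,
)
response.raise_for_status()
# Image models return the edited image as base64 inline data in the response parts.
```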
### Image Analysis Configuration
For image analysis, configure the following:
- **Google API Key**: Your Google API key
- **Model**: Select a vision-capable text model:
- **gemini-2.5-flash**: Fast analysis, good for general use
- **gemini-2.5-pro**: Higher quality analysis
- **gemini-3-pro-preview**: Latest model with improved capabilities
!!! note "Analysis Models"
Image analysis uses text models that support vision capabilities, not the image generation models. These models can analyze images and provide detailed descriptions, answer questions about image content, and extract information from visual content.
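As a sketch of the analysis flow, assuming the same REST endpoint, a vision-capable text model, and a placeholder image path, the request sends the image alongside a question and the answer comes back as plain text:
```python
import base64
import requests

API_KEY = "your-google-api-key"
MODEL = "gemini-2.5-flash"  # vision-capable text model, not an image model

with open("scene.png", "rb") as f:  # hypothetical image to analyze
    image_b64 = base64.b64encode(f.read()).decode("ascii")

payload = {
    "contents": [{
        "parts": [
            {"text": "Describe this scene, including characters and mood."},
            {"inline_data": {"mime_type": "image/png", "data": image_b64}},
        ]
    }]
}

resp = requests.post(
    f"https://generativelanguage.googleapis.com/v1beta/models/{MODEL}:generateContent",
    params={"key": API_KEY},
    json=payload,
    timeout=120,
)
resp.raise_for_status()
# The description comes back as text in the first candidate.
print(resp.json()["candidates"][0]["content"]["parts"][0]["text"])
```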
## Usage
Once configured, the Google backend will appear in the Visualizer agent status with green indicators showing which capabilities are available.
![A dark-themed user interface panel titled "Visualizer" marked with a green status indicator. Below the title are several clickable buttons, including a "References 3" button and four "Google" buttons distinguished by icons representing screen, image, edit, and search functions.](/talemate/img/0.34.0/visual-agent-google-8.png)
The status indicators show:
- **Text to Image**: Available when text-to-image backend is configured
- **Image Edit**: Available when image editing backend is configured (shows max references if configured)
- **Image Analysis**: Available when image analysis backend is configured
## Model Recommendations
### Text-to-Image and Image Editing
- **gemini-2.5-flash-image**: Best for faster generation and general use. Good balance of speed and quality.
- **gemini-3-pro-image-preview**: Best for higher quality results when speed is less important. Use when you need the best possible image quality.
### Image Analysis
- **gemini-2.5-flash**: Best for quick analysis and general use cases. Fast responses with good accuracy.
- **gemini-2.5-pro**: Best for detailed analysis requiring higher accuracy and more nuanced understanding.
- **gemini-3-pro-preview**: Best for the latest capabilities and most advanced analysis features.
## Prompt Formatting
The Google backend uses **Descriptive** prompt formatting by default. This means prompts are formatted as natural language descriptions rather than keyword lists. This works well with Google's Gemini models, which are designed to understand natural language instructions.
When generating images, provide detailed descriptions of what you want to create. For image editing, describe the changes you want to make in natural language.

View File

@@ -0,0 +1,121 @@
# OpenAI
The OpenAI backend provides image generation, editing, and analysis capabilities using OpenAI's image models. It supports text-to-image generation with DALL·E 3 and GPT-Image models, image editing with GPT-Image models, and AI-powered image analysis using vision-capable GPT models.
![The image displays the "General" settings tab of the "Visualizer" interface, featuring a dark-themed layout with a sidebar menu on the left. The main panel includes dropdown menus where "Google" is selected as the client and "OpenAI" is chosen for text-to-image, image editing, and image analysis backends. Additional controls show an image generation timeout slider set to 301, checkboxes for automatic setup and generation, and a selector for the fallback prompt type.](/talemate/img/0.34.0/visual-agent-openai-1.png)
## Prerequisites
Before configuring the OpenAI backend, you need to obtain an OpenAI API key:
1. Go to [OpenAI Platform](https://platform.openai.com/api-keys)
2. Sign in with your OpenAI account
3. Create a new API key or use an existing one
4. Copy the API key
Then configure it in Talemate:
1. Open Talemate Settings → Application → OpenAI API
2. Paste your OpenAI API key in the "OpenAI API Key" field
3. Save your changes
For additional instructions, see the [OpenAI API setup guide](/talemate/user-guide/apis/openai/).
## Configuration
In the Visualizer agent settings, select OpenAI as your backend for text-to-image generation, image editing, image analysis, or any combination of these. Each operation can be configured separately.
### Text-to-Image Configuration
For text-to-image generation, configure the following settings:
- **OpenAI API Key**: Your OpenAI API key (configured globally in Talemate Settings)
- **Model**: Select the image generation model to use:
- **dall-e-3**: OpenAI's DALL·E 3 model (widely available)
- **gpt-image-1**: OpenAI's GPT-Image model (may require organization verification)
- **gpt-image-1-mini**: Smaller version of GPT-Image (may require organization verification)
![A screenshot of the "Visualizer" application settings interface with the "OpenAI Text to Image" tab selected on the left sidebar. The main panel displays a masked input field for a configured OpenAI API key and a dropdown menu set to the "dall-e-3" model.](/talemate/img/0.34.0/visual-agent-openai-2.png)
!!! warning "Organization Verification"
The **gpt-image-1** and **gpt-image-1-mini** models may require your OpenAI organization to be verified before you can use them. If you encounter errors with these models, you may need to complete OpenAI's organization verification process.
!!! note "Model Testing Status"
Talemate's organization is not verified with OpenAI, and we have not tested the **gpt-image-1** and **gpt-image-1-mini** models. We have confirmed that **dall-e-3** works correctly. If you have access to the GPT-Image models and encounter issues, please report them so we can improve support for these models.
The OpenAI backend automatically sets resolution based on the format and model you select (see the sketch after this list):
- **gpt-image-1** and **gpt-image-1-mini**:
- Landscape: 1536x1024
- Portrait: 1024x1536
- Square: 1024x1024
- **dall-e-3**:
- Landscape: 1792x1024
- Portrait: 1024x1792
- Square: 1024x1024
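As an illustration using the official `openai` Python package, the resolution table above can drive the `size` parameter of an image request. The API key, prompt, and the idea of a lookup table are placeholders; Talemate performs the equivalent internally.
```python
from openai import OpenAI

# Resolution table from the list above.
RESOLUTIONS = {
    "gpt-image-1":      {"landscape": "1536x1024", "portrait": "1024x1536", "square": "1024x1024"},
    "gpt-image-1-mini": {"landscape": "1536x1024", "portrait": "1024x1536", "square": "1024x1024"},
    "dall-e-3":         {"landscape": "1792x1024", "portrait": "1024x1792", "square": "1024x1024"},
}

client = OpenAI(api_key="your-openai-api-key")  # normally read from settings/env
model, fmt = "dall-e-3", "landscape"

result = client.images.generate(
    model=model,
    prompt="A rain-soaked cobblestone street at dusk, warm lantern light.",
    size=RESOLUTIONS[model][fmt],
)
print(result.data[0].url)  # dall-e-3 returns a hosted URL by default
```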
### Image Editing Configuration
For image editing, configure similar settings but note that DALL·E 3 does not support image editing:
- **OpenAI API Key**: Your OpenAI API key
- **Model**: Select an image editing model:
- **gpt-image-1**: Full-featured image editing model (may require organization verification)
- **gpt-image-1-mini**: Smaller image editing model (may require organization verification)
![This screenshot displays the settings interface for an application called "Visualizer," specifically showing the "OpenAI Image Editing" configuration panel. The right side features a dropdown menu for selecting the model "gpt-image-1" beneath a configured API key section. An orange notification box at the bottom alerts the user that this specific model may require OpenAI organization verification.](/talemate/img/0.34.0/visual-agent-openai-3.png)
!!! warning "DALL·E 3 Limitations"
DALL·E 3 does not support image editing. If you select DALL·E 3 for image editing, you will receive an error. Use **gpt-image-1** or **gpt-image-1-mini** for image editing instead.
!!! note "Reference Images"
OpenAI's image editing models support a single reference image. When editing an image, provide one reference image that will be used as the base for the edit.
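A minimal sketch of an edit call with a single reference image, again using the `openai` package; the file names, prompt, and key are placeholders, and this is not Talemate's internal code.
```python
import base64
from openai import OpenAI

client = OpenAI(api_key="your-openai-api-key")

# Single reference image, as described above; DALL-E 3 would reject this call.
with open("portrait.png", "rb") as reference:
    result = client.images.edit(
        model="gpt-image-1",
        image=reference,
        prompt="Give the character silver hair and a scar over the left eye.",
    )

# gpt-image-1 returns the edited image as base64 data.
edited_png = base64.b64decode(result.data[0].b64_json)
with open("portrait-edited.png", "wb") as out:
    out.write(edited_png)
```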
### Image Analysis Configuration
For image analysis, configure the following:
- **OpenAI API Key**: Your OpenAI API key
- **Model**: Select a vision-capable text model:
- **gpt-4.1-mini**: Fast analysis model with vision capabilities
- **gpt-4o-mini**: Alternative vision model option
![This image shows the settings interface for an application named Visualizer, with the "OpenAI Image Analysis" tab selected on the left sidebar. The main panel allows users to configure the OpenAI vision API, displaying a confirmed API key status. A dropdown menu below specifically indicates that the "gpt-4.1-mini" model is selected.](/talemate/img/0.34.0/visual-agent-openai-4.png)
!!! note "Analysis Models"
Image analysis uses text models that support vision capabilities, not the image generation models. These models can analyze images and provide detailed descriptions, answer questions about image content, and extract information from visual content.
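For reference, image analysis goes through the chat completions API with the image attached as input. A minimal sketch with a placeholder file name and key:
```python
import base64
from openai import OpenAI

client = OpenAI(api_key="your-openai-api-key")

with open("scene.png", "rb") as f:  # hypothetical image to analyze
    image_b64 = base64.b64encode(f.read()).decode("ascii")

response = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in two or three sentences."},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```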
## Usage
Once configured, the OpenAI backend will appear in the Visualizer agent status with green indicators showing which capabilities are available.
![This image captures a dark-mode user interface section titled "Visualizer," marked by an active green status dot. Below the title, there are several pill-shaped tags or buttons representing data sources, including "Google," "References 1," and three distinct "OpenAI" options. The OpenAI buttons are highlighted in green, distinguishing them from the greyed-out Google and References buttons.](/talemate/img/0.34.0/visual-agent-openai-5.png)
The status indicators show:
- **Text to Image**: Available when text-to-image backend is configured
- **Image Edit**: Available when image editing backend is configured (shows "References 1" indicating single reference support)
- **Image Analysis**: Available when image analysis backend is configured
## Model Recommendations
### Text-to-Image
- **dall-e-3**: Most widely available option. Good for general use, though quality may vary.
- **gpt-image-1**: Higher quality option, but requires organization verification. Use if you have access and need better results.
- **gpt-image-1-mini**: Smaller version of GPT-Image, faster generation. Requires organization verification.
### Image Editing
- **gpt-image-1**: Best quality for image editing. Requires organization verification.
- **gpt-image-1-mini**: Faster editing option. Requires organization verification.
### Image Analysis
- **gpt-4.1-mini**: Recommended default for image analysis. Fast and accurate.
- **gpt-4o-mini**: Alternative option if you prefer this model.
## Prompt Formatting
The OpenAI backend uses **Descriptive** prompt formatting by default. This means prompts are formatted as natural language descriptions rather than keyword lists. Provide detailed, natural language descriptions of what you want to create or edit.

View File

@@ -0,0 +1,119 @@
# OpenRouter
The OpenRouter backend provides access to image generation, editing, and analysis capabilities through OpenRouter's unified API. OpenRouter allows you to access multiple AI providers through a single API, giving you flexibility to choose from various models and providers.
![A dark-themed settings interface for the "Visualizer" application, displaying a sidebar with General, OpenRouter, and Styles navigation options. The main panel allows configuration of backend services, showing "OpenRouter" selected for text-to-image, image editing, and image analysis, with "Google" set as the client. Additional controls include a slider for image generation timeout set to 301 and checkboxes for automatic setup and generation.](/talemate/img/0.34.0/visual-agent-openrouter-1.png)
## Prerequisites
Before configuring the OpenRouter backend, you need to obtain an OpenRouter API key:
1. Go to [OpenRouter Keys](https://openrouter.ai/settings/keys)
2. Sign in with your account
3. Create a new API key or use an existing one
4. Copy the API key
Then configure it in Talemate:
1. Open Talemate Settings → Application → OpenRouter API
2. Paste your OpenRouter API key in the "OpenRouter API Key" field
3. Save your changes
For additional instructions, see the [OpenRouter API setup guide](/talemate/user-guide/apis/openrouter/).
## Configuration
In the Visualizer agent settings, select OpenRouter as your backend for text-to-image generation, image editing, image analysis, or any combination of these. Each operation can be configured separately.
### Text-to-Image Configuration
For text-to-image generation, configure the following settings:
- **OpenRouter API Key**: Your OpenRouter API key (configured globally in Talemate Settings)
- **Model**: Select an image generation model from OpenRouter. The model list is dynamically populated based on models available through your OpenRouter account.
- **Only use these providers**: Optionally filter to specific providers (e.g., only use Google or OpenAI)
- **Ignore these providers**: Optionally exclude specific providers from consideration
![This screenshot depicts the "Visualizer" settings interface, specifically the "OpenRouter Text to Image" configuration tab. The panel displays an active API Key section, a model selection dropdown currently set to "google/gemini-2.5-flash-image", and additional options to filter specific service providers.](/talemate/img/0.34.0/visual-agent-openrouter-2.png)
!!! warning "Model Selection"
There is no reliable way for Talemate to determine which models support text-to-image generation, so the model list is unfiltered. Please consult the [OpenRouter documentation](https://openrouter.ai/docs) to verify that your selected model supports image generation before using it.
The OpenRouter backend automatically handles aspect ratios based on the format you select:
- **Landscape**: 16:9 aspect ratio
- **Portrait**: 9:16 aspect ratio
- **Square**: 1:1 aspect ratio
### Image Editing Configuration
For image editing, configure similar settings with an additional option:
- **OpenRouter API Key**: Your OpenRouter API key
- **Model**: Select an image editing model from OpenRouter
- **Max References**: Configure the maximum number of reference images (1-3). This determines how many reference images you can provide when editing an image.
- **Provider filtering**: Optionally filter providers (same as text-to-image)
![This screenshot displays the settings interface for an application named Visualizer, specifically focusing on the "OpenRouter - Image Editing" configuration tab. The main panel features input fields for an OpenRouter API key, a model selection dropdown set to "google/gemini-2.5-flash-image," and provider filtering options. Additionally, a slider at the bottom allows users to adjust the "Max References," which is currently set to 1.](/talemate/img/0.34.0/visual-agent-openrouter-3.png)
!!! warning "Model Selection"
There is no reliable way for Talemate to determine which models support image editing, so the model list is unfiltered. Image editing refers to image generation with support for 1 or more contextual reference images. Please consult the [OpenRouter documentation](https://openrouter.ai/docs) to verify that your selected model supports image editing before using it.
### Image Analysis Configuration
For image analysis, configure the following:
- **OpenRouter API Key**: Your OpenRouter API key
- **Model**: Select a vision-capable text model from OpenRouter
- **Provider filtering**: Optionally filter providers
![A screenshot of the "Visualizer" application interface showing the "OpenRouter Image Analysis" settings panel. The configuration area displays a model selection dropdown set to "google/gemini-2.5-flash" alongside a configured API key field. An informational box notes that the model list is unfiltered and users should verify that their chosen text generation model supports multi-modal vision capabilities.](/talemate/img/0.34.0/visual-agent-openrouter-4.png)
!!! warning "Model Selection"
There is no reliable way for Talemate to determine which models support image analysis, so the model list is unfiltered. Image analysis requires a text generation model that is multi-modal and supports vision capabilities. Please consult the [OpenRouter documentation](https://openrouter.ai/docs) to verify that your selected model supports vision before using it.
## Usage
Once configured, the OpenRouter backend will appear in the Visualizer agent status with green indicators showing which capabilities are available.
![A dark-mode user interface panel labeled "Visualizer" features a green status indicator dot next to the title. Below the header are several pill-shaped tags, including grey buttons for "Google" and "References 1" alongside three green "OpenRouter" buttons with various icons. This layout likely represents a configuration of active tools or API connections within a software application.](/talemate/img/0.34.0/visual-agent-openrouter-5.png)
The status indicators show:
- **Text to Image**: Available when text-to-image backend is configured
- **Image Edit**: Available when image editing backend is configured (shows max references if configured)
- **Image Analysis**: Available when image analysis backend is configured
## Model Recommendations
OpenRouter provides access to many models from different providers. Here are some general recommendations:
### Text-to-Image and Image Editing
- **google/gemini-2.5-flash-image**: Fast image generation with good quality
- **google/gemini-3-pro-image-preview**: Higher quality option (if available)
### Image Analysis
- **google/gemini-2.5-flash**: Fast analysis with good accuracy
- **google/gemini-2.5-pro**: Higher quality analysis
- **google/gemini-3-pro-preview**: Latest capabilities (if available)
## Provider Filtering
OpenRouter allows you to filter which providers are used for a specific model. This can be useful if:
- You want to use a specific provider for cost or quality reasons
- You want to avoid certain providers
- You want to test different providers for the same model
You can configure provider filtering in each backend's settings:
- **Only use these providers**: Limits requests to only the selected providers
- **Ignore these providers**: Excludes the selected providers from consideration
If both are configured, "Only use these providers" takes precedence.
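As a rough sketch of what provider filtering looks like at the API level: the `provider` field names below follow OpenRouter's provider-routing options as I understand them, so verify them against the OpenRouter docs; the model, prompt, and key are placeholders, and this is not Talemate's internal code.
```python
import requests

payload = {
    "model": "google/gemini-2.5-flash",
    "messages": [{"role": "user", "content": "Describe a lighthouse on a stormy coast."}],
    # Provider routing: restrict or exclude providers for this request.
    "provider": {
        "only": ["Google"],            # "Only use these providers"
        # "ignore": ["SomeProvider"],  # "Ignore these providers"
    },
}

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer your-openrouter-api-key"},
    json=payload,
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```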
## Prompt Formatting
The OpenRouter backend uses **Descriptive** prompt formatting by default. This means prompts are formatted as natural language descriptions rather than keyword lists. Provide detailed, natural language descriptions of what you want to create or edit.

View File

@@ -0,0 +1,104 @@
# SD.Next
The SD.Next backend provides image generation and editing capabilities using Stable Diffusion Next (SD.Next), a fork of AUTOMATIC1111's Stable Diffusion WebUI. SD.Next offers improved performance and additional features while maintaining compatibility with the AUTOMATIC1111 API.
![This screenshot displays the "General" settings menu of the "Visualizer" interface, featuring a dark theme with purple accents. Configuration options show "Google" selected as the client, with "SD.Next" set as the backend for both text-to-image and image editing tasks. The panel also includes an image generation timeout slider set to 301, a checked "Automatic Setup" box, and a "Fallback Prompt Type" dropdown set to Keywords.](/talemate/img/0.34.0/visual-agent-sdnext-1.png)
## Prerequisites
Before configuring the SD.Next backend, you need to have SD.Next installed and running. SD.Next can be run locally or accessed remotely via its API.
1. Install and start SD.Next on your system
2. Ensure the API is enabled and accessible
3. Note the API URL (default is `http://localhost:7860`)
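A quick way to confirm the API is enabled and reachable is to query one of the AUTOMATIC1111-compatible endpoints. This is only a connectivity check, under the assumption that your instance exposes the standard `/sdapi/v1/` routes; the URL is the default noted above.
```python
import requests

API_URL = "http://localhost:7860"  # default SD.Next address

# List the models the instance reports - if this succeeds, the API is up.
resp = requests.get(f"{API_URL}/sdapi/v1/sd-models", timeout=10)
resp.raise_for_status()

for model in resp.json():
    print(model.get("model_name"))
```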
## Configuration
In the Visualizer agent settings, select SD.Next as your backend for text-to-image generation, image editing, or both. Each operation is configured separately.
### Text-to-Image Configuration
For text-to-image generation, configure the following settings (a request sketch follows the screenshots below):
- **API URL**: The URL where your SD.Next instance is running (e.g., `http://localhost:7860`)
- **Authentication Method**: Choose the authentication method:
- **None**: No authentication required
- **Basic (username/password)**: Use username and password authentication
- **Bearer (API Key)**: Use API key authentication
!!! note "ArliAI SD.Next Endpoints"
If you're connecting to ArliAI's SD.Next endpoints, you should use **Bearer (API Key)** authentication method. Configure your API key in the authentication settings.
- **Username/Password** (if using Basic auth): Your SD.Next credentials
- **API Key** (if using Bearer auth): Your API key for SD.Next
- **Steps**: Number of sampling steps (default: 40, range: 5-150)
- **Sampling Method**: The sampling algorithm to use (dynamically populated from your SD.Next instance)
- **CFG Scale**: Classifier-free guidance scale (default: 7.0, range: 1-30)
- **Model**: Select the model to use from your SD.Next models directory (dynamically populated)
- **Prompt Type**: Choose between "Keywords" or "Descriptive" prompt formatting
- **Resolutions**: Configure the pixel dimensions for Square, Portrait, and Landscape formats
![This screenshot displays the "Visualizer" interface for SD.Next text-to-image generation, featuring configuration settings for the API URL, 40 sampling steps, DPM++ 2M method, and a CFG scale of 7.0. The panel also shows the selected model as "juggernautXL_juggXlByRundiffusion" along with adjustable resolution presets for square, portrait, and landscape formats.](/talemate/img/0.34.0/visual-agent-sdnext-2.png)
![This screenshot displays a dark-themed user interface form with a dropdown menu labeled "Authentication Method" selected to "Basic (username/password)". Below the menu are two text input fields for entering a "Username" and "Password" under a section header labeled "AUTHENTICATION (OPTIONAL, SERVER DEPENDENT)".](/talemate/img/0.34.0/visual-agent-sdnext-3.png)
![This image displays a dark-themed user interface configuration panel for setting up authentication parameters. It features a dropdown menu where "Bearer (API Key)" is selected as the Authentication Method, positioned above a text input field labeled "API Key".](/talemate/img/0.34.0/visual-agent-sdnext-4.png)
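Under the hood these settings map onto the AUTOMATIC1111-compatible `/sdapi/v1/txt2img` endpoint. The sketch below shows roughly how they translate into a request; the prompt, credentials, and resolution values are placeholders, and Talemate builds the real request for you.
```python
import base64
import requests

API_URL = "http://localhost:7860"

# Authentication, depending on the method selected above (illustrative values):
basic_auth = ("username", "password")                      # Basic (username/password)
bearer_headers = {"Authorization": "Bearer your-api-key"}  # Bearer (API Key)

payload = {
    "prompt": "portrait of a weathered sea captain, dramatic lighting",
    "negative_prompt": "blurry, low quality",
    "steps": 40,
    "cfg_scale": 7.0,
    "sampler_name": "DPM++ 2M",
    "width": 1024,
    "height": 1024,  # square format; swap dimensions for portrait/landscape
}

resp = requests.post(
    f"{API_URL}/sdapi/v1/txt2img",
    json=payload,
    headers=bearer_headers,  # or auth=basic_auth, or neither for "None"
    timeout=300,
)
resp.raise_for_status()

# The API returns generated images as base64-encoded strings.
image_bytes = base64.b64decode(resp.json()["images"][0])
with open("generated.png", "wb") as f:
    f.write(image_bytes)
```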
### Image Editing Configuration
For image editing, configure similar settings. SD.Next supports image editing through its img2img API, which uses a single reference image.
![User interface for the "Visualizer" software displaying the "Image editing configuration for SD.Next" panel with a dark theme. It features adjustable sliders for Steps (set to 40) and CFG Scale (set to 7.0), alongside dropdown menus for the Sampling Method and Model selection. The bottom section includes input fields for defining specific pixel dimensions for square, portrait, and landscape image resolutions.](/talemate/img/0.34.0/visual-agent-sdnext-5.png)
!!! note "Reference Images"
SD.Next image editing supports a single reference image. When editing an image, provide one reference image that will be used as the base for the edit.
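A corresponding sketch for img2img, where the single reference image is sent base64-encoded; the file names, prompt, and `denoising_strength` value are placeholders rather than Talemate's actual defaults.
```python
import base64
import requests

API_URL = "http://localhost:7860"

# The single reference image is sent base64-encoded via init_images.
with open("reference.png", "rb") as f:
    reference_b64 = base64.b64encode(f.read()).decode("ascii")

payload = {
    "init_images": [reference_b64],
    "prompt": "same character, now wearing a hooded travel cloak",
    "denoising_strength": 0.55,  # how far the result may drift from the reference
    "steps": 40,
    "cfg_scale": 7.0,
}

resp = requests.post(f"{API_URL}/sdapi/v1/img2img", json=payload, timeout=300)
resp.raise_for_status()

edited = base64.b64decode(resp.json()["images"][0])
with open("edited.png", "wb") as f:
    f.write(edited)
```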
## Usage
Once configured, the SD.Next backend will appear in the Visualizer agent status with green indicators showing which capabilities are available.
![A dark mode user interface section titled "Visualizer," indicated by a green status dot. Below the header is a row of four buttons: "Google," two distinct "SD.Next" buttons with image and pencil icons respectively, and a "References 1" button.](/talemate/img/0.34.0/visual-agent-sdnext-6.png)
The status indicators show:
- **Text to Image**: Available when text-to-image backend is configured
- **Image Edit**: Available when image editing backend is configured (shows "References 1" indicating single reference support)
## Model and Sampler Selection
Talemate dynamically fetches the list of available models and samplers from your SD.Next instance when you configure the backend. This means:
- **Models**: The model dropdown is automatically populated with models available in your SD.Next installation
- **Samplers**: The sampling method dropdown is automatically populated with samplers available in your SD.Next instance
If you change the API URL or authentication settings, Talemate will automatically refresh the model and sampler lists from the new instance.
!!! tip "Model Selection"
If you don't select a specific model, SD.Next will use its default model. You can select "- Default Model -" from the dropdown to explicitly use the default, or leave the field empty.
## Sampler Settings
SD.Next provides extensive control over the generation process:
- **Steps**: More steps generally produce higher-quality images but take longer. Typical values range from 20 to 50 steps, with 40 being a good default.
- **Sampling Method**: Different samplers produce different results. Popular options include:
- **DPM++ 2M**: Fast and high quality (default)
- **Euler a**: Fast, good for quick iterations
- **DPM++ 2M Karras**: The same sampler with the Karras noise schedule, which distributes noise differently across steps and can yield slightly different results
- **CFG Scale**: Controls how closely the model follows your prompt. Lower values (1-7) allow more creative freedom, higher values (7-15) stick closer to the prompt.
## Prompt Formatting
SD.Next uses **Keywords** prompt formatting by default. This means prompts are formatted as keyword lists optimized for Stable Diffusion models. You can switch to **Descriptive** formatting if you prefer natural language descriptions, though Keywords typically work better with SD models.
## Remote Access
If you're running SD.Next on a remote server:
1. Configure SD.Next to listen on the appropriate network interface
2. Use the server's IP address or hostname in the API URL (e.g., `http://192.168.1.100:7860`)
3. Configure appropriate authentication if your SD.Next instance requires it
4. Ensure your firewall allows connections to the SD.Next port
!!! warning "Security Considerations"
If exposing SD.Next over a network, always use authentication. Unauthenticated SD.Next instances can be accessed by anyone on your network, which may pose security risks.

Some files were not shown because too many files have changed in this diff.