Compare commits


204 Commits

Author SHA1 Message Date
vegu-ai-tools
9722936af6 free up space before container build 2025-12-05 20:53:57 +02:00
vegu-ai-tools
ca1e49075a fix ffmpeg verify 2025-12-05 20:53:46 +02:00
vegu-ai-tools
3f6ec9328c attempt to free space for container build 2025-12-05 20:31:41 +02:00
vegu-ai-tools
7bd8da47fc fix comment 2025-12-05 20:15:00 +02:00
vegu-ai-tools
c036aeb5eb Enhance Dockerfile comments for FFmpeg installation process, clarifying the choice of BtbN builds and the reliability of the direct download approach. 2025-12-05 20:14:01 +02:00
vegu-ai-tools
e06ded8ff4 Update Dockerfile to include FFmpeg 8.0 installation with shared libraries and set LD_LIBRARY_PATH for runtime compatibility. 2025-12-05 19:59:41 +02:00
vegu-ai-tools
974786a2ef remove comment 2025-12-05 17:51:11 +02:00
vegu-ai-tools
19f30a5991 docs 2025-12-05 17:27:22 +02:00
vegu-ai-tools
82f72d2a0d Add SHA256 checksum verification for FFmpeg download in install-ffmpeg.bat to ensure file integrity. 2025-12-05 17:23:32 +02:00
vegu-ai-tools
735cdfee51 Add optional FFmpeg installation step in install.bat, providing warnings for failure or absence of the script. 2025-12-05 17:15:16 +02:00
vegu-ai-tools
868dd924d1 relock 2025-12-05 17:04:20 +02:00
vegu-ai-tools
bdbd028bc8 docs 2025-12-05 16:55:14 +02:00
vegu-ai-tools
61632464bd Update FFmpeg installation script to download version 8.0.1 from GitHub and use tar for extraction, simplifying the process and ensuring compatibility with Windows 10. 2025-12-05 16:53:02 +02:00
vegu-ai-tools
b10ff490bb Add FFmpeg installation script: create install-ffmpeg.bat to automate downloading, extracting, and verifying FFmpeg installation for Talemate. 2025-12-05 16:40:47 +02:00
vegu-ai-tools
20380e417b relock 2025-12-05 15:49:32 +02:00
vegu-ai-tools
f6353ced38 relock 2025-12-05 15:16:51 +02:00
vegu-ai-tools
7f6c8c7dd9 add torchcodec 2025-12-05 15:07:24 +02:00
vegu-ai-tools
acca2a0fcf linting 2025-11-30 14:18:10 +02:00
vegu-ai-tools
5d48cc06d4 fix chara cv3 spec support 2025-11-30 14:07:08 +02:00
vegu-ai-tools
3e206e5c06 linting 2025-11-29 21:27:53 +02:00
vegu-ai-tools
f455a11096 Add methods for caching model data in Backend class: implement _get_cache_data and _apply_cache_data to enhance data sharing between backend instances. 2025-11-29 21:24:07 +02:00
vegu-ai-tools
647e2f277c Add caching functionality to Backend class: introduce methods for retrieving and applying cached data after successful test connections, enhancing data management across backend instances. 2025-11-29 21:24:02 +02:00
vegu-ai-tools
8cbecc67f2 Enhance sdnext_emit_status method to update model choices based on active handlers, improving backend responsiveness during processing. 2025-11-29 21:16:04 +02:00
vegu-ai-tools
91a7df67b1 Add model management in Backend class: initialize models list, update on_status_change to refresh model choices, and modify sdnext_update_model_choices to accept backend parameter for improved flexibility. 2025-11-29 21:08:00 +02:00
vegu-ai-tools
1d032e3309 Implement updateChoicesOnly method in AgentModal to preserve unsaved user values when updating agent choices. Update AIAgent to call this method when the modal is open and the current agent is being updated. 2025-11-29 21:07:50 +02:00
vegu-ai-tools
87089fad46 Add on_status_change method to Backend class for status updates, triggering visual agent notifications on status changes. 2025-11-29 21:07:29 +02:00
vegu-ai-tools
206939559c Update status icon fallback in VisualLibrary component to use 'mdi-minus-circle' for undefined statuses, improving visual feedback for users. 2025-11-29 19:05:42 +02:00
vegu-ai-tools
1757070bb9 Refactor API URL handling in visual backends to use a new utility function for normalization, ensuring consistent URL formatting across automatic1111, comfyui, and sdnext backends. 2025-11-29 19:03:09 +02:00
vegu-ai-tools
49689bcf00 Add visual and voice library documentation 2025-11-29 18:04:25 +02:00
vegu-ai-tools
c113a34a35 docs 2025-11-29 18:02:31 +02:00
vegu-ai-tools
6c5b95d501 docs 2025-11-29 17:52:34 +02:00
vegu-ai-tools
532a0a05b7 docs 2025-11-29 16:47:14 +02:00
vegu-ai-tools
04ec39f154 visual library docs 2025-11-29 15:14:24 +02:00
vegu-ai-tools
a5368a8a66 Update documentation for Shared Context to Shared World, enhancing clarity and adding details on episodes management and scene linking. 2025-11-29 13:57:33 +02:00
vegu-ai-tools
8ffdd8841c docs 2025-11-29 13:38:40 +02:00
vegu-ai-tools
83e2bdbe0e character card import docs 2025-11-29 13:26:44 +02:00
vegu-ai-tools
c97a83fb9a docs images 2025-11-29 13:07:51 +02:00
vegu-ai-tools
e2a1e465dd dont give saved character card image files as an option for character import 2025-11-29 13:07:42 +02:00
vegu-ai-tools
639955edaf Fix model name cleaning function to handle backslashes in template file names 2025-11-29 02:46:22 +02:00
vegu-ai-tools
366efb2532 linting 2025-11-28 21:36:13 +02:00
vegu-ai-tools
0cb4ba17ef add reason_prefill to force model to start with thinking block 2025-11-28 21:36:02 +02:00
vegu-ai-tools
f3b398322b linting 2025-11-28 19:56:03 +02:00
vegu-ai-tools
5e46107d07 fix kcpp visual agent auto setup failure messages 2025-11-28 19:54:34 +02:00
vegu-ai-tools
c8162ad350 linting 2025-11-28 15:57:49 +02:00
vegu-ai-tools
590e33c9b0 improve backend status display in visual library 2025-11-28 15:57:19 +02:00
vegu-ai-tools
5860745091 Remove custom clear tags functionality from VisualImageView component 2025-11-28 15:35:32 +02:00
vegu-ai-tools
4e2ad01f81 Add clear tags functionality in VisualImageView component
- Introduced a new inline confirmation button to clear tags from the form.
- Enhanced the component by importing and integrating the ConfirmActionInline component.
- Added logic to reset the tags array when the clear action is confirmed.
2025-11-28 15:28:42 +02:00
vegu-ai-tools
66bbe267e4 Add tag management features in VisualImageView component
- Introduced copy and paste functionality for tags using clipboard operations.
- Added tooltips for copy and paste buttons to enhance user experience.
- Improved layout of action buttons for better alignment and accessibility.
2025-11-28 15:25:00 +02:00
vegu-ai-tools
fcd675def0 Enhance asset search functionality in VisualLibraryScene component
- Added clear button to asset search input for easier user interaction.
- Implemented logic to determine which folders to open based on the current asset search input, ensuring relevant folders are displayed when filtering is active.
- Merged computed folder IDs with currently open nodes to maintain user selections during searches.
2025-11-28 15:20:51 +02:00
vegu-ai-tools
32dfc21c2f linting 2025-11-28 14:19:04 +02:00
vegu-ai-tools
fdf5ec0426 min thinking tokens to 512 2025-11-28 14:00:48 +02:00
vegu-ai-tools
598f6ad0b3 z image turbo workflow 2025-11-28 13:55:54 +02:00
vegu-ai-tools
b92b1048ae Add workflow reloading functionality to Backend class and improve status checks
- Implemented `_reload_workflow_if_outdated` method to reload the workflow from disk if it is outdated.
- Updated `ready` method to call the new workflow reloading method and added additional status checks for workflow validity.
- Enhanced logging for various workflow status conditions to improve debugging and monitoring.
2025-11-28 13:46:02 +02:00
vegu-ai-tools
27c378128f linting 2025-11-25 03:20:07 +02:00
vegu-ai-tools
61c0307543 sdnext auth method 2025-11-25 03:19:40 +02:00
vegu-ai-tools
8ab47e751e Improve debug logging in asset path retrieval by clarifying log message when asset ID is not found. 2025-11-25 02:46:37 +02:00
vegu-ai-tools
b636cb5398 nodes 2025-11-23 02:17:25 +02:00
vegu-ai-tools
6986f4df3f fix issue when attempting to load scene from scene search list 2025-11-23 02:15:47 +02:00
vegu-ai-tools
8ec0188001 Enhance name formatting in WorldState class to correctly handle possessive cases by using regex, ensuring proper title casing for character and item names. 2025-11-22 21:04:23 +02:00
vegu-ai-tools
cb38f6a830 Update visual styles instructions in YAML template to emphasize clear artist guidance for various art styles. 2025-11-22 20:40:04 +02:00
vegu-ai-tools
65d9137b66 Refactor prompt building logic in VisualPrompt class for improved readability and maintainability. Adjust formatting in validation module to ensure consistent newline handling. 2025-11-22 20:00:15 +02:00
vegu-ai-tools
e8fc01d227 modules 2025-11-22 19:03:45 +02:00
vegu-ai-tools
a523ba8828 Add ValidateAssetID node for asset ID validation in validation module, enhancing asset management capabilities. 2025-11-22 19:02:50 +02:00
vegu-ai-tools
ca8c2d0e03 Remove unused mediaType property and related code from DirectorConsoleChatMessageAssetView component, streamlining image handling logic and improving code clarity. 2025-11-22 18:22:44 +02:00
vegu-ai-tools
8ed3c9badf Refactor image preview functionality in DirectorConsoleChatMessageAssetView component. Replace dialog trigger with a dedicated method to handle image loading and preview sizing based on image dimensions. 2025-11-22 18:18:59 +02:00
vegu-ai-tools
0d7d49f107 nodes 2025-11-22 17:29:34 +02:00
vegu-ai-tools
8bcec0acb8 Add create_placeholder property to ValidateCharacter for optional placeholder creation when character does not exist, enhancing validation logic. 2025-11-22 17:28:52 +02:00
vegu-ai-tools
dc3ba4a18a Add new properties to VisualPrompt for handling positive and negative prompts, including keywords and descriptive formats, and refactor prompt building logic. 2025-11-22 17:28:42 +02:00
vegu-ai-tools
16f135c8f0 linting 2025-11-22 16:16:44 +02:00
veguAI
2d6d61d9d6 Director visuals (#22)
director chat action for image creation
2025-11-22 16:16:06 +02:00
vegu-ai-tools
19b70c4ba8 missing test files 2025-11-21 00:28:31 +02:00
vegu-ai-tools
e6fbc2dfaf updated what's new 2025-11-20 23:41:45 +02:00
vegu-ai-tools
053a872fde Enhance delete confirmation flow in WorldStateManagerCharacter component by focusing on input field after clicking delete button. 2025-11-20 22:53:34 +02:00
vegu-ai-tools
c0e224e2b6 linting 2025-11-20 22:29:20 +02:00
vegu-ai-tools
7fc2fccd0c Add gemini-3-pro-image-preview option to GoogleImageMixin for enhanced model selection. 2025-11-20 22:29:10 +02:00
vegu-ai-tools
7c253650b6 Add first_token_time to RequestInformation for accurate rate calculation 2025-11-20 19:44:56 +02:00
vegu-ai-tools
11c1427d9a Reduce debounce time for scene status emission from 50ms to 25ms for improved responsiveness. 2025-11-20 19:43:53 +02:00
vegu-ai-tools
7e376e02b0 node update 2025-11-20 14:01:34 +02:00
vegu-ai-tools
bf326b636d debounce scene emit_status 2025-11-20 13:32:57 +02:00
vegu-ai-tools
8502306824 Replace v-text-field with v-number-input in multiple components for improved number input handling and consistency. 2025-11-20 13:21:52 +02:00
vegu-ai-tools
8a16f500df Refactor import options in CharacterCardImport component to use v-checkbox with tooltips for better user experience and clarity. 2025-11-20 12:56:39 +02:00
vegu-ai-tools
18f8c4f752 json character card import should use new system 2025-11-20 12:39:41 +02:00
vegu-ai-tools
643ca7fc32 cleanup 2025-11-20 12:11:36 +02:00
vegu-ai-tools
1eba909376 cleanup 2025-11-20 12:02:00 +02:00
vegu-ai-tools
ac3e569324 linting 2025-11-20 11:51:41 +02:00
vegu-ai-tools
e632f2bff6 Add unique random color assignment for characters in load_scene_from_character_card function 2025-11-20 11:50:25 +02:00
vegu-ai-tools
dde6c76aa9 Add writing style template selection to CharacterCardImport component and update scene loading logic to apply selected template 2025-11-20 11:41:26 +02:00
vegu-ai-tools
152f47a12e Update episode management to reflect correct directory structure for episodes.json 2025-11-20 02:09:43 +02:00
vegu-ai-tools
a2f12ffb1b Add maxHeight property to v-select in AgentModal component 2025-11-20 02:01:54 +02:00
vegu-ai-tools
b580eb2316 Enhance error logging in Google Image backend to include result data on parse errors 2025-11-20 02:00:01 +02:00
vegu-ai-tools
7acb26cf55 fix asset meta when created through auto cover image creation 2025-11-20 01:44:54 +02:00
vegu-ai-tools
4fd3e0d25e when introducing a character without instructions send empty content 2025-11-20 00:29:41 +02:00
vegu-ai-tools
495d9b1abe prompt tweaks 2025-11-20 00:29:23 +02:00
vegu-ai-tools
2ad54cc5f2 Update cover image IDs in generate_intro_scenes 2025-11-20 00:10:38 +02:00
vegu-ai-tools
31c4e20612 Add 'gemini-3-pro-preview' model option to GoogleImageMixin configuration 2025-11-19 12:15:18 +02:00
vegu-ai-tools
63db597c1c Add support for new model 'gemini-3-pro-preview' in Google client 2025-11-19 12:09:01 +02:00
vegu-ai-tools
6fefefc695 linting 2025-11-19 01:44:19 +02:00
vegu-ai-tools
7391d76dc2 Add character dialogue example generation functionality
- Implemented a new method in the CharacterCreatorMixin to extract or generate dialogue examples for characters based on provided text.
- Updated the character card loading process to include dialogue example determination, ensuring examples are regenerated properly.
- Created a new Jinja2 template for generating dialogue examples, including guidelines for format and content.
- Enhanced logging for dialogue example generation to track character names and example counts.
2025-11-19 01:36:16 +02:00
vegu-ai-tools
885b48a83f Add new Jinja2 template for generating horizontal scene illustrations
- Created a new template to generate prompts for horizontal illustrations capturing dynamic moments in scenes.
- Included sections for character context, requirements, and task instructions to guide image generation.
- Emphasized the importance of action, emotion, and cinematic framing in the generated images.
2025-11-18 20:34:13 +02:00
vegu-ai-tools
b364bc28b0 Fix example stripping logic in Character class to handle cases without colons 2025-11-18 20:20:47 +02:00
vegu-ai-tools
2e43da8b0f linting 2025-11-18 20:09:01 +02:00
vegu-ai-tools
0e393a09e2 Add require_active flag to reinforcement templates and logic
- Introduced a require_active boolean flag in various reinforcement classes and templates to control reinforcement activation based on character status.
- Updated logic in WorldStateAgent to skip inactive character reinforcements when require_active is true.
- Enhanced frontend components to support the new require_active option for character reinforcements.
2025-11-18 19:51:34 +02:00
vegu-ai-tools
a595e73c1e Update response length calculation in NarratorAgent to use max_generation_length 2025-11-18 19:35:33 +02:00
vegu-ai-tools
3c25b99340 Enhance character card import UI with re-analyze and retry options
- Added a "Re-analyze" button that appears when file data is present and analysis is not in progress.
- Improved error handling by introducing a "Retry" button in the error alert for failed analyses.
- Adjusted the display logic for analysis information to improve user experience.
2025-11-18 19:32:10 +02:00
vegu-ai-tools
fe014755e2 allow cancelling of character card import 2025-11-18 19:25:10 +02:00
vegu-ai-tools
35b11156ee fix character card preview when card is loaded from talemate file system 2025-11-18 13:51:59 +02:00
vegu-ai-tools
b733774096 fix shared world auto refresh 2025-11-18 13:41:37 +02:00
vegu-ai-tools
a89f11cd0a linting 2025-11-18 13:09:48 +02:00
vegu-ai-tools
33cc6d3442 Add max-height constraints to episode layouts and sidebars for improved UI consistency 2025-11-18 13:09:41 +02:00
vegu-ai-tools
79d39e5297 Remove max-height constraint from episodes layout to improve UI flexibility 2025-11-18 13:06:43 +02:00
vegu-ai-tools
70cd6946eb Update excluded character names to enhance detection accuracy
- Expanded the list of excluded names in character detection to include additional variations, preventing false positives during character name identification.
2025-11-18 13:04:44 +02:00
vegu-ai-tools
eb55ec6877 linting 2025-11-18 12:53:45 +02:00
vegu-ai-tools
218a301909 Remove early return in handleMessage to ensure prompts are cleared when scene is loaded 2025-11-18 12:52:39 +02:00
vegu-ai-tools
00b96b603f prompt tweaks 2025-11-18 12:52:25 +02:00
vegu-ai-tools
d663ee01bd Refactor import options initialization in load_scene_from_character_card function
- Moved the initialization of import_options to ensure it occurs after the import statements, improving code organization and readability.
2025-11-18 12:40:07 +02:00
vegu-ai-tools
fb28888011 dont pass agent instances around 2025-11-18 12:36:42 +02:00
vegu-ai-tools
186630b11c tweaks to character description and character sheet extraction 2025-11-18 12:32:23 +02:00
vegu-ai-tools
0f96b19af7 stuff 2025-11-18 05:14:43 +02:00
vegu-ai-tools
30377189c5 linting 2025-11-18 03:47:33 +02:00
vegu-ai-tools
13bf370e76 Enhance character detection by processing texts in chunks
- Introduced a new method, detect_characters_from_texts, to analyze multiple texts by processing them in manageable chunks based on the client's max context size.
- Added functionality to avoid duplicate detections by passing already detected character names.
- Implemented utility functions for chunking items by token count and removing substring names to improve character detection accuracy.
- Updated the corresponding Jinja2 template to reflect the changes in character detection logic.
2025-11-18 03:47:20 +02:00
vegu-ai-tools
d7df0dd2e5 linting 2025-11-18 03:18:58 +02:00
vegu-ai-tools
c5cb4d7335 Enhance character card validation and error handling
- Added validation to ensure character card data is a dictionary, raising a ValueError with a clear message if invalid.
- Improved error handling in the analyze_character_card function to provide user-friendly messages for various parsing errors, including JSON decoding and file not found issues.
- Updated the CharacterCardImport component to adjust dialog sizing based on analysis errors, enhancing user feedback during character import.
2025-11-18 03:18:36 +02:00
vegu-ai-tools
0db092cb8a linting 2025-11-18 03:10:27 +02:00
vegu-ai-tools
5496ef7a57 Enhance CharacterCardImport component with dynamic dialog sizing and image preview
- Updated the dialog max-width to adjust based on the analyzing state, improving user experience during character import.
- Added an image preview section that displays the character card image or relevant messages based on the file data and path, enhancing feedback during the import process.
2025-11-18 03:09:46 +02:00
vegu-ai-tools
4e999b8300 Enhance episode title generation and character import options
- Updated the CreatorAgent to parse titles from AI responses, ensuring titles are extracted from <TITLE> tags.
- Added a new option in CharacterCardImportOptions to enable episode title generation from alternate greetings.
- Implemented logic in the loading process to generate titles for episodes if the new option is enabled.
- Enhanced the UI in CharacterCardImport.vue to include a toggle for generating episode titles during character import.
2025-11-18 03:07:39 +02:00
vegu-ai-tools
cc8579b554 Refactor CharacterCardImport component to enhance character selection UI
- Replaced radio buttons with a tabbed interface for selecting character modes: Default Template, Detected Character, and Import from Another Scene.
- Updated the layout to use a window component for displaying the corresponding input fields based on the selected character mode.
- Improved overall user experience with better organization and visual clarity in character selection options.
2025-11-18 02:29:29 +02:00
vegu-ai-tools
5741157483 Enhance character parsing logic in character_card.py
- Updated _parse_characters_from_greeting_text to accept a scene parameter for improved character validation.
- Implemented logic to validate character names against actual characters in the scene, including partial matches for NPCs.
- Added functionality to activate up to 2 NPCs if no characters are detected from the greeting text.
- Adjusted related calls to ensure compatibility with the new parsing method.
2025-11-18 02:23:04 +02:00
vegu-ai-tools
50234a4d88 Refactor WorldStateManagerSceneSharedContext for improved layout and structure
- Simplified the template structure by removing unnecessary nested rows and columns in WorldStateManagerSceneSharedContext.vue.
- Enhanced the UI layout for better readability and user experience in managing shared contexts.
- Updated WorldStateManagerSceneSharedWorld.vue to maintain consistent structure when integrating the SharedContext component.
2025-11-18 02:11:06 +02:00
vegu-ai-tools
9ed511139f Refactor WorldStateManagerScene components and introduce SharedWorld
- Renamed WorldStateManagerSceneSharedContext to WorldStateManagerSceneSharedWorld for clarity.
- Created a new WorldStateManagerSceneSharedWorld component to encapsulate shared context functionality.
- Updated references and imports across components to reflect the new structure.
- Enhanced UI elements for better user experience in managing shared contexts and episodes.
2025-11-18 02:08:56 +02:00
vegu-ai-tools
3fe34c6688 Implement auto-selection of characters based on episode intro in WorldStateManagerSceneSharedContext
- Added functionality to automatically select characters mentioned in the episode intro when the dialog is open and characters are available.
- Introduced a new method, autoSelectCharactersFromIntro, to handle the selection logic based on character names found in the intro text.
- Enhanced the existing episode selection handling to trigger character auto-selection when an episode is selected.
2025-11-18 02:05:56 +02:00
vegu-ai-tools
85777a4d2a Add sharing functionality for world entries and characters in WorldStateManager
- Implemented methods to share and unshare all world entries and characters in the WorldStateManagerPlugin and CharacterMixin classes.
- Enhanced the WorldStateManagerSceneSharedContext.vue component with UI elements to trigger sharing actions for characters and world entries.
- Added corresponding websocket actions to facilitate sharing and unsharing operations from the frontend.
2025-11-18 01:59:36 +02:00
vegu-ai-tools
fe854141e7 Refactor WorldStateManagerScene components for improved UI and functionality
- Removed unnecessary icon slots in WorldStateManagerSceneEpisodes.vue for cleaner design.
- Updated icon colors in WorldStateManagerSceneSharedContext.vue to enhance visual hierarchy.
- Streamlined dialog components for creating new scenes and shared contexts, ensuring consistent layout and user experience.
2025-11-18 01:54:14 +02:00
vegu-ai-tools
da0be7a9ff Add shared context setup option in CharacterCardImport and related components
- Introduced a new toggle in CharacterCardImport.vue for setting up shared context during character import.
- Updated CharacterCardImportOptions model to include a flag for shared context setup.
- Implemented _setup_shared_context_for_import function in character_card.py to handle shared context creation and character marking.
- Enhanced WorldStateManagerSceneSharedContext.vue to display shared context details and manage linked files more effectively.
2025-11-18 01:51:19 +02:00
vegu-ai-tools
d78c2398e4 Update icons and styling in WorldStateManagerScene components for improved UI consistency
- Changed the delete button icon in WorldStateManagerSceneEpisodes.vue to mdi-close-circle-outline for better visual clarity.
- Added margin-top class to the card text in WorldStateManagerSceneSharedContext.vue to enhance layout spacing.
2025-11-18 01:30:14 +02:00
vegu-ai-tools
096af6486e Enhance WorldStateManagerScene components with new scene and shared context features
- Added a new section for creating scenes with shared context in WorldStateManagerSceneSharedContext.vue.
- Improved UI elements including buttons and chips for better user interaction.
- Updated episode action buttons in WorldStateManagerSceneEpisodes.vue for clarity and consistency.
2025-11-18 01:26:44 +02:00
vegu-ai-tools
8e3c6fe166 Add episode editing functionality and enhance WorldStateManagerScene components
- Introduced a dialog for adding and editing episodes in WorldStateManagerSceneEpisodes.vue.
- Added buttons for editing selected episodes and improved form validation.
- Updated WorldStateManagerScene and WorldStateManagerSceneSharedContext to accept templates and generation options as props for better context management.
2025-11-18 01:07:17 +02:00
vegu-ai-tools
b097fe9439 refactor scene_versions into episodes 2025-11-18 00:57:53 +02:00
vegu-ai-tools
b8c46996b2 move import 2025-11-17 01:34:01 +02:00
vegu-ai-tools
b3cdf11d9d linting 2025-11-17 01:32:17 +02:00
vegu-ai-tools
f238b890b4 Refactor scene loading by introducing scene_stub function to create minimal Scene objects, improving asset access without full scene loading. Update SceneAssets class to utilize scene_stub for better efficiency. 2025-11-17 01:26:34 +02:00
vegu-ai-tools
30ce1541b9 transfer cover image 2025-11-17 01:22:56 +02:00
vegu-ai-tools
a46bdb929c linting 2025-11-16 23:02:09 +02:00
vegu-ai-tools
b7663ab263 Implement player character setup options in CharacterCardImport component, including template creation, existing character selection, and scene import. Update scene loading logic to accommodate new character data format and enhance character management in the application. 2025-11-16 23:01:56 +02:00
vegu-ai-tools
584b3ae7b9 modules 2025-11-16 22:24:19 +02:00
vegu-ai-tools
dd6181d02a Rename "Restore Scene" to "Reset" in SceneToolsSave component for clarity, and update confirmation prompt action label to "Reset Scene". 2025-11-16 21:31:30 +02:00
vegu-ai-tools
a2c97932d7 Enhance scene restoration process by adding a confirmation prompt in SceneToolsSave component and ensuring restore_from state is preserved in TaleMate class. 2025-11-16 21:25:57 +02:00
vegu-ai-tools
267b552d38 Add save and restore functionality to SceneToolsSave component, including visual indicators for unsaved changes and the ability to restore scenes from specified points. 2025-11-16 21:19:04 +02:00
vegu-ai-tools
814d006c88 Add unified API key support across various components and implement ConfigWidgetUnifiedApiKey for streamlined API key management. 2025-11-16 18:47:00 +02:00
vegu-ai-tools
4e57761553 Refactor CoverImage component by removing unused isPortrait logic and simplifying image handling, enhancing code clarity and maintainability. 2025-11-16 18:28:33 +02:00
vegu-ai-tools
389f0ee9ad linting 2025-11-16 18:25:24 +02:00
vegu-ai-tools
6a5d1f1173 Refactor VisualImageView component to use v-card for image display and adjust image preview styles for better responsiveness 2025-11-16 18:25:14 +02:00
vegu-ai-tools
da739b1d20 move buttons to scene component 2025-11-16 18:10:37 +02:00
vegu-ai-tools
2116d6e552 assets 2025-11-16 18:10:19 +02:00
vegu-ai-tools
36b6ef5b7d add analysis data 2025-11-16 17:40:21 +02:00
vegu-ai-tools
3d4d995710 Add reference_assets field to EditAssetMetaPayload and update related components for improved asset management and metadata handling. 2025-11-16 17:40:12 +02:00
vegu-ai-tools
13087e676a Add metadata to scene cover asset in library.json, including visual type, generation type, resolution, and tags for improved asset management. 2025-11-16 17:26:15 +02:00
vegu-ai-tools
3575a0d67b assets 2025-11-16 17:24:43 +02:00
vegu-ai-tools
b28c28f25d Refine template management description in Templates.vue for clarity and conciseness 2025-11-16 17:23:43 +02:00
vegu-ai-tools
1e9b8f18a1 Add styles for image positioning in IntroRecentScenes component to enhance visual layout 2025-11-16 16:46:56 +02:00
vegu-ai-tools
53dbc2a085 Remove redundant debug logging in migrate_scene_assets_to_library function 2025-11-16 16:46:51 +02:00
vegu-ai-tools
5a12395fb9 assets 2025-11-16 16:41:17 +02:00
vegu-ai-tools
df197ac873 assets 2025-11-16 16:40:59 +02:00
vegu-ai-tools
deab1bf3c3 assets 2025-11-16 16:39:28 +02:00
vegu-ai-tools
0f5731f7b5 Add scene_info method to SceneAssets class for simplified asset data retrieval 2025-11-16 16:24:22 +02:00
vegu-ai-tools
5b5a9e3d68 update assets 2025-11-16 16:24:00 +02:00
vegu-ai-tools
5fed0115f3 Improve image aspect ratio handling in CoverImage component by introducing a delay before checking the aspect ratio. Reset portrait flag upon asset updates for various asset types to ensure accurate orientation detection. 2025-11-16 16:10:51 +02:00
vegu-ai-tools
2589ad36f8 linting 2025-11-16 14:45:52 +02:00
vegu-ai-tools
d2441142e0 module update 2025-11-16 14:45:42 +02:00
vegu-ai-tools
2a645ee0c6 Implement save option for asset analysis in AnalysisRequest and related components. Update websocket handler and frontend to support saving analysis results conditionally based on user input. 2025-11-16 14:45:14 +02:00
vegu-ai-tools
d453ee6781 linting 2025-11-16 13:39:55 +02:00
vegu-ai-tools
eaa9f76181 unified asset library 2025-11-16 13:38:55 +02:00
vegu-ai-tools
6890f4f138 Add unified API key configuration support across various client classes and implement handling for saving the API key in the server config. Update frontend components to manage and display the unified API key. 2025-11-16 12:31:18 +02:00
vegu-ai-tools
1ae3d2669c Enhance CoverImage component with improved aspect ratio handling and responsive styling. Added logic to determine image orientation and updated class bindings for better visual consistency. 2025-11-16 11:44:05 +02:00
vegu-ai-tools
566e03c9e6 Refactor CoverImage and IntroRecentScenes components to improve image handling and styling. Added aspect ratios and updated class usage for better responsiveness and visual consistency. 2025-11-16 11:32:54 +02:00
vegu-ai-tools
dff778c997 relock 2025-11-16 11:28:06 +02:00
vegu-ai-tools
a8d9202c92 Add clarification on content search methodology in WorldStateManagerContextDB.vue 2025-11-16 02:00:15 +02:00
vegu-ai-tools
9dfe5d34db more character info for visual generation 2025-11-16 01:55:59 +02:00
vegu-ai-tools
70fc83afe5 linting 2025-11-16 01:29:03 +02:00
vegu-ai-tools
1730c7f53c director can generate visuals for new characters 2025-11-16 01:28:40 +02:00
vegu-ai-tools
4cdf3f240f linting 2025-11-15 21:07:56 +02:00
vegu-ai-tools
693052046b Enhance loading status setup in character card import by adding parameters for character count and character book presence. Adjust loading steps calculation for improved accuracy based on these parameters. 2025-11-15 21:07:10 +02:00
vegu-ai-tools
b7ea6c9d7f linting 2025-11-15 21:03:48 +02:00
vegu-ai-tools
77a14ab3db Refactor character management to utilize ClientContext for scene detection, enhancing flexibility in handling active scenes. Update context model to include 'requires_active_scene' attribute and streamline error handling in client base. 2025-11-15 21:03:39 +02:00
vegu-ai-tools
0a0cf9427a linting 2025-11-15 20:45:47 +02:00
vegu-ai-tools
170380becc Remove unused 'app-ready' prop from VisualLibrary component to streamline code and improve clarity. 2025-11-15 20:45:03 +02:00
vegu-ai-tools
8ff51def0f Update character card import alert color and icon for improved visibility during analysis 2025-11-15 20:44:56 +02:00
vegu-ai-tools
7431752afe improved character card import 2025-11-15 20:30:43 +02:00
vegu-ai-tools
e4ae19949a Refactor changelog handling to use a local variable for InMemoryChangelog, improving code clarity and maintainability. 2025-11-15 20:17:04 +02:00
vegu-ai-tools
42d2b574db character card import improvements 2025-11-15 19:55:36 +02:00
vegu-ai-tools
833dde378f linting 2025-11-15 19:02:19 +02:00
vegu-ai-tools
3b7df1eb3c support alternative intros 2025-11-15 18:25:41 +02:00
vegu-ai-tools
7f2fc4421c character card lorebook loading 2025-11-15 17:42:36 +02:00
vegu-ai-tools
1a06b70994 Refactor scene cover image handling to emit status only when the scene is active, ensuring proper event management. Update TalemateApp to include 'type' prop for CoverImage component, enhancing flexibility in rendering. 2025-11-15 17:42:21 +02:00
vegu-ai-tools
858b94856b load.py into load module 2025-11-15 16:16:18 +02:00
vegu-ai-tools
5a38958819 Enhance tab selection logic in TalemateApp to prefer 'home' tab when no scene is loaded, improving user experience during initial app interactions. 2025-11-15 16:00:33 +02:00
vegu-ai-tools
c0eaa00872 Update client configuration options to only include enabled clients in the agent's config options. 2025-11-15 15:58:21 +02:00
vegu-ai-tools
7052cedd3f app ready status 2025-11-15 15:05:16 +02:00
vegu-ai-tools
0c77004725 fix pydantic warnings 2025-11-15 13:48:42 +02:00
vegu-ai-tools
93c5c58c10 fix tests 2025-11-15 13:44:00 +02:00
vegu-ai-tools
e5e3091c9c remove runpod integration 2025-11-15 13:40:24 +02:00
vegu-ai-tools
be472b4b15 module fix 2025-11-15 13:34:06 +02:00
veguAI
04e8349975 Visual refactor 2 (#20)
- visual agent refactor
- visual library
- move template management out of world editor
2025-11-15 13:32:50 +02:00
vegu-ai-tools
1f47219b35 enabled needs to be passed to apply config so agents can act on it changing 2025-10-26 21:59:49 +02:00
vegu-ai-tools
e51d1dbcbc set 0.34 2025-10-26 20:14:42 +02:00
348 changed files with 34597 additions and 5779 deletions

View File

@@ -19,6 +19,25 @@ jobs:
    steps:
      - uses: actions/checkout@v4
      - name: Remove unnecessary files to release disk space
        run: |
          sudo rm -rf \
            "$AGENT_TOOLSDIRECTORY" \
            /opt/ghc \
            /opt/google/chrome \
            /opt/microsoft/msedge \
            /opt/microsoft/powershell \
            /opt/pipx \
            /usr/lib/mono \
            /usr/local/julia* \
            /usr/local/lib/android \
            /usr/local/lib/node_modules \
            /usr/local/share/chromium \
            /usr/local/share/powershell \
            /usr/local/share/powershell \
            /usr/share/dotnet \
            /usr/share/swift
      - name: Log in to GHCR
        uses: docker/login-action@v3
        with:

View File

@@ -14,6 +14,25 @@ jobs:
    steps:
      - uses: actions/checkout@v4
      - name: Remove unnecessary files to release disk space
        run: |
          sudo rm -rf \
            "$AGENT_TOOLSDIRECTORY" \
            /opt/ghc \
            /opt/google/chrome \
            /opt/microsoft/msedge \
            /opt/microsoft/powershell \
            /opt/pipx \
            /usr/lib/mono \
            /usr/local/julia* \
            /usr/local/lib/android \
            /usr/local/lib/node_modules \
            /usr/local/share/chromium \
            /usr/local/share/powershell \
            /usr/local/share/powershell \
            /usr/share/dotnet \
            /usr/share/swift
      - name: Log in to GHCR
        uses: docker/login-action@v3
        with:

View File

@@ -45,6 +45,9 @@ WORKDIR /app
RUN apt-get update && apt-get install -y \
    bash \
    wget \
    tar \
    xz-utils \
    && rm -rf /var/lib/apt/lists/*
# Install uv in the final stage
@@ -53,6 +56,21 @@ RUN pip install uv
# Copy virtual environment from backend-build stage
COPY --from=backend-build /app/.venv /app/.venv
# Download and install FFmpeg 8.0 with shared libraries into .venv (matching Windows installer approach)
# Using BtbN FFmpeg builds which provide shared libraries - verified to work
# Note: We tried using jrottenberg/ffmpeg:8.0-ubuntu image but copying libraries from it didn't work properly,
# so we use the direct download approach which is more reliable and matches the Windows installer
RUN cd /tmp && \
    wget -q https://github.com/BtbN/FFmpeg-Builds/releases/download/latest/ffmpeg-master-latest-linux64-gpl-shared.tar.xz -O ffmpeg.tar.xz && \
    tar -xf ffmpeg.tar.xz && \
    cp -a ffmpeg-master-latest-linux64-gpl-shared/bin/* /app/.venv/bin/ && \
    cp -a ffmpeg-master-latest-linux64-gpl-shared/lib/* /app/.venv/lib/ && \
    rm -rf ffmpeg-master-latest-linux64-gpl-shared ffmpeg.tar.xz && \
    LD_LIBRARY_PATH=/app/.venv/lib /app/.venv/bin/ffmpeg -version | head -n 1
# Set LD_LIBRARY_PATH so torchcodec can find ffmpeg libraries at runtime
ENV LD_LIBRARY_PATH=/app/.venv/lib:${LD_LIBRARY_PATH}
# Copy Python source code
COPY --from=backend-build /app/src /app/src
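To confirm that the copied FFmpeg binaries and shared libraries are actually usable by torchcodec at runtime, a small smoke test along the following lines can be run inside the built image. This is a hypothetical sketch, not a script shipped with the repository; it assumes the `/app/.venv` layout used in the Dockerfile above.

```python
# Hypothetical smoke test (not part of the repo): run inside the built image.
import os
import subprocess

VENV = "/app/.venv"
env = dict(os.environ)
env["LD_LIBRARY_PATH"] = f"{VENV}/lib:" + env.get("LD_LIBRARY_PATH", "")

# The copied ffmpeg binary should run once its shared libs are on LD_LIBRARY_PATH.
out = subprocess.run(
    [f"{VENV}/bin/ffmpeg", "-version"],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)
print(out.stdout.splitlines()[0])

# torchcodec resolves the FFmpeg libraries at runtime; importing it should not
# raise a missing-library error when LD_LIBRARY_PATH is set as in the Dockerfile.
import torchcodec  # noqa: E402

print("torchcodec imported OK")
```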

View File

@@ -1,134 +0,0 @@
"""
An attempt to write a client against the runpod serverless vllm worker.
This is close to functional, but since runpod serverless gpu availability is currently terrible, i have
been unable to properly test it.
Putting it here for now since i think it makes a decent example of how to write a client against a new service.
"""
import pydantic
import structlog
import runpod
import asyncio
import aiohttp
from talemate.client.base import ClientBase, ExtraField
from talemate.client.registry import register
from talemate.emit import emit
from talemate.config import Client as BaseClientConfig
log = structlog.get_logger("talemate.client.runpod_vllm")
class Defaults(pydantic.BaseModel):
max_token_length: int = 4096
model: str = ""
runpod_id: str = ""
class ClientConfig(BaseClientConfig):
runpod_id: str = ""
@register()
class RunPodVLLMClient(ClientBase):
client_type = "runpod_vllm"
conversation_retries = 5
config_cls = ClientConfig
class Meta(ClientBase.Meta):
title: str = "Runpod VLLM"
name_prefix: str = "Runpod VLLM"
enable_api_auth: bool = True
manual_model: bool = True
defaults: Defaults = Defaults()
extra_fields: dict[str, ExtraField] = {
"runpod_id": ExtraField(
name="runpod_id",
type="text",
label="Runpod ID",
required=True,
description="The Runpod ID to connect to.",
)
}
def __init__(self, model=None, runpod_id=None, **kwargs):
self.model_name = model
self.runpod_id = runpod_id
super().__init__(**kwargs)
@property
def experimental(self):
return False
def set_client(self, **kwargs):
log.debug("set_client", kwargs=kwargs, runpod_id=self.runpod_id)
self.runpod_id = kwargs.get("runpod_id", self.runpod_id)
def tune_prompt_parameters(self, parameters: dict, kind: str):
super().tune_prompt_parameters(parameters, kind)
keys = list(parameters.keys())
valid_keys = ["temperature", "top_p", "max_tokens"]
for key in keys:
if key not in valid_keys:
del parameters[key]
async def get_model_name(self):
return self.model_name
async def generate(self, prompt: str, parameters: dict, kind: str):
"""
Generates text from the given prompt and parameters.
"""
prompt = prompt.strip()
self.log.debug("generate", prompt=prompt[:128] + " ...", parameters=parameters)
try:
async with aiohttp.ClientSession() as session:
endpoint = runpod.AsyncioEndpoint(self.runpod_id, session)
run_request = await endpoint.run(
{
"input": {
"prompt": prompt,
}
# "parameters": parameters
}
)
while (await run_request.status()) not in [
"COMPLETED",
"FAILED",
"CANCELLED",
]:
status = await run_request.status()
log.debug("generate", status=status)
await asyncio.sleep(0.1)
status = await run_request.status()
log.debug("generate", status=status)
response = await run_request.output()
log.debug("generate", response=response)
return response["choices"][0]["tokens"][0]
except Exception as e:
self.log.error("generate error", e=e)
emit(
"status", message="Error during generation (check logs)", status="error"
)
return ""
def reconfigure(self, **kwargs):
if kwargs.get("model"):
self.model_name = kwargs["model"]
if "runpod_id" in kwargs:
self.api_auth = kwargs["runpod_id"]
self.set_client(**kwargs)

89 binary image files added (contents not shown; sizes range from a few KiB to roughly 2 MiB).

View File

@@ -1,42 +0,0 @@
# AUTOMATIC1111
!!! info
This requires you to set up a local instance of the AUTOMATIC1111 API. Follow the instructions from [their GitHub](https://github.com/AUTOMATIC1111/stable-diffusion-webui) to get it running.
Once you have it running, you will want to adjust the `webui-user.bat` in the AUTOMATIC1111 directory to include the following command arguments:
```bat
set COMMANDLINE_ARGS=--api --listen --port 7861
```
Then run the `webui-user.bat` to start the API.
Once your AUTOMATIC1111 API is running (check with your browser) you can set the Visualizer config to use the `AUTOMATIC1111` backend
## Settings
![Visual agent automatic1111 settings](/talemate/img/0.27.0/automatic1111-settings.png)
##### API URL
The url of the API, if following this example, should be `http://localhost:7861`
##### Steps
The number of steps to use for image generation. More steps will result in higher quality images but will take longer to generate.
##### Sampling Method
Which sampling method to use for image generation.
##### Schedule Type
Which scheduler to use for image generation.
##### CFG Scale
CFG scale for image generation.
##### Model type
Differentiates between `SD1.5` and `SDXL` models. This dictates the resolution of the image generation and matters for quality, so make sure it is set to the correct model type for the model you are using.

View File

@@ -0,0 +1,80 @@
# AUTOMATIC1111
!!! warning "Deprecated Backend"
**AUTOMATIC1111 (A1111) is essentially dead at this point** - development has largely stopped and the project is no longer actively maintained. Support for AUTOMATIC1111 has only been carried forward in Talemate because it was easy to maintain compatibility.
**We strongly recommend using [SD.Next](sdnext.md) instead**, which is an actively maintained fork of AUTOMATIC1111 with improved performance, better features, and ongoing development. SD.Next maintains API compatibility with AUTOMATIC1111, so migration is straightforward.
The AUTOMATIC1111 backend provides basic text-to-image generation capabilities using the AUTOMATIC1111 Stable Diffusion WebUI API. This backend only supports text-to-image generation - it does not support image editing or image analysis.
![This screenshot displays the dark-themed "General" settings interface for an application named "Visualizer," featuring a sidebar menu on the left and configuration options on the right. Key settings include dropdowns where the Client is set to "Google" and the text-to-image backend is set to "AUTOMATIC1111," alongside an image generation timeout slider positioned at 301. Additional controls show a checked box for "Automatic Setup," an unchecked box for "Automatic Generation," and a "Fallback Prompt Type" set to "Keywords."](/talemate/img/0.34.0/visual-agent-a1111-1.png)
## Prerequisites
Before configuring the AUTOMATIC1111 backend, you need to have AUTOMATIC1111 installed and running:
1. Install and start AUTOMATIC1111 Stable Diffusion WebUI on your system
2. Ensure the API is enabled and accessible
3. Note the API URL (default is `http://localhost:7860`)
!!! note "Migration to SD.Next"
If you're setting up a new installation, please use [SD.Next](sdnext.md) instead. If you have an existing AUTOMATIC1111 installation, consider migrating to SD.Next for better performance and ongoing support.
## Configuration
In the Visualizer agent settings, select AUTOMATIC1111 as your backend for text-to-image generation.
### Text-to-Image Configuration
For text-to-image generation, configure the following settings:
- **API URL**: The URL where your AUTOMATIC1111 instance is running (e.g., `http://localhost:7860`)
- **Steps**: Number of sampling steps (default: 40, range: 5-150)
- **Sampling Method**: The sampling algorithm to use (e.g., "DPM++ 2M", "Euler a")
- **Schedule Type**: The sampling schedule to use (e.g., "Automatic", "Karras", "Uniform")
- **CFG Scale**: Classifier-free guidance scale (default: 7.0, range: 1-30)
- **Prompt Type**: Choose between "Keywords" or "Descriptive" prompt formatting
- **Resolutions**: Configure the pixel dimensions for Square, Portrait, and Landscape formats
![A screenshot of the Visualizer interface displaying configuration settings for Automatic1111 text-to-image generation. The panel features adjustable parameters such as the API URL, sampler steps, CFG scale, and sampling method. Additionally, it includes sections for selecting prompting types and setting default resolutions for square, portrait, and landscape image orientations.](/talemate/img/0.34.0/visual-agent-a1111-2.png)
!!! note "No Authentication"
AUTOMATIC1111 backend does not support authentication. If your AUTOMATIC1111 instance requires authentication, you'll need to either disable it or use SD.Next instead, which supports authentication.
!!! note "Model Selection"
AUTOMATIC1111 does not support model selection through the API. The backend will use whatever model is currently loaded in your AUTOMATIC1111 instance. You need to change models manually in the AUTOMATIC1111 WebUI interface.
## Usage
Once configured, the AUTOMATIC1111 backend will appear in the Visualizer agent status with a green indicator showing text-to-image capability is available.
![A dark mode interface element titled "Visualizer" accompanied by a green status dot. Below the title are two badges: a gray button labeled "Google" with a computer icon and a green button labeled "AUTOMATIC1111" with an image icon.](/talemate/img/0.34.0/visual-agent-a1111-3.png)
## Limitations
The AUTOMATIC1111 backend has several limitations compared to SD.Next:
- **No image editing**: Only supports text-to-image generation
- **No authentication**: Cannot connect to instances that require authentication
- **No model selection**: Uses whatever model is loaded in AUTOMATIC1111
- **No active development**: The AUTOMATIC1111 project is no longer actively maintained
## Sampler Settings
AUTOMATIC1111 provides control over the generation process:
- **Steps**: More steps generally produce higher quality images but take longer. Typical values range from 20-50 steps, with 40 being a good default.
- **Sampling Method**: Different samplers produce different results. Popular options include:
- **DPM++ 2M**: Fast and high quality (default)
- **Euler a**: Fast, good for quick iterations
- **DPM++ SDE**: Variant with different characteristics
- **Schedule Type**: Controls the noise schedule used during sampling. "Automatic" is typically the best choice.
- **CFG Scale**: Controls how closely the model follows your prompt. Lower values (1-7) allow more creative freedom, higher values (7-15) stick closer to the prompt.
## Prompt Formatting
AUTOMATIC1111 uses **Keywords** prompt formatting by default. This means prompts are formatted as keyword lists optimized for Stable Diffusion models. You can switch to **Descriptive** formatting if you prefer natural language descriptions, though Keywords typically work better with SD models.
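To make the mapping between these settings and the underlying API concrete, here is a minimal, hypothetical sketch of a direct `txt2img` request against an AUTOMATIC1111 instance. The field names (`steps`, `cfg_scale`, `sampler_name`, ...) are standard AUTOMATIC1111 web API parameters; the exact payload Talemate constructs may differ.

```python
# Hypothetical direct call to the AUTOMATIC1111 txt2img endpoint -- illustrative
# only; this is not the request Talemate itself builds.
import base64
import requests

API_URL = "http://localhost:7860"  # the "API URL" setting

payload = {
    "prompt": "portrait of a weathered sea captain, oil painting, dramatic lighting",
    "negative_prompt": "blurry, low quality",
    "steps": 40,                 # "Steps"
    "cfg_scale": 7.0,            # "CFG Scale"
    "sampler_name": "DPM++ 2M",  # "Sampling Method"
    "width": 1216,               # resolution, e.g. a landscape preset
    "height": 832,
}

resp = requests.post(f"{API_URL}/sdapi/v1/txt2img", json=payload, timeout=300)
resp.raise_for_status()

# The API returns base64-encoded image data in the "images" list.
with open("output.png", "wb") as f:
    f.write(base64.b64decode(resp.json()["images"][0]))
```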
## Automatic Setup with KoboldCpp
If you're using KoboldCpp with AUTOMATIC1111 support, Talemate can automatically detect and configure the AUTOMATIC1111 backend when "Automatic Setup" is enabled in the Visualizer settings. This will automatically set the API URL to match your KoboldCpp instance URL.

View File

@@ -0,0 +1,166 @@
# ComfyUI
## Prepare ComfyUI
This document assumes you have installed ComfyUI (either the portable or the desktop version).
Copy the .bat file you use to start ComfyUI and add the `--port` parameter.
```
--port 8188
```
You can put any port you want, but this example will use 8188.
!!! note "If you are using a remote ComfyUI instance"
If you are using a remote ComfyUI instance, you may want to add the `--listen` parameter as well.
```
--listen 0.0.0.0
```
You will then also need to obtain the IP address of the computer running ComfyUI and use it in the Talemate configuration (instead of localhost).
Confirm ComfyUI is running in your browser by visiting http://localhost:8188 or `http://<ip-address>:8188` before proceeding to Talemate.
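If you prefer a scriptable check (for example on a headless server), a quick probe of the same URL works too; this is a hypothetical helper, not part of Talemate or ComfyUI.

```python
# Hypothetical reachability check for the ComfyUI server configured above.
import requests

COMFYUI_URL = "http://localhost:8188"  # or http://<ip-address>:8188 for a remote instance

resp = requests.get(COMFYUI_URL, timeout=5)
resp.raise_for_status()
print(f"ComfyUI reachable at {COMFYUI_URL} (HTTP {resp.status_code})")
```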
## Talemate configuration
In the Visualizer agent settings, select ComfyUI as your backend for text-to-image generation, image editing, or both. You'll need to configure each backend separately if you want to use ComfyUI for different operations.
![The image displays the General settings tab of the Visualizer interface, featuring a sidebar with active indicators for ComfyUI and Google modules. Dropdown menus in the main panel show ComfyUI selected for text-to-image and image editing backends, with Google selected for image analysis. The interface also includes an image generation timeout slider set to 301 and an enabled Automatic Setup checkbox.](/talemate/img/0.34.0/visual-agent-comfyui-1.png)
### Text-to-Image Configuration
For text-to-image generation, configure the following settings:
- **API URL**: The URL where your ComfyUI instance is running (e.g., `http://localhost:8188`)
- **Workflow**: Select the workflow file to use for generation. Talemate includes several pre-configured workflows including `qwen_image.json` and `z_image_turbo.json`
- **Model**: Select the model to use from your ComfyUI models directory. If your workflow doesn't include a "Talemate Load Model" or "Talemate Load Checkpoint" node, this will be set to "- Workflow default -" and the model specified in the workflow file will be used.
- **Prompt Type**: Choose between "Keywords" or "Descriptive" prompt formatting
!!! tip "Choosing Prompt Type"
As a general rule: **SDXL models** typically work best with **Keywords** formatting, while most other models (including Qwen Image, Flux, etc.) work better with **Descriptive** formatting. If you're unsure, start with Descriptive and switch to Keywords if you're using an SDXL-based workflow.
- **Resolutions**: Configure the pixel dimensions for Square, Portrait, and Landscape formats
![A screenshot of the "Visualizer" application's dark-mode settings panel specifically for ComfyUI text-to-image generation. The interface features configuration fields for the API URL, a workflow dropdown set to "z_image_turbo.json," model selection, and a "Descriptive" prompting type. The lower section includes adjustable numeric inputs for defining pixel dimensions for Square, Portrait, and Landscape image resolutions.](/talemate/img/0.34.0/visual-agent-comfyui-2.png)
![This screenshot displays the dark-themed settings interface of an application named "Visualizer," specifically configured for ComfyUI text-to-image generation. The main panel features input fields for the API URL, workflow selection (set to default-sdxl), and model choice (juggernautXL), along with a prompting type setting. Below these options is a "Resolutions" section allowing users to define specific pixel dimensions for Square, Portrait, and Landscape image outputs.](/talemate/img/0.34.0/visual-agent-comfyui-3.png)
### Image Editing Configuration
For image editing, configure similar settings but select an image editing workflow such as `qwen_image_edit.json`. The number of reference images supported depends on your model - for example, Qwen Image Edit can handle up to 3 reference images that can be used to guide the editing process.
!!! note "Prompt Type for Image Editing"
Image editing workflows typically use **Descriptive** prompt formatting by default, as most image editing models (like Qwen Image Edit) work better with descriptive instructions rather than keyword-based prompts.
![A screenshot of the "Visualizer" application settings interface, specifically showing the configuration panel for "ComfyUI Image Editing." The main view displays input fields for the API URL, a selected workflow file named "qwen_image_edit.json," descriptive prompting settings, and resolution presets for square, portrait, and landscape aspect ratios.](/talemate/img/0.34.0/visual-agent-comfyui-4.png)
![This screenshot shows a browser tab group labeled "Visualizer" marked with a green status dot on a dark background. The group contains four tabs: a Google link, two green-tinted ComfyUI tabs with image and pencil icons, and a gray tab titled "References 3".](/talemate/img/0.34.0/visual-agent-comfyui-5.png)
## Custom workflow creation
Talemate comes with pre-configured workflows for Qwen Image models (`qwen_image.json` for text-to-image and `qwen_image_edit.json` for image editing). However, since there are many variables in ComfyUI setups (different model formats like GGUF vs safetensors, custom LoRAs, different hardware configurations, etc.), you may want to customize these workflows to match your specific setup.
### Starting from a Template
Open ComfyUI in your browser and navigate to the templates menu. ComfyUI includes workflow templates that you can use as a starting point:
- **Qwen Image**: For text-to-image generation
- **Qwen Image Edit**: For image editing workflows
These templates provide a good foundation for creating custom workflows.
![A dark-themed dropdown menu from a software interface is shown, featuring a header labeled "image_qwen_image." The menu lists standard options such as New, File, Edit, View, and Theme, followed by specific actions like Browse Templates, Settings, Manage Extensions, and Help.](/talemate/img/0.34.0/comfyui.workflow.setup.browse-templates.png)
![A product card for the "Qwen-Image Text to Image" AI model, displaying a sample generation of a rainy, neon-lit street scene with vibrant pink and blue signage. The image demonstrates the model's capabilities by clearly rendering complex multilingual text, such as Chinese characters and English words like "HAPPY HAIR," on the storefronts. Below the visual, a brief description highlights the tool's exceptional text rendering and editing features.](/talemate/img/0.34.0/comfyui.workflow.setup.qwen-template.png)
Load the Qwen Image template to see the base workflow structure.
![A screenshot of a ComfyUI workflow designed for the Qwen-Image diffusion model, featuring grouped nodes for model loading, image sizing, and text prompting. The interface includes detailed instructional notes regarding VRAM usage on an RTX 4090D, model storage locations, and optimal KSampler settings. A positive prompt node is visible containing a detailed description of a neon-lit Hong Kong street scene.](/talemate/img/0.34.0/comfyui.workflow.setup.qwen-start.png)
### Naming Nodes for Talemate
For Talemate to properly interact with your workflow, you need to rename specific nodes with exact titles. These titles allow Talemate to inject prompts, set resolutions, and handle reference images automatically.
**Required Node Titles:**
1. **Talemate Positive Prompt**: The node that encodes the positive prompt (typically a `CLIPTextEncode` or `TextEncodeQwenImageEditPlus` node). This is required - workflows without this node will fail validation.
2. **Talemate Negative Prompt**: The node that encodes the negative prompt (same node types as above)
3. **Talemate Resolution**: The node that sets the image dimensions (typically an `EmptySD3LatentImage` or similar latent image node)
**Optional Node Titles:**
- **Talemate Load Model** or **Talemate Load Checkpoint**: If you want to allow model selection from Talemate's settings, rename your model loader node (typically `CheckpointLoaderSimple`, `UNETLoader`, or `UnetLoaderGGUF`) to one of these titles. If this node is not present, Talemate will use the model specified in the workflow file itself, and the model dropdown will show "- Workflow default -" as the only option.
To rename a node, right-click on it and select "Rename" or double-click the node title, then enter the exact title name.
![A screenshot of a node-based interface labeled "Step 3 - Prompt," featuring a green "Talemate Positive Prompt" node containing a detailed text description of a vibrant, neon-lit Hong Kong street scene. The text specifies a 1980s cinematic atmosphere and lists numerous specific shop signs in both Chinese and English. Below it, a dark red "Talemate Negative Prompt" node is visible but currently contains no text.](/talemate/img/0.34.0/comfyui.workflow.setup.talemate-prompts.png)
![This image displays a dark green interface node labeled "Talemate Positive Prompt," typical of a node-based editor like ComfyUI. It features a yellow input connection point for "clip" on the left, an orange output point for "CONDITIONING" on the right, and a large, dark text entry field in the center containing the placeholder word "text".](/talemate/img/0.34.0/comfyui.workflow.setup.talemate-empty-prompt.png)
![A screenshot of a dark gray interface node labeled "Talemate Resolution" with the identifier #58. It features configurable fields for width and height, both set to 1328, and a batch size of 1. The node has a single output connection point labeled "LATENT".](/talemate/img/0.34.0/comfyui.workflow.setup.talemate-resulotion.png)
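Under the hood, an API-format workflow is a JSON map of node ids to node definitions, and these titles are what make the nodes findable. The following is a minimal sketch of that mechanism - not Talemate's actual code - assuming a workflow exported with "Export (API)", where each node carries `class_type`, `inputs`, and a `_meta.title` field:

```python
import json

# Load a workflow exported via "Export (API)". The file name is an example.
with open("qwen_image.json", "r", encoding="utf-8") as f:
    workflow = json.load(f)

def find_by_title(workflow: dict, title: str) -> dict | None:
    """Return the first node whose _meta.title matches the given Talemate title."""
    for node in workflow.values():
        if node.get("_meta", {}).get("title") == title:
            return node
    return None

# Inject prompts into the renamed text-encode nodes.
positive = find_by_title(workflow, "Talemate Positive Prompt")
if positive is None:
    raise ValueError("Workflow is missing a 'Talemate Positive Prompt' node")
positive["inputs"]["text"] = "a rain-soaked neon street at night"

negative = find_by_title(workflow, "Talemate Negative Prompt")
if negative is not None:
    negative["inputs"]["text"] = "blurry, low quality"

# Set the image dimensions on the renamed latent-image node.
resolution = find_by_title(workflow, "Talemate Resolution")
if resolution is not None:
    resolution["inputs"]["width"] = 1328
    resolution["inputs"]["height"] = 1328
```

If a title doesn't match exactly, the node simply isn't found, which is why the exact spelling of the titles above matters.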
### Activating the Lightning LoRA (Optional)
The Qwen Image template includes a Lightning LoRA node that is deactivated by default. You can optionally activate it to speed up generation with fewer steps. Note that this is a trade-off: the Lightning LoRA reduces generation time but may degrade image quality compared to using more steps without the LoRA.
To activate the Lightning LoRA:
1. Find the `LoraLoaderModelOnly` node in your workflow (it should already be present in the Qwen template)
2. Connect it between your model loader and sampler if it's not already connected
3. Load the appropriate Lightning LoRA file (e.g., `Qwen-Image-Lightning-8steps-V1.0.safetensors` for 8-step generation)
4. Adjust your sampler settings:
- **Steps**: Reduce to 8 steps (or 4 steps for the 4-step variant)
- **CFG Scale**: Set to 1.0 (lower than typical values)
![This screenshot features a "LoraLoaderModelOnly" node within a ComfyUI workflow, customized with the label "Lightx2v 8steps LoRA". It shows the selection of a "Qwen-Image-Lightning-8steps" LoRA file with a model strength parameter set to 1.00. Purple connection cables are visible attached to the input and output model terminals.](/talemate/img/0.34.0/comfyui.workflow.setup.lighting-lora.png)
![The image shows a close-up of a dark user interface panel containing two adjustable setting fields. The top field is labeled "steps" and displays a value of 8, flanked by left and right directional arrows. Below that, a second field labeled "cfg" shows a value of 1.0, also with adjustment arrows on either side.](/talemate/img/0.34.0/comfyui.workflow.setup.lighting-lora-sampler-changes.png)
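In the exported API JSON, these sampler changes are just values on the `KSampler` node's inputs. A trimmed, hypothetical excerpt (node connections and unrelated fields are omitted; sampler and scheduler names are whatever your template uses):

```python
# Hypothetical excerpt of the KSampler node from an "Export (API)" workflow
# after activating the 8-step Lightning LoRA.
ksampler_node = {
    "class_type": "KSampler",
    "inputs": {
        "steps": 8,               # reduced from the template default
        "cfg": 1.0,               # Lightning LoRAs are typically run at CFG 1.0
        "sampler_name": "euler",  # example value - keep whatever the template uses
        "scheduler": "simple",    # example value
        "denoise": 1.0,
        # model / positive / negative / latent_image connections omitted
    },
}
```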
### Image Editing Workflows: Reference Nodes
For image editing workflows (like `qwen_image_edit.json`), you need to add reference image nodes. Note that ComfyUI includes a Qwen Image Edit template similar to the Qwen Image template, which you can use as a starting point.
!!! warning "Reference Nodes Required"
Image editing workflows **must** define at least one reference node. If your workflow doesn't include any nodes titled "Talemate Reference 1" (or higher), the backend status will show an error and image editing will not work.
These are `LoadImage` nodes that Talemate will use to inject reference images for editing.
The number of reference nodes you can add depends on your model's capabilities. For example, Qwen Image Edit supports up to 3 reference images. Add `LoadImage` nodes and rename them with these exact titles:
- **Talemate Reference 1**
- **Talemate Reference 2**
- **Talemate Reference 3** (if your model supports it)
These nodes should be connected to your prompt encoding nodes (for Qwen Image Edit, use `TextEncodeQwenImageEditPlus` nodes that accept image inputs).
![Three identical interface nodes labeled "Talemate Reference 1," "2," and "3" are arranged horizontally within a dark-themed node-based editor. Each node features output ports for "IMAGE" and "MASK," along with a file selection field showing "image_qwen_image_edit" and a "choose file to upload" button. Blue and red connection wires link these nodes to other off-screen elements in the workflow.](/talemate/img/0.34.0/comfyui.workflow.setup.talemate-references.png)
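As with the prompt and resolution nodes, these are located by title in the exported API JSON; each is a plain `LoadImage` node whose `image` input names a file in ComfyUI's input directory. A minimal sketch, reusing the `find_by_title` helper from the earlier example (file names and the upload step are assumptions, not Talemate's actual code):

```python
# Hypothetical: point each "Talemate Reference N" LoadImage node at an image
# that has already been uploaded to ComfyUI's input directory.
reference_files = ["character.png", "outfit.png", "background.png"]

for index, filename in enumerate(reference_files, start=1):
    node = find_by_title(workflow, f"Talemate Reference {index}")
    if node is None:
        break  # the workflow defines fewer reference slots than we have images
    node["inputs"]["image"] = filename
```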
### Saving and Exporting the Workflow
Once your workflow is configured, you need to save it and export it in the API format for Talemate to use it.
1. **Save the workflow**: Use File → Save As to save your workflow as a `.json` file in your ComfyUI workflows directory
2. **Export for API**: Use File → Export (API) to create the API-compatible version
!!! warning "Export vs Export (API)"
It's critical to use **"Export (API)"** and not just "Export". The regular export format is not compatible with Talemate's API integration. The API export format includes the necessary metadata and structure that Talemate expects.
![A screenshot of a dark-themed software interface menu with the "File" option selected, revealing a nested sub-menu. The sub-menu lists file management commands, with the "Save As" option highlighted among choices like Open, Save, and Export.](/talemate/img/0.34.0/comfyui.workflow.setup.qwen-save.png)
![This image displays a dark-themed user interface menu, likely from ComfyUI, with the "File" category expanded. A submenu lists options including Open, Save, and Save As, while the "Export (API)" option is currently highlighted at the bottom. This visual illustrates how to locate the API export function within the software's file management system.](/talemate/img/0.34.0/comfyui.workflow.setup.qwen-export.png)
After exporting, place the workflow JSON file in Talemate's `templates/comfyui-workflows` directory. Once placed there, it will automatically appear in the workflow dropdown in Talemate's ComfyUI settings.
!!! note "Workflow File Location"
Workflow files must be placed in Talemate's `templates/comfyui-workflows` directory, not ComfyUI's workflows directory. Talemate loads workflows from its own templates directory to ensure compatibility and proper integration.
!!! tip "Workflow Not Appearing?"
If your workflow file doesn't appear in the agent's settings dropdown after placing it in the correct directory, try reloading the Talemate browser window. The workflow list is refreshed when the page loads.
!!! info "Hot-Reloading Workflows"
Changes to workflow files are automatically detected and reloaded by the agent. After modifying a workflow file, your changes will be applied to the next image generation without needing to restart Talemate or reload the browser window.

View File

@@ -0,0 +1,101 @@
# Google
The Google backend provides image generation, editing, and analysis capabilities using Google's Gemini image models. It supports text-to-image generation, image editing with reference images, and AI-powered image analysis.
![A screenshot of the "Visualizer" application settings interface with the "General" tab selected. It shows configuration dropdowns for Client and various Backends (text to image, image editing, image analysis) all set to "Google," alongside an image generation timeout slider positioned at 301. Additional settings include a checked "Automatic Setup" box, an unchecked "Automatic Generation" box, and a "Fallback Prompt Type" menu set to "Keywords."](/talemate/img/0.34.0/visual-agent-google-4.png)
## Prerequisites
Before configuring the Google backend, you need to obtain a Google API key:
1. Go to [Google AI Studio](https://aistudio.google.com/app/apikey)
2. Sign in with your Google account
3. Create a new API key or use an existing one
4. Copy the API key
Then configure it in Talemate:
1. Open Talemate Settings → Application → Google
2. Paste your Google API key in the "Google API Key" field
3. Save your changes
!!! note "API Key vs Vertex AI Credentials"
The Visualizer agent uses the Google API key (not Vertex AI service account credentials). Make sure you're using the API key from Google AI Studio, not the service account JSON file used for Vertex AI.
## Configuration
In the Visualizer agent settings, select Google as your backend for text-to-image generation, image editing, image analysis, or any combination of these. Each operation can be configured separately.
### Text-to-Image Configuration
For text-to-image generation, configure the following settings:
- **Google API Key**: Your Google API key (configured globally in Talemate Settings)
- **Model**: Select the image generation model to use:
- **gemini-2.5-flash-image**: Faster generation, good quality
- **gemini-3-pro-image-preview**: Higher quality, slower generation
![A dark-themed settings interface for a "Visualizer" application, specifically showing the "Google Text to Image" configuration panel. The main view displays a masked input field for a configured Google API Key and a dropdown menu selecting the "gemini-3-pro-image-preview" model.](/talemate/img/0.34.0/visual-agent-google-5.png)
The Google backend automatically handles aspect ratios based on the format you select:
- **Landscape**: 16:9 aspect ratio
- **Portrait**: 9:16 aspect ratio
- **Square**: 1:1 aspect ratio
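For reference, generation with these models goes through the standard Gemini `generateContent` endpoint; Talemate builds and sends the request for you. A minimal sketch using `requests` - the endpoint shape and response field names follow Google's REST documentation and are assumptions here, not Talemate internals:

```python
import base64
import requests

API_KEY = "your-google-api-key"  # set in Talemate Settings -> Application -> Google
MODEL = "gemini-2.5-flash-image"

url = f"https://generativelanguage.googleapis.com/v1beta/models/{MODEL}:generateContent"
payload = {
    "contents": [
        {"parts": [{"text": "A lighthouse on a stormy coast at dusk, cinematic lighting"}]}
    ]
}

resp = requests.post(url, params={"key": API_KEY}, json=payload, timeout=300)
resp.raise_for_status()

# Generated images come back base64-encoded inside the candidate's content parts.
for part in resp.json()["candidates"][0]["content"]["parts"]:
    if "inlineData" in part:
        with open("output.png", "wb") as f:
            f.write(base64.b64decode(part["inlineData"]["data"]))
```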
### Image Editing Configuration
For image editing, configure similar settings but with an additional option:
- **Google API Key**: Your Google API key
- **Model**: Select the image generation model (same options as text-to-image)
- **Max References**: Configure the maximum number of reference images (1-3). This determines how many reference images you can provide when editing an image.
![A dark-themed configuration interface for the "Visualizer" application displaying settings for the "Google Image Editing" tab. The panel features a configured Google API key section and a dropdown menu selecting the "gemini-3-pro-image-preview" model. A slider control at the bottom sets the "Max References" value to 3.](/talemate/img/0.34.0/visual-agent-google-6.png)
!!! note "Reference Images"
Google's image editing models can use up to 3 reference images to guide the editing process. The "Max References" setting controls how many reference images Talemate will send to the API. You can adjust this based on your needs, but keep in mind that more references may provide better context for complex edits.
### Image Analysis Configuration
For image analysis, configure the following:
- **Google API Key**: Your Google API key
- **Model**: Select a vision-capable text model:
- **gemini-2.5-flash**: Fast analysis, good for general use
- **gemini-2.5-pro**: Higher quality analysis
- **gemini-3-pro-preview**: Latest model with improved capabilities
!!! note "Analysis Models"
Image analysis uses text models that support vision capabilities, not the image generation models. These models can analyze images and provide detailed descriptions, answer questions about image content, and extract information from visual content.
## Usage
Once configured, the Google backend will appear in the Visualizer agent status with green indicators showing which capabilities are available.
![A dark-themed user interface panel titled "Visualizer" marked with a green status indicator. Below the title are several clickable buttons, including a "References 3" button and four "Google" buttons distinguished by icons representing screen, image, edit, and search functions.](/talemate/img/0.34.0/visual-agent-google-8.png)
The status indicators show:
- **Text to Image**: Available when text-to-image backend is configured
- **Image Edit**: Available when image editing backend is configured (shows max references if configured)
- **Image Analysis**: Available when image analysis backend is configured
## Model Recommendations
### Text-to-Image and Image Editing
- **gemini-2.5-flash-image**: Best for faster generation and general use. Good balance of speed and quality.
- **gemini-3-pro-image-preview**: Best for higher quality results when speed is less important. Use when you need the best possible image quality.
### Image Analysis
- **gemini-2.5-flash**: Best for quick analysis and general use cases. Fast responses with good accuracy.
- **gemini-2.5-pro**: Best for detailed analysis requiring higher accuracy and more nuanced understanding.
- **gemini-3-pro-preview**: Best for the latest capabilities and most advanced analysis features.
## Prompt Formatting
The Google backend uses **Descriptive** prompt formatting by default. This means prompts are formatted as natural language descriptions rather than keyword lists. This works well with Google's Gemini models, which are designed to understand natural language instructions.
When generating images, provide detailed descriptions of what you want to create. For image editing, describe the changes you want to make in natural language.

View File

@@ -0,0 +1,121 @@
# OpenAI
The OpenAI backend provides image generation, editing, and analysis capabilities using OpenAI's image models. It supports text-to-image generation with DALL·E 3 and GPT-Image models, image editing with GPT-Image models, and AI-powered image analysis using vision-capable GPT models.
![The image displays the "General" settings tab of the "Visualizer" interface, featuring a dark-themed layout with a sidebar menu on the left. The main panel includes dropdown menus where "Google" is selected as the client and "OpenAI" is chosen for text-to-image, image editing, and image analysis backends. Additional controls show an image generation timeout slider set to 301, checkboxes for automatic setup and generation, and a selector for the fallback prompt type.](/talemate/img/0.34.0/visual-agent-openai-1.png)
## Prerequisites
Before configuring the OpenAI backend, you need to obtain an OpenAI API key:
1. Go to [OpenAI Platform](https://platform.openai.com/api-keys)
2. Sign in with your OpenAI account
3. Create a new API key or use an existing one
4. Copy the API key
Then configure it in Talemate:
1. Open Talemate Settings → Application → OpenAI API
2. Paste your OpenAI API key in the "OpenAI API Key" field
3. Save your changes
For additional instructions, see the [OpenAI API setup guide](/talemate/user-guide/apis/openai/).
## Configuration
In the Visualizer agent settings, select OpenAI as your backend for text-to-image generation, image editing, image analysis, or any combination of these. Each operation can be configured separately.
### Text-to-Image Configuration
For text-to-image generation, configure the following settings:
- **OpenAI API Key**: Your OpenAI API key (configured globally in Talemate Settings)
- **Model**: Select the image generation model to use:
- **dall-e-3**: OpenAI's DALL·E 3 model (widely available)
- **gpt-image-1**: OpenAI's GPT-Image model (may require organization verification)
- **gpt-image-1-mini**: Smaller version of GPT-Image (may require organization verification)
![A screenshot of the "Visualizer" application settings interface with the "OpenAI Text to Image" tab selected on the left sidebar. The main panel displays a masked input field for a configured OpenAI API key and a dropdown menu set to the "dall-e-3" model.](/talemate/img/0.34.0/visual-agent-openai-2.png)
!!! warning "Organization Verification"
The **gpt-image-1** and **gpt-image-1-mini** models may require your OpenAI organization to be verified before you can use them. If you encounter errors with these models, you may need to complete OpenAI's organization verification process.
!!! note "Model Testing Status"
Talemate's organization is not verified with OpenAI, and we have not tested the **gpt-image-1** and **gpt-image-1-mini** models. We have confirmed that **dall-e-3** works correctly. If you have access to the GPT-Image models and encounter issues, please report them so we can improve support for these models.
The OpenAI backend automatically sets resolution based on the format and model you select:
- **gpt-image-1** and **gpt-image-1-mini**:
- Landscape: 1536x1024
- Portrait: 1024x1536
- Square: 1024x1024
- **dall-e-3**:
- Landscape: 1792x1024
- Portrait: 1024x1792
- Square: 1024x1024
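To see what a request at one of these resolutions looks like outside of Talemate, here is a minimal sketch using the official `openai` Python SDK (the prompt and output handling are illustrative only; Talemate manages this internally):

```python
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

result = client.images.generate(
    model="dall-e-3",
    prompt="A cozy reading nook by a rain-streaked window, warm lamplight",
    size="1792x1024",  # DALL-E 3 landscape resolution
)

# Depending on model and settings the image comes back as a URL or as base64 data.
image = result.data[0]
if image.b64_json:
    with open("scene.png", "wb") as f:
        f.write(base64.b64decode(image.b64_json))
else:
    print(image.url)
```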
### Image Editing Configuration
For image editing, configure similar settings but note that DALL·E 3 does not support image editing:
- **OpenAI API Key**: Your OpenAI API key
- **Model**: Select an image editing model:
- **gpt-image-1**: Full-featured image editing model (may require organization verification)
- **gpt-image-1-mini**: Smaller image editing model (may require organization verification)
![This screenshot displays the settings interface for an application called "Visualizer," specifically showing the "OpenAI Image Editing" configuration panel. The right side features a dropdown menu for selecting the model "gpt-image-1" beneath a configured API key section. An orange notification box at the bottom alerts the user that this specific model may require OpenAI organization verification.](/talemate/img/0.34.0/visual-agent-openai-3.png)
!!! warning "DALL·E 3 Limitations"
DALL·E 3 does not support image editing. If you select DALL·E 3 for image editing, you will receive an error. Use **gpt-image-1** or **gpt-image-1-mini** for image editing instead.
!!! note "Reference Images"
OpenAI's image editing models support a single reference image. When editing an image, provide one reference image that will be used as the base for the edit.
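For illustration only, an edit request with the SDK looks roughly like this - gpt-image-1 access, the file names, and the prompt are assumptions, and Talemate sends the reference image for you:

```python
import base64
from openai import OpenAI

client = OpenAI()

with open("portrait.png", "rb") as reference:
    result = client.images.edit(
        model="gpt-image-1",
        image=reference,
        prompt="Change the background to a moonlit forest, keep the character unchanged",
    )

# gpt-image-1 returns the edited image as base64 data.
with open("portrait_edited.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))
```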
### Image Analysis Configuration
For image analysis, configure the following:
- **OpenAI API Key**: Your OpenAI API key
- **Model**: Select a vision-capable text model:
- **gpt-4.1-mini**: Fast analysis model with vision capabilities
- **gpt-4o-mini**: Alternative vision model option
![This image shows the settings interface for an application named Visualizer, with the "OpenAI Image Analysis" tab selected on the left sidebar. The main panel allows users to configure the OpenAI vision API, displaying a confirmed API key status. A dropdown menu below specifically indicates that the "gpt-4.1-mini" model is selected.](/talemate/img/0.34.0/visual-agent-openai-4.png)
!!! note "Analysis Models"
Image analysis uses text models that support vision capabilities, not the image generation models. These models can analyze images and provide detailed descriptions, answer questions about image content, and extract information from visual content.
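Analysis requests are ordinary chat completions with an image attached to the user message. A minimal sketch (the question and file handling are illustrative; Talemate builds these requests itself):

```python
import base64
from openai import OpenAI

client = OpenAI()

with open("scene.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this scene in two sentences."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```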
## Usage
Once configured, the OpenAI backend will appear in the Visualizer agent status with green indicators showing which capabilities are available.
![This image captures a dark-mode user interface section titled "Visualizer," marked by an active green status dot. Below the title, there are several pill-shaped tags or buttons representing data sources, including "Google," "References 1," and three distinct "OpenAI" options. The OpenAI buttons are highlighted in green, distinguishing them from the greyed-out Google and References buttons.](/talemate/img/0.34.0/visual-agent-openai-5.png)
The status indicators show:
- **Text to Image**: Available when text-to-image backend is configured
- **Image Edit**: Available when image editing backend is configured (shows "References 1" indicating single reference support)
- **Image Analysis**: Available when image analysis backend is configured
## Model Recommendations
### Text-to-Image
- **dall-e-3**: Most widely available option. Good for general use, though quality may vary.
- **gpt-image-1**: Higher quality option, but requires organization verification. Use if you have access and need better results.
- **gpt-image-1-mini**: Smaller version of GPT-Image, faster generation. Requires organization verification.
### Image Editing
- **gpt-image-1**: Best quality for image editing. Requires organization verification.
- **gpt-image-1-mini**: Faster editing option. Requires organization verification.
### Image Analysis
- **gpt-4.1-mini**: Recommended default for image analysis. Fast and accurate.
- **gpt-4o-mini**: Alternative option if you prefer this model.
## Prompt Formatting
The OpenAI backend uses **Descriptive** prompt formatting by default. This means prompts are formatted as natural language descriptions rather than keyword lists. Provide detailed, natural language descriptions of what you want to create or edit.

View File

@@ -0,0 +1,119 @@
# OpenRouter
The OpenRouter backend provides access to image generation, editing, and analysis capabilities through OpenRouter's unified API. OpenRouter allows you to access multiple AI providers through a single API, giving you flexibility to choose from various models and providers.
![A dark-themed settings interface for the "Visualizer" application, displaying a sidebar with General, OpenRouter, and Styles navigation options. The main panel allows configuration of backend services, showing "OpenRouter" selected for text-to-image, image editing, and image analysis, with "Google" set as the client. Additional controls include a slider for image generation timeout set to 301 and checkboxes for automatic setup and generation.](/talemate/img/0.34.0/visual-agent-openrouter-1.png)
## Prerequisites
Before configuring the OpenRouter backend, you need to obtain an OpenRouter API key:
1. Go to [OpenRouter Keys](https://openrouter.ai/settings/keys)
2. Sign in with your account
3. Create a new API key or use an existing one
4. Copy the API key
Then configure it in Talemate:
1. Open Talemate Settings → Application → OpenRouter API
2. Paste your OpenRouter API key in the "OpenRouter API Key" field
3. Save your changes
For additional instructions, see the [OpenRouter API setup guide](/talemate/user-guide/apis/openrouter/).
## Configuration
In the Visualizer agent settings, select OpenRouter as your backend for text-to-image generation, image editing, image analysis, or any combination of these. Each operation can be configured separately.
### Text-to-Image Configuration
For text-to-image generation, configure the following settings:
- **OpenRouter API Key**: Your OpenRouter API key (configured globally in Talemate Settings)
- **Model**: Select an image generation model from OpenRouter. The model list is dynamically populated based on models available through your OpenRouter account.
- **Only use these providers**: Optionally filter to specific providers (e.g., only use Google or OpenAI)
- **Ignore these providers**: Optionally exclude specific providers from consideration
![This screenshot depicts the "Visualizer" settings interface, specifically the "OpenRouter Text to Image" configuration tab. The panel displays an active API Key section, a model selection dropdown currently set to "google/gemini-2.5-flash-image", and additional options to filter specific service providers.](/talemate/img/0.34.0/visual-agent-openrouter-2.png)
!!! warning "Model Selection"
There is no reliable way for Talemate to determine which models support text-to-image generation, so the model list is unfiltered. Please consult the [OpenRouter documentation](https://openrouter.ai/docs) to verify that your selected model supports image generation before using it.
The OpenRouter backend automatically handles aspect ratios based on the format you select:
- **Landscape**: 16:9 aspect ratio
- **Portrait**: 9:16 aspect ratio
- **Square**: 1:1 aspect ratio
### Image Editing Configuration
For image editing, configure similar settings with an additional option:
- **OpenRouter API Key**: Your OpenRouter API key
- **Model**: Select an image editing model from OpenRouter
- **Max References**: Configure the maximum number of reference images (1-3). This determines how many reference images you can provide when editing an image.
- **Provider filtering**: Optionally filter providers (same as text-to-image)
![This screenshot displays the settings interface for an application named Visualizer, specifically focusing on the "OpenRouter - Image Editing" configuration tab. The main panel features input fields for an OpenRouter API key, a model selection dropdown set to "google/gemini-2.5-flash-image," and provider filtering options. Additionally, a slider at the bottom allows users to adjust the "Max References," which is currently set to 1.](/talemate/img/0.34.0/visual-agent-openrouter-3.png)
!!! warning "Model Selection"
There is no reliable way for Talemate to determine which models support image editing, so the model list is unfiltered. Image editing refers to image generation with support for 1 or more contextual reference images. Please consult the [OpenRouter documentation](https://openrouter.ai/docs) to verify that your selected model supports image editing before using it.
### Image Analysis Configuration
For image analysis, configure the following:
- **OpenRouter API Key**: Your OpenRouter API key
- **Model**: Select a vision-capable text model from OpenRouter
- **Provider filtering**: Optionally filter providers
![A screenshot of the "Visualizer" application interface showing the "OpenRouter Image Analysis" settings panel. The configuration area displays a model selection dropdown set to "google/gemini-2.5-flash" alongside a configured API key field. An informational box notes that the model list is unfiltered and users should verify that their chosen text generation model supports multi-modal vision capabilities.](/talemate/img/0.34.0/visual-agent-openrouter-4.png)
!!! warning "Model Selection"
There is no reliable way for Talemate to determine which models support image analysis, so the model list is unfiltered. Image analysis requires a text generation model that is multi-modal and supports vision capabilities. Please consult the [OpenRouter documentation](https://openrouter.ai/docs) to verify that your selected model supports vision before using it.
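Because OpenRouter exposes an OpenAI-compatible API, an analysis request uses the same multi-modal message format as other OpenAI-style backends, just pointed at OpenRouter's endpoint. A minimal sketch (model choice, question, and image URL are illustrative; Talemate handles this for you):

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",
)

response = client.chat.completions.create(
    model="google/gemini-2.5-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is happening in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/scene.png"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```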
## Usage
Once configured, the OpenRouter backend will appear in the Visualizer agent status with green indicators showing which capabilities are available.
![A dark-mode user interface panel labeled "Visualizer" features a green status indicator dot next to the title. Below the header are several pill-shaped tags, including grey buttons for "Google" and "References 1" alongside three green "OpenRouter" buttons with various icons. This layout likely represents a configuration of active tools or API connections within a software application.](/talemate/img/0.34.0/visual-agent-openrouter-5.png)
The status indicators show:
- **Text to Image**: Available when text-to-image backend is configured
- **Image Edit**: Available when image editing backend is configured (shows max references if configured)
- **Image Analysis**: Available when image analysis backend is configured
## Model Recommendations
OpenRouter provides access to many models from different providers. Here are some general recommendations:
### Text-to-Image and Image Editing
- **google/gemini-2.5-flash-image**: Fast image generation with good quality
- **google/gemini-3-pro-image-preview**: Higher quality option (if available)
### Image Analysis
- **google/gemini-2.5-flash**: Fast analysis with good accuracy
- **google/gemini-2.5-pro**: Higher quality analysis
- **google/gemini-3-pro-preview**: Latest capabilities (if available)
## Provider Filtering
OpenRouter allows you to filter which providers are used for a specific model. This can be useful if:
- You want to use a specific provider for cost or quality reasons
- You want to avoid certain providers
- You want to test different providers for the same model
You can configure provider filtering in each backend's settings:
- **Only use these providers**: Limits requests to only the selected providers
- **Ignore these providers**: Excludes the selected providers from consideration
If both are configured, "Only use these providers" takes precedence.
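On the wire, these options map to OpenRouter's provider routing object attached to each request; Talemate constructs this for you. A hedged sketch of what such a payload might look like - the `only` and `ignore` field names follow OpenRouter's provider routing documentation as understood here, and the provider slugs are placeholders:

```python
import requests

payload = {
    "model": "google/gemini-2.5-flash-image",
    "messages": [{"role": "user", "content": "A quiet harbor at dawn, soft mist"}],
    # Provider routing: "only" restricts requests to the listed providers,
    # "ignore" excludes providers from consideration.
    "provider": {
        "only": ["google-vertex"],
        "ignore": ["example-provider"],
    },
}

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_OPENROUTER_API_KEY"},
    json=payload,
    timeout=300,
)
resp.raise_for_status()
```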
## Prompt Formatting
The OpenRouter backend uses **Descriptive** prompt formatting by default. This means prompts are formatted as natural language descriptions rather than keyword lists. Provide detailed, natural language descriptions of what you want to create or edit.

View File

@@ -0,0 +1,104 @@
# SD.Next
The SD.Next backend provides image generation and editing capabilities using Stable Diffusion Next (SD.Next), a fork of AUTOMATIC1111's Stable Diffusion WebUI. SD.Next offers improved performance and additional features while maintaining compatibility with the AUTOMATIC1111 API.
![This screenshot displays the "General" settings menu of the "Visualizer" interface, featuring a dark theme with purple accents. Configuration options show "Google" selected as the client, with "SD.Next" set as the backend for both text-to-image and image editing tasks. The panel also includes an image generation timeout slider set to 301, a checked "Automatic Setup" box, and a "Fallback Prompt Type" dropdown set to Keywords.](/talemate/img/0.34.0/visual-agent-sdnext-1.png)
## Prerequisites
Before configuring the SD.Next backend, you need to have SD.Next installed and running. SD.Next can be run locally or accessed remotely via its API.
1. Install and start SD.Next on your system
2. Ensure the API is enabled and accessible
3. Note the API URL (default is `http://localhost:7860`)
## Configuration
In the Visualizer agent settings, select SD.Next as your backend for text-to-image generation, image editing, or both. You'll need to configure each backend separately if you want to use SD.Next for different operations.
### Text-to-Image Configuration
For text-to-image generation, configure the following settings:
- **API URL**: The URL where your SD.Next instance is running (e.g., `http://localhost:7860`)
- **Authentication Method**: Choose the authentication method:
- **None**: No authentication required
- **Basic (username/password)**: Use username and password authentication
- **Bearer (API Key)**: Use API key authentication
!!! note "ArliAI SD.Next Endpoints"
If you're connecting to ArliAI's SD.Next endpoints, you should use **Bearer (API Key)** authentication method. Configure your API key in the authentication settings.
- **Username/Password** (if using Basic auth): Your SD.Next credentials
- **API Key** (if using Bearer auth): Your API key for SD.Next
- **Steps**: Number of sampling steps (default: 40, range: 5-150)
- **Sampling Method**: The sampling algorithm to use (dynamically populated from your SD.Next instance)
- **CFG Scale**: Classifier-free guidance scale (default: 7.0, range: 1-30)
- **Model**: Select the model to use from your SD.Next models directory (dynamically populated)
- **Prompt Type**: Choose between "Keywords" and "Descriptive" prompt formatting
- **Resolutions**: Configure the pixel dimensions for Square, Portrait, and Landscape formats
![This screenshot displays the "Visualizer" interface for SD.Next text-to-image generation, featuring configuration settings for the API URL, 40 sampling steps, DPM++ 2M method, and a CFG scale of 7.0. The panel also shows the selected model as "juggernautXL_juggXlByRundiffusion" along with adjustable resolution presets for square, portrait, and landscape formats.](/talemate/img/0.34.0/visual-agent-sdnext-2.png)
![This screenshot displays a dark-themed user interface form with a dropdown menu labeled "Authentication Method" selected to "Basic (username/password)". Below the menu are two text input fields for entering a "Username" and "Password" under a section header labeled "AUTHENTICATION (OPTIONAL, SERVER DEPENDENT)".](/talemate/img/0.34.0/visual-agent-sdnext-3.png)
![This image displays a dark-themed user interface configuration panel for setting up authentication parameters. It features a dropdown menu where "Bearer (API Key)" is selected as the Authentication Method, positioned above a text input field labeled "API Key".](/talemate/img/0.34.0/visual-agent-sdnext-4.png)
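These settings correspond directly to fields on SD.Next's AUTOMATIC1111-compatible `txt2img` endpoint. A minimal sketch of such a request - Talemate sends this for you, the values mirror the defaults above, and authentication is omitted:

```python
import base64
import requests

API_URL = "http://localhost:7860"

payload = {
    "prompt": "portrait of a weathered sea captain, dramatic lighting, detailed",
    "negative_prompt": "blurry, low quality",
    "steps": 40,
    "cfg_scale": 7.0,
    "sampler_name": "DPM++ 2M",
    "width": 1024,
    "height": 1024,
}

resp = requests.post(f"{API_URL}/sdapi/v1/txt2img", json=payload, timeout=300)
resp.raise_for_status()

# Generated images are returned as base64-encoded strings.
with open("output.png", "wb") as f:
    f.write(base64.b64decode(resp.json()["images"][0]))
```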
### Image Editing Configuration
For image editing, configure similar settings. SD.Next supports image editing through its img2img API, which uses a single reference image.
![User interface for the "Visualizer" software displaying the "Image editing configuration for SD.Next" panel with a dark theme. It features adjustable sliders for Steps (set to 40) and CFG Scale (set to 7.0), alongside dropdown menus for the Sampling Method and Model selection. The bottom section includes input fields for defining specific pixel dimensions for square, portrait, and landscape image resolutions.](/talemate/img/0.34.0/visual-agent-sdnext-5.png)
!!! note "Reference Images"
SD.Next image editing supports a single reference image. When editing an image, provide one reference image that will be used as the base for the edit.
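Editing maps to the `img2img` endpoint, which takes the reference image as a base64-encoded entry in `init_images`. A short, illustrative sketch (Talemate handles the encoding, and the denoising strength it uses may differ):

```python
import base64
import requests

with open("reference.png", "rb") as f:
    reference_b64 = base64.b64encode(f.read()).decode()

payload = {
    "init_images": [reference_b64],
    "prompt": "same character, now wearing a red coat",
    "denoising_strength": 0.6,  # how far the edit may stray from the reference
    "steps": 40,
    "cfg_scale": 7.0,
}

resp = requests.post("http://localhost:7860/sdapi/v1/img2img", json=payload, timeout=300)
resp.raise_for_status()
```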
## Usage
Once configured, the SD.Next backend will appear in the Visualizer agent status with green indicators showing which capabilities are available.
![A dark mode user interface section titled "Visualizer," indicated by a green status dot. Below the header is a row of four buttons: "Google," two distinct "SD.Next" buttons with image and pencil icons respectively, and a "References 1" button.](/talemate/img/0.34.0/visual-agent-sdnext-6.png)
The status indicators show:
- **Text to Image**: Available when text-to-image backend is configured
- **Image Edit**: Available when image editing backend is configured (shows "References 1" indicating single reference support)
## Model and Sampler Selection
Talemate dynamically fetches the list of available models and samplers from your SD.Next instance when you configure the backend. This means:
- **Models**: The model dropdown is automatically populated with models available in your SD.Next installation
- **Samplers**: The sampling method dropdown is automatically populated with samplers available in your SD.Next instance
If you change the API URL or authentication settings, Talemate will automatically refresh the model and sampler lists from the new instance.
!!! tip "Model Selection"
If you don't select a specific model, SD.Next will use its default model. You can select "- Default Model -" from the dropdown to explicitly use the default, or leave the field empty.
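The dropdowns are populated from SD.Next's standard listing endpoints; you can query them yourself to see what Talemate sees (illustrative, not required):

```python
import requests

API_URL = "http://localhost:7860"

models = requests.get(f"{API_URL}/sdapi/v1/sd-models", timeout=30).json()
samplers = requests.get(f"{API_URL}/sdapi/v1/samplers", timeout=30).json()

print([m["model_name"] for m in models])
print([s["name"] for s in samplers])
```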
## Sampler Settings
SD.Next provides extensive control over the generation process:
- **Steps**: More steps generally produce higher quality images but take longer. Typical values range from 20 to 50, with 40 being a good default.
- **Sampling Method**: Different samplers produce different results. Popular options include:
- **DPM++ 2M**: Fast and high quality (default)
- **Euler a**: Fast, good for quick iterations
- **DPM++ 2M Karras**: Variant of DPM++ 2M using the Karras noise schedule, which often holds up better at lower step counts
- **CFG Scale**: Controls how closely the model follows your prompt. Lower values (1-7) allow more creative freedom, while higher values (7-15) stick closer to the prompt.
## Prompt Formatting
SD.Next uses **Keywords** prompt formatting by default. This means prompts are formatted as keyword lists optimized for Stable Diffusion models. You can switch to **Descriptive** formatting if you prefer natural language descriptions, though Keywords typically work better with SD models.
## Remote Access
If you're running SD.Next on a remote server:
1. Configure SD.Next to listen on the appropriate network interface
2. Use the server's IP address or hostname in the API URL (e.g., `http://192.168.1.100:7860`)
3. Configure appropriate authentication if your SD.Next instance requires it
4. Ensure your firewall allows connections to the SD.Next port
!!! warning "Security Considerations"
If exposing SD.Next over a network, always use authentication. Unauthenticated SD.Next instances can be accessed by anyone on your network, which may pose security risks.
