Compare commits


11 Commits

Author SHA1 Message Date
veguAI
ddfbd6891b 0.25.5 (#121)
* openai compat client: use /completions instead of chat/completions
openai compat client: pass frequency penalty

* 0.25.5

* fix version

* remove debug message

* fix openai compat client not saving coercion settings

* openai compatible client: 'API handles prompt template' switches over to the chat/completions api

* wording

* mistral std template

* fix error when setting llm prompt template if model name contained /

* lock sentence transformers to 2.2.2 since >=2.3.0 breaks instructor model loading

* support png tEXt

* openai compat client: fix repetition_penalty KeyError issue

* presence_penalty is not equal to repetition_penalty and needs its own dedicated definition

* round presence penalty randomization to one decimal place

* fix filename

* same fixes for presence_penalty ported to koboldcpp client

* kcpp client: remove a1111 setup spam
kcpp client: fixes to presence_penalty jiggle

* mistral.ai: default model 8x22b
mistral.ai: 7b and 8x7b taken out of JSON_OBJECT_RESPONSE_MODELS
2024-05-24 18:17:55 +03:00
veguAI
143dd47e02 0.25.4 (#118)
* don't run npm install during container build

* fix const var issue when ALLOWED_HOSTS is anything but `all`

* ensure docker env sets NODE_ENV to development for now

* 0.25.4

* don't mount frontend volume by default
2024-05-18 16:22:57 +03:00
veguAI
cc7cb773d1 Update README.md 2024-05-18 12:31:32 +03:00
veguAI
02c88f75a1 0.25.3 (#113)
* add gpt-4o

add gpt-4o-2024-05-13

* fix koboldcpp client jiggle arguments

* kcpp api url default port 5001

* fix repetition breaking issues with kcpp client

* use tokencount endpoint if available

* auto configure visual agent with koboldcpp

* env var config for frontend serve

* it's not clear that gpt-4o is better than turbo, don't default to it yet

* 0.25.3

* handle kcpp being down during a1111 setup check

* only check a1111 setup if client is connected

* fix kcpp a1111 setup check

* fix issue where saving a new scene could cause recent config changes to revert
2024-05-15 00:31:36 +03:00
veguAI
419371e0fb Update README.md 2024-05-14 15:36:33 +03:00
veguAI
6e847bf283 Update README.md 2024-05-14 15:29:37 +03:00
veguAI
ceedd3019f Update README.md 2024-05-14 15:29:02 +03:00
veguAI
a28cf2a029 0.25.2 (#108)
* fix typo

* fix openai compat config save issue maybe

* fix api_handles_prompt_template no longer saving changes after last fix

* koboldcpp client

* default to kobold ai api

* linting

* conversation cleanup tweak

* 0.25.2

* allowed hosts to all on dev instance

* ensure numbers on parameters when sending edited values

* fix prompt parameter issues

* remove debug message
2024-05-10 21:29:29 +03:00
henk717
60cb271e30 List KoboldCpp as compatible (#104)
KoboldCpp is a great fit for TaleMate: it supports fast local generation across a variety of machines, including the cloud, and is compatible with both text and image generation through the OpenAI API and the A1111 API.
2024-05-10 00:22:57 +03:00
veguAI
1874234d2c Prep 0.25.1 (#103)
* remove auto client disable

* 0.25.1
2024-05-05 23:23:30 +03:00
veguAI
ef99539e69 Update README.md 2024-05-05 22:30:24 +03:00
31 changed files with 1223 additions and 566 deletions

View File

@@ -1,13 +1,19 @@
# Use an official node runtime as a parent image
FROM node:20
# Make sure we are in a development environment (this isn't a production ready Dockerfile)
ENV NODE_ENV=development
# Echo that this isn't a production ready Dockerfile
RUN echo "This Dockerfile is not production ready. It is intended for development purposes only."
# Set the working directory in the container
WORKDIR /app
# Copy the frontend directory contents into the container at /app
COPY ./talemate_frontend /app
# Install any needed packages specified in package.json
# Install all dependencies
RUN npm install
# Make port 8080 available to the world outside this container

View File

@@ -16,6 +16,7 @@ Supported APIs:
- [Google Gemini](https://console.cloud.google.com/)
Supported self-hosted APIs:
- [KoboldCpp](https://koboldai.org/cpp) ([Local](https://koboldai.org/cpp), [Runpod](https://koboldai.org/runpodcpp), [VastAI](https://koboldai.org/vastcpp), also includes image gen support)
- [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui) (local or with runpod support)
- [LMStudio](https://lmstudio.ai/)
@@ -56,6 +57,7 @@ Please read the documents in the `docs` folder for more advanced configuration a
- [Ready to go](#ready-to-go)
- [Load the introductory scenario "Infinity Quest"](#load-the-introductory-scenario-infinity-quest)
- [Loading character cards](#loading-character-cards)
- [Configure for hosting](#configure-for-hosting)
- [Text-to-Speech (TTS)](docs/tts.md)
- [Visual Generation](docs/visual.md)
- [ChromaDB (long term memory) configuration](docs/chromadb.md)
@@ -93,16 +95,19 @@ There is also a [troubleshooting guide](docs/troubleshoot.md) that might help.
### Docker
:warning: Some users currently experience issues with missing dependencies inside the docker container; the issue is tracked at [#114](https://github.com/vegu-ai/talemate/issues/114)
1. `git clone https://github.com/vegu-ai/talemate.git`
1. `cd talemate`
1. `docker-compose up`
1. `cp config.example.yaml config.yaml`
1. `docker compose up`
1. Navigate your browser to http://localhost:8080
:warning: When connecting to local APIs running on the host machine (e.g. text-generation-webui), you need to use `host.docker.internal` as the hostname.
#### To shut down the Docker container
Just closing the terminal window will not stop the Docker container. You need to run `docker-compose down` to stop the container.
Just closing the terminal window will not stop the Docker container. You need to run `docker compose down` to stop the container.
#### How to install Docker
@@ -168,19 +173,9 @@ In the case for `bartowski_Nous-Hermes-2-Mistral-7B-DPO-exl2_8_0` that is `ChatM
### Recommended Models
As of 2024.03.07 my personal regular drivers (the ones i test with) are:
Any of the top models in any of the size classes here should work well (i wouldn't recommend going lower than 7B):
- Kunoichi-7B
- sparsetral-16x7B
- Nous-Hermes-2-Mistral-7B-DPO
- brucethemoose_Yi-34B-200K-RPMerge
- dolphin-2.7-mixtral-8x7b
- rAIfle_Verdict-8x7B
- Mixtral-8x7B-instruct
That said, any of the top models in any of the size classes here should work well (i wouldn't recommend going lower than 7B):
https://www.reddit.com/r/LocalLLaMA/comments/18yp9u4/llm_comparisontest_api_edition_gpt4_vs_gemini_vs/
[https://oobabooga.github.io/benchmark.html](https://oobabooga.github.io/benchmark.html)
## DeepInfra via OpenAI Compatible client
@@ -253,3 +248,17 @@ Expand the "Load" menu in the top left corner and either click on "Upload a char
Once a character is uploaded, talemate may take a moment, as it needs to convert the card to the talemate format and will also run additional LLM prompts to generate character attributes and world state.
Make sure you save the scene after the character is loaded, as it can then be loaded as a normal talemate scenario in the future.
## Configure for hosting
By default talemate is configured to run locally. If you want to host it behind a reverse proxy or on a server, you will need to create some environment variables in the `talemate_frontend/.env.development.local` file.
Start by copying `talemate_frontend/example.env.development.local` to `talemate_frontend/.env.development.local`.
Then open the file and edit the `ALLOWED_HOSTS` and `VUE_APP_TALEMATE_BACKEND_WEBSOCKET_URL` variables.
```sh
ALLOWED_HOSTS=example.com
# wss if behind ssl, ws if not
VUE_APP_TALEMATE_BACKEND_WEBSOCKET_URL=wss://example.com:5050
```

View File

@@ -23,5 +23,5 @@ services:
dockerfile: Dockerfile.frontend
ports:
- "8080:8080"
volumes:
- ./talemate_frontend:/app
#volumes:
# - ./talemate_frontend:/app

poetry.lock generated (1098 lines changed)

File diff suppressed because it is too large

View File

@@ -4,7 +4,7 @@ build-backend = "poetry.masonry.api"
[tool.poetry]
name = "talemate"
version = "0.25.0"
version = "0.25.5"
description = "AI-backed roleplay and narrative tools"
authors = ["FinalWombat"]
license = "GNU Affero General Public License v3.0"
@@ -51,7 +51,8 @@ chromadb = ">=0.4.17,<1"
InstructorEmbedding = "^1.0.1"
torch = ">=2.1.0"
torchaudio = ">=2.3.0"
sentence-transformers="^2.2.2"
# locked for instructor embeddings
sentence-transformers="==2.2.2"
[tool.poetry.dev-dependencies]
pytest = "^6.2"

View File

@@ -2,4 +2,4 @@ from .agents import Agent
from .client import TextGeneratorWebuiClient
from .tale_mate import *
VERSION = "0.25.0"
VERSION = "0.25.5"

View File

@@ -221,6 +221,9 @@ class Agent(ABC):
if callback:
await callback()
async def setup_check(self):
return False
async def ready_check(self, task: asyncio.Task = None):
self.ready_check_error = None
if task:

View File

@@ -668,7 +668,9 @@ class ConversationAgent(Agent):
total_result = util.handle_endofline_special_delimiter(total_result)
if total_result.startswith(":\n"):
log.info("conversation agent", total_result=total_result)
if total_result.startswith(":\n") or total_result.startswith(": "):
total_result = total_result[2:]
# movie script format

View File

@@ -80,6 +80,11 @@ class VisualBase(Agent):
),
},
),
"automatic_setup": AgentAction(
enabled=True,
label="Automatic Setup",
description="Automatically setup the visual agent if the selected client has an implementation of the selected backend. (Like the KoboldCpp Automatic1111 api)",
),
"automatic_generation": AgentAction(
enabled=False,
label="Automatic Generation",
@@ -187,8 +192,10 @@ class VisualBase(Agent):
prev_ready = self.backend_ready
self.backend_ready = False
self.ready_check_error = str(error)
await self.setup_check()
if prev_ready:
await self.emit_status()
async def ready_check(self):
if not self.enabled:
@@ -198,6 +205,15 @@ class VisualBase(Agent):
task = asyncio.create_task(fn())
await super().ready_check(task)
async def setup_check(self):
if not self.actions["automatic_setup"].enabled:
return
backend = self.backend
if self.client and hasattr(self.client, f"visual_{backend.lower()}_setup"):
await getattr(self.client, f"visual_{backend.lower()}_setup")(self)
async def apply_config(self, *args, **kwargs):
try:

View File

@@ -5,9 +5,10 @@ from talemate.client.anthropic import AnthropicClient
from talemate.client.cohere import CohereClient
from talemate.client.google import GoogleClient
from talemate.client.groq import GroqClient
from talemate.client.koboldcpp import KoboldCppClient
from talemate.client.lmstudio import LMStudioClient
from talemate.client.mistral import MistralAIClient
from talemate.client.openai import OpenAIClient
from talemate.client.openai_compat import OpenAICompatibleClient
from talemate.client.registry import CLIENT_CLASSES, get_client_class, register
from talemate.client.textgenwebui import TextGeneratorWebuiClient
from talemate.client.textgenwebui import TextGeneratorWebuiClient

View File

@@ -122,6 +122,10 @@ class ClientBase:
"""
return self.Meta().requires_prompt_template
@property
def max_tokens_param_name(self):
return "max_tokens"
def set_client(self, **kwargs):
self.client = AsyncOpenAI(base_url=self.api_url, api_key="sk-1111")
@@ -410,7 +414,6 @@ class ClientBase:
self.log.warning("client status error", e=e, client=self.name)
self.model_name = None
self.connected = False
self.toggle_disabled_if_remote()
self.emit_status()
return
@@ -626,7 +629,7 @@ class ClientBase:
is_repetition, similarity_score, matched_line = util.similarity_score(
response, finalized_prompt.split("\n"), similarity_threshold=80
)
if not is_repetition:
# not a repetition, return the response
@@ -660,7 +663,7 @@ class ClientBase:
# then we pad the max_tokens by the pad_max_tokens amount
prompt_param["max_tokens"] += pad_max_tokens
prompt_param[self.max_tokens_param_name] += pad_max_tokens
# send the prompt again
# we use the repetition_adjustment method to further encourage
@@ -682,7 +685,7 @@ class ClientBase:
# a lot of the times the response will now contain the repetition + something new
# so we dedupe the response to remove the repetition on sentences level
response = util.dedupe_sentences(
response, matched_line, similarity_threshold=85, debug=True
)
@@ -752,3 +755,29 @@ class ClientBase:
new_lines.append(line)
return "\n".join(new_lines)
def process_response_for_indirect_coercion(self, prompt:str, response:str) -> str:
"""
A lot of remote APIs don't let us control the prompt template and we cannot directly
append the beginning of the desired response to the prompt.
With indirect coercion we tell the LLM what the beginning of the response should be
and then hopefully it will adhere to it and we can strip it off the actual response.
"""
_, right = prompt.split("\nStart your response with: ")
expected_response = right.strip()
if (
expected_response
and expected_response.startswith("{")
):
if response.startswith("```json") and response.endswith("```"):
response = response[7:-3].strip()
if right and response.startswith(right):
response = response[len(right) :].strip()
return response
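
A minimal sketch of how the new indirect-coercion handling behaves; the prompt suffix and response below are made-up illustrations, not values from the diff:

```python
# hypothetical example of process_response_for_indirect_coercion's logic:
# the prompt names the expected response prefix, and the method strips a
# ```json fence and the echoed prefix from the reply
prompt = 'Describe the scene.\nStart your response with: {"scene":'
response = '```json\n{"scene": "A quiet harbor at dusk."}\n```'

_, right = prompt.split("\nStart your response with: ")
expected = right.strip()                        # '{"scene":'

if expected and expected.startswith("{"):
    if response.startswith("```json") and response.endswith("```"):
        response = response[7:-3].strip()       # unwrap the fenced JSON reply

if right and response.startswith(right):
    response = response[len(right):].strip()    # drop the echoed prefix

print(response)                                 # '"A quiet harbor at dusk."}'
```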

View File

@@ -1,16 +0,0 @@
import asyncio
import json
import logging
import random
from abc import ABC, abstractmethod
from typing import Callable, Union
import requests
import talemate.client.system_prompts as system_prompts
import talemate.util as util
from talemate.client.registry import register
from talemate.client.textgenwebui import RESTTaleMateClient
from talemate.emit import Emission, emit
# NOT IMPLEMENTED AT THIS POINT

View File

@@ -0,0 +1,306 @@
import random
import re
from typing import TYPE_CHECKING
# import urljoin
from urllib.parse import urljoin, urlparse
import httpx
import structlog
from talemate.client.base import STOPPING_STRINGS, ClientBase, Defaults, ExtraField
from talemate.client.registry import register
import talemate.util as util
if TYPE_CHECKING:
from talemate.agents.visual import VisualBase
log = structlog.get_logger("talemate.client.koboldcpp")
class KoboldCppClientDefaults(Defaults):
api_url: str = "http://localhost:5001"
api_key: str = ""
@register()
class KoboldCppClient(ClientBase):
auto_determine_prompt_template: bool = True
client_type = "koboldcpp"
class Meta(ClientBase.Meta):
name_prefix: str = "KoboldCpp"
title: str = "KoboldCpp"
enable_api_auth: bool = True
defaults: KoboldCppClientDefaults = KoboldCppClientDefaults()
@property
def request_headers(self):
headers = {}
headers["Content-Type"] = "application/json"
if self.api_key:
headers["Authorization"] = f"Bearer {self.api_key}"
return headers
@property
def url(self) -> str:
parts = urlparse(self.api_url)
return f"{parts.scheme}://{parts.netloc}"
@property
def is_openai(self) -> bool:
"""
kcpp has two apis
open-ai implementation at /v1
their own implementation at /api/v1
"""
return "/api/v1" not in self.api_url
@property
def api_url_for_model(self) -> str:
if self.is_openai:
# join /models to url
return urljoin(self.api_url, "models")
else:
# join /model to url
return urljoin(self.api_url, "model")
@property
def api_url_for_generation(self) -> str:
if self.is_openai:
# join /v1/completions
return urljoin(self.api_url, "completions")
else:
# join /api/v1/generate
return urljoin(self.api_url, "generate")
@property
def max_tokens_param_name(self):
if self.is_openai:
return "max_tokens"
else:
return "max_length"
def api_endpoint_specified(self, url: str) -> bool:
return "/v1" in self.api_url
def ensure_api_endpoint_specified(self):
if not self.api_endpoint_specified(self.api_url):
# url doesn't specify the api endpoint
# use the koboldcpp united api
self.api_url = urljoin(self.api_url.rstrip("/") + "/", "/api/v1/")
if not self.api_url.endswith("/"):
self.api_url += "/"
def __init__(self, **kwargs):
self.api_key = kwargs.pop("api_key", "")
super().__init__(**kwargs)
self.ensure_api_endpoint_specified()
def tune_prompt_parameters(self, parameters: dict, kind: str):
super().tune_prompt_parameters(parameters, kind)
if not self.is_openai:
# adjustments for united api
parameters["max_length"] = parameters.pop("max_tokens")
parameters["max_context_length"] = self.max_token_length
if "repetition_penalty_range" in parameters:
parameters["rep_pen_range"] = parameters.pop("repetition_penalty_range")
if "repetition_penalty" in parameters:
parameters["rep_pen"] = parameters.pop("repetition_penalty")
if parameters.get("stop_sequence"):
parameters["stop_sequence"] = parameters.pop("stopping_strings")
if parameters.get("extra_stopping_strings"):
if "stop_sequence" in parameters:
parameters["stop_sequence"] += parameters.pop("extra_stopping_strings")
else:
parameters["stop_sequence"] = parameters.pop("extra_stopping_strings")
allowed_params = [
"max_length",
"max_context_length",
"rep_pen",
"rep_pen_range",
"top_p",
"top_k",
"temperature",
"stop_sequence",
]
else:
allowed_params = ["max_tokens", "presence_penalty", "top_p", "temperature"]
# drop unsupported params
for param in list(parameters.keys()):
if param not in allowed_params:
del parameters[param]
def set_client(self, **kwargs):
self.api_key = kwargs.get("api_key", self.api_key)
self.ensure_api_endpoint_specified()
async def get_model_name(self):
self.ensure_api_endpoint_specified()
async with httpx.AsyncClient() as client:
response = await client.get(
self.api_url_for_model,
timeout=2,
headers=self.request_headers,
)
if response.status_code == 404:
raise KeyError(f"Could not find model info at: {self.api_url_for_model}")
response_data = response.json()
if self.is_openai:
# {"object": "list", "data": [{"id": "koboldcpp/dolphin-2.8-mistral-7b", "object": "model", "created": 1, "owned_by": "koboldcpp", "permission": [], "root": "koboldcpp"}]}
model_name = response_data.get("data")[0].get("id")
else:
# {"result": "koboldcpp/dolphin-2.8-mistral-7b"}
model_name = response_data.get("result")
# split by "/" and take last
if model_name:
model_name = model_name.split("/")[-1]
return model_name
async def tokencount(self, content:str) -> int:
"""
KoboldCpp has a tokencount endpoint we can use to count tokens
for the prompt and response
If the endpoint is not available, we will use the default token count estimate
"""
# extract scheme and host from api url
parts = urlparse(self.api_url)
url_tokencount = f"{parts.scheme}://{parts.netloc}/api/extra/tokencount"
async with httpx.AsyncClient() as client:
response = await client.post(
url_tokencount,
json={"prompt":content},
timeout=None,
headers=self.request_headers,
)
if response.status_code == 404:
# kobold united doesn't have tokencount endpoint
return util.count_tokens(content)
tokencount = len(response.json().get("ids",[]))
return tokencount
async def generate(self, prompt: str, parameters: dict, kind: str):
"""
Generates text from the given prompt and parameters.
"""
parameters["prompt"] = prompt.strip(" ")
self._returned_prompt_tokens = await self.tokencount(parameters["prompt"] )
async with httpx.AsyncClient() as client:
response = await client.post(
self.api_url_for_generation,
json=parameters,
timeout=None,
headers=self.request_headers,
)
response_data = response.json()
try:
if self.is_openai:
response_text = response_data["choices"][0]["text"]
else:
response_text = response_data["results"][0]["text"]
except (TypeError, KeyError) as exc:
log.error("Failed to generate text", exc=exc, response_data=response_data, response_status=response.status_code)
response_text = ""
self._returned_response_tokens = await self.tokencount(response_text)
return response_text
def jiggle_randomness(self, prompt_config: dict, offset: float = 0.3) -> dict:
"""
adjusts temperature and repetition_penalty
by random values using the base value as a center
"""
temp = prompt_config["temperature"]
if "rep_pen" in prompt_config:
rep_pen_key = "rep_pen"
elif "presence_penalty" in prompt_config:
rep_pen_key = "presence_penalty"
else:
rep_pen_key = "repetition_penalty"
min_offset = offset * 0.3
prompt_config["temperature"] = random.uniform(temp + min_offset, temp + offset)
try:
if rep_pen_key == "presence_penalty":
presence_penalty = prompt_config["presence_penalty"]
prompt_config["presence_penalty"] = round(random.uniform(
presence_penalty + 0.1, presence_penalty + offset
),1)
else:
rep_pen = prompt_config[rep_pen_key]
prompt_config[rep_pen_key] = random.uniform(
rep_pen + min_offset * 0.3, rep_pen + offset * 0.3
)
except KeyError:
pass
def reconfigure(self, **kwargs):
if "api_key" in kwargs:
self.api_key = kwargs.pop("api_key")
super().reconfigure(**kwargs)
async def visual_automatic1111_setup(self, visual_agent:"VisualBase") -> bool:
"""
Automatically configure the visual agent for automatic1111
if the koboldcpp server has a SD model available
"""
if not self.connected:
return False
sd_models_url = urljoin(self.url, "/sdapi/v1/sd-models")
async with httpx.AsyncClient() as client:
try:
response = await client.get(
url=sd_models_url, timeout=2
)
except Exception as exc:
log.error(f"Failed to fetch sd models from {sd_models_url}", exc=exc)
return False
if response.status_code != 200:
return False
response_data = response.json()
sd_model = response_data[0].get("model_name") if response_data else None
if not sd_model:
return False
log.info("automatic1111_setup", sd_model=sd_model)
visual_agent.actions["automatic1111"].config["api_url"].value = self.url
visual_agent.is_enabled = True
return True
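
For reference, a standalone sketch (mirroring, not importing, the properties above) of how the client resolves the two KoboldCpp API flavours from a configured `api_url`:

```python
from urllib.parse import urljoin

def resolve_endpoints(api_url: str) -> dict:
    # if no /v1 endpoint is specified, fall back to the united API at /api/v1/
    if "/v1" not in api_url:
        api_url = urljoin(api_url.rstrip("/") + "/", "/api/v1/")
    if not api_url.endswith("/"):
        api_url += "/"
    is_openai = "/api/v1" not in api_url
    return {
        "api_url": api_url,
        "model": urljoin(api_url, "models" if is_openai else "model"),
        "generate": urljoin(api_url, "completions" if is_openai else "generate"),
    }

print(resolve_endpoints("http://localhost:5001"))
# united API  -> .../api/v1/model and .../api/v1/generate
print(resolve_endpoints("http://localhost:5001/v1"))
# OpenAI API  -> .../v1/models and .../v1/completions
```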

View File

@@ -25,12 +25,16 @@ SUPPORTED_MODELS = [
"mistral-large-latest",
]
JSON_OBJECT_RESPONSE_MODELS = SUPPORTED_MODELS
JSON_OBJECT_RESPONSE_MODELS = [
"open-mixtral-8x22b",
"mistral-small-latest",
"mistral-medium-latest",
"mistral-large-latest",
]
class Defaults(pydantic.BaseModel):
max_token_length: int = 16384
model: str = "open-mixtral-8x7b"
model: str = "open-mixtral-8x22b"
@register()
@@ -53,7 +57,7 @@ class MistralAIClient(ClientBase):
requires_prompt_template: bool = False
defaults: Defaults = Defaults()
def __init__(self, model="open-mixtral-8x7b", **kwargs):
def __init__(self, model="open-mixtral-8x22b", **kwargs):
self.model_name = model
self.api_key_status = None
self.config = load_config()
@@ -115,7 +119,7 @@ class MistralAIClient(ClientBase):
return
if not self.model_name:
self.model_name = "open-mixtral-8x7b"
self.model_name = "open-mixtral-8x22b"
if max_token_length and not isinstance(max_token_length, int):
max_token_length = int(max_token_length)

View File

@@ -136,13 +136,15 @@ class ModelPrompt:
"""
matches = []
cleaned_model_name = model_name.replace("/", "__")
# Iterate over all templates in the loader's directory
for template_name in self.env.list_templates():
# strip extension
template_name_match = os.path.splitext(template_name)[0]
# Check if the model name is in the template filename
if template_name_match.lower() in model_name.lower():
if template_name_match.lower() in cleaned_model_name.lower():
matches.append(template_name)
# If there are no matches, return None
@@ -163,16 +165,17 @@ class ModelPrompt:
"""
template_name = template_name.split(".jinja2")[0]
cleaned_model_name = model_name.replace("/", "__")
shutil.copyfile(
os.path.join(STD_TEMPLATE_PATH, template_name + ".jinja2"),
os.path.join(USER_TEMPLATE_PATH, model_name + ".jinja2"),
os.path.join(USER_TEMPLATE_PATH, cleaned_model_name + ".jinja2"),
)
return os.path.join(USER_TEMPLATE_PATH, model_name + ".jinja2")
return os.path.join(USER_TEMPLATE_PATH, cleaned_model_name + ".jinja2")
def query_hf_for_prompt_template_suggestion(self, model_name: str):
print("query_hf_for_prompt_template_suggestion", model_name)
api = huggingface_hub.HfApi()
try:

View File

@@ -28,12 +28,14 @@ SUPPORTED_MODELS = [
"gpt-4-turbo-preview",
"gpt-4-turbo-2024-04-09",
"gpt-4-turbo",
"gpt-4o-2024-05-13",
"gpt-4o",
]
# any model starting with gpt-4- is assumed to support 'json_object'
# for others we need to explicitly state the model name
JSON_OBJECT_RESPONSE_MODELS = [
"gpt-4-1106-preview",
"gpt-4-0125-preview",
"gpt-4-turbo-preview",
"gpt-4o",
"gpt-3.5-turbo-0125",
]

View File

@@ -1,5 +1,5 @@
import urllib
import random
import pydantic
import structlog
from openai import AsyncOpenAI, NotFoundError, PermissionDeniedError
@@ -20,6 +20,7 @@ class Defaults(pydantic.BaseModel):
max_token_length: int = 8192
model: str = ""
api_handles_prompt_template: bool = False
double_coercion: str = None
class ClientConfig(BaseClientConfig):
@@ -43,9 +44,9 @@ class OpenAICompatibleClient(ClientBase):
"api_handles_prompt_template": ExtraField(
name="api_handles_prompt_template",
type="bool",
label="API Handles Prompt Template",
label="API handles prompt template (chat/completions)",
required=False,
description="The API handles the prompt template, meaning your choice in the UI for the prompt template below will be ignored.",
description="The API handles the prompt template, meaning your choice in the UI for the prompt template below will be ignored. This is not recommended and should only be used if the API does not support the `completions` andpoint or you don't know which prompt template to use.",
)
}
@@ -83,13 +84,12 @@ class OpenAICompatibleClient(ClientBase):
def tune_prompt_parameters(self, parameters: dict, kind: str):
super().tune_prompt_parameters(parameters, kind)
keys = list(parameters.keys())
allowed_params = ["max_tokens", "presence_penalty", "top_p", "temperature"]
valid_keys = ["temperature", "top_p", "max_tokens"]
for key in keys:
if key not in valid_keys:
del parameters[key]
# drop unsupported params
for param in list(parameters.keys()):
if param not in allowed_params:
del parameters[param]
def prompt_template(self, system_message: str, prompt: str):
@@ -117,16 +117,27 @@ class OpenAICompatibleClient(ClientBase):
"""
Generates text from the given prompt and parameters.
"""
human_message = {"role": "user", "content": prompt.strip()}
self.log.debug("generate", prompt=prompt[:128] + " ...", parameters=parameters)
try:
response = await self.client.chat.completions.create(
model=self.model_name, messages=[human_message], **parameters
)
return response.choices[0].message.content
if self.api_handles_prompt_template:
# OpenAI API handles prompt template
# Use the chat completions endpoint
self.log.debug("generate (chat/completions)", prompt=prompt[:128] + " ...", parameters=parameters)
human_message = {"role": "user", "content": prompt.strip()}
response = await self.client.chat.completions.create(
model=self.model_name, messages=[human_message], **parameters
)
response = response.choices[0].message.content
return self.process_response_for_indirect_coercion(prompt, response)
else:
# Talemate handles prompt template
# Use the completions endpoint
self.log.debug("generate (completions)", prompt=prompt[:128] + " ...", parameters=parameters)
parameters["prompt"] = prompt
response = await self.client.completions.create(
model=self.model_name, **parameters
)
return response.choices[0].text
except PermissionDeniedError as e:
self.log.error("generate error", e=e)
emit("status", message="Client API: Permission Denied", status="error")
@@ -151,7 +162,33 @@ class OpenAICompatibleClient(ClientBase):
self.api_key = kwargs["api_key"]
if "api_handles_prompt_template" in kwargs:
self.api_handles_prompt_template = kwargs["api_handles_prompt_template"]
# TODO: why isn't this calling super()?
if "enabled" in kwargs:
self.enabled = bool(kwargs["enabled"])
if "double_coercion" in kwargs:
self.double_coercion = kwargs["double_coercion"]
log.warning("reconfigure", kwargs=kwargs)
self.set_client(**kwargs)
def jiggle_randomness(self, prompt_config: dict, offset: float = 0.3) -> dict:
"""
adjusts temperature and presence penalty
by random values using the base value as a center
"""
temp = prompt_config["temperature"]
min_offset = offset * 0.3
prompt_config["temperature"] = random.uniform(temp + min_offset, temp + offset)
try:
presence_penalty = prompt_config["presence_penalty"]
prompt_config["presence_penalty"] = round(random.uniform(
presence_penalty + 0.1, presence_penalty + offset
),1)
except KeyError:
pass
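
A small worked example of the presence-penalty jiggle, assuming the default `offset=0.3` and the `PRESENCE_PENALTY_BASE` of 0.2 added in the presets diff further down:

```python
import random

def jiggle(prompt_config: dict, offset: float = 0.3) -> dict:
    # same idea as the method above: nudge temperature up within
    # [temp + 0.3*offset, temp + offset] and presence_penalty within
    # [pp + 0.1, pp + offset], rounded to one decimal place
    temp = prompt_config["temperature"]
    min_offset = offset * 0.3
    prompt_config["temperature"] = random.uniform(temp + min_offset, temp + offset)
    if "presence_penalty" in prompt_config:
        pp = prompt_config["presence_penalty"]
        prompt_config["presence_penalty"] = round(random.uniform(pp + 0.1, pp + offset), 1)
    return prompt_config

print(jiggle({"temperature": 0.65, "presence_penalty": 0.2}))
# e.g. {'temperature': 0.87, 'presence_penalty': 0.4}
```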

View File

@@ -11,10 +11,15 @@ __all__ = [
"PRESET_SIMPLE_1",
]
# TODO: refactor abstraction and make configurable
PRESENCE_PENALTY_BASE = 0.2
PRESET_TALEMATE_CONVERSATION = {
"temperature": 0.65,
"top_p": 0.47,
"top_k": 42,
"presence_penalty": PRESENCE_PENALTY_BASE,
"repetition_penalty": 1.18,
"repetition_penalty_range": 2048,
}
@@ -23,6 +28,7 @@ PRESET_TALEMATE_CREATOR = {
"temperature": 0.7,
"top_p": 0.9,
"top_k": 20,
"presence_penalty": PRESENCE_PENALTY_BASE,
"repetition_penalty": 1.15,
"repetition_penalty_range": 512,
}
@@ -31,6 +37,7 @@ PRESET_LLAMA_PRECISE = {
"temperature": 0.7,
"top_p": 0.1,
"top_k": 40,
"presence_penalty": PRESENCE_PENALTY_BASE,
"repetition_penalty": 1.18,
}
@@ -45,6 +52,7 @@ PRESET_DIVINE_INTELLECT = {
"temperature": 1.31,
"top_p": 0.14,
"top_k": 49,
"presence_penalty": PRESENCE_PENALTY_BASE,
"repetition_penalty_range": 1024,
"repetition_penalty": 1.17,
}
@@ -53,6 +61,7 @@ PRESET_SIMPLE_1 = {
"temperature": 0.7,
"top_p": 0.9,
"top_k": 20,
"presence_penalty": PRESENCE_PENALTY_BASE,
"repetition_penalty": 1.15,
}

View File

@@ -51,6 +51,39 @@ class TextGeneratorWebuiClient(ClientBase):
# is this needed?
parameters["max_new_tokens"] = parameters["max_tokens"]
parameters["stop"] = parameters["stopping_strings"]
# textgenwebui does not error on unsupported parameters
# but we should still drop them so they don't get passed to the API
# and show up in our prompt debugging tool.
# note that this is not the full list of their supported parameters
# but only those we send.
allowed_params = [
"temperature",
"top_p",
"top_k",
"max_tokens",
"repetition_penalty",
"repetition_penalty_range",
"max_tokens",
"stopping_strings",
"skip_special_tokens",
"stream",
# is this needed?
"max_new_tokens",
"stop",
# talemate internal
# These will be removed before sending to the API
# but we keep them here since they are used during the prompt finalization
"extra_stopping_strings",
]
# drop unsupported params
for param in list(parameters.keys()):
if param not in allowed_params:
del parameters[param]
def set_client(self, **kwargs):
self.api_key = kwargs.get("api_key", self.api_key)

View File

@@ -187,3 +187,5 @@ async def agent_ready_checks():
for agent in AGENTS.values():
if agent and agent.enabled:
await agent.ready_check()
elif agent and not agent.enabled:
await agent.setup_check()

View File

@@ -11,6 +11,20 @@ class TestPromptPayload(pydantic.BaseModel):
kind: str
def ensure_number(v):
"""
if v is a str but numeric, turn it into an int or float
"""
if isinstance(v, str):
if v.isdigit():
return int(v)
try:
return float(v)
except ValueError:
return v
return v
class DevToolsPlugin:
router = "devtools"
@@ -34,7 +48,7 @@ class DevToolsPlugin:
log.info(
"Testing prompt",
payload={
k: v for k, v in payload.generation_parameters.items() if k != "prompt"
k: ensure_number(v) for k, v in payload.generation_parameters.items() if k != "prompt"
},
)
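
For reference, a few sample inputs and what `ensure_number` returns for them (the helper is re-declared here so the snippet runs on its own):

```python
def ensure_number(v):
    # same logic as the devtools helper above
    if isinstance(v, str):
        if v.isdigit():
            return int(v)
        try:
            return float(v)
        except ValueError:
            return v
    return v

print(ensure_number("42"))    # 42   (int)
print(ensure_number("0.75"))  # 0.75 (float)
print(ensure_number("all"))   # 'all' (left untouched)
print(ensure_number(1.18))    # 1.18 (non-strings pass through)
```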

View File

@@ -2123,7 +2123,7 @@ class Scene(Emitter):
async def add_to_recent_scenes(self):
log.debug("add_to_recent_scenes", filename=self.filename)
config = Config(**self.config)
config = load_config(as_model=True)
config.recent_scenes.push(self)
config.save()

View File

@@ -5,7 +5,7 @@ import json
import re
import textwrap
from typing import List, Union
import struct
import isodate
import structlog
from colorama import Back, Fore, Style, init
@@ -179,6 +179,29 @@ def color_emotes(text: str, color: str = "blue") -> str:
def extract_metadata(img_path, img_format):
return chara_read(img_path)
def read_metadata_from_png_text(image_path:str) -> dict:
"""
Reads the character metadata from the tEXt chunk of a PNG image.
"""
# Read the image
with open(image_path, 'rb') as f:
png_data = f.read()
# Split the PNG data into chunks
offset = 8 # Skip the PNG signature
while offset < len(png_data):
length = struct.unpack('!I', png_data[offset:offset+4])[0]
chunk_type = png_data[offset+4:offset+8]
chunk_data = png_data[offset+8:offset+8+length]
if chunk_type == b'tEXt':
keyword, text_data = chunk_data.split(b'\x00', 1)
if keyword == b'chara':
return json.loads(base64.b64decode(text_data).decode('utf-8'))
offset += 12 + length
raise ValueError('No character metadata found.')
def chara_read(img_url, input_format=None):
if input_format is None:
@@ -194,7 +217,6 @@ def chara_read(img_url, input_format=None):
image = Image.open(io.BytesIO(image_data))
exif_data = image.getexif()
if format == "webp":
try:
if 37510 in exif_data:
@@ -235,7 +257,15 @@ def chara_read(img_url, input_format=None):
return base64_decoded_data
else:
log.warn("chara_load", msg="No chara data found in PNG image.")
return False
log.warn("chara_load", msg="Trying to read from PNG text.")
try:
return read_metadata_from_png_text(img_url)
except ValueError:
return False
except Exception as exc:
log.error("chara_load", msg="Error reading metadata from PNG text.", exc_info=exc)
return False
else:
return None
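
To exercise the new `read_metadata_from_png_text` path, a hypothetical test helper (not part of the diff) can build an in-memory PNG holding just the signature and a `chara` tEXt chunk; it is not a fully valid PNG (no IHDR/IEND chunks), but it is enough for the chunk walker above:

```python
import base64
import json
import struct
import zlib

def build_png_with_chara(metadata: dict) -> bytes:
    # keyword, NUL separator, base64-encoded JSON payload — the layout
    # read_metadata_from_png_text expects inside the tEXt chunk
    text_data = b"chara\x00" + base64.b64encode(json.dumps(metadata).encode("utf-8"))
    chunk = (
        struct.pack("!I", len(text_data))                     # chunk length
        + b"tEXt" + text_data                                 # chunk type + data
        + struct.pack("!I", zlib.crc32(b"tEXt" + text_data))  # CRC over type + data
    )
    return b"\x89PNG\r\n\x1a\n" + chunk                       # PNG signature + chunk

with open("chara_test.png", "wb") as f:
    f.write(build_png_with_chara({"name": "Test Character"}))
```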

View File

@@ -0,0 +1,3 @@
ALLOWED_HOSTS=example.com
# wss if behind ssl, ws if not
VUE_APP_TALEMATE_BACKEND_WEBSOCKET_URL=wss://example.com:5050

View File

@@ -1,12 +1,12 @@
{
"name": "talemate_frontend",
"version": "0.25.0",
"version": "0.25.5",
"lockfileVersion": 2,
"requires": true,
"packages": {
"": {
"name": "talemate_frontend",
"version": "0.25.0",
"version": "0.25.5",
"dependencies": {
"@codemirror/lang-markdown": "^6.2.5",
"@codemirror/theme-one-dark": "^6.1.2",

View File

@@ -1,6 +1,6 @@
{
"name": "talemate_frontend",
"version": "0.25.0",
"version": "0.25.5",
"private": true,
"scripts": {
"serve": "vue-cli-service serve",

View File

@@ -244,6 +244,13 @@ export default {
client.api_key = data.api_key;
client.double_coercion = data.data.double_coercion;
client.data = data.data;
for (let key in client.data.meta.extra_fields) {
if (client.data[key] === null || client.data[key] === undefined) {
client.data[key] = client.data.meta.defaults[key];
}
client[key] = client.data[key];
}
} else if(!client) {
console.log("Adding new client", data);
@@ -259,6 +266,16 @@ export default {
double_coercion: data.data.double_coercion,
data: data.data,
});
// apply extra field defaults
let client = this.state.clients[this.state.clients.length - 1];
for (let key in client.data.meta.extra_fields) {
if (client.data[key] === null || client.data[key] === undefined) {
client.data[key] = client.data.meta.defaults[key];
}
client[key] = client.data[key];
}
// sort the clients by name
this.state.clients.sort((a, b) => (a.name > b.name) ? 1 : -1);
}

View File

@@ -56,9 +56,9 @@
</v-row>
<v-row v-for="field in clientMeta().extra_fields" :key="field.name">
<v-col cols="12">
<v-text-field v-model="client.data[field.name]" v-if="field.type === 'text'" :label="field.label"
<v-text-field v-model="client[field.name]" v-if="field.type === 'text'" :label="field.label"
:rules="[rules.required]" :hint="field.description"></v-text-field>
<v-checkbox v-else-if="field.type === 'bool'" v-model="client.data[field.name]"
<v-checkbox v-else-if="field.type === 'bool'" v-model="client[field.name]"
:label="field.label" :hint="field.description" density="compact"></v-checkbox>
</v-col>
</v-row>

View File

@@ -248,7 +248,7 @@ export default {
messageHandlers: [],
scene: {},
appConfig: {},
autcompleting: false,
autocompleting: false,
autocompletePartialInput: "",
autocompleteCallback: null,
autocompleteFocusElement: null,
@@ -303,9 +303,11 @@ export default {
this.connecting = true;
let currentUrl = new URL(window.location.href);
console.log(currentUrl);
let websocketUrl = process.env.VUE_APP_TALEMATE_BACKEND_WEBSOCKET_URL || `ws://${currentUrl.hostname}:5050/ws`;
this.websocket = new WebSocket(`ws://${currentUrl.hostname}:5050/ws`);
console.log("urls", { websocketUrl, currentUrl }, {env : process.env});
this.websocket = new WebSocket(websocketUrl);
console.log("Websocket connecting ...")
this.websocket.onmessage = this.handleMessage;
this.websocket.onopen = () => {

View File

@@ -1,4 +1,12 @@
const { defineConfig } = require('@vue/cli-service')
const ALLOWED_HOSTS = ((process.env.ALLOWED_HOSTS || "all") !== "all" ? process.env.ALLOWED_HOSTS.split(",") : "all")
const VUE_APP_TALEMATE_BACKEND_WEBSOCKET_URL = process.env.VUE_APP_TALEMATE_BACKEND_WEBSOCKET_URL || null
console.log("NODE_ENV", process.env.NODE_ENV)
console.log("ALLOWED_HOSTS", ALLOWED_HOSTS)
console.log("VUE_APP_TALEMATE_BACKEND_WEBSOCKET_URL", VUE_APP_TALEMATE_BACKEND_WEBSOCKET_URL)
module.exports = defineConfig({
transpileDependencies: true,
@@ -9,6 +17,7 @@ module.exports = defineConfig({
},
devServer: {
allowedHosts: ALLOWED_HOSTS,
client: {
overlay: {
warnings: false,

View File

@@ -0,0 +1 @@
<s>[INST] {{ system_message }} {{ user_message }} [/INST] {{ coercion_message }}