mirror of
https://github.com/vegu-ai/talemate.git
synced 2025-12-16 03:37:51 +01:00
844 lines
27 KiB
Python
import os
import pytest
import json
import yaml
from unittest.mock import MagicMock
import talemate.util.data
from talemate.util.data import (
    fix_faulty_json,
    extract_json,
    extract_json_v2,
    extract_yaml_v2,
    extract_data_auto,
    extract_data,
    extract_data_with_ai_fallback,
    JSONEncoder,
    DataParsingError,
    fix_yaml_colon_in_strings,
    fix_faulty_yaml,
)


# Helper function to get test data paths
def get_test_data_path(filename):
    base_dir = os.path.dirname(os.path.abspath(__file__))
    return os.path.join(base_dir, "data", "util", "data", filename)


@pytest.fixture
def mock_client_and_prompt():
    """Create mock client and prompt for extract_data_auto tests."""
    client = MagicMock()
    prompt_cls = MagicMock()

    # Mock the extract_data_with_ai_fallback to just use extract_data
    async def mock_extract_with_ai(client, text, prompt_cls, schema_format):
        # Wrap in codeblock format and use existing extract_data
        wrapped = f"```{schema_format}\n{text}\n```"
        return extract_data(wrapped, schema_format)

    # Patch the function during tests
    original_func = talemate.util.data.extract_data_with_ai_fallback
    talemate.util.data.extract_data_with_ai_fallback = mock_extract_with_ai

    yield client, prompt_cls

    # Restore original function
    talemate.util.data.extract_data_with_ai_fallback = original_func


def test_json_encoder():
    """Test JSONEncoder handles unknown types by converting to string."""

    class CustomObject:
        def __str__(self):
            return "CustomObject"

    # Create an object of a custom class
    custom_obj = CustomObject()

    # Encode it using JSONEncoder
    encoded = json.dumps({"obj": custom_obj}, cls=JSONEncoder)

    # Check if the object was converted to a string
    assert encoded == '{"obj": "CustomObject"}'


def test_fix_faulty_json():
    """Test fix_faulty_json function with various faulty JSON strings."""

    # Test adjacent objects - need to wrap in list brackets to make it valid JSON
    fixed = fix_faulty_json('{"a": 1}{"b": 2}')
    assert fixed == '{"a": 1},{"b": 2}'
    # We need to manually wrap it in brackets for the test
    assert json.loads("[" + fixed + "]") == [{"a": 1}, {"b": 2}]

    # Test trailing commas
    assert json.loads(fix_faulty_json('{"a": 1, "b": 2,}')) == {"a": 1, "b": 2}
    assert json.loads(fix_faulty_json('{"a": [1, 2, 3,]}')) == {"a": [1, 2, 3]}


def test_extract_json():
    """Test extract_json function to extract JSON from the beginning of a string."""
    # Simple test
    json_str, obj = extract_json('{"name": "test", "value": 42} and some text')
    assert json_str == '{"name": "test", "value": 42}'
    assert obj == {"name": "test", "value": 42}

    # Test with array
    json_str, obj = extract_json("[1, 2, 3] and some text")
    assert json_str == "[1, 2, 3]"
    assert obj == [1, 2, 3]

    # Test with whitespace
    json_str, obj = extract_json(' {"name": "test"} and some text')
    assert json_str == '{"name": "test"}'
    assert obj == {"name": "test"}

    # Test with invalid JSON
    with pytest.raises(ValueError):
        extract_json("This is not JSON")


def test_extract_json_v2_valid():
    """Test extract_json_v2 with valid JSON in code blocks."""
    # Load test data
    with open(get_test_data_path("valid_json.txt"), "r") as f:
        text = f.read()

    # Extract JSON
    result = extract_json_v2(text)

    # Check if we got two unique JSON objects (third is a duplicate)
    assert len(result) == 2

    # Check if the objects are correct
    expected_first = {
        "name": "Test Object",
        "properties": {"id": 1, "active": True},
        "tags": ["test", "json", "parsing"],
    }

    expected_second = {"name": "Simple Object", "value": 42}

    assert expected_first in result
    assert expected_second in result


def test_extract_json_v2_invalid():
    """Test extract_json_v2 raises DataParsingError for invalid JSON."""
    # Load test data
    with open(get_test_data_path("invalid_json.txt"), "r") as f:
        text = f.read()

    # Try to extract JSON, should raise DataParsingError
    with pytest.raises(DataParsingError):
        extract_json_v2(text)


def test_extract_json_v2_faulty():
    """Test extract_json_v2 with faulty but fixable JSON."""
    # Load test data
    with open(get_test_data_path("faulty_json.txt"), "r") as f:
        text = f.read()

    # Try to extract JSON, should successfully fix and extract some objects
    # but might fail on the severely malformed ones
    try:
        result = extract_json_v2(text)
        # If it manages to fix all JSON, verify the results
        assert len(result) > 0
    except DataParsingError:
        # This is also acceptable if some JSON is too broken to fix
        pass


def test_data_parsing_error():
    """Test the DataParsingError class."""
    # Create a DataParsingError with a message and data
    test_data = '{"broken": "json"'
    error = DataParsingError("Test error message", test_data)

    # Check properties
    assert error.message == "Test error message"
    assert error.data == test_data
    assert str(error) == "Test error message"


def test_extract_json_v2_multiple():
    """Test extract_json_v2 with multiple JSON objects including duplicates."""
    # Load test data
    with open(get_test_data_path("multiple_json.txt"), "r") as f:
        text = f.read()

    # Extract JSON
    result = extract_json_v2(text)

    # Check if we got the correct number of unique objects (3 unique out of 5 total)
    assert len(result) == 3

    # Define expected objects
    expected_objects = [
        {"id": 1, "name": "First Object", "tags": ["one", "first", "primary"]},
        {"id": 2, "name": "Second Object", "tags": ["two", "second"]},
        {
            "id": 3,
            "name": "Third Object",
            "metadata": {"created": "2023-01-01", "version": 1.0},
            "active": True,
        },
    ]

    # Check if all expected objects are in the result
    for expected in expected_objects:
        assert expected in result

    # Verify that each object appears exactly once (no duplicates)
    id_counts = {}
    for obj in result:
        id_counts[obj["id"]] = id_counts.get(obj["id"], 0) + 1

    # Each ID should appear exactly once
    for id_val, count in id_counts.items():
        assert count == 1, (
            f"Object with ID {id_val} appears {count} times (should be 1)"
        )


def test_extract_yaml_v2_valid():
    """Test extract_yaml_v2 with valid YAML in code blocks."""
    # Load test data
    with open(get_test_data_path("valid_yaml.txt"), "r") as f:
        text = f.read()

    # Extract YAML
    result = extract_yaml_v2(text)

    # Check if we got two unique YAML objects (third is a duplicate)
    assert len(result) == 2

    # Check if the objects are correct
    expected_first = {
        "name": "Test Object",
        "properties": {"id": 1, "active": True},
        "tags": ["test", "yaml", "parsing"],
    }

    expected_second = {"simple_name": "Simple Object", "value": 42}

    assert expected_first in result
    assert expected_second in result


def test_extract_yaml_v2_invalid():
    """Test extract_yaml_v2 raises DataParsingError for invalid YAML."""
    # Load test data
    with open(get_test_data_path("invalid_yaml.txt"), "r") as f:
        text = f.read()

    # Try to extract YAML, should raise DataParsingError
    with pytest.raises(DataParsingError):
        extract_yaml_v2(text)


def test_extract_yaml_v2_multiple():
    """Test extract_yaml_v2 with multiple YAML objects including duplicates."""
    # Load test data
    with open(get_test_data_path("multiple_yaml.txt"), "r") as f:
        text = f.read()

    # Extract YAML
    result = extract_yaml_v2(text)

    # Check if we got the correct number of unique objects (3 unique out of 5 total)
    assert len(result) == 3

    # Get the objects by ID for easier assertions
    objects_by_id = {obj["id"]: obj for obj in result}

    # Check for object 1
    assert objects_by_id[1]["name"] == "First Object"
    assert objects_by_id[1]["tags"] == ["one", "first", "primary"]

    # Check for object 2
    assert objects_by_id[2]["name"] == "Second Object"
    assert objects_by_id[2]["tags"] == ["two", "second"]

    # Check for object 3 - note that the date is parsed as a date object by YAML
    assert objects_by_id[3]["name"] == "Third Object"
    assert objects_by_id[3]["active"] is True
    assert "created" in objects_by_id[3]["metadata"]

    # Verify that each object ID appears exactly once (no duplicates)
    id_counts = {}
    for obj in result:
        id_counts[obj["id"]] = id_counts.get(obj["id"], 0) + 1

    # Each ID should appear exactly once
    for id_val, count in id_counts.items():
        assert count == 1, (
            f"Object with ID {id_val} appears {count} times (should be 1)"
        )


def test_extract_yaml_v2_multiple_documents():
    """Test extract_yaml_v2 with multiple YAML documents in a single code block."""
    # Load test data from file
    with open(get_test_data_path("multiple_yaml_documents.txt"), "r") as f:
        test_data = f.read()

    # Extract YAML
    result = extract_yaml_v2(test_data)

    # Check if we got all three documents
    assert len(result) == 3

    # Check if the objects are correct
    objects_by_id = {obj["id"]: obj for obj in result}

    assert objects_by_id[1]["name"] == "First Document"
    assert "first" in objects_by_id[1]["tags"]

    assert objects_by_id[2]["name"] == "Second Document"
    assert "secondary" in objects_by_id[2]["tags"]

    assert objects_by_id[3]["name"] == "Third Document"
    assert objects_by_id[3]["active"] is True


def test_extract_yaml_v2_without_separators():
    """Test extract_yaml_v2 with multiple YAML documents without --- separators."""
    # Load test data from file
    with open(get_test_data_path("multiple_yaml_without_separators.txt"), "r") as f:
        test_data = f.read()

    # Extract YAML
    result = extract_yaml_v2(test_data)

    # Check if we got all three nested documents
    assert len(result) == 3

    # Create a dictionary of documents by name for easy testing
    docs_by_name = {doc["name"]: doc for doc in result}

    # Verify that all three documents are correctly parsed
    assert "First Document" in docs_by_name
    assert docs_by_name["First Document"]["id"] == 1
    assert "first" in docs_by_name["First Document"]["tags"]

    assert "Second Document" in docs_by_name
    assert docs_by_name["Second Document"]["id"] == 2
    assert "secondary" in docs_by_name["Second Document"]["tags"]

    assert "Third Document" in docs_by_name
    assert docs_by_name["Third Document"]["id"] == 3
    assert docs_by_name["Third Document"]["active"] is True


def test_extract_json_v2_multiple_objects():
    """Test extract_json_v2 with multiple JSON objects in a single code block."""
    # Load test data from file
    with open(get_test_data_path("multiple_json_objects.txt"), "r") as f:
        test_data = f.read()

    # Extract JSON
    result = extract_json_v2(test_data)

    # Check if we got all three objects
    assert len(result) == 3

    # Check if the objects are correct
    objects_by_id = {obj["id"]: obj for obj in result}

    assert objects_by_id[1]["name"] == "First Object"
    assert objects_by_id[1]["type"] == "test"

    assert objects_by_id[2]["name"] == "Second Object"
    assert objects_by_id[2]["values"] == [1, 2, 3]

    assert objects_by_id[3]["name"] == "Third Object"
    assert objects_by_id[3]["active"] is True
    assert objects_by_id[3]["metadata"]["created"] == "2023-05-15"


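# Illustration only, not part of the talemate test suite: a minimal sketch of one way
# several JSON objects sharing a single block could be split apart, using the standard
# library's json.JSONDecoder.raw_decode. The helper name parse_concatenated_json is
# hypothetical, and extract_json_v2 may well work differently.

```python
import json


def parse_concatenated_json(text: str) -> list:
    """Parse zero or more JSON values that appear back to back in ``text``."""
    decoder = json.JSONDecoder()
    values = []
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so step over it manually.
        if text[pos].isspace():
            pos += 1
            continue
        # raw_decode returns the parsed value and the index where it stopped.
        value, pos = decoder.raw_decode(text, pos)
        values.append(value)
    return values


objects = parse_concatenated_json(
    '{"id": 1, "name": "First"}\n{"id": 2, "name": "Second"}\n'
)
objects_by_id = {obj["id"]: obj for obj in objects}
```

# Truncated input still raises json.JSONDecodeError in this sketch, so a caller that
# wants lenient behaviour would have to catch it.

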
def test_fix_yaml_colon_in_strings():
    """Test fix_yaml_colon_in_strings with problematic YAML containing unquoted colons."""
    # Load test data from file
    with open(get_test_data_path("yaml_with_colons.txt"), "r") as f:
        problematic_yaml = f.read()

    # Extract YAML from the code block
    problematic_yaml = problematic_yaml.split("```")[1]
    if problematic_yaml.startswith("yaml"):
        problematic_yaml = problematic_yaml[4:].strip()

    # Fix the YAML
    fixed_yaml = fix_yaml_colon_in_strings(problematic_yaml)

    # Parse the fixed YAML to check it works
    parsed = yaml.safe_load(fixed_yaml)

    # Check the structure and content is preserved
    assert parsed["calls"][0]["name"] == "act"
    assert parsed["calls"][0]["arguments"]["name"] == "Kaira"
    assert (
        "I can see you're scared, Elmer"
        in parsed["calls"][0]["arguments"]["instructions"]
    )


def test_fix_faulty_yaml():
    """Test fix_faulty_yaml with various problematic YAML constructs."""
    # Load test data from file
    with open(get_test_data_path("yaml_list_with_colons.txt"), "r") as f:
        problematic_yaml = f.read()

    # Extract YAML from the code block
    problematic_yaml = problematic_yaml.split("```")[1]
    if problematic_yaml.startswith("yaml"):
        problematic_yaml = problematic_yaml[4:].strip()

    # Fix the YAML
    fixed_yaml = fix_faulty_yaml(problematic_yaml)

    # Parse the fixed YAML to check it works
    parsed = yaml.safe_load(fixed_yaml)

    # Check the structure and content is preserved
    assert len(parsed["instructions_list"]) == 2
    # The content will be the full string with colons in it now
    assert "Run to the door" in parsed["instructions_list"][0]
    assert "Wait for me!" in parsed["instructions_list"][0]
    assert "Look around" in parsed["instructions_list"][1]
    assert "Is there another way out?" in parsed["instructions_list"][1]


def test_extract_yaml_v2_with_colons():
    """Test extract_yaml_v2 correctly processes YAML with problematic colons in strings."""
    # Load test data containing YAML code blocks with problematic colons
    with open(get_test_data_path("yaml_block_with_colons.txt"), "r") as f:
        text = f.read()

    # Extract YAML
    result = extract_yaml_v2(text)

    # Check if we got the two YAML objects
    assert len(result) == 2

    # Find the objects by their structure
    calls_obj = None
    instructions_obj = None
    for obj in result:
        if "calls" in obj:
            calls_obj = obj
        elif "instructions_list" in obj:
            instructions_obj = obj

    # Verify both objects were found
    assert calls_obj is not None, "Could not find the 'calls' object"
    assert instructions_obj is not None, "Could not find the 'instructions_list' object"

    # Check the structure and content of the first object (calls)
    assert calls_obj["calls"][0]["name"] == "act"
    assert calls_obj["calls"][0]["arguments"]["name"] == "Kaira"

    # Check that the problematic part with the colon is preserved
    instructions = calls_obj["calls"][0]["arguments"]["instructions"]
    assert "Speak in a calm, soothing tone and say:" in instructions
    assert "I can see you're scared, Elmer" in instructions

    # Check the second object (instructions_list)
    assert len(instructions_obj["instructions_list"]) == 2
    assert "Run to the door" in instructions_obj["instructions_list"][0]
    assert "Wait for me!" in instructions_obj["instructions_list"][0]
    assert "Look around" in instructions_obj["instructions_list"][1]
    assert "Is there another way out?" in instructions_obj["instructions_list"][1]


@pytest.mark.asyncio
async def test_extract_data_auto_mixed_formats(mock_client_and_prompt):
    """Test extract_data_auto with mixed JSON and YAML codeblocks."""
    client, prompt_cls = mock_client_and_prompt

    # Load test data
    with open(get_test_data_path("mixed_formats.txt"), "r") as f:
        mixed_text = f.read()

    result = await extract_data_auto(mixed_text, client, prompt_cls)

    # Should extract all three objects
    assert len(result) == 3

    # Verify objects by ID
    objects_by_id = {obj["id"]: obj for obj in result}

    assert objects_by_id[1]["name"] == "JSON Object"
    assert objects_by_id[1]["type"] == "json"

    assert objects_by_id[2]["name"] == "YAML Object"
    assert objects_by_id[2]["type"] == "yaml"
    assert "test" in objects_by_id[2]["tags"]

    assert objects_by_id[3]["name"] == "Second JSON"
    assert objects_by_id[3]["active"] is True


@pytest.mark.asyncio
async def test_extract_data_auto_untyped_codeblocks(mock_client_and_prompt):
    """Test extract_data_auto with untyped codeblocks using default format."""
    # Test with JSON default
    with open(get_test_data_path("untyped_codeblocks_json.txt"), "r") as f:
        json_text = f.read()

    client, prompt_cls = mock_client_and_prompt
    result = await extract_data_auto(
        json_text, client, prompt_cls, schema_format="json"
    )
    assert len(result) == 2

    names = {obj["name"] for obj in result}
    assert "Untyped JSON" in names
    assert "Another JSON" in names

    # Test with YAML default
    with open(get_test_data_path("untyped_codeblocks_yaml.txt"), "r") as f:
        yaml_text = f.read()

    result = await extract_data_auto(
        yaml_text, client, prompt_cls, schema_format="yaml"
    )
    assert len(result) == 2

    names = {obj["name"] for obj in result}
    assert "Untyped YAML" in names
    assert "Another YAML" in names


@pytest.mark.asyncio
async def test_extract_data_auto_bare_codeblock(mock_client_and_prompt):
    """Test extract_data_auto with entire text being just a codeblock."""
    # JSON codeblock
    json_codeblock = """```json
{"name": "Bare JSON", "id": 123, "active": true}
```"""

    client, prompt_cls = mock_client_and_prompt
    result = await extract_data_auto(json_codeblock, client, prompt_cls)
    assert len(result) == 1
    assert result[0]["name"] == "Bare JSON"
    assert result[0]["id"] == 123

    # YAML codeblock
    yaml_codeblock = """```yaml
name: Bare YAML
id: 456
active: false
tags:
  - bare
  - yaml
```"""

    result = await extract_data_auto(yaml_codeblock, client, prompt_cls)
    assert len(result) == 1
    assert result[0]["name"] == "Bare YAML"
    assert result[0]["id"] == 456
    assert "bare" in result[0]["tags"]


@pytest.mark.asyncio
async def test_extract_data_auto_raw_data(mock_client_and_prompt):
    """Test extract_data_auto with raw data structures (no codeblocks)."""
    # Raw JSON
    raw_json = '{"name": "Raw JSON", "value": 100}'
    client, prompt_cls = mock_client_and_prompt
    result = await extract_data_auto(raw_json, client, prompt_cls, schema_format="json")
    assert len(result) == 1
    assert result[0]["name"] == "Raw JSON"
    assert result[0]["value"] == 100

    # Raw YAML
    raw_yaml = """name: Raw YAML
value: 200
metadata:
  created: 2023-01-01
  version: 1.0"""

    result = await extract_data_auto(raw_yaml, client, prompt_cls, schema_format="yaml")
    assert len(result) == 1
    assert result[0]["name"] == "Raw YAML"
    assert result[0]["value"] == 200
    # YAML parser converts date strings to date objects
    assert str(result[0]["metadata"]["created"]) == "2023-01-01"


@pytest.mark.asyncio
async def test_extract_data_auto_empty_codeblocks(mock_client_and_prompt):
    """Test extract_data_auto skips empty codeblocks."""
    # Load test data
    with open(get_test_data_path("empty_codeblocks.txt"), "r") as f:
        text_with_empty = f.read()

    client, prompt_cls = mock_client_and_prompt
    result = await extract_data_auto(text_with_empty, client, prompt_cls)
    assert len(result) == 2

    objects_by_id = {obj["id"]: obj for obj in result}
    assert objects_by_id[1]["name"] == "Valid"
    assert objects_by_id[2]["name"] == "Valid YAML"


@pytest.mark.asyncio
async def test_extract_data_auto_malformed_blocks(mock_client_and_prompt):
    """Test extract_data_auto handles malformed blocks gracefully."""
    text_with_malformed = """
Valid JSON:

```json
{"name": "Valid", "id": 1}
```

Malformed JSON:

```json
{"name": "Broken", "id":
```

Another valid JSON:

```json
{"name": "Also Valid", "id": 2}
```
"""

    client, prompt_cls = mock_client_and_prompt
    result = await extract_data_auto(text_with_malformed, client, prompt_cls)
    # Should extract the 2 valid objects and skip the malformed one
    assert len(result) == 2

    names = {obj["name"] for obj in result}
    assert "Valid" in names
    assert "Also Valid" in names
    assert "Broken" not in names  # Should be skipped


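# Illustration only, not part of the talemate test suite: a minimal sketch of the
# "skip what cannot be parsed" behaviour the test above expects, written with the
# standard library alone. The helper name parse_json_blocks is hypothetical, and
# extract_data_auto may attempt repairs before giving up on a block.

```python
import json


def parse_json_blocks(blocks: list) -> list:
    """Parse each block, keeping the successes and skipping malformed input."""
    parsed = []
    for block in blocks:
        try:
            parsed.append(json.loads(block))
        except json.JSONDecodeError:
            # A broken block is dropped instead of failing the whole batch.
            continue
    return parsed


results = parse_json_blocks(
    [
        '{"name": "Valid", "id": 1}',
        '{"name": "Broken", "id":',
        '{"name": "Also Valid", "id": 2}',
    ]
)
names = {obj["name"] for obj in results}
```

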
@pytest.mark.asyncio
async def test_extract_data_auto_repairs_faulty_json(mock_client_and_prompt):
    """Test extract_data_auto can repair faulty JSON blocks."""
    # Load test data
    with open(get_test_data_path("faulty_json_repairable.txt"), "r") as f:
        text_with_faulty = f.read()

    client, prompt_cls = mock_client_and_prompt
    result = await extract_data_auto(text_with_faulty, client, prompt_cls)
    # Should successfully repair and extract the objects from both blocks
    assert len(result) == 3  # Two from first block (after repair), one from second

    # Check that repair worked
    names = {obj["name"] for obj in result if "name" in obj}
    assert "Test" in names
    assert "Another" in names


@pytest.mark.asyncio
async def test_extract_data_auto_yml_identifier(mock_client_and_prompt):
    """Test extract_data_auto recognizes 'yml' as YAML identifier."""
    yml_text = """
Data with yml extension:

```yml
name: YML Test
id: 123
config:
  enabled: true
  timeout: 30
```
"""

    client, prompt_cls = mock_client_and_prompt
    result = await extract_data_auto(yml_text, client, prompt_cls)
    assert len(result) == 1
    assert result[0]["name"] == "YML Test"
    assert result[0]["id"] == 123
    assert result[0]["config"]["enabled"] is True


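# Illustration only, not part of the talemate test suite: a minimal sketch of how
# fenced blocks could be located and their language tag normalised so that "yml" is
# treated as YAML and untyped blocks fall back to a default format. The names FENCE,
# CODEBLOCK_RE, FORMAT_ALIASES and find_codeblocks are hypothetical; the real
# extract_data_auto may use a different approach.

```python
import re

# Build the fence marker programmatically to keep this listing easy to embed.
FENCE = "`" * 3

# Opening fence, optional language tag, newline, then the shortest body up to a fence.
CODEBLOCK_RE = re.compile(FENCE + r"([A-Za-z]*)[ \t]*\n(.*?)" + FENCE, re.DOTALL)

# Tags that should be treated as another format.
FORMAT_ALIASES = {"yml": "yaml"}


def find_codeblocks(text: str, default_format: str = "json") -> list:
    """Return (format, body) pairs for every non-empty fenced block in ``text``."""
    blocks = []
    for tag, body in CODEBLOCK_RE.findall(text):
        fmt = tag.lower() or default_format
        fmt = FORMAT_ALIASES.get(fmt, fmt)
        if body.strip():  # empty blocks are skipped
            blocks.append((fmt, body.strip()))
    return blocks


sample = (
    FENCE + "yml\nname: YML Test\n" + FENCE + "\n"
    + FENCE + "\n{\"id\": 1}\n" + FENCE + "\n"
    + FENCE + "json\n\n" + FENCE
)
blocks = find_codeblocks(sample)
```

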
@pytest.mark.asyncio
async def test_extract_data_auto_invalid_raw_data(mock_client_and_prompt):
    """Test extract_data_auto raises DataParsingError for invalid raw data."""
    # Invalid raw JSON
    invalid_json = '{"name": "Broken JSON", "id":'

    with pytest.raises(DataParsingError) as exc_info:
        client, prompt_cls = mock_client_and_prompt
        await extract_data_auto(invalid_json, client, prompt_cls, schema_format="json")

    assert "Failed to parse raw JSON data" in str(exc_info.value)

    # Invalid raw YAML
    invalid_yaml = """name: Broken YAML
- invalid: structure
without: proper indentation"""

    with pytest.raises(DataParsingError) as exc_info:
        await extract_data_auto(invalid_yaml, client, prompt_cls, schema_format="yaml")

    assert "Failed to parse raw YAML data" in str(exc_info.value)


@pytest.mark.asyncio
async def test_extract_data_auto_unsupported_format(mock_client_and_prompt):
    """Test extract_data_auto raises DataParsingError for unsupported formats."""
    text = '{"name": "test"}'

    with pytest.raises(DataParsingError) as exc_info:
        client, prompt_cls = mock_client_and_prompt
        await extract_data_auto(text, client, prompt_cls, schema_format="xml")

    assert "Failed to parse raw XML data" in str(exc_info.value)


@pytest.mark.asyncio
async def test_extract_data_auto_multiple_objects_in_single_block(
    mock_client_and_prompt,
):
    """Test extract_data_auto handles multiple objects within a single codeblock."""
    multiple_json = """
```json
{"id": 1, "name": "First"}
{"id": 2, "name": "Second"}
{"id": 3, "name": "Third"}
```
"""

    client, prompt_cls = mock_client_and_prompt
    result = await extract_data_auto(multiple_json, client, prompt_cls)
    assert len(result) == 3

    objects_by_id = {obj["id"]: obj for obj in result}
    assert objects_by_id[1]["name"] == "First"
    assert objects_by_id[2]["name"] == "Second"
    assert objects_by_id[3]["name"] == "Third"


@pytest.mark.asyncio
async def test_extract_data_with_ai_fallback_json_without_codeblock():
    """Test extract_data_with_ai_fallback when AI returns JSON without code block."""
    # Mock client and prompt
    client = MagicMock()
    client.data_format = "json"
    prompt_cls = MagicMock()

    # Simulate AI returning corrected JSON without code block
    async def mock_request(*args, **kwargs):
        return '{"name": "Fixed JSON", "id": 123, "active": true}'

    prompt_cls.request = mock_request

    # Malformed JSON that cannot be auto-fixed (invalid structure)
    malformed_json = '{"name": "Broken" this is broken, "id": 123}'

    result = await extract_data_with_ai_fallback(
        client, malformed_json, prompt_cls, "json"
    )

    # Should successfully extract the JSON even without code block
    assert len(result) == 1
    assert result[0]["name"] == "Fixed JSON"
    assert result[0]["id"] == 123


@pytest.mark.asyncio
async def test_extract_data_with_ai_fallback_json_with_codeblock():
    """Test extract_data_with_ai_fallback when AI returns JSON with code block."""
    # Mock client and prompt
    client = MagicMock()
    client.data_format = "json"
    prompt_cls = MagicMock()

    # Simulate AI returning corrected JSON with code block
    async def mock_request(*args, **kwargs):
        return '```json\n{"name": "Fixed JSON", "id": 456, "active": false}\n```'

    prompt_cls.request = mock_request

    # Malformed JSON that will trigger AI fallback
    malformed_json = '{"name": "Broken", "id": 456,'

    result = await extract_data_with_ai_fallback(
        client, malformed_json, prompt_cls, "json"
    )

    # Should successfully extract the JSON
    assert len(result) == 1
    assert result[0]["name"] == "Fixed JSON"
    assert result[0]["id"] == 456


@pytest.mark.asyncio
async def test_extract_data_with_ai_fallback_yaml_without_codeblock():
    """Test extract_data_with_ai_fallback when AI returns YAML without code block."""
    # Mock client and prompt
    client = MagicMock()
    client.data_format = "yaml"
    prompt_cls = MagicMock()

    # Simulate AI returning corrected YAML without code block
    async def mock_request(*args, **kwargs):
        return """name: Fixed YAML
id: 789
active: true
tags:
  - test
  - fixed"""

    prompt_cls.request = mock_request

    # Malformed YAML that will trigger AI fallback
    malformed_yaml = """name: Broken
id: 789
active: true"""

    result = await extract_data_with_ai_fallback(
        client, malformed_yaml, prompt_cls, "yaml"
    )

    # Should successfully extract the YAML even without code block
    assert len(result) == 1
    assert result[0]["name"] == "Fixed YAML"
    assert result[0]["id"] == 789
    assert result[0]["active"] is True


@pytest.mark.asyncio
async def test_extract_data_with_ai_fallback_yaml_with_codeblock():
    """Test extract_data_with_ai_fallback when AI returns YAML with code block."""
    # Mock client and prompt
    client = MagicMock()
    client.data_format = "yaml"
    prompt_cls = MagicMock()

    # Simulate AI returning corrected YAML with code block
    async def mock_request(*args, **kwargs):
        return """```yaml
name: Fixed YAML
id: 999
active: false
```"""

    prompt_cls.request = mock_request

    # Malformed YAML that will trigger AI fallback
    malformed_yaml = """name: Broken
id: 999
active: false"""

    result = await extract_data_with_ai_fallback(
        client, malformed_yaml, prompt_cls, "yaml"
    )

    # Should successfully extract the YAML
    assert len(result) == 1
    assert result[0]["name"] == "Fixed YAML"
    assert result[0]["id"] == 999
    assert result[0]["active"] is False