talemate/tests/test_utils_data.py
veguAI 89d16ae513 0.33.0 (#229)
* linting

* Add cleanup function for recent scenes in config to remove non-existent paths

* remove legacy world state manager buttons

* move world state scene tools into sub component

* linting

* move module properties to navigation drawer

* Update icons in NodeEditorLibrary and NodeEditorModuleProperties for improved UI clarity

* prompt tweaks

* director chat prompt simplifications

* more prompt fixes

* Enhance type hints for duration conversion functions in time.py

* narrate time action now has access to response length instructions

* Add IsoDateDuration node for ISO 8601 interval string construction

* Update advance_time method to include return type annotation and return message

* Add AdvanceTime node to world state for time advancement with duration and narration instructions

* linting

* Add agent state exclusions to changelog with a TODO for module migration

* Add message emission for actor, narrator, and scene analysis guidance in respective components. Enhance AgentMessages and SceneTools for better message handling and visual feedback.

* Remove agent messages from state when opening agent message view in SceneTools component.

* linting

* openrouter fetch models on key set

* Add input history functionality to message input in TalemateApp component. Implement keyboard shortcuts for navigating history (Ctrl+Up/Down) and limit history to the last 10 messages. Update message sending logic to store messages in history.

* Update message input hint in TalemateApp component to include keyboard shortcuts for navigating input history (Ctrl+Up/Down).

* node updates

* unified data extraction function

* prompt tweaks

* Add gamestate context support in BuildPrompt and corresponding template. Introduced new property for gamestate context and updated rendering logic to include gamestate information in prompts.

* Refactor Prompt class by removing LoopedPrompt and cleaning up related methods. Update data response parsing to streamline functionality and improve clarity. Adjust imports accordingly.

* Add 'data_multiple' property to GenerateResponse class to allow multiple data structures in responses. Update output socket type for 'data_obj' to support both dict and list formats.

* Add DictUpdate node

* Add UnpackGameState node to retrieve and unpack game state variables

* gamestate nodes

* linting

* Enhance scene view toggle functionality to support shift-click behavior for closing all drawers when hiding the scene view.

* immutable scenes should reset context db on load

* linting

* node updates

* prompt tweaks

* Add context type output and filtering for creative context ID meta entries in PathToContextID and ContextIDMetaEntries nodes

* Add string replacement functionality and Jinja2 formatting support in nodes. Introduced 'old' and 'new' properties for substring replacement in the Replace node, and added a new Jinja2Format node for template rendering using jinja2.

* Add additional outputs for context validation in ValidateContextIDItem node, including context type, context value, and name.

* prompt tweaks

* node adjustments

* linting

* Add data_expected attribute to Focal and Prompt classes for enhanced response handling

* node updates

* node updates

* node updates

* prompt tweaks

* director summary return appropriately on no action taken

* Enhance action handling in DirectorChatMixin by skipping actions when a question is present in the parsed response, ensuring better response accuracy.

* Enhance ConfirmActionPrompt component by adding anchorTop prop for dynamic alignment and adjusting icon size and color for improved UI consistency.

* anchor clear chat confirm to top

* responsive layout fixes in template editors

* linting

* relock

* Add scene progression guidance to chat-common-tasks template

* Refactor push_history method to be asynchronous across multiple agents and scenes, ensuring consistent handling of message history updates.

* Update chat instructions to clarify user intent considerations and enhance decisiveness in responses. Added guidance on distinguishing between scene progression and background changes, and refined analysis requirements for user interactions.

* Enhance DirectorConsoleChatsToolbar by adding a usage cheat sheet tooltip for user guidance and refining the Clear Chat button's UI for better accessibility.

* store character data at unified point

* fix button

* fix world editor auto sync

* Shared context 2 (#19)

Shared context

* Refactor NodeEditorLibrary to improve search functionality and debounce input handling. Updated v-text-field model and added a watcher for search input to enhance performance.

* Refactor NodeEditor and TalemateApp components to enhance UI interactions. Removed the exit creative mode button from NodeEditor and updated tooltips for clarity. Adjusted app bar navigation icons for better accessibility and added functionality to switch between node editor and creative mode.

* comment

* Character.update deserialize voice value correctly

* Enhance SharedContext.update_to_scene method to properly add or update character data in the scene based on existence checks. This improves the synchronization of character states between shared context and scene.

* shared context static history support
fix context memory db imports to always import

* Update WorldStateManagerSceneSharedContext.vue to clarify sharing of character, world entries, and history across connected scenes.

* linting

* Enhance chat modes by adding 'nospoilers' option to DirectorChat and related payloads. Update chat instructions to reflect new mode behavior and improve UI to support mode-specific icons and colors in the DirectorConsoleChatsToolbar.

* Comment out 'repetition_penalty_range' in TabbyAPIClient to prevent unexpected "<unk><unk> .." responses. Further investigation needed.

* linting

* Add active_characters and intro_instructions to Inheritance model; implement intro generation in load_scene_from_data. Update WorldStateManagerSceneSharedContext.vue to enhance new scene creation dialog with character selection and premise instructions.

* rename inheritance to scene initialization

* linting

* Update WorldStateManagerSceneSharedContext.vue to conditionally display alert based on scene saving status and new scene creation state.

* Refine messages for shared context checkboxes in WorldStateManagerCharacter and WorldStateManagerWorldEntries components for clarity.

* Add scene title generation to load process and update contextual generation template. Introduced a new method in AssistantMixin for generating scene titles, ensuring titles are concise and free of special characters. Updated load_scene_from_data to assign generated titles to scenes.

* linting

* Refactor GameState component to integrate Codemirror for JSON editing, replacing the previous treeview structure. Implement validation for JSON input and enhance error handling. Remove unused methods and streamline state management.

* Add lastLoadedJSON property to GameState component for change detection. Update validation logic to prevent unnecessary updates when game state has not changed.

* Remove status emission for gameplay switch in CmdSetEnvironmentToScene class.

* allow individual sharing of attributes and details

* linting

* Remove redundant question handling logic in DirectorChatMixin to streamline action selection process.

* Update EXTERNAL_DESCRIPTION in TabbyAPI client to include notes on EXL3 model sensitivity to inference parameters. Adjust handling of 'repetition_penalty_range' in parameter list for clarity.

* director chat support remove message and regenerate message

* Refactor ConfirmActionInline component to improve button rendering logic. Introduced 'size' prop for button customization and added 'comfortable' density option. Simplified icon handling with computed property for better clarity.

* linting

* node updates

* Add appBusy prop to DirectorConsoleChats and DirectorConsoleChatsToolbar components to manage button states during busy operations.

* Refactor DirectorChatMixin to utilize standalone utility functions for parsing response sections and extracting action blocks. This improves code clarity and maintainability. Added tests for new utility functions in test_utils_prompt.py to ensure correct functionality.

* Update clear chat button logic to consider appBusy state in DirectorConsoleChatsToolbar component, enhancing user experience during busy operations.

* linting

* Remove plan.md

* Add chat template identifier support and error handling in ModelPrompt class

- Implemented logic to check for 'chat_template.jinja2' in Hugging Face repository.
- Added new template identifiers: GraniteIdentifier and GLMIdentifier.
- Enhanced error handling to avoid logging 404 errors for missing templates.
- Introduced Granite.jinja2 template file for prompt structure.

* node fixes

* remove debug msg

* Enhance error handling in DynamicInstruction class by enforcing header requirement and ensuring content defaults to an empty string if not provided.

* reset scene message visibility on scene load

* prompt tweaks

* Enhance data extraction in Focal class by adding a fallback mechanism. Implemented additional error handling to attempt data extraction from a fenced block if the initial extraction fails, improving robustness in handling responses.

* linting

* node fixes

* Add relative_to_root function for path resolution and update node export logic

- Introduced a new function `relative_to_root` in path.py to resolve paths relative to the TALEMATE_ROOT.
- Updated the `export_node_definitions` function in registry.py to use `relative_to_root` for module path resolution.
- Added a check to skip non-selectable node definitions in litegraphUtils.js during registration.

* show icons

* Improve error handling in export_node_definitions by adding a try-except block for module path resolution. Log a warning if the relative path conversion fails.

* typo

* Refactor base_attributes type in Character model to a more generic dict type for improved flexibility

* relock

* ensure character gets added to character_data

* prompt tweaks

* linting

* properly activate characters

* activate needs to happen explicitly now and deactivated is the default

* missing arg

* avoid changed size error

* Refactor character removal logic in shared context to prevent deletion; characters are now only marked as non-shared.

* Add update_from_scene method calls in SharedContextMixin for scene synchronization

* Add ensure_changelogs_for_all_scenes function to manage changelog files for all scenes; integrate it into the server run process.

* Enhance backup restore functionality by adding base and latest snapshot options; improve UI with clearer labels and alerts for restore actions.

* Update _apply_delta function to enhance delta application handling by adding parameters for error logging and force application of changes on non-existent paths.

* Skip processing of changelog files in _list_files_and_directories function to prevent unnecessary inclusion in file listings.

* Update IntroRecentScenes.vue to use optional chaining for selectedScene properties and enhance backup timestamp display with revision info.

* linting

* Refactor source entry attribute access in collect_source_entries function to use getattr for optional attributes, improving robustness.

* Implement logic to always show scene view in scene mode within TalemateApp.vue, enhancing user experience during scene interactions.

* prompt tweaks

* prompt tweaks

* Update TalemateApp.vue to set the active tab to 'main' when switching to the node editor, improving navigation consistency.

* Add active frontend websocket handler management in websocket_endpoint

* agent websocket handler node support

* Refactor init_nodes method in DirectorAgent to call superclass method and rename chat initialization method in DirectorChatMixin for clarity.

* Add characters output to ContextHistory node to track active participants in the scene

* Add Agent Websocket Handler option to Node Editor Library with corresponding icons and labels

* Add check for node selectability in NodeEditorNodeSearch component to filter search results accordingly.

* Add SummarizeWebsocketHandler to handle summarize actions and integrate it into SummarizeAgent

* nodes

* Add data property to QueueResponse class for websocket communication and update run method to include action and data in output values.

* Update manual context handling in WorldStateManager to include shared property from existing context

* Enhance GetWorldEntry node to include 'shared' property in output values from world entry context

* Update scene loading to allow setting scene ID from data and include ID in scene serialization

* Update icon for AgentWebsocketHandler in NodeEditorLibrary component to mdi-web-box

* Refactor WorldStateManager components to enhance history management and sharing capabilities. Added summarized history titles, improved UI for sharing static history, and integrated scene summarization functionality. Removed deprecated methods related to shared context settings.

* linting

* Change log level from warning to debug for migrate_narrator_source_to_meta error handling in NarratorMessage class.

* Update GLM-no-reasoning template to include <think></think> tag before coercion message for improved prompt structure.

* allow prompt templates to specify reasoning pattern

* Add Seed.jinja2 template for LLM prompts with reasoning patterns and user interaction handling

* Enhance NarratorAgent to support dynamic response length configuration. Updated max generation length from 192 to 256 tokens and introduced a new method to calculate response length. Modified narration methods to accept and utilize response length parameter. Added response length property in GenerateNarrationBase class and updated templates to include response length handling.

* Update response length calculation in RevisionMixin to include token count for improved text processing.

* Refactor response identifier in RevisionMixin to dynamically use calculated response length for improved prompt handling.

* linting

* allow contextual generation of static history entries

* Add is_static property to HistoryEntry for static history entry identification

* Add "static history" option to ContextualGenerate node for enhanced contextual generation capabilities.

* Add CreateStaticArchiveEntry and RemoveStaticArchiveEntry nodes for managing static history entries. Implement input/output properties and error handling for entry creation and deletion.

* nodes updated

* linting

* Add assets field to SceneInitialization model and update load_scene_from_data function to handle scene assets. Update WorldStateManagerSceneSharedContext.vue to include assets in scene initialization parameters.

* Refactor CoverImage component to enhance drag-and-drop functionality and improve styling for empty portrait state.

* Add intent_state to SceneInitialization model and update load_scene_from_data function to handle intent state. Introduce story_intent property in Scene class and reset method in SceneIntent class. Update WorldStateManagerSceneSharedContext.vue to include intent state in scene initialization parameters.

* Refactor WorldStateManagerSceneSharedContext.vue to improve cancel functionality by introducing a dedicated cancelCreate method and removing the direct dialog toggle from the Cancel button. This enhances code clarity and maintainability.

* Update SharedContext to use await for set_shared method, ensuring proper asynchronous handling when modifying character sharing status.

* Add MAX_CONTENT_WIDTH constant and update components to use it for consistent max width styling

* fix issue with data structure parsing

* linting

* fix tests

* nodes

* fix update_introduction

* Add building blocks template for story configuration and scene management

* Refactor toggleNavigation method to accept an 'open' parameter for direct control over drawer visibility in TalemateApp.vue

* Update usageCheatSheet text in DirectorConsoleChatsToolbar.vue for clarity and add pre-wrap styling to tooltip

* Add cover image and writing style sections to story and character templates; update chat common tasks with new scene restrictions and user guide reference.

* linting

* relock

* Add EmitWorldEditorSync node to handle world editor synchronization; update WorldStateManager to refresh active tab on sync action.

* Update Anthropic client with new models and adjust default settings; introduce limited parameter models for specific configurations.

* director action module updates

* direct context update fn

* director action updates

* Update usageCheatSheet in DirectorConsoleChatsToolbar.vue to include recommendation for 100B+ models.

* Remove debug diagnostics from DirectorConsoleChats.vue to clean up console output.

* Update card styles in IntroRecentScenes.vue for improved visual consistency; change card color to grey-darken-3 and adjust text classes for titles and subtitles.

* Update EmitWorldEditorSync node to include websocket passthrough in sync action for improved event handling.

* Increase maximum changelog file size limit from 500KB to 1MB to accommodate larger change logs.

* linting

* director action module updates

* 0.33 added

* Add Nexus agent persona to talemate template and initialize phrases array

* Add support for project-specific grouping in NodeEditorLibrary for templates/modules, enhancing organization of node groups.

* docs

* Enhance NodeEditorLibrary by adding primary color to tree component for improved visibility and user experience.

* docs

* Enhance NewSceneSetupModal to include subtitles for writing styles and director personas, improving context and usability.

* Update agent persona description in WorldStateManagerTemplates to specify current support for director only, enhancing clarity for users.

* Refine agent persona description in WorldStateManagerTemplates to clarify assignment per agent in Scene Settings, maintaining focus on current director-only support.

* fix crash when attempting to delete some clients

* Add TODO comments in finalize_llama3 and finalize_YI methods to indicate removable cruft

* Add lock_template feature to Client configuration and update related components for template management

* linting

* persist client template lock through model changes

* There is no longer a point to enforcing creative mode when there are no characters

* fix direct_narrator character argument

* Update CharacterContextItem to allow 'value' to accept dict type in addition to existing types

* docs

* Update lock_template field in Client model to allow None type in addition to bool

* Remove unused template_file field from Defaults model in Client configuration

* Refactor lock_template field in Client model and ClientModal component to ensure consistent boolean handling

* Add field validator for lock_template in Client model to ensure boolean value is returned

* fix issue where valid data processed in extract_data_with_ai_fallback was not returned

* Update default_player_character assignment in ConfigPlugin to use GamePlayerCharacter schema for improved data validation

* linting

* add haiku 4.5 model and make default

* opus 4.5 isn't a thing

* fix issue where fork / restore would restore duplicate messages

* improve autocomplete handling when prefill isn't available

* prompt tweaks

* linting

* gracefully handle removed attributes

* Refactor scene reference handling in delete_changelog_files to prevent incorrect deletions. Added a test to verify proper scene reference construction and ensure changelog files are deleted correctly.

* forked scenes reset memory id and are not immutable

* emit_status export rev

* Update RequestInput.vue to handle extra_params more robustly, ensuring defaults are set correctly for input.

* only allow forking on saved messages

* linting

* tweak defaults

* summarizer fire off of push_history.after

* docs

* world entry titles containing ":" will now load correctly

* linting

* docs

* removing base attribute or detail also clears it from shared list

* fix issue where cancelling some generations would cause errors

* increase font size

* formatting fixes

* unhandled errors at the loop level should not crash the entire scene

* separate message processing from main loop

* linting

* remove debug cruft

* enhance error logging in background processing to include traceback information

* linting

* nothing to determine if no model is sent

* fix some errors during kcpp client deletion

* improve configuration issue alert visibility

* restore input focus after autocomplete

* linting
2025-10-25 14:06:55 +03:00

844 lines
27 KiB
Python
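
The tests in this file pin down two repairs expected of `fix_faulty_json`: a comma is inserted between adjacent objects, and trailing commas before a closing brace or bracket are dropped. A minimal sketch of how such repairs could be done with regular expressions is shown here. The names `fix_faulty_json_sketch` and `parse_adjacent_objects` are hypothetical, and this is not talemate's actual implementation, which may handle more cases.

```python
import json
import re


def fix_faulty_json_sketch(text: str) -> str:
    """Hypothetical stand-in for fix_faulty_json (assumption, not the real code).

    Caveat: the patterns are applied to the raw text, so they would also
    rewrite matching sequences that occur inside string values.
    """
    # Insert a comma between adjacent objects: '}{' becomes '},{'
    text = re.sub(r"\}\s*\{", "},{", text)
    # Drop a trailing comma that directly precedes '}' or ']'
    text = re.sub(r",\s*([}\]])", r"\1", text)
    return text


def parse_adjacent_objects(text: str) -> list:
    """Repair the text, then wrap it in brackets so json.loads sees a list."""
    return json.loads("[" + fix_faulty_json_sketch(text) + "]")
```

As in `test_fix_faulty_json`, the repaired adjacent-object string is only valid JSON once it is wrapped in list brackets, which is what `parse_adjacent_objects` does.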

import os
import pytest
import json
import yaml
from unittest.mock import MagicMock
import talemate.util.data
from talemate.util.data import (
    fix_faulty_json,
    extract_json,
    extract_json_v2,
    extract_yaml_v2,
    extract_data_auto,
    extract_data,
    extract_data_with_ai_fallback,
    JSONEncoder,
    DataParsingError,
    fix_yaml_colon_in_strings,
    fix_faulty_yaml,
)


# Helper function to get test data paths
def get_test_data_path(filename):
    base_dir = os.path.dirname(os.path.abspath(__file__))
    return os.path.join(base_dir, "data", "util", "data", filename)


@pytest.fixture
def mock_client_and_prompt():
    """Create mock client and prompt for extract_data_auto tests."""
    client = MagicMock()
    prompt_cls = MagicMock()

    # Mock the extract_data_with_ai_fallback to just use extract_data
    async def mock_extract_with_ai(client, text, prompt_cls, schema_format):
        # Wrap in codeblock format and use existing extract_data
        wrapped = f"```{schema_format}\n{text}\n```"
        return extract_data(wrapped, schema_format)

    # Patch the function during tests
    original_func = talemate.util.data.extract_data_with_ai_fallback
    talemate.util.data.extract_data_with_ai_fallback = mock_extract_with_ai

    yield client, prompt_cls

    # Restore original function
    talemate.util.data.extract_data_with_ai_fallback = original_func


def test_json_encoder():
    """Test JSONEncoder handles unknown types by converting to string."""

    class CustomObject:
        def __str__(self):
            return "CustomObject"

    # Create an object of a custom class
    custom_obj = CustomObject()

    # Encode it using JSONEncoder
    encoded = json.dumps({"obj": custom_obj}, cls=JSONEncoder)

    # Check if the object was converted to a string
    assert encoded == '{"obj": "CustomObject"}'


def test_fix_faulty_json():
    """Test fix_faulty_json function with various faulty JSON strings."""
    # Test adjacent objects - need to wrap in list brackets to make it valid JSON
    fixed = fix_faulty_json('{"a": 1}{"b": 2}')
    assert fixed == '{"a": 1},{"b": 2}'
    # We need to manually wrap it in brackets for the test
    assert json.loads("[" + fixed + "]") == [{"a": 1}, {"b": 2}]

    # Test trailing commas
    assert json.loads(fix_faulty_json('{"a": 1, "b": 2,}')) == {"a": 1, "b": 2}
    assert json.loads(fix_faulty_json('{"a": [1, 2, 3,]}')) == {"a": [1, 2, 3]}


def test_extract_json():
    """Test extract_json function to extract JSON from the beginning of a string."""
    # Simple test
    json_str, obj = extract_json('{"name": "test", "value": 42} and some text')
    assert json_str == '{"name": "test", "value": 42}'
    assert obj == {"name": "test", "value": 42}

    # Test with array
    json_str, obj = extract_json("[1, 2, 3] and some text")
    assert json_str == "[1, 2, 3]"
    assert obj == [1, 2, 3]

    # Test with whitespace
    json_str, obj = extract_json(' {"name": "test"} and some text')
    assert json_str == '{"name": "test"}'
    assert obj == {"name": "test"}

    # Test with invalid JSON
    with pytest.raises(ValueError):
        extract_json("This is not JSON")


def test_extract_json_v2_valid():
    """Test extract_json_v2 with valid JSON in code blocks."""
    # Load test data
    with open(get_test_data_path("valid_json.txt"), "r") as f:
        text = f.read()

    # Extract JSON
    result = extract_json_v2(text)

    # Check if we got two unique JSON objects (third is a duplicate)
    assert len(result) == 2

    # Check if the objects are correct
    expected_first = {
        "name": "Test Object",
        "properties": {"id": 1, "active": True},
        "tags": ["test", "json", "parsing"],
    }
    expected_second = {"name": "Simple Object", "value": 42}

    assert expected_first in result
    assert expected_second in result


def test_extract_json_v2_invalid():
    """Test extract_json_v2 raises DataParsingError for invalid JSON."""
    # Load test data
    with open(get_test_data_path("invalid_json.txt"), "r") as f:
        text = f.read()

    # Try to extract JSON, should raise DataParsingError
    with pytest.raises(DataParsingError):
        extract_json_v2(text)


def test_extract_json_v2_faulty():
    """Test extract_json_v2 with faulty but fixable JSON."""
    # Load test data
    with open(get_test_data_path("faulty_json.txt"), "r") as f:
        text = f.read()

    # Try to extract JSON, should successfully fix and extract some objects
    # but might fail on the severely malformed ones
    try:
        result = extract_json_v2(text)
        # If it manages to fix all JSON, verify the results
        assert len(result) > 0
    except DataParsingError:
        # This is also acceptable if some JSON is too broken to fix
        pass


def test_data_parsing_error():
    """Test the DataParsingError class."""
    # Create a DataParsingError with a message and data
    test_data = '{"broken": "json"'
    error = DataParsingError("Test error message", test_data)

    # Check properties
    assert error.message == "Test error message"
    assert error.data == test_data
    assert str(error) == "Test error message"


def test_extract_json_v2_multiple():
    """Test extract_json_v2 with multiple JSON objects including duplicates."""
    # Load test data
    with open(get_test_data_path("multiple_json.txt"), "r") as f:
        text = f.read()

    # Extract JSON
    result = extract_json_v2(text)

    # Check if we got the correct number of unique objects (3 unique out of 5 total)
    assert len(result) == 3

    # Define expected objects
    expected_objects = [
        {"id": 1, "name": "First Object", "tags": ["one", "first", "primary"]},
        {"id": 2, "name": "Second Object", "tags": ["two", "second"]},
        {
            "id": 3,
            "name": "Third Object",
            "metadata": {"created": "2023-01-01", "version": 1.0},
            "active": True,
        },
    ]

    # Check if all expected objects are in the result
    for expected in expected_objects:
        assert expected in result

    # Verify that each object appears exactly once (no duplicates)
    id_counts = {}
    for obj in result:
        id_counts[obj["id"]] = id_counts.get(obj["id"], 0) + 1

    # Each ID should appear exactly once
    for id_val, count in id_counts.items():
        assert count == 1, (
            f"Object with ID {id_val} appears {count} times (should be 1)"
        )


def test_extract_yaml_v2_valid():
    """Test extract_yaml_v2 with valid YAML in code blocks."""
    # Load test data
    with open(get_test_data_path("valid_yaml.txt"), "r") as f:
        text = f.read()

    # Extract YAML
    result = extract_yaml_v2(text)

    # Check if we got two unique YAML objects (third is a duplicate)
    assert len(result) == 2

    # Check if the objects are correct
    expected_first = {
        "name": "Test Object",
        "properties": {"id": 1, "active": True},
        "tags": ["test", "yaml", "parsing"],
    }
    expected_second = {"simple_name": "Simple Object", "value": 42}

    assert expected_first in result
    assert expected_second in result


def test_extract_yaml_v2_invalid():
    """Test extract_yaml_v2 raises DataParsingError for invalid YAML."""
    # Load test data
    with open(get_test_data_path("invalid_yaml.txt"), "r") as f:
        text = f.read()

    # Try to extract YAML, should raise DataParsingError
    with pytest.raises(DataParsingError):
        extract_yaml_v2(text)


def test_extract_yaml_v2_multiple():
    """Test extract_yaml_v2 with multiple YAML objects including duplicates."""
    # Load test data
    with open(get_test_data_path("multiple_yaml.txt"), "r") as f:
        text = f.read()

    # Extract YAML
    result = extract_yaml_v2(text)

    # Check if we got the correct number of unique objects (3 unique out of 5 total)
    assert len(result) == 3

    # Get the objects by ID for easier assertions
    objects_by_id = {obj["id"]: obj for obj in result}

    # Check for object 1
    assert objects_by_id[1]["name"] == "First Object"
    assert objects_by_id[1]["tags"] == ["one", "first", "primary"]

    # Check for object 2
    assert objects_by_id[2]["name"] == "Second Object"
    assert objects_by_id[2]["tags"] == ["two", "second"]

    # Check for object 3 - note that the date is parsed as a date object by YAML
    assert objects_by_id[3]["name"] == "Third Object"
    assert objects_by_id[3]["active"] is True
    assert "created" in objects_by_id[3]["metadata"]

    # Verify that each object ID appears exactly once (no duplicates)
    id_counts = {}
    for obj in result:
        id_counts[obj["id"]] = id_counts.get(obj["id"], 0) + 1

    # Each ID should appear exactly once
    for id_val, count in id_counts.items():
        assert count == 1, (
            f"Object with ID {id_val} appears {count} times (should be 1)"
        )


def test_extract_yaml_v2_multiple_documents():
    """Test extract_yaml_v2 with multiple YAML documents in a single code block."""
    # Load test data from file
    with open(get_test_data_path("multiple_yaml_documents.txt"), "r") as f:
        test_data = f.read()

    # Extract YAML
    result = extract_yaml_v2(test_data)

    # Check if we got all three documents
    assert len(result) == 3

    # Check if the objects are correct
    objects_by_id = {obj["id"]: obj for obj in result}

    assert objects_by_id[1]["name"] == "First Document"
    assert "first" in objects_by_id[1]["tags"]

    assert objects_by_id[2]["name"] == "Second Document"
    assert "secondary" in objects_by_id[2]["tags"]

    assert objects_by_id[3]["name"] == "Third Document"
    assert objects_by_id[3]["active"] is True


def test_extract_yaml_v2_without_separators():
    """Test extract_yaml_v2 with multiple YAML documents without --- separators."""
    # Load test data from file
    with open(get_test_data_path("multiple_yaml_without_separators.txt"), "r") as f:
        test_data = f.read()

    # Extract YAML
    result = extract_yaml_v2(test_data)

    # Check if we got all three nested documents
    assert len(result) == 3

    # Create a dictionary of documents by name for easy testing
    docs_by_name = {doc["name"]: doc for doc in result}

    # Verify that all three documents are correctly parsed
    assert "First Document" in docs_by_name
    assert docs_by_name["First Document"]["id"] == 1
    assert "first" in docs_by_name["First Document"]["tags"]

    assert "Second Document" in docs_by_name
    assert docs_by_name["Second Document"]["id"] == 2
    assert "secondary" in docs_by_name["Second Document"]["tags"]

    assert "Third Document" in docs_by_name
    assert docs_by_name["Third Document"]["id"] == 3
    assert docs_by_name["Third Document"]["active"] is True


def test_extract_json_v2_multiple_objects():
    """Test extract_json_v2 with multiple JSON objects in a single code block."""
    # Load test data from file
    with open(get_test_data_path("multiple_json_objects.txt"), "r") as f:
        test_data = f.read()

    # Extract JSON
    result = extract_json_v2(test_data)

    # Check if we got all three objects
    assert len(result) == 3

    # Check if the objects are correct
    objects_by_id = {obj["id"]: obj for obj in result}

    assert objects_by_id[1]["name"] == "First Object"
    assert objects_by_id[1]["type"] == "test"

    assert objects_by_id[2]["name"] == "Second Object"
    assert objects_by_id[2]["values"] == [1, 2, 3]

    assert objects_by_id[3]["name"] == "Third Object"
    assert objects_by_id[3]["active"] is True
    assert objects_by_id[3]["metadata"]["created"] == "2023-05-15"


def test_fix_yaml_colon_in_strings():
    """Test fix_yaml_colon_in_strings with problematic YAML containing unquoted colons."""
    # Load test data from file
    with open(get_test_data_path("yaml_with_colons.txt"), "r") as f:
        problematic_yaml = f.read()

    # Extract YAML from the code block
    problematic_yaml = problematic_yaml.split("```")[1]
    if problematic_yaml.startswith("yaml"):
        problematic_yaml = problematic_yaml[4:].strip()

    # Fix the YAML
    fixed_yaml = fix_yaml_colon_in_strings(problematic_yaml)

    # Parse the fixed YAML to check it works
    parsed = yaml.safe_load(fixed_yaml)

    # Check that the structure and content are preserved
    assert parsed["calls"][0]["name"] == "act"
    assert parsed["calls"][0]["arguments"]["name"] == "Kaira"
    assert (
        "I can see you're scared, Elmer"
        in parsed["calls"][0]["arguments"]["instructions"]
    )


def test_fix_faulty_yaml():
    """Test fix_faulty_yaml with various problematic YAML constructs."""
    # Load test data from file
    with open(get_test_data_path("yaml_list_with_colons.txt"), "r") as f:
        problematic_yaml = f.read()

    # Extract YAML from the code block
    problematic_yaml = problematic_yaml.split("```")[1]
    if problematic_yaml.startswith("yaml"):
        problematic_yaml = problematic_yaml[4:].strip()

    # Fix the YAML
    fixed_yaml = fix_faulty_yaml(problematic_yaml)

    # Parse the fixed YAML to check it works
    parsed = yaml.safe_load(fixed_yaml)

    # Check that the structure and content are preserved
    assert len(parsed["instructions_list"]) == 2

    # Each list item is now the full string, including its colons
    assert "Run to the door" in parsed["instructions_list"][0]
    assert "Wait for me!" in parsed["instructions_list"][0]
    assert "Look around" in parsed["instructions_list"][1]
    assert "Is there another way out?" in parsed["instructions_list"][1]
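

# Illustrative sketch only: the two tests above repeat the same steps to pull
# the YAML body out of a fenced block in the fixture file. A small helper could
# hold that logic in one place. The name is hypothetical, and unlike the inline
# version it always strips surrounding whitespace and raises when no complete
# fenced block is present.
def _sketch_first_fenced_block(text, language="yaml"):
    """Return the body of the first ``` fenced block without its language tag."""
    parts = text.split("```")
    # A complete block needs an opening and a closing fence, so three parts
    if len(parts) < 3:
        raise ValueError("no complete fenced code block found")
    body = parts[1]
    if body.startswith(language):
        body = body[len(language):]
    return body.strip()


def test_sketch_first_fenced_block():
    """The sketch helper drops the fence markers and the language tag."""
    text = "intro\n```yaml\nname: x\n```\noutro"
    assert _sketch_first_fenced_block(text) == "name: x"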


def test_extract_yaml_v2_with_colons():
    """Test extract_yaml_v2 correctly processes YAML with problematic colons in strings."""
    # Load test data containing YAML code blocks with problematic colons
    with open(get_test_data_path("yaml_block_with_colons.txt"), "r") as f:
        text = f.read()

    # Extract YAML
    result = extract_yaml_v2(text)

    # Check if we got the two YAML objects
    assert len(result) == 2

    # Find the objects by their structure
    calls_obj = None
    instructions_obj = None
    for obj in result:
        if "calls" in obj:
            calls_obj = obj
        elif "instructions_list" in obj:
            instructions_obj = obj

    # Verify both objects were found
    assert calls_obj is not None, "Could not find the 'calls' object"
    assert instructions_obj is not None, "Could not find the 'instructions_list' object"

    # Check the structure and content of the first object (calls)
    assert calls_obj["calls"][0]["name"] == "act"
    assert calls_obj["calls"][0]["arguments"]["name"] == "Kaira"

    # Check that the problematic part with the colon is preserved
    instructions = calls_obj["calls"][0]["arguments"]["instructions"]
    assert "Speak in a calm, soothing tone and say:" in instructions
    assert "I can see you're scared, Elmer" in instructions

    # Check the second object (instructions_list)
    assert len(instructions_obj["instructions_list"]) == 2
    assert "Run to the door" in instructions_obj["instructions_list"][0]
    assert "Wait for me!" in instructions_obj["instructions_list"][0]
    assert "Look around" in instructions_obj["instructions_list"][1]
    assert "Is there another way out?" in instructions_obj["instructions_list"][1]


@pytest.mark.asyncio
async def test_extract_data_auto_mixed_formats(mock_client_and_prompt):
    """Test extract_data_auto with mixed JSON and YAML codeblocks."""
    client, prompt_cls = mock_client_and_prompt

    # Load test data
    with open(get_test_data_path("mixed_formats.txt"), "r") as f:
        mixed_text = f.read()

    result = await extract_data_auto(mixed_text, client, prompt_cls)

    # Should extract all three objects
    assert len(result) == 3

    # Verify objects by ID
    objects_by_id = {obj["id"]: obj for obj in result}

    assert objects_by_id[1]["name"] == "JSON Object"
    assert objects_by_id[1]["type"] == "json"

    assert objects_by_id[2]["name"] == "YAML Object"
    assert objects_by_id[2]["type"] == "yaml"
    assert "test" in objects_by_id[2]["tags"]

    assert objects_by_id[3]["name"] == "Second JSON"
    assert objects_by_id[3]["active"] is True


@pytest.mark.asyncio
async def test_extract_data_auto_untyped_codeblocks(mock_client_and_prompt):
    """Test extract_data_auto with untyped codeblocks using default format."""
    client, prompt_cls = mock_client_and_prompt

    # Test with JSON default
    with open(get_test_data_path("untyped_codeblocks_json.txt"), "r") as f:
        json_text = f.read()

    result = await extract_data_auto(
        json_text, client, prompt_cls, schema_format="json"
    )

    assert len(result) == 2
    names = {obj["name"] for obj in result}
    assert "Untyped JSON" in names
    assert "Another JSON" in names

    # Test with YAML default
    with open(get_test_data_path("untyped_codeblocks_yaml.txt"), "r") as f:
        yaml_text = f.read()

    result = await extract_data_auto(
        yaml_text, client, prompt_cls, schema_format="yaml"
    )

    assert len(result) == 2
    names = {obj["name"] for obj in result}
    assert "Untyped YAML" in names
    assert "Another YAML" in names


@pytest.mark.asyncio
async def test_extract_data_auto_bare_codeblock(mock_client_and_prompt):
    """Test extract_data_auto with entire text being just a codeblock."""
    client, prompt_cls = mock_client_and_prompt

    # JSON codeblock
    json_codeblock = """```json
{"name": "Bare JSON", "id": 123, "active": true}
```"""

    result = await extract_data_auto(json_codeblock, client, prompt_cls)

    assert len(result) == 1
    assert result[0]["name"] == "Bare JSON"
    assert result[0]["id"] == 123

    # YAML codeblock
    yaml_codeblock = """```yaml
name: Bare YAML
id: 456
active: false
tags:
- bare
- yaml
```"""

    result = await extract_data_auto(yaml_codeblock, client, prompt_cls)

    assert len(result) == 1
    assert result[0]["name"] == "Bare YAML"
    assert result[0]["id"] == 456
    assert "bare" in result[0]["tags"]


@pytest.mark.asyncio
async def test_extract_data_auto_raw_data(mock_client_and_prompt):
    """Test extract_data_auto with raw data structures (no codeblocks)."""
    client, prompt_cls = mock_client_and_prompt

    # Raw JSON
    raw_json = '{"name": "Raw JSON", "value": 100}'

    result = await extract_data_auto(raw_json, client, prompt_cls, schema_format="json")

    assert len(result) == 1
    assert result[0]["name"] == "Raw JSON"
    assert result[0]["value"] == 100

    # Raw YAML
    raw_yaml = """name: Raw YAML
value: 200
metadata:
  created: 2023-01-01
  version: 1.0"""

    result = await extract_data_auto(raw_yaml, client, prompt_cls, schema_format="yaml")

    assert len(result) == 1
    assert result[0]["name"] == "Raw YAML"
    assert result[0]["value"] == 200
    # YAML parser converts date strings to date objects
    assert str(result[0]["metadata"]["created"]) == "2023-01-01"


@pytest.mark.asyncio
async def test_extract_data_auto_empty_codeblocks(mock_client_and_prompt):
    """Test extract_data_auto skips empty codeblocks."""
    client, prompt_cls = mock_client_and_prompt

    # Load test data
    with open(get_test_data_path("empty_codeblocks.txt"), "r") as f:
        text_with_empty = f.read()

    result = await extract_data_auto(text_with_empty, client, prompt_cls)

    assert len(result) == 2
    objects_by_id = {obj["id"]: obj for obj in result}
    assert objects_by_id[1]["name"] == "Valid"
    assert objects_by_id[2]["name"] == "Valid YAML"


@pytest.mark.asyncio
async def test_extract_data_auto_malformed_blocks(mock_client_and_prompt):
    """Test extract_data_auto handles malformed blocks gracefully."""
    client, prompt_cls = mock_client_and_prompt

    text_with_malformed = """
Valid JSON:
```json
{"name": "Valid", "id": 1}
```
Malformed JSON:
```json
{"name": "Broken", "id":
```
Another valid JSON:
```json
{"name": "Also Valid", "id": 2}
```
"""

    result = await extract_data_auto(text_with_malformed, client, prompt_cls)

    # Should extract the 2 valid objects and skip the malformed one
    assert len(result) == 2
    names = {obj["name"] for obj in result}
    assert "Valid" in names
    assert "Also Valid" in names
    assert "Broken" not in names  # Should be skipped
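

# Illustrative sketch only: the "skip what cannot be parsed, keep the rest"
# behaviour exercised above can be pictured with the standard library alone.
# This is an assumption about the general technique, not a description of
# extract_data_auto, which may also try to repair blocks before giving up.
# The helper name is hypothetical.
def _sketch_parse_json_blocks(text):
    """Parse every ```json block in ``text`` and drop blocks that fail to load."""
    import json
    import re

    parsed = []
    # Non-greedy match so each block ends at its own closing fence
    for body in re.findall(r"```json\n(.*?)```", text, flags=re.DOTALL):
        try:
            parsed.append(json.loads(body))
        except json.JSONDecodeError:
            # A malformed block is ignored instead of failing the whole call
            continue
    return parsed


def test_sketch_parse_json_blocks_skips_malformed():
    """Only the two well-formed blocks survive in the sketch helper."""
    text = (
        '```json\n{"name": "Valid", "id": 1}\n```\n'
        '```json\n{"name": "Broken", "id":\n```\n'
        '```json\n{"name": "Also Valid", "id": 2}\n```\n'
    )
    names = [obj["name"] for obj in _sketch_parse_json_blocks(text)]
    assert names == ["Valid", "Also Valid"]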


@pytest.mark.asyncio
async def test_extract_data_auto_repairs_faulty_json(mock_client_and_prompt):
    """Test extract_data_auto can repair faulty JSON blocks."""
    client, prompt_cls = mock_client_and_prompt

    # Load test data
    with open(get_test_data_path("faulty_json_repairable.txt"), "r") as f:
        text_with_faulty = f.read()

    result = await extract_data_auto(text_with_faulty, client, prompt_cls)

    # Should successfully repair and extract all three objects:
    # two from the first block (after repair) and one from the second
    assert len(result) == 3

    # Check that repair worked
    names = {obj["name"] for obj in result if "name" in obj}
    assert "Test" in names
    assert "Another" in names


@pytest.mark.asyncio
async def test_extract_data_auto_yml_identifier(mock_client_and_prompt):
    """Test extract_data_auto recognizes 'yml' as YAML identifier."""
    client, prompt_cls = mock_client_and_prompt

    yml_text = """
Data with yml extension:
```yml
name: YML Test
id: 123
config:
  enabled: true
  timeout: 30
```
"""

    result = await extract_data_auto(yml_text, client, prompt_cls)

    assert len(result) == 1
    assert result[0]["name"] == "YML Test"
    assert result[0]["id"] == 123
    assert result[0]["config"]["enabled"] is True


@pytest.mark.asyncio
async def test_extract_data_auto_invalid_raw_data(mock_client_and_prompt):
    """Test extract_data_auto raises DataParsingError for invalid raw data."""
    # Unpack the fixture outside pytest.raises so only the call under test
    # can satisfy the expected exception
    client, prompt_cls = mock_client_and_prompt

    # Invalid raw JSON
    invalid_json = '{"name": "Broken JSON", "id":'

    with pytest.raises(DataParsingError) as exc_info:
        await extract_data_auto(invalid_json, client, prompt_cls, schema_format="json")

    assert "Failed to parse raw JSON data" in str(exc_info.value)

    # Invalid raw YAML
    invalid_yaml = """name: Broken YAML
- invalid: structure
without: proper indentation"""

    with pytest.raises(DataParsingError) as exc_info:
        await extract_data_auto(invalid_yaml, client, prompt_cls, schema_format="yaml")

    assert "Failed to parse raw YAML data" in str(exc_info.value)


@pytest.mark.asyncio
async def test_extract_data_auto_unsupported_format(mock_client_and_prompt):
    """Test extract_data_auto raises DataParsingError for unsupported formats."""
    client, prompt_cls = mock_client_and_prompt

    text = '{"name": "test"}'

    with pytest.raises(DataParsingError) as exc_info:
        await extract_data_auto(text, client, prompt_cls, schema_format="xml")

    assert "Failed to parse raw XML data" in str(exc_info.value)


@pytest.mark.asyncio
async def test_extract_data_auto_multiple_objects_in_single_block(
    mock_client_and_prompt,
):
    """Test extract_data_auto handles multiple objects within a single codeblock."""
    client, prompt_cls = mock_client_and_prompt

    multiple_json = """
```json
{"id": 1, "name": "First"}
{"id": 2, "name": "Second"}
{"id": 3, "name": "Third"}
```
"""

    result = await extract_data_auto(multiple_json, client, prompt_cls)

    assert len(result) == 3
    objects_by_id = {obj["id"]: obj for obj in result}
    assert objects_by_id[1]["name"] == "First"
    assert objects_by_id[2]["name"] == "Second"
    assert objects_by_id[3]["name"] == "Third"


@pytest.mark.asyncio
async def test_extract_data_with_ai_fallback_json_without_codeblock():
    """Test extract_data_with_ai_fallback when AI returns JSON without code block."""
    # Mock client and prompt
    client = MagicMock()
    client.data_format = "json"
    prompt_cls = MagicMock()

    # Simulate AI returning corrected JSON without code block
    async def mock_request(*args, **kwargs):
        return '{"name": "Fixed JSON", "id": 123, "active": true}'

    prompt_cls.request = mock_request

    # Malformed JSON that cannot be auto-fixed (invalid structure)
    malformed_json = '{"name": "Broken" this is broken, "id": 123}'

    result = await extract_data_with_ai_fallback(
        client, malformed_json, prompt_cls, "json"
    )

    # Should successfully extract the JSON even without code block
    assert len(result) == 1
    assert result[0]["name"] == "Fixed JSON"
    assert result[0]["id"] == 123
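

# Illustrative sketch only: the hand-written mock_request coroutine used in the
# fallback tests could also be expressed with unittest.mock.AsyncMock, which
# returns an awaitable and records how it was awaited. The arguments passed to
# request() below are placeholders; the arguments talemate actually passes are
# not shown in this file, so treat them as an assumption.
def _sketch_prompt_cls_with_response(response):
    """Build a stand-in prompt class whose async request() yields ``response``."""
    from unittest.mock import AsyncMock, MagicMock

    prompt_cls = MagicMock()
    prompt_cls.request = AsyncMock(return_value=response)
    return prompt_cls


def test_sketch_async_mock_stands_in_for_request():
    """AsyncMock returns the canned response and tracks the await."""
    import asyncio

    canned = '{"name": "Fixed JSON", "id": 123, "active": true}'
    prompt_cls = _sketch_prompt_cls_with_response(canned)

    # "placeholder.template" is a made-up argument for illustration
    response = asyncio.run(prompt_cls.request("placeholder.template"))

    assert response == canned
    prompt_cls.request.assert_awaited_once_with("placeholder.template")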


@pytest.mark.asyncio
async def test_extract_data_with_ai_fallback_json_with_codeblock():
    """Test extract_data_with_ai_fallback when AI returns JSON with code block."""
    # Mock client and prompt
    client = MagicMock()
    client.data_format = "json"
    prompt_cls = MagicMock()

    # Simulate AI returning corrected JSON with code block
    async def mock_request(*args, **kwargs):
        return '```json\n{"name": "Fixed JSON", "id": 456, "active": false}\n```'

    prompt_cls.request = mock_request

    # Malformed JSON that will trigger AI fallback
    malformed_json = '{"name": "Broken", "id": 456,'

    result = await extract_data_with_ai_fallback(
        client, malformed_json, prompt_cls, "json"
    )

    # Should successfully extract the JSON
    assert len(result) == 1
    assert result[0]["name"] == "Fixed JSON"
    assert result[0]["id"] == 456


@pytest.mark.asyncio
async def test_extract_data_with_ai_fallback_yaml_without_codeblock():
    """Test extract_data_with_ai_fallback when AI returns YAML without code block."""
    # Mock client and prompt
    client = MagicMock()
    client.data_format = "yaml"
    prompt_cls = MagicMock()

    # Simulate AI returning corrected YAML without code block
    async def mock_request(*args, **kwargs):
        return """name: Fixed YAML
id: 789
active: true
tags:
- test
- fixed"""

    prompt_cls.request = mock_request

    # Malformed YAML that will trigger AI fallback
    malformed_yaml = """name: Broken
id: 789
active: true"""

    result = await extract_data_with_ai_fallback(
        client, malformed_yaml, prompt_cls, "yaml"
    )

    # Should successfully extract the YAML even without code block
    assert len(result) == 1
    assert result[0]["name"] == "Fixed YAML"
    assert result[0]["id"] == 789
    assert result[0]["active"] is True


@pytest.mark.asyncio
async def test_extract_data_with_ai_fallback_yaml_with_codeblock():
    """Test extract_data_with_ai_fallback when AI returns YAML with code block."""
    # Mock client and prompt
    client = MagicMock()
    client.data_format = "yaml"
    prompt_cls = MagicMock()

    # Simulate AI returning corrected YAML with code block
    async def mock_request(*args, **kwargs):
        return """```yaml
name: Fixed YAML
id: 999
active: false
```"""

    prompt_cls.request = mock_request

    # Malformed YAML that will trigger AI fallback
    malformed_yaml = """name: Broken
id: 999
active: false"""

    result = await extract_data_with_ai_fallback(
        client, malformed_yaml, prompt_cls, "yaml"
    )

    # Should successfully extract the YAML
    assert len(result) == 1
    assert result[0]["name"] == "Fixed YAML"
    assert result[0]["id"] == 999
    assert result[0]["active"] is False