Skip to content

feat(wren-ai-service): flexible chart generation (ai-env-changed) #1652

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 44 commits into
base: main
Choose a base branch
from

Conversation

cyyeh
Copy link
Member

@cyyeh cyyeh commented May 14, 2025

Summary by CodeRabbit

  • New Features

    • Introduced chart validation and correction pipelines to enhance chart accuracy and quality.
    • Added retry mechanisms for chart validation and correction in chart generation and adjustment services.
    • Enabled direct data input and image URL support for chart adjustments.
    • Added a custom Power BI theme for chart visualizations.
  • Enhancements

    • Simplified chart schema handling by replacing strict schemas with flexible dictionary-based models.
    • Improved prompt instructions and output formats for chart generation and adjustment.
    • Standardized configuration defaults and request model initializations across services.
    • Enhanced support for custom theming in chart post-processing.
    • Strengthened type annotations by removing optional types where defaults are always provided.
  • Bug Fixes

    • Fixed issues with shared mutable default configurations by using default factories.
  • Documentation

    • Updated example configuration files to include new chart validation and correction pipelines.
  • Chores

    • Added new dependency for chart visualization support.
    • Refactored chat message and streaming chunk handling with improved typing and OpenAI format conversion.

Copy link
Contributor

coderabbitai bot commented May 14, 2025

Walkthrough

This update adds two new pipeline components, chart_validation and chart_correction, across multiple configuration files and the main pipeline codebase. It introduces support for validating and correcting Vega-Lite chart schemas using LLMs, including new prompts, preprocessing, postprocessing, and retry logic in chart-related services. Several type annotations were refined to remove optionality, and a new Power BI theme was added.

Changes

Files (Grouped by Similar Changes) Change Summary
deployment/kustomizations/base/cm.yaml, docker/config.example.yaml, tools/config/config.example.yaml,
tools/config/config.full.yaml,
wren-ai-service/docs/config_examples/config.*.yaml (all example config files)
Added two new pipeline entries chart_validation and chart_correction using litellm_llm.default language model.
wren-ai-service/pyproject.toml Added dependency vl-convert-python = "^1.7.0".
wren-ai-service/src/config.py Added two new config fields: allow_chart_validation: bool = False and max_chart_correction_retries: int = 3.
wren-ai-service/src/globals.py Added new pipeline components chart_validation and chart_correction to service containers with new config params.
wren-ai-service/src/pipelines/generation/__init__.py Imported and exported new pipeline classes ChartCorrection and ChartValidation.
wren-ai-service/src/pipelines/generation/chart_adjustment.py Simplified adjustment interface to use command string; added image URL support; improved preprocessing and prompts; added retry and validation logic support.
wren-ai-service/src/pipelines/generation/chart_correction.py New pipeline to correct Vega-Lite charts based on user query and SQL with preprocessing, prompt building, generation, and postprocessing.
wren-ai-service/src/pipelines/generation/chart_generation.py Removed Vega schema validation; added custom theme support; updated prompt and postprocessing signatures and logic.
wren-ai-service/src/pipelines/generation/chart_validation.py New pipeline to validate if Vega-Lite chart images contain meaningful content; includes image conversion and LLM-based validation.
wren-ai-service/src/pipelines/generation/utils/chart.py Removed strict typed chart schema models; replaced with flexible dict-based schema; added custom theme loading and preprocessing component; simplified postprocessing.
wren-ai-service/src/pipelines/generation/utils/theme_powerbi.json Added new Power BI theme JSON configuration file for chart styling.
wren-ai-service/src/providers/llm/__init__.py Replaced Haystack dataclasses with custom ChatMessage and StreamingChunk dataclasses; added ChatRole enum; added OpenAI message format converter.
wren-ai-service/src/providers/llm/litellm.py Updated imports to local ChatMessage and conversion function; extended internal async function to accept image_url.
wren-ai-service/src/web/v1/services/chart.py Added chart validation and correction retry logic in chart generation; updated models and constructor to support validation flags and retry limits.
wren-ai-service/src/web/v1/services/chart_adjustment.py Removed ChartAdjustmentOption; replaced with adjustment_command string; added support for optional data and image URL; added validation and correction retry logic; updated models and constructor.
wren-ai-service/src/web/v1/routers/question_recommendation.py Changed optional fields with default values to non-optional with explicit defaults; updated configuration to use Field(default_factory=Configuration).
wren-ai-service/src/web/v1/routers/relationship_recommendation.py Changed configuration field to use Field(default_factory=Configuration) instead of optional with default instance.
wren-ai-service/src/web/v1/routers/semantics_description.py Changed configuration field to use Field(default_factory=Configuration) instead of optional with default instance.
wren-ai-service/src/web/v1/services/ask.py Changed multiple optional fields with defaults to non-optional with explicit defaults; updated configuration to use default factory.
wren-ai-service/src/web/v1/services/question_recommendation.py Changed optional fields with defaults to non-optional with explicit defaults; updated configuration and nested model fields accordingly.
wren-ai-service/src/web/v1/services/relationship_recommendation.py Changed configuration field to use Field(default_factory=Configuration) instead of optional with default instance.
wren-ai-service/src/web/v1/services/semantics_description.py Changed configuration field to use Field(default_factory=Configuration) instead of optional with default instance.
wren-ai-service/src/web/v1/services/sql_answer.py Changed configurations field to use Field(default_factory=Configuration) instead of optional with default instance.
wren-ai-service/src/web/v1/services/sql_question.py Changed configurations field to use Field(default_factory=Configuration) instead of optional with default instance.
wren-ai-service/src/web/v1/services/__init__.py Changed language and timezone attributes in Configuration class from optional to non-optional with defaults.
Multiple pipeline constructors and methods (IntentClassification.py, RelationshipRecommendation.py, SQLCorrection.py, SQLGeneration.py, SQLRegeneration.py, SQLGenPostProcessor in utils/sql.py, DBSchema.py, Instructions.py, SqlPairs.py, DbSchemaRetrieval.py, HistoricalQuestionRetrieval.py, Instructions.py, SQLExecutor.py, SqlFunctions.py, SqlPairsRetrieval.py, LitellmEmbedderProvider.py) Changed parameters from optional types with defaults to non-optional types with defaults, removing Optional from type hints for better typing clarity.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant ChartService
    participant ChartValidationPipeline
    participant ChartCorrectionPipeline
    participant ChartGenerationPipeline

    Client->>ChartService: Request chart generation
    ChartService->>ChartGenerationPipeline: Generate initial chart schema
    ChartGenerationPipeline-->>ChartService: Return chart schema and reasoning

    alt Chart validation enabled
        ChartService->>ChartValidationPipeline: Validate chart schema
        ChartValidationPipeline-->>ChartService: Validation result (valid/invalid)

        alt Chart is valid
            ChartService->>Client: Return finished chart result
        else Chart invalid and retries left
            ChartService->>ChartCorrectionPipeline: Correct chart schema
            ChartCorrectionPipeline-->>ChartService: Updated chart schema
            ChartService->>ChartValidationPipeline: Validate updated chart schema
            Note over ChartService: Retry loop until valid or max retries
        else Chart invalid and no retries left
            ChartService->>Client: Return failure result
        end
    else Chart validation disabled
        ChartService->>Client: Return finished chart result
    end
Loading

Possibly related PRs

Poem

🐇 In charts we trust, with tweaks so bright,
Validation and correction bring data to light.
Pipelines flow with careful care,
Themes and prompts dance in the air.
From schema raw to polished gleam,
Our charts now tell the perfect dream! 🎨✨

Note

⚡️ AI Code Reviews for VS Code, Cursor, Windsurf

CodeRabbit now has a plugin for VS Code, Cursor and Windsurf. This brings AI code reviews directly in the code editor. Each commit is reviewed immediately, finding bugs before the PR is raised. Seamless context handoff to your AI code agent ensures that you can easily incorporate review feedback.
Learn more here.


Note

⚡️ Faster reviews with caching

CodeRabbit now supports caching for code and dependencies, helping speed up reviews. This means quicker feedback, reduced wait times, and a smoother review experience overall. Cached data is encrypted and stored securely. This feature will be automatically enabled for all accounts on May 30th. To opt out, configure Review - Disable Cache at either the organization or repository level. If you prefer to disable all data retention across your organization, simply turn off the Data Retention setting under your Organization Settings.
Enjoy the performance boost—your workflow just got faster.


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fb17cb7 and 69d07fe.

📒 Files selected for processing (1)
  • wren-ai-service/src/pipelines/generation/utils/chart.py (4 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
wren-ai-service/src/pipelines/generation/utils/chart.py (5)
wren-ai-service/src/pipelines/generation/chart_correction.py (1)
  • run (127-146)
wren-ai-service/src/pipelines/generation/chart_generation.py (1)
  • run (140-160)
wren-ai-service/src/pipelines/generation/chart_adjustment.py (1)
  • run (163-190)
wren-ai-service/src/pipelines/generation/utils/sql.py (1)
  • run (31-76)
wren-ai-service/src/pipelines/generation/chart_validation.py (1)
  • run (125-136)
🔇 Additional comments (6)
wren-ai-service/src/pipelines/generation/utils/chart.py (6)

60-68: Good implementation of ChartSchemaPreprocessor

The component correctly uses pop() with a default value to safely handle missing keys and properly defines the output types.


77-79: Fixed type hint and added custom theme support

Good job fixing the type hint for replies parameter and adding the optional custom_theme parameter. This improves type safety and adds flexibility for chart customization.


13-22: Well-implemented error handling

The function now properly handles file not found errors, IO errors, and JSON parsing errors with appropriate logging.


135-146: Clear generation instructions

The general instructions provide clear guidance for chart generation with specific requirements for Vega-lite schema generation.


149-153: Good model update with deprecation notice

The chart schema is now a dictionary type for more flexibility, and the deprecated field is properly marked with a comment.


155-163: Well-structured model response format

The response format definition using the Pydantic model schema ensures consistent output structure.

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@cyyeh cyyeh force-pushed the feat/ai-service/flexible-chart-gen branch from 479f257 to 1dc2638 Compare May 19, 2025 02:35
@github-actions github-actions bot changed the title feat(wren-ai-service): flexible chart generation feat(wren-ai-service): flexible chart generation (ai-env-changed) May 19, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 12

🔭 Outside diff range comments (1)
wren-ai-service/src/web/v1/services/chart.py (1)

80-95: 🛠️ Refactor suggestion

Constructor parameters can get silently out-of-sync with the supplied pipeline dict

ChartService.__init__() now expects that the pipelines mapping always contains chart_validation and chart_correction when allow_chart_validation=True.
Because these keys are not accessed until run-time, a mis-configured DI container will raise a KeyError only after the first request, turning a configuration error into a 500-level production error.

Consider validating the presence of the required pipelines up-front:

 def __init__(self, pipelines: Dict[str, BasicPipeline], 
              allow_chart_validation: bool = False,
              max_chart_correction_retries: int = 3,
              maxsize: int = 1_000_000,
              ttl: int = 120):
-    self._pipelines = pipelines
+    required = {"sql_executor", "chart_generation"}
+    if allow_chart_validation:
+        required |= {"chart_validation", "chart_correction"}
+    missing = required - pipelines.keys()
+    if missing:
+        raise ValueError(f"Missing pipeline(s): {', '.join(missing)}")
+    self._pipelines = pipelines
♻️ Duplicate comments (7)
wren-ai-service/docs/config_examples/config.anthropic.yaml (1)

139-142: (Duplicate) Update example settings with new chart config fields
As noted in the config.grok.yaml review, this example should include allow_chart_validation and max_chart_correction_retries under its settings: section to surface the new options to users.

docker/config.example.yaml (1)

142-145: (Duplicate) Update example settings with new chart config fields
As mentioned in the config.grok.yaml review, the top‐level example should reflect the new allow_chart_validation and max_chart_correction_retries settings in its settings: block.

wren-ai-service/docs/config_examples/config.deepseek.yaml (1)

162-165: (Duplicate) Update example settings with new chart config fields
Consistent with other configs, please add the allow_chart_validation and max_chart_correction_retries fields under the settings: section to this example.

wren-ai-service/docs/config_examples/config.groq.yaml (1)

143-146: Duplicate: Group chart-related pipelines together.
As in config.ollama.yaml, relocate chart_validation and chart_correction entries to immediately follow chart_adjustment under the pipes list for coherent grouping.

wren-ai-service/docs/config_examples/config.azure.yaml (1)

152-155: Duplicate: Group chart-related pipelines together.
Consistently reposition chart_validation and chart_correction entries to come right after chart_adjustment in the pipes list, matching the grouping in other environment configs.

wren-ai-service/tools/config/config.full.yaml (1)

153-156: Duplicate: Group chart-related pipelines together.
Align with other examples by placing chart_validation and chart_correction entries adjacent to chart_adjustment to preserve a logical, chart-focused sequence in the pipes list.

wren-ai-service/src/web/v1/services/chart.py (1)

216-218: Duplicate unsafe mutation – see comment above

The same unconditional mutation appears here; refactor into a small helper to avoid repetition and future omissions.

🧹 Nitpick comments (21)
wren-ai-service/src/pipelines/generation/utils/theme_powerbi.json (1)

1-128: Validate theme JSON structure and integration.

Ensure this Power BI theme adheres to Vega-Lite’s theme specification (e.g., wrapping under a config key if required) and is correctly loaded via load_custom_theme(). Consider adding a unit test or lint step to verify that the theme produces the expected styling when applied to a sample chart.

wren-ai-service/tools/config/config.example.yaml (1)

153-156: Document new chart pipelines in example config.

Ensure that chart_validation and chart_correction pipelines are reflected in the project README and in other config examples. Also verify any configurable parameters (e.g., retry counts, timeouts) are documented so users can customize behavior.

wren-ai-service/docs/config_examples/config.google_ai_studio.yaml (1)

148-151: Enhance documentation for chart pipelines in example config.

Add inline comments describing the purpose of chart_validation and chart_correction, and note any prerequisites (e.g., environment variables or dependencies for image conversion). This will guide users customizing the Google AI Studio config file.

wren-ai-service/docs/config_examples/config.grok.yaml (1)

144-147: Chart validation and correction pipelines added; update example settings
The new chart_validation and chart_correction pipelines follow the established pattern. To surface the corresponding controls to users, consider adding the new settings fields (allow_chart_validation and max_chart_correction_retries) under the settings: block of this example config so they can override the defaults.

wren-ai-service/src/config.py (1)

45-46: Document new chart config settings
You've added allow_chart_validation and max_chart_correction_retries to control the new pipelines. To improve discoverability, please add inline comments or update the class‐level docstring to describe these fields and their expected use.

wren-ai-service/src/pipelines/generation/__init__.py (2)

2-4: Ensure imports follow alphabetical grouping.
The newly added ChartCorrection and ChartValidation imports interrupt the existing alphabetical order of pipeline modules. Consider reordering these imports to maintain consistency:

from .chart_adjustment import ChartAdjustment
from .chart_correction import ChartCorrection
from .chart_generation import ChartGeneration
from .chart_validation import ChartValidation

41-42: Maintain __all__ ordering for pipeline exports.
The entries "ChartValidation" and "ChartCorrection" should be inserted alongside other chart-related exports in alphabetical order—ideally immediately after "ChartAdjustment" and "ChartGeneration"—to keep the public API list consistent and readable.

wren-ai-service/docs/config_examples/config.ollama.yaml (1)

140-143: Group chart-related pipelines together.
The chart_validation and chart_correction entries are currently appended at the end of the pipes list. To improve logical flow and readability, consider moving them to directly follow chart_generation and chart_adjustment within the same group.

wren-ai-service/test_vega_lite_chart.py (1)

1-16: Consider turning this script into a proper unit test.

This file successfully demonstrates the Vega-Lite to PNG conversion functionality that's likely used by the chart validation pipeline. However, it appears to be a quick test script rather than a formal unit test.

Consider converting this into a proper unit test with assertions to verify the output and integrating it into the project's test suite for automated testing.

A test implementation could look like:

import base64
import pytest
import vl_convert as vlc

def test_vega_lite_to_png_conversion():
    vl_spec = """
    # existing Vega-Lite spec
    """
    
    # Convert to PNG
    png_bytes = vlc.vegalite_to_png(vl_spec=vl_spec, vl_version="v5.15")
    
    # Assert PNG bytes are not empty
    assert len(png_bytes) > 0
    
    # Verify it can be base64 encoded properly
    b64_str = base64.b64encode(png_bytes).decode("utf-8")
    assert b64_str.startswith('/9j/') or b64_str.startswith('iVBOR')  # Common PNG header patterns
wren-ai-service/src/providers/llm/litellm.py (1)

119-121: extract_braces_content() can return an empty string – consider a safer fallback

When the LLM reply does not contain braces, extract_braces_content will return an empty
string, propagating upwards as the final reply.
If the downstream code assumes valid JSON this will surface as hard-to-trace errors.

Suggestion: fall back to the raw content when extraction fails, or raise a descriptive
exception early.

-                "replies": [
-                    extract_braces_content(message.content) for message in completions
-                ],
+                "replies": [
+                    extract_braces_content(msg.content) or msg.content
+                    for msg in completions
+                ],
wren-ai-service/src/pipelines/generation/chart_generation.py (3)

23-33: Minor wording & spacing issues in the system prompt

A couple of typos can hurt the LLM’s comprehension signal.

-You are a data analyst great at generating data visualization using vega-lite! Given the user's question, SQL, sample data and sample column values, you need to think about the best chart type and generate correspondingvega-lite schema in JSON format.
+You are a data analyst great at generating data visualizations using Vega-Lite!  
+Given the user's question, SQL, sample data and sample column values, think about the best chart type and generate the corresponding Vega-Lite schema in JSON format.

92-107: Guard against missing keys when data_provided is True

If an upstream change drops the raw_data key, the ternary expression will raise
KeyError, aborting the whole pipeline instead of failing gracefully.

-            preprocess_data["raw_data"]
+            preprocess_data.get("raw_data", [])

(The same applies to "sample_data".) Falling back to an empty list is usually safer for prompt construction.


132-134: Load the custom theme once and reuse

load_custom_theme() reads a file on every pipeline instantiation. Caching the
result at module level avoids needless disk I/O and JSON parsing.

_theme_cache: Optional[dict[str, Any]] = None


def _get_theme() -> dict[str, Any]:
    global _theme_cache
    if _theme_cache is None:
        _theme_cache = load_custom_theme()
    return _theme_cache
...
        self._configs = {
            "custom_theme": _get_theme(),
        }
wren-ai-service/src/pipelines/generation/chart_validation.py (1)

112-118: Prompt builder is invoked without variables – safe but wastes latency

PromptBuilder.run() re-renders the template on every call even though
chart_validation_user_prompt_template is static.
Since the template contains no placeholders, replace the builder with a constant
string to save a couple of milliseconds per request.

"prompt_builder": lambda **_: {"prompt": chart_validation_user_prompt_template.strip()},
wren-ai-service/src/web/v1/services/chart.py (1)

120-127: data_provided flag can stay False even when chart_request.data is falsy but non-empty

if not chart_request.data: treats “any falsy value” (empty dict/list) as “no data”, even though the caller attempted to supply data.
This may break downstream prompts that rely on data_provided.

-if not chart_request.data:
+if chart_request.data is None:

This distinguishes “omitted” from “provided but empty”.

wren-ai-service/src/providers/llm/__init__.py (1)

224-253: Type annotation too narrow for returned structure

convert_message_to_openai_format returns a dict where "content" can be either a str or a List[Dict[str,Any]], but the signature constrains it to Dict[str, str].

-def convert_message_to_openai_format(message: ChatMessage) -> Dict[str, str]:
+from typing import Any, Mapping
+
+def convert_message_to_openai_format(message: ChatMessage) -> Dict[str, Any]:

Mismatched annotations propagate to callers and may mask bugs when static analysis is enabled.

wren-ai-service/src/pipelines/generation/chart_correction.py (1)

101-126: Consider injecting PromptBuilder template via kwargs for easier runtime tuning

Hard-coding long system/user prompt strings inside the module makes quick experimentation in prod almost impossible.
Expose them as optional constructor args or load from config to enable A/B testing without code changes.

wren-ai-service/src/pipelines/generation/chart_adjustment.py (1)

118-130: Raw-vs-sample data ambiguity

post_process() feeds either raw_data or sample_data into the Vega-Lite spec based on data_provided.
For very large tables this can explode the payload size and slow down client rendering.

Consider streaming only a capped sample in all cases, or introducing a hard limit (e.g. MAX_ROWS = 5 000).

wren-ai-service/src/web/v1/services/chart_adjustment.py (2)

84-85: chart_type should be str | None not Optional[str] = ""

An empty string is not “missing”; it’s a value.
Prefer chart_type: str | None = None to clearly indicate absence.


152-162: if not chart_adjustment_request.data treats empty dict as “no data”

An empty dict is falsy, so users passing {} deliberately will trigger an unnecessary SQL round-trip.
Use an explicit is None check:

-if not chart_adjustment_request.data:
+if chart_adjustment_request.data is None:
wren-ai-service/src/pipelines/generation/utils/chart.py (1)

13-15: Robust theme loading for packaged deployments

Hard-coded path assumes execution from repo root.
Use importlib.resources so the theme is available when the package is installed elsewhere.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 4b7fdb7 and 89ba259.

⛔ Files ignored due to path filters (1)
  • wren-ai-service/poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (28)
  • deployment/kustomizations/base/cm.yaml (1 hunks)
  • docker/config.example.yaml (1 hunks)
  • wren-ai-service/docs/config_examples/config.anthropic.yaml (1 hunks)
  • wren-ai-service/docs/config_examples/config.azure.yaml (1 hunks)
  • wren-ai-service/docs/config_examples/config.deepseek.yaml (1 hunks)
  • wren-ai-service/docs/config_examples/config.google_ai_studio.yaml (1 hunks)
  • wren-ai-service/docs/config_examples/config.grok.yaml (1 hunks)
  • wren-ai-service/docs/config_examples/config.groq.yaml (1 hunks)
  • wren-ai-service/docs/config_examples/config.lm_studio.yaml (1 hunks)
  • wren-ai-service/docs/config_examples/config.ollama.yaml (1 hunks)
  • wren-ai-service/docs/config_examples/config.open_router.yaml (1 hunks)
  • wren-ai-service/pyproject.toml (1 hunks)
  • wren-ai-service/src/config.py (1 hunks)
  • wren-ai-service/src/globals.py (2 hunks)
  • wren-ai-service/src/pipelines/generation/__init__.py (2 hunks)
  • wren-ai-service/src/pipelines/generation/chart_adjustment.py (8 hunks)
  • wren-ai-service/src/pipelines/generation/chart_correction.py (1 hunks)
  • wren-ai-service/src/pipelines/generation/chart_generation.py (6 hunks)
  • wren-ai-service/src/pipelines/generation/chart_validation.py (1 hunks)
  • wren-ai-service/src/pipelines/generation/utils/chart.py (5 hunks)
  • wren-ai-service/src/pipelines/generation/utils/theme_powerbi.json (1 hunks)
  • wren-ai-service/src/providers/llm/__init__.py (2 hunks)
  • wren-ai-service/src/providers/llm/litellm.py (4 hunks)
  • wren-ai-service/src/web/v1/services/chart.py (6 hunks)
  • wren-ai-service/src/web/v1/services/chart_adjustment.py (7 hunks)
  • wren-ai-service/test_vega_lite_chart.py (1 hunks)
  • wren-ai-service/tools/config/config.example.yaml (1 hunks)
  • wren-ai-service/tools/config/config.full.yaml (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (4)
wren-ai-service/src/pipelines/generation/__init__.py (2)
wren-ai-service/src/pipelines/generation/chart_correction.py (1)
  • ChartCorrection (101-145)
wren-ai-service/src/pipelines/generation/chart_validation.py (1)
  • ChartValidation (104-136)
wren-ai-service/src/globals.py (2)
wren-ai-service/src/pipelines/generation/chart_validation.py (1)
  • ChartValidation (104-136)
wren-ai-service/src/pipelines/generation/chart_correction.py (1)
  • ChartCorrection (101-145)
wren-ai-service/src/pipelines/generation/chart_generation.py (1)
wren-ai-service/src/pipelines/generation/utils/chart.py (6)
  • ChartDataPreprocessor (19-50)
  • ChartGenerationPostProcessor (64-125)
  • load_custom_theme (13-15)
  • run (24-50)
  • run (58-60)
  • run (68-125)
wren-ai-service/src/pipelines/generation/chart_adjustment.py (1)
wren-ai-service/src/pipelines/generation/utils/chart.py (7)
  • ChartDataPreprocessor (19-50)
  • ChartGenerationPostProcessor (64-125)
  • ChartSchemaPreprocessor (54-60)
  • load_custom_theme (13-15)
  • run (24-50)
  • run (58-60)
  • run (68-125)
🔇 Additional comments (9)
wren-ai-service/pyproject.toml (1)

39-39:

✅ Verification successful

Approve addition of vl-convert-python dependency.

This new runtime dependency is required for converting Vega-Lite charts to PNG in the chart validation and correction pipelines. Ensure that poetry lock has been updated and that this version is compatible with Python 3.12.

To verify installation and importability, you can run:


🏁 Script executed:

#!/bin/bash
poetry run python - <<EOF
import vl_convert as vc
print("vl-convert-python version:", vc.__version__)
EOF

Length of output: 118


🏁 Script executed:

#!/bin/bash
# Verify that 'vl-convert-python' is in the lock file and check its Python version constraints
rg 'name = "vl-convert-python"' -A4 poetry.lock

Length of output: 101


🏁 Script executed:

#!/bin/bash
# Search for poetry.lock in the repository to verify that the lock file includes vl-convert-python
find . -type f -name "poetry.lock"

Length of output: 64


🏁 Script executed:

#!/bin/bash
# Verify that vl-convert-python is in the lock file and show its locked version and Python version constraints
rg -A5 'name = "vl-convert-python"' wren-ai-service/poetry.lock

Length of output: 239


Approve addition of vl-convert-python dependency.

  • Verified in wren-ai-service/poetry.lock:
    • name = "vl-convert-python"
    • version = "1.7.0"
    • python-versions = ">=3.7"
  • Python 3.12 falls within the supported range.
  • Import module as vl_convert in your local env to confirm if needed.
deployment/kustomizations/base/cm.yaml (1)

192-195: Verify consistency of new chart pipelines in Kustomize config.

Confirm that chart_validation and chart_correction entries here align with those in config.example.yaml and the docs. Check that the indentation matches Kustomize’s parsing expectations and that no required fields (e.g., timeouts or retries) are missing.

wren-ai-service/docs/config_examples/config.lm_studio.yaml (1)

142-145: New chart validation and correction pipelines added.

The addition of these two new pipelines aligns with the pattern of other pipelines in the configuration file, correctly using the litellm_llm.default model. These components will provide chart validation and correction capabilities to the system.

wren-ai-service/docs/config_examples/config.open_router.yaml (1)

140-143: New chart validation and correction pipelines added.

The addition of these two new pipelines follows the same pattern as in other configuration files, properly using the litellm_llm.default model. These components will provide chart validation and correction capabilities to the system.

wren-ai-service/src/globals.py (4)

163-168: Chart validation and correction pipelines properly integrated.

The new chart validation and correction pipelines are correctly integrated into the chart service, ensuring that the components can be used for validating chart images and correcting chart schemas.


170-171: Configuration parameters properly added to chart service.

The new configuration parameters allow_chart_validation and max_chart_correction_retries are correctly passed to the chart service, providing control over the validation functionality and retry behavior.


183-188: Chart validation and correction pipelines added to chart adjustment service.

The same chart validation and correction pipelines are properly integrated into the chart adjustment service, providing consistent capabilities across both services.


190-191: Configuration parameters added to chart adjustment service.

The same configuration parameters for chart validation and correction are correctly passed to the chart adjustment service, maintaining consistency between services.

wren-ai-service/src/pipelines/generation/chart_adjustment.py (1)

108-112:

❌ Incorrect review comment

Double-check that the underlying generator really supports image_url

await generator(prompt=..., image_url=image_url) forwards image_url blindly.
If the concrete generator returned by llm_provider.get_generator() does not declare the
parameter (or only when non-empty) this will raise a run-time TypeError.

Please verify with:


🏁 Script executed:

#!/bin/bash
# Grep for the generator signature
rg -A3 -B2 "def .*get_generator" src/providers | head

Length of output: 109


Let’s locate the actual get_generator definition and its signature across the repo:


🏁 Script executed:

#!/bin/bash
# Find where get_generator is defined
rg "def get_generator" -n .

# Find all usages to inspect how it’s called
rg "get_generator(" -n .

Length of output: 295


Let’s inspect the actual signatures of get_generator in both files:


🏁 Script executed:

#!/bin/bash
# Show get_generator in litellm.py
FILE1="wren-ai-service/src/providers/llm/litellm.py"
LOC1=$(rg -n "def get_generator" "$FILE1" | cut -d: -f2)
START1=$((LOC1-5)); END1=$((LOC1+10))
sed -n "${START1},${END1}p" "$FILE1"

# Show get_generator in core/provider.py
FILE2="wren-ai-service/src/core/provider.py"
LOC2=$(rg -n "def get_generator" "$FILE2" | cut -d: -f2)
START2=$((LOC2-5)); END2=$((LOC2+10))
sed -n "${START2},${END2}p" "$FILE2"

Length of output: 912


Let’s pull a few lines of context around each get_generator definition to inspect their signatures:


🏁 Script executed:

#!/bin/bash
rg -n -C5 "def get_generator" wren-ai-service/src/providers/llm/litellm.py
rg -n -C5 "def get_generator" wren-ai-service/src/core/provider.py

Length of output: 849


Let’s inspect the full body of get_generator in litellm.py to see if the returned generator supports image_url:


🏁 Script executed:

#!/bin/bash
# Show the body of get_generator for the LLM provider
rg -n -A30 "def get_generator" wren-ai-service/src/providers/llm/litellm.py

Length of output: 1324


No TypeError risk: generator supports image_url
The litellm provider’s get_generator returns an async _run function whose signature is:

async def _run(
    prompt: str,
    image_url: Optional[str] = None,
    history_messages: Optional[List[ChatMessage]] = None,
    generation_kwargs: Optional[Dict[str, Any]] = None,
    query_id: Optional[str] = None,
):
    …

Since image_url is explicitly declared there, forwarding it via

await generator(prompt=..., image_url=image_url)

will not raise a TypeError. No changes needed.

Likely an incorrect or invalid review comment.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
wren-ai-service/src/providers/llm/__init__.py (1)

80-88: Fix the meta parameter in from_function

The from_function method passes meta=None which violates the type contract. meta is expected to be a Dict[str, Any] but receives None, which will cause an AttributeError when code tries to mutate meta.

-        return cls(content, ChatRole.FUNCTION, name=name, image_url=None, meta=None)
+        return cls(content, ChatRole.FUNCTION, name=name, image_url=None, meta={})
🧹 Nitpick comments (1)
wren-ai-service/src/providers/llm/__init__.py (1)

60-67: Update docstring to include the image_url parameter

The from_user method now accepts an image_url parameter, but the docstring doesn't mention it.

    @classmethod
    def from_user(cls, content: str, image_url: Optional[str] = None) -> "ChatMessage":
        """
        Create a message from the user.

        :param content: The text content of the message.
+       :param image_url: Optional URL to an image to include with the message.
        :returns: A new ChatMessage instance.
        """
        return cls(content, ChatRole.USER, name=None, image_url=image_url)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 49bebb8 and 3dd06d1.

📒 Files selected for processing (1)
  • wren-ai-service/src/providers/llm/__init__.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🔇 Additional comments (3)
wren-ai-service/src/providers/llm/__init__.py (3)

224-252: Solid implementation of OpenAI's multimodal format

The convert_message_to_openai_format function correctly handles the different scenarios for messages:

  1. Text content only
  2. Image URL only
  3. Both text and image

This implementation aligns well with OpenAI's Chat API format for multimodal content.


29-33: Good fix: Default values added to optional fields

The optional fields name and image_url now have default values of None, which makes them properly optional and prevents instantiation errors.


9-16: Well-designed ChatRole enum

The introduction of a proper ChatRole enum instead of string literals improves type safety and code readability. The string-based enum approach maintains compatibility with JSON serialization.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
wren-ai-service/src/pipelines/generation/utils/chart.py (1)

70-70: Type hint for replies parameter is incorrect

The type hint for replies indicates it's a string, but code accesses it as a list with replies[0].

-    replies: str,
+    replies: list[str],
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 3dd06d1 and 569d0f5.

📒 Files selected for processing (1)
  • wren-ai-service/src/pipelines/generation/utils/chart.py (5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🔇 Additional comments (6)
wren-ai-service/src/pipelines/generation/utils/chart.py (6)

46-50: Consider memory usage when returning full raw data

Adding raw_data to the return value is useful, but could cause memory issues with large datasets.

For large datasets, consider:

  1. Adding pagination
  2. Implementing a size limit with a warning
  3. Making it optional based on a parameter
return {
-    "raw_data": df.to_dict(orient="records"),
+    "raw_data": df.to_dict(orient="records") if len(df) < 1000 else df.head(1000).to_dict(orient="records"),
    "sample_data": sample_data,
    "sample_column_values": sample_column_values,
}

53-61: LGTM - Fixed ChartSchemaPreprocessor implementation

Great implementation that safely handles the config key removal with pop() and properly follows Haystack's output contract by returning the dict with the output name as key.


72-72: Good change to support optional theming

Making the custom theme optional with proper type annotation supports the added flexibility for chart generation.


106-107: Reconsider ValidationError handling

The code catches ValidationError, but the jsonschema validation step has been removed, making this catch potentially obsolete.

Verify if this exception can still occur in the current implementation. If not, consider removing or updating this exception handler.


128-132: Good simplification of the ChartGenerationResults model

Using a generic dictionary for chart schema and marking chart_type as deprecated improves flexibility while maintaining backward compatibility.


134-142: Excellent addition of model kwargs for consistent output

The CHART_GENERATION_MODEL_KWARGS dictionary ensures consistent output format using the model schema, which is a good practice.

@cyyeh cyyeh added module/ai-service ai-service related ci/ai-service ai-service related labels May 20, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
wren-ai-service/src/web/v1/services/chart_adjustment.py (1)

203-204: Direct nested dictionary access potential issue.

As with the chart.py file, the code directly accesses chart_result["chart_schema"]["data"]["values"] without checking if these keys exist, which could raise KeyError if any key is missing.

I note there was a previous review comment about this, and you indicated it's not an issue, so I'm respecting that assessment.

Also applies to: 244-245

🧹 Nitpick comments (1)
wren-ai-service/src/web/v1/services/chart_adjustment.py (1)

194-243: Consider extracting the validation logic to reduce duplication.

The validation and correction retry logic is nearly identical between this file and chart.py. Consider extracting this to a shared helper method to reduce duplication and simplify maintenance.

You could create a utility function like:

async def validate_and_correct_chart(
    self, 
    chart_schema: dict,
    query: str,
    sql: str,
    max_retries: int
) -> tuple[dict, bool]:  # Returns (chart_result, is_valid)
    # Implementation of the retry loop
    # ...
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 569d0f5 and 089cb6d.

📒 Files selected for processing (13)
  • wren-ai-service/src/pipelines/generation/chart_generation.py (7 hunks)
  • wren-ai-service/src/pipelines/generation/utils/chart.py (4 hunks)
  • wren-ai-service/src/web/v1/routers/question_recommendation.py (2 hunks)
  • wren-ai-service/src/web/v1/routers/relationship_recommendation.py (3 hunks)
  • wren-ai-service/src/web/v1/routers/semantics_description.py (2 hunks)
  • wren-ai-service/src/web/v1/services/ask.py (3 hunks)
  • wren-ai-service/src/web/v1/services/chart.py (8 hunks)
  • wren-ai-service/src/web/v1/services/chart_adjustment.py (7 hunks)
  • wren-ai-service/src/web/v1/services/question_recommendation.py (3 hunks)
  • wren-ai-service/src/web/v1/services/relationship_recommendation.py (2 hunks)
  • wren-ai-service/src/web/v1/services/semantics_description.py (2 hunks)
  • wren-ai-service/src/web/v1/services/sql_answer.py (2 hunks)
  • wren-ai-service/src/web/v1/services/sql_question.py (3 hunks)
✅ Files skipped from review due to trivial changes (1)
  • wren-ai-service/src/web/v1/routers/relationship_recommendation.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • wren-ai-service/src/pipelines/generation/chart_generation.py
🧰 Additional context used
🧬 Code Graph Analysis (8)
wren-ai-service/src/web/v1/services/relationship_recommendation.py (1)
wren-ai-service/src/web/v1/services/__init__.py (1)
  • Configuration (25-45)
wren-ai-service/src/web/v1/services/semantics_description.py (1)
wren-ai-service/src/web/v1/services/__init__.py (1)
  • Configuration (25-45)
wren-ai-service/src/web/v1/routers/semantics_description.py (1)
wren-ai-service/src/web/v1/services/__init__.py (1)
  • Configuration (25-45)
wren-ai-service/src/web/v1/services/sql_answer.py (1)
wren-ai-service/src/web/v1/services/__init__.py (1)
  • Configuration (25-45)
wren-ai-service/src/web/v1/routers/question_recommendation.py (1)
wren-ai-service/src/web/v1/services/__init__.py (1)
  • Configuration (25-45)
wren-ai-service/src/web/v1/services/question_recommendation.py (1)
wren-ai-service/src/web/v1/services/__init__.py (1)
  • Configuration (25-45)
wren-ai-service/src/web/v1/services/ask.py (1)
wren-ai-service/src/web/v1/services/__init__.py (1)
  • Configuration (25-45)
wren-ai-service/src/web/v1/services/chart.py (4)
wren-ai-service/src/pipelines/generation/chart_generation.py (1)
  • run (141-161)
wren-ai-service/src/pipelines/generation/chart_validation.py (1)
  • run (125-136)
wren-ai-service/src/pipelines/generation/chart_correction.py (1)
  • run (128-145)
wren-ai-service/src/pipelines/retrieval/sql_executor.py (2)
  • run (25-42)
  • run (85-98)
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: build-image (linux_arm64_runner, linux/arm64)
  • GitHub Check: build-image (ubuntu-latest, linux/amd64)
  • GitHub Check: pytest
  • GitHub Check: pytest
🔇 Additional comments (43)
wren-ai-service/src/web/v1/services/semantics_description.py (2)

8-8: Good import addition for the Field class.

The Field import from pydantic is necessary for the default_factory pattern used below.


58-58: Excellent improvement to configuration initialization.

Changing from a direct default instance to Field(default_factory=Configuration) ensures each GenerateRequest instance gets its own fresh Configuration object rather than sharing a single mutable default instance. This prevents potential bugs from shared state between different requests.

wren-ai-service/src/web/v1/services/sql_answer.py (2)

7-7: Good import addition for the Field class.

The Field import from pydantic is necessary for the default_factory pattern used below.


24-24: Excellent improvement to configuration initialization.

Changing from a direct default instance to Field(default_factory=Configuration) ensures each SqlAnswerRequest instance gets its own fresh Configuration object rather than sharing a single mutable default instance. This prevents potential bugs from shared state between different requests.

wren-ai-service/src/web/v1/services/relationship_recommendation.py (2)

7-7: Good import addition for the Field class.

The Field import from pydantic is necessary for the default_factory pattern used below.


21-21: Excellent improvement to configuration initialization.

Changing from a direct default instance to Field(default_factory=Configuration) ensures each Input instance gets its own fresh Configuration object rather than sharing a single mutable default instance. This prevents potential bugs from shared state between different requests.

wren-ai-service/src/web/v1/routers/semantics_description.py (2)

6-6: Good import addition for the Field class.

The Field import from pydantic is necessary for the default_factory pattern used below.


93-93: Excellent improvement to configuration initialization.

Changing from a direct default instance to Field(default_factory=Configuration) ensures each PostRequest instance gets its own fresh Configuration object rather than sharing a single mutable default instance. This prevents potential bugs from shared state between different requests.

This change is consistent with the pattern applied across other service and router modules, showing a systematic approach to address the mutable default value issue.

wren-ai-service/src/web/v1/services/sql_question.py (3)

6-6: Added import for Field to support default_factory.

The import of Field from pydantic is necessary to support the change in the default value handling for the configurations attribute.


20-20: Improved default value handling to prevent shared mutable instances.

Changing from Optional[Configuration] with a direct instance to Configuration = Field(default_factory=Configuration) ensures a new Configuration instance is created for each model instantiation rather than sharing a single mutable instance across all requests. This is a best practice in Pydantic models.


50-50: Added consistent blank line for better code organization.

The blank line improves readability by clearly separating class definitions within the parent class.

wren-ai-service/src/web/v1/routers/question_recommendation.py (2)

6-6: Added import for Field to support default_factory.

The import of Field from pydantic is necessary to support the change in the default value handling for the configuration attribute.


78-81: Improved Pydantic model field definitions.

Changes improve the PostRequest model by:

  1. Removing unnecessary Optional type hints for fields that always have default values
  2. Using Field(default_factory=Configuration) to ensure a new Configuration instance is created for each model instantiation rather than sharing a single instance

These changes make field requirements clearer and prevent potential bugs from shared mutable defaults.

wren-ai-service/src/web/v1/services/question_recommendation.py (3)

8-8: Added import for Field to support default_factory.

The import of Field from pydantic is necessary to support the change in the default value handling for the configuration parameters.


65-65: Improved default value handling in _validate_question method.

Changed from Optional[Configuration] = Configuration() to Configuration = Field(default_factory=Configuration) to ensure a new Configuration instance is created for each method call rather than sharing a single mutable instance.


157-160: Improved Pydantic model field definitions in Request class.

Changes improve the Request model by:

  1. Removing unnecessary Optional type hints for fields that always have default values
  2. Using Field(default_factory=Configuration) to ensure a new Configuration instance is created for each model instantiation

These changes align with the changes in the router counterpart and maintain consistency across the codebase.

wren-ai-service/src/web/v1/services/ask.py (4)

31-34: Improved default value handling in AskRequest model.

Changes address potential bugs from mutable default values:

  1. Used Field(default_factory=list) for histories to ensure a new list is created per instance
  2. Used Field(default_factory=Configuration) for configurations to ensure a new object per instance
  3. Removed unnecessary Optional type hints for fields with default values

These changes make the model more robust against shared mutable state issues.


103-103: Removed unnecessary Optional type hint for is_followup field.

Since is_followup always has a default value of False, the Optional type hint is redundant and has been correctly removed.


110-110: Removed unnecessary Optional type hint for is_followup field in AskResultResponse.

Since is_followup has a default value and the exclude=True parameter, the Optional type hint is redundant and has been correctly removed.


124-124: Improved default value handling in AskFeedbackRequest model.

Changed from Optional[Configuration] = Configuration() to Configuration = Field(default_factory=Configuration) to ensure a new Configuration instance is created for each model instantiation rather than sharing a single mutable instance.

wren-ai-service/src/web/v1/services/chart.py (9)

6-6: Proper Field import for Pydantic.

Good addition of the Field import which will be used for safer handling of default values.


23-24: Nice improvement with Field default_factory.

Using Field(default_factory=Configuration) instead of direct instantiation prevents the mutable default issue where the same Configuration instance could be shared across requests.


69-70: Model refinements enhance flexibility.

Making chart_type and chart_schema optional with sensible defaults ("" and None) improves handling of edge cases when these values might be missing.


83-95: Good implementation of feature flags for chart validation.

These new parameters allow toggling the validation system and configuring retry behavior at the service level, providing flexibility for different deployment environments.


121-126: Well-organized variable initialization.

The code clearly initializes all necessary variables at the beginning of the method, improving readability and maintainability.


142-142: Proper usage of the data_provided flag.

The code correctly sets and passes the data_provided flag to indicate when data is directly provided versus when it's fetched via SQL execution.

Also applies to: 154-155


157-159: Robust dictionary access with default values.

Using get() with default values instead of direct dictionary access is safer and prevents KeyError exceptions if keys are missing.


170-243: Well-implemented validation and correction retry mechanism.

The code effectively implements a validation and correction retry loop that:

  1. Validates the chart schema
  2. Proceeds if valid
  3. Attempts correction and retries if invalid
  4. Fails after exhausting retry attempts

This approach ensures more reliable chart generation.


179-181:

✅ Verification successful

Direct nested dictionary access potential issue.

The code directly accesses chart_result["chart_schema"]["data"]["values"] without checking if these keys exist, which could raise KeyError if any key is missing.

I note there was a previous review comment about this, and you indicated it's not an issue, so I'm respecting that assessment.

Also applies to: 216-217


🏁 Script executed:

#!/bin/bash
# Check if the chart schema structure in the test data consistently has the expected nested keys
echo "Searching for chart schema test fixtures with data.values structure..."
rg -A 5 -B 5 '"chart_schema".*"data".*"values"' --json

Length of output: 11205


🏁 Script executed:

#!/bin/bash
echo "=== ChartService Method Context (lines 140–260) ==="
sed -n '140,260p' wren-ai-service/src/web/v1/services/chart.py

echo
echo "=== ChartResult Model Definitions ==="
rg -n "class ChartResult" -n wren-ai-service/src
rg -n "class ChartAdjustmentResult" -n wren-ai-service/src

Length of output: 6036


🏁 Script executed:

#!/bin/bash
echo "=== chart.py Imports and Type Aliases (lines 1–80) ==="
sed -n '1,80p' wren-ai-service/src/web/v1/services/chart.py

echo
echo "=== ChartResult Model Definition (next 20 lines) ==="
rg -n "class ChartResult" wren-ai-service/src/web/v1/services/chart.py -A 20

echo
echo "=== ChartSchema References Across Repo ==="
rg -n "ChartSchema" -n wren-ai-service/src

Length of output: 4133


🏁 Script executed:

#!/bin/bash
echo "=== ChartSchemaPreprocessor implementation ==="
sed -n '1,200p' wren-ai-service/src/pipelines/generation/utils/chart.py

Length of output: 4312


Nested keys are guaranteed by the pipeline – no guards needed

The ChartGenerationPostProcessor always injects

chart_schema["data"] = {"values": sample_data}

before the service logic runs, and the ChartSchemaPreprocessor only strips out the "config" key. By the time you reach the if remove_data_from_chart_schema branches, every non‐failed path will have a chart_schema["data"]["values"] list to clear.

wren-ai-service/src/pipelines/generation/utils/chart.py (7)

13-15: New theme loading function.

Added a function to load a custom theme from a JSON file. The hardcoded path could potentially cause issues in different environments, but I note from previous reviews that you've decided to skip making this configurable.


47-47: Enhanced raw data access.

Adding the full raw data as part of the return object allows downstream components to access the complete dataset when needed.


53-61: Useful chart schema preprocessing component.

This component safely removes the "config" key from chart schemas using pop() with a default value to prevent KeyError. This allows for a clean application of custom themes later in the pipeline.


70-73: Improved type hints and custom theme support.

The type hint for replies has been correctly updated to list[str] to match its usage, and a new custom_theme parameter has been added to support theming.


88-90: Simple and effective theme application.

The code conditionally applies a custom theme when provided, which allows for consistent visual styling across chart visualizations.


128-132: Streamlined chart generation results model.

The ChartGenerationResults model has been simplified, noting that chart_type is deprecated while maintaining backward compatibility.


134-142: Well-structured model kwargs for consistent API usage.

The CHART_GENERATION_MODEL_KWARGS constant provides a standardized configuration for the JSON schema response format, ensuring consistency across different LLM calls.

wren-ai-service/src/web/v1/services/chart_adjustment.py (7)

31-34: Enhanced input flexibility for chart adjustment.

The code now accepts:

  • A required adjustment_command string (replacing the complex option object)
  • Optional data for direct input
  • Optional image URL for visual context

These changes improve the API's flexibility and support new use cases.


37-39: Proper field declarations with safe defaults.

Good use of Field(default_factory=Configuration) to prevent mutable default issues, and clear annotation of the deprecated adjustment_option field.


84-84: Consistent model field updates.

Making chart_type optional with a default value aligns with the changes in the other models and improves handling of missing values.


101-112: Parallel implementation of validation parameters.

The service constructor mirrors the implementation in ChartService, maintaining consistency across related services.


152-161: Flexible data source handling.

The code now properly handles both provided data and SQL-executed data, setting the data_provided flag appropriately.


171-177: Updated pipeline call with new parameters.

The chart adjustment pipeline call has been updated to include the new adjustment_command, data_provided flag, and image_url, supporting the enhanced functionality.


194-243: Comprehensive validation and correction logic.

Similar to the implementation in ChartService, this code effectively implements a validation and correction retry mechanism that improves chart generation reliability.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (4)
wren-ai-service/src/pipelines/generation/utils/chart.py (3)

13-15: Hardcoded file path could be problematic

The hardcoded file path might cause issues in different environments or if the file structure changes. Consider making this path configurable.

-def load_custom_theme() -> Dict[str, Any]:
-    with open("src/pipelines/generation/utils/theme_powerbi.json", "r") as f:
+def load_custom_theme(theme_path: str = "src/pipelines/generation/utils/theme_powerbi.json") -> Dict[str, Any]:
+    with open(theme_path, "r", encoding="utf-8") as f:
        return orjson.loads(f.read())

This would allow for different themes to be loaded and ensure proper encoding handling.


70-72: Fixed type hint for replies parameter

The change from replies: str to replies: list[str] correctly aligns the type hint with the actual usage of the parameter in the code (accessing replies[0]).


88-90: Added custom theme support

The conditional block to apply a custom theme if provided enhances the flexibility of chart styling. However, there's no validation of the theme structure, which could lead to runtime errors if a malformed theme is provided.

Consider adding basic validation:

if custom_theme:
+    # Validate minimum theme structure
+    if not isinstance(custom_theme, dict):
+        logger.warning("Custom theme is not a dictionary. Using default configuration.")
+    else:
        chart_schema["config"] = custom_theme
wren-ai-service/src/web/v1/services/chart_adjustment.py (1)

201-202: Potential KeyError when stripping data values

The line chart_result["chart_schema"]["data"]["values"] = [] assumes that both "data" and "values" exist in the chart schema. If either key is missing, this will raise a KeyError.

Consider using a safer approach:

-chart_result["chart_schema"]["data"]["values"] = []
+chart_result.setdefault("chart_schema", {}).setdefault("data", {})["values"] = []

Let's verify if this is actually a concern by checking how chart schemas are structured:

#!/bin/bash
# Look for chart schema creation to understand the structure
rg -A 5 "chart_schema\[.*\]" --glob "*.py"
🧹 Nitpick comments (1)
wren-ai-service/src/pipelines/indexing/sql_pairs.py (1)

196-197: Type annotation refinement - removed unnecessary Optional

The changes to both the project_id and external_pairs parameters remove unnecessary Optional type hints while maintaining the default values. This is a good practice for type consistency.

However, I should note that using an empty dictionary {} as a default argument creates a mutable default, which can lead to unexpected behavior if modified. Consider using Field(default_factory=dict) instead:

-    async def run(
-        self,
-        mdl_str: str,
-        project_id: str = "",
-        external_pairs: Dict[str, Any] = {},
-    ) -> Dict[str, Any]:
+    async def run(
+        self,
+        mdl_str: str,
+        project_id: str = "",
+        external_pairs: Dict[str, Any] = Field(default_factory=dict),
+    ) -> Dict[str, Any]:
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 089cb6d and d734f54.

📒 Files selected for processing (21)
  • wren-ai-service/src/pipelines/generation/intent_classification.py (1 hunks)
  • wren-ai-service/src/pipelines/generation/relationship_recommendation.py (2 hunks)
  • wren-ai-service/src/pipelines/generation/sql_correction.py (2 hunks)
  • wren-ai-service/src/pipelines/generation/sql_generation.py (2 hunks)
  • wren-ai-service/src/pipelines/generation/sql_regeneration.py (2 hunks)
  • wren-ai-service/src/pipelines/generation/utils/chart.py (4 hunks)
  • wren-ai-service/src/pipelines/generation/utils/sql.py (1 hunks)
  • wren-ai-service/src/pipelines/indexing/db_schema.py (1 hunks)
  • wren-ai-service/src/pipelines/indexing/instructions.py (4 hunks)
  • wren-ai-service/src/pipelines/indexing/sql_pairs.py (5 hunks)
  • wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py (1 hunks)
  • wren-ai-service/src/pipelines/retrieval/historical_question_retrieval.py (1 hunks)
  • wren-ai-service/src/pipelines/retrieval/instructions.py (1 hunks)
  • wren-ai-service/src/pipelines/retrieval/sql_executor.py (1 hunks)
  • wren-ai-service/src/pipelines/retrieval/sql_functions.py (1 hunks)
  • wren-ai-service/src/pipelines/retrieval/sql_pairs_retrieval.py (1 hunks)
  • wren-ai-service/src/providers/embedder/litellm.py (1 hunks)
  • wren-ai-service/src/web/v1/services/__init__.py (1 hunks)
  • wren-ai-service/src/web/v1/services/chart.py (8 hunks)
  • wren-ai-service/src/web/v1/services/chart_adjustment.py (6 hunks)
  • wren-ai-service/src/web/v1/services/question_recommendation.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • wren-ai-service/src/web/v1/services/question_recommendation.py
  • wren-ai-service/src/web/v1/services/chart.py
🧰 Additional context used
🧬 Code Graph Analysis (2)
wren-ai-service/src/pipelines/indexing/instructions.py (5)
wren-ai-service/src/pipelines/generation/intent_classification.py (1)
  • run (364-386)
wren-ai-service/src/pipelines/indexing/db_schema.py (2)
  • run (27-59)
  • run (365-379)
wren-ai-service/src/pipelines/indexing/sql_pairs.py (3)
  • run (31-49)
  • run (58-73)
  • run (193-213)
wren-ai-service/src/pipelines/retrieval/instructions.py (2)
  • run (23-34)
  • run (177-187)
wren-ai-service/src/web/v1/services/instructions.py (1)
  • Instruction (17-22)
wren-ai-service/src/pipelines/indexing/sql_pairs.py (2)
wren-ai-service/src/pipelines/indexing/instructions.py (3)
  • run (31-52)
  • run (61-76)
  • run (149-164)
wren-ai-service/src/pipelines/indexing/__init__.py (3)
  • run (25-47)
  • run (57-72)
  • run (78-87)
🔇 Additional comments (36)
wren-ai-service/src/pipelines/retrieval/historical_question_retrieval.py (1)

125-125: Type annotation improvement

Changed the type annotation of historical_question_retrieval_similarity_threshold from Optional[float] to float. This improves type safety by ensuring the parameter must be a float and cannot be None, which aligns with the provided default value.

wren-ai-service/src/pipelines/retrieval/sql_executor.py (1)

69-69: Type annotation improvement

Changed the type annotation of engine_timeout from Optional[float] to float. This matches the default value and ensures type consistency, preventing the parameter from being None.

wren-ai-service/src/pipelines/generation/relationship_recommendation.py (2)

4-4: Import change

Replaced imports to remove unnecessary Optional type import.


188-188: Type annotation improvement

Changed the type annotation of engine_timeout from Optional[float] to float. This ensures type safety by enforcing that the parameter must be a float, consistent with the default value.

wren-ai-service/src/pipelines/generation/intent_classification.py (1)

332-333: Type annotation improvement

Changed the type annotations of table_retrieval_size and table_column_retrieval_size from Optional[int] to int. This improves type clarity by indicating these parameters must be integers and cannot be None, matching their default values.

wren-ai-service/src/pipelines/generation/sql_correction.py (1)

104-104: LGTM: Type hint refinement improves clarity

The change from Optional[float] to float for the engine_timeout parameter improves type safety while maintaining the same default value. This better reflects how the parameter is actually used in the code (no null checks are performed when using it).

wren-ai-service/src/web/v1/services/__init__.py (1)

44-45: LGTM: Type safety improvement prevents potential bugs

Changing language and timezone from optional to non-optional with defaults improves type safety and prevents potential None dereference errors in methods like show_current_time() which access timezone.name without null checks.

wren-ai-service/src/pipelines/retrieval/instructions.py (1)

153-154: LGTM: Type hint improvements match parameter usage

Changing similarity_threshold and top_k from Optional[float/int] to non-optional types better reflects their actual usage in the filtered_documents function (lines 92-94) where they're used directly without null checks.

wren-ai-service/src/pipelines/indexing/db_schema.py (1)

339-339: LGTM: Consistent type hinting improvement

Changing column_batch_size from Optional[int] to int is consistent with other changes in this PR and better reflects the parameter usage in the chunk function (line 306) where it's passed directly without null checks.

wren-ai-service/src/pipelines/generation/sql_generation.py (2)

3-3: Simplified imports by removing unused Optional

The Optional import is no longer needed after making the engine_timeout parameter non-optional with a default value.


126-126: Improved type annotation for engine_timeout parameter

Changing from Optional[float] to just float with a default value of 30.0 is a good practice. Since there's a default value, this parameter is effectively never optional, making the type annotation more accurate.

wren-ai-service/src/pipelines/generation/sql_regeneration.py (2)

3-3: Simplified imports by removing unused Optional

The Optional import is no longer needed after updating the engine_timeout parameter type.


138-138: Improved type annotation for engine_timeout parameter

Changing from Optional[float] to just float with a default value of 30.0 makes the type annotation more accurate since parameters with default values are never None at runtime.

wren-ai-service/src/pipelines/generation/utils/sql.py (1)

34-34: Enhanced type annotation for timeout parameter

Simplified the type annotation from Optional[float] to float while keeping the default value of 30.0. This more accurately represents that the parameter will always have a value at runtime.

wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py (1)

436-437: Improved type annotations for retrieval size parameters

Removed Optional from the type annotations while keeping the default values, making the annotations more accurate. This is part of a consistent pattern across the codebase to improve type safety.

wren-ai-service/src/providers/embedder/litellm.py (1)

165-165: Appropriate type annotation change

Changing from Optional[float] to float with a default value is a good practice since the parameter will never be None when it has a default value. This improves type consistency and static analysis.

wren-ai-service/src/pipelines/retrieval/sql_functions.py (1)

104-105: Type annotation improvement

Good change removing Optional from parameters that have default values. This makes the type system more accurate since these parameters can never be None when defaults are provided.

wren-ai-service/src/pipelines/retrieval/sql_pairs_retrieval.py (1)

121-122: Type consistency improvement

Removing Optional from parameters with default values strengthens the type system. Since the filtered_documents function expects non-optional parameters, this change ensures type consistency throughout the pipeline.

wren-ai-service/src/pipelines/indexing/instructions.py (4)

31-31: Type annotation updated correctly

Changing from Optional[str] to str with a default empty string maintains the existing behavior while improving type clarity. The conditional at line 34 will still work as intended since an empty string is falsy in Python.


86-86: Type annotation improved

Good change from Optional[str] to str with a default empty string, consistent with the pattern in other functions.


104-104: Type annotation consistency

This change maintains the function's behavior while improving type annotations. The function passes project_id to cleaner.run() which still accepts an optional parameter, so this change is safe.


152-152: Type annotation improvement

Changing from Optional[str] to str with a default empty string ensures consistency with other parameter type annotations in the codebase while preserving the behavior.

wren-ai-service/src/pipelines/indexing/sql_pairs.py (3)

31-31: Type annotation refinement - removed unnecessary Optional

The change from Optional[str] = "" to just str = "" is appropriate here. Since the parameter has a default value, it doesn't need to be marked as Optional.


107-111: Type annotation refinement - removed unnecessary Optional

The change from Optional[str] = "" to just str = "" for project_id parameter is appropriate. This improves type consistency by removing unnecessary Optional markers on parameters that have default values.


125-129: Type annotation refinement - removed unnecessary Optional

The change from Optional[str] = "" to just str = "" for project_id parameter is appropriate. This follows the pattern of removing unnecessary optionality from parameters with default values.

wren-ai-service/src/pipelines/generation/utils/chart.py (4)

46-50: Enhanced data output includes raw data

The addition of "raw_data" to the return dictionary is a good enhancement, providing access to the full dataset along with the sample data. This is useful for downstream components that might need more than just the sample.


53-61: Well-implemented chart schema preprocessor

The new ChartSchemaPreprocessor component is well implemented. It properly:

  1. Uses the @component decorator and defines output types
  2. Safely removes the "config" key using pop with a default value of None to avoid KeyError
  3. Returns the processed schema wrapped in a dictionary with the output name as key

This pattern follows Haystack's component contract correctly.


130-131: Deprecated chart_type field and simplified type

The change from Dict to dict makes the type hint more consistent with Python's built-in types. The deprecation comment for chart_type is helpful to signal that this field should not be relied upon in new code.


134-142: Added model kwargs for standardized response format

The new CHART_GENERATION_MODEL_KWARGS constant provides a standardized JSON schema for the response format, which is very useful for ensuring consistent API responses. Using the model's JSON schema from ChartGenerationResults directly ensures the schema stays in sync with the model.

wren-ai-service/src/web/v1/services/chart_adjustment.py (7)

31-34: Enhanced request model with more flexible control options

The changes to the request model represent a significant improvement:

  1. Replacing adjustment_option with a simpler adjustment_command string allows for more flexible instructions
  2. Adding data parameter enables direct data input without requiring SQL execution
  3. Adding image_url parameter enables visual input for chart adjustment

These additions will make the API more versatile and user-friendly.


37-39: Improved field definitions to prevent mutable default issues

The changes here represent good practices:

  1. Using Field(default_factory=Configuration) prevents the mutable default issue where all instances would share the same Configuration object
  2. Removing Optional from remove_data_from_chart_schema since it has a default value
  3. Adding a deprecation comment for adjustment_option is helpful for API transition

These changes will prevent potential bugs and improve code clarity.


85-85: Deprecated strict chart type literals

Changing from Literal[...] to str = "" with a deprecation comment makes the API more flexible while clearly indicating the transition away from fixed chart types.


89-89: Removed "understanding" status

The removal of the "understanding" status from the status literals simplifies the state machine for chart adjustment requests. The remaining states provide a clearer progression through the pipeline.


99-100: Added chart validation and correction configuration

The new parameters provide valuable configuration options:

  1. allow_chart_validation enables/disables the validation feature
  2. max_chart_correction_retries controls how many correction attempts are made

This gives operators control over the validation process and its performance impact.


150-159: Added flexible data input options

The new conditional logic for data handling is well-implemented:

  1. If data is not provided, execute SQL to fetch it
  2. If data is provided directly, use it and set a flag
  3. Track the source of data with data_provided for later use

This enhances the API's flexibility while maintaining backward compatibility.


180-243: Well-structured chart validation and correction retry mechanism

The new validation and correction mechanism is well-implemented:

  1. Only activates when allow_chart_validation is true
  2. Uses a bounded retry loop with clear exit conditions
  3. Properly handles failure cases after exhausting retries
  4. Maintains the original behavior (cleaning data) upon success
  5. Uses safe dictionary access via .get() with defaults

This implementation enhances chart quality while preventing infinite loops and handling errors gracefully.

@cyyeh cyyeh requested a review from imAsterSun May 21, 2025 01:51
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
wren-ai-service/src/providers/llm/__init__.py (1)

80-88: from_function passes meta=None – can violate type contract

The meta parameter is expected to be a Dict[str, Any], yet from_function passes None, which could lead to an AttributeError later when callers try to mutate meta.

-return cls(content, ChatRole.FUNCTION, name=name, image_url=None, meta=None)
+return cls(content, ChatRole.FUNCTION, name=name, image_url=None, meta={})
🧹 Nitpick comments (3)
wren-ai-service/src/providers/llm/__init__.py (3)

224-252: Simplify content type handling logic in convert_message_to_openai_format

The function has some duplication in the content handling logic between mixed content, text-only, and image-only cases. Consider refactoring for better maintainability.

 def convert_message_to_openai_format(message: ChatMessage) -> Dict[str, str]:
     """
     Convert a message to the format expected by OpenAI's Chat API.
 
     See the [API reference](https://platform.openai.com/docs/api-reference/chat/create) for details.
 
     :returns: A dictionary with the following key:
         - `role`
         - `content`
         - `name` (optional)
     """
     openai_msg = {"role": message.role.value}
 
-    if message.content and hasattr(message, "image_url") and message.image_url:
-        openai_msg["content"] = [
-            {"type": "text", "text": message.content},
-            {"type": "image_url", "image_url": {"url": message.image_url}},
-        ]
-    elif message.content:
-        openai_msg["content"] = message.content
-    elif hasattr(message, "image_url") and message.image_url:
-        openai_msg["content"] = [
-            {"type": "image_url", "image_url": {"url": message.image_url}}
-        ]
+    # Build content based on available data
+    content_parts = []
+    
+    if message.content:
+        content_parts.append({"type": "text", "text": message.content})
+        
+    if message.image_url:
+        content_parts.append({"type": "image_url", "image_url": {"url": message.image_url}})
+        
+    if len(content_parts) > 1:
+        openai_msg["content"] = content_parts
+    elif len(content_parts) == 1 and content_parts[0]["type"] == "text":
+        openai_msg["content"] = message.content
+    elif len(content_parts) == 1:
+        openai_msg["content"] = [content_parts[0]]
 
     if hasattr(message, "name") and message.name:
         openai_msg["name"] = message.name

237-247: Consider removing redundant hasattr checks

Since image_url is a defined attribute of the ChatMessage class, using hasattr(message, "image_url") is unnecessary and can be replaced with just checking if message.image_url is truthy.

-    if message.content and hasattr(message, "image_url") and message.image_url:
+    if message.content and message.image_url:
     openai_msg["content"] = [
         {"type": "text", "text": message.content},
         {"type": "image_url", "image_url": {"url": message.image_url}},
     ]
 elif message.content:
     openai_msg["content"] = message.content
-elif hasattr(message, "image_url") and message.image_url:
+elif message.image_url:
     openai_msg["content"] = [
         {"type": "image_url", "image_url": {"url": message.image_url}}
     ]

249-250: Remove redundant hasattr check for name

Similar to the image_url attribute, name is already defined in the ChatMessage class, making the hasattr check unnecessary.

-    if hasattr(message, "name") and message.name:
+    if message.name:
         openai_msg["name"] = message.name
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between d734f54 and eac9f94.

📒 Files selected for processing (2)
  • docker/config.example.yaml (1 hunks)
  • wren-ai-service/src/providers/llm/__init__.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • docker/config.example.yaml
🔇 Additional comments (3)
wren-ai-service/src/providers/llm/__init__.py (3)

18-34: Great job with the ChatMessage dataclass structure

You've added proper defaults for optional fields name and image_url, addressing the previous issue flagged in earlier reviews. Using field(default_factory=dict) for the meta attribute is the correct approach to avoid the mutable default issue.


9-16: Good implementation of ChatRole enum

Using str, Enum as base classes is a good choice for string-based enums. The included documentation is helpful and the defined roles cover the standard OpenAI chat roles.


117-130: Well-structured StreamingChunk dataclass

The StreamingChunk dataclass is concise and well-documented, with proper default handling for the meta dictionary using field(default_factory=dict).

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
wren-ai-service/src/pipelines/generation/chart_adjustment.py (1)

155-156: Theme loading implementation may cause issues in package contexts

The theme loading uses a hardcoded relative path, which could cause issues when the code is packaged or run from different directories.

wren-ai-service/src/pipelines/generation/utils/chart.py (1)

13-15: 🛠️ Refactor suggestion

Consider more robust theme file loading

The theme loading function uses a hardcoded relative path which can cause issues in different environments.

Consider using the importlib.resources approach for more robust path resolution:

-def load_custom_theme() -> Dict[str, Any]:
-    with open("src/pipelines/generation/utils/theme_powerbi.json", "r") as f:
-        return orjson.loads(f.read())
+def load_custom_theme() -> Dict[str, Any]:
+    from importlib import resources
+    from src.pipelines.generation import utils
+    
+    with resources.open_text(utils, "theme_powerbi.json") as f:
+        return orjson.loads(f.read())
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4d1dcba and 1d05fb3.

📒 Files selected for processing (5)
  • wren-ai-service/src/pipelines/generation/chart_adjustment.py (8 hunks)
  • wren-ai-service/src/pipelines/generation/chart_correction.py (1 hunks)
  • wren-ai-service/src/pipelines/generation/chart_generation.py (7 hunks)
  • wren-ai-service/src/pipelines/generation/utils/chart.py (4 hunks)
  • wren-ai-service/src/web/v1/services/chart.py (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • wren-ai-service/src/web/v1/services/chart.py
  • wren-ai-service/src/pipelines/generation/chart_correction.py
🧰 Additional context used
🧬 Code Graph Analysis (2)
wren-ai-service/src/pipelines/generation/chart_generation.py (1)
wren-ai-service/src/pipelines/generation/utils/chart.py (6)
  • ChartDataPreprocessor (19-50)
  • ChartGenerationPostProcessor (64-125)
  • load_custom_theme (13-15)
  • run (24-50)
  • run (58-60)
  • run (68-125)
wren-ai-service/src/pipelines/generation/chart_adjustment.py (1)
wren-ai-service/src/pipelines/generation/utils/chart.py (7)
  • ChartDataPreprocessor (19-50)
  • ChartGenerationPostProcessor (64-125)
  • ChartSchemaPreprocessor (54-60)
  • load_custom_theme (13-15)
  • run (24-50)
  • run (58-60)
  • run (68-125)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: pytest
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🔇 Additional comments (15)
wren-ai-service/src/pipelines/generation/chart_generation.py (4)

3-3: Good refactoring of imports to support new functionality

The imports have been correctly updated to include the necessary components for flexible chart generation, such as general instructions, model parameters, and the theme loading function.

Also applies to: 13-17


26-32: Improved system prompt with clearer instructions

The system prompt has been enhanced with better instructions and now includes the general chart generation instructions as a reusable component. Adding the requirement that reasoning language should match the user's language improves the user experience.


93-107: Enhanced post-processing with conditional data handling and theming

The post-processing step has been improved to:

  1. Conditionally use either raw data or sample data based on the data_provided flag
  2. Apply a custom theme to the chart schema

This makes the chart generation more flexible and visually consistent.


146-157: Improved pipeline run method with data_provided flag

The run method has been updated to use the data_provided flag instead of remove_data_from_chart_schema, which better reflects its purpose of determining whether to use the full dataset or a sample.

wren-ai-service/src/pipelines/generation/chart_adjustment.py (6)

13-19: Good refactoring of imports to support new functionality

The imports have been correctly updated to include necessary components for flexible chart adjustment, including the chart schema preprocessor and theme loader.


27-37: Enhanced system prompt with image support and clear instructions

The system prompt has been significantly improved to:

  1. Include general chart generation instructions
  2. Support image-based chart generation references
  3. Ensure reasoning language matches user language

These enhancements make the chart adjustment more versatile and user-friendly.


74-80: Good addition of chart schema preprocessing

Adding a dedicated preprocessing step for chart schemas improves modularity and ensures consistent handling of chart configurations before adjustment. This component properly implements the Haystack component pattern.


107-111: Added support for image references in chart adjustment

The generate_chart_adjustment function now accepts an image_url parameter, enabling image-based chart adjustments. This significantly enhances the capability of the chart adjustment pipeline.


147-151: Improved pipeline component configuration

The pipeline now:

  1. Uses the shared CHART_GENERATION_MODEL_KWARGS for consistent model configuration
  2. Includes the ChartSchemaPreprocessor component for proper schema handling

This standardization helps maintain consistency across different chart-related pipelines.


167-173: Enhanced pipeline run method with flexible parameters

The run method has been enhanced with:

  1. A simpler adjustment_command parameter instead of a structured object
  2. Support for data_provided flag for conditional data handling
  3. Support for image_url for image-based chart adjustments

These changes make the pipeline more versatile and easier to use.

Also applies to: 185-187

wren-ai-service/src/pipelines/generation/utils/chart.py (5)

53-61: Well implemented ChartSchemaPreprocessor

The ChartSchemaPreprocessor correctly implements the Haystack component pattern and safely removes the config key using pop() with a default value to avoid KeyError exceptions.


70-73: Improved ChartGenerationPostProcessor with theme support

The post-processor has been enhanced to:

  1. Correctly specify replies as list[str] for proper type hinting
  2. Accept an optional custom_theme parameter
  3. Apply the theme to the chart schema when available

These improvements make the post-processor more flexible and maintain visual consistency.

Also applies to: 88-90


128-136: Good extraction of general instructions as a constant

Extracting the general chart generation instructions into a reusable constant improves code organization and ensures consistency across different pipelines.


139-143: Simplified ChartGenerationResults model

The ChartGenerationResults model has been simplified to use a generic dict for chart_schema instead of a strict schema, which increases flexibility. The chart_type field is appropriately marked as deprecated.


145-153: Well-structured model configuration

The CHART_GENERATION_MODEL_KWARGS constant provides a standardized configuration for the model response format, ensuring consistent JSON schema validation across pipelines.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🔭 Outside diff range comments (2)
wren-ai-service/src/pipelines/generation/utils/chart.py (2)

74-89: 🛠️ Refactor suggestion

Handle potential IndexError with empty replies

The code assumes replies list is non-empty but doesn't verify this before accessing replies[0].

Add a check to prevent IndexError:

def run(
    self,
    replies: list[str],
    sample_data: list[dict],
    custom_theme: Optional[dict[str, Any]] = None,
):
    try:
+        if not replies:
+            logger.warning("Received empty replies list")
+            return {
+                "results": {
+                    "chart_schema": {},
+                    "reasoning": "",
+                    "chart_type": "",
+                }
+            }
        generation_result = orjson.loads(replies[0])

106-125: 🛠️ Refactor suggestion

Remove or update ValidationError handling

The exception handler for ValidationError is present, but the code no longer performs schema validation with jsonschema.validate. This inconsistency should be addressed.

Either remove the unused ValidationError exception handling or update it to reflect the current validation approach:

-        except ValidationError as e:
-            logger.exception(f"Vega-lite schema is not valid: {e}")
-
-            return {
-                "results": {
-                    "chart_schema": {},
-                    "reasoning": "",
-                    "chart_type": "",
-                }
-            }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1d05fb3 and fb17cb7.

📒 Files selected for processing (1)
  • wren-ai-service/src/pipelines/generation/utils/chart.py (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: pytest
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🔇 Additional comments (6)
wren-ai-service/src/pipelines/generation/utils/chart.py (6)

47-47: Good addition of raw data to the return value

Including the complete dataset provides downstream components with access to the full data when needed, enhancing flexibility.


53-61: Well-implemented preprocessor

The ChartSchemaPreprocessor is correctly implemented with proper output typing and safe dictionary key removal.


70-72: Properly fixed type hints and added custom theme support

The type hint for replies correctly reflects its list nature, and the optional custom theme parameter enables theme customization.


128-139: Clear and concise chart generation instructions

The updated instructions are well-structured and provide clear guidance on what should be included/omitted in the Vega-Lite schema.


142-145: Good simplification of ChartGenerationResults model

Using a generic dictionary for chart_schema provides more flexibility, and the deprecation notice for chart_type is helpful for future maintenance.


148-156: Well-structured model configuration

The CHART_GENERATION_MODEL_KWARGS dictionary provides a clean way to specify the JSON schema response format using the Pydantic model.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant