-
Notifications
You must be signed in to change notification settings - Fork 767
feat(wren-ai-service): flexible chart generation (ai-env-changed) #1652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughThis update adds two new pipeline components, Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant ChartService
participant ChartValidationPipeline
participant ChartCorrectionPipeline
participant ChartGenerationPipeline
Client->>ChartService: Request chart generation
ChartService->>ChartGenerationPipeline: Generate initial chart schema
ChartGenerationPipeline-->>ChartService: Return chart schema and reasoning
alt Chart validation enabled
ChartService->>ChartValidationPipeline: Validate chart schema
ChartValidationPipeline-->>ChartService: Validation result (valid/invalid)
alt Chart is valid
ChartService->>Client: Return finished chart result
else Chart invalid and retries left
ChartService->>ChartCorrectionPipeline: Correct chart schema
ChartCorrectionPipeline-->>ChartService: Updated chart schema
ChartService->>ChartValidationPipeline: Validate updated chart schema
Note over ChartService: Retry loop until valid or max retries
else Chart invalid and no retries left
ChartService->>Client: Return failure result
end
else Chart validation disabled
ChartService->>Client: Return finished chart result
end
Possibly related PRs
Poem
Note ⚡️ AI Code Reviews for VS Code, Cursor, WindsurfCodeRabbit now has a plugin for VS Code, Cursor and Windsurf. This brings AI code reviews directly in the code editor. Each commit is reviewed immediately, finding bugs before the PR is raised. Seamless context handoff to your AI code agent ensures that you can easily incorporate review feedback. Note ⚡️ Faster reviews with cachingCodeRabbit now supports caching for code and dependencies, helping speed up reviews. This means quicker feedback, reduced wait times, and a smoother review experience overall. Cached data is encrypted and stored securely. This feature will be automatically enabled for all accounts on May 30th. To opt out, configure 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🧰 Additional context used🧬 Code Graph Analysis (1)wren-ai-service/src/pipelines/generation/utils/chart.py (5)
🔇 Additional comments (6)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
479f257
to
1dc2638
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 12
🔭 Outside diff range comments (1)
wren-ai-service/src/web/v1/services/chart.py (1)
80-95
: 🛠️ Refactor suggestionConstructor parameters can get silently out-of-sync with the supplied pipeline dict
ChartService.__init__()
now expects that thepipelines
mapping always containschart_validation
andchart_correction
whenallow_chart_validation=True
.
Because these keys are not accessed until run-time, a mis-configured DI container will raise aKeyError
only after the first request, turning a configuration error into a 500-level production error.Consider validating the presence of the required pipelines up-front:
def __init__(self, pipelines: Dict[str, BasicPipeline], allow_chart_validation: bool = False, max_chart_correction_retries: int = 3, maxsize: int = 1_000_000, ttl: int = 120): - self._pipelines = pipelines + required = {"sql_executor", "chart_generation"} + if allow_chart_validation: + required |= {"chart_validation", "chart_correction"} + missing = required - pipelines.keys() + if missing: + raise ValueError(f"Missing pipeline(s): {', '.join(missing)}") + self._pipelines = pipelines
♻️ Duplicate comments (7)
wren-ai-service/docs/config_examples/config.anthropic.yaml (1)
139-142
: (Duplicate) Update example settings with new chart config fields
As noted in theconfig.grok.yaml
review, this example should includeallow_chart_validation
andmax_chart_correction_retries
under itssettings:
section to surface the new options to users.docker/config.example.yaml (1)
142-145
: (Duplicate) Update example settings with new chart config fields
As mentioned in theconfig.grok.yaml
review, the top‐level example should reflect the newallow_chart_validation
andmax_chart_correction_retries
settings in itssettings:
block.wren-ai-service/docs/config_examples/config.deepseek.yaml (1)
162-165
: (Duplicate) Update example settings with new chart config fields
Consistent with other configs, please add theallow_chart_validation
andmax_chart_correction_retries
fields under thesettings:
section to this example.wren-ai-service/docs/config_examples/config.groq.yaml (1)
143-146
: Duplicate: Group chart-related pipelines together.
As inconfig.ollama.yaml
, relocatechart_validation
andchart_correction
entries to immediately followchart_adjustment
under thepipes
list for coherent grouping.wren-ai-service/docs/config_examples/config.azure.yaml (1)
152-155
: Duplicate: Group chart-related pipelines together.
Consistently repositionchart_validation
andchart_correction
entries to come right afterchart_adjustment
in thepipes
list, matching the grouping in other environment configs.wren-ai-service/tools/config/config.full.yaml (1)
153-156
: Duplicate: Group chart-related pipelines together.
Align with other examples by placingchart_validation
andchart_correction
entries adjacent tochart_adjustment
to preserve a logical, chart-focused sequence in thepipes
list.wren-ai-service/src/web/v1/services/chart.py (1)
216-218
: Duplicate unsafe mutation – see comment aboveThe same unconditional mutation appears here; refactor into a small helper to avoid repetition and future omissions.
🧹 Nitpick comments (21)
wren-ai-service/src/pipelines/generation/utils/theme_powerbi.json (1)
1-128
: Validate theme JSON structure and integration.Ensure this Power BI theme adheres to Vega-Lite’s theme specification (e.g., wrapping under a
config
key if required) and is correctly loaded viaload_custom_theme()
. Consider adding a unit test or lint step to verify that the theme produces the expected styling when applied to a sample chart.wren-ai-service/tools/config/config.example.yaml (1)
153-156
: Document new chart pipelines in example config.Ensure that
chart_validation
andchart_correction
pipelines are reflected in the project README and in other config examples. Also verify any configurable parameters (e.g., retry counts, timeouts) are documented so users can customize behavior.wren-ai-service/docs/config_examples/config.google_ai_studio.yaml (1)
148-151
: Enhance documentation for chart pipelines in example config.Add inline comments describing the purpose of
chart_validation
andchart_correction
, and note any prerequisites (e.g., environment variables or dependencies for image conversion). This will guide users customizing the Google AI Studio config file.wren-ai-service/docs/config_examples/config.grok.yaml (1)
144-147
: Chart validation and correction pipelines added; update example settings
The newchart_validation
andchart_correction
pipelines follow the established pattern. To surface the corresponding controls to users, consider adding the new settings fields (allow_chart_validation
andmax_chart_correction_retries
) under thesettings:
block of this example config so they can override the defaults.wren-ai-service/src/config.py (1)
45-46
: Document new chart config settings
You've addedallow_chart_validation
andmax_chart_correction_retries
to control the new pipelines. To improve discoverability, please add inline comments or update the class‐level docstring to describe these fields and their expected use.wren-ai-service/src/pipelines/generation/__init__.py (2)
2-4
: Ensure imports follow alphabetical grouping.
The newly addedChartCorrection
andChartValidation
imports interrupt the existing alphabetical order of pipeline modules. Consider reordering these imports to maintain consistency:from .chart_adjustment import ChartAdjustment from .chart_correction import ChartCorrection from .chart_generation import ChartGeneration from .chart_validation import ChartValidation
41-42
: Maintain__all__
ordering for pipeline exports.
The entries"ChartValidation"
and"ChartCorrection"
should be inserted alongside other chart-related exports in alphabetical order—ideally immediately after"ChartAdjustment"
and"ChartGeneration"
—to keep the public API list consistent and readable.wren-ai-service/docs/config_examples/config.ollama.yaml (1)
140-143
: Group chart-related pipelines together.
Thechart_validation
andchart_correction
entries are currently appended at the end of thepipes
list. To improve logical flow and readability, consider moving them to directly followchart_generation
andchart_adjustment
within the same group.wren-ai-service/test_vega_lite_chart.py (1)
1-16
: Consider turning this script into a proper unit test.This file successfully demonstrates the Vega-Lite to PNG conversion functionality that's likely used by the chart validation pipeline. However, it appears to be a quick test script rather than a formal unit test.
Consider converting this into a proper unit test with assertions to verify the output and integrating it into the project's test suite for automated testing.
A test implementation could look like:
import base64 import pytest import vl_convert as vlc def test_vega_lite_to_png_conversion(): vl_spec = """ # existing Vega-Lite spec """ # Convert to PNG png_bytes = vlc.vegalite_to_png(vl_spec=vl_spec, vl_version="v5.15") # Assert PNG bytes are not empty assert len(png_bytes) > 0 # Verify it can be base64 encoded properly b64_str = base64.b64encode(png_bytes).decode("utf-8") assert b64_str.startswith('/9j/') or b64_str.startswith('iVBOR') # Common PNG header patternswren-ai-service/src/providers/llm/litellm.py (1)
119-121
:extract_braces_content()
can return an empty string – consider a safer fallbackWhen the LLM reply does not contain braces,
extract_braces_content
will return an empty
string, propagating upwards as the final reply.
If the downstream code assumes valid JSON this will surface as hard-to-trace errors.Suggestion: fall back to the raw content when extraction fails, or raise a descriptive
exception early.- "replies": [ - extract_braces_content(message.content) for message in completions - ], + "replies": [ + extract_braces_content(msg.content) or msg.content + for msg in completions + ],wren-ai-service/src/pipelines/generation/chart_generation.py (3)
23-33
: Minor wording & spacing issues in the system promptA couple of typos can hurt the LLM’s comprehension signal.
-You are a data analyst great at generating data visualization using vega-lite! Given the user's question, SQL, sample data and sample column values, you need to think about the best chart type and generate correspondingvega-lite schema in JSON format. +You are a data analyst great at generating data visualizations using Vega-Lite! +Given the user's question, SQL, sample data and sample column values, think about the best chart type and generate the corresponding Vega-Lite schema in JSON format.
92-107
: Guard against missing keys whendata_provided
is TrueIf an upstream change drops the
raw_data
key, the ternary expression will raise
KeyError
, aborting the whole pipeline instead of failing gracefully.- preprocess_data["raw_data"] + preprocess_data.get("raw_data", [])(The same applies to
"sample_data"
.) Falling back to an empty list is usually safer for prompt construction.
132-134
: Load the custom theme once and reuse
load_custom_theme()
reads a file on every pipeline instantiation. Caching the
result at module level avoids needless disk I/O and JSON parsing._theme_cache: Optional[dict[str, Any]] = None def _get_theme() -> dict[str, Any]: global _theme_cache if _theme_cache is None: _theme_cache = load_custom_theme() return _theme_cache ... self._configs = { "custom_theme": _get_theme(), }wren-ai-service/src/pipelines/generation/chart_validation.py (1)
112-118
: Prompt builder is invoked without variables – safe but wastes latency
PromptBuilder.run()
re-renders the template on every call even though
chart_validation_user_prompt_template
is static.
Since the template contains no placeholders, replace the builder with a constant
string to save a couple of milliseconds per request."prompt_builder": lambda **_: {"prompt": chart_validation_user_prompt_template.strip()},wren-ai-service/src/web/v1/services/chart.py (1)
120-127
:data_provided
flag can stayFalse
even whenchart_request.data
is falsy but non-empty
if not chart_request.data:
treats “any falsy value” (empty dict/list) as “no data”, even though the caller attempted to supply data.
This may break downstream prompts that rely ondata_provided
.-if not chart_request.data: +if chart_request.data is None:This distinguishes “omitted” from “provided but empty”.
wren-ai-service/src/providers/llm/__init__.py (1)
224-253
: Type annotation too narrow for returned structure
convert_message_to_openai_format
returns a dict where"content"
can be either astr
or aList[Dict[str,Any]]
, but the signature constrains it toDict[str, str]
.-def convert_message_to_openai_format(message: ChatMessage) -> Dict[str, str]: +from typing import Any, Mapping + +def convert_message_to_openai_format(message: ChatMessage) -> Dict[str, Any]:Mismatched annotations propagate to callers and may mask bugs when static analysis is enabled.
wren-ai-service/src/pipelines/generation/chart_correction.py (1)
101-126
: Consider injectingPromptBuilder
template via kwargs for easier runtime tuningHard-coding long system/user prompt strings inside the module makes quick experimentation in prod almost impossible.
Expose them as optional constructor args or load from config to enable A/B testing without code changes.wren-ai-service/src/pipelines/generation/chart_adjustment.py (1)
118-130
: Raw-vs-sample data ambiguity
post_process()
feeds eitherraw_data
orsample_data
into the Vega-Lite spec based ondata_provided
.
For very large tables this can explode the payload size and slow down client rendering.Consider streaming only a capped sample in all cases, or introducing a hard limit (e.g.
MAX_ROWS = 5 000
).wren-ai-service/src/web/v1/services/chart_adjustment.py (2)
84-85
:chart_type
should bestr | None
notOptional[str] = ""
An empty string is not “missing”; it’s a value.
Preferchart_type: str | None = None
to clearly indicate absence.
152-162
:if not chart_adjustment_request.data
treats empty dict as “no data”An empty dict is falsy, so users passing
{}
deliberately will trigger an unnecessary SQL round-trip.
Use an explicitis None
check:-if not chart_adjustment_request.data: +if chart_adjustment_request.data is None:wren-ai-service/src/pipelines/generation/utils/chart.py (1)
13-15
: Robust theme loading for packaged deploymentsHard-coded path assumes execution from repo root.
Useimportlib.resources
so the theme is available when the package is installed elsewhere.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting
⛔ Files ignored due to path filters (1)
wren-ai-service/poetry.lock
is excluded by!**/*.lock
📒 Files selected for processing (28)
deployment/kustomizations/base/cm.yaml
(1 hunks)docker/config.example.yaml
(1 hunks)wren-ai-service/docs/config_examples/config.anthropic.yaml
(1 hunks)wren-ai-service/docs/config_examples/config.azure.yaml
(1 hunks)wren-ai-service/docs/config_examples/config.deepseek.yaml
(1 hunks)wren-ai-service/docs/config_examples/config.google_ai_studio.yaml
(1 hunks)wren-ai-service/docs/config_examples/config.grok.yaml
(1 hunks)wren-ai-service/docs/config_examples/config.groq.yaml
(1 hunks)wren-ai-service/docs/config_examples/config.lm_studio.yaml
(1 hunks)wren-ai-service/docs/config_examples/config.ollama.yaml
(1 hunks)wren-ai-service/docs/config_examples/config.open_router.yaml
(1 hunks)wren-ai-service/pyproject.toml
(1 hunks)wren-ai-service/src/config.py
(1 hunks)wren-ai-service/src/globals.py
(2 hunks)wren-ai-service/src/pipelines/generation/__init__.py
(2 hunks)wren-ai-service/src/pipelines/generation/chart_adjustment.py
(8 hunks)wren-ai-service/src/pipelines/generation/chart_correction.py
(1 hunks)wren-ai-service/src/pipelines/generation/chart_generation.py
(6 hunks)wren-ai-service/src/pipelines/generation/chart_validation.py
(1 hunks)wren-ai-service/src/pipelines/generation/utils/chart.py
(5 hunks)wren-ai-service/src/pipelines/generation/utils/theme_powerbi.json
(1 hunks)wren-ai-service/src/providers/llm/__init__.py
(2 hunks)wren-ai-service/src/providers/llm/litellm.py
(4 hunks)wren-ai-service/src/web/v1/services/chart.py
(6 hunks)wren-ai-service/src/web/v1/services/chart_adjustment.py
(7 hunks)wren-ai-service/test_vega_lite_chart.py
(1 hunks)wren-ai-service/tools/config/config.example.yaml
(1 hunks)wren-ai-service/tools/config/config.full.yaml
(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (4)
wren-ai-service/src/pipelines/generation/__init__.py (2)
wren-ai-service/src/pipelines/generation/chart_correction.py (1)
ChartCorrection
(101-145)wren-ai-service/src/pipelines/generation/chart_validation.py (1)
ChartValidation
(104-136)
wren-ai-service/src/globals.py (2)
wren-ai-service/src/pipelines/generation/chart_validation.py (1)
ChartValidation
(104-136)wren-ai-service/src/pipelines/generation/chart_correction.py (1)
ChartCorrection
(101-145)
wren-ai-service/src/pipelines/generation/chart_generation.py (1)
wren-ai-service/src/pipelines/generation/utils/chart.py (6)
ChartDataPreprocessor
(19-50)ChartGenerationPostProcessor
(64-125)load_custom_theme
(13-15)run
(24-50)run
(58-60)run
(68-125)
wren-ai-service/src/pipelines/generation/chart_adjustment.py (1)
wren-ai-service/src/pipelines/generation/utils/chart.py (7)
ChartDataPreprocessor
(19-50)ChartGenerationPostProcessor
(64-125)ChartSchemaPreprocessor
(54-60)load_custom_theme
(13-15)run
(24-50)run
(58-60)run
(68-125)
🔇 Additional comments (9)
wren-ai-service/pyproject.toml (1)
39-39
:✅ Verification successful
Approve addition of
vl-convert-python
dependency.This new runtime dependency is required for converting Vega-Lite charts to PNG in the chart validation and correction pipelines. Ensure that
poetry lock
has been updated and that this version is compatible with Python 3.12.To verify installation and importability, you can run:
🏁 Script executed:
#!/bin/bash poetry run python - <<EOF import vl_convert as vc print("vl-convert-python version:", vc.__version__) EOFLength of output: 118
🏁 Script executed:
#!/bin/bash # Verify that 'vl-convert-python' is in the lock file and check its Python version constraints rg 'name = "vl-convert-python"' -A4 poetry.lockLength of output: 101
🏁 Script executed:
#!/bin/bash # Search for poetry.lock in the repository to verify that the lock file includes vl-convert-python find . -type f -name "poetry.lock"Length of output: 64
🏁 Script executed:
#!/bin/bash # Verify that vl-convert-python is in the lock file and show its locked version and Python version constraints rg -A5 'name = "vl-convert-python"' wren-ai-service/poetry.lockLength of output: 239
Approve addition of
vl-convert-python
dependency.
- Verified in wren-ai-service/poetry.lock:
• name = "vl-convert-python"
• version = "1.7.0"
• python-versions = ">=3.7"- Python 3.12 falls within the supported range.
- Import module as
vl_convert
in your local env to confirm if needed.deployment/kustomizations/base/cm.yaml (1)
192-195
: Verify consistency of new chart pipelines in Kustomize config.Confirm that
chart_validation
andchart_correction
entries here align with those inconfig.example.yaml
and the docs. Check that the indentation matches Kustomize’s parsing expectations and that no required fields (e.g., timeouts or retries) are missing.wren-ai-service/docs/config_examples/config.lm_studio.yaml (1)
142-145
: New chart validation and correction pipelines added.The addition of these two new pipelines aligns with the pattern of other pipelines in the configuration file, correctly using the
litellm_llm.default
model. These components will provide chart validation and correction capabilities to the system.wren-ai-service/docs/config_examples/config.open_router.yaml (1)
140-143
: New chart validation and correction pipelines added.The addition of these two new pipelines follows the same pattern as in other configuration files, properly using the
litellm_llm.default
model. These components will provide chart validation and correction capabilities to the system.wren-ai-service/src/globals.py (4)
163-168
: Chart validation and correction pipelines properly integrated.The new chart validation and correction pipelines are correctly integrated into the chart service, ensuring that the components can be used for validating chart images and correcting chart schemas.
170-171
: Configuration parameters properly added to chart service.The new configuration parameters
allow_chart_validation
andmax_chart_correction_retries
are correctly passed to the chart service, providing control over the validation functionality and retry behavior.
183-188
: Chart validation and correction pipelines added to chart adjustment service.The same chart validation and correction pipelines are properly integrated into the chart adjustment service, providing consistent capabilities across both services.
190-191
: Configuration parameters added to chart adjustment service.The same configuration parameters for chart validation and correction are correctly passed to the chart adjustment service, maintaining consistency between services.
wren-ai-service/src/pipelines/generation/chart_adjustment.py (1)
108-112
:❌ Incorrect review comment
Double-check that the underlying generator really supports
image_url
await generator(prompt=..., image_url=image_url)
forwardsimage_url
blindly.
If the concrete generator returned byllm_provider.get_generator()
does not declare the
parameter (or only when non-empty) this will raise a run-timeTypeError
.Please verify with:
🏁 Script executed:
#!/bin/bash # Grep for the generator signature rg -A3 -B2 "def .*get_generator" src/providers | headLength of output: 109
Let’s locate the actual
get_generator
definition and its signature across the repo:
🏁 Script executed:
#!/bin/bash # Find where get_generator is defined rg "def get_generator" -n . # Find all usages to inspect how it’s called rg "get_generator(" -n .Length of output: 295
Let’s inspect the actual signatures of
get_generator
in both files:
🏁 Script executed:
#!/bin/bash # Show get_generator in litellm.py FILE1="wren-ai-service/src/providers/llm/litellm.py" LOC1=$(rg -n "def get_generator" "$FILE1" | cut -d: -f2) START1=$((LOC1-5)); END1=$((LOC1+10)) sed -n "${START1},${END1}p" "$FILE1" # Show get_generator in core/provider.py FILE2="wren-ai-service/src/core/provider.py" LOC2=$(rg -n "def get_generator" "$FILE2" | cut -d: -f2) START2=$((LOC2-5)); END2=$((LOC2+10)) sed -n "${START2},${END2}p" "$FILE2"Length of output: 912
Let’s pull a few lines of context around each
get_generator
definition to inspect their signatures:
🏁 Script executed:
#!/bin/bash rg -n -C5 "def get_generator" wren-ai-service/src/providers/llm/litellm.py rg -n -C5 "def get_generator" wren-ai-service/src/core/provider.pyLength of output: 849
Let’s inspect the full body of
get_generator
inlitellm.py
to see if the returned generator supportsimage_url
:
🏁 Script executed:
#!/bin/bash # Show the body of get_generator for the LLM provider rg -n -A30 "def get_generator" wren-ai-service/src/providers/llm/litellm.pyLength of output: 1324
No TypeError risk: generator supports
image_url
Thelitellm
provider’sget_generator
returns an async_run
function whose signature is:async def _run( prompt: str, image_url: Optional[str] = None, history_messages: Optional[List[ChatMessage]] = None, generation_kwargs: Optional[Dict[str, Any]] = None, query_id: Optional[str] = None, ): …Since
image_url
is explicitly declared there, forwarding it viaawait generator(prompt=..., image_url=image_url)will not raise a
TypeError
. No changes needed.Likely an incorrect or invalid review comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
wren-ai-service/src/providers/llm/__init__.py (1)
80-88
: Fix themeta
parameter infrom_function
The
from_function
method passesmeta=None
which violates the type contract.meta
is expected to be aDict[str, Any]
but receivesNone
, which will cause anAttributeError
when code tries to mutatemeta
.- return cls(content, ChatRole.FUNCTION, name=name, image_url=None, meta=None) + return cls(content, ChatRole.FUNCTION, name=name, image_url=None, meta={})
🧹 Nitpick comments (1)
wren-ai-service/src/providers/llm/__init__.py (1)
60-67
: Update docstring to include theimage_url
parameterThe
from_user
method now accepts animage_url
parameter, but the docstring doesn't mention it.@classmethod def from_user(cls, content: str, image_url: Optional[str] = None) -> "ChatMessage": """ Create a message from the user. :param content: The text content of the message. + :param image_url: Optional URL to an image to include with the message. :returns: A new ChatMessage instance. """ return cls(content, ChatRole.USER, name=None, image_url=image_url)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting
📒 Files selected for processing (1)
wren-ai-service/src/providers/llm/__init__.py
(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (go)
🔇 Additional comments (3)
wren-ai-service/src/providers/llm/__init__.py (3)
224-252
: Solid implementation of OpenAI's multimodal formatThe
convert_message_to_openai_format
function correctly handles the different scenarios for messages:
- Text content only
- Image URL only
- Both text and image
This implementation aligns well with OpenAI's Chat API format for multimodal content.
29-33
: Good fix: Default values added to optional fieldsThe optional fields
name
andimage_url
now have default values ofNone
, which makes them properly optional and prevents instantiation errors.
9-16
: Well-designed ChatRole enumThe introduction of a proper
ChatRole
enum instead of string literals improves type safety and code readability. The string-based enum approach maintains compatibility with JSON serialization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (1)
wren-ai-service/src/pipelines/generation/utils/chart.py (1)
70-70
: Type hint forreplies
parameter is incorrectThe type hint for
replies
indicates it's a string, but code accesses it as a list withreplies[0]
.- replies: str, + replies: list[str],
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting
📒 Files selected for processing (1)
wren-ai-service/src/pipelines/generation/utils/chart.py
(5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (go)
🔇 Additional comments (6)
wren-ai-service/src/pipelines/generation/utils/chart.py (6)
46-50
: Consider memory usage when returning full raw dataAdding
raw_data
to the return value is useful, but could cause memory issues with large datasets.For large datasets, consider:
- Adding pagination
- Implementing a size limit with a warning
- Making it optional based on a parameter
return { - "raw_data": df.to_dict(orient="records"), + "raw_data": df.to_dict(orient="records") if len(df) < 1000 else df.head(1000).to_dict(orient="records"), "sample_data": sample_data, "sample_column_values": sample_column_values, }
53-61
: LGTM - Fixed ChartSchemaPreprocessor implementationGreat implementation that safely handles the config key removal with
pop()
and properly follows Haystack's output contract by returning the dict with the output name as key.
72-72
: Good change to support optional themingMaking the custom theme optional with proper type annotation supports the added flexibility for chart generation.
106-107
: Reconsider ValidationError handlingThe code catches
ValidationError
, but the jsonschema validation step has been removed, making this catch potentially obsolete.Verify if this exception can still occur in the current implementation. If not, consider removing or updating this exception handler.
128-132
: Good simplification of the ChartGenerationResults modelUsing a generic dictionary for chart schema and marking chart_type as deprecated improves flexibility while maintaining backward compatibility.
134-142
: Excellent addition of model kwargs for consistent outputThe
CHART_GENERATION_MODEL_KWARGS
dictionary ensures consistent output format using the model schema, which is a good practice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
wren-ai-service/src/web/v1/services/chart_adjustment.py (1)
203-204
: Direct nested dictionary access potential issue.As with the chart.py file, the code directly accesses
chart_result["chart_schema"]["data"]["values"]
without checking if these keys exist, which could raise KeyError if any key is missing.I note there was a previous review comment about this, and you indicated it's not an issue, so I'm respecting that assessment.
Also applies to: 244-245
🧹 Nitpick comments (1)
wren-ai-service/src/web/v1/services/chart_adjustment.py (1)
194-243
: Consider extracting the validation logic to reduce duplication.The validation and correction retry logic is nearly identical between this file and chart.py. Consider extracting this to a shared helper method to reduce duplication and simplify maintenance.
You could create a utility function like:
async def validate_and_correct_chart( self, chart_schema: dict, query: str, sql: str, max_retries: int ) -> tuple[dict, bool]: # Returns (chart_result, is_valid) # Implementation of the retry loop # ...
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting
📒 Files selected for processing (13)
wren-ai-service/src/pipelines/generation/chart_generation.py
(7 hunks)wren-ai-service/src/pipelines/generation/utils/chart.py
(4 hunks)wren-ai-service/src/web/v1/routers/question_recommendation.py
(2 hunks)wren-ai-service/src/web/v1/routers/relationship_recommendation.py
(3 hunks)wren-ai-service/src/web/v1/routers/semantics_description.py
(2 hunks)wren-ai-service/src/web/v1/services/ask.py
(3 hunks)wren-ai-service/src/web/v1/services/chart.py
(8 hunks)wren-ai-service/src/web/v1/services/chart_adjustment.py
(7 hunks)wren-ai-service/src/web/v1/services/question_recommendation.py
(3 hunks)wren-ai-service/src/web/v1/services/relationship_recommendation.py
(2 hunks)wren-ai-service/src/web/v1/services/semantics_description.py
(2 hunks)wren-ai-service/src/web/v1/services/sql_answer.py
(2 hunks)wren-ai-service/src/web/v1/services/sql_question.py
(3 hunks)
✅ Files skipped from review due to trivial changes (1)
- wren-ai-service/src/web/v1/routers/relationship_recommendation.py
🚧 Files skipped from review as they are similar to previous changes (1)
- wren-ai-service/src/pipelines/generation/chart_generation.py
🧰 Additional context used
🧬 Code Graph Analysis (8)
wren-ai-service/src/web/v1/services/relationship_recommendation.py (1)
wren-ai-service/src/web/v1/services/__init__.py (1)
Configuration
(25-45)
wren-ai-service/src/web/v1/services/semantics_description.py (1)
wren-ai-service/src/web/v1/services/__init__.py (1)
Configuration
(25-45)
wren-ai-service/src/web/v1/routers/semantics_description.py (1)
wren-ai-service/src/web/v1/services/__init__.py (1)
Configuration
(25-45)
wren-ai-service/src/web/v1/services/sql_answer.py (1)
wren-ai-service/src/web/v1/services/__init__.py (1)
Configuration
(25-45)
wren-ai-service/src/web/v1/routers/question_recommendation.py (1)
wren-ai-service/src/web/v1/services/__init__.py (1)
Configuration
(25-45)
wren-ai-service/src/web/v1/services/question_recommendation.py (1)
wren-ai-service/src/web/v1/services/__init__.py (1)
Configuration
(25-45)
wren-ai-service/src/web/v1/services/ask.py (1)
wren-ai-service/src/web/v1/services/__init__.py (1)
Configuration
(25-45)
wren-ai-service/src/web/v1/services/chart.py (4)
wren-ai-service/src/pipelines/generation/chart_generation.py (1)
run
(141-161)wren-ai-service/src/pipelines/generation/chart_validation.py (1)
run
(125-136)wren-ai-service/src/pipelines/generation/chart_correction.py (1)
run
(128-145)wren-ai-service/src/pipelines/retrieval/sql_executor.py (2)
run
(25-42)run
(85-98)
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: build-image (linux_arm64_runner, linux/arm64)
- GitHub Check: build-image (ubuntu-latest, linux/amd64)
- GitHub Check: pytest
- GitHub Check: pytest
🔇 Additional comments (43)
wren-ai-service/src/web/v1/services/semantics_description.py (2)
8-8
: Good import addition for the Field class.The Field import from pydantic is necessary for the default_factory pattern used below.
58-58
: Excellent improvement to configuration initialization.Changing from a direct default instance to
Field(default_factory=Configuration)
ensures eachGenerateRequest
instance gets its own freshConfiguration
object rather than sharing a single mutable default instance. This prevents potential bugs from shared state between different requests.wren-ai-service/src/web/v1/services/sql_answer.py (2)
7-7
: Good import addition for the Field class.The Field import from pydantic is necessary for the default_factory pattern used below.
24-24
: Excellent improvement to configuration initialization.Changing from a direct default instance to
Field(default_factory=Configuration)
ensures eachSqlAnswerRequest
instance gets its own freshConfiguration
object rather than sharing a single mutable default instance. This prevents potential bugs from shared state between different requests.wren-ai-service/src/web/v1/services/relationship_recommendation.py (2)
7-7
: Good import addition for the Field class.The Field import from pydantic is necessary for the default_factory pattern used below.
21-21
: Excellent improvement to configuration initialization.Changing from a direct default instance to
Field(default_factory=Configuration)
ensures eachInput
instance gets its own freshConfiguration
object rather than sharing a single mutable default instance. This prevents potential bugs from shared state between different requests.wren-ai-service/src/web/v1/routers/semantics_description.py (2)
6-6
: Good import addition for the Field class.The Field import from pydantic is necessary for the default_factory pattern used below.
93-93
: Excellent improvement to configuration initialization.Changing from a direct default instance to
Field(default_factory=Configuration)
ensures eachPostRequest
instance gets its own freshConfiguration
object rather than sharing a single mutable default instance. This prevents potential bugs from shared state between different requests.This change is consistent with the pattern applied across other service and router modules, showing a systematic approach to address the mutable default value issue.
wren-ai-service/src/web/v1/services/sql_question.py (3)
6-6
: Added import for Field to support default_factory.The import of
Field
frompydantic
is necessary to support the change in the default value handling for theconfigurations
attribute.
20-20
: Improved default value handling to prevent shared mutable instances.Changing from
Optional[Configuration]
with a direct instance toConfiguration = Field(default_factory=Configuration)
ensures a newConfiguration
instance is created for each model instantiation rather than sharing a single mutable instance across all requests. This is a best practice in Pydantic models.
50-50
: Added consistent blank line for better code organization.The blank line improves readability by clearly separating class definitions within the parent class.
wren-ai-service/src/web/v1/routers/question_recommendation.py (2)
6-6
: Added import for Field to support default_factory.The import of
Field
frompydantic
is necessary to support the change in the default value handling for theconfiguration
attribute.
78-81
: Improved Pydantic model field definitions.Changes improve the
PostRequest
model by:
- Removing unnecessary
Optional
type hints for fields that always have default values- Using
Field(default_factory=Configuration)
to ensure a newConfiguration
instance is created for each model instantiation rather than sharing a single instanceThese changes make field requirements clearer and prevent potential bugs from shared mutable defaults.
wren-ai-service/src/web/v1/services/question_recommendation.py (3)
8-8
: Added import for Field to support default_factory.The import of
Field
frompydantic
is necessary to support the change in the default value handling for theconfiguration
parameters.
65-65
: Improved default value handling in _validate_question method.Changed from
Optional[Configuration] = Configuration()
toConfiguration = Field(default_factory=Configuration)
to ensure a newConfiguration
instance is created for each method call rather than sharing a single mutable instance.
157-160
: Improved Pydantic model field definitions in Request class.Changes improve the
Request
model by:
- Removing unnecessary
Optional
type hints for fields that always have default values- Using
Field(default_factory=Configuration)
to ensure a newConfiguration
instance is created for each model instantiationThese changes align with the changes in the router counterpart and maintain consistency across the codebase.
wren-ai-service/src/web/v1/services/ask.py (4)
31-34
: Improved default value handling in AskRequest model.Changes address potential bugs from mutable default values:
- Used
Field(default_factory=list)
forhistories
to ensure a new list is created per instance- Used
Field(default_factory=Configuration)
forconfigurations
to ensure a new object per instance- Removed unnecessary
Optional
type hints for fields with default valuesThese changes make the model more robust against shared mutable state issues.
103-103
: Removed unnecessary Optional type hint for is_followup field.Since
is_followup
always has a default value ofFalse
, theOptional
type hint is redundant and has been correctly removed.
110-110
: Removed unnecessary Optional type hint for is_followup field in AskResultResponse.Since
is_followup
has a default value and theexclude=True
parameter, theOptional
type hint is redundant and has been correctly removed.
124-124
: Improved default value handling in AskFeedbackRequest model.Changed from
Optional[Configuration] = Configuration()
toConfiguration = Field(default_factory=Configuration)
to ensure a newConfiguration
instance is created for each model instantiation rather than sharing a single mutable instance.wren-ai-service/src/web/v1/services/chart.py (9)
6-6
: Proper Field import for Pydantic.Good addition of the Field import which will be used for safer handling of default values.
23-24
: Nice improvement with Field default_factory.Using
Field(default_factory=Configuration)
instead of direct instantiation prevents the mutable default issue where the same Configuration instance could be shared across requests.
69-70
: Model refinements enhance flexibility.Making
chart_type
andchart_schema
optional with sensible defaults (""
andNone
) improves handling of edge cases when these values might be missing.
83-95
: Good implementation of feature flags for chart validation.These new parameters allow toggling the validation system and configuring retry behavior at the service level, providing flexibility for different deployment environments.
121-126
: Well-organized variable initialization.The code clearly initializes all necessary variables at the beginning of the method, improving readability and maintainability.
142-142
: Proper usage of the data_provided flag.The code correctly sets and passes the
data_provided
flag to indicate when data is directly provided versus when it's fetched via SQL execution.Also applies to: 154-155
157-159
: Robust dictionary access with default values.Using
get()
with default values instead of direct dictionary access is safer and prevents KeyError exceptions if keys are missing.
170-243
: Well-implemented validation and correction retry mechanism.The code effectively implements a validation and correction retry loop that:
- Validates the chart schema
- Proceeds if valid
- Attempts correction and retries if invalid
- Fails after exhausting retry attempts
This approach ensures more reliable chart generation.
179-181
:✅ Verification successful
Direct nested dictionary access potential issue.
The code directly accesses
chart_result["chart_schema"]["data"]["values"]
without checking if these keys exist, which could raise KeyError if any key is missing.I note there was a previous review comment about this, and you indicated it's not an issue, so I'm respecting that assessment.
Also applies to: 216-217
🏁 Script executed:
#!/bin/bash # Check if the chart schema structure in the test data consistently has the expected nested keys echo "Searching for chart schema test fixtures with data.values structure..." rg -A 5 -B 5 '"chart_schema".*"data".*"values"' --jsonLength of output: 11205
🏁 Script executed:
#!/bin/bash echo "=== ChartService Method Context (lines 140–260) ===" sed -n '140,260p' wren-ai-service/src/web/v1/services/chart.py echo echo "=== ChartResult Model Definitions ===" rg -n "class ChartResult" -n wren-ai-service/src rg -n "class ChartAdjustmentResult" -n wren-ai-service/srcLength of output: 6036
🏁 Script executed:
#!/bin/bash echo "=== chart.py Imports and Type Aliases (lines 1–80) ===" sed -n '1,80p' wren-ai-service/src/web/v1/services/chart.py echo echo "=== ChartResult Model Definition (next 20 lines) ===" rg -n "class ChartResult" wren-ai-service/src/web/v1/services/chart.py -A 20 echo echo "=== ChartSchema References Across Repo ===" rg -n "ChartSchema" -n wren-ai-service/srcLength of output: 4133
🏁 Script executed:
#!/bin/bash echo "=== ChartSchemaPreprocessor implementation ===" sed -n '1,200p' wren-ai-service/src/pipelines/generation/utils/chart.pyLength of output: 4312
Nested keys are guaranteed by the pipeline – no guards needed
The
ChartGenerationPostProcessor
always injectschart_schema["data"] = {"values": sample_data}before the service logic runs, and the
ChartSchemaPreprocessor
only strips out the"config"
key. By the time you reach theif remove_data_from_chart_schema
branches, every non‐failed path will have achart_schema["data"]["values"]
list to clear.wren-ai-service/src/pipelines/generation/utils/chart.py (7)
13-15
: New theme loading function.Added a function to load a custom theme from a JSON file. The hardcoded path could potentially cause issues in different environments, but I note from previous reviews that you've decided to skip making this configurable.
47-47
: Enhanced raw data access.Adding the full raw data as part of the return object allows downstream components to access the complete dataset when needed.
53-61
: Useful chart schema preprocessing component.This component safely removes the "config" key from chart schemas using
pop()
with a default value to prevent KeyError. This allows for a clean application of custom themes later in the pipeline.
70-73
: Improved type hints and custom theme support.The type hint for
replies
has been correctly updated tolist[str]
to match its usage, and a newcustom_theme
parameter has been added to support theming.
88-90
: Simple and effective theme application.The code conditionally applies a custom theme when provided, which allows for consistent visual styling across chart visualizations.
128-132
: Streamlined chart generation results model.The
ChartGenerationResults
model has been simplified, noting thatchart_type
is deprecated while maintaining backward compatibility.
134-142
: Well-structured model kwargs for consistent API usage.The
CHART_GENERATION_MODEL_KWARGS
constant provides a standardized configuration for the JSON schema response format, ensuring consistency across different LLM calls.wren-ai-service/src/web/v1/services/chart_adjustment.py (7)
31-34
: Enhanced input flexibility for chart adjustment.The code now accepts:
- A required
adjustment_command
string (replacing the complex option object)- Optional data for direct input
- Optional image URL for visual context
These changes improve the API's flexibility and support new use cases.
37-39
: Proper field declarations with safe defaults.Good use of
Field(default_factory=Configuration)
to prevent mutable default issues, and clear annotation of the deprecatedadjustment_option
field.
84-84
: Consistent model field updates.Making
chart_type
optional with a default value aligns with the changes in the other models and improves handling of missing values.
101-112
: Parallel implementation of validation parameters.The service constructor mirrors the implementation in ChartService, maintaining consistency across related services.
152-161
: Flexible data source handling.The code now properly handles both provided data and SQL-executed data, setting the
data_provided
flag appropriately.
171-177
: Updated pipeline call with new parameters.The chart adjustment pipeline call has been updated to include the new
adjustment_command
,data_provided
flag, andimage_url
, supporting the enhanced functionality.
194-243
: Comprehensive validation and correction logic.Similar to the implementation in ChartService, this code effectively implements a validation and correction retry mechanism that improves chart generation reliability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (4)
wren-ai-service/src/pipelines/generation/utils/chart.py (3)
13-15
: Hardcoded file path could be problematicThe hardcoded file path might cause issues in different environments or if the file structure changes. Consider making this path configurable.
-def load_custom_theme() -> Dict[str, Any]: - with open("src/pipelines/generation/utils/theme_powerbi.json", "r") as f: +def load_custom_theme(theme_path: str = "src/pipelines/generation/utils/theme_powerbi.json") -> Dict[str, Any]: + with open(theme_path, "r", encoding="utf-8") as f: return orjson.loads(f.read())This would allow for different themes to be loaded and ensure proper encoding handling.
70-72
: Fixed type hint for replies parameterThe change from
replies: str
toreplies: list[str]
correctly aligns the type hint with the actual usage of the parameter in the code (accessingreplies[0]
).
88-90
: Added custom theme supportThe conditional block to apply a custom theme if provided enhances the flexibility of chart styling. However, there's no validation of the theme structure, which could lead to runtime errors if a malformed theme is provided.
Consider adding basic validation:
if custom_theme: + # Validate minimum theme structure + if not isinstance(custom_theme, dict): + logger.warning("Custom theme is not a dictionary. Using default configuration.") + else: chart_schema["config"] = custom_themewren-ai-service/src/web/v1/services/chart_adjustment.py (1)
201-202
: Potential KeyError when stripping data valuesThe line
chart_result["chart_schema"]["data"]["values"] = []
assumes that both"data"
and"values"
exist in the chart schema. If either key is missing, this will raise a KeyError.Consider using a safer approach:
-chart_result["chart_schema"]["data"]["values"] = [] +chart_result.setdefault("chart_schema", {}).setdefault("data", {})["values"] = []Let's verify if this is actually a concern by checking how chart schemas are structured:
#!/bin/bash # Look for chart schema creation to understand the structure rg -A 5 "chart_schema\[.*\]" --glob "*.py"
🧹 Nitpick comments (1)
wren-ai-service/src/pipelines/indexing/sql_pairs.py (1)
196-197
: Type annotation refinement - removed unnecessary OptionalThe changes to both the
project_id
andexternal_pairs
parameters remove unnecessaryOptional
type hints while maintaining the default values. This is a good practice for type consistency.However, I should note that using an empty dictionary
{}
as a default argument creates a mutable default, which can lead to unexpected behavior if modified. Consider usingField(default_factory=dict)
instead:- async def run( - self, - mdl_str: str, - project_id: str = "", - external_pairs: Dict[str, Any] = {}, - ) -> Dict[str, Any]: + async def run( + self, + mdl_str: str, + project_id: str = "", + external_pairs: Dict[str, Any] = Field(default_factory=dict), + ) -> Dict[str, Any]:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting
📒 Files selected for processing (21)
wren-ai-service/src/pipelines/generation/intent_classification.py
(1 hunks)wren-ai-service/src/pipelines/generation/relationship_recommendation.py
(2 hunks)wren-ai-service/src/pipelines/generation/sql_correction.py
(2 hunks)wren-ai-service/src/pipelines/generation/sql_generation.py
(2 hunks)wren-ai-service/src/pipelines/generation/sql_regeneration.py
(2 hunks)wren-ai-service/src/pipelines/generation/utils/chart.py
(4 hunks)wren-ai-service/src/pipelines/generation/utils/sql.py
(1 hunks)wren-ai-service/src/pipelines/indexing/db_schema.py
(1 hunks)wren-ai-service/src/pipelines/indexing/instructions.py
(4 hunks)wren-ai-service/src/pipelines/indexing/sql_pairs.py
(5 hunks)wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py
(1 hunks)wren-ai-service/src/pipelines/retrieval/historical_question_retrieval.py
(1 hunks)wren-ai-service/src/pipelines/retrieval/instructions.py
(1 hunks)wren-ai-service/src/pipelines/retrieval/sql_executor.py
(1 hunks)wren-ai-service/src/pipelines/retrieval/sql_functions.py
(1 hunks)wren-ai-service/src/pipelines/retrieval/sql_pairs_retrieval.py
(1 hunks)wren-ai-service/src/providers/embedder/litellm.py
(1 hunks)wren-ai-service/src/web/v1/services/__init__.py
(1 hunks)wren-ai-service/src/web/v1/services/chart.py
(8 hunks)wren-ai-service/src/web/v1/services/chart_adjustment.py
(6 hunks)wren-ai-service/src/web/v1/services/question_recommendation.py
(4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- wren-ai-service/src/web/v1/services/question_recommendation.py
- wren-ai-service/src/web/v1/services/chart.py
🧰 Additional context used
🧬 Code Graph Analysis (2)
wren-ai-service/src/pipelines/indexing/instructions.py (5)
wren-ai-service/src/pipelines/generation/intent_classification.py (1)
run
(364-386)wren-ai-service/src/pipelines/indexing/db_schema.py (2)
run
(27-59)run
(365-379)wren-ai-service/src/pipelines/indexing/sql_pairs.py (3)
run
(31-49)run
(58-73)run
(193-213)wren-ai-service/src/pipelines/retrieval/instructions.py (2)
run
(23-34)run
(177-187)wren-ai-service/src/web/v1/services/instructions.py (1)
Instruction
(17-22)
wren-ai-service/src/pipelines/indexing/sql_pairs.py (2)
wren-ai-service/src/pipelines/indexing/instructions.py (3)
run
(31-52)run
(61-76)run
(149-164)wren-ai-service/src/pipelines/indexing/__init__.py (3)
run
(25-47)run
(57-72)run
(78-87)
🔇 Additional comments (36)
wren-ai-service/src/pipelines/retrieval/historical_question_retrieval.py (1)
125-125
: Type annotation improvementChanged the type annotation of
historical_question_retrieval_similarity_threshold
fromOptional[float]
tofloat
. This improves type safety by ensuring the parameter must be a float and cannot be None, which aligns with the provided default value.wren-ai-service/src/pipelines/retrieval/sql_executor.py (1)
69-69
: Type annotation improvementChanged the type annotation of
engine_timeout
fromOptional[float]
tofloat
. This matches the default value and ensures type consistency, preventing the parameter from being None.wren-ai-service/src/pipelines/generation/relationship_recommendation.py (2)
4-4
: Import changeReplaced imports to remove unnecessary Optional type import.
188-188
: Type annotation improvementChanged the type annotation of
engine_timeout
fromOptional[float]
tofloat
. This ensures type safety by enforcing that the parameter must be a float, consistent with the default value.wren-ai-service/src/pipelines/generation/intent_classification.py (1)
332-333
: Type annotation improvementChanged the type annotations of
table_retrieval_size
andtable_column_retrieval_size
fromOptional[int]
toint
. This improves type clarity by indicating these parameters must be integers and cannot be None, matching their default values.wren-ai-service/src/pipelines/generation/sql_correction.py (1)
104-104
: LGTM: Type hint refinement improves clarityThe change from
Optional[float]
tofloat
for theengine_timeout
parameter improves type safety while maintaining the same default value. This better reflects how the parameter is actually used in the code (no null checks are performed when using it).wren-ai-service/src/web/v1/services/__init__.py (1)
44-45
: LGTM: Type safety improvement prevents potential bugsChanging
language
andtimezone
from optional to non-optional with defaults improves type safety and prevents potentialNone
dereference errors in methods likeshow_current_time()
which accesstimezone.name
without null checks.wren-ai-service/src/pipelines/retrieval/instructions.py (1)
153-154
: LGTM: Type hint improvements match parameter usageChanging
similarity_threshold
andtop_k
fromOptional[float/int]
to non-optional types better reflects their actual usage in thefiltered_documents
function (lines 92-94) where they're used directly without null checks.wren-ai-service/src/pipelines/indexing/db_schema.py (1)
339-339
: LGTM: Consistent type hinting improvementChanging
column_batch_size
fromOptional[int]
toint
is consistent with other changes in this PR and better reflects the parameter usage in thechunk
function (line 306) where it's passed directly without null checks.wren-ai-service/src/pipelines/generation/sql_generation.py (2)
3-3
: Simplified imports by removing unused OptionalThe Optional import is no longer needed after making the engine_timeout parameter non-optional with a default value.
126-126
: Improved type annotation for engine_timeout parameterChanging from
Optional[float]
to justfloat
with a default value of 30.0 is a good practice. Since there's a default value, this parameter is effectively never optional, making the type annotation more accurate.wren-ai-service/src/pipelines/generation/sql_regeneration.py (2)
3-3
: Simplified imports by removing unused OptionalThe Optional import is no longer needed after updating the engine_timeout parameter type.
138-138
: Improved type annotation for engine_timeout parameterChanging from
Optional[float]
to justfloat
with a default value of 30.0 makes the type annotation more accurate since parameters with default values are never None at runtime.wren-ai-service/src/pipelines/generation/utils/sql.py (1)
34-34
: Enhanced type annotation for timeout parameterSimplified the type annotation from
Optional[float]
tofloat
while keeping the default value of 30.0. This more accurately represents that the parameter will always have a value at runtime.wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py (1)
436-437
: Improved type annotations for retrieval size parametersRemoved
Optional
from the type annotations while keeping the default values, making the annotations more accurate. This is part of a consistent pattern across the codebase to improve type safety.wren-ai-service/src/providers/embedder/litellm.py (1)
165-165
: Appropriate type annotation changeChanging from
Optional[float]
tofloat
with a default value is a good practice since the parameter will never beNone
when it has a default value. This improves type consistency and static analysis.wren-ai-service/src/pipelines/retrieval/sql_functions.py (1)
104-105
: Type annotation improvementGood change removing
Optional
from parameters that have default values. This makes the type system more accurate since these parameters can never beNone
when defaults are provided.wren-ai-service/src/pipelines/retrieval/sql_pairs_retrieval.py (1)
121-122
: Type consistency improvementRemoving
Optional
from parameters with default values strengthens the type system. Since thefiltered_documents
function expects non-optional parameters, this change ensures type consistency throughout the pipeline.wren-ai-service/src/pipelines/indexing/instructions.py (4)
31-31
: Type annotation updated correctlyChanging from
Optional[str]
tostr
with a default empty string maintains the existing behavior while improving type clarity. The conditional at line 34 will still work as intended since an empty string is falsy in Python.
86-86
: Type annotation improvedGood change from
Optional[str]
tostr
with a default empty string, consistent with the pattern in other functions.
104-104
: Type annotation consistencyThis change maintains the function's behavior while improving type annotations. The function passes
project_id
tocleaner.run()
which still accepts an optional parameter, so this change is safe.
152-152
: Type annotation improvementChanging from
Optional[str]
tostr
with a default empty string ensures consistency with other parameter type annotations in the codebase while preserving the behavior.wren-ai-service/src/pipelines/indexing/sql_pairs.py (3)
31-31
: Type annotation refinement - removed unnecessary OptionalThe change from
Optional[str] = ""
to juststr = ""
is appropriate here. Since the parameter has a default value, it doesn't need to be marked as Optional.
107-111
: Type annotation refinement - removed unnecessary OptionalThe change from
Optional[str] = ""
to juststr = ""
forproject_id
parameter is appropriate. This improves type consistency by removing unnecessaryOptional
markers on parameters that have default values.
125-129
: Type annotation refinement - removed unnecessary OptionalThe change from
Optional[str] = ""
to juststr = ""
forproject_id
parameter is appropriate. This follows the pattern of removing unnecessary optionality from parameters with default values.wren-ai-service/src/pipelines/generation/utils/chart.py (4)
46-50
: Enhanced data output includes raw dataThe addition of
"raw_data"
to the return dictionary is a good enhancement, providing access to the full dataset along with the sample data. This is useful for downstream components that might need more than just the sample.
53-61
: Well-implemented chart schema preprocessorThe new
ChartSchemaPreprocessor
component is well implemented. It properly:
- Uses the
@component
decorator and defines output types- Safely removes the "config" key using
pop
with a default value ofNone
to avoid KeyError- Returns the processed schema wrapped in a dictionary with the output name as key
This pattern follows Haystack's component contract correctly.
130-131
: Deprecated chart_type field and simplified typeThe change from
Dict
todict
makes the type hint more consistent with Python's built-in types. The deprecation comment forchart_type
is helpful to signal that this field should not be relied upon in new code.
134-142
: Added model kwargs for standardized response formatThe new
CHART_GENERATION_MODEL_KWARGS
constant provides a standardized JSON schema for the response format, which is very useful for ensuring consistent API responses. Using the model's JSON schema fromChartGenerationResults
directly ensures the schema stays in sync with the model.wren-ai-service/src/web/v1/services/chart_adjustment.py (7)
31-34
: Enhanced request model with more flexible control optionsThe changes to the request model represent a significant improvement:
- Replacing
adjustment_option
with a simpleradjustment_command
string allows for more flexible instructions- Adding
data
parameter enables direct data input without requiring SQL execution- Adding
image_url
parameter enables visual input for chart adjustmentThese additions will make the API more versatile and user-friendly.
37-39
: Improved field definitions to prevent mutable default issuesThe changes here represent good practices:
- Using
Field(default_factory=Configuration)
prevents the mutable default issue where all instances would share the same Configuration object- Removing
Optional
fromremove_data_from_chart_schema
since it has a default value- Adding a deprecation comment for
adjustment_option
is helpful for API transitionThese changes will prevent potential bugs and improve code clarity.
85-85
: Deprecated strict chart type literalsChanging from
Literal[...]
tostr = ""
with a deprecation comment makes the API more flexible while clearly indicating the transition away from fixed chart types.
89-89
: Removed "understanding" statusThe removal of the "understanding" status from the status literals simplifies the state machine for chart adjustment requests. The remaining states provide a clearer progression through the pipeline.
99-100
: Added chart validation and correction configurationThe new parameters provide valuable configuration options:
allow_chart_validation
enables/disables the validation featuremax_chart_correction_retries
controls how many correction attempts are madeThis gives operators control over the validation process and its performance impact.
150-159
: Added flexible data input optionsThe new conditional logic for data handling is well-implemented:
- If data is not provided, execute SQL to fetch it
- If data is provided directly, use it and set a flag
- Track the source of data with
data_provided
for later useThis enhances the API's flexibility while maintaining backward compatibility.
180-243
: Well-structured chart validation and correction retry mechanismThe new validation and correction mechanism is well-implemented:
- Only activates when
allow_chart_validation
is true- Uses a bounded retry loop with clear exit conditions
- Properly handles failure cases after exhausting retries
- Maintains the original behavior (cleaning data) upon success
- Uses safe dictionary access via
.get()
with defaultsThis implementation enhances chart quality while preventing infinite loops and handling errors gracefully.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
wren-ai-service/src/providers/llm/__init__.py (1)
80-88
:from_function
passesmeta=None
– can violate type contractThe
meta
parameter is expected to be aDict[str, Any]
, yetfrom_function
passesNone
, which could lead to anAttributeError
later when callers try to mutatemeta
.-return cls(content, ChatRole.FUNCTION, name=name, image_url=None, meta=None) +return cls(content, ChatRole.FUNCTION, name=name, image_url=None, meta={})
🧹 Nitpick comments (3)
wren-ai-service/src/providers/llm/__init__.py (3)
224-252
: Simplify content type handling logic inconvert_message_to_openai_format
The function has some duplication in the content handling logic between mixed content, text-only, and image-only cases. Consider refactoring for better maintainability.
def convert_message_to_openai_format(message: ChatMessage) -> Dict[str, str]: """ Convert a message to the format expected by OpenAI's Chat API. See the [API reference](https://platform.openai.com/docs/api-reference/chat/create) for details. :returns: A dictionary with the following key: - `role` - `content` - `name` (optional) """ openai_msg = {"role": message.role.value} - if message.content and hasattr(message, "image_url") and message.image_url: - openai_msg["content"] = [ - {"type": "text", "text": message.content}, - {"type": "image_url", "image_url": {"url": message.image_url}}, - ] - elif message.content: - openai_msg["content"] = message.content - elif hasattr(message, "image_url") and message.image_url: - openai_msg["content"] = [ - {"type": "image_url", "image_url": {"url": message.image_url}} - ] + # Build content based on available data + content_parts = [] + + if message.content: + content_parts.append({"type": "text", "text": message.content}) + + if message.image_url: + content_parts.append({"type": "image_url", "image_url": {"url": message.image_url}}) + + if len(content_parts) > 1: + openai_msg["content"] = content_parts + elif len(content_parts) == 1 and content_parts[0]["type"] == "text": + openai_msg["content"] = message.content + elif len(content_parts) == 1: + openai_msg["content"] = [content_parts[0]] if hasattr(message, "name") and message.name: openai_msg["name"] = message.name
237-247
: Consider removing redundanthasattr
checksSince
image_url
is a defined attribute of theChatMessage
class, usinghasattr(message, "image_url")
is unnecessary and can be replaced with just checking ifmessage.image_url
is truthy.- if message.content and hasattr(message, "image_url") and message.image_url: + if message.content and message.image_url: openai_msg["content"] = [ {"type": "text", "text": message.content}, {"type": "image_url", "image_url": {"url": message.image_url}}, ] elif message.content: openai_msg["content"] = message.content -elif hasattr(message, "image_url") and message.image_url: +elif message.image_url: openai_msg["content"] = [ {"type": "image_url", "image_url": {"url": message.image_url}} ]
249-250
: Remove redundanthasattr
check for nameSimilar to the
image_url
attribute,name
is already defined in theChatMessage
class, making thehasattr
check unnecessary.- if hasattr(message, "name") and message.name: + if message.name: openai_msg["name"] = message.name
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting
📒 Files selected for processing (2)
docker/config.example.yaml
(1 hunks)wren-ai-service/src/providers/llm/__init__.py
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- docker/config.example.yaml
🔇 Additional comments (3)
wren-ai-service/src/providers/llm/__init__.py (3)
18-34
: Great job with the ChatMessage dataclass structureYou've added proper defaults for optional fields
name
andimage_url
, addressing the previous issue flagged in earlier reviews. Usingfield(default_factory=dict)
for themeta
attribute is the correct approach to avoid the mutable default issue.
9-16
: Good implementation of ChatRole enumUsing
str, Enum
as base classes is a good choice for string-based enums. The included documentation is helpful and the defined roles cover the standard OpenAI chat roles.
117-130
: Well-structured StreamingChunk dataclassThe
StreamingChunk
dataclass is concise and well-documented, with proper default handling for themeta
dictionary usingfield(default_factory=dict)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
wren-ai-service/src/pipelines/generation/chart_adjustment.py (1)
155-156
: Theme loading implementation may cause issues in package contextsThe theme loading uses a hardcoded relative path, which could cause issues when the code is packaged or run from different directories.
wren-ai-service/src/pipelines/generation/utils/chart.py (1)
13-15
: 🛠️ Refactor suggestionConsider more robust theme file loading
The theme loading function uses a hardcoded relative path which can cause issues in different environments.
Consider using the importlib.resources approach for more robust path resolution:
-def load_custom_theme() -> Dict[str, Any]: - with open("src/pipelines/generation/utils/theme_powerbi.json", "r") as f: - return orjson.loads(f.read()) +def load_custom_theme() -> Dict[str, Any]: + from importlib import resources + from src.pipelines.generation import utils + + with resources.open_text(utils, "theme_powerbi.json") as f: + return orjson.loads(f.read())
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
wren-ai-service/src/pipelines/generation/chart_adjustment.py
(8 hunks)wren-ai-service/src/pipelines/generation/chart_correction.py
(1 hunks)wren-ai-service/src/pipelines/generation/chart_generation.py
(7 hunks)wren-ai-service/src/pipelines/generation/utils/chart.py
(4 hunks)wren-ai-service/src/web/v1/services/chart.py
(8 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- wren-ai-service/src/web/v1/services/chart.py
- wren-ai-service/src/pipelines/generation/chart_correction.py
🧰 Additional context used
🧬 Code Graph Analysis (2)
wren-ai-service/src/pipelines/generation/chart_generation.py (1)
wren-ai-service/src/pipelines/generation/utils/chart.py (6)
ChartDataPreprocessor
(19-50)ChartGenerationPostProcessor
(64-125)load_custom_theme
(13-15)run
(24-50)run
(58-60)run
(68-125)
wren-ai-service/src/pipelines/generation/chart_adjustment.py (1)
wren-ai-service/src/pipelines/generation/utils/chart.py (7)
ChartDataPreprocessor
(19-50)ChartGenerationPostProcessor
(64-125)ChartSchemaPreprocessor
(54-60)load_custom_theme
(13-15)run
(24-50)run
(58-60)run
(68-125)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: pytest
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (go)
🔇 Additional comments (15)
wren-ai-service/src/pipelines/generation/chart_generation.py (4)
3-3
: Good refactoring of imports to support new functionalityThe imports have been correctly updated to include the necessary components for flexible chart generation, such as general instructions, model parameters, and the theme loading function.
Also applies to: 13-17
26-32
: Improved system prompt with clearer instructionsThe system prompt has been enhanced with better instructions and now includes the general chart generation instructions as a reusable component. Adding the requirement that reasoning language should match the user's language improves the user experience.
93-107
: Enhanced post-processing with conditional data handling and themingThe post-processing step has been improved to:
- Conditionally use either raw data or sample data based on the data_provided flag
- Apply a custom theme to the chart schema
This makes the chart generation more flexible and visually consistent.
146-157
: Improved pipeline run method with data_provided flagThe run method has been updated to use the data_provided flag instead of remove_data_from_chart_schema, which better reflects its purpose of determining whether to use the full dataset or a sample.
wren-ai-service/src/pipelines/generation/chart_adjustment.py (6)
13-19
: Good refactoring of imports to support new functionalityThe imports have been correctly updated to include necessary components for flexible chart adjustment, including the chart schema preprocessor and theme loader.
27-37
: Enhanced system prompt with image support and clear instructionsThe system prompt has been significantly improved to:
- Include general chart generation instructions
- Support image-based chart generation references
- Ensure reasoning language matches user language
These enhancements make the chart adjustment more versatile and user-friendly.
74-80
: Good addition of chart schema preprocessingAdding a dedicated preprocessing step for chart schemas improves modularity and ensures consistent handling of chart configurations before adjustment. This component properly implements the Haystack component pattern.
107-111
: Added support for image references in chart adjustmentThe generate_chart_adjustment function now accepts an image_url parameter, enabling image-based chart adjustments. This significantly enhances the capability of the chart adjustment pipeline.
147-151
: Improved pipeline component configurationThe pipeline now:
- Uses the shared CHART_GENERATION_MODEL_KWARGS for consistent model configuration
- Includes the ChartSchemaPreprocessor component for proper schema handling
This standardization helps maintain consistency across different chart-related pipelines.
167-173
: Enhanced pipeline run method with flexible parametersThe run method has been enhanced with:
- A simpler adjustment_command parameter instead of a structured object
- Support for data_provided flag for conditional data handling
- Support for image_url for image-based chart adjustments
These changes make the pipeline more versatile and easier to use.
Also applies to: 185-187
wren-ai-service/src/pipelines/generation/utils/chart.py (5)
53-61
: Well implemented ChartSchemaPreprocessorThe ChartSchemaPreprocessor correctly implements the Haystack component pattern and safely removes the config key using pop() with a default value to avoid KeyError exceptions.
70-73
: Improved ChartGenerationPostProcessor with theme supportThe post-processor has been enhanced to:
- Correctly specify replies as list[str] for proper type hinting
- Accept an optional custom_theme parameter
- Apply the theme to the chart schema when available
These improvements make the post-processor more flexible and maintain visual consistency.
Also applies to: 88-90
128-136
: Good extraction of general instructions as a constantExtracting the general chart generation instructions into a reusable constant improves code organization and ensures consistency across different pipelines.
139-143
: Simplified ChartGenerationResults modelThe ChartGenerationResults model has been simplified to use a generic dict for chart_schema instead of a strict schema, which increases flexibility. The chart_type field is appropriately marked as deprecated.
145-153
: Well-structured model configurationThe CHART_GENERATION_MODEL_KWARGS constant provides a standardized configuration for the model response format, ensuring consistent JSON schema validation across pipelines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🔭 Outside diff range comments (2)
wren-ai-service/src/pipelines/generation/utils/chart.py (2)
74-89
: 🛠️ Refactor suggestionHandle potential IndexError with empty replies
The code assumes
replies
list is non-empty but doesn't verify this before accessingreplies[0]
.Add a check to prevent IndexError:
def run( self, replies: list[str], sample_data: list[dict], custom_theme: Optional[dict[str, Any]] = None, ): try: + if not replies: + logger.warning("Received empty replies list") + return { + "results": { + "chart_schema": {}, + "reasoning": "", + "chart_type": "", + } + } generation_result = orjson.loads(replies[0])
106-125
: 🛠️ Refactor suggestionRemove or update ValidationError handling
The exception handler for
ValidationError
is present, but the code no longer performs schema validation withjsonschema.validate
. This inconsistency should be addressed.Either remove the unused ValidationError exception handling or update it to reflect the current validation approach:
- except ValidationError as e: - logger.exception(f"Vega-lite schema is not valid: {e}") - - return { - "results": { - "chart_schema": {}, - "reasoning": "", - "chart_type": "", - } - }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
wren-ai-service/src/pipelines/generation/utils/chart.py
(4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: pytest
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (go)
🔇 Additional comments (6)
wren-ai-service/src/pipelines/generation/utils/chart.py (6)
47-47
: Good addition of raw data to the return valueIncluding the complete dataset provides downstream components with access to the full data when needed, enhancing flexibility.
53-61
: Well-implemented preprocessorThe
ChartSchemaPreprocessor
is correctly implemented with proper output typing and safe dictionary key removal.
70-72
: Properly fixed type hints and added custom theme supportThe type hint for
replies
correctly reflects its list nature, and the optional custom theme parameter enables theme customization.
128-139
: Clear and concise chart generation instructionsThe updated instructions are well-structured and provide clear guidance on what should be included/omitted in the Vega-Lite schema.
142-145
: Good simplification of ChartGenerationResults modelUsing a generic dictionary for chart_schema provides more flexibility, and the deprecation notice for chart_type is helpful for future maintenance.
148-156
: Well-structured model configurationThe CHART_GENERATION_MODEL_KWARGS dictionary provides a clean way to specify the JSON schema response format using the Pydantic model.
Summary by CodeRabbit
New Features
Enhancements
Bug Fixes
Documentation
Chores