Omit state from the Assist LLM prompts #141034

allenporter · 2025-03-21T02:44:08Z

Proposed change

Omit state from the Assist LLM prompts and instead rely on the GetHomeState tool to fetch the current device state when needed. This removes the need for a separate stateless API in MCP.

Quality

Tool calling results stay within the confidence interval, because none of the commands in our eval require the current state of the home in order to issue a command. In manual testing, models are able to use the get_home_state tool to get the current state when a question needs it (e.g. what is the temperature in the living room). These types of queries are not currently part of the eval set.

On the assist-mini dataset we see that all models are still within confidence interval.

- model_id: claude-3-haiku
  good_percent: 93.8%
  good: 45
  total: 48
- model_id: gemini-1.5-flash
  good_percent: 95.9%
  good: 47
  total: 49
- model_id: gpt-4o-mini
  good_percent: 98.0%
  good: 48
  total: 49
- model_id: llama3.1
  good_percent: 79.6%
  good: 39
  total: 49
- model_id: qwen2.5
  good_percent: 93.9%
  good: 46
  total: 49

Token Stats

Using token stats computed by #141118

assist

We see a 33% reduction in input token count (put the other way, the "before" prompts were 49% larger than the "after" prompts), with quality increased though still within the confidence interval. The "before" run scored lower and the "after" run scored higher; the current leaderboard has accuracy at 91.2% with a CI of +/-6.2, which is fairly wide, so both the before and after runs are basically within range.

Before:

- model_id: gemini-1.5-flash
  good_percent: 85.0%
  good: 68
  total: 80
  token_avg:
    input_tokens: 2689.07
    cached_input_tokens: 0.0
    output_tokens: 21.52
    n_count: 151
  token_sum:
    input_tokens: 215126
    cached_input_tokens: 0
    output_tokens: 1722
    n_count: 80
  token_input_cache_ratio: 0.0

After:

- model_id: gemini-1.5-flash
  good_percent: 91.2%
  good: 73
  total: 80
  token_avg:
    input_tokens: 1804.8
    cached_input_tokens: 0.0
    output_tokens: 21.95
    n_count: 154
  token_sum:
    input_tokens: 144384
    cached_input_tokens: 0
    output_tokens: 1756
    n_count: 80
  token_input_cache_ratio: 0.0
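
The PR does not say how the confidence interval was computed. As a rough check, a normal-approximation (Wald) interval at 95% on the good/total counts above gives a half-width close to the quoted +/-6.2 for the "after" run, and shows that the "before" and "after" intervals overlap. The helper name `wald_interval` and the choice of interval are assumptions made for illustration, not the eval's actual method.

```python
import math


def wald_interval(good: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Return (accuracy, half_width) in percent, using the normal approximation."""
    p = good / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return 100 * p, 100 * half_width


# Counts taken from the assist "Before" and "After" runs above.
before_acc, before_hw = wald_interval(68, 80)
after_acc, after_hw = wald_interval(73, 80)

# The runs are "within range" of each other if the intervals overlap.
intervals_overlap = (before_acc + before_hw) >= (after_acc - after_hw)

print(f"before: {before_acc:.2f}% +/- {before_hw:.2f}")
print(f"after:  {after_acc:.2f}% +/- {after_hw:.2f}")
print(f"intervals overlap: {intervals_overlap}")
```

With only 80 examples the half-width is several points wide, which is why a swing from 68 to 73 correct answers cannot be read as a real quality change.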

assist-mini

We see a 14% reduction in input token count (put the other way, the "before" prompts were about 17% larger than the "after" prompts), with quality still in the confidence interval:

Before:

- model_id: gemini-1.5-flash
  good_percent: 98.0%
  good: 48
  total: 49
  token_avg:
    input_tokens: 879.1
    cached_input_tokens: 0.0
    output_tokens: 20.94
    n_count: 97
  token_sum:
    input_tokens: 43076
    cached_input_tokens: 0
    output_tokens: 1026
    n_count: 49
  token_input_cache_ratio: 0.0

After:

- model_id: gemini-1.5-flash
  good_percent: 95.9%
  good: 47
  total: 49
  token_avg:
    input_tokens: 753.86
    cached_input_tokens: 0.0
    output_tokens: 20.88
    n_count: 97
  token_sum:
    input_tokens: 36939
    cached_input_tokens: 0
    output_tokens: 1023
    n_count: 49
  token_input_cache_ratio: 0.0
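
The change in input tokens can be expressed two ways: as the drop relative to the "before" run, or as how much larger the "before" run was relative to the "after" run. The sketch below computes both from the `token_sum.input_tokens` values reported above; the function names are illustrative only.

```python
def reduction_percent(before: float, after: float) -> float:
    """Percent drop, measured against the 'before' value."""
    return 100 * (1 - after / before)


def excess_percent(before: float, after: float) -> float:
    """Percent by which 'before' exceeds 'after', measured against 'after'."""
    return 100 * (before / after - 1)


# token_sum.input_tokens from the runs above: (before, after)
runs = {
    "assist": (215126, 144384),
    "assist-mini": (43076, 36939),
}

for name, (before, after) in runs.items():
    print(
        f"{name}: reduction {reduction_percent(before, after):.1f}%, "
        f"before was {excess_percent(before, after):.1f}% larger"
    )
```

For assist this gives a reduction of roughly 33% (with the "before" run about 49% larger), and for assist-mini roughly 14% (with the "before" run about 17% larger).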



IvanLH · 2025-03-22T18:44:18Z

homeassistant/helpers/llm.py

@@ -316,7 +316,7 @@ async def async_get_api_instance(self, llm_context: LLMContext) -> APIInstance:
        """Return the instance of the API."""
        if llm_context.assistant:
            exposed_entities: dict | None = _get_exposed_entities(
-                self.hass, llm_context.assistant
+                self.hass, llm_context.assistant, include_state=False


Am I correct in my understanding that the change affects only the MCP server right now, since it looks like it's the only one using this flag?

allenporter · Mar 22, 2025

No, all are changed:
(1) All prompts now have no state
(2) MCP now uses the Assist API only
(3) State is provided by the new get_home_state tool (#140971)
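
To illustrate the idea behind the `include_state=False` argument in the hunk above, here is a minimal, self-contained sketch. It is not Home Assistant's `_get_exposed_entities`; the function `describe_entities`, the stub `get_home_state_stub`, and the dictionary layout are all hypothetical and only show how dropping the state field shrinks what goes into the prompt while a tool call can still return it on demand.

```python
from typing import Any

# Hypothetical snapshot of exposed entities (not a real Home Assistant structure).
HOME: dict[str, dict[str, Any]] = {
    "light.kitchen": {"name": "Kitchen Light", "state": "on"},
    "sensor.living_room_temperature": {"name": "Living Room Temperature", "state": "21.5"},
}


def describe_entities(
    entities: dict[str, dict[str, Any]], include_state: bool = True
) -> dict[str, dict[str, Any]]:
    """Build the per-entity info that would be serialized into the prompt."""
    described: dict[str, dict[str, Any]] = {}
    for entity_id, info in entities.items():
        entry: dict[str, Any] = {
            "name": info["name"],
            "domain": entity_id.split(".")[0],
        }
        if include_state:
            # State changes often, so including it makes every prompt larger.
            entry["state"] = info["state"]
        described[entity_id] = entry
    return described


def get_home_state_stub() -> dict[str, dict[str, Any]]:
    """Hypothetical stand-in for a tool the model calls only when it needs state."""
    return describe_entities(HOME, include_state=True)


prompt_entities = describe_entities(HOME, include_state=False)
tool_result = get_home_state_stub()
print(prompt_entities)
print(tool_result["sensor.living_room_temperature"]["state"])
```

In this sketch a command such as turning on a light needs only the stateless entity list, while a question about the living room temperature would trigger the tool call.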

allenporter added 2 commits March 21, 2025 02:38

Omit state from the Assist LLM prompts

8a545dc

Add back the stateful prompt

1f806db

allenporter requested a review from a team as a code owner March 21, 2025 02:44

home-assistant bot added bugfix cla-signed core integration: mcp_server labels Mar 21, 2025

allenporter marked this pull request as draft March 21, 2025 02:44

home-assistant bot added by-code-owner Quality Scale: silver labels Mar 21, 2025

Merge branch 'dev' into assist-stateless

d885169

allenporter marked this pull request as ready for review March 22, 2025 16:26

allenporter mentioned this pull request Mar 22, 2025

Add Gemini/OpenAI token stats to the conversation trace #141118

Merged

IvanLH reviewed Mar 22, 2025

balloob approved these changes Mar 22, 2025

balloob merged commit 4e2dfba into home-assistant:dev Mar 22, 2025
48 checks passed

github-actions bot locked and limited conversation to collaborators Mar 23, 2025
