Omit state from the Assist LLM prompts #141034

Merged
merged 3 commits into home-assistant:dev on Mar 22, 2025

Conversation

allenporter
Contributor

@allenporter commented Mar 21, 2025

Proposed change

Omit state from Assist LLM prompts; instead, rely on the GetHomeState tool to fetch the current device state when it is needed. This removes the need for a separate stateless API in MCP.
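
A minimal sketch of the idea behind the new include_state flag; the function and field names below are illustrative only, not the actual _get_exposed_entities implementation:

```python
# Illustrative sketch only -- not the real Home Assistant helper.
# It assumes a simplified entity record with "names", "domain", and "state"
# fields, to show what include_state=False omits from the serialized prompt.
from typing import Any


def serialize_exposed_entities(
    entities: dict[str, dict[str, Any]], include_state: bool = False
) -> list[dict[str, Any]]:
    """Return a prompt-friendly entity listing, optionally with current state."""
    records: list[dict[str, Any]] = []
    for entity_id, info in entities.items():
        record: dict[str, Any] = {
            "entity_id": entity_id,
            "names": info["names"],
            "domain": info["domain"],
        }
        if include_state:
            # With this PR, the Assist prompt no longer asks for state;
            # the model calls the GetHomeState tool instead when it needs it.
            record["state"] = info["state"]
        records.append(record)
    return records
```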

Quality

All changes to tool calling are within the confidence interval, because none of the commands in our eval require the current state of the home in order to issue a command. In manual testing, models are able to use the get_home_state tool to fetch the current state when a question needs it (e.g. "what is the temperature in the living room?"). These types of queries are not currently part of the eval set.

On the assist-mini dataset, all models remain within the confidence interval (a quick check of the reported percentages is sketched after the list):

- model_id: claude-3-haiku
  good_percent: 93.8%
  good: 45
  total: 48
- model_id: gemini-1.5-flash
  good_percent: 95.9%
  good: 47
  total: 49
- model_id: gpt-4o-mini
  good_percent: 98.0%
  good: 48
  total: 49
- model_id: llama3.1
  good_percent: 79.6%
  good: 39
  total: 49
- model_id: qwen2.5
  good_percent: 93.9%
  good: 46
  total: 49
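
For reference, the good_percent values above are simply good / total rounded to one decimal place; a quick check:

```python
# Sanity check of the reported assist-mini percentages (good / total).
results = {
    "claude-3-haiku": (45, 48),
    "gemini-1.5-flash": (47, 49),
    "gpt-4o-mini": (48, 49),
    "llama3.1": (39, 49),
    "qwen2.5": (46, 49),
}
for model_id, (good, total) in results.items():
    print(f"{model_id}: {good / total:.1%}")
# claude-3-haiku: 93.8%, gemini-1.5-flash: 95.9%, gpt-4o-mini: 98.0%,
# llama3.1: 79.6%, qwen2.5: 93.9%
```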

Token Stats

Using token stats computed by #141118

assist

We see a 49% reduction in token count, with quality increased though still within the confidence interval. (The "before" run scored lower and the "after" run scored higher; the current leaderboard has accuracy at 91.2% with a CI of +/- 6.2, so the interval is fairly wide and both runs are essentially within range.)
Before:

- model_id: gemini-1.5-flash
  good_percent: 85.0%
  good: 68
  total: 80
  token_avg:
    input_tokens: 2689.07
    cached_input_tokens: 0.0
    output_tokens: 21.52
    n_count: 151
  token_sum:
    input_tokens: 215126
    cached_input_tokens: 0
    output_tokens: 1722
    n_count: 80
  token_input_cache_ratio: 0.0

After:

- model_id: gemini-1.5-flash
  good_percent: 91.2%
  good: 73
  total: 80
  token_avg:
    input_tokens: 1804.8
    cached_input_tokens: 0.0
    output_tokens: 21.95
    n_count: 154
  token_sum:
    input_tokens: 144384
    cached_input_tokens: 0
    output_tokens: 1756
    n_count: 80
  token_input_cache_ratio: 0.0

assist-mini

We see a 16% reduction in token count, with quality still within the confidence interval (a sketch of how such a reduction can be derived from the token_sum figures follows the stats below):

Before:

- model_id: gemini-1.5-flash
  good_percent: 98.0%
  good: 48
  total: 49
  token_avg:
    input_tokens: 879.1
    cached_input_tokens: 0.0
    output_tokens: 20.94
    n_count: 97
  token_sum:
    input_tokens: 43076
    cached_input_tokens: 0
    output_tokens: 1026
    n_count: 49
  token_input_cache_ratio: 0.0

After:

- model_id: gemini-1.5-flash
  good_percent: 95.9%
  good: 47
  total: 49
  token_avg:
    input_tokens: 753.86
    cached_input_tokens: 0.0
    output_tokens: 20.88
    n_count: 97
  token_sum:
    input_tokens: 36939
    cached_input_tokens: 0
    output_tokens: 1023
    n_count: 49
  token_input_cache_ratio: 0.0
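
A rough sketch of computing the relative input-token savings directly from the token_sum fields above; the headline percentages in this description may be based on a slightly different aggregation of the same stats:

```python
# Rough sketch: relative input-token reduction from the token_sum figures above
# (gemini-1.5-flash before/after runs).
runs = {
    "assist": (215126, 144384),
    "assist-mini": (43076, 36939),
}
for dataset, (before_tokens, after_tokens) in runs.items():
    reduction = (before_tokens - after_tokens) / before_tokens
    print(f"{dataset}: {reduction:.1%} fewer input tokens")
```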

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New integration (thank you!)
  • New feature (which adds functionality to an existing integration)
  • Deprecation (breaking change to happen in the future)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

  • This PR fixes or closes issue: fixes #
  • This PR is related to issue:
  • Link to documentation pull request:
  • Link to developer documentation pull request:
  • Link to frontend pull request:

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • I have followed the perfect PR recommendations
  • The code has been formatted using Ruff (ruff format homeassistant tests)
  • Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

If the code communicates with devices, web services, or third-party tools:

  • The manifest file has all fields filled out correctly.
    Updated and included derived files by running: python3 -m script.hassfest.
  • New or updated dependencies have been added to requirements_all.txt.
    Updated by running python3 -m script.gen_requirements_all.
  • For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description.

To help with the load of incoming pull requests:

@allenporter marked this pull request as ready for review March 22, 2025 16:26
@@ -316,7 +316,7 @@ async def async_get_api_instance(self, llm_context: LLMContext) -> APIInstance:
         """Return the instance of the API."""
         if llm_context.assistant:
             exposed_entities: dict | None = _get_exposed_entities(
-                self.hass, llm_context.assistant
+                self.hass, llm_context.assistant, include_state=False
Contributor


Am I correct in my understanding that the change affects only the MCP server right now, since it looks like it's the only one using this flag?

Contributor Author


No, all are changed:
(1) All prompts now omit state.
(2) MCP now uses the Assist API only.
(3) State is provided via the new get_home_state tool added in #140971.
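
A hypothetical illustration of the kind of tool the model now calls for live state; the signature and return shape below are assumptions for illustration, not the actual tool defined in #140971:

```python
# Hypothetical sketch -- the real get_home_state tool is added in #140971;
# the name is from this PR, but the return shape here is assumed.
async def get_home_state() -> dict:
    """Return the current state of exposed entities on demand.

    Instead of embedding every entity's state in the system prompt, the model
    calls this tool only when a question actually needs live state
    (e.g. "what is the temperature in the living room?").
    """
    return {
        "entities": [
            {"entity_id": "climate.living_room", "state": "heat", "temperature": 21.5},
        ]
    }
```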

@balloob merged commit 4e2dfba into home-assistant:dev on Mar 22, 2025
48 checks passed
@github-actions bot locked and limited conversation to collaborators Mar 23, 2025