Add more prometheus labels #128607

Draft pull request: 55 commits to merge into base: dev

@jzucker2 jzucker2 commented Oct 17, 2024

Breaking change

Proposed change

This is a first pass at making the Prometheus labels more dynamic (following up on #113849).

This is an example from some updated unit tests:

# HELP state_change_total The number of state changes
# TYPE state_change_total counter
state_change_total{area="Test Area",device="Test Device",domain="sensor",entity="sensor.outside_temperature_device",friendly_name="Outside Temperature Device",object_id="outside_temperature_device",platform="test"} 1.0
# HELP state_change_created The number of state changes
# TYPE state_change_created gauge
state_change_created{area="Test Area",device="Test Device",domain="sensor",entity="sensor.outside_humidity_device",friendly_name="Outside Humidity Device",object_id="outside_humidity_device",platform="test"} 1.7325890955666492e+09
# HELP entity_available Entity is available (not in the unavailable or unknown state)
# TYPE entity_available gauge
entity_available{area="Test Area",device="Test Climate Device",domain="climate",entity="climate.ecobee_device",friendly_name="Ecobee Device",object_id="ecobee_device",platform="test"} 1.0
# HELP last_updated_time_seconds The last_updated timestamp
# TYPE last_updated_time_seconds gauge
last_updated_time_seconds{area="Test Area",device="Test Climate Device",domain="climate",entity="climate.ecobee_device",friendly_name="Ecobee Device",object_id="ecobee_device",platform="test"} 1.732589095580595e+09
# HELP sensor_temperature_celsius Sensor data measured in celsius
# TYPE sensor_temperature_celsius gauge
sensor_temperature_celsius{area="",device="",domain="sensor",entity="sensor.outside_temperature",friendly_name="Outside Temperature",object_id="outside_temperature",platform="test"} 15.6
sensor_temperature_celsius{area="Other Test Area",device="Other Test Device",domain="sensor",entity="sensor.outside_temperature_other_device",friendly_name="Outside Temperature Other Device",object_id="outside_temperature_other_device",platform="test"} 33.0
# HELP battery_level_percent Battery level as a percentage of its capacity
# TYPE battery_level_percent gauge
battery_level_percent{area="Other Test Area",device="Other Test Device",domain="sensor",entity="sensor.outside_temperature_other_device",friendly_name="Outside Temperature Other Device",object_id="outside_temperature_other_device",platform="test"} 21.0
# HELP sensor_humidity_percent Sensor data measured in percent
# TYPE sensor_humidity_percent gauge
sensor_humidity_percent{area="Test Area",device="Test Device",domain="sensor",entity="sensor.outside_humidity_device",friendly_name="Outside Humidity Device",object_id="outside_humidity_device",platform="test"} 56.0
# HELP sensor_state State of the sensor
# TYPE sensor_state gauge
sensor_state{area="Test Area",device="Test Device",domain="sensor",entity="sensor.trend_gradient_device",friendly_name="Trend Gradient Device",object_id="trend_gradient_device",platform="test"} 0.903
# HELP climate_target_temperature_celsius Target temperature in degrees Celsius
# TYPE climate_target_temperature_celsius gauge
climate_target_temperature_celsius{area="Test Area",device="Test Climate Device",domain="climate",entity="climate.ecobee_device",friendly_name="Ecobee Device",object_id="ecobee_device",platform="test"} 17.0
# HELP climate_current_temperature_celsius Current temperature in degrees Celsius
# TYPE climate_current_temperature_celsius gauge
climate_current_temperature_celsius{area="Test Area",device="Test Climate Device",domain="climate",entity="climate.ecobee_device",friendly_name="Ecobee Device",object_id="ecobee_device",platform="test"} 24.0
# HELP climate_action HVAC action
# TYPE climate_action gauge
climate_action{action="cooling",area="Test Area",device="Test Climate Device",domain="climate",entity="climate.ecobee_device",friendly_name="Ecobee Device",object_id="ecobee_device",platform="test"} 1.0
# HELP climate_fan_mode Fan mode enum
# TYPE climate_fan_mode gauge
climate_fan_mode{area="Test Area",device="Test Climate Device",domain="climate",entity="climate.ecobee_device",friendly_name="Ecobee Device",mode="auto",object_id="ecobee_device",platform="test"} 1.0
climate_fan_mode{area="Test Area",device="Test Climate Device",domain="climate",entity="climate.ecobee_device",friendly_name="Ecobee Device",mode="on",object_id="ecobee_device",platform="test"} 0.0
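
A minimal sketch of how a label set like the one in the sample output could be assembled. This is not the code from this PR: `EntityInfo`, `build_labels`, and `format_sample` are hypothetical names, and the empty-string fallback for a missing device or area is an assumption taken from the `area="",device=""` line in the output above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EntityInfo:
    """Hypothetical stand-in for what the integration knows about an entity."""

    entity_id: str
    friendly_name: str
    platform: str
    device_name: str | None = None
    area_name: str | None = None


def build_labels(info: EntityInfo) -> dict[str, str]:
    """Return the label set for one entity, using "" for a missing device or area."""
    domain, object_id = info.entity_id.split(".", 1)
    return {
        "area": info.area_name or "",
        "device": info.device_name or "",
        "domain": domain,
        "entity": info.entity_id,
        "friendly_name": info.friendly_name,
        "object_id": object_id,
        "platform": info.platform,
    }


def format_sample(metric: str, labels: dict[str, str], value: float) -> str:
    """Render one exposition-style line with labels sorted by name.

    Label values are not escaped here, so this only suits simple values.
    """
    body = ",".join(f'{name}="{val}"' for name, val in sorted(labels.items()))
    return f"{metric}{{{body}}} {value}"


# An entity with no device or area, like sensor.outside_temperature above.
info = EntityInfo("sensor.outside_temperature", "Outside Temperature", "test")
print(format_sample("sensor_temperature_celsius", build_labels(info), 15.6))
```

Keeping `area` and `device` present but empty, rather than omitting them, means every series of a metric carries the same label names, which is what the sample output shows.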

I also have my old Frankenstein's-monster branch from my original attempt here: jzucker2#1

Useful recent change: #133219

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New integration (thank you!)
  • New feature (which adds functionality to an existing integration)
  • Deprecation (breaking change to happen in the future)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

  • This PR fixes or closes issue: fixes #
  • This PR is related to issue:
  • Link to documentation pull request:

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • I have followed the perfect PR recommendations
  • The code has been formatted using Ruff (ruff format homeassistant tests)
  • Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

If the code communicates with devices, web services, or third-party tools:

  • The manifest file has all fields filled out correctly.
    Updated and included derived files by running: python3 -m script.hassfest.
  • New or updated dependencies have been added to requirements_all.txt.
    Updated by running python3 -m script.gen_requirements_all.
  • For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description.

To help with the load of incoming pull requests:

@home-assistant

Hey there @knyar, mind taking a look at this pull request as it has been labeled with an integration (prometheus) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of prometheus can trigger bot actions by commenting:

  • @home-assistant close Closes the pull request.
  • @home-assistant rename Awesome new title Renames the pull request.
  • @home-assistant reopen Reopen the pull request.
  • @home-assistant unassign prometheus Removes the current integration label and assignees on the pull request, add the integration domain after the command.
  • @home-assistant add-label needs-more-information Add a label (needs-more-information, problem in dependency, problem in custom component) to the pull request.
  • @home-assistant remove-label needs-more-information Remove a label (needs-more-information, problem in dependency, problem in custom component) on the pull request.

@jzucker2 jzucker2 marked this pull request as draft October 17, 2024 21:52
@@ -90,6 +92,7 @@
CONF_FILTER = "filter"
CONF_REQUIRES_AUTH = "requires_auth"
CONF_PROM_NAMESPACE = "namespace"
CONF_INCLUDE_EXTRA_LABELS = "include_extra_labels"
Contributor

Adding labels is backwards compatible, so I don't think we need a new configuration parameter for this.
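
The reasoning can be illustrated with a toy label matcher: a selector that only constrains labels that already existed keeps matching a series after new labels are added. This is a simplified sketch with equality matchers only, not PromQL itself. `matches` is a hypothetical helper, and the "old" label set is an assumption for illustration.

```python
def matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """Return True if every label the selector constrains has that value in labels."""
    return all(labels.get(name) == value for name, value in selector.items())


# Assumed label set before the change (illustrative only).
old_series = {
    "domain": "sensor",
    "entity": "sensor.outside_temperature",
    "friendly_name": "Outside Temperature",
}

# Same series with the extra labels from the sample output added.
new_series = {
    **old_series,
    "area": "",
    "device": "",
    "object_id": "outside_temperature",
    "platform": "test",
}

# A selector written against the old labels still matches the new series.
selector = {"entity": "sensor.outside_temperature"}
print(matches(selector, old_series), matches(selector, new_series))
```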

@jzucker2
Contributor Author

I did a major pass to add test cases for device and area labels for the majority of the existing tests.

I still need to do another pass to add tests for things like:

  • Changing a device/area id
  • Changing a device/area name
  • Deleting/adding a device
  • Deleting/adding an area

This greatly increases the surface area, but it will be really nice to be able to sort metrics more easily after this.
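
To make the rename cases concrete: Prometheus identifies a series by its metric name plus its full label set, so a device or area rename yields a different series rather than an update to the old one. The sketch below uses a toy in-memory store (`ToyGauge` and `rename_area` are hypothetical, not `prometheus_client` or code from this PR) to show what a rename test might want to check: the new series exists and the stale one is gone.

```python
from __future__ import annotations

LabelKey = tuple[tuple[str, str], ...]


class ToyGauge:
    """Toy stand-in for a labelled gauge, keyed by the sorted label pairs."""

    def __init__(self) -> None:
        self.series: dict[LabelKey, float] = {}

    @staticmethod
    def key(labels: dict[str, str]) -> LabelKey:
        return tuple(sorted(labels.items()))

    def set(self, labels: dict[str, str], value: float) -> None:
        self.series[self.key(labels)] = value

    def remove(self, labels: dict[str, str]) -> None:
        self.series.pop(self.key(labels), None)


def rename_area(gauge: ToyGauge, labels: dict[str, str], new_area: str) -> dict[str, str]:
    """Move a series to a new area label and drop the series under the old label set."""
    value = gauge.series[gauge.key(labels)]
    gauge.remove(labels)
    new_labels = {**labels, "area": new_area}
    gauge.set(new_labels, value)
    return new_labels


gauge = ToyGauge()
before = {"area": "Test Area", "device": "Test Device", "entity": "sensor.outside_humidity_device"}
gauge.set(before, 56.0)
after = rename_area(gauge, before, "Renamed Area")
print(len(gauge.series), after["area"])
```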

@jantman
Contributor

jantman commented Dec 31, 2024

This looks extremely useful, thanks so much! Any likelihood of getting the device_class attribute added to labels? I have a whole bunch of Shelly devices that expose various binary sensors for overheating, over voltage, over current, etc. with a device_class of problem. It would be really nice to be able to create an alert rule in prometheus that just triggers if anything with device_class="problem" changes to an on/1 state.

@jzucker2
Contributor Author

> This looks extremely useful, thanks so much! Any likelihood of getting the device_class attribute added to labels? I have a whole bunch of Shelly devices that expose various binary sensors for overheating, over voltage, over current, etc. with a device_class of problem. It would be really nice to be able to create an alert rule in prometheus that just triggers if anything with device_class="problem" changes to an on/1 state.

Definitely interested in doing that. My first attempt a year ago included that. I think it's likely at this point I won't be able to include that on this pass though and that will be part of a follow up PR. I really want to ship a basic set of additional labels in January and then start adding a few more after.

I'm currently writing something on the order of 500 lines of test changes for every 1 line of app logic change, and that's a little bit demotivating. So I'm wary of adding too many logic changes to a single PR at this point. But I think that's very reasonable and achievable as an end goal.

@knyar
Contributor

knyar commented Jan 24, 2025

> This looks extremely useful, thanks so much! Any likelihood of getting the device_class attribute added to labels?

I believe at the moment device class is already used to generate metric names for sensors:

def _sensor_attribute_metric(state: State, unit: str | None) -> str | None:
    """Get metric based on device class attribute."""
    metric = state.attributes.get(ATTR_DEVICE_CLASS)
    if metric is not None:
        return f"sensor_{metric}_{unit}"
    return None

Is that not working for your use case? Or do you need device class propagated to Prometheus for entities that are not sensors?

@cedi

cedi commented Jan 24, 2025

Hey @jzucker2, let me know how I can help you with the test cases. I really appreciate the work you put in here and I'd like to help you get this across the finish line :)

@jzucker2
Contributor Author

> Hey @jzucker2 let me know how I can help you with the test cases. I really appreciate the work you put in here and I'd like to help you getting this across the finish line :)

Sorry! I missed this! I'd love help with the test cases. I will clean this up this week and merge in the latest dev, and then let's discuss how to handle the test cases.
