sensor for unevenly spaced rolling time series average (EWMA exponentially weighted moving average) #115407

Closed
maia opened this issue Apr 11, 2024 · 10 comments


@maia

maia commented Apr 11, 2024

The problem

I have a number of temperature sensors (BLE, from Govee) that only broadcast new temperatures. So if the temperature does not change at all for, e.g., 15 minutes, they do not broadcast anything. This causes a problem with, e.g., a statistics sensor using average_step.

A statistics sensor should be configurable to not only use the values in the specified time frame but also the last known value, even if it is outside of the time frame. This would help when creating any sensor that ideally should never be unavailable. For example a heating system that is based on the outdoor temperature will cause issues if the outdoor temperature is unavailable.

In case this is not ideal for, e.g., average_step, I suggest creating a sensor for irregular time series, most likely of the exponentially weighted moving average type. There are lots of papers explaining these, e.g.:

Eckner (2019): Algorithms for Unevenly Spaced Time Series: Moving Averages and Other Rolling Operators
http://eckner.com/papers/Algorithms%20for%20Unevenly%20Spaced%20Time%20Series.pdf

Also, this might be of use for the implementation: https://stackoverflow.com/questions/56956832/fast-ema-calculation-on-large-dataset-with-irregular-time-intervals
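To illustrate why such an average needs so little state, here is a minimal Python sketch of an EMA for unevenly spaced samples. It assumes what we understand to be the "last value" variant from Eckner's paper (each reading holds until the next one), timestamps in seconds and a time constant `tau` in seconds; the class and method names are hypothetical and not part of the statistics integration.

```python
import math


class UnevenEMA:
    """EMA for unevenly spaced samples; each value holds until the next one.

    The whole state is the last timestamp, the last raw value and the
    current average, so no window of samples has to be stored.
    """

    def __init__(self, tau):
        self.tau = tau          # time constant in seconds (assumed parameter)
        self.t = None           # timestamp of the last update
        self.last_value = None  # raw value in effect since self.t
        self.ema = None

    def value_at(self, now):
        """EMA at time `now`, assuming last_value has held since self.t."""
        if self.ema is None:
            return None
        w = math.exp(-(now - self.t) / self.tau)  # decay for the elapsed gap
        return w * self.ema + (1 - w) * self.last_value

    def update(self, now, value):
        """Record a new reading (or a timer tick with the unchanged value)."""
        if self.ema is None:
            self.ema = value  # initialise with the first observation
        else:
            self.ema = self.value_at(now)
        self.t, self.last_value = now, value
        return self.ema


ema = UnevenEMA(tau=300.0)
ema.update(0, 21.0)
ema.update(600, 20.0)     # reading changed after 10 minutes
print(ema.value_at(660))  # the 20.0 reading has now held for one minute
```

Calling `update()` on a timer with the unchanged current value is equivalent to letting the previous value keep holding, so timer ticks and change events can share the same method.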

Here's an example for the issue:

- platform: statistics
  name: "Schlafzimmer average temperature"
  entity_id: sensor.schlafzimmer_govee_temperatursensor_temperature
  state_characteristic: average_step
  max_age:
    minutes: 5

[Screenshot: average_step]

What version of Home Assistant Core has the issue?

core-2024.4.2

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant OS

Integration causing the issue

statistics

Link to integration documentation on our website

https://www.home-assistant.io/integrations/statistics/

Diagnostics information

No response

Example YAML snippet

No response

Anything in the logs that might be useful for us?

No response

Additional information

No response

@home-assistant

Hey there @ThomDietrich, mind taking a look at this issue as it has been labeled with an integration (statistics) you are listed as a code owner for? Thanks!


@ThomDietrich
Contributor

ThomDietrich commented Apr 11, 2024

Hello @maia, thanks for reporting this improvement idea! I believe your wish was already implemented by @brenank in #88655

Just FYI, I haven't had the time to continue development of the statistics component in the last 12 months. This topic has been discussed in different forms, and I agree that multiple improvements to that end would be great.
Please be aware that the solution depends on the nature of the source sensor, the characteristic used, and the use case for your statistics sensor. Off the top of my head I think there are at least three improvements that could be implemented (all of them configurable and as resource conscious as possible):

  1. Additionally listen for the new state_reported event https://developers.home-assistant.io/blog/2024/03/20/state_reported_timestamp
  2. Use the last known value, even outside the configured max_age
  3. A new user-callable service that forces the current state of the source sensor into the statistics sensor
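Improvements 2 and 3 mostly change what the sensor keeps in memory and when it recomputes. A minimal sketch of one resource-conscious option, assuming timestamps in seconds; the `SampleBuffer` class and its methods are hypothetical illustrations, not the statistics integration's code. It keeps every sample inside max_age plus only the newest sample older than the window.

```python
from collections import deque


class SampleBuffer:
    """Keeps samples within max_age plus the newest sample before the window."""

    def __init__(self, max_age):
        self.max_age = max_age
        self.samples = deque()  # (timestamp, value), oldest first

    def add(self, t, value):
        # Would be called for every report, including unchanged values
        # (which is what listening to state_reported would add).
        self.samples.append((t, value))
        self.prune(t)

    def prune(self, now):
        start = now - self.max_age
        # Drop the oldest sample only if the next one is also at or before the
        # window start, so the value in effect at the window start survives.
        while len(self.samples) >= 2 and self.samples[1][0] <= start:
            self.samples.popleft()

    def refresh(self, now):
        """What a forced update (e.g. a user-called service) might do."""
        self.prune(now)
        return list(self.samples)


buf = SampleBuffer(max_age=300)
for t, v in [(0, 20.0), (100, 21.0), (200, 22.0)]:
    buf.add(t, v)
print(buf.refresh(1000))  # [(200, 22.0)]: kept although older than max_age
```

The buffer never holds more than one sample outside the window, so carrying the last known value costs at most one extra entry.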

> For example a heating system that is based on the outdoor temperature will cause issues if the outdoor temperature is unavailable.

Disagree. If a source sensor is unavailable, the statistics sensor must be unavailable as well. Proper home automation logic should not unknowingly act on incorrect or outdated data. Likewise a statistics sensor shouldn't wrongfully present a value which is in all likelihood incorrect. Your heating control should be able to cope with an unavailable temperature sensor. You could either fall back to a secondary temperature sensor or go into a failsafe mode.
If you as the user are aware of these risks and it still makes sense for your specific use case to rely on the latest sensor value, this is beyond the scope of the statistics sensor, and you could define a template sensor to fulfill your need.
Happy to discuss.

It isn't clear to me how the first part of your message relates to the topic of exponentially weighted moving averages. Could you elaborate? FYI, I did consider implementing it in the past (theoretically; thanks for sharing the links with implementation ideas), but 1. couldn't think of a strong use case and 2. wasn't sure which parameters to provide and how. Overall it felt too scientific for typical HA use cases.

@maia
Author

maia commented Apr 12, 2024

@ThomDietrich Thanks a lot for your detailed reply! I completely understand that the amount of available time for such projects isn't constant and occasionally too many other things have priority in life. I hope you'll find time to work on the sensor soon (and all your other endeavours).

Regarding the issue of the statistics sensor switching to unavailable when there's no data point within the max age here's a specific use case:

I use Versatile Thermostat to control the valve of my radiators (TRV). It relies on two inputs: the outdoor temperature and the room temperature and will adjust either the valve opening percentage or the target temperature at the TRV (depending on the model) based on these two values in comparison with the desired room temperature. Also it attempts to detect open windows by looking at the temperature change in the last minutes (and if the window seems to be open, it will turn off the TRV until the room temperature is stable again).

So ideally I can provide two values that have as little noise as possible and are never unavailable, because when either measurement is unavailable, the TRV is put into failsafe mode (i.e. turned off). For the outdoor temperature I use the average of four sensors: two local outdoor sensors (one via Thread, one via Zigbee) and two weather providers. I'm using https://github.com/Limych/ha-average to calculate this average of multiple sensors (another thing I'd put on the wishlist for the statistics sensor).
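Averaging several sources only helps availability if unavailable sources are skipped rather than propagated. A minimal sketch of that idea, assuming each source is either a float or None while unavailable; the function and the source names are hypothetical, and this is not how ha-average is actually implemented.

```python
from typing import Mapping, Optional


def average_available(readings: Mapping[str, Optional[float]]) -> Optional[float]:
    """Mean of the sources that currently report a value.

    Returns None (i.e. the consumer should go to failsafe) only when every
    source is unavailable.
    """
    values = [v for v in readings.values() if v is not None]
    if not values:
        return None
    return sum(values) / len(values)


outdoor = {"thread": 8.5, "zigbee": None, "weather_a": 9.0, "weather_b": 9.5}
print(average_available(outdoor))  # 9.0: the unavailable Zigbee sensor is skipped
```

This keeps the failsafe behaviour the code owner asks for, but only triggers it when no source at all is left.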

Regarding the room temperature I have the issue of a sensor that is too "nervous" and would really benefit from being smoothed. But as it is a BLE sensor it attempts to save energy by not broadcasting unchanged values. So sometimes it does not report anything for maybe half an hour:

[Screenshot 2024-04-12 at 09.43.32]

As for the exponentially weighted moving average: even high-quality sensors produce noise. A single observation is rarely an exact description of reality. It can therefore improve data quality not to trust the "latest" measurement but rather to identify the most likely reality within the recent measurements, much like a linear regression calculates the line of best fit, or like the Western Electric Rules were used in the 1960s to identify stable trends and sudden changes in trend. An exponentially weighted moving average model (EWMA corresponds to a particular ARIMA model, but there probably are other similarly useful models) gives the most weight to the most recent measurements, but does not cut off abruptly at the end of a specified time frame. This allows it to react quickly to rapid changes without freaking out when there are few measurements in the time frame. It is also easy to handle, as one only needs to store the timestamp and value of the last calculation, not the entire series in the window.

When using an EWMA model for a room temperature sensor, it can be translated into: "10 minutes ago the sensor measured 21.0°C, 5 minutes ago it also measured 21.0°C, 1 minute ago it measured 20.0°C, now it measures 20.5°C, so what's the most likely current temperature? It's probably not 20.25°C but rather slightly higher, as ten and five minutes ago we had higher measurements and they might still have a little influence; also, the measurement of 20.0°C might have been noise."
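The room temperature scenario above can be worked through numerically. A minimal sketch, assuming a 5-minute time constant (tau = 300 s, chosen only for illustration) and two interpolation variants as we read them in Eckner (2019): "last" (each reading holds until the next one) and "next" (each reading is taken to have held since the previous one). The function name is hypothetical.

```python
import math


def ewma(samples, tau, variant):
    """EWMA at the time of the last sample for unevenly spaced data.

    samples: (timestamp_seconds, value) pairs sorted by timestamp.
    variant: "last" or "next", i.e. which reading fills each gap.
    """
    if variant not in ("last", "next"):
        raise ValueError(variant)
    t_prev, x_prev = samples[0]
    ema = x_prev
    for t, x in samples[1:]:
        w = math.exp(-(t - t_prev) / tau)  # weight kept by the old average
        held = x_prev if variant == "last" else x
        ema = w * ema + (1 - w) * held
        t_prev, x_prev = t, x
    return ema


# 21.0 (10 min ago), 21.0 (5 min ago), 20.0 (1 min ago), 20.5 (now)
readings = [(-600, 21.0), (-300, 21.0), (-60, 20.0), (0, 20.5)]
print(round(ewma(readings, 300, "last"), 2))  # 20.82
print(round(ewma(readings, 300, "next"), 2))  # 20.46
```

Both results are slightly above the plain mean 20.25 of the last two readings. The "last" variant ignores the 20.5 reading entirely, because it has not held for any time yet; the variant and tau are exactly the kind of parameters a statistics implementation would have to expose.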

EDIT: This might be of interest:

@kepstin
Contributor

kepstin commented Apr 12, 2024

I think the current calculation methods available in the statistics integration are fine: there are some reasonable moving-window average methods that correctly handle irregular intervals. EWMA would be a nice addition, though. The algorithm itself isn't the problem; there are other issues in play, and an implementation of EWMA would still suffer from many of them.

To re-state those issues:

  • statistics does not yet support the recently added "state_reported" events (Add State.last_reported #113511), so if the sensor reports a new value that is unchanged from the previous one, statistics does not get updated.
  • statistics only updates when the sensor it is referencing gets updated. Therefore it does not generate interpolated values while waiting for the sensor to report a new value (the suggested solution is a service that can be used to trigger a new calculation manually, which could e.g. be run by an automation).
  • When using max_age, the newest sensor value from before the time window is discarded, even though that value might be needed to correctly interpolate values for the average calculation. (This is mentioned in Statistics sensor doesn't handle zero values correctly #67627 (comment).) This leads to "jumps" in the value when an old sample exits the window. This wouldn't be an issue with EWMA, since that algorithm doesn't require storing a window of samples, but it's a problem with the existing average modes.
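The third point can be reproduced with a small time-weighted (step) average. A minimal sketch, assuming timestamps in seconds and step interpolation; the function and its `carry_last_known` flag are hypothetical and not the integration's actual average_step code. With the flag set, the newest sample from before the window is treated as the value in effect at the window start.

```python
def step_average(samples, now, max_age, carry_last_known):
    """Time-weighted average over the window [now - max_age, now].

    samples: (timestamp_seconds, value) pairs sorted by timestamp; each value
    is assumed to hold until the next sample (step interpolation).
    Returns None when no usable sample exists (the sensor would be unavailable).
    """
    start = now - max_age
    window = [(t, v) for t, v in samples if start <= t <= now]
    if carry_last_known:
        older = [(t, v) for t, v in samples if t < start]
        if older:
            # The newest pre-window value was still in effect at the window start.
            window.insert(0, (start, older[-1][1]))
    if not window:
        return None
    # Each value holds from its own timestamp until the next one (or until now).
    ends = [t for t, _ in window[1:]] + [now]
    total = sum(v * (end - t) for (t, v), end in zip(window, ends))
    span = now - window[0][0]
    return total / span if span > 0 else window[0][1]


samples = [(0, 20.0), (250, 22.0)]
for now in (299, 301):
    # Window-only averaging versus carrying the last known value.
    print(now, step_average(samples, now, 300, False), step_average(samples, now, 300, True))

# No sample within max_age: window-only gives None, carrying gives 21.0.
print(step_average([(0, 21.0)], 900, 300, False), step_average([(0, 21.0)], 900, 300, True))
```

Without the carried value, the average jumps from about 20.33 to 22.0 within two seconds when the sample at t=0 leaves the window; with it, the average only moves to 20.34. The same flag also keeps the sensor from becoming unavailable when no sample arrived within max_age.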

@kepstin
Contributor

kepstin commented Apr 14, 2024

Actually, come to think of it, since the EWMA doesn't require storing a bunch of past state, it can be implemented easily in a template sensor. I've done that here: https://gist.github.com/kepstin/7eb43eeabf97348a256fb33fd4d85a57 using the algorithm linked earlier about how to implement EWMA for unevenly spaced data.

It works well, and you can use arbitrary triggers to decide when it should update.

@maia
Author

maia commented Apr 14, 2024


Oh wow, that's a great solution! Of course it would be ideal if people could use EWMA within the statistics sensor without having to write their own jinja, but your suggestion is certainly of help!

As for the issues you listed, I understand that these are independent and will need some attention at some point in the future.

@maia
Author

maia commented May 12, 2024

@kepstin As a follow-up: what is the following trigger supposed to do? It seems that it currently updates the EWMA on every state_changed event, in addition to the once-per-minute cycle. Initially I assumed it should just identify the moments when an entity goes unavailable.

https://gist.github.com/kepstin/7eb43eeabf97348a256fb33fd4d85a57#file-configuration-yml-L2-L5

- trigger:
      - id: state_changed
        platform: state
        entity_id: sensor.147334558305449_indoor_temperature
        # With to: null, the trigger fires on every state change but not on attribute-only changes.
        to: null

@kepstin
Contributor

kepstin commented May 13, 2024

In order to calculate the EWMA correctly for a sensor that reports updates only when the value changes, the template has to calculate the EWMA using the sensor's previous value up to the time the change was reported, so that future updates have the correct starting value. That's what this trigger is for.

If you have a short interval in the time pattern trigger it doesn't make a whole lot of difference - but if you have a sensor which might update multiple times in an interval, this is necessary for the EWMA calculation to be accurate.
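The size of the error can be estimated with the decay step used for unevenly spaced EWMA. A sketch, assuming a 5-minute time constant and a sensor that changes from 20.0 to 24.0 halfway through a one-minute timer interval; the helper name `decay_step` is hypothetical, and the timer-only update is a deliberately naive variant for comparison, not the gist's code.

```python
import math


def decay_step(ema, held_value, dt, tau):
    """Advance the EMA over dt seconds during which held_value was constant."""
    w = math.exp(-dt / tau)
    return w * ema + (1 - w) * held_value


TAU = 300.0  # assumed 5-minute time constant

# EMA is at 20.0 at t=0; the sensor reads 20.0 until t=30, then 24.0.
# Updating on the change: 20.0 held for 30 s, then 24.0 held for 30 s.
exact = decay_step(decay_step(20.0, 20.0, 30, TAU), 24.0, 30, TAU)
# Timer-only update at t=60: assumes the current 24.0 held for the full minute.
timer_only = decay_step(20.0, 24.0, 60, TAU)
print(round(exact, 2), round(timer_only, 2))  # 20.38 20.73
```

The timer-only update overshoots because it pretends the new value had already held for the whole minute; the shorter the timer interval, the smaller this error, which matches the point about short time pattern triggers.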

@maia
Author

maia commented May 17, 2024

@kepstin I'm sorry for the late reply. I understand, if the values can change within the regular interval of one minute, one needs to take these into account and can't just ignore them. But I wonder, what is the to: null for? When reading the code I thought this would only trigger when the entity goes unavailable, but it seems to me that this line has no effect at all, as it will trigger updates on any change and not only when changing to null.

@issue-triage-workflows

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates.
Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍
This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

@issue-triage-workflows issue-triage-workflows bot closed this as not planned (won't fix, can't repro, duplicate, stale) on Aug 22, 2024
@github-actions github-actions bot locked and limited conversation to collaborators on Sep 21, 2024