sensor for unevenly spaced rolling time series average (EWMA exponentially weighted moving average) #115407
Comments
Hey there @ThomDietrich, mind taking a look at this issue as it has been labeled with the statistics integration? (message by CodeOwnersMention)
Hello @maia, thanks for reporting this improvement idea! I believe your wish was already implemented by @brenank in #88655. Just FYI, I haven't had the time to continue development of the statistics component over the last 12 months. This topic has been discussed in different forms and I agree that multiple improvements to that end would be great.
Disagree. If a source sensor is unavailable, the statistics sensor must be unavailable as well. Proper home automation logic should not unknowingly act on incorrect or outdated data. Likewise, a statistics sensor shouldn't wrongly present a value which is in all likelihood incorrect. Your heating control should be able to cope with an unavailable temperature sensor: you could either fall back to a secondary temperature sensor or go into a failsafe mode. It isn't clear to me how the first part of your message relates to the topic of exponentially weighted moving average. Could you elaborate? FYI, I did consider implementing it in the past (in theory; thanks for sharing the links with implementation ideas) but 1. couldn't think of a strong use case and 2. wasn't sure which parameters to provide and how. Overall it felt too scientific for the typical HA use cases.
@ThomDietrich Thanks a lot for your detailed reply! I completely understand that the amount of available time for such projects isn't constant and occasionally too many other things have priority in life. I hope you'll find time to work on the sensor soon (and all your other endeavours).
Regarding the issue of the statistics sensor switching to unavailable when there's no data point within the max age, here's a specific use case: I use Versatile Thermostat to control the valves of my radiators (TRV). It relies on two inputs, the outdoor temperature and the room temperature, and adjusts either the valve opening percentage or the target temperature at the TRV (depending on the model) by comparing these two values with the desired room temperature. It also attempts to detect open windows by looking at the temperature change over the last few minutes (and if a window seems to be open, it turns off the TRV until the room temperature is stable again). So ideally I can provide two values that have as little noise as possible and are never unavailable, because when either measurement is unavailable, the TRV is put in failsafe mode (= turned off).
For the outdoor temperature I use the average of four sensors: two local outdoor sensors (one via Thread, one via Zigbee) and two weather providers. I'm using https://github.com/Limych/ha-average to calculate this average of multiple sensors (another thing I'd put on the wishlist for the statistics sensor). Regarding the room temperature, I have the issue of a sensor that is too "nervous" and would really benefit from being smoothed. But as it is a BLE sensor, it attempts to save energy by not broadcasting unchanged values, so sometimes it does not report anything for maybe half an hour.
As for the exponentially weighted moving average: Even high quality sensors produce noise. A single observation is rarely the exact description of reality. It can therefore improve data quality to not trust the "latest" measurement but rather try to identify the most likely reality within the recent measurements, like a linear regression calculates the line of best fit, or like the Western Electric Rules were used in the 1960s to identify stable trends and sudden changes in trend. Using an exponentially weighted moving average model (ARIMA, but there are probably similarly useful models) gives most weight to the most recent measurements, but does not cut off suddenly at the end of a specified time frame. This makes it possible to react quickly to rapid changes but also to not freak out when there aren't many measurements in the time frame. It is also easy to handle, as one only needs to save the timestamp and value of the last calculated value and not the entire series of the window.
When using an EWMA model for a room temperature sensor it can be translated into "10 minutes ago the sensor measured 21.0°C, 5 minutes ago it also measured 21.0°C, 1 minute ago it measured 20.0°C, now it measured 20.5°C, so what's the most likely current temperature? It's probably not 20.25°C but rather slightly higher, as ten and five minutes ago we had higher measurements and they might still have a little bit of influence, also the measurement of 20.0°C might have been noise."
EDIT: This might be of interest:
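To make the four-reading example above concrete, here is a minimal Python sketch of an EWMA for unevenly spaced samples. It is not code from the statistics integration; the class name `Ewma` and the time constant `tau` (5 minutes here) are assumptions chosen for illustration. The weight of the old estimate decays with the time gap as exp(-dt / tau), so only the last estimate and its timestamp need to be stored.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ewma:
    """EWMA for irregular samples (illustrative sketch, hypothetical names)."""

    tau: float  # time constant, in the same unit as the timestamps (assumed)
    value: Optional[float] = None
    last_t: Optional[float] = None

    def update(self, t: float, x: float) -> float:
        """Fold in a reading x taken at time t and return the new estimate."""
        if self.value is None or self.last_t is None:
            # The first reading initialises the estimate.
            self.value, self.last_t = x, t
            return x
        dt = t - self.last_t
        # The longer the gap, the less the old estimate counts.
        w = math.exp(-dt / self.tau)
        self.value = w * self.value + (1.0 - w) * x
        self.last_t = t
        return self.value


# Readings from the comment: (minutes relative to now, temperature in °C)
readings = [(-10.0, 21.0), (-5.0, 21.0), (-1.0, 20.0), (0.0, 20.5)]
ewma = Ewma(tau=5.0)
for t, x in readings:
    estimate = ewma.update(t, x)
print(round(estimate, 2))
```

With this (assumed) time constant the estimate ends up a little above 20.25°C, which matches the intuition described above; a larger `tau` would pull it further towards the older 21.0°C readings.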
I think the current calculation methods available in the statistics integration are fine: there are some reasonable moving-window average methods which correctly handle irregular intervals. EWMA would be a nice addition, though. The algorithm itself isn't the problem; there are other issues in play, and an implementation of EWMA would still suffer from many of them. To re-state those issues:
Actually, come to think of it, since the EWMA doesn't require storing a bunch of past state, it can be implemented easily in a template sensor. I've done that here: https://gist.github.com/kepstin/7eb43eeabf97348a256fb33fd4d85a57 using the algorithm linked earlier about how to implement EWMA for unevenly spaced data. It works well, and you can use arbitrary triggers to decide when it should update.
Oh wow, that's a great solution! Of course it would be ideal if people could use EWMA within the statistics sensor without having to write their own Jinja, but your suggestion is certainly of help! As for the issues you stated, I understand that these are independent and will need some attention somewhere in the future.
@kepstin As a followup: What should the following do? It seems that it currently updates the EWMA on every
https://gist.github.com/kepstin/7eb43eeabf97348a256fb33fd4d85a57#file-configuration-yml-L2-L5
- trigger:
    - id: state_changed
      platform: state
      entity_id: sensor.147334558305449_indoor_temperature
      to: null
In order to calculate the EWMA function correctly for a sensor which reports updates only when the value changed, the template has to calculate the EWMA using the sensor's previous value up to the time when the change was reported, so future updates have the correct starting value. That's what this trigger is for. If you have a short interval in the time pattern trigger it doesn't make a whole lot of difference, but if you have a sensor which might update multiple times in an interval, this is necessary for the EWMA calculation to be accurate.
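The reasoning above can be sketched in Python (not the Jinja from the gist, which is not reproduced here). The class `HeldValueEwma` and its method names are hypothetical; the assumption is that the sensor's last reported value is treated as constant until the next report. On every state change the estimate is first advanced to the change time using the old held value, and only then is the new value stored.

```python
import math


class HeldValueEwma:
    """EWMA of a signal that is assumed to hold its last reported value."""

    def __init__(self, tau: float, t0: float, x0: float) -> None:
        self.tau = tau    # time constant in seconds (illustrative choice)
        self.t = t0       # time of the last update
        self.held = x0    # value the sensor is assumed to hold right now
        self.value = x0   # current EWMA estimate

    def advance(self, t: float) -> float:
        """Time-pattern tick: decay towards the held value up to time t."""
        w = math.exp(-(t - self.t) / self.tau)
        self.value = w * self.value + (1.0 - w) * self.held
        self.t = t
        return self.value

    def on_state_change(self, t: float, x: float) -> float:
        """State trigger: account for the OLD value up to t, then switch."""
        self.advance(t)
        self.held = x
        return self.value


# Sensor reads 20.0 from t=0, reports 22.0 at t=30 s; the next tick is at t=60 s.
with_trigger = HeldValueEwma(tau=60.0, t0=0.0, x0=20.0)
with_trigger.on_state_change(30.0, 22.0)
accurate = with_trigger.advance(60.0)

# Tick-only variant (hypothetical): at t=60 it only sees the current state
# and has to assume 22.0 was held for the whole minute.
tick_only = HeldValueEwma(tau=60.0, t0=0.0, x0=22.0)
tick_only.value = 20.0
naive = tick_only.advance(60.0)

print(round(accurate, 3), round(naive, 3))
```

In this toy case the tick-only variant overshoots, because it credits the new value with 60 seconds of influence instead of 30. Extra ticks between state changes do not alter the result, since decaying towards the same held value in two steps is equivalent to doing it in one.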
@kepstin I'm sorry for the late reply. I understand: if the values can change within the regular one-minute interval, one needs to take these into account and can't just ignore them. But I wonder, what is the
There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates.
The problem
I have a number of temperature sensors (BLE from Govee) that only broadcast when the temperature changes. So if the temperature does not change at all for e.g. 15 minutes, nothing is broadcast. This causes a problem with e.g. the average_step sensor. A statistics sensor should be configurable to not only use the values in the specified time frame but also the last known value, even if it is outside of the time frame. This would help when creating any sensor that ideally should never be unavailable. For example, a heating system that is based on the outdoor temperature will cause issues if the outdoor temperature is unavailable.
In case this is not ideal for e.g. the average_step sensor, I suggest creating a sensor for irregular time series, which probably is of the exponentially weighted average type. There are lots of papers explaining these, e.g. Eckner (2019): Algorithms for Unevenly Spaced Time Series: Moving Averages and Other Rolling Operators
http://eckner.com/papers/Algorithms%20for%20Unevenly%20Spaced%20Time%20Series.pdf
Also maybe this is of use in the implementation: https://stackoverflow.com/questions/56956832/fast-ema-calculation-on-large-dataset-with-irregular-time-intervals
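As a rough illustration of the first suggestion (carrying the last known value into the window), here is a small Python sketch. It is not the statistics integration's code; the function `step_average` and its parameters are hypothetical, and it assumes readings are sorted by time and that each value holds until the next one arrives.

```python
from typing import Optional, Sequence, Tuple


def step_average(
    samples: Sequence[Tuple[float, float]],  # (timestamp, value), sorted by time
    window_start: float,
    now: float,
) -> Optional[float]:
    """Time-weighted step average over [window_start, now].

    The most recent sample at or before window_start is carried into the
    window, so a quiet sensor still yields a value instead of nothing.
    """
    held: Optional[float] = None
    held_since = window_start
    area = 0.0
    covered = 0.0
    for t, x in samples:
        if t > now:
            break
        if t > window_start and held is not None:
            # Credit the previous value for the time it was in effect.
            area += held * (t - held_since)
            covered += t - held_since
        held = x
        held_since = max(t, window_start)
    if held is None:
        return None  # nothing has ever been reported
    area += held * (now - held_since)
    covered += now - held_since
    return area / covered if covered > 0 else held


# 15-minute window ending at minute 60; the only reading is from minute 10.
quiet = step_average([(10.0, 21.0)], window_start=45.0, now=60.0)
# Same window, with one change reported at minute 50.
changed = step_average([(10.0, 21.0), (50.0, 22.0)], window_start=45.0, now=60.0)
print(quiet, changed)
```

Whether a value carried in from outside the window should count as valid is exactly the design question raised in this issue; the sketch only shows what the calculation would look like if it did.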
Here's an example for the issue:
What version of Home Assistant Core has the issue?
core-2024.4.2
What was the last working version of Home Assistant Core?
No response
What type of installation are you running?
Home Assistant OS
Integration causing the issue
statistics
Link to integration documentation on our website
https://www.home-assistant.io/integrations/statistics/
Diagnostics information
No response
Example YAML snippet
No response
Anything in the logs that might be useful for us?
No response
Additional information
No response