Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unifi Protect Notification Videos randomly fail. #132524

Closed
ianalexander opened this issue Dec 6, 2024 · 20 comments
Closed

Unifi Protect Notification Videos randomly fail. #132524

ianalexander opened this issue Dec 6, 2024 · 20 comments

Comments

@ianalexander
Copy link

ianalexander commented Dec 6, 2024

The Problem

Using 2024.12, I have an automation setup using the suggested implementation in the docs. About ~20% of the time, this integration fails to generate a video for the notification, and instead returns this error:
IMG_7183

Things I've Tried

A. I've tried pausing the automation 30s prior to the notification sent, testing if this was some kind of issue with the event_id not being ready to generate a video. This did not work.
B. I've tried implementing a cool down, only allowing one notification to be sent every 60s, testing if this was some kind of concurrent request issue. This did not work.

Current Hypothesises

  1. I have both motion and smart detection setup on my cameras. Perhaps there is some kind of id collision happening while one event finishes, but another is ongoing?
  2. Somehow, the integration is sending a malformed request to the uiprotect library that fails to generate the video
  3. Somehow, the proxy URL fails to send the video

Other observations

  1. After the notification has failed with the image shown above, I can manually curl the api endpoint for the video and successfully download the video, which is what made me initially think this might be some kind of race issue.
  2. event_id is always successfully obtained, so I believe this is not related to issue UniFi Protect - No Longer Consistently Providing {event_id} as attribute from detection sensor for thumbnails #130577
  3. There is not a lot of logging happening in the integration, so I don't have a ton of visibility whats happening in the actual integration:

class VideoEventProxyView(ProtectProxyView):
"""View to proxy video clips for events from UniFi Protect."""

I recognize this is probably not enough information. Any help on how I can better debug this issue would be great! Thank you for reading this issue.


What version of Home Assistant Core has the issue?

core-2024.12.0

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant OS

Integration causing the issue

unifiprotect

Link to integration documentation on our website

https://www.home-assistant.io/integrations/unifiprotect

Diagnostics information

No response

Example YAML snippet

No response

Anything in the logs that might be useful for us?

No response

Additional information

No response

@home-assistant
Copy link

home-assistant bot commented Dec 6, 2024

Hey there @RaHehl, mind taking a look at this issue as it has been labeled with an integration (unifiprotect) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of unifiprotect can trigger bot actions by commenting:

  • @home-assistant close Closes the issue.
  • @home-assistant rename Awesome new title Renames the issue.
  • @home-assistant reopen Reopen the issue.
  • @home-assistant unassign unifiprotect Removes the current integration label and assignees on the issue, add the integration domain after the command.
  • @home-assistant add-label needs-more-information Add a label (needs-more-information, problem in dependency, problem in custom component) to the issue.
  • @home-assistant remove-label needs-more-information Remove a label (needs-more-information, problem in dependency, problem in custom component) on the issue.

(message by CodeOwnersMention)


unifiprotect documentation
unifiprotect source
(message by IssueLinks)

@ianalexander
Copy link
Author

ianalexander commented Dec 6, 2024

Automation I'm using:

alias: Camera Motion Video Notifications
description: Sends a notification with video upon motion detection.
triggers:
  - entity_id:
      - binary_sensor.g5_dome_motion
      - binary_sensor.g5_bullet_motion
      - binary_sensor.g5_dome_object_detected
      - binary_sensor.g5_bullet_object_detected
    trigger: state
    from: "on"
    to: "off"
actions:
  - action: notify.mobile_app
    data:
      title: "{{ state_attr(trigger.entity_id, 'friendly_name') }}"
      message: >-
        at {{ as_local(states[trigger.entity_id].last_changed).strftime('%I:%M
        %p') }}
      data:
        video: >-
          /api/unifiprotect/video/{{ config_entry_id(trigger.entity_id) }}/{{
          trigger.from_state.attributes.event_id }}
mode: single
max_exceeded: silent

Debug logs during a bad notification:

2024-12-06 13:24:22.206 DEBUG (MainThread) [uiprotect.api] emitting message: WSAction.ADD:ModelType.EVENT:67536b8600ac0303e40031c0:['model', 'id', 'type', 'start', 'end', 'score', 'heatmap_id', 'camera_id', 'smart_detect_types', 'smart_detect_event_ids', 'thumbnail_id', 'user_id', 'timestamp', 'metadata', 'deleted_at', 'deletion_type', 'category', 'sub_category']
2024-12-06 13:24:22.207 DEBUG (MainThread) [uiprotect.utils] event WS msg: {'model': <ModelType.EVENT: 'event'>, 'id': '67536b8600ac0303e40031c0', 'type': <EventType.MOTION: 'motion'>, 'start': datetime.datetime(2024, 12, 6, 21, 24, 18, 776000, tzinfo=datetime.timezone.utc), 'end': None, 'score': 0, 'heatmap_id': None, 'camera_id': '6455ae9801c98e03e4001b8f', 'smart_detect_types': [], 'smart_detect_event_ids': [], 'thumbnail_id': None, 'user_id': None, 'timestamp': None, 'metadata': {'client_platform': None, 'reason': None, 'app_update': None, 'light_id': None, 'light_name': None, 'type': None, 'sensor_id': None, 'sensor_name': None, 'sensor_type': None, 'doorlock_id': None, 'doorlock_name': None, 'from_value': None, 'to_value': None, 'mount_type': None, 'status': None, 'alarm_type': None, 'device_id': None, 'mac': None, 'license_plate': None, 'detected_thumbnails': None, 'nfc': None, 'fingerprint': None}, 'deleted_at': None, 'deletion_type': None, 'category': None, 'sub_category': None}
2024-12-06 13:24:22.207 DEBUG (MainThread) [homeassistant.components.unifiprotect.data] Updating device: Garage (E438830D7D43)
2024-12-06 13:24:22.207 DEBUG (MainThread) [homeassistant.components.unifiprotect.entity] Updating state [Garage Motion (E438830D7D43)] [True, False, {}] -> (True, True, {'event_id': '67536b8600ac0303e40031c0', 'event_score': 0})
2024-12-06 13:24:22.212 DEBUG (MainThread) [uiprotect.api] emitting message: WSAction.UPDATE:ModelType.CAMERA:6455ae9801c98e03e4001b8f:['is_motion_detected', 'is_recording']
2024-12-06 13:24:22.212 DEBUG (MainThread) [homeassistant.components.unifiprotect.data] Updating device: Garage (E438830D7D43)
2024-12-06 13:24:23.190 DEBUG (MainThread) [uiprotect.data.bootstrap] Unknown model type: automation
2024-12-06 13:24:23.214 DEBUG (MainThread) [uiprotect.data.bootstrap] Unknown model type: automation
2024-12-06 13:24:27.640 DEBUG (MainThread) [uiprotect.api] emitting message: WSAction.UPDATE:ModelType.CAMERA:6455ae9801c98e03e4001b8f:['is_motion_detected', 'is_recording']
2024-12-06 13:24:27.640 DEBUG (MainThread) [homeassistant.components.unifiprotect.data] Updating device: Garage (E438830D7D43)
2024-12-06 13:24:27.667 DEBUG (MainThread) [uiprotect.api] emitting message: WSAction.UPDATE:ModelType.EVENT:67536b8600ac0303e40031c0:['end', 'score']
2024-12-06 13:24:27.667 DEBUG (MainThread) [uiprotect.utils] event WS msg: {'model': <ModelType.EVENT: 'event'>, 'id': '67536b8600ac0303e40031c0', 'type': <EventType.MOTION: 'motion'>, 'start': datetime.datetime(2024, 12, 6, 21, 24, 18, 776000, tzinfo=datetime.timezone.utc), 'end': datetime.datetime(2024, 12, 6, 21, 24, 26, 63000, tzinfo=datetime.timezone.utc), 'score': 100, 'heatmap_id': None, 'camera_id': '6455ae9801c98e03e4001b8f', 'smart_detect_types': [], 'smart_detect_event_ids': [], 'thumbnail_id': None, 'user_id': None, 'timestamp': None, 'metadata': {'client_platform': None, 'reason': None, 'app_update': None, 'light_id': None, 'light_name': None, 'type': None, 'sensor_id': None, 'sensor_name': None, 'sensor_type': None, 'doorlock_id': None, 'doorlock_name': None, 'from_value': None, 'to_value': None, 'mount_type': None, 'status': None, 'alarm_type': None, 'device_id': None, 'mac': None, 'license_plate': None, 'detected_thumbnails': None, 'nfc': None, 'fingerprint': None}, 'deleted_at': None, 'deletion_type': None, 'category': None, 'sub_category': None}
2024-12-06 13:24:27.668 DEBUG (MainThread) [homeassistant.components.unifiprotect.data] Updating device: Garage (E438830D7D43)
2024-12-06 13:24:27.668 DEBUG (MainThread) [homeassistant.components.unifiprotect.entity] Updating state [Garage Motion (E438830D7D43)] [True, True, {'event_id': '67536b8600ac0303e40031c0', 'event_score': 0}] -> (True, False, {})
2024-12-06 13:24:27.766 DEBUG (MainThread) [uiprotect.api] Request url: https://<mydomain>/proxy/protect/api/events/67536b8600ac0303e40031c0
2024-12-06 13:24:27.924 DEBUG (MainThread) [uiprotect.api] Request url: https://<mydomain>/proxy/protect/api/video/export
2024-12-06 13:24:28.683 DEBUG (MainThread) [uiprotect.api] emitting message: WSAction.UPDATE:ModelType.EVENT:67536b8600ac0303e40031c0:['metadata']
2024-12-06 13:24:28.684 DEBUG (MainThread) [uiprotect.utils] event WS msg: {'model': <ModelType.EVENT: 'event'>, 'id': '67536b8600ac0303e40031c0', 'type': <EventType.MOTION: 'motion'>, 'start': datetime.datetime(2024, 12, 6, 21, 24, 18, 776000, tzinfo=datetime.timezone.utc), 'end': datetime.datetime(2024, 12, 6, 21, 24, 26, 63000, tzinfo=datetime.timezone.utc), 'score': 100, 'heatmap_id': None, 'camera_id': '6455ae9801c98e03e4001b8f', 'smart_detect_types': [], 'smart_detect_event_ids': [], 'thumbnail_id': None, 'user_id': None, 'timestamp': None, 'metadata': {'client_platform': None, 'reason': None, 'app_update': None, 'light_id': None, 'light_name': None, 'type': None, 'sensor_id': None, 'sensor_name': None, 'sensor_type': None, 'doorlock_id': None, 'doorlock_name': None, 'from_value': None, 'to_value': None, 'mount_type': None, 'status': None, 'alarm_type': None, 'device_id': None, 'mac': None, 'license_plate': None, 'detected_thumbnails': [{'clock_best_wall': datetime.datetime(2024, 12, 6, 21, 24, 21, 776000, tzinfo=datetime.timezone.utc), 'type': 'motion', 'cropped_id': '67536b8c011a0303e40031c3', 'attributes': None, 'name': None}], 'nfc': None, 'fingerprint': None}, 'deleted_at': None, 'deletion_type': None, 'category': None, 'sub_category': None}
2024-12-06 13:24:28.684 DEBUG (MainThread) [homeassistant.components.unifiprotect.data] Updating device: Garage (E438830D7D43)% 

@ianalexander
Copy link
Author

ianalexander commented Dec 7, 2024

Thinking out loud here... when the automation requests the /api/unifiprotect/video/{nvr_id}/{event_id} the following happens:

  1. the event_id is converted into an event:

    try:
    event = await data.api.get_event(event_id)

  2. event.start and event.end are passed to the uiprotect method camera.get_video():

    try:
    await camera.get_video(event.start, event.end, iterator_callback=iterator)

  3. get_video() ultimately calls get_camera_video() which says that exports can be plus or minus a few seconds:
    https://github.com/uilibs/uiprotect/blob/432da70b4becec3b0cfbef32b8ab3bf3ffa1fb2c/src/uiprotect/api.py#L1483-L1508

I wonder if there is something going on with the start and end time conversion that causes this request to fail? This would explain why manually curl'ing the endpoint after the event works, but perhaps for the brief moment there is some kind of issue with the start and end times that causes this to fail?

Or -- perhaps exports fail when there is an ongoing event?

@ianalexander
Copy link
Author

ianalexander commented Dec 7, 2024

Requesting a 10 second clip that ends 2 seconds in the past via the uiprotect CLI will randomly fail with error code 500:

uiprotect cameras <camera-id> save-video bash.mp4 $(date -v -8H -v -12S -u +"%Y-%m-%dT%H:%M:%S") $(date -v -2S -v -8H -u +"%Y-%m-%dT%H:%M:%S")

Those bash commands will evaluate to a datetime that is 12 seconds in the past, and 2 second in the past.

This is making me rethink that perhaps requesting a video clip with an end time too close to now() will fail sometimes.

Perhaps this is a feature request to add logging to uiprotect here to add status code logging of this request to help with debugging:
https://github.com/uilibs/uiprotect/blob/432da70b4becec3b0cfbef32b8ab3bf3ffa1fb2c/src/uiprotect/api.py#L1543-L1549

@ianalexander
Copy link
Author

ianalexander commented Dec 7, 2024

@bdraco and @RaHehl -- would it be okay if I added a pull request to uiprotect that added some additional logging to help debug this issue? I am happy to implement this logging in one of two ways:

A\ specific logging within the get_camera_video() function in uiprotect
B\ general request logging within the request() function in uiprotect

(My recommendation would be more general logging, via option (B), but I am open to your advice here.)

In either scenario I would log 2 new things: (1) the request parameters, and (2) the response status code. The goal of this PR would be to more visibly log when requesting a video fails, along with details of the request that failed. Right now, if get_camera_video() fails, it does so silently -- returning nothing to home assistant and generating the image at the top of this issue.

@RaHehl
Copy link
Contributor

RaHehl commented Dec 8, 2024

@lutzvahl You’re likely the most familiar with this at the moment. Could you kindly take a look at the potential issue with the new endpoints introduced in #129980?

@RaHehl
Copy link
Contributor

RaHehl commented Dec 8, 2024

@ianalexander I can't speak for @bdraco, but my thought is that debug logging is generally a good idea as long as it's disabled by default.

@bdraco
Copy link
Member

bdraco commented Dec 8, 2024

Its a bit hard to tell if debug logging is a good idea or if it will have a significant performance impact (even when disabled if it has expensive argument formatting, although this is rare but happens more often than you would expect) until I see an open PR

@ianalexander
Copy link
Author

ianalexander commented Dec 9, 2024

I've spent a little more time debugging this today. Here is the current status of what I know, and am still trying to debug:

  1. I've identified what I believe to be the root cause of the issue: the Protect Server will occasionally return a 500 error when requesting a video export. The current uiprotect implementation will fail silently, which at a minimum I think I should probably add something to make this more obvious.
    try:
    await camera.get_video(event.start, event.end, iterator_callback=iterator)
    except ClientError as err:
    return _404(err)

I've added some logging in my fork and confirmed that this happens when the server returns 500. In uiprotect, get_camera_video() will happily return None without making any noise -- home assistant then renders the error message above. I think ideally we should at least warn the user that we tried to get a video, but the server was unable to deliver.

  1. I've built a proof-of-concept to implement retries in get_camera_video() following the pattern in get_image_with_retry(). This seems to help some of the time, but occasionally the server will just continue to send 500's with each retry for 10 whole seconds. It's definitely a step in the right direction IMO, and increases the overall robustness of the function.

  2. The core reason why some responses just return 500 is still unknown. I have found no predictability when the server will decide to send back a 500. I have noticed that the official Unifi Protect Web UI has moved onto a new API pattern: sending a request to /proxy/protect/api/video/prepare, and then /proxy/protect/api/video/download. Perhaps this pattern is more robust, and should be considered for future work?

@lutzvahl - given your recent work in this area, maybe you could advise on the best path forward? I would propose two options to consider:

A\ I can submit this fork that implements retries for get_camera_video() as a PR, and we can continue to discuss how to get this merged on that PR. Or,
B\ I can submit a smaller PR that only implements logging to alert the user when the server has returned 500, while being cautious to not implement some expensive argument formatting as @bdraco suggested.

I would recommend (A), as I am currently using this fork in my live home assistant instance and it's definitely working better for me. I still get the occasional error from unifi protect, but the overall failure rate is much less since it will retry if the initial response is 500.

@RaHehl
Copy link
Contributor

RaHehl commented Dec 9, 2024

Thank you for the valuable and detailed analysis of the issue. I really appreciate the time and effort you’ve put into debugging this.

Personally, I don't think implementing retries is the right solution. Ideally, we should aim to reproduce this issue using the Protect Web UI or app and then escalate it to the Unifi team for resolution. From my own experience, I know this can be a challenging process, but addressing the root cause is the proper way forward. Implementing retries feels more like a patch than a solution, and it risks masking the underlying issue rather than resolving it.

Alternatively, as you mentioned, it might be worth exploring the new API pattern used by the Protect Web UI (/proxy/protect/api/video/prepare and /proxy/protect/api/video/download). However, I’m not deeply familiar with that part of the implementation, so I can’t confidently say if it’s the better path forward without further investigation.

@lutzvahl
Copy link
Contributor

lutzvahl commented Dec 9, 2024

Thanks @ianalexander for raising the issue and looking into this.

IMO we should not add a video_retry() because the _get_image_with_retry() fulfils the use case to provide the image once the event is over. This assumption is not true for the video case.

Let's look into the new unifi protect APIs used to download an event video:

/proxy/protect/api/video/download?camera=672497ae0047d903e4001f4f&filename=xxxx.mp4&&lens=0&start=1733688041746&type=rotating
once 200 returned

/proxy/protect/api/video/download?camera=672497ae0047d903e4001f4f&filename=xxxx.mp4
Note the download one is without start / end params hence the name needs to be unique.

@ianalexander Can you create a PR to change the get_camera_video() API function? Happy to help with the implementation

@ianalexander
Copy link
Author

Thanks for the kind feedback -- you all are the experts so I think it okay not to do retries. After thinking about it more, you're right that it doesn't really solve the underlying issue.

@ianalexander Can you create a PR to change the get_camera_video() API function? Happy to help with the implementation

Yes, I can take a stab. This will be my first contribution so I will definitely need your guidance on getting this merged.

Ideally, we should aim to reproduce this issue using the Protect Web UI or app and then escalate it to the Unifi team for resolution.

While I think that switching to the new API calls should work, I don't know for certain it will. @RaHehl in the event we do need to communicate with the Unifi team would you be able to do that?

@ianalexander
Copy link
Author

Well, I spoke too soon. I did successfully implement the prepare and download API calls, but to the same problem: prepare will randomly return a 500.

This curl command will maybe ~5-10% of the time return 500. It happens more frequently if I do ~3 or so quickly one after another:

curl \
-vvv \
-H "cookie: TOKEN=<snip>; Domain=<snip>; expires=Sun, 05 Jan 2025 20:53:59 GMT; HttpOnly; partitioned; Path=/; SameSite=none; Secure" \
-H "x-csrf-token: <snip>" \
"<snip>/proxy/protect/api/video/prepare?camera=6455ae9801c98e03e4001b8f&start=${$(($(date +%s000)-17000))}&end=${$(($(date +%s000)-5000))}&type=rotating&channel=0&filename=test123-${$(($(date +%s000)-5000))}.mp4"

Those variables in start and end will return a millisecond timestamp 17 seconds in the past, and 5 seconds in the past, respectively, for a 12 second clip.

Is this a "my system" problem? Or can any of you reproduce this issue?

@ianalexander
Copy link
Author

ianalexander commented Dec 10, 2024

After SSH'ing into by UDM and tail'ing logs while making the curl request above that returns 500 occasionally here are the error messages I see:

/volume1/.srv/unifi-protect/logs/errors.log:

2024-12-10T03:54:25.614Z - error: Failed to prepare video export No files found matching criteria. 
Error: No files found matching criteria.
    at /usr/share/unifi-protect/app/webpack:/unifi-protect/src/middleware/video/export/index.ts:106:58
2024-12-10T03:54:25.616Z - error: 18 ms | 500 GET /api/video/prepare?camera=6455ae9801c98e03e4001b8f&start=1733802853000&end=1733802863000&type=rotating&channel=0&filename=test123-1733802863000.mp4

/volume1/.srv/unifi-protect/logs/video.export.log:

2024-12-10T03:56:13.315Z - info: getRotatingPlaybacks  #3 old files start=1733802961000 end=1733802971000 sortDirection=ASC

/volume1/.srv/unifi-protect/logs/api.log:

2024-12-10T03:56:13.324Z - error: Failed to prepare video export No files found matching criteria. 
Error: No files found matching criteria.
    at /usr/share/unifi-protect/app/webpack:/unifi-protect/src/middleware/video/export/index.ts:106:58

This is so strange because it seems to happen randomly. Sometimes it will peacefully export, other times it will start throwing 500.

edit: Accidentally closed, then re-opened

@marcoisabella
Copy link

marcoisabella commented Dec 13, 2024

Have you notice that the issue happens only when you are connected to the same network where your HA and UI are?
Basically internal network vs external.

Try. I have the same exact issue and I start thinking it is a dns/reverse proxy issue.

@marcoisabella
Copy link

marcoisabella commented Dec 13, 2024

Scratch that, I added a 2s delay and now it seems to work.

Edit: it still happens. Not sure what else to try.

@ianalexander
Copy link
Author

Still can't pin down the issue here. Two more observations

  • The media browser works: Using the media browser to browse to Media -> Unifi Protect -> Camera -> Motion Events -> Last 24 hours shows a list that upon clicking, generates a /api/unifiprotect/video/<nvr>/<camera>/<start>/<end>?authSig=<sig> URL that seems to work every single time. This makes me think the issue lies in the arguments that are passed to the view, as the underlying request seems to work.
  • The media browser truncates microseconds: See the code snippet below. I tested copying this logic into the /video/<event_id> method and perhaps it helped a little, but did not completely solve the issue.

return url_format.format(
nvr_id=event.api.bootstrap.nvr.id,
camera_id=event.camera_id,
start=event.start.replace(microsecond=0).isoformat(),
end=event.end.replace(microsecond=0).isoformat(),
)

With this in mind, I think this narrows down the issue to one of two potential root causes:
A. Bad URL arguments: The arguments passed to the /api/ method cause a request to be generated that triggers a 500. Perhaps something to do with truncating microseconds? But when I tested truncating the microseconds in the /video/<event_id> method, this didn't solve it.
B. Race condition: Something about triggering the URL too close to the event finishing causes that 500 on the server.

I no longer think the issue lies in get_camera_video(), because seemingly the media browser can use this method fine without any issue.

@marcoisabella
Copy link

Do you think there will be a fix on the UniFi Protect integration, or it is something we need to figure out on our own?

@ianalexander
Copy link
Author

I've spent more time debugging this -- but can't seem to figure out exactly what the issue is.

I've submitted a PR to the uiprotect integration that adds better error handling and logging which I can use to help get more information and try and figure out where the underlying issue is.

As I mentioned above, today uiprotect fails silently if the protect server returns any status code other than 200. With this PR, an irregular status code will raise an error, and hopefully this will help others try and debug this issue and identify the root cause.

I still believe the two potential causes I wrote in my last comment are the best guesses, but I am unable to reliably reproduce the issue: #132524 (comment)

For anyone else who has this issue: the first step to fixing is to find a reliable way to reproduce the issue. Any help in this area is appreciated!

@issue-triage-workflows
Copy link

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates.
Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍
This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

@issue-triage-workflows issue-triage-workflows bot closed this as not planned Won't fix, can't repro, duplicate, stale Mar 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants