feat: Added initial working "Health Check" feature #5720

XGhozt · 2025-03-23T11:56:14Z

[x] I have read and understand the pull request rules.

Description

I've added a new option under Settings -> Notifications called "Health Check". Admins can select a monitor to be designated as the health check monitor. If the selected monitor is offline, all notifications will be paused and downtime ignored. This is ideal for monitoring connectivity to the internet. Particularly for homelab administrators who want to monitor services and not get spammed when their internet is down.

Fixes #774

Type of change

Please delete any options that are not relevant.

User interface (UI)
New feature (non-breaking change which adds functionality)
This change requires a documentation update

Checklist

My code follows the style guidelines of this project
I ran ESLint and other linters for modified files
I have performed a self-review of my own code and tested it
I have commented my code, particularly in hard-to-understand areas (including JSDoc for methods)
My changes generates no new warnings
My code needed automated testing. I have added them (this is optional task)

Screenshots (if any)

CommanderStorm

Please in the future open a discussion beforehand. Lets take your proposal as a design discussion.
Lets have this discussion now.

Checking before the monitoring loop if id == UP (buggy) of the last entry of the heartbeat table does not work and I am not sure if this is the right approach.

Could you have a look at these pointers:

In v2, the aggregator does aggregate heatbeats to get around limitations of sqlite and improve performance. Please look up: if we aggregate heatbeats, does the table might get emptied fully, or just partially? If it is fully, your approach does not work and we would need to add a marker.
make sure that the monitors get visually paused and it is clear to the user that this is not unpausable. In the current design, they just stop

This brings me to how I think we should build this:

at the end of the monitoring loop, we should check if we are the heartbeat monior. If we are, and a marker in the Aggregator of each other monitor should be made that a node is paused by the sytem until the next scheduled check. The graph should reflect this
for status changes (be it maintenance, pausing, resuming) we should add a notification that the system level healthcheck has been adivated or something similar.

This being a larger change, I would like to see unittests.

Since that would be a major feature change, I would also like to see a senior maintainer have a look at the updated proposal (after looking at what I suggested and the existing code, how does this play togeter? What is best for users/performance/..) before you implement it

XGhozt · 2025-03-25T04:54:04Z

Thanks @CommanderStorm -- I just recently started using Uptime Kuma, so I am also new to the code base. I'd happily open a discussion if you point me in the right direction. The Github Discussions for this project are disabled. If there was some other process I missed, again I am new to the Uptime Kuma. Any direction or insight would be great, I'd like to get this implemented and I'm not giving up!

Also, oops, the healthCheckMonitor.id === UP was suppose to be healthCheckMonitor.status === UP

make sure that the monitors get visually paused and it is clear to the user that this is not unpausable. In the current design, they just stop

My goal was to completely halt the entire system because we don't want to impact the uptime percentage or graphs while internet is down... right? The biggest problem is people like myself who self host in a homelab sometimes have connection issues and this triggers hundreds of notifications to go out and all the services to have a big red outages and less than 100% uptime.

So my question is, should we add a marker at all? It can't actually check the monitor if the internet is down, and therefor we'd just be filling the history with essentially empty heartbeats. What would adding hundreds of heartbeats on every monitor accomplish? I am asking so I understand the goal, I'm confident I can implement whatever the Uptime Kuma community wants.

People were writing scripts to shut down the container completely just to avoid the notifications and impacts to the monitor history. So my approach was.. direct. However, I was unsure about querying the last heartbeat because I believe this also ignores the configuration of the monitor.

The reason I did it this way is because I poked around to see how Uptime Kuma determines what that little "Up" or "Down" tag is shown when viewing a monitor, and it's just referencing statusList() which looks at the lastHeartbeatList object which ultimately checks lastHeartBeat.status === UP. So, is there a better way to do this?

for status changes (be it maintenance, pausing, resuming) we should add a notification that the system level healthcheck has been adivated or something similar.

100% agree with this. I was actually considering it but I didn't see anything else similar doing this already. My thought was a full-width bar the top of the page. I still am not clear on the best way to determine if a monitor is actually down.

This being a larger change, I would like to see unittests.

No problem, I'll add this.

XGhozt · 2025-03-25T06:03:36Z

I've added the alert, but I'm curious if there is a better way to check the status.
(In my example, "Internet" is the name of the monitor)

Added initial working "Health Check" feature

18a2c81

XGhozt mentioned this pull request Mar 23, 2025

ping reliable service to prevent false DOWN on internet connection loss #774

Open

XGhozt changed the title ~~Added initial working "Health Check" feature~~ feat: Added initial working "Health Check" feature Mar 23, 2025

CommanderStorm requested changes Mar 24, 2025

View reviewed changes

CommanderStorm added feature-request Request for new features to be added A:notifications Issues or PRs related to notifications A:settings Issues or PRs related to Settings page and application configration labels Mar 24, 2025

CommanderStorm marked this pull request as draft March 24, 2025 23:33

XGhozt added 2 commits March 24, 2025 21:32

Merge branch 'louislam:master' into health-check

4042085

Fixed typo

b8f3541

Added health check alert when the health check monitor is down

7a94fc8

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Added initial working "Health Check" feature #5720

feat: Added initial working "Health Check" feature #5720

XGhozt commented Mar 23, 2025

CommanderStorm left a comment

XGhozt commented Mar 25, 2025

XGhozt commented Mar 25, 2025

feat: Added initial working "Health Check" feature #5720

Are you sure you want to change the base?

feat: Added initial working "Health Check" feature #5720

Conversation

XGhozt commented Mar 23, 2025

Description

Type of change

Checklist

Screenshots (if any)

CommanderStorm left a comment

Choose a reason for hiding this comment

XGhozt commented Mar 25, 2025

XGhozt commented Mar 25, 2025