Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Added initial working "Health Check" feature #5720

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

XGhozt
Copy link

@XGhozt XGhozt commented Mar 23, 2025

[x] I have read and understand the pull request rules.

Description

I've added a new option under Settings -> Notifications called "Health Check". Admins can select a monitor to be designated as the health check monitor. If the selected monitor is offline, all notifications will be paused and downtime ignored. This is ideal for monitoring connectivity to the internet. Particularly for homelab administrators who want to monitor services and not get spammed when their internet is down.

Fixes #774

Type of change

Please delete any options that are not relevant.

  • User interface (UI)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Checklist

  • My code follows the style guidelines of this project
  • I ran ESLint and other linters for modified files
  • I have performed a self-review of my own code and tested it
  • I have commented my code, particularly in hard-to-understand areas (including JSDoc for methods)
  • My changes generates no new warnings
  • My code needed automated testing. I have added them (this is optional task)

Screenshots (if any)

image

@XGhozt XGhozt changed the title Added initial working "Health Check" feature feat: Added initial working "Health Check" feature Mar 23, 2025
Copy link
Collaborator

@CommanderStorm CommanderStorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please in the future open a discussion beforehand. Lets take your proposal as a design discussion.
Lets have this discussion now.

Checking before the monitoring loop if id == UP (buggy) of the last entry of the heartbeat table does not work and I am not sure if this is the right approach.

Could you have a look at these pointers:

  • In v2, the aggregator does aggregate heatbeats to get around limitations of sqlite and improve performance. Please look up: if we aggregate heatbeats, does the table might get emptied fully, or just partially? If it is fully, your approach does not work and we would need to add a marker.
  • make sure that the monitors get visually paused and it is clear to the user that this is not unpausable. In the current design, they just stop

This brings me to how I think we should build this:

  • at the end of the monitoring loop, we should check if we are the heartbeat monior. If we are, and a marker in the Aggregator of each other monitor should be made that a node is paused by the sytem until the next scheduled check. The graph should reflect this
  • for status changes (be it maintenance, pausing, resuming) we should add a notification that the system level healthcheck has been adivated or something similar.

This being a larger change, I would like to see unittests.

Since that would be a major feature change, I would also like to see a senior maintainer have a look at the updated proposal (after looking at what I suggested and the existing code, how does this play togeter? What is best for users/performance/..) before you implement it

@CommanderStorm CommanderStorm added feature-request Request for new features to be added A:notifications Issues or PRs related to notifications A:settings Issues or PRs related to Settings page and application configration labels Mar 24, 2025
@CommanderStorm CommanderStorm marked this pull request as draft March 24, 2025 23:33
@XGhozt
Copy link
Author

XGhozt commented Mar 25, 2025

Thanks @CommanderStorm -- I just recently started using Uptime Kuma, so I am also new to the code base. I'd happily open a discussion if you point me in the right direction. The Github Discussions for this project are disabled. If there was some other process I missed, again I am new to the Uptime Kuma. Any direction or insight would be great, I'd like to get this implemented and I'm not giving up!

Also, oops, the healthCheckMonitor.id === UP was suppose to be healthCheckMonitor.status === UP

make sure that the monitors get visually paused and it is clear to the user that this is not unpausable. In the current design, they just stop

My goal was to completely halt the entire system because we don't want to impact the uptime percentage or graphs while internet is down... right? The biggest problem is people like myself who self host in a homelab sometimes have connection issues and this triggers hundreds of notifications to go out and all the services to have a big red outages and less than 100% uptime.

So my question is, should we add a marker at all? It can't actually check the monitor if the internet is down, and therefor we'd just be filling the history with essentially empty heartbeats. What would adding hundreds of heartbeats on every monitor accomplish? I am asking so I understand the goal, I'm confident I can implement whatever the Uptime Kuma community wants.

People were writing scripts to shut down the container completely just to avoid the notifications and impacts to the monitor history. So my approach was.. direct. However, I was unsure about querying the last heartbeat because I believe this also ignores the configuration of the monitor.

The reason I did it this way is because I poked around to see how Uptime Kuma determines what that little "Up" or "Down" tag is shown when viewing a monitor, and it's just referencing statusList() which looks at the lastHeartbeatList object which ultimately checks lastHeartBeat.status === UP. So, is there a better way to do this?

for status changes (be it maintenance, pausing, resuming) we should add a notification that the system level healthcheck has been adivated or something similar.

100% agree with this. I was actually considering it but I didn't see anything else similar doing this already. My thought was a full-width bar the top of the page. I still am not clear on the best way to determine if a monitor is actually down.

This being a larger change, I would like to see unittests.

No problem, I'll add this.

@XGhozt
Copy link
Author

XGhozt commented Mar 25, 2025

I've added the alert, but I'm curious if there is a better way to check the status.
(In my example, "Internet" is the name of the monitor)

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A:notifications Issues or PRs related to notifications A:settings Issues or PRs related to Settings page and application configration feature-request Request for new features to be added
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ping reliable service to prevent false DOWN on internet connection loss
2 participants