-
-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Added initial working "Health Check" feature #5720
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please in the future open a discussion beforehand. Lets take your proposal as a design discussion.
Lets have this discussion now.
Checking before the monitoring loop if id == UP
(buggy) of the last entry of the heartbeat
table does not work and I am not sure if this is the right approach.
Could you have a look at these pointers:
- In v2, the aggregator does aggregate heatbeats to get around limitations of sqlite and improve performance. Please look up: if we aggregate heatbeats, does the table might get emptied fully, or just partially? If it is fully, your approach does not work and we would need to add a marker.
- make sure that the monitors get visually paused and it is clear to the user that this is not unpausable. In the current design, they just stop
This brings me to how I think we should build this:
- at the end of the monitoring loop, we should check if we are the heartbeat monior. If we are, and a marker in the Aggregator of each other monitor should be made that a node is paused by the sytem until the next scheduled check. The graph should reflect this
- for status changes (be it maintenance, pausing, resuming) we should add a notification that the system level healthcheck has been adivated or something similar.
This being a larger change, I would like to see unittests.
Since that would be a major feature change, I would also like to see a senior maintainer have a look at the updated proposal (after looking at what I suggested and the existing code, how does this play togeter? What is best for users/performance/..) before you implement it
Thanks @CommanderStorm -- I just recently started using Uptime Kuma, so I am also new to the code base. I'd happily open a discussion if you point me in the right direction. The Github Discussions for this project are disabled. If there was some other process I missed, again I am new to the Uptime Kuma. Any direction or insight would be great, I'd like to get this implemented and I'm not giving up! Also, oops, the
My goal was to completely halt the entire system because we don't want to impact the uptime percentage or graphs while internet is down... right? The biggest problem is people like myself who self host in a homelab sometimes have connection issues and this triggers hundreds of notifications to go out and all the services to have a big red outages and less than 100% uptime. So my question is, should we add a marker at all? It can't actually check the monitor if the internet is down, and therefor we'd just be filling the history with essentially empty heartbeats. What would adding hundreds of heartbeats on every monitor accomplish? I am asking so I understand the goal, I'm confident I can implement whatever the Uptime Kuma community wants. People were writing scripts to shut down the container completely just to avoid the notifications and impacts to the monitor history. So my approach was.. direct. However, I was unsure about querying the last heartbeat because I believe this also ignores the configuration of the monitor. The reason I did it this way is because I poked around to see how Uptime Kuma determines what that little "Up" or "Down" tag is shown when viewing a monitor, and it's just referencing
100% agree with this. I was actually considering it but I didn't see anything else similar doing this already. My thought was a full-width bar the top of the page. I still am not clear on the best way to determine if a monitor is actually down.
No problem, I'll add this. |
[x] I have read and understand the pull request rules.
Description
I've added a new option under Settings -> Notifications called "Health Check". Admins can select a monitor to be designated as the health check monitor. If the selected monitor is offline, all notifications will be paused and downtime ignored. This is ideal for monitoring connectivity to the internet. Particularly for homelab administrators who want to monitor services and not get spammed when their internet is down.
Fixes #774
Type of change
Please delete any options that are not relevant.
Checklist
Screenshots (if any)