Group DOWN with all monitors UP #369

Open
RomRider opened this issue Apr 2, 2025 · 8 comments

Comments

@RomRider

RomRider commented Apr 2, 2025

Describe the bug
Related to #357 as it is not fixed on my side.

After some time the group goes DOWN for no apparent reason, even though all of its monitors are UP:

  • 2x DNS
  • 1x SQL
  • 3x API

The group timeout is 30s, the child monitors have a maximum timeout of 10s, and everything runs every minute. There is nothing in the logs. Restarting Kener fixes the issue for a while, and then it breaks again.

[Two screenshots attached]

Version
3.2.12

Environment
kubernetes

Database
sqlite

Expected behavior
Group should be UP
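
As a rough illustration of the expected behavior, a group's status can be thought of as the worst status among its children. The sketch below is not Kener's actual implementation: `Status`, `group_status`, and the monitor names are hypothetical, and treating a missing child result as DOWN is an assumption chosen to show one way a group could report DOWN while every child monitor is UP.

```python
from enum import Enum
from typing import Dict, Optional


class Status(Enum):
    UP = "UP"
    DEGRADED = "DEGRADED"
    DOWN = "DOWN"


# Higher number means worse status.
SEVERITY = {Status.UP: 0, Status.DEGRADED: 1, Status.DOWN: 2}


def group_status(children: Dict[str, Optional[Status]]) -> Status:
    """Return the worst status among the children.

    A value of None stands for a child whose result was not available
    before the group timeout; this sketch counts it as DOWN (assumption).
    """
    worst = Status.UP
    for status in children.values():
        effective = Status.DOWN if status is None else status
        if SEVERITY[effective] > SEVERITY[worst]:
            worst = effective
    return worst


# Hypothetical names mirroring the setup in this report: 2x DNS, 1x SQL, 3x API.
all_up = {name: Status.UP for name in
          ["dns-1", "dns-2", "sql-1", "api-1", "api-2", "api-3"]}
print(group_status(all_up).value)

# Same children, but one result never reached the group in time.
one_missing = dict(all_up, **{"api-3": None})
print(group_status(one_missing).value)
```

With every child UP the function returns UP, which matches the expectation here. If the group fails to read even one child result in time, the same aggregation yields DOWN although no child monitor is actually failing.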

@rajnandan1
Owner

How long does it take to reoccur?

@RomRider
Author

RomRider commented Apr 2, 2025

Once it starts behaving like that, it stays DOWN until Kener is restarted.
The container had been running for 37h before it broke again.

@rajnandan1
Owner

@RomRider have you set up automatic triggers and incident creation for your group monitor?

@RomRider
Author

RomRider commented Apr 2, 2025

No, just for the children inside. I'm not interested in being notified for the group.

@mschirrmeister

I am noticing the same issue.
I have 2 DNS monitors which are always UP, no issues. The group has both monitors in it and is on the home page. After my container has been up for a certain amount of time, the group stops working. I don't see anything in my container logs when I test the group.
Is there a debug setting that we can increase to see why it times out?

@rajnandan1
Owner

I have changed the timeout from exponential to linear and have kept it running for some days. Hopefully that should fix this. It will be released in v3.2.13.
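
The thread does not say how Kener computes these waits, so the following is only a generic sketch of the difference between an exponential and a linear schedule. The function names, the 500 ms base value, and the idea of fitting waits into the 30s group timeout are assumptions for illustration.

```python
from typing import List


def exponential_delays(base_ms: int, attempts: int) -> List[int]:
    """Delay doubles on every attempt: base, 2*base, 4*base, ..."""
    return [base_ms * 2 ** i for i in range(attempts)]


def linear_delays(base_ms: int, attempts: int) -> List[int]:
    """Delay grows by a fixed step: base, 2*base, 3*base, ..."""
    return [base_ms * (i + 1) for i in range(attempts)]


def attempts_within(delays: List[int], budget_ms: int) -> int:
    """Count how many consecutive waits fit inside a total time budget."""
    total = 0
    count = 0
    for delay in delays:
        if total + delay > budget_ms:
            break
        total += delay
        count += 1
    return count


BUDGET_MS = 30_000  # the 30s group timeout mentioned in this issue
BASE_MS = 500       # hypothetical base delay

exp = exponential_delays(BASE_MS, 10)
lin = linear_delays(BASE_MS, 10)
print(attempts_within(exp, BUDGET_MS), attempts_within(lin, BUDGET_MS))
```

Under these assumed numbers the exponential schedule fits 5 waits into the budget and the linear one fits 10, because doubling delays use up the budget much faster than evenly growing ones.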

@rajnandan1
Owner

I have released a new version. Please upgrade to it and let me know if the issue still remains. I will close this issue afterwards.

@mschirrmeister

mschirrmeister commented Apr 8, 2025

I upgraded this morning and can let you know in 2-3 days. Sometimes it took a while until it stopped working.

From what I can see, a group test also takes much longer than an individual monitor test. When I click on Test Monitor for my DNS monitors it takes around 3ms (+/-). When I run the test on the group monitor it takes 200-300ms.
