Replies: 3 comments 2 replies
-
@ospii we need details on what specifically your clients do. A public repo with an executable way to run this workload. All these logs say is that We cannot and will not guess what that might be without an executable example.
-
@ospii moved to a discussion until we have a way to reproduce.
-
@ospii you are very likely running into #12358. It will be fixed in 4.0.3.
-
Describe the bug
After upgrading to 4.0.2 some queues started to crash after 4-72 hours of runtime. Some queues might recover and some not. Attached logs from the hosts.
rabbit01.log is from the node handling clients. After the Erlang/RabbitMQ entries in the log, the queue is left in a state where basic.get causes an error. rabbit02.log and rabbit03.log are logs from the cluster members just hanging around.
The queue's strange state is also visible in the management UI: all values in the runtime metrics are "?" and "-1".
The queue sees an average rate of around 1 message per second around the clock, has a dead-letter exchange, and is a durable quorum queue.
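For anyone trying to recreate a similar queue, here is a minimal sketch of the declaration described above. The queue and exchange names are hypothetical, and the helper function is only an illustration; `x-queue-type` and `x-dead-letter-exchange` are the standard optional queue arguments.

```python
def quorum_queue_declaration(queue_name, dead_letter_exchange):
    """Build keyword arguments for an AMQP 0-9-1 queue.declare call."""
    return {
        "queue": queue_name,
        "durable": True,  # quorum queues are always durable
        "arguments": {
            "x-queue-type": "quorum",
            "x-dead-letter-exchange": dead_letter_exchange,
        },
    }


# Hypothetical names, for illustration only.
declaration = quorum_queue_declaration("orders", "orders.dlx")
print(declaration)
```

With the pika client (an assumption, the report does not say which client library is in use), this could be passed as `channel.queue_declare(**declaration)`.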
Temp fix: delete the queue. It gets redeclared and all is good until the next crash.
Also tried: a new vhost which has not seen previous RabbitMQ versions. No change.
rabbit01.log
rabbit02.log
rabbit03.log
Reproduction steps
After several rounds of up/downgrading between 4.0.2 and 3.13.7, rebuilding the cluster from scratch, deleting queues/vhosts and blasting it with several thousand messages per second on my test cluster, I was able to reproduce the error once, but of course without any notes on where exactly I was 🤕 I can continue trying to pinpoint it if needed.
Expected behavior
basic.get works for clients
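A rough sketch of the polling pattern in question, assuming a pika-style channel object (the actual client library is not stated here). The function name and the `max_messages` limit are hypothetical; `basic_get` and `basic_ack` follow pika's `BlockingChannel` interface, where `basic_get` returns `(None, None, None)` when the queue is empty.

```python
def drain_with_basic_get(channel, queue_name, max_messages=10):
    """Poll a queue with basic.get, ack each message, and return the bodies."""
    bodies = []
    for _ in range(max_messages):
        method, _properties, body = channel.basic_get(queue=queue_name, auto_ack=False)
        if method is None:
            break  # queue is empty, nothing more to fetch
        channel.basic_ack(delivery_tag=method.delivery_tag)
        bodies.append(body)
    return bodies


# Usage against a real broker (not executed here; host and queue name are assumptions):
#   import pika
#   connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
#   print(drain_with_basic_get(connection.channel(), "orders"))
```

On a healthy queue this returns whatever messages are ready; in the broken state described above, the `basic_get` call is where the error would surface.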
Additional context
I have zero experience with Erlang so everything you've read might be totally misinterpreted 😄
The cluster has three nodes running an el9 derivative, installed from the RPM repository
erlang-26.2.5.3-1.el9.x86_64
rabbitmq-server-4.0.2-1.el8.noarch (el8?)
Cluster statistics for 1 day
Publish rate for a day : 10/s
Churn created and closed : 1.3/s
About 350 AMQP 0-9-1 clients