Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JetStream stream source stalls/is out of sync after sourced stream is re-created for unrelated reason [v2.10.20] #6206

Open
facefirstMH opened this issue Dec 3, 2024 · 1 comment
Assignees
Labels
defect Suspected defect such as a bug or regression

Comments

@facefirstMH
Copy link

facefirstMH commented Dec 3, 2024

Observed behavior

Setup: basic hub spoke with leafnodes.
hub is a single jetstream enabled server in its own domain, so are the leaf nodes.
On the hub, there is a stream named match_events that sources from the match_events stream in each leaf node.

Symptoms:
Sourcing from match_events on one leafnode stopped working.
Despite new messages being added to it, no data is sent to match_events in the hub.

Cause:
I believe we ran into a crash during recovery that sounds similar to one fixed in a release candidate of 2.10.23.
In order to recover from this issue, we renamed the data directory, which resulted in data loss but allowed the system to become functional again.

On the leaf node with the issue, sequence numbers in the stream are less than 80.
Create time is right after the time when the jetstream data directory got renamed and everything started to seemingly function again.

On the hub node, stream info showed the edge connected with low lag.
But when subscribing to the consumer create message of that leaf, we noticed that the requested start seq no is 995 and hence way bigger than 80.
Meaning 995 was the sequence number the previous incarnation of match_events had on that leaf and 79 is the value of the current incarnation
This would explain why there seems to be no data flow ... yet.

Failed workaround:
We edited the match_events stream in the hub to remove the source corresponding to the leaf.
Then we re-added it. This did not seem to work as we recorded the consumer create messages and the sequence number remained at the old value it had before the source got removed & re-added.
I then edited the source to set the op_start_seq to 1. This didn't work either, as presumably a higher sequence number was still in memory.
A few hours later we restarted the hub server. After that the sequence number was reset to one.


Logs:

Hub consumer create messages:

[#13] Received on "$JS.edge-652efb2da425e854a1e8d203.API.CONSUMER.CREATE.match_events" with reply "$JSC.R.FEkNK9Is"
{"stream_name":"match_events","config":{"deliver_policy":"by_start_sequence","opt_start_seq":995,"ack_policy":"none","ack_wait":79200000000000,"max_deliver":1,"replay_policy":"instant","idle_heartbeat":1000000000,"flow_control":true,"deliver_subject":"deliver.edge-652efb2da425e854a1e8d203.match_events.S.l2lkXkVm","inactive_threshold":10000000000,"num_replicas":0,"direct":true},"action":""}

[#21] Received on "$JS.edge-652efb2da425e854a1e8d203.API.CONSUMER.CREATE.match_events" with reply "$JSC.R.3L70SiTW"
{"stream_name":"match_events","config":{"deliver_policy":"by_start_sequence","opt_start_seq":995,"ack_policy":"none","ack_wait":79200000000000,"max_deliver":1,"replay_policy":"instant","idle_heartbeat":1000000000,"flow_control":true,"deliver_subject":"deliver.edge-652efb2da425e854a1e8d203.match_events.S.qyIUgE5D","inactive_threshold":10000000000,"num_replicas":0,"direct":true},"action":""}

Leafnode match_events stream:

> nats stream ls -a
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                                           Streams                                                           │
├─────────────────────┬────────────────────────────────────────────┬─────────────────────┬───────────┬─────────┬──────────────┤
│ Name                │ Description                                │ Created             │ Messages  │ Size    │ Last Message │
├─────────────────────┼────────────────────────────────────────────┼─────────────────────┼───────────┼─────────┼──────────────┤
...
│ match_events        │                                            │ 2024-11-27 17:53:49 │ 79        │ 15 MiB  │ 25m10s       │
...
╰─────────────────────┴────────────────────────────────────────────┴─────────────────────┴───────────┴─────────┴──────────────╯

> nats stream info match_events
Information for Stream match_events created 2024-11-27 17:53:49

              Subjects: faceio.match.>
              Replicas: 1
               Storage: File

Options:

             Retention: Limits
       Acknowledgments: true
        Discard Policy: Old
      Duplicate Window: 2m0s
     Allows Msg Delete: true
          Allows Purge: true
        Allows Rollups: false

Limits:

      Maximum Messages: unlimited
   Maximum Per Subject: unlimited
         Maximum Bytes: unlimited
           Maximum Age: 7d0h0m0s
  Maximum Message Size: unlimited
     Maximum Consumers: unlimited

Cluster Information:

                  Name: edge-652efb2da425e854a1e8d203
                Leader: edge-652efb2da425e854a1e8d203

State:

              Messages: 79
                 Bytes: 15 MiB
        First Sequence: 1 @ 2024-11-27 18:13:14
         Last Sequence: 79 @ 2024-12-02 17:57:52
      Active Consumers: 1
    Number of Subjects: 79
>

Hub consumer create messages for failed workaround:
The first message was received after re-creating the source.
The second message was received after editing opt_start_seq of the source.
In both cases, opt_start_seq in the request remains at the last old value.

>nats sub "$JS.edge-652efb2da425e854a1e8d203.API.CONSUMER.CREATE.match_events"
10:44:28 Subscribing on $JS.edge-652efb2da425e854a1e8d203.API.CONSUMER.CREATE.match_events
[#1] Received on "$JS.edge-652efb2da425e854a1e8d203.API.CONSUMER.CREATE.match_events" with reply "$JSC.R.XoBsIZ61"
{"stream_name":"match_events","config":{"deliver_policy":"by_start_sequence","opt_start_seq":995,"ack_policy":"none","ack_wait":79200000000000,"max_deliver":1,"replay_policy":"instant","idle_heartbeat":1000000000,"flow_control":true,"deliver_subject":"deliver.edge-652efb2da425e854a1e8d203.match_events.S.58t1cta4","inactive_threshold":10000000000,"num_replicas":0,"direct":true},"action":""}
[#2] Received on "$JS.edge-652efb2da425e854a1e8d203.API.CONSUMER.CREATE.match_events" with reply "$JSC.R.h5dA0pYu"
{"stream_name":"match_events","config":{"deliver_policy":"by_start_sequence","opt_start_seq":995,"ack_policy":"none","ack_wait":79200000000000,"max_deliver":1,"replay_policy":"instant","idle_heartbeat":1000000000,"flow_control":true,"deliver_subject":"deliver.edge-652efb2da425e854a1e8d203.match_events.S.j4aQssSi","inactive_threshold":10000000000,"num_replicas":0,"direct":true},"action":""}

Hub consumer create messages after hub nats server restart show opt_start_seq=1.

[#7] Received on "$JS.edge-652efb2da425e854a1e8d203.API.CONSUMER.CREATE.match_events" with reply "$JSC.R.snChCWwx"
{"stream_name":"match_events","config":{"deliver_policy":"by_start_sequence","opt_start_seq":995,"ack_policy":"none","ack_wait":79200000000000,"max_deliver":1,"replay_policy":"instant","idle_heartbeat":1000000000,"flow_control":true,"deliver_subject":"deliver.edge-652efb2da425e854a1e8d203.match_events.S.6U6FCNWD","inactive_threshold":10000000000,"num_replicas":0,"direct":true},"action":""}
15:19:43 Disconnected due to: read tcp 127.0.0.1:32592->127.0.0.1:4222: wsarecv: An existing connection was forcibly closed by the remote host., will attempt reconnect
15:20:09 Reconnected [nats://ff-queue-user:LtJQaah4Y2PUGAUyx8_oLZ@127.0.0.1:4222]
[#8] Received on "$JS.edge-652efb2da425e854a1e8d203.API.CONSUMER.CREATE.match_events" with reply "$JSC.R.hwd0hutm"
{"stream_name":"match_events","config":{"deliver_policy":"by_start_sequence","opt_start_seq":1,"ack_policy":"none","ack_wait":79200000000000,"max_deliver":1,"replay_policy":"instant","idle_heartbeat":1000000000,"flow_control":true,"deliver_subject":"deliver.edge-652efb2da425e854a1e8d203.match_events.S.ZoqTXdlB","inactive_threshold":10000000000,"num_replicas":0,"direct":true},"action":""}

Hub match_events stream taken right after consumer create [#2] from above, showing start seqno 1

Information for Stream match_events created 2024-11-15 10:45:14

              Subjects: faceio.match.>
              Replicas: 1
               Storage: File

Options:

             Retention: Limits
       Acknowledgments: true
        Discard Policy: Old
      Duplicate Window: 2m0s
     Allows Msg Delete: true
          Allows Purge: true
        Allows Rollups: false

Limits:

      Maximum Messages: unlimited
   Maximum Per Subject: unlimited
         Maximum Bytes: unlimited
           Maximum Age: 7d0h0m0s
  Maximum Message Size: unlimited
     Maximum Consumers: unlimited

Replication:

               Sources: ...
                        match_events, Start Seq: 1, API Prefix: $JS.edge-652efb2da425e854a1e8d203.API, Delivery Prefix: deliver.edge-652efb2da425e854a1e8d203.match_events
                        ...

Source Information:

...

           Stream Name: match_events
                   Lag: 0
             Last Seen: 934ms
       Ext. API Prefix: $JS.edge-652efb2da425e854a1e8d203.API
  Ext. Delivery Prefix: deliver.edge-652efb2da425e854a1e8d203.match_events

...

State:

              Messages: 40,303
                 Bytes: 7.4 GiB
        First Sequence: 505,520 @ 2024-11-26 11:02:59
         Last Sequence: 545,822 @ 2024-12-03 11:02:56
      Active Consumers: 1
    Number of Subjects: 37,538


Leafnode nats-server logs showing restart loop:
nats server basically starts and exits right away. (within same second. different from being killed because it's taking too long)

[15052] 2024/11/27 17:43:00.113089 [INF] Starting nats-server
[15052] 2024/11/27 17:43:00.113621 [INF]   Version:  2.10.20
[15052] 2024/11/27 17:43:00.113621 [INF]   Git:      [7140387]
[15052] 2024/11/27 17:43:00.113621 [INF]   Cluster:  edge-652efb2da425e854a1e8d203
[15052] 2024/11/27 17:43:00.113621 [INF]   Name:     edge-652efb2da425e854a1e8d203
[15052] 2024/11/27 17:43:00.113621 [INF]   Node:     cSXjN80e
[15052] 2024/11/27 17:43:00.113621 [INF]   ID:       NCPMUJYVKW6V742BE4GN4RH2V6LCIFTB5MESCYDKJKURJMXCVQWJQXOQ
[15052] 2024/11/27 17:43:00.113671 [INF] Using configuration file: C:\Program Files\FaceFirst\FaceFirst-MessageBroker-Edge\conf\edge.conf
[15052] 2024/11/27 17:43:00.114322 [INF] Starting http monitor on 127.0.0.1:8223
[15052] 2024/11/27 17:43:00.114322 [INF] Starting JetStream
[15052] 2024/11/27 17:43:00.114827 [INF]     _ ___ _____ ___ _____ ___ ___   _   __  __
[15052] 2024/11/27 17:43:00.114884 [INF]  _ | | __|_   _/ __|_   _| _ \ __| /_\ |  \/  |
[15052] 2024/11/27 17:43:00.114884 [INF] | || | _|  | | \__ \ | | |   / _| / _ \| |\/| |
[15052] 2024/11/27 17:43:00.114884 [INF]  \__/|___| |_| |___/ |_| |_|_\___/_/ \_\_|  |_|
[15052] 2024/11/27 17:43:00.114884 [INF] 
[15052] 2024/11/27 17:43:00.114884 [INF]          https://docs.nats.io/jetstream
[15052] 2024/11/27 17:43:00.114884 [INF] 
[15052] 2024/11/27 17:43:00.114884 [INF] ---------------- JETSTREAM ----------------
[15052] 2024/11/27 17:43:00.114884 [INF]   Max Memory:      23.94 GB
[15052] 2024/11/27 17:43:00.114884 [INF]   Max Storage:     1.00 TB
[15052] 2024/11/27 17:43:00.114884 [INF]   Store Directory: "D:\ProgramData\FaceFirst\FaceFirst-MessageBroker-Edge\data-store\jetstream"
[15052] 2024/11/27 17:43:00.114884 [INF]   Domain:          edge-652efb2da425e854a1e8d203
[15052] 2024/11/27 17:43:00.114884 [INF] -------------------------------------------
[15052] 2024/11/27 17:43:00.115438 [INF]   Starting restore for stream 'FF-QUEUE-ACC > KV_enrollments'
[15052] 2024/11/27 17:43:00.166680 [INF]   Restored 69,478 messages for stream 'FF-QUEUE-ACC > KV_enrollments' in 51ms
[15052] 2024/11/27 17:43:00.166680 [INF]   Starting restore for stream 'FF-QUEUE-ACC > KV_media'
[15052] 2024/11/27 17:43:00.208948 [WRN] Filestore [KV_media] loadBlock error: message block data missing
[15052] 2024/11/27 17:43:00.211651 [WRN] Filestore [KV_media] Detected skew in subject-based total (64) vs raw total (58), rebuilding
[15052] 2024/11/27 17:43:00.238941 [INF]   Restored 58 messages for stream 'FF-QUEUE-ACC > KV_media' in 72ms
[15052] 2024/11/27 17:43:00.239938 [INF]   Starting restore for stream 'FF-QUEUE-ACC > KV_settings'
[15052] 2024/11/27 17:43:00.239938 [INF]   Restored 42 messages for stream 'FF-QUEUE-ACC > KV_settings' in 0s
[15052] 2024/11/27 17:43:00.240936 [INF]   Starting restore for stream 'FF-QUEUE-ACC > KV_visitors'
[17012] 2024/11/27 17:45:03.817424 [INF] Starting nats-server
[17012] 2024/11/27 17:45:03.818469 [INF]   Version:  2.10.20
[17012] 2024/11/27 17:45:03.818469 [INF]   Git:      [7140387]
[17012] 2024/11/27 17:45:03.818469 [INF]   Cluster:  edge-652efb2da425e854a1e8d203
[17012] 2024/11/27 17:45:03.818469 [INF]   Name:     edge-652efb2da425e854a1e8d203
[17012] 2024/11/27 17:45:03.818469 [INF]   Node:     cSXjN80e
[17012] 2024/11/27 17:45:03.818469 [INF]   ID:       NBZL4XFOC64CKXUWPM7ZCS7NNDI5X25LSMHXWZOCCB7XVM4UTOP6TD66
[17012] 2024/11/27 17:45:03.818469 [INF] Using configuration file: C:\Program Files\FaceFirst\FaceFirst-MessageBroker-Edge\conf\edge.conf
[17012] 2024/11/27 17:45:03.819454 [INF] Starting http monitor on 127.0.0.1:8223
[17012] 2024/11/27 17:45:03.819454 [INF] Starting JetStream
[17012] 2024/11/27 17:45:03.819977 [INF]     _ ___ _____ ___ _____ ___ ___   _   __  __
[17012] 2024/11/27 17:45:03.819977 [INF]  _ | | __|_   _/ __|_   _| _ \ __| /_\ |  \/  |
[17012] 2024/11/27 17:45:03.819977 [INF] | || | _|  | | \__ \ | | |   / _| / _ \| |\/| |
[17012] 2024/11/27 17:45:03.819977 [INF]  \__/|___| |_| |___/ |_| |_|_\___/_/ \_\_|  |_|
[17012] 2024/11/27 17:45:03.819977 [INF] 
[17012] 2024/11/27 17:45:03.819977 [INF]          https://docs.nats.io/jetstream
[17012] 2024/11/27 17:45:03.819977 [INF] 
[17012] 2024/11/27 17:45:03.819977 [INF] ---------------- JETSTREAM ----------------
[17012] 2024/11/27 17:45:03.819977 [INF]   Max Memory:      23.94 GB
[17012] 2024/11/27 17:45:03.819977 [INF]   Max Storage:     1.00 TB
[17012] 2024/11/27 17:45:03.819977 [INF]   Store Directory: "D:\ProgramData\FaceFirst\FaceFirst-MessageBroker-Edge\data-store\jetstream"
[17012] 2024/11/27 17:45:03.819977 [INF]   Domain:          edge-652efb2da425e854a1e8d203
[17012] 2024/11/27 17:45:03.819977 [INF] -------------------------------------------
[17012] 2024/11/27 17:45:03.821034 [INF]   Starting restore for stream 'FF-QUEUE-ACC > KV_enrollments'
[17012] 2024/11/27 17:45:03.878169 [INF]   Restored 69,478 messages for stream 'FF-QUEUE-ACC > KV_enrollments' in 57ms
[17012] 2024/11/27 17:45:03.878169 [INF]   Starting restore for stream 'FF-QUEUE-ACC > KV_media'
[17012] 2024/11/27 17:45:03.879175 [WRN] Filestore [KV_media] loadBlock error: message block data missing
[17012] 2024/11/27 17:45:03.882158 [WRN] Filestore [KV_media] Detected skew in subject-based total (64) vs raw total (58), rebuilding
[17012] 2024/11/27 17:45:03.911169 [INF]   Restored 58 messages for stream 'FF-QUEUE-ACC > KV_media' in 33ms
[17012] 2024/11/27 17:45:03.911169 [INF]   Starting restore for stream 'FF-QUEUE-ACC > KV_settings'
[17012] 2024/11/27 17:45:03.911169 [INF]   Restored 42 messages for stream 'FF-QUEUE-ACC > KV_settings' in 0s
[17012] 2024/11/27 17:45:03.912166 [INF]   Starting restore for stream 'FF-QUEUE-ACC > KV_visitors'
[14928] 2024/11/27 17:47:06.923509 [INF] Starting nats-server
[14928] 2024/11/27 17:47:06.924012 [INF]   Version:  2.10.20
[14928] 2024/11/27 17:47:06.924068 [INF]   Git:      [7140387]
[14928] 2024/11/27 17:47:06.924068 [INF]   Cluster:  edge-652efb2da425e854a1e8d203
[14928] 2024/11/27 17:47:06.924068 [INF]   Name:     edge-652efb2da425e854a1e8d203
[14928] 2024/11/27 17:47:06.924068 [INF]   Node:     cSXjN80e
[14928] 2024/11/27 17:47:06.924068 [INF]   ID:       NBNOJN5SXYFVUKV73CCNICSKU22K2TBCLHQW2GG5ATAQAHIWI2GMWYDA
[14928] 2024/11/27 17:47:06.924109 [INF] Using configuration file: C:\Program Files\FaceFirst\FaceFirst-MessageBroker-Edge\conf\edge.conf
[14928] 2024/11/27 17:47:06.924644 [INF] Starting http monitor on 127.0.0.1:8223
[14928] 2024/11/27 17:47:06.924644 [INF] Starting JetStream
[14928] 2024/11/27 17:47:06.925157 [INF]     _ ___ _____ ___ _____ ___ ___   _   __  __
[14928] 2024/11/27 17:47:06.925157 [INF]  _ | | __|_   _/ __|_   _| _ \ __| /_\ |  \/  |
[14928] 2024/11/27 17:47:06.925157 [INF] | || | _|  | | \__ \ | | |   / _| / _ \| |\/| |
[14928] 2024/11/27 17:47:06.925157 [INF]  \__/|___| |_| |___/ |_| |_|_\___/_/ \_\_|  |_|
[14928] 2024/11/27 17:47:06.925157 [INF] 
[14928] 2024/11/27 17:47:06.925157 [INF]          https://docs.nats.io/jetstream
[14928] 2024/11/27 17:47:06.925157 [INF] 
[14928] 2024/11/27 17:47:06.925157 [INF] ---------------- JETSTREAM ----------------
[14928] 2024/11/27 17:47:06.925157 [INF]   Max Memory:      23.94 GB
[14928] 2024/11/27 17:47:06.925157 [INF]   Max Storage:     1.00 TB
[14928] 2024/11/27 17:47:06.925157 [INF]   Store Directory: "D:\ProgramData\FaceFirst\FaceFirst-MessageBroker-Edge\data-store\jetstream"
[14928] 2024/11/27 17:47:06.925157 [INF]   Domain:          edge-652efb2da425e854a1e8d203
[14928] 2024/11/27 17:47:06.925157 [INF] -------------------------------------------
[14928] 2024/11/27 17:47:06.926284 [INF]   Starting restore for stream 'FF-QUEUE-ACC > KV_enrollments'
[14928] 2024/11/27 17:47:06.977355 [INF]   Restored 69,478 messages for stream 'FF-QUEUE-ACC > KV_enrollments' in 51ms
[14928] 2024/11/27 17:47:06.977355 [INF]   Starting restore for stream 'FF-QUEUE-ACC > KV_media'
[14928] 2024/11/27 17:47:06.978352 [WRN] Filestore [KV_media] loadBlock error: message block data missing
[14928] 2024/11/27 17:47:06.981374 [WRN] Filestore [KV_media] Detected skew in subject-based total (64) vs raw total (58), rebuilding
[14928] 2024/11/27 17:47:07.008302 [INF]   Restored 58 messages for stream 'FF-QUEUE-ACC > KV_media' in 31ms
[14928] 2024/11/27 17:47:07.009270 [INF]   Starting restore for stream 'FF-QUEUE-ACC > KV_settings'
[14928] 2024/11/27 17:47:07.009270 [INF]   Restored 42 messages for stream 'FF-QUEUE-ACC > KV_settings' in 0s
[14928] 2024/11/27 17:47:07.009270 [INF]   Starting restore for stream 'FF-QUEUE-ACC > KV_visitors'
[11864] 2024/11/27 17:49:10.535479 [INF] Starting nats-server
[11864] 2024/11/27 17:49:10.536034 [INF]   Version:  2.10.20
[11864] 2024/11/27 17:49:10.536034 [INF]   Git:      [7140387]
[11864] 2024/11/27 17:49:10.536034 [INF]   Cluster:  edge-652efb2da425e854a1e8d203
[11864] 2024/11/27 17:49:10.536034 [INF]   Name:     edge-652efb2da425e854a1e8d203
[11864] 2024/11/27 17:49:10.536034 [INF]   Node:     cSXjN80e
[11864] 2024/11/27 17:49:10.536034 [INF]   ID:       NCP5BHETIHWEQ6UR7G6GKEVMF23YRHRMDP56QJ5HHQX7ROZ7MB6PBXD3
[11864] 2024/11/27 17:49:10.536034 [INF] Using configuration file: C:\Program Files\FaceFirst\FaceFirst-MessageBroker-Edge\conf\edge.conf
[11864] 2024/11/27 17:49:10.537076 [INF] Starting http monitor on 127.0.0.1:8223
[11864] 2024/11/27 17:49:10.537076 [INF] Starting JetStream
[11864] 2024/11/27 17:49:10.537076 [INF]     _ ___ _____ ___ _____ ___ ___   _   __  __
[11864] 2024/11/27 17:49:10.537076 [INF]  _ | | __|_   _/ __|_   _| _ \ __| /_\ |  \/  |
[11864] 2024/11/27 17:49:10.537076 [INF] | || | _|  | | \__ \ | | |   / _| / _ \| |\/| |
[11864] 2024/11/27 17:49:10.537076 [INF]  \__/|___| |_| |___/ |_| |_|_\___/_/ \_\_|  |_|
[11864] 2024/11/27 17:49:10.537076 [INF] 
[11864] 2024/11/27 17:49:10.537076 [INF]          https://docs.nats.io/jetstream
[11864] 2024/11/27 17:49:10.537076 [INF] 
[11864] 2024/11/27 17:49:10.537076 [INF] ---------------- JETSTREAM ----------------
[11864] 2024/11/27 17:49:10.537585 [INF]   Max Memory:      23.94 GB
[11864] 2024/11/27 17:49:10.537585 [INF]   Max Storage:     1.00 TB
[11864] 2024/11/27 17:49:10.537585 [INF]   Store Directory: "D:\ProgramData\FaceFirst\FaceFirst-MessageBroker-Edge\data-store\jetstream"
[11864] 2024/11/27 17:49:10.537612 [INF]   Domain:          edge-652efb2da425e854a1e8d203
[11864] 2024/11/27 17:49:10.537612 [INF] -------------------------------------------
[11864] 2024/11/27 17:49:10.538210 [INF]   Starting restore for stream 'FF-QUEUE-ACC > KV_enrollments'
[11864] 2024/11/27 17:49:10.592603 [INF]   Restored 69,478 messages for stream 'FF-QUEUE-ACC > KV_enrollments' in 54ms
[11864] 2024/11/27 17:49:10.592603 [INF]   Starting restore for stream 'FF-QUEUE-ACC > KV_media'
[11864] 2024/11/27 17:49:10.593601 [WRN] Filestore [KV_media] loadBlock error: message block data missing
[11864] 2024/11/27 17:49:10.596593 [WRN] Filestore [KV_media] Detected skew in subject-based total (64) vs raw total (58), rebuilding
[11864] 2024/11/27 17:49:10.623551 [INF]   Restored 58 messages for stream 'FF-QUEUE-ACC > KV_media' in 31ms
[11864] 2024/11/27 17:49:10.624519 [INF]   Starting restore for stream 'FF-QUEUE-ACC > KV_settings'
[11864] 2024/11/27 17:49:10.624519 [INF]   Restored 42 messages for stream 'FF-QUEUE-ACC > KV_settings' in 0s
[11864] 2024/11/27 17:49:10.625523 [INF]   Starting restore for stream 'FF-QUEUE-ACC > KV_visitors'
[7624] 2024/11/27 17:51:13.625652 [INF] Starting nats-server
[7624] 2024/11/27 17:51:13.625652 [INF]   Version:  2.10.20
[7624] 2024/11/27 17:51:13.625652 [INF]   Git:      [7140387]
[7624] 2024/11/27 17:51:13.625652 [INF]   Cluster:  edge-652efb2da425e854a1e8d203
[7624] 2024/11/27 17:51:13.625652 [INF]   Name:     edge-652efb2da425e854a1e8d203
[7624] 2024/11/27 17:51:13.625652 [INF]   Node:     cSXjN80e
[7624] 2024/11/27 17:51:13.626155 [INF]   ID:       NAI7ED2WRL3F2QVFHDTH7VKDAMONEN5ONZHVLMFLMDV4GVT3VGRSHPKP
[7624] 2024/11/27 17:51:13.626160 [INF] Using configuration file: C:\Program Files\FaceFirst\FaceFirst-MessageBroker-Edge\conf\edge.conf
[7624] 2024/11/27 17:51:13.626671 [INF] Starting http monitor on 127.0.0.1:8223
[7624] 2024/11/27 17:51:13.626671 [INF] Starting JetStream
[7624] 2024/11/27 17:51:13.627179 [INF]     _ ___ _____ ___ _____ ___ ___   _   __  __
[7624] 2024/11/27 17:51:13.627179 [INF]  _ | | __|_   _/ __|_   _| _ \ __| /_\ |  \/  |
[7624] 2024/11/27 17:51:13.627179 [INF] | || | _|  | | \__ \ | | |   / _| / _ \| |\/| |
[7624] 2024/11/27 17:51:13.627179 [INF]  \__/|___| |_| |___/ |_| |_|_\___/_/ \_\_|  |_|
[7624] 2024/11/27 17:51:13.627179 [INF] 
[7624] 2024/11/27 17:51:13.627179 [INF]          https://docs.nats.io/jetstream
[7624] 2024/11/27 17:51:13.627179 [INF] 
[7624] 2024/11/27 17:51:13.627179 [INF] ---------------- JETSTREAM ----------------
[7624] 2024/11/27 17:51:13.627179 [INF]   Max Memory:      23.94 GB
[7624] 2024/11/27 17:51:13.627179 [INF]   Max Storage:     1.00 TB
[7624] 2024/11/27 17:51:13.627179 [INF]   Store Directory: "D:\ProgramData\FaceFirst\FaceFirst-MessageBroker-Edge\data-store\jetstream"
[7624] 2024/11/27 17:51:13.627179 [INF]   Domain:          edge-652efb2da425e854a1e8d203
[7624] 2024/11/27 17:51:13.627179 [INF] -------------------------------------------
[7624] 2024/11/27 17:51:13.628405 [INF]   Starting restore for stream 'FF-QUEUE-ACC > KV_enrollments'
[7624] 2024/11/27 17:51:13.678519 [INF]   Restored 69,478 messages for stream 'FF-QUEUE-ACC > KV_enrollments' in 50ms
[7624] 2024/11/27 17:51:13.678519 [INF]   Starting restore for stream 'FF-QUEUE-ACC > KV_media'
[7624] 2024/11/27 17:51:13.679484 [WRN] Filestore [KV_media] loadBlock error: message block data missing
[7624] 2024/11/27 17:51:13.682459 [WRN] Filestore [KV_media] Detected skew in subject-based total (64) vs raw total (58), rebuilding
[7624] 2024/11/27 17:51:13.710384 [INF]   Restored 58 messages for stream 'FF-QUEUE-ACC > KV_media' in 32ms
[7624] 2024/11/27 17:51:13.711381 [INF]   Starting restore for stream 'FF-QUEUE-ACC > KV_settings'
[7624] 2024/11/27 17:51:13.711381 [INF]   Restored 42 messages for stream 'FF-QUEUE-ACC > KV_settings' in 0s
[7624] 2024/11/27 17:51:13.711381 [INF]   Starting restore for stream 'FF-QUEUE-ACC > KV_visitors'

Expected behavior

The problem is that jetstream at the core does not differentiate between the old and new version of the stream match_events.

If the source stream where to receive and keep track of the remote stream's create time and included that time in the consumer create, the remote server can check if the time submitted differs from the create time of the stream and return an error instead.
This error could be the signal for the hub jetstream to start over by resetting the sequence number, obtain and update the new timestamp, create the new consumer again.

This probably has implications for mirroring as well. There a differing timestamp should probably result in deleting the current content prior to starting at 0 again.

Server and client version

2.10.20 & 2.10.23

Host environment

No response

Steps to reproduce

No response

@facefirstMH facefirstMH added the defect Suspected defect such as a bug or regression label Dec 3, 2024
@jnmoyne jnmoyne self-assigned this Dec 5, 2024
@wallyqs wallyqs changed the title Jetstream stream source stalls/is out of sync after sourced stream is re-created for unrelated reason Jetstream stream source stalls/is out of sync after sourced stream is re-created for unrelated reason [v2.10.20] Dec 11, 2024
@wallyqs wallyqs changed the title Jetstream stream source stalls/is out of sync after sourced stream is re-created for unrelated reason [v2.10.20] JetStream stream source stalls/is out of sync after sourced stream is re-created for unrelated reason [v2.10.20] Dec 11, 2024
@facefirstMH
Copy link
Author

This happened with nats-server v2.10.23 as well.

My earlier attempt at a workaround included a nats-server restart that appeared to have fixed it after re-creating the source.
Now I believe this was just a timing coincidence as that leaf nodes data probably was expired by ttl in the meantime.

What works reliably is delete the source, purge that leafs data, create the source again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
defect Suspected defect such as a bug or regression
Projects
None yet
Development

No branches or pull requests

3 participants