I deployed a mongodb-dev Helm chart with the 1.0.2 driver. After a successful deployment, I killed the server hosting the MongoDB to simulate a server failure and observe the recovery behavior.
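For reference, a rough sketch of the reproduction steps (the release name and chart path are placeholders, and the exact Helm syntax depends on the Helm version in use):

```shell
# Deploy the chart (chart source omitted; adjust for your setup).
# Helm 3 syntax shown; Helm 2 would be `helm install --name mongodb-dev <chart>`.
helm install mongodb-dev ./mongodb-dev

# Note which worker node the MongoDB POD landed on
kubectl get pods -o wide

# Then power off that worker node outside of Kubernetes (hypervisor / IPMI)
# to simulate the server failure described above.
```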
Based on my testing, the MongoDB database does not recover without significant manual intervention, because the Flex driver does not force an unmap / remap of the LUN to the new worker node.
Issue 1) Deleting the POD from the CLI hangs. K8S notices the delete, puts the POD into Terminating, and starts a new POD on another worker. The creation of that new POD fails because the Flex driver does not force-unmap the LUN - even with the worker node down and the POD in Terminating - and does not map the LUN to the new worker. The question is: why does the POD deletion hang - could that be in the Flex driver? A rough CLI sketch of what this looks like follows below.
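For illustration, roughly what this looks like from the CLI (the POD name is a placeholder); the only escape hatch I am aware of is a force delete, which is exactly the kind of manual intervention this issue is about:

```shell
# Delete hangs indefinitely while the node is down
kubectl delete pod mongodb-dev-0             # never returns; POD stays in Terminating

# The replacement POD on the other worker is stuck because the volume
# is still mapped to the dead node
kubectl get pods -o wide                     # new POD stuck in ContainerCreating
kubectl describe pod <new-pod-name>          # shows the volume attach/mount failure

# Known manual workaround: force-remove the POD object from the API server.
# This does NOT unmap the LUN on the dead node; the Flex driver would still
# have to handle the remap for the new POD to start.
kubectl delete pod mongodb-dev-0 --grace-period=0 --force
```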
Issue 2) No forced unmap / re-map.
The only way to get the MongoDB running again cleanly, without manual intervention, is to restart the "failed" worker node. At that point the "Delete POD" command completes and the LUN is successfully re-mapped. I believe that in real life it is somewhat unlikely that a server which "died" comes back within a short time frame, so any application relying on that MongoDB would be left hanging.
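After restarting the failed worker, recovery can be confirmed along these lines (names again are placeholders):

```shell
# Once the failed worker is back, the pending delete completes and the
# replacement POD can attach the volume
kubectl get pods -o wide -w                    # watch the old POD disappear and the new one go Running
kubectl get events --sort-by=.lastTimestamp    # volume detach/attach events as the LUN is re-mapped
```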
Screenshot of the new MongoDB POD log illustrating the issue: