I deployed a mongodb-dev Helm chart with the 1.0.2 driver. After a successful deployment, I killed the server hosting the MongoDB to simulate a server failure and observe the recovery behavior.
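For reference, a rough sketch of the reproduction steps (the release name and chart path are placeholders, and the exact Helm syntax depends on the Helm version in use):

```shell
# Deploy the chart (chart source omitted; adjust for your setup).
# Helm 3 syntax shown; Helm 2 would be `helm install --name mongodb-dev <chart>`.
helm install mongodb-dev ./mongodb-dev

# Note which worker node the MongoDB POD landed on
kubectl get pods -o wide

# Then power off that worker node outside of Kubernetes (hypervisor / IPMI)
# to simulate the server failure described above.
```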
Based on my testing, the MongoDB database does not recover without significant manual intervention, because the Flex driver does not force an unmap / remap of the LUN to the new worker node.
Issue 1) Deleting the POD from the CLI hangs. K8S notices the delete, puts the POD into Terminating, and starts a new POD on another worker. The creation of that new POD fails because the Flex driver does not force-unmap the LUN - even with the worker node down and the POD in Terminating - and does not map the LUN to the new worker. The question is: why does the POD deletion hang - could that be in the Flex driver? A rough CLI sketch of what this looks like follows below.
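For illustration, roughly what this looks like from the CLI (the POD name is a placeholder); the only escape hatch I am aware of is a force delete, which is exactly the kind of manual intervention this issue is about:

```shell
# Delete hangs indefinitely while the node is down
kubectl delete pod mongodb-dev-0             # never returns; POD stays in Terminating

# The replacement POD on the other worker is stuck because the volume
# is still mapped to the dead node
kubectl get pods -o wide                     # new POD stuck in ContainerCreating
kubectl describe pod <new-pod-name>          # shows the volume attach/mount failure

# Known manual workaround: force-remove the POD object from the API server.
# This does NOT unmap the LUN on the dead node; the Flex driver would still
# have to handle the remap for the new POD to start.
kubectl delete pod mongodb-dev-0 --grace-period=0 --force
```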
Issue 2) No forced unmap / re-map.
The only way to get the MongoDB running again cleanly, without manual intervention, is to restart the "failed" worker node. At that point the "Delete POD" command completes and the LUN is successfully re-mapped. I believe that in real life it is somewhat unlikely that a server which "died" comes back within a short time frame, so any application relying on that MongoDB would be left hanging.
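After restarting the failed worker, recovery can be confirmed along these lines (names again are placeholders):

```shell
# Once the failed worker is back, the pending delete completes and the
# replacement POD can attach the volume
kubectl get pods -o wide -w                    # watch the old POD disappear and the new one go Running
kubectl get events --sort-by=.lastTimestamp    # volume detach/attach events as the LUN is re-mapped
```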
Screenshot of the new MongoDB POD log illustrating the issue: