A while ago, the VMs that host my OpenShift platform were rebooted. For the most part, all nodes and containers came back gracefully. However, a few containers did not start because they had Persistent Volume Claims backed by SSD storage, and mounting those volumes kept failing with the following message:
MountVolume.MountDevice failed for volume "pvc-f6220425-22dd-11e8-ad31-000d3af45c1a" : azureDisk - mountDevice:FormatAndMount failed with failed to mount the volume as "ext4", it already contains mpath_member.
After some digging, I found the following: https://bugzilla.redhat.com/show_bug.cgi?id=1550271
Comment #14 in particular contains some critical steps that helped me to solve this issue.
On each of my app and infra nodes I followed these steps:
1) Add the “find_multipaths yes” line to /etc/multipath.conf:
$ cat /etc/multipath.conf
# LIO iSCSI
# TODO: Add env variables for tweaking
devices {
        device {
                vendor "LIO-ORG"
                user_friendly_names "yes"
                path_grouping_policy "failover"
                path_selector "round-robin 0"
                failback immediate
                path_checker "tur"
                prio "const"
                no_path_retry 120
                rr_weight "uniform"
        }
}
blacklist {
}
defaults {
        find_multipaths "yes"    #### ADD THIS LINE ####
}
2) Remove the device from the wwids file. Example:
$ multipath -w /dev/sdd
wwid '3600224800e3810d32614a31726daf7c8' removed
3) Restart multipath service:
$ systemctl restart multipathd
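Since these steps have to be repeated on every app and infra node, it can help to check first whether a node already has the setting from step 1. The following is only a minimal sketch: the function name has_find_multipaths is made up for illustration, and the pattern assumes the option sits on its own uncommented line, quoted or unquoted, as in the file shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of multipath-tools): succeeds if the given
# multipath.conf-style file has an uncommented find_multipaths yes line.
has_find_multipaths() {
    conf="${1:-/etc/multipath.conf}"
    # Accept both  find_multipaths "yes"  and  find_multipaths yes,
    # optionally followed by whitespace or a trailing comment.
    grep -Eq '^[[:space:]]*find_multipaths[[:space:]]+"?yes"?([[:space:]]|#|$)' "$conf"
}

# Example use on a node:
#   has_find_multipaths /etc/multipath.conf || echo "find_multipaths is not set"
```

If the check fails, the file still needs the edit from step 1 before you remove the wwid and restart multipathd.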
For step 2 above, these commands can help you find the device that needs to be removed:
$ lsblk
sdd                                   8:48   0  10G  0 disk
└─3600224808cbb2ebf8b08ee87764f24d8 253:3    0  10G  0 mpath
sde                                   8:64   0  10G  0 disk
└─3600224805e7c9eed650ebf3acdf00000 253:1    0  10G  0 mpath
$ multipath -ll
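On a node with many disks, picking the mpath entries out of the lsblk tree by eye gets tedious. Here is a small sketch that lists each disk currently claimed by multipath. It assumes your lsblk version supports the PKNAME (parent kernel name) output column, and the function name list_mpath_parents is invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: reads lines of the form "NAME TYPE PKNAME" on stdin
# (as produced by: lsblk -rno NAME,TYPE,PKNAME) and prints the parent disk
# and the multipath map name for every entry of type mpath.
list_mpath_parents() {
    awk '$2 == "mpath" && $3 != "" { print "/dev/" $3, $1 }'
}

# Example use on a node:
#   lsblk -rno NAME,TYPE,PKNAME | list_mpath_parents
```

Each line of output pairs a device path with the map name sitting on top of it. The device path is what you would pass to multipath -w in step 2, after confirming it really is the disk behind the failing volume.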