Came in to work today to find a VM stuck at the “In Progress” status while taking a snapshot. We use vDR as a complement to our backup plans, and vDR had the unfortunate task of calling the snapshot that is now hung.
The official error read “An error occurred while quiescing the virtual machine. See the virtual machine’s event log for details.” One problem with that: I couldn’t log into the system. The snapshot was far enough along to freeze the IO, so I had to jump into the CLI and kill the task.
To kill the task for a VM, jump into the CLI for the host (in this case it was through the iDRAC and local terminal… bad, I know) and run: ps | grep vmx to list the running processes, filtered down to the vmx ones.
Locate the Parent Process ID (the second column) for the hung VM, and run: kill *parent process ID* to end the process. In this case, it was: kill 465724
***DISCLAIMER: Be very careful doing this. If you don’t kill the proper process, it can do harm to your ESXi host.***
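If you'd rather not eyeball the ps output, here is a rough sketch of the same lookup as a small shell function. It assumes the VM's name shows up in the ps lines for its vmx processes, which is worth confirming on your own host first. The function name vmx_parent_ids and the VM name sqlserver01 are made up for illustration. It only prints candidate IDs; the kill itself is still left to you.

```shell
# Print the second column of every `ps` line that mentions both "vmx"
# and the given VM name. Reads `ps` output on stdin, so the filter can
# be checked before anything is killed.
vmx_parent_ids() {
    vm_name="$1"
    grep vmx | grep -F -- "$vm_name" | awk '{ print $2 }' | sort -u
}

# Usage on the host ("sqlserver01" is a placeholder VM name):
#   ps | vmx_parent_ids sqlserver01
# Review the printed ID, then end the process by hand:
#   kill <ID>
```

If this prints more than one ID, or none at all, stop and go back to reading the raw ps output rather than guessing.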
Right away, the task that was stuck in progress should change to a status of “The attempted operation cannot be performed in the current state (Powered off).”
Now, it’s time to check out that error. In this case I received a “Volume Shadow Copy Service error: Unexpected error DeviceIoControl” with the rest of the error seemingly pointing at the generic floppy drive. I know this is the error because it’s pointing dead at the VMware Snapshot Provider, in a state of “DoSnapshotSet”.
That’s incredibly weird. So I uninstall both the floppy drive and its controller, and also remove it from the VM’s settings while it’s powered off. I boot the system back up, and the floppy drive has reinstalled itself. Very odd, so I just uninstall the drive and then disable the controller.
So it’s snapshot time again, right? Well, not really. I retried the snapshot, and it froze again. Time for some googling, after I kill the parent process for the VM again.
What I came up with is that there is apparently some issue with Windows Server 2008 R2 systems running SQL 2008. This was a frequent topic across many VMware Communities posts; however, no one really ever had an answer on what was going on inside the VM to cause this problem. I know we have 4 or 5 SQL servers running, and this is the first system we’ve run into this problem on.
Anyways, the best workaround I found was to disable the disk.EnableUUID parameter on the VM. Please note that by disabling this, you effectively disable VSS for the snapshot (i.e. no quiescing). So I maintain that this is only a workaround and not yet a true solution.
To do this, shut down the VM. Right click it, choose Edit Settings, open the Options tab, and click on “Configuration Parameters”.
In the Configuration Parameters pop-up screen, look for the “disk.EnableUUID” setting and change the value to read “false”.
Click OK a couple times and boot the system up. Once it’s booted up, try taking a snapshot with the “Quiesce guest file system” option checked. This time, everything was successful. I ran the test and also had the vDR appliance run another snapshot to get that one up to date.
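The Configuration Parameters dialog stores its entries in the VM’s .vmx file, so one way to double-check that the change stuck is to read the value back from the host CLI. This is a minimal sketch, assuming the entry is written in the usual key = "value" form. The function name enable_uuid_value and the datastore path in the usage comment are placeholders, not values from my environment.

```shell
# Print the disk.EnableUUID value found in a .vmx file, or "unset" if
# the key is not present. The key is matched case-insensitively, and
# the last matching line wins.
enable_uuid_value() {
    vmx_file="$1"
    awk -F'"' '
        tolower($1) ~ /^disk\.enableuuid[ \t]*=/ { v = $2 }
        END { print (v == "" ? "unset" : v) }
    ' "$vmx_file"
}

# Usage (hypothetical path, substitute your own datastore and VM folder):
#   enable_uuid_value /vmfs/volumes/datastore1/sqlserver01/sqlserver01.vmx
```

This only reads the file. I’d still make the change itself through Edit Settings with the VM powered off, as described above.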
Hopefully I can do some more research and turn up some better answers. At worst, I’ll create a support ticket and see if VMware Support can point me in a better direction.