
Snapshot stuck “In Progress” Workaround

I came in to work today to find a VM stuck at the “In Progress” status while taking a snapshot. We use vDR (VMware Data Recovery) as a complement to our other backups, and vDR had the unfortunate task of requesting the snapshot that is now hung.

The official error read “An error occurred while quiescing the virtual machine. See the virtual machine’s event log for details.” One problem with that: I couldn’t log into the system. The snapshot was far enough along to freeze the IO, so I had to jump into the CLI and kill the task.

To kill the task for a VM, jump into the CLI for the host (in this case it was through the iDRAC and local terminal… bad, I know) and run ps | grep vmx to list the running processes, filtered down to the vmx ones.

Locate the Parent Process ID (the second column) for the hung VM, and run: kill *parent process ID* to end the process. In this case, it was: kill 465724
***DISCLAIMER: Be very careful doing this. If you kill the wrong process, you can do harm to your ESXi host.***
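As a rough sketch of the lookup step, here is a small shell helper that pulls the second column out of ps output for one VM. It assumes the VM's name appears somewhere on its vmx lines, which may not hold on every ESXi version, and both the function name find_vmx_parent_ids and the VM name MyVM are made up for illustration. It only prints candidate IDs; the kill itself stays a manual step.

```shell
#!/bin/sh
# Sketch only: list candidate parent process IDs for one VM from `ps` output.
# Assumption: column 2 is the parent process ID (as described above) and the
# VM name appears on the vmx lines. Verify both on your own host first.
find_vmx_parent_ids() {
    vm_name="$1"
    # Read `ps` output on stdin, keep vmx lines that mention the VM,
    # print the second column, and collapse duplicates.
    grep vmx | grep -F -- "$vm_name" | awk '{print $2}' | sort -u
}

# Intended use on the host (not executed by this sketch):
#   ps | grep vmx                      # eyeball the full list first
#   ps | find_vmx_parent_ids "MyVM"    # "MyVM" is a placeholder name
#   kill 465724                        # only after confirming the ID is right
```

If the helper prints more than one ID, or none, stop and go back to reading the ps output by eye rather than guessing.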

The task that was stuck in progress should immediately change to a status of “The attempted operation cannot be performed in the current state (Powered off).”

Now, it’s time to check out that error. In this case I received a “Volume Shadow Copy Service error: Unexpected error DeviceIoControl”, with the rest of the error seemingly pointing at the generic floppy drive. I know this is the relevant error because it points directly at the VMware Snapshot Provider, in a state of “DoSnapshotSet”.

That’s incredibly weird. So I uninstalled both the floppy drive and its controller, and also removed the floppy from the VM’s settings while it was powered off. When I booted the system back up, the floppy drive had reinstalled itself. Very odd, so I just uninstalled the drive and then disabled the controller.

So it’s snapshot time again, right? Well, not really. I retried the snapshot and it froze again. Time for some googling, after killing the parent process for the VM once more.

What I came up with is that there is apparently an issue with Windows Server 2008 R2 systems running SQL Server 2008. This was a frequent topic across many VMware Communities posts, but no one really ever had an answer on what was going on inside the VM to cause the problem. I know we have 4 or 5 SQL servers running, and this is the first system we’ve hit this problem on.

Anyway, the best workaround I found was to disable the disk.EnableUUID parameter on the VM. Please note that by disabling this, you effectively disable VSS application quiescing for the snapshot (i.e., the snapshot is no longer application-consistent). So I maintain that this is only a workaround and not yet a true solution.

To do this, shut down the VM. Right-click it, choose Edit Settings, go to the Options tab, and click “Configuration Parameters”.

In the Configuration Parameters pop-up, look for the “disk.EnableUUID” setting and change its value to “false”.
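If you want to double-check the change from the host CLI, a minimal sketch like the one below reads the value back out of the VM's .vmx file. It assumes the setting is stored as a line of the form disk.EnableUUID = "false", and the function name get_enable_uuid and the path in the usage comment are placeholders, not anything taken from the steps above.

```shell
#!/bin/sh
# Sketch only: print the current disk.EnableUUID value from a .vmx file.
# Assumption: entries are stored one per line as: key = "value"
# Prints nothing if the key is not present in the file.
get_enable_uuid() {
    vmx_file="$1"
    # Match the key case-insensitively, then strip everything except
    # the text between the double quotes.
    grep -i '^disk\.EnableUUID' "$vmx_file" \
        | sed 's/^[^=]*=[[:space:]]*"\(.*\)"[[:space:]]*$/\1/'
}

# Intended use on the host (the path is a placeholder, not executed here):
#   get_enable_uuid /vmfs/volumes/<datastore>/<vm>/<vm>.vmx
```

This only reads the file. Making the change through Edit Settings, as described above, is still the route I would take.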

Click OK a couple of times and boot the system up. Once it’s up, try taking a snapshot with the “Quiesce guest file system” option checked. This time, everything was successful. I ran the test myself and also had the vDR appliance run another snapshot to bring its backup up to date.

Hopefully I can do some more research and turn up some better answers. At worst, I’ll open a support ticket and see if VMware Support can point me in a better direction.

Published in VMware

8 Comments

  1. I’ve seen this kind of behaviour when iSCSI RDMs are mapped via the software initiator from within Windows, which is surprisingly common in SQL Server environments. This configuration isn’t supported by VMware, and most VM admins I know are aware of this limitation, yet this sometimes doesn’t get communicated down to the SQL admins, who have been known to map their own iSCSI LUNs without actually telling the VM admin.

    Hard to say if this is what you’re seeing, but I thought it was worth mentioning ..

    Regards
    John

    • Problem

      Unfortunately, that isn’t the problem I’m having here. All the disks are attached with VMDKs.
      I do appreciate you bringing it up though, definitely something that could be causing some chaos.

  2. Kristof De Mey

    Did you try disabling installation of the device via GPO? You can add any device to the list so it does not get installed. We are having the same issue as we speak; I’m going to try adding it to the list. I won’t be able to reboot before tomorrow. If this fails I’ll definitely try your solution (or workaround ;) ).

    • Kyle Ruddy

      The floppy?
      No, I have not tried that.

      Another idea that came out of some further research was to uninstall VMware Tools, reboot, reinstall VMware Tools, reboot, and try again.

      It worked on one system, so I have an outage scheduled for this weekend for another system. However, I’m seeing that snapshots work (both quiesced and not quiesced) but the snapshots which are done by vDR do not.

      There’s no information in the logs regarding the problem from either the VM side or the ESXi host side.

  3. Kristof De Mey

    Adding the floppy to the local GPO in win 2K8 did the trick for me. After reboot the floppy did not get installed again.

    You can do this by opening an MMC console, adding the GPO snap-in, and going to Local Computer Policy > Computer Configuration > Administrative Templates > System > Device Installation > Device Installation Restrictions > “Prevent installation of devices that match any of these device IDs”. Device IDs can be found in Device Manager.

    I must say that the snapshot issue was not the main issue here; what I really needed to solve was the Volume Shadow Copy Service whining about the floppy. But since this was also on Win2K8 R2 and SQL 2008 on VMware ESX ….

    Your hint pointed me in the right direction, thank you for that!

  4. How can I set disk.EnableUUID to false without shutting down the virtual machine?

  5. User

    Beware that there is also a VMware BIOS setting to disable the floppy. Just removing the device doesn’t set it to false.

  6. Pruthvi

    Did anyone face this issue on a Linux DB server?
