Yesterday I got an alert that a virtual machine needed consolidation. I opened my vSphere Client and found it was in fact running on -000017.vmdk, and looking at the directory I saw all of the snapshot files. However, when I opened Snapshot Manager it showed no snapshots. My infrastructure uses Veeam B&R to run backups, and traditionally if there is a snapshot on a VM the backup will fail. In my case, perhaps because VMware only kind of knew about the snapshots, the backups had been working. From the looks of it, when Veeam asked VMware’s API to consolidate, the consolidation wasn’t happening properly.
To delete all of the snapshots and consolidate down to one flat disk, I found VMware KB article 1002310, which states that this can be resolved from the GUI or CLI by taking a new snapshot and then deleting all snapshots. If that doesn’t work, it suggests quiescing the guest file system when taking the snapshot and trying again. Well, this plan didn’t do the trick for me. I tried it with the VM off as suggested and that also did not work. Fortunately this was a VM that could be off without user impact.
The next suggestion was to clone the VM, use the clone, and decommission the VM with the snapshot problem. The last suggested “workaround” in the KB was to use vCenter Converter Standalone to essentially treat the VM like a P2V, because the converter only sees the disks from the guest OS perspective. I wasn’t giddy about doing that. I decided to go with the clone method and it worked well for me.
So, I thought I would share this KB with my blog crew. Be sure to consider data changes during the clone and take precautions such as disabling the network or shutting down the VM. For most production VMs this would probably need to be done during maintenance hours. I hope this is helpful for someone, but I imagine you may make it to the KB article just fine on your own. This client was using Veeam 6.5; I upgraded them to Veeam 7.0 R2 after this issue, and the backup worked properly last night.
The KB that shares all of this with details: Committing snapshots when there are no snapshot entries in the Snapshot Manager (1002310)
If you have a VMware VM that is running on snapshots that Snapshot Manager doesn’t show, you have a few ways to fix this.
- Create a new snapshot (check the Quiesce guest file system option) and then Delete All snapshots. If this doesn’t work while the VM is running, try again with the VM turned off if possible.
- Create a clone of the VM, use the cloned VM, and decommission the one with the snapshot problems.
- Run a P2V on the VM with vCenter Converter Standalone, use the converted VM, and decommission the one with the snapshot issues.
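If you would rather script the first option than click through it, here is a rough sketch in Python. It assumes you already have a pyVmomi `vim.VirtualMachine` object for the VM. The function name `commit_hidden_snapshots`, the snapshot name, and the `wait_for_task` parameter are my own placeholders, not something from the KB. In a real script you would probably pass in the `WaitForTask` helper from `pyVim.task`.

```python
def commit_hidden_snapshots(vm, wait_for_task, quiesce=True):
    """Take a throwaway snapshot, then delete all snapshots on the VM.

    vm            -- assumed to be a pyVmomi vim.VirtualMachine (or any object
                     exposing the same *_Task methods and runtime property)
    wait_for_task -- callable that blocks until a vSphere task finishes
                     (assumption: pyVim.task.WaitForTask in a real script)
    quiesce       -- ask VMware Tools to quiesce the guest file system;
                     only has an effect while the VM is powered on
    """
    # Step 1: create a new snapshot so vSphere re-reads the disk chain.
    wait_for_task(vm.CreateSnapshot_Task(
        name="consolidate-helper",  # hypothetical name, pick your own
        description="Temporary snapshot before Delete All",
        memory=False,               # no memory dump needed for this purpose
        quiesce=quiesce,
    ))
    # Step 2: the scripted equivalent of "Delete All" in Snapshot Manager.
    wait_for_task(vm.RemoveAllSnapshots_Task())
    # Step 3: if vSphere still flags the VM, request an explicit consolidation.
    if vm.runtime.consolidationNeeded:
        wait_for_task(vm.ConsolidateVMDisks_Task())
    # True means vSphere no longer reports that consolidation is needed.
    return not vm.runtime.consolidationNeeded
```

If this returns False, you are in the same spot I was, and the clone or converter options are the next things to try.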
Other helpful KB articles when troubleshooting locks and snapshot issues
- Investigating virtual machine file locks on ESXi/ESX (10051)
- Determining if there are leftover delta files or snapshots that VMware vSphere or Infrastructure Client cannot detect (1005049)
- Commands to monitor snapshot deletion in ESX 2.5/3.x/4.x and ESXi 3.x/4.x/5.x (1007566)
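To spot leftover delta files like the -000017.vmdk I ran into, you can check a directory listing for the snapshot naming pattern. The sketch below is only an illustration: the VM name `web01` and the function name `find_delta_disks` are made up, and it only looks at file names (a six-digit suffix, optionally followed by `-delta` or `-sesparse`), not at which disk the VM is actually attached to.

```python
import re

# Snapshot disks are named "<disk>-000001.vmdk", "<disk>-000002.vmdk", ...
# usually with a matching "-delta.vmdk" or "-sesparse.vmdk" data file.
DELTA_RE = re.compile(r"^(?P<base>.+)-(?P<num>\d{6})(?:-(?:delta|sesparse))?\.vmdk$")

def find_delta_disks(filenames):
    """Return {base_disk_name: sorted snapshot numbers} for a file listing."""
    found = {}
    for name in filenames:
        match = DELTA_RE.match(name)
        if match:
            found.setdefault(match["base"], set()).add(int(match["num"]))
    return {base: sorted(nums) for base, nums in found.items()}

# Hypothetical listing of a VM folder on the datastore.
listing = [
    "web01.vmx", "web01.vmdk", "web01-flat.vmdk",
    "web01-000016.vmdk", "web01-000016-delta.vmdk",
    "web01-000017.vmdk", "web01-000017-delta.vmdk",
]
print(find_delta_disks(listing))  # {'web01': [16, 17]}
```

If this turns up delta files while Snapshot Manager shows nothing, you are likely looking at the situation described in KB 1002310.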
Other helpful recommendations for keeping a handle on snapshots
First, vCenter should be set to send you notifications of alerts. Second, I recommend adding your own custom alarm for a VM that has been running on a snapshot too long. Snapshots are meant to be brief and, as I mentioned above, a leftover snapshot can even cause backups to fail. I have heard of VMs running on hundreds of snapshots, and you can imagine what a mess that would be to deal with. The maximum supported number of snapshots in a chain is 32, but it is recommended not to have more than 2-3. Depending on the IO of your VM, consolidation can be painful if you let it go too long. So, create an alarm at a high level in your environment to notify you when a VM is running on a large snapshot; this will also catch the case where you simply forget to consolidate. See this helpful KB: Configuring VMware vCenter Server to send alarms when virtual machines are running from snapshots (1018029). The threshold is tricky to determine, but considering that backups happen during off-hours, when IO should be relatively low, I would start with a lower threshold. Every environment is different, but I will set mine as low as 1GB at times.
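If you also want a report outside of vCenter alarms, the same thresholds are easy to apply in a small script. This is a minimal sketch with made-up inventory data. The function name `snapshot_warnings`, the tuple layout, and the VM names are assumptions for illustration, and how you collect the snapshot size and chain length is up to your own tooling. The defaults mirror the numbers above: a 1GB size threshold and no more than 3 snapshots in a chain.

```python
GIB = 1024 ** 3

def snapshot_warnings(vms, size_threshold_gb=1.0, max_chain=3):
    """Return (vm_name, reason) pairs for VMs whose snapshots need attention.

    vms -- iterable of (name, snapshot_bytes, chain_length) tuples
           (hypothetical layout; gather these however your tooling allows)
    """
    warnings = []
    for name, snapshot_bytes, chain_length in vms:
        # Flag snapshot data at or above the size threshold.
        if snapshot_bytes >= size_threshold_gb * GIB:
            warnings.append((name, f"snapshot data is {snapshot_bytes / GIB:.1f} GB"))
        # Flag chains longer than the recommended 2-3 snapshots.
        if chain_length > max_chain:
            warnings.append((name, f"{chain_length} snapshots in chain"))
    return warnings

# Hypothetical inventory: (name, snapshot bytes, snapshots in chain)
inventory = [
    ("web01", 5 * GIB, 17),
    ("db01", 0, 0),
    ("app01", 200 * 1024 ** 2, 4),
]
for vm_name, reason in snapshot_warnings(inventory):
    print(f"{vm_name}: {reason}")
```

Start with a low threshold like this and raise it if the report gets noisy for your environment.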