How to Clean Up After a Failed Hyper-V Checkpoint
Hyper-V’s checkpointing system typically does a perfect job of coordinating all its moving parts. However, it sometimes fails to completely clean up afterward. That can cause parts of a checkpoint, often called “lingering checkpoints”, to remain. You can easily take care of these leftover bits, but you must proceed with caution. A misstep can cause a full failure that will require you to rebuild your virtual machine. Read on to find out how to clean up after a failed checkpoint.
Avoid Mistakes When Cleaning up a Hyper-V Checkpoint
The most common mistake is starting your repair attempt by manually merging the AVHDX file into its parent. If you do that, then you cannot use any of Hyper-V’s tools to clean up. You will have no further option except to recreate the virtual machine’s files. The “A” in “AVHDX” stands for “automatic”. An AVHDX file is only one part of a checkpoint. A manual file merge violates the overall integrity of a checkpoint and renders it unusable. A manual merge of the AVHDX files should be almost the last thing that you try.
Also, do not start off by deleting the virtual machine. That may or may not trigger a cleanup of AVHDX files. Don’t take the gamble.
Before you try anything, check your backup application. If it is in the middle of a backup or indicates that it needs attention from you, get through all of that first. Interrupting a backup can cause all sorts of problems.
How to Clean Up a Failed Hyper-V Checkpoint
We have multiple options to try, from simple and safe to difficult and dangerous. Start with the easy things first and only try something harder if that doesn’t work.
Method 1: Delete the Checkpoint
If you can, right-click the checkpoint in Hyper-V Manager and use the Delete Checkpoint or Delete Checkpoint Subtree option.
This usually does not work on lingering checkpoints, but it never hurts to try.
Sometimes the checkpoint does not present a Delete option in Hyper-V Manager.
Sometimes, the checkpoint doesn’t even appear.
In any of these situations, PowerShell can usually see and manipulate the checkpoint.
Easiest way:
Remove-VMCheckpoint -VMName demovm
You can remove all checkpoints on a host at once:
Remove-VMCheckpoint -VMName *
If the script completes without error, you can verify in Hyper-V Manager that it successfully removed all checkpoints. You can also use PowerShell:
Get-VM | Get-VMCheckpoint
This clears up the majority of leftover checkpoints.
Method 2: Create a New Checkpoint and Delete It
Everyone has had one of those toilets that won’t stop running. Sometimes, you get lucky, and you just need to jiggle the handle to remind the mechanism that it needs to drop the flapper ALL the way over the hole. This method is something of a “jiggle the handle” fix. We just tap Hyper-V’s checkpointing system on the shoulder and remind it what to do.
In the Hyper-V Manager interface, right-click on the virtual machine (not a checkpoint), and click Checkpoint.
Now, at the root of all of the VM’s checkpoints, right-click on the topmost one and click Delete Checkpoint Subtree.
If this option does not appear, then our “jiggle the handle” fix won’t work. Try to delete the checkpoint that you just made, if possible.
The equivalent PowerShell is Checkpoint-VM -VMName demovm followed by Remove-VMCheckpoint -VMName demovm.
Regroup Before Proceeding
I do not know how pass-through disks or vSANs affect these processes. If you have any and the above didn’t work, I recommend shutting the VM down, disconnecting those devices, and starting the preceding steps over. You can reconnect your devices afterward.
If your checkpoint persists after trying the above, then you now face some potentially difficult choices. If you can, I would first try shutting down the virtual machine, restarting the Hyper-V Virtual Machine Management service, and trying the above steps while the VM stays off. This is a bit more involved “jiggle the handle” type of fix, but it’s also easy. If you want to take a really long shot, you can also restart the host. I do not expect that to have any effect, but I have not yet seen everything.
Take a Backup!
Up to this point, we have followed non-destructive procedures. The remaining fixes involve potential data loss. If possible, back up your virtual machine. Unfortunately, you might only have this problem because of a failed backup. In that case, export the virtual machine. I would personally shut the VM down beforehand so that the export captures the most recent data.
If you have a good backup or an export, then you cannot lose anything else except time.
Method 3: Reload the Virtual Machine’s Configuration
This method presents a moderate risk of data loss. It is easy to make a mistake. Check your backup! This is a more involved “jiggle the handle” type of fix.
Procedure:
1. Shut the VM down.
2. Take note of the virtual machine’s configuration file location, its virtual disk file names and locations, and the virtual controller positions that connect them (IDE 1 position 0, SCSI 2 position 12, etc.).
3. On each virtual disk, follow the AVHDX tree, recording each file name, until you find the parent VHDX. In Hyper-V Manager, do this with the Inspect button on the VM’s disk sheet, then the Inspect Parent button on each subsequent dialog box that opens.
4. Modify the virtual machine to remove all of its hard disks. If the virtual machine is clustered, you’ll need to do this in Failover Cluster Manager (or PowerShell). It will prompt you to create a checkpoint, but since you already tried that, I would skip it.
5. Export the virtual machine configuration.
6. Delete the virtual machine. If the VM is clustered, record any special clustering properties (like Preferred Owners), and delete it from Failover Cluster Manager.
7. Import the virtual machine configuration from step 5 into the location you recorded in step 2. When prompted, choose the Restore option.
8. This will bring back the VM with its checkpoints. Start at Method 1 and try to clean them up.
9. Reattach the VHDX. If, for some reason, the checkpoint process did not merge the disks, merge them manually first. If you need instructions, see the section after the final method.
10. Re-establish clustering, if applicable.
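Step 3 of Method 3 is really a linked-list walk. Each differencing disk stores only its parent’s path, so you can travel from child to parent but never the other way. The Python below is a model of that walk, not a Hyper-V tool. It assumes that you have already copied each file’s recorded parent into a plain dictionary, for example from the ParentPath property that Get-VHD reports, which is empty for a root VHDX. The function name `follow_chain` and the short file names are hypothetical.

```python
from typing import Dict, List


def follow_chain(active_disk: str, parents: Dict[str, str]) -> List[str]:
    """Return the chain from the active disk up to its root VHDX.

    `parents` maps each disk file to its recorded parent path; a root
    disk maps to an empty string. The result lists the active (newest)
    disk first and the root VHDX last.
    """
    chain: List[str] = []
    seen = set()
    current = active_disk
    while current:
        if current in seen:
            raise ValueError(f"parent loop detected at {current!r}")
        if current not in parents:
            # A child points at a file we never recorded: it may be missing or moved.
            raise FileNotFoundError(f"{current!r} is referenced in the chain but was not recorded")
        seen.add(current)
        chain.append(current)
        current = parents[current]  # step one level closer to the root
    return chain


# Hypothetical chain: the VM was writing to demo-data_c2.avhdx.
parents = {
    "demo-data.vhdx": "",
    "demo-data_c1.avhdx": "demo-data.vhdx",
    "demo-data_c2.avhdx": "demo-data_c1.avhdx",
}
print(follow_chain("demo-data_c2.avhdx", parents))
```

The resulting list, newest first, is the record that step 3 asks for. It is also the order in which the manual merge section near the end of this article works through the files.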
We use this method to give Hyper-V one final chance to rethink the error of its ways. After this, we start invoking manual processes.
Method 4: Restore the VM Configuration and Manually Merge the Disks
For this one to work, you need a single good backup of the virtual machine. It does not need to be recent. We only care about its configuration. This process carries a somewhat greater level of risk than Method 3. Once we introduce the manual merge process, the odds of human error increase dramatically.
1. Follow steps 1, 2, and 3 from Method 3 (turn the VM off and record configuration information). If you are not certain about the state of your backup, follow steps 5 and 6 (export and delete the VM). If you have confidence in your backup, or if you already worked through Method 3 and still have its export, then you can skip step 5 (export the VM).
2. Manually merge the VM’s virtual hard disk(s) (see the section after the methods for directions). Move the final VHDX(s) to a safe location. It can be temporary.
3. Restore the virtual machine from backup. I don’t think that I’ve ever seen a Hyper-V backup application that will allow you to only restore the virtual machine configuration files, but if one exists and you happen to have it, use that feature.
4. Follow whatever steps your backup application needs to make the restored VM usable. For instance, Altaro VM Backup for Hyper-V restores your VM as a clone with a different name and in a different location unless you override the defaults.
5. Remove the restored virtual disks from the VM (see step 4 of Method 3). Then, delete the restored virtual hard disk file(s) (they’re older and perfectly safe on backup).
6. Copy or move the merged VHDX file from step 2 back to its original location.
7. On the virtual machine’s Settings dialog, add the VHDX(s) back to the controllers and locations that you recorded in step 1.
8. Check on any supporting tools that identify VMs by ID instead of name (like backup). Rejoin the cluster, if applicable.
This particular method can be time-consuming since it involves restoring virtual disks that you don’t intend to keep. As a tradeoff, it retains the major configuration data of the virtual machine. Altaro VM Backup for Hyper-V will use a different VM ID from the original to prevent collisions, but it retains all of the VM’s hardware IDs and other identifiers such as the BIOS GUID. I assume that other Hyper-V backup tools exhibit similar behavior. Keeping hardware IDs means that your applications that use them for licensing purposes will not trigger an activation event after you follow this method.
Method 5: Rebuild the VM’s Configuration and Manually Merge the Disks
If you’ve gotten to this point, then you have reached the “nuclear option”. The risk of data loss is about the same as with Method 4. This process is faster to perform but has a lot of side effects that will almost certainly require more post-recovery action on your part.
1. Access the VM’s settings page and record every detail that you can from every property sheet. That means CPU, memory, network, disk, file location settings… everything. You definitely must gather the VHDX/AVHDX connection and parent-child-grandchild (etc.) order (Method 3, step 3). If your organization uses special BIOSGUID settings and other advanced VM properties, then record those as well. I assume that if such fields are important to you, you already know how to retrieve them. If not, you can use my free tool.
2. Check your backups and/or make an export.
3. Delete the virtual machine (see Method 3, step 6, and mind its note about failover clustering).
4. Recreate the virtual machine from the data that you collected in step 1, with the exception of the virtual hard disk files. Leave those unconnected for now.
5. Follow the steps in the next section to merge the AVHDX files into the root VHDX.
6. Connect the VHDX files to the locations that you noted in step 1 (see Method 4, step 7).
7. Check on any supporting tools that identify VMs by ID instead of name (like backup). Rejoin the cluster, if applicable.
8. In the VM’s guest operating system, check for and deal with any problems that arise from changing all of the hardware IDs.
Since you don’t have to perform a restore operation, it takes less time to get to the end of this method than Method 4. Unfortunately, swapping out all of your hardware IDs can have negative impacts. Windows will need to activate again, and it will not re-use the previous licensing instance. Other software may react similarly, or worse.
How to Manually Merge AVHDX Files
I put this part of the article near the end for a reason. I cannot over-emphasize that you should not start here.
Prerequisites for Merging AVHDX Files
If you precisely followed one of the methods above that redirected you here, then you already satisfied these requirements. Go over them again anyway. If you do not perform your merges in precisely the correct order, you will permanently orphan data.
- Merge the files in their original location. I had you merge the files before moving or copying them for a reason. Each differencing disk (AVHDX) contains the FULL path of its parent. If you relocate the files, they will throw errors when you attempt to merge them. If you can’t get them back to their original location, then read below for steps on updating each of the files.
- You will have the best results if you follow the chain in order: start with the newest file (the one the virtual machine was actively using) and work back toward the root VHDX. A differencing disk knows about its parent, but no parent virtual disk file knows about its children. If you merge them out of order, you can correct it, with some effort. But if any virtual hard disk file changes while it has children, you will have no way to recover the data in those children.
If merged in the original location and in the correct order, AVHDX merging poses no risks.
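To see why order matters, here is a toy Python model. It keeps only one idea from the real format: a child’s blocks override its parent’s blocks. Everything else about how VHDX files store data is left out. Each disk is a dictionary of block numbers, and a merge copies the child’s blocks into the parent. The names `read_view` and `merge_newest` are hypothetical.

The model shows two things:

- Merging newest-first leaves the root holding exactly what the VM saw.
- Changing a parent that still has a child silently alters what the child sees. This is the unrecoverable case described above.

```python
from typing import Dict, List

Disk = Dict[int, str]  # toy model: block number -> block contents


def read_view(chain: List[Disk]) -> Disk:
    """What the guest sees: start at the root (last entry) and let each newer disk override it."""
    view: Disk = {}
    for disk in reversed(chain):
        view.update(disk)
    return view


def merge_newest(chain: List[Disk]) -> List[Disk]:
    """Merge the newest disk (first entry) into its parent and drop it from the chain."""
    newest, parent = chain[0], chain[1]
    parent.update(newest)  # the child's blocks replace the parent's copies
    return chain[1:]


# Newest first: active disk, then an older checkpoint, then the root VHDX.
root: Disk = {0: "boot", 1: "old-data"}
older: Disk = {1: "newer-data"}
active: Disk = {2: "latest-log"}
chain = [active, older, root]

expected = read_view(chain)
while len(chain) > 1:
    chain = merge_newest(chain)
print(chain[0] == expected)  # True: the root now holds exactly what the VM saw

# The unrecoverable case: a parent changes while a child still depends on it.
base: Disk = {0: "boot", 1: "a"}
child: Disk = {2: "b"}
before = read_view([child, base])
base[0] = "changed"  # some write lands in the parent behind the child's back
print(read_view([child, base]) == before)  # False: the child's view silently changed
```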
Manual AVHDX Merge Process in PowerShell
I recommend that you perform merges with PowerShell because you can do it more quickly. Starting with the AVHDX that the virtual machine used as its active disk, issue the following command:
Merge-VHD -Path 'C:\LocalVMs\demovm\Virtual Hard Disks\demo-data_8EFF0E79-2711-4115-A704-45046FE6C536.avhdx'
Once that finishes, move to the next file in your list. Use tab completion! Double-check the file names from your list!
Once you have nothing left but the root VHDX, you can attach it to the virtual machine.
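Suppose you recorded each disk’s chain as a list, with the active AVHDX first and the root VHDX last. The commands are then mechanical: one Merge-VHD per AVHDX, in list order, and never one on the root itself.

The small Python helper below only builds the command text for you to review. It does not run anything. The function names `ps_quote` and `merge_commands` and the root file name are hypothetical. The helper wraps each path in PowerShell single quotes and doubles any embedded single quote, which is how PowerShell escapes a quote inside a single-quoted string.

```python
from typing import List


def ps_quote(path: str) -> str:
    """Quote a path as a PowerShell single-quoted string literal."""
    return "'" + path.replace("'", "''") + "'"


def merge_commands(chain: List[str]) -> List[str]:
    """Build one Merge-VHD command per differencing disk, newest first.

    `chain` lists the active AVHDX first and the root VHDX last; the root
    gets no command because there is nothing left to merge it into.
    """
    if not chain or not chain[-1].lower().endswith(".vhdx"):
        raise ValueError("the last entry must be the root .vhdx file")
    for path in chain[:-1]:
        if not path.lower().endswith(".avhdx"):
            raise ValueError(f"expected a differencing disk, got {path!r}")
    return [f"Merge-VHD -Path {ps_quote(path)}" for path in chain[:-1]]


chain = [
    r"C:\LocalVMs\demovm\Virtual Hard Disks\demo-data_8EFF0E79-2711-4115-A704-45046FE6C536.avhdx",
    r"C:\LocalVMs\demovm\Virtual Hard Disks\demo-data.vhdx",  # hypothetical root name
]
for line in merge_commands(chain):
    print(line)
```

Generating the text rather than typing it removes one source of typos. You should still compare every printed path against your list before running it.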
Manual AVHDX Merge Process in Hyper-V Manager
Hyper-V Manager has a wizard for merging differencing disks. If you have more than a couple of disks to merge, you will find this process tedious.
1. In Hyper-V Manager, click Edit Disk in the Actions pane on the far right.
2. Click Next on the wizard’s intro page if it appears.
3. Browse to the last AVHDX file in the chain (the one the VM was actively using).
4. Choose the Merge option and click Next.
5. Choose to merge directly to the parent disk and click Next.
6. Click Finish on the last screen.
7. Repeat until you only have the root VHDX left. Reattach it to the VM.
Fixing Parent Problems with AVHDX Files
In this section, I will show you how to correct invalid parent chains. If you have merged virtual disk files in the incorrect order or moved them out of their original location, you can correct it.
Set-VHD -Path c:\temp\demo-data_8eff0e79-2711-4115-a704-45046fe6c536.avhdx -ParentPath c:\temp\demo-data_535bbf6b-5190-4383-ae19-ab7a6d44b2eb.avhdx
The above cmdlet will work if the disk files have moved from their original locations. If you had a disk chain of A->B->C and merged B into A, then you can use the above to set the parent of C to A, provided that nothing else happened to A in the interim.
The virtual disk system uses IDs to track valid parentage. If a child does not match to a parent, you will get the following error:
Set-VHD : Failed to set new parent for the virtual disk.
There exists ID mismatch between the differencing virtual hard disk and the parent disk.
You could use the -IgnoreIdMismatch switch to bypass this check, but merging a disk that you reparented that way will almost certainly cause damage.
Alternatively, if you go through the Edit Disk wizard as shown in the manual merge instructions above, then at step 4, you can sometimes choose to reconnect the disk. Sometimes though, the GUI crashes. I would not use this tool.
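The rule that Set-VHD enforces can be pictured with a toy Python model. This is only an illustration. The `VirtualDisk` class, its `disk_id` and `expected_parent_id` fields, and `set_parent` are all hypothetical, and they do not reflect the real VHDX on-disk layout.

In the model, a child remembers both its parent’s path and an identifier for that parent. Pointing the child at a new path succeeds when the file there carries the expected identifier, which is the moved-parent case. It fails otherwise, unless you force it, which is roughly what -IgnoreIdMismatch amounts to.

```python
from dataclasses import dataclass
from typing import Optional


class IdMismatchError(Exception):
    """Raised when a proposed parent does not carry the ID the child recorded."""


@dataclass
class VirtualDisk:
    path: str
    disk_id: str                              # hypothetical stand-in for the disk's identity
    parent_path: Optional[str] = None         # None for a root VHDX
    expected_parent_id: Optional[str] = None  # parent identity the child recorded


def set_parent(child: VirtualDisk, new_parent: VirtualDisk,
               ignore_id_mismatch: bool = False) -> None:
    """Point child at new_parent; refuse on an ID mismatch unless told to ignore it."""
    if child.expected_parent_id != new_parent.disk_id and not ignore_id_mismatch:
        raise IdMismatchError(
            "There exists ID mismatch between the differencing virtual hard disk "
            "and the parent disk."
        )
    # Forcing past a mismatch only rewrites the path; the data still may not line up.
    child.parent_path = new_parent.path


# The parent moved from its original folder to C:\temp but is otherwise unchanged.
parent = VirtualDisk(r"C:\temp\demo-data_535b.avhdx", disk_id="id-535b")
child = VirtualDisk(r"C:\temp\demo-data_8eff.avhdx", disk_id="id-8eff",
                    parent_path=r"C:\LocalVMs\demovm\Virtual Hard Disks\demo-data_535b.avhdx",
                    expected_parent_id="id-535b")
set_parent(child, parent)       # same disk, new location: accepted
print(child.parent_path)

stranger = VirtualDisk(r"C:\temp\other.vhdx", disk_id="id-other")
try:
    set_parent(child, stranger)  # a different disk: rejected
except IdMismatchError as err:
    print("rejected:", err)
```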
Errors Encountered on AVHDX Files with an Invalid Parent
The errors that you get when you have an AVHDX with an invalid parent usually do not help you reach that conclusion.
In PowerShell:
Merge-VHD : Failed to merge the virtual disk.
The system failed to merge 'c:\temp\demo-data_8EFF0E79-2711-4115-A704-45046FE6C536.avhdx'.
Failed to open virtual disk 'c:\temp\demo-data_8EFF0E79-2711-4115-A704-45046FE6C536.avhdx' because a problem was
encountered opening a virtual hard disk in the chain of differencing disks, '': 'The system cannot find the file
specified.' (0x80070002).
Because it lists the child AVHDX in both locations, along with an empty string where the parent name should appear, it might seem that the child file has the problem.
In Hyper-V Manager, you will get an error about “one of the command line parameters”. It will follow that up with a really unhelpful “Property ‘MaxInternalSize’ does not exist in class ‘Msvm_VirtualHardDiskSettingData’.” All of this just means that it can’t find the parent disk.
Use Set-VHD as shown above to correct these errors.
Other Checkpoint Cleanup Work
Checkpoints involve more than AVHDX files. Checkpoints also grab the VM configuration and sometimes its memory contents. To root these out, look for folders and files whose names contain GUIDs that do not belong to the VM or any surviving checkpoint. You can safely delete them all. If you do not feel comfortable doing this, then use Storage Migration to move the VM elsewhere. It will only move active files. You can safely delete any files that remain.
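You can partly mechanize the GUID hunt. The Python sketch below pulls every GUID out of a list of file and folder names. It flags each name whose GUIDs all fall outside a keep list. The function name and example names are hypothetical.

You have to build the keep list yourself. It should contain:

- the VM’s ID
- the ID of every surviving checkpoint (in PowerShell, the Id property of the objects that Get-VM and Get-VMCheckpoint return)
- the GUID embedded in any disk file the VM still uses, such as its active AVHDX

Treat the output as a list to review, not as a delete list.

```python
import re
from typing import Iterable, List, Set

# Standard 8-4-4-4-12 hexadecimal GUID layout.
GUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}", re.IGNORECASE
)


def orphan_candidates(names: Iterable[str], keep_ids: Set[str]) -> List[str]:
    """Return names that contain at least one GUID and none from `keep_ids`.

    Matching ignores case because GUIDs can appear in either case
    (this article's own examples use both).
    """
    keep = {guid.lower() for guid in keep_ids}
    flagged = []
    for name in names:
        guids = {g.lower() for g in GUID_RE.findall(name)}
        if guids and not (guids & keep):
            flagged.append(name)
    return flagged


vm_id = "11111111-2222-3333-4444-555555555555"  # hypothetical VM ID
names = [
    f"{vm_id}.vmcx",                                         # the VM's own configuration
    "AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE.vmcx",             # leftover checkpoint configuration
    "demo-data.vhdx",                                        # no GUID: never flagged
    "demo-data_8EFF0E79-2711-4115-A704-45046FE6C536.avhdx",  # active AVHDX, kept explicitly
]
keep = {vm_id, "8eff0e79-2711-4115-a704-45046fe6c536"}
print(orphan_candidates(names, keep))
```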
What Causes Checkpoints to Linger?
I do not know that anyone has ever determined the central cause of this problem. We do know that Hyper-V-aware backups will trigger Hyper-V’s checkpointing mechanism to create a special backup checkpoint. Once the backup program notifies VSS that the backup has completed, Hyper-V should automatically merge the checkpoint. Look in Event Viewer for any clues as to why that didn’t happen.